Not sure if this is the right place; if not, please let me know.

GPU prices in the US have been a horrific bloodbath with the scalpers recently. So for this discussion, let’s keep it to MSRP and the lucky people who actually managed to find the GPU they wanted and afford those insane MSRPs.

Which GPU are you using to run which LLMs? How is the performance of the LLMs you have selected? On average, what size of LLMs are you able to run smoothly on your GPU (7B, 14B, 20-24B, etc.)?

What GPU do you recommend for a decent amount of VRAM versus price (MSRP)? If you’re using a TOTL RX 7900 XTX/4090/5090 with 24+ GB of VRAM, comment below with some performance estimates too.

My use-case: code assistants for Terraform + general shell and YAML, plain chat, some image generation. And to be able to still pay rent after spending all my savings on a GPU with a pathetic amount of VRAM (LOOKING AT BOTH OF YOU, BUT ESPECIALLY YOU NVIDIA YOU JERK). I would prefer a GPU under $600 if possible, but I also want to run models like Mistral Small, so I suppose I don’t have a choice but to spend a huge sum of money.

Thanks


You can probably tell that I’m not very happy with the current PC consumer market but I decided to post in case we find any gems in the wild.

  • mlflexer@lemm.ee · 2 points · 2 days ago

    It all depends on the size of the model you are running: if it cannot fit in GPU memory, it has to go back and forth between the host (CPU memory or even disk) and the GPU, which is extremely slow. This is why some people are running LLMs on Macs, as they can have a large amount of memory shared between the GPU and CPU, making it viable to fit some larger models in memory.
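
    As a minimal sketch of what that offloading looks like in practice (assuming llama-cpp-python and a hypothetical local GGUF file), n_gpu_layers is the knob that decides how much of the model lives in VRAM versus system RAM:

    ```python
    # Sketch: controlling GPU offload with llama-cpp-python.
    # Layers that don't fit in VRAM stay in CPU RAM, and every token
    # then pays the cost of shuffling data across the PCIe bus.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mistral-small-q4_k_m.gguf",  # hypothetical local model file
        n_gpu_layers=-1,  # -1 = offload every layer to the GPU if it fits
        n_ctx=4096,       # context window; bigger contexts also consume VRAM
    )

    out = llm("Write a Terraform resource block for an S3 bucket.", max_tokens=256)
    print(out["choices"][0]["text"])
    ```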

    • MudMan@fedia.io · 4 points · 2 days ago

      This is… mostly right, but I have to say, Macs with 16 gigs of shared memory aren’t all that; you can get many other alternatives with similar memory distributions, although not as fast.

      A bunch of vendors are starting to lean on this by providing small, weaker PCs with a BIG pool of shared RAM. That new Framework desktop with an AMD APU specs up to 128 GB of shared memory, while the Mac minis everybody is hyping up for this cap at 24 GB instead.

      I’d strongly recommend starting with a mid-sized GPU on a desktop PC. Intel ships the A770 with 16 GB of VRAM and the B580 with 12, and they’re both dirt cheap. You can still get a 3060 with 12 GB for similar prices, too. I’m not sure how they benchmark against each other on LLM tasks, but that’s easy enough to look up. Cheap as the entry-level Mac mini is, all of those are cheaper if you already have a PC up and running, and the total amount of dedicated RAM you get is very comparable.
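
      For a rough feel of which of those cards can hold which model, some back-of-envelope math (assumed numbers, weights only; the KV cache and runtime overhead add a few more GB on top):

      ```python
      # Rough rule of thumb: 1B parameters at 8 bits/weight is about 1 GB of weights.
      def approx_vram_gb(params_billion: float, bits_per_weight: float, overhead_gb: float = 1.5) -> float:
          weights_gb = params_billion * bits_per_weight / 8
          return weights_gb + overhead_gb

      for name, params in [("7B", 7), ("14B", 14), ("24B (Mistral Small)", 24)]:
          # ~4.5 bits/weight is in the ballpark of a Q4_K_M-style quantization
          print(f"{name}: ~{approx_vram_gb(params, 4.5):.1f} GB")
      ```

      By that math a 12 GB card is comfortable for 7B and workable for 14B quants, while a 24B model at Q4 wants something in the 16 GB class.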

      • mlflexer@lemm.ee · 2 points · 2 days ago

        Oh, I thought you could get 128 GB of RAM or more, but I can see it does not make sense with less than 24 GB… sorry for spreading misinformation, I guess. In that case a GPU with the same amount of RAM would probably be better.

        • MudMan@fedia.io · 3 points · 1 day ago

          You didn’t, I did. The starting models cap at 24 GB, but you can spec the biggest one up to 64 GB. I should have clicked through to the customization page before reporting what was available.

          That is still cheaper than a 5090, so it’s not that clear-cut. I think it depends on what you’re trying to set up and how much money you’re willing to burn. Sometimes literally: the Mac will also be more power efficient than a honker of an Nvidia 90-class card.

          Honestly, all I have for recommendations is that I’d rather scale up than down. I mean, unless you also want to play kickass games at insane framerates with path tracing or something. Then go nuts with your big boy GPUs, who cares.

          But for LLM stuff strictly, I’d start by repurposing what I have around, hitting a speed limit, and then scaling up to maybe something with a lot of shared RAM (including a Mac mini if you’re into those), and keep rinsing and repeating. I don’t know that I’m personally in the market for AI-specific multi-thousand-dollar APUs with a hundred-plus gigs of RAM yet.
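
          If you want to see what the machine you already have brings to the table before spending anything, a quick check (assuming PyTorch is installed) looks something like this:

          ```python
          # Print what the current box could offer an LLM runtime.
          import torch

          if torch.cuda.is_available():
              props = torch.cuda.get_device_properties(0)
              print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
          else:
              print("No CUDA GPU visible; you'd be running on CPU / shared memory.")
          ```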