
It predicted 50 hours for a small finetune :NotLikeThis: I give up


@LukeAlmighty 3080 12 GB, I'm using heretic to uncensor InternVL3.5-14B

@matrix For LLMs, the real bottleneck is memory bandwidth. We had a damaged card that could only run at 10% of its GPU compute power, yet we only noticed a 20% drop in tokens per second.
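The bandwidth-bound claim can be sanity-checked with back-of-envelope arithmetic: during decode, every active weight has to be streamed from memory once per generated token, so tokens/s is capped at roughly bandwidth divided by model size. A minimal sketch; the numbers are illustrative assumptions (an RTX 3080's VRAM bandwidth is about 760 GB/s, and a 14B model quantized to ~4 bits is roughly 8 GB of weights):

```python
def max_tokens_per_s(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when inference is memory-bandwidth-bound:
    all weights are read from memory once per generated token."""
    return bandwidth_gb_s / model_size_gb

# Illustrative assumption: ~760 GB/s VRAM bandwidth, ~8 GB of weights.
print(round(max_tokens_per_s(760, 8)))  # ~95 tok/s ceiling
```

This is also why cutting compute power barely matters here: the ALUs mostly sit idle waiting on memory, so a card at a fraction of its compute throughput loses far less than that fraction in tokens per second.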

@r000t So my initial guess was correct: it was spilling over into system RAM, and PCIe bandwidth became the bottleneck. It confused me because Task Manager didn't show any spillover, and inference with llama.cpp was faster when I let it spill.
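The spillover effect above can be sketched the same way: once part of the weights live in system RAM, each token's pass must also pull those bytes across PCIe, which is far slower than VRAM. A rough model under stated assumptions (the 6 GB / 2 GB split and the ~32 GB/s PCIe 4.0 x16 figure are illustrative, and the two transfers are treated as sequential, as in a naive layer-by-layer pass):

```python
def spillover_tokens_per_s(vram_gb: float, spilled_gb: float,
                           vram_bw_gb_s: float, pcie_bw_gb_s: float) -> float:
    """Per-token time = streaming resident weights from VRAM plus pulling
    spilled weights across PCIe; returns the resulting tokens/s."""
    t_per_token = vram_gb / vram_bw_gb_s + spilled_gb / pcie_bw_gb_s
    return 1.0 / t_per_token

# Illustrative split: 8 GB of weights, 6 GB resident, 2 GB spilled.
print(round(spillover_tokens_per_s(6, 2, 760, 32)))  # ~14 tok/s
```

Even a small spill dominates the per-token time, which is consistent with a large slowdown that an OS memory monitor may not make obvious.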

Game Liberty Mastodon