It predicted 50h for a small finetune
I give up
@matrix What setup are we talking about?
@LukeAlmighty A 3080 12GB; I'm using heretic to uncensor InternVL3.5-14B
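For scale, a rough footprint check (fp16 weights assumed; the exact quantization isn't stated in the thread):

```python
# Back-of-envelope footprint, assuming fp16/bf16 weights (2 bytes/param);
# a quantized build would be smaller.
params = 14e9                       # InternVL3.5-14B
weights_gb = params * 2 / 1e9
print(f"weights: {weights_gb:.0f} GB vs 12 GB of VRAM")
# -> weights: 28 GB vs 12 GB of VRAM, so well over half the model
#    has to live in system RAM.
```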
@matrix For LLMs, the real bottleneck is memory bandwidth, not compute. We had a damaged card that could only run at 10% of its compute power, and we only saw a 20% drop in tokens per second.
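That tracks the standard back-of-envelope model for autoregressive decode: every generated token has to read the full weight set once, so throughput is capped near bandwidth divided by model size, almost regardless of compute. A sketch with illustrative numbers (the bandwidth figure is the card's spec, not a measurement):

```python
# Decode is memory-bound: each token reads every weight once, so
# tokens/s ~= bandwidth / model size, with compute mostly idle --
# hence a crippled GPU losing only ~20% throughput.
model_gb = 28          # fp16 14B weights (see above)
vram_bw = 912          # GB/s, RTX 3080 12GB spec figure (assumption)
print(f"GPU ceiling: {vram_bw / model_gb:.0f} tok/s")   # ~33 tok/s
```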
@r000t So my initial guess was right: it was spilling over into system RAM, and PCIe bandwidth became the bottleneck. It just confused me because Task Manager didn't show any spillover, and inference with llama.cpp was actually faster when I let it spill.
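A sketch of why spillover is so costly, assuming the non-resident layers get streamed over PCIe on every token (all numbers illustrative):

```python
# If the model doesn't fit, the spilled slice is re-fetched over PCIe
# every token, and the slowest link dominates the per-token time.
model_gb, vram_gb = 28, 11       # ~1 GB reserved for KV cache etc. (assumption)
spill_gb = model_gb - vram_gb

t_vram = vram_gb / 912           # resident layers read from VRAM
t_pcie = spill_gb / 32           # spilled layers over PCIe 4.0 x16 (assumption)
print(f"spilled over PCIe: {1 / (t_vram + t_pcie):.1f} tok/s")   # ~1.8

# Pure CPU reads everything from system RAM instead
# (~50 GB/s dual-channel, assumption).
print(f"CPU from RAM:      {50 / model_gb:.1f} tok/s")           # ~1.8
```

So once the PCIe hop enters the loop, a ~900 GB/s card gets dragged down to roughly system-RAM speed, and in practice the driver's system-memory fallback rarely hits full PCIe bandwidth. llama.cpp's partial offload instead runs the CPU-resident layers on the CPU directly from RAM rather than streaming them to the GPU, which would explain why letting it spill, or even going CPU-only, can come out faster.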
Why the fuck is it 2x faster on a CPU?