Way too slow if you can't fit it in VRAM.
Letting the GPU do all the processing and letting the model overflow into system RAM is faster than having llama.cpp split it between CPU and GPU.
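A rough way to check the "fits in VRAM" part before downloading anything is a back-of-the-envelope size estimate. This is only a sketch: the 4.5 bits per weight figure (a typical average for a 4-bit quant) and the 1.5 GiB overhead for KV cache and buffers are assumptions, not numbers from this thread, and the function names are made up for illustration.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gib: float,
    overhead_gib: float = 1.5,  # assumed allowance for KV cache and buffers
) -> bool:
    """True if the weights plus the assumed overhead fit in the given VRAM."""
    needed = weights_gib(params_billion, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Hypothetical card sizes, purely for illustration.
    for params, vram in [(8, 8), (12, 12), (27, 12)]:
        size = weights_gib(params, 4.5)
        verdict = "fits" if fits_in_vram(params, 4.5, vram) else "does not fit"
        print(f"{params}b at ~4.5 bpw: {size:.1f} GiB of weights, {verdict} in {vram} GiB")
```

Real usage also depends on context length and on what else is using the GPU, so treat the result as a first guess only.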

**matrix07012** @matrix@gameliberty.club · Dec 01, 2025, 21:52

qwen3-vl-8b abliterated seems to work well

**matrix07012** @matrix@gameliberty.club · Dec 01, 2025, 21:54

Interesting though, seems like prompt caching can poison the context and start things like typing in all caps.

**matrix07012** @matrix@gameliberty.club · Dec 01, 2025, 22:00

I turned cache off and restarted the process! Why is it doing all caps even now?

**matrix07012** @matrix@gameliberty.club · Dec 01, 2025, 22:18

Ok, it's getting stale

**matrix07012** @matrix@gameliberty.club · Dec 01, 2025, 22:34

new, less schizo, prompt

**matrix07012** @matrix@gameliberty.club · Dec 01, 2025, 23:05

Now it's not even funny, just retarded.

**matrix07012** @matrix@gameliberty.club · Dec 01, 2025, 23:07

I liked Gemma more than Qwen, but unfortunately 12b is too retarded to format answers correctly.

**matrix07012** @matrix@gameliberty.club · Dec 02, 2025, 15:50

I got 27b working and it's smart enough
