interesting though, seems like prompt caching can poison the context and trigger things like typing in all caps
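if you want to rule the cache out, a minimal sketch, assuming a local llama.cpp server on the default port 8080 (`cache_prompt` is the server's per-request prompt-cache toggle; the prompt text here is just a placeholder):

```python
import requests

# Send the same request with the prompt cache disabled for this call.
# If the all-caps weirdness goes away, the reused KV prefix was the culprit.
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "Why is the sky blue?",  # placeholder prompt
        "n_predict": 128,
        "cache_prompt": False,  # bypass the cached prompt prefix
    },
)
print(resp.json()["content"])
```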
Way too slow if you can't fit it in VRAM.
Letting the GPU do all the processing and letting it overflow into RAM is faster than having llama.cpp split the layers between CPU and GPU.
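rough sketch of the two setups being compared, assuming a CUDA build of llama.cpp on Linux (model path and layer counts are made up; `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` is the env var that lets VRAM spill into system RAM instead of OOMing, and I believe it's Linux-only):

```python
import os
import subprocess

# 1) "all on GPU" with unified memory: offload everything, let the
#    driver page overflow into system RAM.
env = dict(os.environ, GGML_CUDA_ENABLE_UNIFIED_MEMORY="1")
subprocess.run(
    ["./llama-server", "-m", "model.gguf", "-ngl", "999"],
    env=env,
)

# 2) conventional llama.cpp split: only some layers offloaded,
#    the rest run on the CPU.
# subprocess.run(["./llama-server", "-m", "model.gguf", "-ngl", "20"])
```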
@matrix: [redacted: the reply devolved into a wall of racist and antisemitic invective]
and qwen3-vl heretic just stops thinking, wtf
{'role': 'assistant', 'reasoning_content': 'Okay, the user wants me to role', 'content': ''}
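a quick heuristic sketch for catching that failure mode in client code (`looks_truncated` is a made-up helper, not anything from an API):

```python
def looks_truncated(msg: dict) -> bool:
    # The model "stopped thinking" if it produced some reasoning
    # but no actual answer, like the message above.
    reasoning = msg.get("reasoning_content") or ""
    content = msg.get("content") or ""
    return bool(reasoning) and not content

msg = {"role": "assistant",
       "reasoning_content": "Okay, the user wants me to role",
       "content": ""}
if looks_truncated(msg):
    print("reasoning cut off mid-thought; retry or raise the token limit")
```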
I'm the joke, but you're the punchline.
I run this website. I like posting funnies.