@waifu @VD15 llama.cpp can split the model between your GPU and CPU, so you can run maybe 1-2 layers on your GPU and the rest on your CPU. With a small quantized model (https://huggingface.co/TheBloke/Silicon-Maid-7B-GGUF) it shouldn't be super slow.
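For reference, a rough sketch of what that split looks like with the llama-cpp-python bindings (assuming you built it with GPU support; the GGUF filename below is a guess at TheBloke's usual naming, so swap in whichever quant file you actually download):

```python
# Minimal sketch: offload a couple of layers to the GPU, run the rest on CPU.
# Assumes llama-cpp-python is installed with GPU support and that a quantized
# GGUF file from the repo above sits next to this script (filename is a guess).
from llama_cpp import Llama

llm = Llama(
    model_path="./silicon-maid-7b.Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=2,  # layers offloaded to the GPU; everything else runs on CPU
    n_ctx=2048,      # context window size
)

out = llm("Say hi in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```

The plain llama.cpp CLI does the same thing with the `-ngl` flag (e.g. `-ngl 2`), so you can tune how many layers fit in your VRAM either way.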
@burner who?
I'm the joke, but you're the punchline.
I run this website. I like posting funnies.