@waifu @VD15 llama.cpp can split the model between your GPU and CPU, so you could run 1-2 layers on the GPU and the rest on the CPU. With a quantized (huggingface.co/TheBloke/Silico) small model it shouldn't be super slow.
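A minimal sketch of that split, assuming the llama.cpp CLI (`llama-cli`) built with GPU support; the `-ngl` / `--n-gpu-layers` flag sets how many layers are offloaded to the GPU, and everything else runs on the CPU. The model path here is hypothetical — substitute your own quantized GGUF file.

```shell
# Offload 2 transformer layers to the GPU; remaining layers stay on CPU.
# ./models/model-q4_k_m.gguf is a placeholder for your quantized model.
./llama-cli -m ./models/model-q4_k_m.gguf \
  -ngl 2 \
  -p "Hello"
```

With `-ngl 0` everything runs on CPU; raising the number trades VRAM for speed, so you can tune it to whatever fits on your card.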

Because a vision softly creeping 🌑
Left its seeds while I was sleeping 🛌🏻
And the vision that was planted in my brain 🧠
Still remains within the sound of Out of Touch Thursday 👯
