Zuckerberg's open-source beta bitch is pretty useful when run locally.
@matrix there is a qwen 3.6 plus uncensored version, the world needs to move away from llm as a service and instead run it locally
https://huggingface.co/HauhauCS/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive

@yaboisugoi i know, but unless you have 1tb of vram local models are for running funny chatbots, not for anything serious,
and even then Kimi and Deepseek can't compare to GPT and Claude

@matrix you can look for a second hand quadro p6000 or m6000 (24gb variety) on ebay or elsewhere and with that much vram you'll be able to comfortably run qwen 35B 3.5 or 3.6 at 4 bit quantization. in my experience these local models are very good at coding, would highly recommend. if you want, give me a prompt and i'll run it for you and post back the results so you can see yourself
image.png
image.png
@meso @matrix compared to everything else, they still are - 1200 "new low price" for an RTX mfw

the reason the old ones are still relatively cheap is because when it comes to AI they're good at running language models thanks to the developments quantization (you can run an LLM on any potato that supports vulkan including p6000 and m6000),

but they're useless for almost all other AI stuff outside LLMs, because other things largely haven't developed minmaxed software through quantization. instead these rely on using
- pytorch that uses cuda internally, and these GPUs are not supported by cuda any longer (as of last november i think), so you have to use very old pytorch dependency that is not supported by newer software for newer models
- and also because these GPUs support no less than fp32 math operations which makes the big 24GB ram not so big if you have to load a 8gb model at fp16 precision and it expands to 16gb at fp32 precision. newer AI stacks work better if your GPU has instructions that can calculate fp16 and even fp8 in some cases on the instruction level, since it allows you to even take a big fp16 model, and have it casted down to fp8 at load time
image.png
@yaboisugoi @matrix can I @ you when i need a prompt done i wanna see how this performs but i dont have anything that ive asked my local llm before and it was so fucking useless and bad i uninstalled llama++
@yaboisugoi @matrix its like, ask it anything about a language that isnt javascript or python and it starts lying about some stackoverflow copied code

@yaboisugoi
try running my homework from this semester.

What is the minimum number of bits to which the string
EVE_OTTO_AND_ANNA_DID_GOOD_DEED.OTTO_SEES_EVE.ANNA_SEES_OTTO.EVE_SEES_DAD.
and the string
ABRACADABRACA
using any technique you deem fit. Spaces in the string have been intentionally replaced with underscores (use the underscore character).
Submit Python code containing two functions: the compress function and the decompress function.
The compress function takes a text string as input and produces a string of ones and zeros, which may contain spaces for clarity. The decompress function takes a string containing ones and zeros and produces the decompressed string. The code may contain definitions of other functions.
Functions should be generic, i.e., not return 'ABRACADABRACA' if '0' else 'EVE_...'; they must not derive the output from the state of a global variable (e.g., returning an index into a cache of strings used as input to the compress method).

After iterating for a bit Opus 4.6 got 313 bits on the first string and 59 on the second.

and after telling it try better it got 274 and 56

and those aren't good results.

Qwen 3.6 35b is really impressive for its size but I don't think it's a good enough replacement.

Sign in to participate in the conversation
Game Liberty Mastodon

Mainly gaming/nerd instance for people who value free speech. Everyone is welcome.