Imo we have hit, or are close to hitting, an LLM plateau, or at least a period of stagnation.
OpenAI fumbled massively.
Sonnet 3.7 seems like a small improvement and maybe even a regression.
DeepSeek R1 is great, Grok 3 is great, but thinking models in general seem to be a band-aid.

Grok 3's thinking doesn't seem to do much as far as I can tell. You still get the same dumb answers, and when you look at the "thought" you just get dumb thoughts.