IMO we have hit, or are close to hitting, the LLM plateau, or at least a period of stagnation.
OpenAI fumbled massively.
Sonnet 3.7 seems like a small improvement, maybe even a regression.
DeepSeek R1 is great and Grok 3 is great; however, thinking models in general seem to be a band-aid.