>release a great model
>cripple it post launch
>ship a new worse model
>claim it's better
I really hope it's just an Opus 5 prototype
It becomes smarter when you say it's a benchmark
@matrix To be fair, these models are trained on discussions of the faults of previous models. So chances are that if you ask a similar question, worded differently but with the same underlying logic, it might still shit the bed.