The crazy thing is they aren’t really designed for that, especially the reasoning part.
I pictured LLMs proliferating in the way SGLang was taking it: fill in the blank. You’d give raw completion models (not chat finetunes) chunks of structured content (like JSON schemas, names in fields, more complex schemes filled programmatically) and they would fill in the fields, with intelligent context caching. Like, if you needed to finish a programming function, maybe you give it the file as context and guide it along.
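To make the fill-in-the-blank idea concrete, here is a rough sketch in plain Python. It is not SGLang's API; `fill_template`, `stub_complete`, and the template format are all made up for illustration, and the "model" is a canned stub standing in for a raw completion model. The point is only the shape: the program owns the structure, and the model fills one field at a time given the text so far.

```python
from typing import Callable, Dict, List, Tuple, Union

# A template is a list of literal text chunks and named slots.
# A slot is (field_name, stop_string); both are illustrative conventions.
Slot = Tuple[str, str]
Part = Union[str, Slot]


def fill_template(
    parts: List[Part], complete: Callable[[str, str], str]
) -> Tuple[str, Dict[str, str]]:
    """Walk the template, asking `complete` to fill each slot in order."""
    prefix = ""
    fields: Dict[str, str] = {}
    for part in parts:
        if isinstance(part, str):
            # Literal structure is written by the program, never the model.
            prefix += part
        else:
            name, stop = part
            # The model only sees the text so far and a stop string.
            # Because the prefix only grows, a real backend could reuse
            # its cached context between slots.
            value = complete(prefix, stop)
            fields[name] = value
            prefix += value
    return prefix, fields


def stub_complete(prefix: str, stop: str) -> str:
    """Hypothetical stand-in for a raw completion model (canned answers)."""
    canned = {'"name": "': "Ada", '"language": "': "Python"}
    for key, value in canned.items():
        if prefix.endswith(key):
            return value
    return ""


TEMPLATE: List[Part] = [
    '{"name": "', ("name", '"'),
    '", "language": "', ("language", '"'),
    '"}',
]

text, fields = fill_template(TEMPLATE, stub_complete)
print(text)
print(fields)
```

Swapping `stub_complete` for a call to an actual completion endpoint is where the real work would be; the braces, quotes, and field names stay under the program's control either way.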
Temperature-based sampling felt like a quick hack, just until either looping or structured sampling was figured out.
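For anyone who hasn't looked at it, temperature sampling is roughly this small. The sketch below is a generic softmax-with-temperature draw in plain Python, not any vendor's actual implementation; `sample_with_temperature` and the example logits are made up for illustration.

```python
import math
import random
from typing import List


def sample_with_temperature(
    logits: List[float], temperature: float, rng: random.Random
) -> int:
    """Pick a token index from raw scores, with randomness set by temperature."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always the top score.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    # random.choices normalizes the weights, so this is a softmax draw.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(0)
logits = [1.0, 3.0, 2.0]  # made-up scores for three candidate tokens
greedy = sample_with_temperature(logits, 0.0, rng)
draws = [sample_with_temperature(logits, 1.0, rng) for _ in range(200)]
print(greedy, sorted(set(draws)))
```

At any temperature above zero, lower-scored tokens get picked some fraction of the time, which is the randomness I'm complaining about.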
…But they didn’t fix any of that.
Papers came out on the issues. But the big AI houses didn’t implement any of it. They basically kept it exactly the same and just kept scaling it to burn more compute.
What I am getting at is that OpenAI/Anthropic development is way, way more conservative than anyone thinks, and they basically took temporary hacks and made them bigger. That’s why you can’t trust the code they churn out, because random mistakes and so many other issues are literally part of their core.
Then they turn around and sell them as confident engagement engines…
It’s utterly mind-boggling to me.
LLMs in general are probably the wrong shape of model, even for the things LLMs can actually do. But the hyper-scalers categorically will not entertain any other kind of model.
The neural network space will be much more interesting after the crash. Hopefully with a minimum period of self-proclaimed haters shouting “guess it was nothing!” at people trying to build this whole new kind of software for its demonstrable utility instead of a four-comma cocaine fantasy.
Four comma k fantasies aren’t much better.
Unfortunately, Elmo’s nation-state wealth is not a fantasy.
No. Money is very real and obeys real rules.
And Elon Musk really has a metric shitload of it. Orders of magnitude more than Sam Altman. Roughly as much as Sam Altman imagines he deserves.
I know. I’m glad money is a real thing for both of them, and functions by defined rules.