Browse Comments — Clean (de-noised)
Close reading of the corpus at each pipeline stage: raw → clean → relevant → coded.
4.2K comments matched · page 2 of 210
Mike, this is fascinating. The idea that we’ve been flattening meaning by forcing everything through transcripts feels obvious once you say it, but most of the industry still treats that as “good enough.” The ensemble approach makes a lot of sense, especially if scaling alone is hitting diminishing returns. What has surprised you most once you started testing this in real-world voice use cases?
Is this anything like Perplexity, which takes your LLM input and then farms it out to the LLM it thinks is best?
Certainly interesting but what’s the CTA, read more? Or is there an API service I missed where we can try it out, please?
This is wild - in a space obsessed with “bigger LLMs,” you’re basically saying the future is orchestration, not just size. The 1% compute + voice-first angle is huge. Curious, where do you see ELMs beating monoliths first in real-world deployments ... contact centers, gaming, something else?
Kevin Turner we've already done it in both! The gaming successes are most public though - check out our work with Call of Duty among others.
This is the approach I thought every AI architecture was going to take: expert agents that communicate instantly with other agents to solve questions and problems. I mean, even now people are talking about how AI is working as groups to create some cool things.
This is really interesting work. Curious how you balance dynamic routing across the ensemble with latency and consistency guarantees at inference time?
Would this be compatible with the emerging "world model" (concept) based models?
It would be great to see other companies driving down the cost of AI development and engineering. Even with a few teething issues, DeepSeek was still able to launch and claim a good market share. Congratulations, hope to see it in action.
I’m waiting for this to be the norm. May be able to save us time from building it in house for OrahVision Inc
What is the latency in ms?
This points to a deeper shift than cost curves. What’s breaking isn’t just the price of inference, it’s the assumption that intelligence must live inside monoliths. When scaling hits diminishing returns, architecture becomes the lever. Ensembles move the bottleneck from raw compute to orchestration, signal interpretation, and aggregation logic. That’s a fundamentally different design philosophy, and it aligns much more closely with how real-world intelligence actually works. If this holds, the next frontier isn’t bigger models competing with OpenAI or DeepSeek AI on brute force. It’s smaller, purpose-built systems coordinated intelligently, closer to perception, context, and decision-making. The age of scaling was inevitable. An age of architecture was always next.
How does this enable better domain - model fit? I'd be curious if AI can natively understand which models to pick for which domain for the same problem, different domains? (And the reproducibility of the same)
I don’t see how running multiple models with lower levels of intelligence can aggregate to higher level reasoning. Is there a model that chooses the optimal response/token?
Peter Signore think of it like sensory modalities. We still do have a highly intelligent model (the orchestrator), but instead of it just trying to "model everything in its imagination" as LLMs do, it gets feedback and structure from a variety of more reliable submodels to actually understand what it's reviewing.
Mike Pappas How do you know that they "want individual AI-powered solutions to specific tasks"? Seems like a reach (unless people have told you that directly in an interview). I feel like it may be more accurate to say: 1. They want AI to cost less (so they aren't worried about driving their bill through the roof -> see Jackson Oaks) 2. Give them faster responses (so they don't sit there staring at it wondering what to do in the meantime) 3. While still accurately and actually solving their problem (e.g. doing the task) There's a reason I use Opus for almost everything: I trust the response will actually be good and be a solution to my problem: good email, working software, etc. Personally I don't care in any way shape or form about "individual AI-powered solutions to specific tasks" I just want my "task" (problem) done (solved) for cheap. xD
I think your sales pitch is confusing, even for an engineer. Not sure how this will work when I can get the same from LLMs. Also, DeepSeek has fallen behind and has disappeared from mainstream discussions.
Seems like this would benefit from something that I have been working on. Check out my paper, and if it sounds like something you would be interested in, let me know. This could just as easily be used with other AI/learning models.
Will this "1% of the compute" use significantly less energy/electricity? The energy issue is out of control.
Duncan Huth After reading your site I'm SO impressed! Hopefully the environmental benefits are good because this could be huge!