In the current gold rush of artificial intelligence, product leaders and developers are obsessed with the "model selector." The industry narrative—driven by a relentless cycle of releases from tech giants—suggests that the primary path to innovation is to swap one frontier model for a slightly more capable, slightly more expensive one.
However, according to Jonathan Frankle, Chief AI Scientist at Databricks and head of Mosaic Research, this obsession is a profound strategic error. In a recent appearance on the Invisible Machines podcast, Frankle offered a stark assessment of the current state of AI: we have successfully built nuclear fusion, but we have yet to build the power lines.
The Fusion Metaphor: Capabilities vs. Infrastructure
Frankle’s central thesis is that the AI industry has become an "abundant factory for intelligence." We have created entities that are extraordinarily powerful, occasionally brilliant, and fundamentally imperfect. Yet, the infrastructure required to harness this power—the specification tools, rigorous evaluation disciplines, standardized application patterns, and data hygiene—remains in its infancy.
"If we froze frontier progress for five to ten years," Frankle posits, "we would still see an explosion of innovation, not because the models got smarter, but because we finally figured out what to do with the ones we already have."
This is a systems argument, not a critique of the models themselves. While the industry is fixated on the raw power of the "fusion" (the model), it is neglecting the "power lines" (the integration and governance) that allow that intelligence to be useful within an organizational context. For design and product leaders, this represents a shift from being passive consumers of model capabilities to becoming active scientists of system behavior.
Chronology of a Shift: From Hype to Rigor
The evolution of the field, as described by Frankle, mirrors the maturation of software engineering. Early adoption was characterized by the "vibe check"—testing a prompt, getting a cool result, and calling it "production-ready."
As we move toward 2025, the rise of agentic systems (AI that performs tasks rather than just generating text) has forced a collision with reality.
- The Early Phase: The focus was on "prompt engineering" as a mystical skill, disconnected from the realities of software development.
- The Scaling Phase: The "context window" race led to the false belief that feeding massive, uncurated data dumps into an LLM would yield higher intelligence.
- The Scientific Phase: We are currently entering a period where organizations are realizing that performance often degrades as the volume of retrieved material increases. "Distractors multiply," Frankle notes, and models remain imperfect at filtering relevance.
This shift marks a departure from the "model-first" mindset. Today, at leading-edge companies, there is a formal, scientific approach to AI—one that treats AI development not as an art, but as an empirical discipline requiring hypothesis, measurement, and iterative validation.
Supporting Data: The Myth of the Infinite Context Window
One of the most persistent myths in the AI space is that a million-token context window is a panacea for data management. Frankle challenges this directly. He argues that long context is a tool for specific multimodal tasks—such as processing video or complex image datasets—but it is a poor substitute for fundamental data curation.
"Garbage in, garbage out" is the rule, whether that garbage is entering the training pipeline or the inference context window. The technical challenge is not the "pour"—the act of feeding data into the system—but the curation: deciding exactly what belongs in the glass.
Furthermore, Frankle highlights the rise of the "Scientist who Ships." In the current market, hiring managers often lean on pedigree or "hyperscaler" resumes as proxies for competence. Frankle argues that this is a mistake. True expertise in this field is rare and is characterized by a specific mental framework: the ability to hold computing, experimental constraints, and product goals in the same head. He points to organizations like DeepSeek as evidence that rigorous science, applied with time and care, can outcompete massive capital investment.
The Specification Gap: Defining Success
Perhaps the most significant hurdle identified in the conversation is the "specification gap." In traditional software engineering, we have unit tests, integration tests, and regression testing. We have clear, deterministic ways to hold a program accountable.
AI, by its nature, is non-deterministic, so those deterministic checks do not transfer directly. "Build a benchmark" is, in Frankle’s view, a cop-out. While benchmarks are necessary, they are rarely sufficient to capture the nuance of a business-specific problem. Three problems stand out:
- Defining Intent: Does the system succeed if it provides a specific answer? A specific format? A specific tone?
- The Documentation Problem: Programs are self-documenting; AI systems are opaque.
- The Engineering Challenge: We lack a standard, universal discipline for testing non-deterministic behavior.
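One common way to hold a non-deterministic system accountable is to write the intent down as an executable check and measure how often sampled outputs satisfy it. The sketch below is one possible shape for that idea, not a method attributed to Frankle. The `fake_model` function is a hypothetical stand-in for a real model call, and the specification (valid JSON, an `answer` field containing "30 days", a `source` field) is invented for illustration.

```python
import json
import random


def fake_model(prompt, rng):
    """Hypothetical stand-in for a non-deterministic model call."""
    # Occasionally returns free text instead of the requested JSON.
    if rng.random() < 0.1:
        return "Sure! The refund window is 30 days."
    return json.dumps({"answer": "30 days", "source": "refund-policy"})


def meets_spec(output):
    """The written-down intent: valid JSON, the right answer, a cited source."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    answer = data.get("answer")
    return isinstance(answer, str) and "30 days" in answer and "source" in data


def pass_rate(prompt, trials=200, seed=0):
    """Sample the system repeatedly and report the fraction that meets the spec."""
    rng = random.Random(seed)  # seeded so the evaluation itself is repeatable
    passed = sum(meets_spec(fake_model(prompt, rng)) for _ in range(trials))
    return passed / trials


rate = pass_rate("How long is the refund window?")
print(f"pass rate: {rate:.1%}")
```

Instead of asserting that a single output is correct, a team can assert that the pass rate stays above an agreed threshold, which turns "format, answer, tone" from a matter of taste into a regression test.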
Frankle suggests that "prompt engineering" is simply a form of programming. Whether one is fine-tuning weights or crafting a sophisticated system prompt, one is engaged in the same computing pattern. The industry must move away from viewing these as disparate silos and start treating them as components of a unified engineering discipline.
Official Responses and Industry Implications
The implications for brand and organizational control are profound. As AI agents begin to represent brands in consumer feeds—often bypassing traditional web-based interactions—the question of "who owns the truth" becomes critical.
Frankle suggests that we are heading toward a "cottage industry" of LLM-oriented publishing. Just as SEO changed how we wrote for the web, the requirements of LLM ingestion will change how organizations structure their digital knowledge.
- Static vs. Dynamic: Static FAQ pages are currently superior to dynamic PDFs for model ingestion.
- Curation is King: Unlocking a massive, unhygienic document repository for an AI agent will simply scale organizational errors to the level of systemic truth.
- Knowledge vs. Reasoning: The goal for the modern enterprise should be to separate "knowledge" from "reasoning." You want a robust, faithful reasoner (the model) that is hooked into a curated, verified knowledge base.
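The separation of knowledge from reasoning can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Fact` type, the `verified` flag, the sample entries, and the prompt wording are hypothetical, and the model call itself is left out. The point is only that the reasoner sees what a review process has approved, and nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    fact_id: str
    text: str
    verified: bool  # set by a human review process, not by the model


# Hypothetical knowledge base; the second entry is an unreviewed draft.
KNOWLEDGE_BASE = [
    Fact("kb-001", "Standard shipping takes 3 to 5 business days.", True),
    Fact("kb-002", "Express shipping is free on all orders.", False),
    Fact("kb-003", "Returns are accepted within 30 days.", True),
]


def build_prompt(question, facts):
    """Assemble a prompt from verified facts only, each tagged with its ID."""
    allowed = [f for f in facts if f.verified]
    lines = [f"[{f.fact_id}] {f.text}" for f in allowed]
    return (
        "Answer using only the facts below and cite their IDs.\n"
        + "\n".join(lines)
        + f"\n\nQuestion: {question}"
    )


prompt = build_prompt("How long does shipping take?", KNOWLEDGE_BASE)
print(prompt)
```

Because each fact carries an ID, an answer that cites `kb-001` can be traced back to a reviewed source, and an unverified draft never becomes something the system is allowed to believe.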
The Path Forward: Designing for Verification
For those building the next generation of agentic systems, the advice from the front lines is clear: stop worrying about which model is "SOTA" (State of the Art) this week. Instead, focus on the work that happens between the model and the output.
The future of AI is not in the "magic" of the black box, but in the engineering of the interface. This includes:
- Predictable Edit Surfaces: Users need the ability to see and correct what the AI is doing.
- Verifiable Behavior: Systems must be designed so that their reasoning paths can be audited.
- Organizational Honesty: Moving from "vibe-based" testing to rigorous, data-driven evaluation.
"If you are building agentic systems," Frankle concludes, "the question is not whether your context window is large enough. It is whether you can describe what you want, prove you got it, and curate what the system is allowed to believe."
As the industry moves out of its experimental adolescence, the winners will not necessarily be the companies with the most expensive models. They will be the companies that treat AI development as a mature engineering discipline, prioritizing the "power lines" of curation, specification, and measurement. The age of the model-shop is ending; the age of the AI system engineer has begun.

