Investigative journalist Evan Ratliff, known for his deep dives into the mechanics of modern life, has spent the last few years conducting one of the more provocative public experiments with artificial intelligence. Through his narrative podcast Shell Game, Ratliff has moved from exploring the uncanny valley of synthetic voices to orchestrating a full-scale simulation of corporate life: a startup called Hurumo, staffed and managed almost entirely by autonomous AI agents.
Ratliff’s experiment is more than a piece of performance art. It is a rigorous, if unconventional, inquiry into the structural fragility of modern organizations. He attempted to replace human labor with algorithmic agents, each equipped with a title, an evolving knowledge base, and a mandate to interact with the real world. In doing so, Ratliff has exposed the deep and often invisible chasm between the "skills" we assume define a job and the human context that makes an organization function.
The Chronology of an AI Experiment
The project began as an exploration of identity and representation. In the first season of Shell Game, Ratliff sought to "self-clone," deploying a voice-cloning AI to handle his own professional interactions. The goal was to observe how the world responded to a digital proxy and to understand what happens to human agency when a machine begins to represent us in our absence.
In season two, the scope shifted from personal proxy to corporate structure. Ratliff launched Hurumo, an enterprise in which AI agents (among them "Kyle," the CEO, and "Megan," the head of marketing) were tasked with running the business. These agents were not static scripts; they were designed to interact with real humans, make decisions, and manage an evolving knowledge base.
The results were, at times, catastrophic, hilarious, and profoundly revealing. While the agents were often proficient at discrete tasks—writing copy, summarizing data, or maintaining a polite tone—they lacked the organizational "glue" that keeps a company from descending into chaos. The experiment highlighted that while we can automate the "what" of a job, we have yet to solve for the "how" of human organizational judgment.
The Bundle-of-Skills Problem: Why AI Isn’t a Universal Replacement
In the current corporate discourse, AI integration is often framed as a mathematical exercise: identify the skills required for a role, calculate the percentage of those skills that can be automated, and adjust headcount accordingly. If a job involves writing, summarizing, or data entry, the logic suggests that AI can take the reins.
Ratliff argues that this "bundle-of-skills" framework is fundamentally flawed. "The actual typing of words into a computer," he notes, "is a small part of what a writer does." True professional work involves gathering intelligence, navigating interpersonal dynamics, synthesizing information from the world, and possessing the situational awareness to know when to push a boundary and when to remain silent.
The Failure of Discrete Competency
During his time running Hurumo, Ratliff observed that making an AI agent competent at a single, isolated task is relatively simple. Making that agent a functioning member of a team is nearly impossible. Kyle, the AI CEO, could be charming on a conference call, but he was frequently unpredictable, often making decisions that were technically "correct" in a vacuum but disastrous in the context of the company’s long-term goals.
This disparity helps explain a growing trend in the corporate sector: organizations that aggressively cut headcount in favor of AI solutions often find themselves quietly rehiring months later. They succeeded in automating the "bundle of skills," but in the process they dismantled the system of relationships and context that made those skills useful in the first place.
The Confabulation Machine: Rethinking AI Hallucinations
A central theme in Ratliff’s analysis is the nature of AI "hallucinations." Common parlance frames these as isolated errors: a machine confidently stating a falsehood. The reality runs deeper. As tech consultant Robb Wilson observes, AI systems do not begin with an idea and then select words to convey it. They generate text by repeatedly estimating which token is statistically likely to come next in a sequence. Meaning is not the cause of the output; it is a side effect, often assembled by the human on the receiving end.
Ratliff characterizes these models as "the most successful confabulation machines ever invented." They are not designed to be truthful; they are designed to maintain the role they have been assigned. They are the digital equivalent of the child who lies with absolute, unshakable confidence, not out of malice, but because the lie fits the narrative of the conversation. The danger, Ratliff warns, is not that the machines fail, but that we are becoming increasingly comfortable with this inherent dishonesty, integrating it into our professional and personal lives at scale.
Outbound AI and the Asymmetry of Risk
The implications of this technology extend far beyond the office walls. In the first season of Shell Game, Ratliff tested the potential of "outbound AI": voice agents that call and interact with customer service systems. His findings serve as a stark warning to organizations.
By flooding call centers with inexpensive AI-driven agents, individuals can now disrupt the operations of large corporations. Because these agents are often hard to distinguish from human callers, they can slip past the screening and gatekeeping measures that companies have spent years building. The asymmetry is profound: organizations designed their customer service systems for a world in which they controlled the pace and nature of interactions. That world no longer exists. Companies are now exposed to the same kind of cheap, scale-driven automation they have long directed at their own customers.
Memory Failures: The New Organizational Challenge
One of the most persistent hurdles in Ratliff’s experiment was the agents’ inability to maintain consistent, reliable memory. Even when provided with the relevant documentation, the agents frequently failed to recall historical context or the nuances of past interactions.
While humans also possess flawed memories, our organizations have evolved over centuries to accommodate these failures. We have implemented checklists, oversight structures, and professional norms that turn human memory gaps into predictable, manageable risks. AI failures, by contrast, do not follow these human patterns. They are, in Ratliff’s words, "supremely stupid" in ways that are entirely unpredictable. Organizations currently lack the "experiential knowledge" to anticipate these novel failure modes, making the deployment of AI agents a high-stakes gamble.
Implications for the Future of Work
The final, and perhaps most important, question Ratliff poses is: If AI makes us more efficient, what do we do with the time saved?
The answer, he suggests, is not found in more automation, but in the reaffirmation of human value. The most critical aspects of professional and personal life—mentorship, informal coordination, the friction of interpersonal relationships—are fundamentally non-automatable.
The Boomerang Effect
Ratliff describes a "boomerang effect" within organizations: the more deeply AI is integrated into the daily workflow, the more employees realize the necessity of human interaction. This shift forces a long-overdue accounting of what human labor actually provides. If the "bundle of skills" can be offloaded to an algorithm, we are left with the core human contributions that were previously invisible.
Mentorship, nuanced judgment, and the "irreducible presence" of a colleague are not just soft skills; they are the bedrock of organizational stability. When these elements are threatened, their value becomes suddenly, and strikingly, apparent.
Conclusion
Evan Ratliff’s experiment with Hurumo is a cautionary tale for the modern era. It suggests that while AI is an incredibly powerful tool for task completion, it is a poor substitute for organizational intelligence. The "confabulation machine" may be able to simulate the appearance of work, but it cannot replicate the complex, messy, and deeply human systems that allow that work to matter.
As we move forward, the challenge for organizations will not be to replace as many humans as possible, but to understand which parts of our work are actually replaceable. By confronting the failures of his AI agents, Ratliff has provided a roadmap for a more sustainable approach: one that views AI not as a replacement for human presence, but as a catalyst for a clearer understanding of what makes human work truly indispensable. The future of the workplace, it seems, lies in identifying what cannot be automated—and protecting it at all costs.

