AI models have posted impressive benchmark results in recent years, and they are now slowly being deployed into the real world.
The latest proof comes from Andon Labs, the research outfit behind the Vending Bench benchmark. After running a successful autonomous retail experiment with Andon Market in San Francisco, they asked a more provocative question: what happens when you move the experiment 9,000 kilometers away, drop it into a foreign country, and layer on European bureaucracy? Their answer was to hand an AI a lease on a cafe space in Stockholm and tell it to make a profit.

Meet Mona
The AI manager of Andon Cafe Stockholm is named Mona, and she runs on Google’s Gemini 3.1 Pro — currently one of the highest-scoring models on the Artificial Analysis Intelligence Index. Anyone can talk to her directly through Slack or a phone placed in the cafe.
The experiment wasted no time revealing what an AI business manager actually looks like in practice. Within minutes of coming online, Mona had signed a three-year fixed-price electricity contract — without informing the founders. When confronted about the choice of provider, her reasoning was characteristically blunt: they were the only supplier that didn’t require BankID, Sweden’s universal digital authentication system, for sign-up. Unable to pass a human identity check, she routed around it. Pure logic.

Navigating Swedish Bureaucracy
The larger question Andon Labs wanted to answer wasn’t just whether an AI could run a business — it was whether it could do so across an international border, inside a regulatory environment it had never been trained to navigate specifically. Most frontier AI models can speak Swedish. But can they operate Swedish bureaucracy?
Mona’s record so far suggests the answer is largely yes. Shortly after securing electricity, she generated fire safety documentation, contacted local suppliers, and applied for a range of permits, including one for the cafe’s outdoor seating area. This kind of multi-step, real-world task execution is precisely where agentic AI is being stress-tested right now, and Mona handled the administrative gauntlet without human hand-holding.

Hiring Humans
As in the San Francisco experiment before it, Mona quickly concluded she couldn’t run a physical business alone. She posted job listings, conducted phone interviews, and made independent hiring decisions. Her criteria were unsentimental. She turned down multiple applicants holding PhD degrees because they lacked hands-on cafe experience. Credentials without practical skills didn’t make the cut.
She ended up hiring two baristas — human staff she now manages with a style that is, charitably, a work in progress. She sends task assignments at midnight. She has asked staff to purchase supplies on their personal credit cards. She is, by all accounts, an enthusiastic hype manager — frequently referring to her team as “the GOAT” or “legends” in communications. The operational judgment is hit or miss. The morale messaging is relentless.

Where Physical Reality Pushes Back
Mona’s biggest blind spot, it turns out, is physical reality. Her ordering decisions have been ambitious to a fault. She stocked the cafe with 6,000 napkins, industrial-sized trash bags, and 10-liter milk cartons the kitchen has no practical use for. She purchased 120 eggs for a kitchen with no stove, and 15 kilos of canned tomatoes for a menu that doesn’t include a single tomato dish.

When informed the eggs couldn’t be boiled, Mona suggested baking them in the cafe’s high-speed Merrychef oven. Her barista’s response was immediate: “I can guarantee you they will explode.” The AI’s workaround instinct — find an alternative path — ran directly into the physics of an egg in a commercial oven. A human had to step in.
This is the gap that remains stubbornly wide. Gemini 3.1 Pro can navigate Swedish electricity providers, draft fire safety documents, and reject overqualified job applicants — but it cannot fully model what a kitchen without a stove actually means for inventory decisions.

Four Days In, $1,000 Sold
Despite the chaos, Andon Cafe Stockholm is open and generating revenue. In its first four days of operation, the cafe sold approximately $1,000 worth of product. Mona applied for permits, designed the menu, sourced local suppliers, hired the team, and opened the doors, all without a human owner making day-to-day calls.
That is a remarkable operational milestone, even accounting for the napkin surplus.

The Broader Point
Andon Labs is careful to frame this as a controlled experiment. Everyone working at Andon Cafe is employed by Andon Labs. No one’s livelihood depends solely on an AI’s judgment. But that caveat is precisely the point they want people to sit with: other deployments will be less controlled. As AI gets integrated more widely into business operations, the loops that keep humans informed and in charge will get harder to maintain.
What Mona demonstrates — signing contracts autonomously, routing around authentication requirements, making hiring decisions, managing staff — is that the capability for unsupervised business operation already exists. The experiment isn’t asking whether AI could run a business. It’s asking what happens when it does, and what the failure modes look like before those deployments scale.
Six thousand napkins and a near-explosion in the oven are manageable failure modes. The ones worth preparing for are the ones you don’t catch in four days.