Startup CEO Says They’re Saving “Millions Of Dollars” By Replacing Anthropic Models With DeepSeek

Anthropic has become the most popular choice of enterprises around the world, but smaller companies are now experimenting with open models — and getting some impressive results.

Flo Crivello, founder and CEO of AI agent platform Lindy, says his company has made a full switch, ditching Anthropic’s models entirely in favor of DeepSeek V4. His post went viral, racking up nearly 69,000 views within hours. The numbers he cited were hard to ignore: millions of dollars in savings, and a performance increase on many core use cases.

“Pulled the trigger today and switched 100% of Lindy traffic to DeepSeek v4, churning from Anthropic models. Saves us millions of $ and we’re actually seeing an increase in performance on many core use cases. Transformative for the business,” he wrote on X.

Cost Is the Breaking Point

For companies like Lindy — whose product runs AI models continuously on behalf of users — inference spend is the dominant cost line. Crivello had been explicit about this for months. In an April post, he wrote that inference was Lindy’s “#1 cost by a lot (more than payroll)” and that cutting it “by 2-5x would be transformative.” That threshold appears to have now been crossed.

The cost advantage of DeepSeek V4 over Anthropic’s models is stark. V4-Pro is priced at $3.48 per million output tokens, a fraction of what frontier closed-source models from OpenAI and Anthropic charge. Running the full Artificial Analysis Intelligence Index benchmark costs $1,071 on V4-Pro compared to $4,811 on Claude Opus 4.7, which makes Opus roughly 4.5 times as expensive. When model calls number in the billions per month, those ratios translate directly into millions of dollars in annual savings.
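To make the arithmetic concrete, here is a minimal sketch of how the benchmark cost ratio maps onto an annual bill. The two benchmark costs come from the figures above; the $5 million annual spend is a purely hypothetical number, not one Lindy has disclosed, and the sketch assumes a real workload's costs scale the same way the benchmark run does, which may not hold in practice.

```python
# Benchmark run costs quoted in the article (USD).
V4_PRO_RUN_COST = 1071
OPUS_RUN_COST = 4811


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs than the cheap one."""
    return expensive / cheap


def projected_savings(annual_spend: float, ratio: float) -> float:
    """Savings if the same workload costs annual_spend / ratio after switching.

    Assumption: the workload's cost scales like the benchmark run.
    """
    return annual_spend - annual_spend / ratio


ratio = cost_ratio(OPUS_RUN_COST, V4_PRO_RUN_COST)

# Hypothetical annual inference bill, NOT a figure from Lindy.
example_spend = 5_000_000

print(f"Opus / V4-Pro cost ratio: {ratio:.2f}x")
print(f"Projected annual savings: ${projected_savings(example_spend, ratio):,.0f}")
```

Under these assumptions, a ratio in the 4x to 5x range leaves only about a fifth to a quarter of the original bill, which is how a multi-million-dollar inference budget yields savings in the millions.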

Not Just Cheaper — Better, For Their Use Cases

Interestingly, Crivello says Lindy is seeing an increase in performance on many core use cases after the switch — not a trade-off, but an upgrade.

This tracks with how V4 has been received technically. DeepSeek released V4 in two variants — V4-Pro and V4-Flash — with V4-Pro scoring 1554 on GDPval-AA, Artificial Analysis’s agentic real-world tasks benchmark. That makes it the leading open-weights model on agentic benchmarks at launch — precisely the category that matters most for a product like Lindy, which operates as an AI employee handling real tasks. DeepSeek itself acknowledges it trails the US frontier by about 3-6 months — but for most production agentic use cases, the gap apparently no longer matters.

A Long Road to the Switch

Crivello was quick to note the engineering effort involved. “You have no idea how much infra and internal tooling we had to build to get to this point,” he wrote, adding that the migration turned out to be “100x more work than we thought.”

For hosting, Lindy went with Atlas Cloud — a relatively unknown player that, according to Crivello, came out ahead of all major providers after an exhaustive evaluation. The decision to go with V4 over competing Chinese models like GLM or Kimi K2.5 was also deliberate: “It was way way better. We spent a very long time evaluating everything under every possible angle.”

That evaluation process had been ongoing. Back in April, Crivello noted that Lindy had come close to making Kimi K2.5 its default, and flagged GLM-5.1 as “incredible.” That context matters — this wasn’t a snap decision driven by headlines, but the conclusion of months of systematic benchmarking.

Anthropic Isn’t Gone — Just Marginalized

Crivello was careful not to write off Anthropic entirely. Lindy still uses Claude internally — specifically because of what he called the “absurd max plan subsidy” — and may escalate to Claude Opus for edge cases where Lindy fails at a task. But that usage will be “marginal.”
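The escalation pattern described here, a cheap default model with a stronger one held in reserve, can be sketched as follows. This is an illustration only: Crivello has not described how Lindy implements it, and every name below (`run_with_escalation`, the stub model functions) is hypothetical. Real code would call actual model APIs and would need a more careful definition of "failure."

```python
from typing import Callable, Optional, Tuple

# A model call takes a task description and returns a result,
# or None when it fails to complete the task.
ModelCall = Callable[[str], Optional[str]]


def run_with_escalation(
    task: str, primary: ModelCall, fallback: ModelCall
) -> Tuple[str, Optional[str]]:
    """Try the cheap primary model first; escalate only if it fails."""
    result = primary(task)
    if result is not None:
        return "primary", result
    return "fallback", fallback(task)


# Stub models standing in for real API clients (hypothetical behavior).
def cheap_model(task: str) -> Optional[str]:
    # Pretend the cheap model cannot handle tasks flagged as hard.
    return None if "hard" in task else f"cheap:{task}"


def strong_model(task: str) -> Optional[str]:
    return f"strong:{task}"


print(run_with_escalation("summarize inbox", cheap_model, strong_model))
print(run_with_escalation("hard negotiation", cheap_model, strong_model))
```

As long as failures are rare, nearly all traffic stays on the cheaper model, which is consistent with Crivello calling the remaining Claude usage “marginal.”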

On Anthropic’s broader prospects, Crivello offered a measured but pointed read: “I’m still a big fan of Anthropic and think they’ll be fine because of enterprise relationships + dev brand + eventually hopefully ramping up on capacity + their next generation of models + moving up the stack. But the Chinese models really are on their heels and I assume will apply significant margin pressure on API for a long time.”

That last point is worth sitting with. It’s not a prediction that Anthropic will fail — it’s a prediction that API margins will compress industrywide, and that Chinese open-source models will be the primary driver. DeepSeek’s history supports the argument: when R1 launched in early 2025 pricing outputs at $2.1 per million tokens against OpenAI’s $60, OpenAI opened its advanced models to free-tier users shortly after.

A Canary for the Industry

Crivello’s announcement isn’t an isolated event. It’s a data point in a broader pattern where the Chinese open-source models — DeepSeek, Kimi, GLM, Qwen — have moved from “not even close” to “at the frontier, for most use cases” in under a year. For companies whose products live or die on inference economics, the calculus has shifted.

Lindy is not a small hobbyist project. It’s a funded, production-grade AI platform making this move with full awareness of the engineering costs and risks involved. That’s what makes the announcement land differently from the usual benchmark posturing. The question for other AI-native startups is no longer whether the Chinese models are good enough. Increasingly, it’s whether they can afford not to switch.
