Founder POV · 8 min read · May 7, 2026

What 8 years of building chatbots taught us about deploying AI today

The lessons from shipping Messenger bots in 2018 are the same ones keeping AI projects alive in 2026. Pattern recognition, failure modes, and what actually sticks.

In 2018, I built my first Facebook Messenger chatbot for a sports bar in Burlington. It handled reservations, birthday bookings, and trivia night sign-ups. It worked. Customers loved it. The owner loved it. Then Facebook changed their API terms, the bot broke, and we had to rebuild from scratch.

That experience taught me something I've watched play out dozens of times since: the technical build is rarely the hard part. The hard part is building something that survives contact with real users, real edge cases, and the inevitable platform changes that no one predicted.

Eight years later, I'm still building automated conversation systems — now using Claude and GPT-4o instead of ManyChat and Dialogflow. The technology is unrecognizably better. The failure modes are exactly the same.

Lesson 1: Scope creep kills more AI projects than bad technology

In 2019, I took on a chatbot project for a dental clinic. The brief: handle appointment bookings and answer FAQs. Simple. By week three, the client wanted it to handle insurance billing questions, refer to specific staff by name, and track patient loyalty points. By week six, we had a system too complex to maintain and too brittle to trust.

I see this pattern constantly with LLM projects now. A company starts with "let's build a customer support bot" and by the second sprint they want it to process refunds, update CRM records, draft escalation emails, and also be their internal knowledge base. Each added scope layer multiplies the surface area for failure.

The fix: Define the one thing the AI needs to do well. Not three things. One. Build that. Get it working in production. Then — and only then — expand scope based on what users actually need, not what the planning meeting assumed they would.

Lesson 2: The handoff to a human is the most important feature

Every chatbot I built before 2021 either tried to handle everything (and failed embarrassingly on edge cases) or immediately offered to "connect you with a team member" the moment things got complicated (which made the bot feel useless).

The bots that actually worked — the ones clients renewed and expanded — had a third option: graceful escalation with context. The bot said something like: "This one's outside what I can handle. I'm flagging this for our team with a summary of your situation so you don't have to repeat yourself. Expect a reply within 2 hours."

Today this matters even more, because AI agents can take real actions. An agent that wades into a billing dispute it doesn't fully understand and makes things worse does far more damage than an agent that recognizes its limits and escalates cleanly.

The fix: Before you write a single line of code or a single prompt, design the escalation paths. What triggers a handoff? What information goes with it? Who receives it? How fast do they respond? These aren't afterthoughts — they're core features.
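Those four questions can be sketched directly in code. This is a minimal illustration, not a real integration: the names (`EscalationTicket`, `should_escalate`, the risky-intent list, the 0.7 confidence floor) are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class EscalationTicket:
    """The handoff payload: answers 'what information goes with it?'"""
    trigger: str                # why the AI handed off, e.g. "billing_dispute"
    conversation_summary: str   # AI-written recap so the user never repeats themselves
    transcript: list            # raw messages, for full context
    assignee: str               # who receives it
    sla_hours: int = 2          # the promised response window

def should_escalate(intent: str, confidence: float) -> bool:
    """Answers 'what triggers a handoff?': known-risky intents,
    or any case where the model's own confidence is low."""
    risky_intents = {"billing_dispute", "legal", "cancellation"}
    return intent in risky_intents or confidence < 0.7
```

The point of writing it down this early is that every field in the ticket forces a design decision before launch, not during an incident.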

Lesson 3: People anthropomorphize — design for it, don't fight it

In 2018, the conventional wisdom was: always tell users they're talking to a bot. Be totally transparent. We did this. And we watched users immediately disengage. Not because they were deceived — because the moment they knew it was a bot, they assumed it was stupid.

The opposite was also a disaster. Bots that tried too hard to seem human created uncanny valley problems and real trust issues when users eventually figured it out.

The sweet spot we landed on: be honest about what you are, but give the AI a real identity and consistent voice. Not a fake name and a fake backstory. A genuine character — warm, direct, capable of a joke when appropriate. Users trusted these systems more than the ones that either apologized for being a bot or pretended not to be one.

With Claude, this is easier than it's ever been. The models are genuinely good at maintaining consistent voice and personality. The work is in the prompt — being specific about tone, what to avoid, what the persona values.
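Here's roughly what "being specific about tone, what to avoid, what the persona values" looks like in practice. The persona, business name, and helper function are invented for illustration; the shape is the part that matters.

```python
# Hypothetical persona prompt: honest about being an AI,
# but with a real identity and a consistent voice.
PERSONA_PROMPT = """You are Mabel, the booking assistant for The Corner Tap.
You are an AI assistant, and you say so plainly if asked. Never pretend to be human.

Voice:
- Warm and direct. Short sentences.
- Capable of one light joke per conversation, when it fits.

Avoid:
- Apologizing for being a bot.
- Corporate filler ("I'd be happy to assist you with that").
- Guessing at prices, hours, or policies you weren't given.
"""

def build_system_prompt(business_facts: str) -> str:
    """Keep the persona stable; swap in per-deployment facts."""
    return PERSONA_PROMPT + "\nFacts you may rely on:\n" + business_facts
```

Separating the stable persona from the per-client facts means the voice stays consistent even as the knowledge underneath it changes.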

Lesson 4: The first two weeks of production are your most valuable data

I've built systems that performed perfectly in testing and fell apart on day three of production. I've also built systems I was nervous about that sailed through and turned out to handle edge cases elegantly.

You cannot predict what users will actually say or ask. You can only instrument well and watch.

For the first two weeks of any production AI deployment, I watch: escalation rate (what % of conversations the AI couldn't handle), resolution rate (what % it resolved without human intervention), and the raw transcripts of conversations that ended in escalation. Those transcripts tell you exactly what to fix next.
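The three things listed above are simple to compute once conversations are logged. A minimal sketch, assuming each record carries an `outcome` field and a `transcript`; the field names are an assumed schema, not a real one.

```python
def launch_metrics(conversations: list) -> dict:
    """The two-week launch dashboard: escalation rate, resolution
    rate, and the raw transcripts of every escalated conversation."""
    total = len(conversations)
    escalated = [c for c in conversations if c["outcome"] == "escalated"]
    resolved = [c for c in conversations if c["outcome"] == "resolved"]
    return {
        "escalation_rate": len(escalated) / total if total else 0.0,
        "resolution_rate": len(resolved) / total if total else 0.0,
        # These transcripts are what tells you exactly what to fix next.
        "escalated_transcripts": [c["transcript"] for c in escalated],
    }
```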

The fix: Budget for a "tuning sprint" after launch. Not optional polish — essential calibration. The production data from the first two weeks is more valuable than all the pre-launch testing combined.

Lesson 5: If leadership isn't using it, no one will

I learned this one the hard way with an internal knowledge bot we built for a 40-person professional services firm. The bot was excellent — it answered questions about HR policy, processes, and internal tools accurately and quickly. Adoption after 90 days: about 12%.

The problem wasn't the bot. The problem was that the CEO still emailed his EA when he needed to find a policy. The operations director still slacked the team lead directly. Leadership didn't use the tool, so no one below them felt they needed to either.

The projects that stick are the ones where someone senior — ideally the CEO or the person with the most credibility on that team — visibly uses and endorses the tool in the first 30 days.

What's actually different in 2026

The failure modes are the same. The stakes are higher, because the technology is powerful enough that a poorly designed system can cause real problems at scale — wrong information sent to real customers, automated actions taken incorrectly, private data exposed.

What's different is that when you get it right, the results are dramatically better than anything we could build in 2018. A well-designed Claude agent with good context and clear constraints can handle conversations that would have required a human two years ago.

The businesses winning with AI right now are the ones treating it like any other operations project: clear scope, measured rollout, ownership, and the patience to tune based on real data. They're not the ones who went fastest. They're the ones who went smart.

Steffen deGraaf

Founder, BotLogix · Building AI systems since 2018

Questions or pushback on anything here? Email me directly — I read every one.

Ready to put this into practice?

Book a free 30-minute AI audit call.

No pitch. Steffen will look at your business and tell you honestly what AI would actually do for you.

Book a free call