From Voice Bots to RAG: My AI Journey

My AI journey did not start with a benchmark spreadsheet or a sudden decision to “pivot into AI.” It started with a very practical workflow problem: dental clinics spend an absurd amount of time on repetitive insurance verification calls. That project became my first real immersion into applied AI, and it still shapes how I evaluate systems now.

Dentistry Automation was my real starting point

At Dentistry Automation, I was the sole architect for an AI voice bot used for dental insurance verification. I built the system around Python, Flask, GPT orchestration, Whisper, Google TTS, Twilio, Azure SQL, and WebSockets. It was one of the most complete end-to-end systems I had built at that point, and it taught me a lot because nothing could hide behind a nice notebook demo.

The product had to place outbound calls, navigate IVR systems, maintain multi-turn dialogue, and extract useful information from noisy real-world interactions. It also had to work across 7+ payers, including Ameritas, Delta Dental, Cigna, MetLife, and UnitedHealthcare. Every payer had its own quirks. Even when two systems ostensibly asked for the same information, they often structured the flow differently.

A high-level architecture looked like this:

scheduler -> Twilio outbound call -> IVR audio stream -> Whisper transcription
-> dialogue policy / GPT orchestration -> Google TTS response -> live call continuation
-> extracted coverage details -> Azure SQL persistence -> clinic dashboard

That loop sounds clean on paper. In reality, it was full of latency budgets, interruptions, misheard terms, and edge cases where the “correct” answer still felt terrible in a live conversation.
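To make the latency-budget point concrete, here is a minimal sketch of how a per-turn budget could be checked across the stages in the diagram above. The stage names, the millisecond figures, and the `checkTurnBudget` helper are illustrative assumptions, not measurements or code from the production system.

```javascript
// Hypothetical per-stage budgets in milliseconds for a single conversational turn.
// The numbers are placeholders for illustration, not measured values.
const STAGE_BUDGET_MS = {
  transcription: 400,
  orchestration: 700,
  synthesis: 300,
};

// Compare measured stage timings against the budget and report where a turn went over.
function checkTurnBudget(timingsMs, budgetMs = STAGE_BUDGET_MS) {
  const overBudget = [];
  let totalMs = 0;
  let totalBudgetMs = 0;

  for (const [stage, limit] of Object.entries(budgetMs)) {
    const spent = timingsMs[stage] ?? 0; // a missing stage counts as zero time
    totalMs += spent;
    totalBudgetMs += limit;
    if (spent > limit) {
      overBudget.push({ stage, spentMs: spent, limitMs: limit });
    }
  }

  return { totalMs, totalBudgetMs, withinBudget: totalMs <= totalBudgetMs, overBudget };
}

// Example turn with made-up timings: orchestration runs long and pushes the turn over.
const report = checkTurnBudget({ transcription: 350, orchestration: 900, synthesis: 250 });
console.log(report.withinBudget, report.overBudget.map((s) => s.stage));
```

Tracking the budget per stage rather than only end to end is the useful part: it shows which stage to fix when a turn feels slow.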

Voice taught me that latency is part of product quality

One of the biggest lessons from that system was that voice experiences are brutally honest. In text, a slow answer can still feel acceptable. In a phone call, even a small delay can make the bot sound awkward or broken.

That forced me to think in systems terms:

Audio pipeline timing

Every stage mattered: capture, transcription, orchestration, synthesis, and playback. If any stage stalled, the entire interaction felt unnatural.

Turn-taking behavior

Humans interrupt each other. IVRs also interrupt you in weird ways. The system needed to know when to wait, when to respond, and when to recover from partial information.

State tracking

Insurance verification is repetitive, but it is not trivial. The bot had to remember what had already been asked, what details were confirmed, and which payer-specific branch it was on.
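The state-tracking idea can be sketched as a small per-call object. This is an illustration under assumptions: the field names (`memberId`, `annualMaximum`, `deductible`), the `createCallState` helper, and the payer key are hypothetical and not taken from the original system.

```javascript
// Hypothetical list of details a verification call needs to collect.
const REQUIRED_FIELDS = ["memberId", "annualMaximum", "deductible"];

// Create the state for one call: which payer branch it is on and what is known so far.
function createCallState(payer, requiredFields = REQUIRED_FIELDS) {
  const asked = new Set();
  const confirmed = new Map();

  return {
    payer,
    // Record that the bot has asked for a field, so it is not asked twice by accident.
    markAsked(field) {
      asked.add(field);
    },
    // Record a value the other side confirmed.
    confirm(field, value) {
      asked.add(field);
      confirmed.set(field, value);
    },
    // Next field to pursue: prefer fields never asked, then asked-but-unconfirmed ones.
    nextField() {
      const missing = requiredFields.filter((f) => !confirmed.has(f));
      return missing.find((f) => !asked.has(f)) ?? missing[0] ?? null;
    },
    // True once every required field has a confirmed value.
    isComplete() {
      return requiredFields.every((f) => confirmed.has(f));
    },
    // Plain-object view of the state, suitable for logging or persistence.
    snapshot() {
      return { payer, asked: [...asked], confirmed: Object.fromEntries(confirmed) };
    },
  };
}

const state = createCallState("examplePayer");
state.markAsked("memberId");            // asked, but the reply was unusable
state.confirm("annualMaximum", "1500"); // confirmed on the call
console.log(state.nextField(), state.isComplete());
```

Keeping "asked" separate from "confirmed" is what lets the bot recover from partial information: a field that was asked but never confirmed comes back around after the untouched ones.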

I became a lot less impressed by raw model quality and a lot more impressed by orchestration quality.

HireQuotient pushed me from voice into data workflows

Later, at HireQuotient, the modality changed but the core lesson stayed the same. I worked on a data enrichment stack using Node/Express, MongoDB, GPT-4, Perplexity, and Gemini. Instead of voice interactions, I was dealing with company intelligence, retrieval quality, and external API cost.

The challenge there was not “can we call more models?” It was “how do we get better signal while depending on external services less?” We cut API calls by about 75% and benchmarked four strategies across 655 companies, which gave us enough evidence to stop arguing from gut feel.

A simplified orchestration pattern looked like this:

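// Enrich one company, taking the cheapest path that gives a usable answer.
async function enrichCompany(company) {
  // Cheapest path: reuse an earlier result for this domain.
  const cached = await cache.lookup(company.domain);
  if (cached) return cached;

  // Middle path: a lightweight pass decides whether deep research is needed.
  const quickPass = await lightweightClassifier(company);
  if (!quickPass.needsDeepResearch) return quickPass;

  // Expensive path: query multiple providers, then cache the result.
  const enriched = await runMultiSourceResearch(company, ["gpt4", "perplexity", "gemini"]);
  await cache.store(company.domain, enriched);
  return enriched;
}

const results = [];
for (const company of companies) {
  results.push(await enrichCompany(company));
}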

The point was not to be fancy. The point was to avoid paying premium costs for cases where a cheaper path already solved the problem.
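The `cache` object in the snippet above is not shown in this post. One possible shape, assuming a simple in-memory store with expiry, is sketched below. The `createDomainCache` name, the default time-to-live, and the `www.` normalization rule are all hypothetical choices for illustration. The real system used MongoDB, so treat this as a stand-in.

```javascript
// Hypothetical in-memory cache keyed by normalized domain, with a time-to-live.
function createDomainCache({ ttlMs = 24 * 60 * 60 * 1000, now = () => Date.now() } = {}) {
  const entries = new Map();

  // Treat "Example.com" and "www.example.com" as the same key.
  const normalize = (domain) => domain.trim().toLowerCase().replace(/^www\./, "");

  return {
    // Return the cached value, or null if it is absent or expired.
    lookup(domain) {
      const key = normalize(domain);
      const entry = entries.get(key);
      if (!entry) return null;
      if (now() - entry.storedAt > ttlMs) {
        entries.delete(key); // expired: drop it so the caller re-fetches
        return null;
      }
      return entry.value;
    },
    // Store a value under the normalized domain with the current timestamp.
    store(domain, value) {
      entries.set(normalize(domain), { value, storedAt: now() });
    },
  };
}

// Usage with an injected clock so expiry is easy to see.
let clock = 0;
const cache = createDomainCache({ ttlMs: 1000, now: () => clock });
cache.store("WWW.Example.com", { industry: "software" });
const hit = cache.lookup("example.com");
clock = 2000;
const miss = cache.lookup("example.com");
console.log(hit, miss);
```

Normalizing the key matters for cost as much as for correctness: every spelling variant that misses the cache is another paid call.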

The through-line: usefulness beats flashiness

When I compare Dentistry Automation and HireQuotient, the domains look wildly different. One involved live phone calls and dental insurance. The other involved data enrichment and research workflows. But they trained the same instinct in me: treat models as components inside a larger reliability, latency, and cost story.

That is why I have never been very persuaded by AI demos that ignore integration reality. The model matters, but the product lives or dies on orchestration, fallback behavior, grounding, latency, cost control, and whether the user can trust the output.

Why I’m building RAG systems now

That mindset carries directly into the retrieval work behind this portfolio. On the surface, a RAG-backed personal site seems far away from voice bots and enrichment pipelines. To me, it is the same arc.

With RAG, I care about a very specific kind of honesty. If the system answers a question about my work, I want it to answer from real writing, not from generic model interpolation. I want the retrieval layer to be inspectable. I want chunking and ranking to make sense. I want the final answer to feel grounded, not merely fluent.
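As a rough illustration of what "inspectable" chunking can mean, here is a minimal sketch that splits text into overlapping word windows and keeps source metadata on every chunk. The `chunkDocument` function, its parameters, and the id format are assumptions made for this example, not the chunker this site actually uses.

```javascript
// Split a document into overlapping word-window chunks, keeping enough metadata
// (source id, word offsets) that any retrieved chunk can be traced back to its origin.
function chunkDocument(sourceId, text, { size = 50, overlap = 10 } = {}) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap;

  for (let start = 0; start < words.length; start += step) {
    const end = Math.min(start + size, words.length);
    chunks.push({
      id: `${sourceId}#${chunks.length}`,
      sourceId,
      startWord: start,
      endWord: end, // exclusive
      text: words.slice(start, end).join(" "),
    });
    if (end === words.length) break; // last window reached the end of the document
  }
  return chunks;
}

const sample = "one two three four five six seven eight nine ten";
const chunks = chunkDocument("voice-bots-post", sample, { size: 4, overlap: 1 });
console.log(chunks.map((c) => `${c.id}: ${c.text}`));
```

The overlap keeps a sentence that straddles a boundary from being lost to both neighbors, and the offsets make it possible to show exactly which passage an answer drew on.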

In other words, I care about the same things I cared about in voice and enrichment:

Grounding

Where did this answer come from?

Control

Can I improve the pipeline without rewriting the entire experience?

Trust

Will the answer stay tied to evidence when the question is fuzzy?

Product feel

Does it feel useful quickly, or is it technically impressive but slow?
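The grounding and trust questions can be sketched as a retrieval step that returns its evidence and declines when nothing is relevant enough. This example uses plain term overlap as a stand-in for embedding similarity because it is easy to inspect. The `retrieve` function, the `minScore` threshold, and the two sample chunks are illustrative assumptions.

```javascript
// Lowercase and split into word tokens; deliberately simple for illustration.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Score each chunk by the fraction of distinct query terms it contains.
function retrieve(query, chunks, { minScore = 0.5, topK = 2 } = {}) {
  const queryTerms = [...new Set(tokenize(query))];
  if (queryTerms.length === 0) return { grounded: false, evidence: [] };

  const scored = chunks
    .map((chunk) => {
      const terms = new Set(tokenize(chunk.text));
      const matched = queryTerms.filter((t) => terms.has(t));
      return { id: chunk.id, score: matched.length / queryTerms.length, matched };
    })
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);

  // If nothing clears the threshold, say so instead of letting the model improvise.
  return { grounded: scored.length > 0, evidence: scored };
}

// Sample chunks written for this example.
const corpus = [
  { id: "voice#0", text: "The voice bot placed outbound calls through Twilio." },
  { id: "enrich#0", text: "The enrichment stack cut API calls by caching results." },
];
const good = retrieve("twilio voice bot", corpus);
const fuzzy = retrieve("favorite pizza topping", corpus);
console.log(good.evidence, fuzzy.grounded);
```

Returning the matched terms and scores alongside the chunk ids answers "where did this come from?", and the explicit `grounded` flag gives the caller a clean way to refuse rather than guess.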

That continuity is one reason I enjoy building AI systems so much. The interface changes, but the engineering values travel well.

The broader arc feels very personal to me

What I like most about this journey is that it never felt like I was chasing hype. I kept following the same instinct: find repetitive or high-friction work, understand the constraints deeply, and use models only where they genuinely improve the loop.

That instinct took me from a live telephony stack, to multi-model research pipelines, to the retrieval system behind this site. Along the way, I got more technical, but also more skeptical in a healthy way.

Looking back, the path from voice bots to RAG makes perfect sense to me. Voice taught me that AI has to survive messy reality. Enrichment taught me that cost and architecture matter as much as capability. Retrieval is teaching me how much trust depends on grounding. The models have changed, the interfaces have changed, and my toolchain has definitely changed, but the core motivation hasn’t: I want to build AI systems that are not just impressive when demoed, but dependable when used.