
We Built an IVR That Doesn't Trap People in a Loop

Mar 09, 2026 6 min read
Build Roulette Team
System Engineering

The Moment It Became Real

Every early-stage product has a version of this moment.

The support line was ringing. Customers were calling in with real problems: order issues, tracking questions, billing confusion. And every single call was landing in the same place.

One queue. One pile. Every agent context-switching between problems they weren't equipped to solve. Customers were getting transferred. Then transferred again. Wait times were climbing. And the team had no visibility into which calls were piling up or why.

The infrastructure wasn't broken. It just wasn't designed. And at the scale Roulette Technologies was operating, that distinction had started to cost real money and real customers.

Something had to change, and the change had to be swift!

The Real Problem Wasn't Routing. It Was Chaos.

Here's what an unstructured call system actually looks like from the inside:

What You See                  | What's Actually Happening
"Calls are being answered"    | Agents are triaging blind, with no context before pickup
"Wait times seem okay"        | You're measuring the wrong thing: transfers aren't counted
"Customers seem fine"         | Repeat callers are hiding the real drop-off
"The system is working"       | It's working despite the design, not because of it

And the IVR problems were their own category entirely. Poorly designed voice menus don't just frustrate callers; they actively trap them:

  • A caller presses the wrong key. The system has no response. Dead end.
  • A caller doesn't press anything. The system loops. Endlessly.
  • A caller gets confused and tries again. Third attempt. Still stuck.

An IVR without a fallback isn't a system. It's a trap with hold music.

The goal wasn't just better routing. It was building something callers could actually navigate and agents could actually trust.

What We Decided to Build

A multi-queue IVR architecture on Amazon Connect. Three queues, one clear menu, and a fault-handling layer that actually anticipates what goes wrong.

The contact flow looks like this:

Inbound Call
  ↓
Welcome Prompt
  ↓
"Press 1 for Orders  |  Press 2 for Tracking  |  Press 3 for Support"
  ↓
  ├─ Press 1  →  OrderQueue
  ├─ Press 2  →  TrackingQueue
  └─ Press 3  →  CustomerSupportQueue

Simple. But the simplicity is intentional.
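Underneath, the menu is just a lookup from keypress to queue. A minimal Python sketch, assuming each keypress arrives as a one-character string; `MENU_QUEUES` and `route_keypress` are illustrative names, not Amazon Connect APIs:

```python
from typing import Optional

# Hypothetical mapping that mirrors the three-option menu above.
MENU_QUEUES = {
    "1": "OrderQueue",
    "2": "TrackingQueue",
    "3": "CustomerSupportQueue",
}


def route_keypress(digit: str) -> Optional[str]:
    """Return the target queue for a keypress, or None if the key isn't on the menu."""
    return MENU_QUEUES.get(digit)
```

Returning `None` for an unmapped key, rather than raising, treats invalid input as an ordinary outcome the flow can branch on, which is exactly what the fault-handling layers depend on.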

Every branch follows the same two-step pattern before the caller ever reaches an agent:

Set Working Queue  →  Transfer to Queue

That sequence matters. Setting the queue before the transfer means the call arrives pre-categorised. Agents don't pick up a mystery call; they pick up a call they're already equipped to handle.
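One way to picture the ordering is as a guard: the transfer refuses to run until a queue has been set. A minimal Python sketch under that assumption; `Contact`, `set_working_queue` and `transfer_to_queue` are hypothetical names that echo the flow block names, not Amazon Connect code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    """Hypothetical stand-in for an in-flight call moving through the contact flow."""
    contact_id: str
    working_queue: Optional[str] = None
    transferred_to: Optional[str] = None


def set_working_queue(contact: Contact, queue: str) -> None:
    # Step 1: tag the call with its destination before it moves anywhere.
    contact.working_queue = queue


def transfer_to_queue(contact: Contact) -> str:
    # Step 2: in this sketch the transfer only succeeds if step 1 already ran,
    # so no call can reach an agent uncategorised.
    if contact.working_queue is None:
        raise RuntimeError(f"contact {contact.contact_id} has no working queue set")
    contact.transferred_to = contact.working_queue
    return contact.transferred_to
```

Calling `set_working_queue(call, "TrackingQueue")` and then `transfer_to_queue(call)` returns `"TrackingQueue"`; skipping the first step raises instead of delivering an unlabelled call.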

The Architecture, Layer by Layer

How a call travels end-to-end

Customer dials in
  ↓
Amazon Connect receives the call
  ↓
Inbound Contact Flow activates
  ↓
IVR Menu presents options
  ↓
Caller selects → Queue assigned
  ↓
Agent picks up via the Contact Control Panel (CCP)

The three queues and what they own

Queue                  | Owns
OrderQueue             | Order issues, changes, cancellations
TrackingQueue          | Delivery status, shipment updates
CustomerSupportQueue   | Everything else and all fallback traffic

Agents are assigned to queues through routing profiles. When call volume spikes in one area, that team absorbs it — without pulling agents from unrelated queues.
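A minimal Python model of that isolation, assuming a routing profile can be reduced to the set of queues an agent serves (real Amazon Connect routing profiles also carry channel, priority and delay settings, which this ignores). `Agent` and `eligible_agents` are hypothetical names:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    # Simplified, hypothetical routing profile: the queues this agent takes calls from.
    routing_profile: frozenset = field(default_factory=frozenset)
    available: bool = True


def eligible_agents(queue: str, agents: list[Agent]) -> list[str]:
    """Names of agents who can take a call from `queue`: available and profiled for it."""
    return [a.name for a in agents if a.available and queue in a.routing_profile]
```

A spike on `OrderQueue` only ever reaches agents whose profile includes `OrderQueue`; the tracking team's availability is untouched.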

The Part Most IVR Builds Get Wrong: Fault Handling

Routing is the easy part. Fault handling is where most systems quietly fail.

We designed three layers of protection.

Layer 1: Invalid Input

Caller presses a key that isn't on the menu.

Invalid input detected
  ↓
"Sorry, I didn't catch that. Let's try again."
  ↓
Menu replays  (attempt 2 of 3)

Layer 2: Timeout

Caller doesn't press anything. Silence.

No input within window
  ↓
"Still there? Here are your options."
  ↓
Menu replays  (attempt 2 of 3)
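Both layers begin by deciding which kind of failure occurred, because each one gets its own reprompt. A minimal sketch, assuming silence is surfaced as `None` and a keypress as a one-character string; `classify_input` and `reprompt_for` are hypothetical helpers, not Amazon Connect calls:

```python
from typing import Optional

VALID_KEYS = {"1", "2", "3"}

# Reprompt wording taken from the two fault layers described above.
REPROMPTS = {
    "invalid": "Sorry, I didn't catch that. Let's try again.",
    "timeout": "Still there? Here are your options.",
}


def classify_input(raw: Optional[str]) -> str:
    """Map raw caller input to 'valid', 'invalid' or 'timeout'.

    Assumes (hypothetically) that no input within the window is reported as None.
    """
    if raw is None:
        return "timeout"
    return "valid" if raw in VALID_KEYS else "invalid"


def reprompt_for(raw: Optional[str]) -> Optional[str]:
    """Reprompt to play before replaying the menu, or None when the input was valid."""
    return REPROMPTS.get(classify_input(raw))
```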

Layer 3: The Guardrail That Matters Most

After three failed attempts (whether invalid input, timeout, or a mix of both), the system stops looping.

3 attempts exhausted
  ↓
Route directly to CustomerSupportQueue

No more infinite loops. No more dead ends. If a caller can't navigate the menu, they still reach a human.

This is the rule that separates a resilient IVR from one that hemorrhages callers at 2% per failed attempt.
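Putting the three layers together: invalid keys and timeouts draw from a single attempt budget, and exhausting it exits to a queue instead of replaying. A minimal Python sketch of that control flow, assuming the caller's responses can be fed in as a sequence; `run_menu`, `MAX_ATTEMPTS` and `FALLBACK_QUEUE` are hypothetical names chosen to match the text, not Amazon Connect code:

```python
from __future__ import annotations

from typing import Iterable, Optional

MENU_QUEUES = {"1": "OrderQueue", "2": "TrackingQueue", "3": "CustomerSupportQueue"}
FALLBACK_QUEUE = "CustomerSupportQueue"
MAX_ATTEMPTS = 3


def run_menu(inputs: Iterable[Optional[str]]) -> tuple[str, int]:
    """Play the menu up to MAX_ATTEMPTS times and return (queue, attempts_used).

    `inputs` yields one entry per menu play: a key string, or None for a timeout.
    Invalid keys and timeouts count against the same attempt budget.
    """
    attempts = 0
    for raw in inputs:
        attempts += 1
        queue = MENU_QUEUES.get(raw)  # None (timeout) never matches a key
        if queue is not None:
            return queue, attempts
        if attempts >= MAX_ATTEMPTS:
            break
    # Budget exhausted (or no further input): stop looping and hand off to a human.
    return FALLBACK_QUEUE, attempts
```

The useful property is that the loop is bounded by construction: whatever the caller does, `run_menu` returns a queue after at most three plays.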

We Also Wired In Queue Intelligence, Even Though We're Not Using It Yet

The Get Queue Metrics block is live in the flow right now. It's collecting wait time data per queue on every call.

We're not acting on it today. But when we do, no redesign is needed. We just add the branch:

If OrderQueue wait time > 5 min
  ├─ "Would you like a callback instead?"
  ├─ Send SMS with self-service link
  └─ Route to overflow queue

Building the data collection in now cost us nothing. Not building it in would cost us a full flow rework later.
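With the metrics already in the flow, the future branch reduces to a single comparison. A minimal sketch, assuming the metrics step yields a wait estimate for the queue in seconds; `plan_for_queue` and the returned action labels are hypothetical stand-ins for the three exits in the branch above:

```python
from __future__ import annotations

CALLBACK_THRESHOLD_SECONDS = 5 * 60  # "wait time > 5 min" from the branch above


def plan_for_queue(queue: str, wait_seconds: float) -> list[str]:
    """Decide what to offer a caller, given the current wait estimate for `queue`.

    The action names are hypothetical labels, not Amazon Connect block names.
    """
    if queue == "OrderQueue" and wait_seconds > CALLBACK_THRESHOLD_SECONDS:
        # Long wait: offer the exits instead of a plain transfer.
        return ["offer_callback", "send_sms_self_service_link", "route_to_overflow"]
    return ["transfer_to_queue"]
```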

The Honest Tradeoffs

We didn't build a perfect system. We built the right system for where we are.

Tradeoff                                | What It Means in Practice
🟡 Flow changes need redeployment       | Adding a new menu option isn't a config change; it's a tested flow update.
🟡 Routing profiles are load-bearing    | A misconfigured profile silently stops agents from receiving calls. No error. Just silence.
🟡 No self-service resolution yet       | Every call still ends with an agent. Automation comes in the next phase.
🟢 Menu is intentionally static         | A three-option menu is easy to navigate. We're not adding options until volume justifies it.
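Because a misconfigured routing profile fails silently rather than with an error, coverage has to be checked explicitly. A minimal Python sketch, assuming the routing-profile configuration has already been exported into plain dictionaries (fetching it is out of scope); `unstaffed_queues` and both data shapes are hypothetical:

```python
from __future__ import annotations

QUEUES = ["OrderQueue", "TrackingQueue", "CustomerSupportQueue"]


def unstaffed_queues(queues: list[str],
                     profiles: dict[str, set[str]],
                     agent_profiles: dict[str, str]) -> list[str]:
    """Return the queues that no assigned agent can receive calls from.

    profiles: routing-profile name -> queues that profile includes
    agent_profiles: agent name -> routing-profile name assigned to that agent
    """
    covered: set[str] = set()
    for profile_name in agent_profiles.values():
        # An agent pointed at a missing or misspelled profile covers nothing.
        covered |= profiles.get(profile_name, set())
    return [q for q in queues if q not in covered]
```

A typo in a profile name makes every queue show up as unstaffed, which is the "just silence" failure surfaced as a visible result.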

What's Coming Next

Phase 2: Self-Service Order Status

The OrderQueue is the highest-volume queue. It's also the most automatable.

Caller selects Orders
  ↓
"Enter your order number"
  ↓
Lambda invoked
  ↓
DynamoDB lookup
  ↓
"Your order ships Thursday. Tracking number is..."
  ↓
Call ends  —  no agent needed
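The Lambda step in that flow might look roughly like this. It is a sketch, not the planned implementation: it assumes Amazon Connect passes the caller's input under `Details.Parameters` (the `orderNumber` key is a hypothetical parameter name), and it injects the DynamoDB read as a plain function so the handler runs without AWS. In a deployed version that function would wrap a table read such as boto3's `Table.get_item`. The not-found branch is also an assumption added for safety:

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical lookup signature: order number -> order record, or None if unknown.
OrderLookup = Callable[[str], Optional[dict]]


def make_handler(lookup: OrderLookup):
    """Build a Lambda-style handler around an injectable order lookup."""

    def handler(event: dict, context: object = None) -> dict:
        # Parameters set in the flow arrive under Details.Parameters;
        # "orderNumber" is a hypothetical parameter name.
        params = event.get("Details", {}).get("Parameters", {})
        order_number = params.get("orderNumber", "")
        order = lookup(order_number.strip()) if order_number else None
        if order is None:
            # Unknown order: let the flow branch to an agent instead of ending the call.
            return {"found": "false"}
        # A flat map of string values is the safest shape for the flow to read back.
        return {
            "found": "true",
            "shipDate": str(order["ship_date"]),
            "trackingNumber": str(order["tracking_number"]),
        }

    return handler
```

For local testing, passing a dictionary's `.get` method as the lookup is enough to exercise both branches.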

Estimated deflection: 30–40% of order queue volume. That's not a small number. At current call volumes, that's hours of agent time returned to the team each week.
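The "hours returned" figure is simple arithmetic: deflected calls times average handle time. A minimal sketch, where `agent_hours_returned` is a hypothetical helper and every input is a placeholder to replace with real queue numbers:

```python
def agent_hours_returned(weekly_order_calls: int,
                         deflection_rate: float,
                         avg_handle_minutes: float) -> float:
    """Agent hours freed per week if `deflection_rate` of order calls self-serve."""
    deflected_calls = weekly_order_calls * deflection_rate
    return deflected_calls * avg_handle_minutes / 60
```

With made-up inputs of 1,000 order calls a week and a 6-minute average handle time, the 30–40% range works out to 30 to 40 agent-hours per week.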

Phase 3: Live Queue Deflection

When wait times spike, the system will offer exits before callers abandon:

  • Callback scheduling
  • SMS link to the self-service portal
  • Voicemail for non-urgent requests

The metrics collection is already running. Phase 3 is a branch condition away.

What This Actually Delivered

Three dedicated queues. A menu callers can navigate. A fault-handling layer that never leaves someone stranded. Calls that reach agents with context already attached. And a metrics foundation that makes the next two phases low-effort additions, not rewrites.

The system is live. Agents are receiving categorised calls. The loop trap is gone.

The automation layer is next.
