Modern voice agents combine speech recognition (ASR, also called STT), language understanding via an LLM and speech synthesis (TTS) in real time. Unlike classic IVRs, they handle context, interruptions and memory. They work well for tier-1 customer service, bookings and lead qualification. Indicative cost per minute with cloud APIs: €0.05 to €0.30.
Until a few years ago, when a customer called a company and reached an automated voice menu, the reaction was almost always the same: frustration. Rigid systems, endless menus, a synthetic voice that understood nothing beyond precise keywords. Today the situation has changed substantially, and what we call a "voice agent" has little in common with that technology.
A modern voice agent is an AI-based system that conducts real voice conversations, in natural language, bidirectionally. It doesn't wait for you to say the right word: it understands context, handles ambiguity, responds coherently and, when needed, integrates with business systems to retrieve information or perform actions. The difference compared to a traditional IVR is the same as between a paper form and a consultant who listens to you.
How it works, in practice
Behind a voice agent there are three main components working in sequence, within very tight time limits.
The first is speech recognition, or STT (speech-to-text): it converts the call audio to text in real time. The second is the language model, which interprets the text, reasons about the most appropriate response and decides if and how to engage business systems. The third is speech synthesis, or TTS (text-to-speech): it transforms the textual response into natural audio, with intonation, pauses and rhythm very close to human speech.
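To make the sequence concrete, here is a minimal sketch of one conversational turn passing through the three stages. The class name VoicePipeline and the fake_stt, fake_llm and fake_tts functions are hypothetical stand-ins invented for illustration; in a real deployment each would wrap a recognition, language-model or synthesis service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """One conversational turn: audio in -> text -> reply text -> audio out."""
    transcribe: Callable[[bytes], str]   # STT stage
    respond: Callable[[str], str]        # language-model stage
    synthesize: Callable[[str], bytes]   # TTS stage

    def handle_turn(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)
        reply = self.respond(text)
        return self.synthesize(reply)


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    # A keyword check standing in for real language understanding.
    if "order" in text.lower():
        return "Let me check the status of your order."
    return "Could you tell me a bit more about what you need?"


def fake_tts(text: str) -> bytes:
    # Pretend the reply text is the synthesized audio.
    return text.encode("utf-8")


pipeline = VoicePipeline(fake_stt, fake_llm, fake_tts)
```

Keeping each stage behind a plain function makes it easy to swap one provider for another without touching the rest of the flow.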
All of this must happen in under 400 to 500 milliseconds, or the conversation starts to feel artificial. This is where the quality of an implementation shows: perceived latency is one of the factors that determine whether the user trusts the agent or abandons the call.
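One simple way to reason about that threshold is as a budget shared by every stage of a turn. The sketch below is only an illustration: the stage names and the millisecond values are made-up numbers, not measurements from any provider, and remaining_budget is a hypothetical helper.

```python
BUDGET_MS = 500  # upper end of the 400-500 ms range mentioned in the text


def remaining_budget(stage_ms: dict[str, float], budget_ms: float = BUDGET_MS) -> float:
    """Return the milliseconds left after all stages; negative means over budget."""
    return budget_ms - sum(stage_ms.values())


# Illustrative numbers only, not measurements from any real provider.
example_turn = {"stt": 120, "llm": 220, "tts": 110, "network": 40}
slack = remaining_budget(example_turn)
```

With these example values the turn leaves 10 ms of slack, which shows how little room there is once network time is counted.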
The part most often underestimated is integration with existing systems. A truly useful voice agent is not isolated: it accesses the CRM to recognise the customer, queries the business system to check an order's status, opens a ticket, updates a calendar. Without this connection to real company data, it remains a façade.
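As a rough sketch of what that connection looks like, the agent maps a recognised intent to a function that reads real data. Everything here is hypothetical: the CRM and ORDERS dictionaries stand in for actual systems, and the intent name, phone numbers and identifiers are invented for the example.

```python
from typing import Callable, Optional

# Hypothetical in-memory stand-ins for a CRM and an order system.
CRM = {"555-0100": {"name": "Anna", "customer_id": "C-1"}}
ORDERS = {"C-1": {"order_id": "O-42", "status": "shipped"}}


def identify_caller(phone: str) -> Optional[dict]:
    """Look the caller up by phone number, as a CRM integration would."""
    return CRM.get(phone)


def order_status(customer_id: str) -> str:
    order = ORDERS.get(customer_id)
    if order is None:
        return "I could not find any open orders."
    return f"Order {order['order_id']} is {order['status']}."


# Intents the agent can act on, each mapped to a business-system call.
TOOLS: dict[str, Callable[[str], str]] = {"order_status": order_status}


def run_tool(intent: str, phone: str) -> str:
    caller = identify_caller(phone)
    if caller is None:
        return "I could not find your details, so let me take them first."
    tool = TOOLS.get(intent)
    if tool is None:
        return "I cannot help with that yet."
    return tool(caller["customer_id"])
```

For example, run_tool("order_status", "555-0100") answers from the order data, while an unknown caller or an unsupported intent gets a graceful fallback instead of an error.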

The difference from a chatbot
The question comes up often: "But isn't it the same as a chatbot?" No, and the difference is not just the channel.
A text chatbot operates in an environment where the user types, re-reads, corrects and waits. Voice works differently: it's synchronous, doesn't allow long pauses, requires the agent to handle interruptions, overlaps and uncertainties in speech. Anyone who has ever had to say "no, wait, I meant something else" knows what we're talking about.
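Handling interruptions can be pictured as a small turn-taking state machine: if the caller starts speaking while the agent is talking, the agent stops and listens. This is a simplified sketch, and the event names "user_speech", "reply_ready" and "playback_done" are assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


def next_state(state: State, event: str) -> State:
    """Advance the turn-taking state for one event.

    Events (hypothetical names): "user_speech", "reply_ready", "playback_done".
    """
    if state is State.SPEAKING and event == "user_speech":
        # The caller interrupts, so the agent stops talking and listens.
        return State.LISTENING
    if state is State.LISTENING and event == "reply_ready":
        return State.SPEAKING
    if state is State.SPEAKING and event == "playback_done":
        return State.LISTENING
    # Any other combination leaves the state unchanged.
    return state
```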
Use cases also tend to differ. Chatbots work well for asynchronous requests, app or web support, and flows where the user has time to read and reflect. Voice agents are better suited to contexts where voice is the natural channel: the customer calling from a mobile while in the car, the supplier wanting a quick update, the patient booking a visit. It is not about choosing one over the other in absolute terms, but about understanding which tool fits where.
Where they bring real ROI today
The use cases generating concrete results right now concentrate in some specific areas.
First-level customer support. Handling repetitive requests, the ones a human operator answers two hundred times a day, is the most widespread use case and the one with the most immediate return. Order status, opening hours, product information, return procedures. Reported results range from 40% to 70% of calls handled entirely without human intervention, with no waiting time for the caller.
Bookings and appointments. Medical practices, beauty centres, workshops, restaurants: all contexts where the phone is still the primary channel and managing bookings requires dedicated staff. A voice agent can manage the schedule autonomously, send confirmations, and process cancellations and rescheduling.
Inbound lead qualification. When a potential customer calls outside business hours or finds the line busy, they often don't call back. A voice agent can collect the relevant information, qualify interest and pass the contact to sales with a briefing ready.
Notifications and outbound follow-up. Not just inbound calls: voice agents can make outbound calls to confirm appointments, collect feedback after a purchase or notify customers of updates on a case. In some contexts this is more effective than an SMS and less invasive than a call from a person.
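A back-of-the-envelope estimate helps put these use cases in perspective. The sketch below combines the indicative cloud API cost per minute (€0.05 to €0.30) with the 40% to 70% range of calls handled without a human. The monthly volume of 300 calls and the three-minute average duration are assumptions chosen for illustration, and monthly_estimate is a hypothetical helper.

```python
def monthly_estimate(calls: int, avg_minutes: float,
                     cost_per_minute: float, containment: float) -> dict[str, float]:
    """Estimate the agent's API cost and how many calls still reach a human."""
    contained = round(calls * containment)
    return {
        "agent_cost_eur": round(calls * avg_minutes * cost_per_minute, 2),
        "calls_contained": contained,
        "calls_to_humans": calls - contained,
    }


# Assumed inputs: 300 calls a month, 3 minutes per call on average.
low = monthly_estimate(300, 3.0, cost_per_minute=0.05, containment=0.40)
high = monthly_estimate(300, 3.0, cost_per_minute=0.30, containment=0.70)
```

With these inputs the API cost spans €45 to €270 a month, with 120 to 210 calls resolved without an operator. The figure covers usage fees only and leaves out integration and maintenance work.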

The right questions to ask before investing
Deploying a voice agent isn't a plug-and-play project, at least not if you want it to truly work. Before starting, it's worth answering some concrete questions.
What problem are you trying to solve? Starting from technology is the most common mistake. If the answer is "I want a voice agent because it's modern", the project probably won't generate value. If the answer is "we receive 300 calls a month about order status and can't keep up", then you can build something useful.
Are your systems integrable? A voice agent without access to real data is like an operator who doesn't know the company. Before thinking about the agent, it's worth finding out whether the CRM, business systems and other tools expose accessible APIs or whether preliminary integration work is needed.
Who handles the exception? A good voice agent knows when the situation exceeds its competence and transfers the call to a human operator with context ready. Designing this handoff is as important as designing the automatic flows.
What is the language and context? English is handled well by recent models, but there are regional differences, sector-specific terminology and nuances that must be tested in the field. An agent trained on a specific sector performs better than a generic one.
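The exception-handling question above can be sketched as a simple rule: transfer the call when the caller asks for a person, or when the agent has been unsure for several turns in a row, and pass along a short briefing. The thresholds (0.6 confidence, two uncertain turns), the Conversation class and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    caller: str
    transcript: list[str] = field(default_factory=list)
    failed_turns: int = 0  # consecutive turns the agent was unsure about


def update_and_check_handoff(conv: Conversation, confidence: float,
                             asked_for_human: bool,
                             min_confidence: float = 0.6,
                             max_failed_turns: int = 2) -> bool:
    """Record one turn and return True when the call should go to a human."""
    if asked_for_human:
        return True
    if confidence < min_confidence:
        conv.failed_turns += 1
    else:
        conv.failed_turns = 0  # a confident turn resets the streak
    return conv.failed_turns >= max_failed_turns


def handoff_briefing(conv: Conversation, reason: str) -> dict:
    """Context the human operator receives together with the call."""
    return {
        "caller": conv.caller,
        "reason": reason,
        "last_turns": conv.transcript[-3:],
    }
```

The briefing is what lets the operator pick up the call without asking the customer to repeat everything.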
When custom-build makes sense
There are no-code platforms on the market that let you configure a voice agent in a few hours. For simple, standardised use cases, they can be a valid solution. When needs become more specific, however, their limits show: complex conversational flows, integrations with proprietary systems, handling of vertical scenarios, and GDPR compliance with full control over data.
In these cases, building a custom voice agent, with an architecture designed around real business processes, makes the difference between a showcase tool and one that becomes an operational part of the business.
At Redergo we work on both fronts: we evaluate with the client whether an existing solution covers their needs, or we design and develop the agent from architecture to integration, with particular attention to conversation quality and to the connection with systems already in use. If you're evaluating this kind of project, contact us for an initial assessment.



