Building (and Testing) a Voice Assistant with Twilio Voice, Bespoken, and Rasa

In this blog post, we will build a voice assistant by combining Twilio Voice with Rasa Open Source. Follow along, and by the end you will have:

  • A voice-driven Helpdesk Assistant
  • An understanding of how Twilio Voice works
  • An understanding of Rasa Open Source
  • An industrial-grade conversation-driven development workflow, based around Bespoken’s testing and training tools for voice and chat

Let’s jump in!

The Architecture

There are a few key aspects to our architecture. First is the main flow of the system: Twilio handles receiving phone calls, transcribes what the caller says, and passes the result to Rasa, which maps the caller’s words to intents and manages the dialogue.
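To make the flow concrete, here is a minimal sketch of the TwiML side of that exchange: when a call comes in, Twilio requests our webhook, and the webhook replies with TwiML telling Twilio to speak a prompt, transcribe the caller’s speech, and POST the transcription back. The prompt text and the action path below are illustrative - the repository’s Rasa connector handles all of this for you:

```python
import xml.etree.ElementTree as ET

def voice_webhook_response(prompt: str, action_url: str) -> str:
    """Build a minimal TwiML response that speaks a prompt, then
    gathers the caller's speech and POSTs the transcription back."""
    response = ET.Element("Response")
    gather = ET.SubElement(response, "Gather", {
        "input": "speech",      # ask Twilio to transcribe the caller
        "action": action_url,   # Twilio POSTs the SpeechResult here
        "method": "POST",
    })
    say = ET.SubElement(gather, "Say")
    say.text = prompt
    return ET.tostring(response, encoding="unicode")

print(voice_webhook_response("How can I help you today?",
                             "/webhooks/twilio_voice/webhook"))
```

Twilio keeps calling the webhook with each new transcription, so the conversation loops: speech in, TwiML out, until the dialogue ends.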

Conversation-Driven Development

At the same time, we also keep an eye on our conversation-driven development workflow. This is every bit as important, because it defines HOW we train and improve this system to ensure delightful user experiences. For this, we store all of our models and code in GitHub. Whenever we make a change, we automatically retrain and re-test the model using Rasa and Bespoken.

Setting Up the Local Environment

To get started, clone this repository:

https://github.com/bespoken/voice-helpdesk-assistant

Everything Rasa-related you need for this tutorial is in the repository - the code, the training data, and easy-to-use Docker Compose files for running your environment locally.

Install ngrok - this allows us to build and test our assistant with ease:

https://ngrok.com/download

Open the newly cloned project in your IDE of choice (we recommend VS Code) and fire up ngrok:

ngrok http 5005

Take note of the Forwarding URL listed by ngrok - we will make use of it in the next step.

Setting Up Twilio Voice

Go to Twilio.com and sign up for a Free Trial.

Follow the onboarding wizard and select the following options:

  • Do You Write Code: Yes
  • What Is Your Preferred Language: Python
  • What Is Your Goal Today: Use Twilio In A Project
  • What Do You Want To Do First: Make Or Receive A Phone Call

Once you have completed these steps, you will be prompted to “Get a Number”.

After you have provisioned a number, click on the “Numbers” item in the left-hand menu, choose “Manage Numbers”, and then click on the number you just created. Scroll down, and you will be able to configure your Twilio webhook:

This is the URL that Twilio will POST data to when someone calls the phone number you created. To develop locally, take the Forwarding URL we got from ngrok and paste it into the “A Call Comes In” field.
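The full value combines the ngrok Forwarding URL with the path of the project’s Rasa connector. The exact host and path below are illustrative - your ngrok subdomain will differ, and the path depends on how the repository’s connector is configured:

```
https://abc123.ngrok.io/webhooks/twilio_voice/webhook
```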

We also need to configure our Twilio credentials in our project. They can be found by clicking on the home icon from within Twilio.

In your local project:

  • Make a copy of the file example.env and save it as .env
  • Replace the TWILIO_ACCOUNT_SID with the value from the Twilio Dashboard
  • Replace the TWILIO_AUTH_TOKEN with the value from the Twilio Dashboard
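After these steps, your .env file should look something like this (the values shown are placeholders - substitute your own credentials from the Twilio Dashboard):

```
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token_here
```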

That’s all we need to do in Twilio. Now, we can fire up our Rasa instance.

Running Rasa Locally

We are using Docker Compose to run Rasa Open Source locally. This configuration provides three containers:

  • The Rasa Open Source server
  • The action server for handling form validations and custom behavior
  • The “duckling” server - a tool for extracting entities like dates and email addresses
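As a rough sketch, the Compose file wires these three services together along the following lines. The service names, image tags, and ports here are illustrative - the repository’s own docker-compose.yml is the source of truth:

```yaml
version: "3"
services:
  rasa:
    image: rasa/rasa:latest       # Rasa Open Source server
    ports:
      - "5005:5005"               # the port ngrok forwards to
    volumes:
      - ./:/app
  action-server:
    image: rasa/rasa-sdk:latest   # form validations and custom behavior
  duckling:
    image: rasa/duckling:latest   # entity extraction (dates, emails, ...)
```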

We can run all three with a single command. In a terminal SEPARATE from the one where you are running ngrok (and make sure ngrok is still running), run:

docker-compose up

That will start all of our services.

Go ahead and try it out to see that it works - make a call to the phone number provided by Twilio. Wasn’t that fun?

Conversation-Driven Development

Now let’s make a change to our service - something simple, but that illustrates the process we talked about at the outset.

We are going to add a synonym for one of our entities to our domain. I shared my prototype with a friend, and he noticed while talking to the assistant that when he said “high” priority, it did not understand him correctly. Instead, the bot thought he said “hi” - this misclassification of a homophone is a classic challenge when working with Voice AI. We’re going to show you how to resolve it.

To verify the issue, we run our end-to-end tests using Bespoken. First, we install Bespoken from the command line:

npm install bespoken-tools -g

If you don’t already have npm on your machine, get it here.

Then we run our tests:

bst test --config tests/testing.json

We can see that we get an error at the step where our user says “high” in the test script.
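For reference, Bespoken test scripts are YAML files that pair utterances with expected responses. A failing interaction for our homophone bug might look roughly like this - the utterances and expected prompts below are illustrative, not the repository’s actual script:

```yaml
---
- test: Create a helpdesk ticket
- "I need help with my laptop":
  - prompt: "*priority*"
- "high":
  - prompt: "*ticket*"   # fails while "high" is transcribed as "hi"
```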

Now, we add our synonym to the data/nlu.md file:
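In Rasa’s Markdown training-data format, a synonym section maps alternative extracted values onto a canonical one. Mapping the misheard “hi” onto “high” looks like this (assuming, as in our domain, that “high” is the canonical priority value):

```md
## synonym:high
- hi
```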

We retrain our model:

docker exec rasa rasa train

And then we run our tests again.

Bingo! All fixed. And with that, we have illustrated key steps in conversation-driven development - specifically steps 1, 4, and 6: we shared the design, we tested, and we fixed (and then we tested again, of course :-)).

Next Steps

Check out the project on GitHub to learn more about:

  • Deploying the bot to a server environment
  • Setting up a handoff to ServiceNow for tracking issues created by the bot
  • Running an industrial-grade CI/CD process based on GitHub Actions
  • Testing and conversation-driven development, covered in more depth on the Rasa blog