Level 3 Contextual Assistants: Beyond Answering Simple Questions

When the Messenger platform opened, chatbots emerged in force. Unfortunately, these early chatbots were disappointing to use: people found they couldn't get their message across, let alone have a conversation. As a result, some suggested that current technology was insufficient, i.e. that only Artificial General Intelligence would be able to provide a true conversational experience. But the world doesn't have to wait for Tony Stark to create J.A.R.V.I.S. to start having useful conversations.

In a recent article, we suggested thinking of chatbots along a tiered model, the "Five Levels of AI Assistants". This lets us think beyond just two disparate levels: a basic AI that responds to questions, or an all-powerful AGI that can answer everything. Right now, most chatbots handle basic FAQs in customer service use cases. But what we see in our community is that the next level, "contextual assistants", is actively expanding. We're seeing use cases emerge in other important enterprise functions such as sales, marketing, and internal processes.

This post explains what we've identified as the key capabilities that Level 3 assistants need in order to handle contextual conversations with contextual data. Let's start with an example conversation between an assistant from a renters insurance company and its customer that we would count as a "contextual conversation" - don't worry, there is more explanation coming for what all those words mean:

What does context actually mean?

People use the word "context" in many different ways. For us, it is all about the context of a conversation. Understanding the context of the conversation is key as it can either dramatically shorten conversations or make it possible to deal with ambiguous user input.

From our experience, it boils down to different types of context with different contextual data that impact the flow of the conversation:

User Profile: Data about a user can vary a lot based on the use case or industry, e.g. age, name, address or which insurance policy they purchased.
User Goal: A user goal is usually the starting point of a conversation and can be, for example, a specific problem. All user goals are intents, but not all intents are user goals (e.g. yes/no is not a user goal).
Conversation History: What has been said before contains a lot of important contextual data, especially for clarifying what the user meant. For example, a pronoun like "it" might refer to an earlier noun such as "MacBook" (coreference).
User Response: A user's response can be directly connected to the context in which the question was asked. For example, a "yes" answer doesn't mean much by itself; it needs to be connected to the specific question that preceded it.
World Knowledge: This is a tricky one as it is very broad. It could be anything from knowing the location of the user, so that "downtown" means "Manhattan" if the user is in New York, to knowing that "rangers" means "New York Rangers" because the conversation is about ice hockey. It could also include data from sensors or feeds (e.g. the prevailing weather or the fuel level in a car).
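
To make these context types more concrete, here is a minimal sketch in plain Python of how they might be held together and used. It is an illustration under our own assumptions, not how Rasa stores context: the names `ConversationContext`, `pending_question`, `interpret_response` and `resolve_place` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversationContext:
    """Hypothetical container for the context types listed above."""
    user_profile: dict = field(default_factory=dict)      # e.g. name, purchased policy
    user_goal: Optional[str] = None                       # e.g. "buy_renters_insurance"
    history: list = field(default_factory=list)           # earlier (speaker, text) turns
    world_knowledge: dict = field(default_factory=dict)   # e.g. {"city": "New York"}
    pending_question: Optional[str] = None                # what the assistant last asked


def interpret_response(ctx: ConversationContext, text: str) -> str:
    """Connect a bare yes/no to the question it answers, if there is one."""
    answer = text.strip().lower()
    if answer in ("yes", "no") and ctx.pending_question:
        return f"{ctx.pending_question}={answer}"
    return answer


def resolve_place(ctx: ConversationContext, place: str) -> str:
    """Use world knowledge to interpret an ambiguous place name."""
    if place.lower() == "downtown" and ctx.world_knowledge.get("city") == "New York":
        return "Manhattan"
    return place
```

The point of the sketch is that the same user input ("yes", "downtown") resolves differently depending on what the context object contains, which is why the assistant has to carry that state through the conversation.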

What an assistant needs to handle contextual conversations

For us, a contextual assistant can handle any user goal gracefully and help accomplish it as well as possible. This doesn't mean that the assistant can answer everything, but it at least steers the conversation in a way that helps the user, for example by handing it over to a human. It is not an easy job for a computer - in fact, it is the reason we started Rasa.

To handle those different types of contextual data, we think that there are six different key capabilities for an AI assistant to bring it from a "Level 2" FAQ assistant to a "Level 3" Contextual Assistant:

Gracefully Handle Any User Goal: Assistants won't have super intelligence overnight, but it is still very important that they can handle any user goal gracefully. This can mean that they handle a full conversation end-to-end, either because they were trained to do so or because they are able to generalize from previous conversations. Alternatively, it can mean handing off to a human directly (e.g. within the same conversation) or indirectly (e.g. by providing the phone number of customer service).
Read/Write Contextual Data: It is important for the AI assistant to be able to read existing contextual data from, or write new contextual data to, secure data stores, e.g. through APIs. This lets conversations change data in existing systems and ensures that returning users don't start from scratch.
Disambiguation: Human language usually contains a lot of ambiguity - that's why it is important to remove uncertainty. For example, this could be achieved by asking the user to confirm before proceeding.
Extract Contextual Data: Language can contain a lot of relevant structured data (e.g. dates, cities) that needs to be extracted from unstructured text.
Change of Context: The user might change their mind at any point. This can affect any type of contextual data, from a user response (e.g. correcting a typo of "2016" to "2017") to a change of user goal (e.g. the user now wants to cancel their renters insurance policy).
Business Logic: Companies tend to have many different rules that influence either what the assistant may ask (e.g. the insurance company needs specific information to properly insure different items) or the content of the message (e.g. calculating pricing through an API and then weaving it into the message).
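
Three of these capabilities (extracting contextual data, disambiguation, and change of context) can be illustrated together in a few lines of plain Python. This is only a sketch under our own assumptions: the `purchase_year` slot, the function names and the regular expression are hypothetical, and a real assistant would use a trained entity extractor rather than a hand-written pattern.

```python
import re

# Assumption: a four-digit year starting with 19 or 20 is what we want to extract.
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years(text: str) -> list:
    """Extract structured data (years) from unstructured text."""
    return [int(match) for match in YEAR_PATTERN.findall(text)]


def update_purchase_year(slots: dict, text: str) -> str:
    """Update the hypothetical 'purchase_year' slot and return the reply."""
    years = extract_years(text)
    if not years:
        # Nothing to extract: ask for the missing data.
        return "Which year did you buy it?"
    if len(years) > 1:
        # Disambiguation: ask the user to confirm instead of guessing.
        options = " or ".join(str(year) for year in years)
        return f"Did you mean {options}?"
    # Change of context: a later message overwrites the earlier value.
    slots["purchase_year"] = years[0]
    return f"Got it, {years[0]}."
```

For example, calling `update_purchase_year(slots, "I bought my laptop in 2016")` and then `update_purchase_year(slots, "Sorry, I meant 2017")` on the same `slots` dictionary leaves the stored year at 2017, while a message mentioning both years triggers a confirmation question and leaves the stored value untouched.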

Being able to handle context in a conversation is important for bringing AI assistants to the next level and to a broader audience, but it is not easy. The Rasa community consists of thousands of developers around the world who are pushing the boundaries, helping each other and building contextual assistants. We're constantly working on improving the capabilities of our tools and making them easier to use. Get started now with open source Rasa!

Want to share your thoughts on this topic? Join the discussion!