Failing Gracefully with Rasa

Rasa Stack is an open source, machine learning-based framework for building level 3 contextual AI assistants. In Rasa Core version 0.13 we introduced the new TwoStageFallbackPolicy, which provides a user-friendly conversation flow for resolving misclassified user messages and makes it easy to integrate hand-offs when a message cannot be resolved.

[Figure: Rasa TwoStageFallbackPolicy]

Although machine learning approaches to understanding human language and handling contextual dialogues are constantly improving, it is still quite common for AI assistants to make mistakes. However, a single error should not cause the whole conversation to fail. Instead, a good AI assistant should be able to recover from errors and get back on track.

A common reason for failing conversations is the misclassification of user messages. This might be caused by

  • Lack of training data
  • Spelling errors
  • Out of scope user messages like “I want pizza”
  • Ambiguous messages, e.g. “How to get started with Rasa?” could mean “How to install Rasa?” or “How to use Rasa?”

While a lack of training data can be fixed by collecting more of it, the other problems are harder to handle. For this reason Rasa already offers a fallback policy. This policy checks whether Rasa NLU, the Rasa Stack component responsible for understanding user messages, was able to classify the user message confidently. If the classification confidence is below a certain threshold, it triggers a fallback action. This action can then be used to ask the user to rephrase their message or to trigger a hand-off. However, when interacting with a chatbot this fallback behavior can do more harm than good, as you can see in the example below.

[Figure: Old Rasa Fallback Policy]

Failing Gracefully: The New TwoStageFallbackPolicy

Based on our experience and the feedback from Rasa users, we have developed the new TwoStageFallbackPolicy. So what does it offer compared to the old fallback policy?

  • Natural conversation flows to correct misclassified user messages
  • Quick and robust message rephrasing
  • Detailed and customizable control of the individual fallback steps

Like the original FallbackPolicy, the TwoStageFallbackPolicy takes over when the confidence of an NLU classification is below a specified threshold. Based on the NLU classification, it offers the user a list of likely intents to choose from. The intent selection is typically done using buttons, which narrows down the range of possible inputs and saves the user some typing. If none of the suggested intents matches what the user meant, the policy asks the user to rephrase their message.

To avoid mindless repetitions as shown above, users are only asked to rephrase once. If their rephrased message is also ambiguous, they are offered a second list of possible intents. If this second attempt to classify the intent fails as well, an ultimate fallback is triggered, for example a hand-off to a human. The GIF at the top of this article shows these new flows.

By splitting the fallback into two stages, affirmation and rephrasing, the policy allows more natural conversation flows which resemble the way humans act when they are not sure whether they understood something correctly. Having multiple fallback stages also gives the AI assistant an extra chance to recover.
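The two stages described above can be sketched as a small decision function. This is a simplified illustration of the flow in this article, not Rasa's actual implementation: the function `two_stage_fallback`, its parameters, the step names, and the threshold of 0.3 are all assumptions made for the example.

```python
# Assumed NLU confidence threshold, chosen only for illustration.
NLU_THRESHOLD = 0.3


def two_stage_fallback(confidence, affirms_first=False,
                       rephrase_confidence=0.0, affirms_second=False,
                       threshold=NLU_THRESHOLD):
    """Return the list of bot steps for one user message.

    confidence:          NLU confidence of the original message
    affirms_first:       user confirms one of the first suggested intents
    rephrase_confidence: NLU confidence of the rephrased message
    affirms_second:      user confirms one of the second suggested intents
    """
    # Confident classification: the fallback policy does not take over.
    if confidence >= threshold:
        return ["continue"]

    # Stage 1: offer likely intents and ask the user to affirm one.
    steps = ["ask_affirmation"]
    if affirms_first:
        return steps + ["continue"]

    # Stage 2: ask the user to rephrase, but only once.
    steps.append("ask_rephrase")
    if rephrase_confidence >= threshold:
        return steps + ["continue"]

    # The rephrased message is still ambiguous: second affirmation request.
    steps.append("ask_affirmation")
    if affirms_second:
        return steps + ["continue"]

    # Both attempts failed: trigger the ultimate fallback (e.g. a hand-off).
    return steps + ["ultimate_fallback"]


print(two_stage_fallback(0.1, rephrase_confidence=0.1))
# ['ask_affirmation', 'ask_rephrase', 'ask_affirmation', 'ultimate_fallback']
```

Note that the user is asked to rephrase at most once in any path through this function, which is what prevents the repetitive loop of the old fallback behavior.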

To use the new fallback policy, simply replace “FallbackPolicy” with “TwoStageFallbackPolicy” in the Rasa Core policy configuration. It is not required to provide any additional stories, since the fallback policy automatically purges the event history after a successful rephrasing.

Customizing the Fallback Policy

When designing the TwoStageFallbackPolicy we paid extra attention to customizability, so that developers can adapt the fallback behavior to their individual use cases. This can be done, for example, by overriding the affirmation action, the rephrasing action, and the ultimate fallback action with custom actions.

In particular, overriding the prompt for the affirmation request is a sensible thing to do, since the default implementation uses the raw intent names for clarification. Hence, if the bot is not sure whether a user message has the intent make_reserverations, it would ask the user “Did you mean make_reserverations?” and offer two buttons, Yes and No. A more user-friendly option would probably be “Did you mean ‘I want to make a reservation’?”
One way to provide such prompts is to keep the mapping between intents and meaningful prompts in a CSV file and use it to create the affirmation request.
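A minimal sketch of this idea is shown below. It is written as plain Python so that it runs without Rasa installed. The helper names `load_intent_prompts` and `build_affirmation`, the two-column CSV layout, and the `out_of_scope` intent used for the No button are assumptions made for the example. In a real bot, this logic would live in the custom action that overrides the affirmation action, which would send the message and buttons to the user.

```python
import csv
import io


def load_intent_prompts(csv_file):
    """Read rows of `intent_name,prompt` into a dictionary (assumed layout)."""
    prompts = {}
    for row in csv.reader(csv_file):
        if len(row) >= 2:  # skip empty or incomplete rows
            prompts[row[0].strip()] = row[1].strip()
    return prompts


def build_affirmation(intent_name, prompts, deny_intent="out_of_scope"):
    """Build the affirmation question and its Yes/No buttons.

    Falls back to the raw intent name if no prompt is mapped.
    `deny_intent` is a hypothetical intent name for the No button.
    """
    prompt = prompts.get(intent_name, intent_name)
    message = "Did you mean '{}'?".format(prompt)
    # A payload of the form '/<intent_name>' triggers that intent directly
    # when the button is clicked.
    buttons = [
        {"title": "Yes", "payload": "/{}".format(intent_name)},
        {"title": "No", "payload": "/{}".format(deny_intent)},
    ]
    return message, buttons


# In-memory stand-in for a CSV file on disk.
sample_csv = io.StringIO("make_reserverations,I want to make a reservation\n")
prompts = load_intent_prompts(sample_csv)

message, buttons = build_affirmation("make_reserverations", prompts)
print(message)  # Did you mean 'I want to make a reservation'?
```

Keeping the prompts in a separate file means they can be edited without touching the action code.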

Apart from overriding the actions, you can also customize the thresholds, the intent which is triggered when a user denies the suggested intents, and the actions which are used for the fallbacks.
A detailed description of the possible customizations of the individual actions can be found here.

Finding a Good NLU Threshold

When using the fallback policy, it is crucial to select a good threshold for the confidence with which Rasa NLU classifies user messages. You want the fallback policy to jump in wherever necessary without harassing your users with affirmation requests. Unfortunately, there is no universal threshold which suits all use cases, since confidence levels depend on the selected Rasa NLU pipeline and your training data. However, a simple way to gain intuition is to use the built-in evaluation feature of Rasa NLU. Ideally you have some spare test data to evaluate on which was not used previously to train your Rasa NLU model. If this is not the case, set aside some of your training data as test data.
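One possible way to hold out test data is sketched here in plain Python. The function `split_examples`, the 20% test fraction, and the fixed seed are assumptions made for the example. It expects a list of examples that each carry `text` and `intent` keys, as in the `common_examples` of the Rasa NLU JSON training format, and splits each intent separately so that every intent stays represented in the training data.

```python
import random
from collections import defaultdict


def split_examples(examples, test_fraction=0.2, seed=42):
    """Split NLU examples into train and test sets, intent by intent."""
    by_intent = defaultdict(list)
    for example in examples:
        by_intent[example["intent"]].append(example)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for intent in sorted(by_intent):
        group = list(by_intent[intent])
        rng.shuffle(group)
        # Intents with very few examples contribute no test data at all
        # and stay entirely in the training set.
        n_test = int(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy examples, made up for illustration.
examples = (
    [{"text": "hello {}".format(i), "intent": "greet"} for i in range(10)]
    + [{"text": "bye {}".format(i), "intent": "goodbye"} for i in range(5)]
)
train, test = split_examples(examples)
print(len(train), len(test))  # 12 3
```

The two resulting lists can then be written back to separate training and test files in whichever format your data uses.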

Then train a new Rasa NLU model on the training data and evaluate it on the test data.

This gives you a histogram of the confidences, e.g.:

[Figure: histogram of the prediction confidences]

Ideally, the missed predictions have low confidence values and the correct predictions (hits) have high confidence values. In reality, however, these ranges overlap, as shown in the example. To reduce this overlap you can

  • Add more examples to the training data
  • Check for wrongly labeled NLU examples
  • Tune the hyperparameters of the intent classifier

Referring to the example above, an NLU threshold of 0.45 is probably a good option, since it lets only a few misclassified intents through without having the fallback policy jump in too often. If correct classifications matter more to you than avoiding frequent affirmation requests, you might select a more aggressive threshold like 0.75 or even 0.9.
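The trade-off behind this choice can also be computed directly from evaluation results instead of being read off the histogram. The sketch below assumes you have a list of (confidence, was the prediction correct) pairs for your test data. The function `threshold_tradeoff` and the sample values are made up for illustration and are not output of Rasa NLU.

```python
def threshold_tradeoff(predictions, thresholds):
    """For each threshold, count the two kinds of errors it causes.

    predictions: list of (confidence, is_correct) pairs
    Returns a list of (threshold, accepted_misses, rejected_hits), where
      accepted_misses = wrong predictions that pass the threshold
      rejected_hits   = correct predictions that trigger the fallback
    """
    rows = []
    for threshold in thresholds:
        accepted_misses = sum(
            1 for conf, ok in predictions if conf >= threshold and not ok)
        rejected_hits = sum(
            1 for conf, ok in predictions if conf < threshold and ok)
        rows.append((threshold, accepted_misses, rejected_hits))
    return rows


# Made-up evaluation results, for illustration only.
predictions = [
    (0.95, True), (0.80, True), (0.60, True), (0.50, False),
    (0.40, True), (0.35, False), (0.20, False),
]

for threshold, misses, fallbacks in threshold_tradeoff(
        predictions, [0.3, 0.45, 0.75]):
    print(threshold, misses, fallbacks)
# 0.3 2 0
# 0.45 1 1
# 0.75 0 2
```

A higher threshold lets fewer wrong predictions through but sends more correct ones to the fallback, which is the balance described above.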

What’s Next

The new TwoStageFallbackPolicy is a powerful addition to the toolset of bot makers for creating impressive and robust bots. AI assistants are still in their early days, and by adding the new fallback policy we acknowledge that failing gracefully is as important to the user experience as fulfilling the user's goals successfully right away.
However, it is our conviction that machine learning will push NLP, and hence AI assistants, to the next level. Therefore, we will double down on our research efforts and work to make chatbot development accessible to everyone.

Do you have ideas on how to improve Rasa? Join our community of makers.

Have you already tried out the new TwoStageFallbackPolicy? Join the discussion in the Rasa forum.