You May Not Need to Fine-tune: ConveRT Featurizer Makes Sentence Representations More Efficient

In this short blog post, we introduce a new featurizer based on a recently proposed sentence encoding model, ConveRT, and explain how you can use it in Rasa to get very strong performance with a model that trains in minutes on a CPU.

ConveRT as a sentence encoder

Henderson et al. from PolyAI recently open sourced a new sentence encoding model called ConveRT. The model consists of 6 layers of transformer modules and is trained on  a response selection task using conversations from Reddit . Response selection is the task of selecting the most appropriate response to a user utterance from a list of candidate responses, as shown in Figure 1. ConveRT is available as a TFHub Module with usage details in this repository. Please refer to the paper and repository for more details on the architecture and the dataset.

Conversational Response Selection, source: Henderson et. al.

The authors show significant gains when sentence encodings from ConveRT are directly transferred (without any fine-tuning) to a downstream intent classification task. We decided to test ConveRT by extracting sentence-level representations and feeding them into Rasa’s EmbeddingIntentClassifier. We evaluated the intent classification performance on 2 datasets. We compared ConveRT and BERT, using both as feature extractors, and used the same architecture for intent classification on top. Across both datasets, ConveRT gives an improvement of 5-8 points over BERT when testing the F1 score. The more common way to use BERT is to fine-tune the model on your dataset, but that requires a GPU and at least a few hours. We found that in multiple cases the performance of ConveRT + classifier without fine-tuning is quantitatively comparable to BERT + classifier with fine-tuning. The power of sentence-level representations from ConveRT is that you can get top performance with a model that trains in couple of minutes on a CPU.

Note that pre-trained ConveRT is only available in English (at least for now), so if you are working in another language you unfortunately won’t benefit from these new embeddings.

ConveRT for intent classification inside Rasa

With Rasa 1.5.0, we are excited to add a new ConveRT-based featurizer as a component to Rasa: ConveRTFeaturizer. We use the component as a featurizer, meaning that we do not fine-tune the parameters of ConveRT while training an intent classification model. We have also included a new pre-configured pipeline, pretrained_embeddings_convert, which uses ConveRTFeaturizer to extract sentence-level representations and feeds them to EmbeddingIntentClassifier for intent classification. You can use it with the following config:

language: "en"
pipeline: "pretrained_embeddings_convert"

You can also use any other featurizer, like a word or char level CountVectorsFeaturizer in conjunction with ConveRTFeaturizer if needed for your dataset:

language: “en”
pipeline: 
  - name: “WhitespaceTokenizer”
  - name: “ConveRTFeaturizer”
  - name: “CountVectorsFeaturizer”
  - name: “EmbeddingIntentClassifier”

If your training data is in English, the pretrained_embeddings_convert pipeline is definitely a good starting point to train NLU models on your dataset. As mentioned before, if your training data is not in English, we recommend sticking to the supervised_embeddings pipeline, which trains embeddings specifically on your dataset’s vocabulary.

Conclusion

We would love to hear your feedback on the new featurizer and pipeline. Share your use cases and suggestions for using ConveRT with Rasa on this forum thread. We would also be keen to see how the new pipeline performs compared to baseline models you’ve trained already. Join the discussion and contribute evaluation numbers for your dataset, before and after. Stay tuned for more feature releases and happy experimenting!