Train on Larger Datasets Using Less Memory with Sparse Features

With Rasa Open Source 1.6.0, we released sparse features for the NLU pipeline. In this short blog post, we explain what sparse features are and show that we can now train on larger datasets using less memory.

What are sparse features?

Before looking at sparse features, let’s quickly recap what features are in general. When training a Rasa NLU model, the training data (for example, the user messages) is converted into features, i.e. vectors of numerical values. Let’s look at an example:

The CountVectorsFeaturizer in Rasa creates a bag-of-words representation of user messages. Assume we have the following vocabulary:

(1) Rasa (6) Berlin (11) is
(2) located (7) in (12) based
(3) chatbot (8) assistant (13) San
(4) Francisco (9) company (14) startup
(5) builds (10) conversational (15) great

and the user message “Rasa is based in Berlin”. The featurizer creates a vector with one entry per vocabulary word (15 entries here) and counts how often each word in the vocabulary occurs in the user message. The user message above would be converted into

[1 0 0 0 0 1 1 0 0 0 1 1 0 0 0]
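The counting step can be reproduced in a few lines of plain Python. This is a minimal sketch for illustration only, not how the CountVectorsFeaturizer is implemented: the `VOCABULARY` list and the `bag_of_words` helper are hypothetical names, and the sketch assumes simple case-sensitive whitespace tokenization.

```python
# Vocabulary from the example above, listed in index order (1) to (15).
VOCABULARY = [
    "Rasa", "located", "chatbot", "Francisco", "builds",
    "Berlin", "in", "assistant", "company", "conversational",
    "is", "based", "San", "startup", "great",
]


def bag_of_words(message, vocabulary):
    """Count how often each vocabulary word occurs in the message.

    Hypothetical helper: splits on whitespace and matches words exactly.
    """
    tokens = message.split()
    return [tokens.count(word) for word in vocabulary]


vector = bag_of_words("Rasa is based in Berlin", VOCABULARY)
print(vector)
# [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
```

Ten of the 15 entries are zero even in this tiny example. With a realistic vocabulary, the share of zeros is much higher.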

Using this kind of feature representation, some users occasionally encountered memory issues when training a model on a large dataset. With the Rasa 1.6.0 release, this is now a thing of the past.

The example vector above contains a lot of zeros because the vocabulary is typically very large, while an individual user message contains just a few of its words. As every number in the vector takes up memory during training, the memory needed grows quickly with the size of the vocabulary and the number of training examples. Sparse features get rid of the zeros in the feature vectors and store only the positions and values of the non-zero entries. This way, a lot of memory can be saved and thus we can train on larger datasets.
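To make the idea concrete, here is a small plain-Python sketch that converts the example vector into a sparse form and back. The `to_sparse` and `to_dense` helpers are hypothetical and written only for this post. They show the principle of storing positions and values, not the data structure Rasa uses internally. Libraries such as `scipy.sparse` implement the same idea far more efficiently.

```python
def to_sparse(dense):
    """Keep only the positions (0-based) and values of non-zero entries."""
    positions = [i for i, value in enumerate(dense) if value != 0]
    values = [dense[i] for i in positions]
    # The original length is needed to rebuild the dense vector later.
    return positions, values, len(dense)


def to_dense(positions, values, size):
    """Rebuild the full vector by filling all other positions with zeros."""
    dense = [0] * size
    for position, value in zip(positions, values):
        dense[position] = value
    return dense


dense = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
positions, values, size = to_sparse(dense)

print(positions)  # [0, 5, 6, 10, 11]
print(values)     # [1, 1, 1, 1, 1]
print(to_dense(positions, values, size) == dense)  # True
```

For a 15-word vocabulary the saving is small (ten stored numbers plus the size, instead of 15). The benefit comes with large vocabularies: a message with five known words still needs only five positions and five values, no matter how many words the vocabulary holds.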

Sparse features in Rasa

To use sparse features, you simply need to install the latest Rasa release (>= 1.6.0) and use Rasa as usual. You do not need to set any additional parameter in your configuration. Sparse features will be used during NLU training out of the box.

We trained Rasa on a dataset (part of https://github.com/xliuhw/NLU-Evaluation-Data) and measured how much memory we saved using sparse features. For the test we used the following pipeline:

language: "en"
pipeline:
 - name: WhitespaceTokenizer
 - name: CountVectorsFeaturizer
   analyzer: "word"
 - name: CountVectorsFeaturizer
   analyzer: "char_wb"
   min_ngram: 1
   max_ngram: 5
 - name: EmbeddingIntentClassifier

On a dataset with almost 10,000 training examples, we needed up to roughly 13.5 GB of memory during training when using Rasa 1.5.0. Training on the same dataset using Rasa 1.6.0, we only used up to 1.5 GB of memory. So using sparse features we save roughly 12 GB of memory!

Conclusion

We would love to hear your feedback and any questions you might have. Share your use case and success stories with Rasa in the forum. And if you’ve tested sparse features on your dataset, we would love to hear how it went and how much memory you saved. Stay tuned for more feature releases and happy coding!