Introducing Rasa NLU Examples

Conversational AI is an experimental field. This is in part because the field is relatively new, but also because no two chatbots are the same. Best practices for a digital assistant that needs to handle purchase orders in English may not directly apply to one that needs to handle customer support in Zulu.

The research team at Rasa builds and studies tools that cover many use cases, but we do not have access to all of the data that our users have. This limits the experiments that we can run.

This made us wonder: would it help our community if, instead of asking for data to be shared, we shared more tools? This way no data needs to be shared, but we can still empower our users by allowing them to customise their machine learning configuration.

This is why we’re happy to announce a new project on GitHub: Rasa NLU Examples. The goal of this library is to host more experimental Rasa NLU components that are supported by the community. This gives us the opportunity to share some experimental ideas, and it also means that users can contribute and share their own components.

The library is still small but already comes with useful components. The printer component from a previous blog post is supported, and we also offer two new sources of word embeddings: fastText embeddings (available in 157 languages) and the lightweight byte-pair embeddings (available in 275 languages, including some multi-language embeddings).

Quick Start

Using the NLU example components is easy. You can install the package with pip, directly from GitHub.

pip install git+https://github.com/RasaHQ/rasa-nlu-examples
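If you want to confirm that the install worked before editing your pipeline, a small check like the one below can help. It is only a sketch: the helper name `is_installed` is made up for this example, and the module name `rasa_nlu_examples` is taken from the component path used in the configuration in this post.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    # find_spec returns None when the module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module name assumed from the component path
    # rasa_nlu_examples.featurizers.dense.BytePairFeaturizer.
    if is_installed("rasa_nlu_examples"):
        print("rasa_nlu_examples is available")
    else:
        print("rasa_nlu_examples was not found; re-run the pip install step")
```

Run it with the same Python interpreter (or virtual environment) that you use for Rasa, otherwise the result may not reflect your Rasa setup.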

From here you can add components to your pipeline. The configuration below adds French byte-pair embeddings.

language: fr
pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
  OOV_token: oov.txt
  token_pattern: (?u)\b\w+\b
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: rasa_nlu_examples.featurizers.dense.BytePairFeaturizer
  lang: fr
  vs: 200000
  dim: 300
- name: DIETClassifier
  epochs: 200
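Because the byte-pair embeddings cover many languages, you may want to generate this configuration for several of them. The sketch below does that with plain string templating. The function name `build_config` and the idea of templating the file are illustrative assumptions, not part of the library, and not every combination of `vs` and `dim` is necessarily available for every language, so check what exists for yours.

```python
from string import Template

# Mirrors the pipeline shown above; only the language and the
# byte-pair settings (lang, vs, dim) are filled in.
CONFIG_TEMPLATE = Template(r"""language: $lang
pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
  OOV_token: oov.txt
  token_pattern: (?u)\b\w+\b
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: rasa_nlu_examples.featurizers.dense.BytePairFeaturizer
  lang: $lang
  vs: $vs
  dim: $dim
- name: DIETClassifier
  epochs: 200
""")


def build_config(lang: str, vs: int = 200000, dim: int = 300) -> str:
    """Return the pipeline configuration text for one language (hypothetical helper)."""
    return CONFIG_TEMPLATE.substitute(lang=lang, vs=vs, dim=dim)


if __name__ == "__main__":
    # Print the French configuration from this post.
    print(build_config("fr"))
```

Writing the returned text to a file such as `config.yml` gives you one configuration per language that you can train and compare.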

You can find more details in the benchmarking guide.

Goals

The goal of the library is to be a `contrib`-like library. We’ll be able to allow for more experimental features because the example components won’t need to go through the same vetting process as our Rasa Open Source library. There will still be a small review process to make sure that the tools that get added are useful to the Rasa community, and we’ll also make sure that the tools come with unit tests.

Another goal of the library is to offer examples of implemented components so that it is easier for you to write your own. We hope this library will inspire folks to contribute their own ideas to the growing Rasa ecosystem, and we’d love to hear what components you come up with.

You can find the documentation here. Happy hacking!