Justina Petraityte is a data scientist at the video game development company Radiant Worlds. Combining her passion for AI and conversational bots, Justina recently used Rasa NLU to build SHIBA, a bot that works like an AI Data Analyst. Learn more about SHIBA and join the Rasa Gitter community to talk about building bots with Rasa.
Hi Justina, tell us a little bit about yourself.
Hello! My name is Justina Petraityte and I am a Data Scientist at the game developer Radiant Worlds. As a Data Scientist, I perform in-depth analysis of player behavioural data, look for insights that are valuable to our game designers and stakeholders, and work on various Machine Learning projects. I am very passionate about Machine Learning and AI and their applications to real-world problems. These days I am very excited to see the soaring potential of intelligent conversational bots, and I believe that in the near future they are going to revolutionise the way we work, communicate and handle our daily tasks.
What did you build? What does your bot do?
The bot is called SHIBA, which stands for Slack Hosted Interface for Business Analytics. In simple terms, SHIBA works like an AI Data Analyst capable of fetching, aggregating and visualising data and finding useful insights based on simple natural language conversations on Slack. The main reason I wanted to create SHIBA was to make data more accessible to the decision makers in our company. With this intelligent assistant, anyone in our company is just a few simple questions away from the data insights needed to prepare for meetings or make decisions.
How did Rasa NLU come in handy when developing this bot?
Broadly speaking, SHIBA is responsible for data reporting, which means that the bot has to understand what data it is being asked for. Rasa NLU was extremely helpful for intent classification: teaching SHIBA to understand what the question is about (for example, if a user asks for the number of quests completed, SHIBA should know that this question relates to quests data). Rasa NLU was also used for entity extraction: picking out useful details such as dates, country names and other important bits which are used to filter, aggregate or add dimensions to the data.
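To make the two steps concrete, here is a minimal sketch, not SHIBA's actual code. It assumes a parse result in the general shape Rasa NLU returns (one predicted intent with a confidence, plus a list of entities with character offsets). The intent name `quests_completed`, the entity names, the confidence threshold and the `build_request` helper are all hypothetical and only illustrate how an intent could select the topic while entities become filters.

```python
# Hypothetical parse result, written by hand in the general shape Rasa NLU
# returns: a predicted intent plus the entities found in the message.
parsed = {
    "text": "how many quests were completed in Germany last week",
    "intent": {"name": "quests_completed", "confidence": 0.92},
    "entities": [
        {"entity": "country", "value": "Germany", "start": 34, "end": 41},
        {"entity": "date", "value": "last week", "start": 42, "end": 51},
    ],
}


def build_request(parsed, min_confidence=0.5):
    """Turn a parse result into a simple data request (illustrative only).

    The intent decides which data the question is about; the entities
    become filters. Returns None when the intent is too uncertain.
    """
    intent = parsed["intent"]
    if intent["confidence"] < min_confidence:
        return None
    filters = {e["entity"]: e["value"] for e in parsed["entities"]}
    return {"topic": intent["name"], "filters": filters}


request = build_request(parsed)
print(request)
```

A real bot would still need to map the topic to a query and normalise values such as "last week" into concrete dates; that part is left out here.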
What training data did you use?
Since SHIBA has to deal with domain-specific data, at first I had to generate most of the training data myself, using examples of data requests that our analytics team usually gets from other teams and stakeholders. Now I am enriching the training dataset with the log of conversations that users have with SHIBA. I learned that training data has to be rich and diverse to create the most intelligent and flexible bots.
What were the most interesting and challenging parts of your project?
Every stage of this project had some exciting and interesting aspects — starting with training the Rasa NLU interpreter and seeing how quickly it picks up new information or investigating why it made some specific mistakes and ending with teaching the bot to perform new tasks. There is also something very exciting in creating an AI which can basically do parts of my own job.
The biggest challenge when developing the bot was related to domain specifics. In our business language we quite often use acronyms instead of full names (this especially applies to the names of metrics and KPIs). This means that many data requests have a very similar structure and vocabulary, and the only difference between them is a three- or four-letter KPI or metric name. We wanted to make this bot as user friendly as possible without putting any constraints on how users should formulate data requests and questions, so the hardest part was teaching SHIBA to answer correctly those requests which are very similar in structure and vocabulary but refer to different intents.
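A small illustration of why such requests are hard to tell apart: the sketch below measures word overlap (Jaccard similarity) between two requests that differ only in the acronym. The acronyms DAU and MAU are common industry examples used here as assumptions, not metrics named in the interview, and `token_overlap` is a hypothetical helper.

```python
def token_overlap(a, b):
    """Jaccard similarity between the lowercase word sets of two requests."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty requests share nothing to compare
    return len(tokens_a & tokens_b) / len(union)


# Two requests that should map to different intents but share 5 of 7 words.
similarity = token_overlap(
    "show me DAU for last week",
    "show me MAU for last week",
)
print(round(similarity, 2))  # 0.71
```

Only one token separates the two requests, so a classifier has to learn that this single token carries almost all of the intent.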
How did you go about labelling training data?
At the beginning, most of the training data labelling was done manually: thinking of different ways to formulate requests for specific KPIs or metrics and setting their names as intents. Later on, we implemented a question-reply-feedback system where users could give feedback on whether or not SHIBA responded correctly. Correctly answered questions, along with their extracted intents and entities, were added directly to the training dataset, while incorrectly answered questions were first reviewed and corrected and then added to the training dataset as well.
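The feedback loop described above could be sketched roughly as follows. This is an assumption-laden illustration, not SHIBA's implementation: the function names, the in-memory lists and the sample intents `dau` and `mau` are hypothetical. The training example layout (text, intent, and entities with `start`, `end`, `value`, `entity`) follows the JSON training format used by Rasa NLU.

```python
def to_training_example(parsed):
    """Convert a parse result into a Rasa NLU style training example."""
    return {
        "text": parsed["text"],
        "intent": parsed["intent"]["name"],
        "entities": [
            {key: e[key] for key in ("start", "end", "value", "entity")}
            for e in parsed["entities"]
        ],
    }


def record_feedback(parsed, was_correct, training_examples, review_queue):
    """Correct answers go straight to training data, others wait for review."""
    if was_correct:
        training_examples.append(to_training_example(parsed))
    else:
        review_queue.append(parsed)


def approve_after_review(parsed, corrected_intent, corrected_entities,
                         training_examples):
    """Add a manually corrected example to the training data."""
    training_examples.append({
        "text": parsed["text"],
        "intent": corrected_intent,
        "entities": corrected_entities,
    })


# Hypothetical usage with two logged questions.
date_entity = {"start": 16, "end": 25, "value": "last week", "entity": "date"}
good = {"text": "show me DAU for last week",
        "intent": {"name": "dau", "confidence": 0.9},
        "entities": [date_entity]}
bad = {"text": "show me MAU for last week",
       "intent": {"name": "dau", "confidence": 0.6},  # wrong intent predicted
       "entities": [date_entity]}

training_examples, review_queue = [], []
record_feedback(good, True, training_examples, review_queue)
record_feedback(bad, False, training_examples, review_queue)

# A reviewer fixes the intent of the misclassified question.
approve_after_review(review_queue.pop(0), "mau", [date_entity],
                     training_examples)
```

Keeping the misclassified questions in a separate queue means that only reviewed labels ever reach the training data.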
What do you think about the future of chatbots / NLU technology in general?
I think that chatbots are one of the trendiest things across industries these days. And it's not surprising, because their potential is enormous. Chatbots are bridging the gap between advanced technologies and people who want to complete time-consuming everyday tasks in the most efficient and comfortable way. In my opinion, in the future every business and every household will be using natural language AI assistants, which will completely change the way we work and live.
I think that NLU technology has made huge progress up to this point, and I am sure that it is going to get far more attention in the near future. I think that in the next few years we will see a breakthrough in NLU systems, which will not only be able to understand natural language inputs but will also be able to provide accurate and reasonable answers in natural human language.