DIET

Why Rasa uses Sparse Layers in Transformers

By Johannes Mosig and Vladimir Vlasov. Feed-forward neural network layers are typically fully connected, or dense. But do we actually need to connect every input…
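To make the dense-versus-sparse distinction concrete, here is a minimal sketch in pure Python. It assumes that a sparse layer can be modelled as an ordinary dense layer whose weight matrix is multiplied element-wise by a fixed binary mask, so that y = (M ⊙ W)x + b. This masking scheme is an illustrative assumption and is not necessarily how Rasa implements sparse layers in DIET. The names `make_mask`, `linear` and `density` are hypothetical.

```python
import random

def make_mask(n_out, n_in, density, seed=0):
    """Hypothetical helper: fixed random 0/1 mask.

    Each input-output connection is kept with probability `density`
    (an assumed parameter). The mask is drawn once and never changes.
    """
    rng = random.Random(seed)
    return [[1.0 if rng.random() < density else 0.0 for _ in range(n_in)]
            for _ in range(n_out)]

def linear(x, weights, bias, mask=None):
    """Compute y = (M * W) x + b.

    With mask=None every input is connected to every output (a dense layer).
    With a mask, zeroed entries remove those connections (a sparse layer).
    """
    out = []
    for i, (row, b) in enumerate(zip(weights, bias)):
        m = mask[i] if mask is not None else [1.0] * len(row)
        out.append(sum(mi * wi * xi for mi, wi, xi in zip(m, row, x)) + b)
    return out

# Usage: the same weights, first fully connected, then with half the links removed.
W = [[1.0, 2.0], [3.0, 4.0]]
b = [0.5, -0.5]
x = [1.0, 1.0]
print(linear(x, W, b))                                   # dense
print(linear(x, W, b, mask=[[1.0, 0.0], [0.0, 1.0]]))    # sparse
```

The dense layer uses all four weights. The masked layer uses only two, which reduces the number of connections that contribute to each output.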

Johannes E. M. Mosig