Sam Sucik

Machine Learning Intern at Rasa

2 posts

Pruning BERT to accelerate inference

After previously discussing various ways of accelerating models like BERT, in this blog post we empirically evaluate the pruning approach. You can: read about the implementation…

Compressing BERT for faster prediction

Let's look at compression methods for neural networks, such as quantization and pruning. Then, we apply one to BERT using TensorFlow Lite.…
