Sam Sucik

2 posts

Pruning BERT to accelerate inference

We previously discussed various ways of accelerating models like BERT; in this post, we empirically evaluate the pruning approach. You can: read about the implementation…

Compressing BERT for faster prediction

Let's look at compression methods for neural networks, such as quantization and pruning, and then apply one of them to BERT using TensorFlow Lite.…