Pruning BERT to accelerate inference
Having discussed various ways of accelerating models like BERT, in this blog post we empirically evaluate the pruning approach. You can: read about the implementation…
Compressing BERT for faster prediction
Let's look at compression methods for neural networks, such as quantization and pruning. Then we apply one of them to BERT using TensorFlow Lite…