TinyDenoiser: RNN-based Speech Enhancement on a Multi-Core MCU with Mixed FP16-INT8 Post-Training Quantization
Manuele Rusci
MSCA Post-Doc Fellow
Katholieke Universiteit Leuven
This talk presents an optimized methodology to design and deploy Speech Enhancement (SE) algorithms based on Recurrent Neural Networks (RNNs) on a state-of-the-art MicroController Unit (MCU) with 1+8 general-purpose RISC-V cores and support for vectorized 8-bit integer (INT8) and 16-bit floating-point (FP16) arithmetic. To achieve low-latency execution, we propose a software pipeline that interleaves the parallel computation of LSTM or GRU recurrent units with manually managed memory transfers of the model parameters. To ensure minimal accuracy degradation with respect to the full-precision models, we also propose a novel FP16-INT8 Mixed-Precision Post-Training Quantization (PTQ) scheme that compresses the recurrent layers to INT8 while keeping the remaining layers at FP16 precision. Experiments are conducted on multiple LSTM- and GRU-based SE models belonging to the TinyDenoiser family, featuring up to 1.24M parameters. Thanks to the proposed approach, we speed up the computation by up to 4× with respect to the lossless FP16 baselines, while incurring only a low degradation of the PESQ score. Our design is 10× more energy efficient than state-of-the-art SE solutions deployed on single-core MCUs, which rely on smaller models and quantization-aware training.
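
To give a concrete flavor of the pipelining idea, the C sketch below shows a double-buffered scheme: while the cores compute layer i with weights already resident in fast on-chip (L1) memory, a DMA transfer prefetches the weights of layer i+1 into the other half of a ping-pong buffer. This is a minimal illustration under assumptions of our own; dma_start, dma_wait, run_layer_parallel, and the buffer sizes are hypothetical placeholders, not the actual deployment code or any real SDK API.

/* Double-buffered weight prefetch: overlap DMA transfers with compute. */
#include <stddef.h>
#include <stdint.h>

#define N_LAYERS  4
#define BUF_BYTES (64 * 1024)            /* size of one ping-pong half (assumed) */

extern void dma_start(void *dst, const void *src, size_t bytes); /* hypothetical DMA API */
extern void dma_wait(void);                                      /* hypothetical DMA API */
extern void run_layer_parallel(int layer, const int8_t *weights); /* hypothetical kernel */

static int8_t l1_buf[2][BUF_BYTES];      /* ping-pong buffer in fast L1 memory */

void run_network(const int8_t *l2_weights[N_LAYERS],
                 const size_t  weight_bytes[N_LAYERS])
{
    int cur = 0;
    /* Preload the first layer's weights before entering the pipeline. */
    dma_start(l1_buf[cur], l2_weights[0], weight_bytes[0]);
    dma_wait();

    for (int i = 0; i < N_LAYERS; i++) {
        /* Kick off the transfer of the next layer's weights ... */
        if (i + 1 < N_LAYERS)
            dma_start(l1_buf[cur ^ 1], l2_weights[i + 1], weight_bytes[i + 1]);

        /* ... while the cores compute the current layer in parallel. */
        run_layer_parallel(i, l1_buf[cur]);

        if (i + 1 < N_LAYERS)
            dma_wait();                  /* ensure the prefetch has landed */
        cur ^= 1;                        /* swap ping-pong halves */
    }
}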
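
The core of the FP16-INT8 PTQ step can likewise be pictured as mapping each recurrent weight tensor to INT8 with a per-tensor scale, while the non-recurrent layers simply keep FP16 weights. The sketch below uses a symmetric max-abs scale, a common PTQ choice assumed here purely for illustration; the talk's actual calibration scheme may differ.

/* Symmetric per-tensor INT8 post-training quantization (illustrative only). */
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Quantize n weights to INT8 and return the scale, so that w ~= scale * q. */
float quantize_int8(const float *w, int8_t *q, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = fabsf(w[i]);
        if (a > max_abs) max_abs = a;
    }
    float scale = max_abs / 127.0f;      /* symmetric range [-127, 127] */
    if (scale == 0.0f) scale = 1.0f;     /* guard against all-zero tensors */

    for (size_t i = 0; i < n; i++) {
        float v = roundf(w[i] / scale);
        if (v >  127.0f) v =  127.0f;    /* clamp to INT8 range */
        if (v < -128.0f) v = -128.0f;
        q[i] = (int8_t)v;
    }
    return scale;
}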