A Hybrid ASR System for Southern Dutch

Bob Van Dyck; Bagher BabaAli; Dirk Van Compernolle

Authors

Bob Van Dyck KU Leuven
Bagher BabaAli University of Tehran
Dirk Van Compernolle KU Leuven

Abstract

Classical hybrid models for automatic speech recognition were recently outperformed by end-to-end models on popular benchmarks such as LibriSpeech. However, in many real-life situations, hybrid systems can still prevail because their acoustic and language models can be trained, optimized and tuned independently. In this work, we implemented a state-of-the-art hybrid system for Southern Dutch. For the acoustic model, we trained an HMM-DNN on 155 hours of the Corpus Gesproken Nederlands (CGN) with a rather standard Kaldi recipe. As a reference, we reused language models developed during our N-Best 2008 evaluation. We further investigated the effect of language model order and size on word error rate (WER) for a variety of test sets (held-out data from CGN, and the N-Best dev and test sets). The best result, 10.12% WER on the N-Best test set, is obtained with a 400k-word lexicon and a 4-gram language model (with 231M parameters). This new hybrid system outperforms our older HMM-GMM based N-Best system by over 40%. Pruning away 90% of the LM parameters yields a compact model suitable for small-scale real-time applications while taking only a 10% relative hit on performance.
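For readers less familiar with the metric, the following is a minimal sketch of how WER and a relative WER change are commonly computed. It is not the scoring tool used in this work; the function names and the Dutch example sentences are hypothetical, and the sketch assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_change(baseline: float, new: float) -> float:
    """Relative change of `new` with respect to `baseline` (0.10 means 10% worse)."""
    return (new - baseline) / baseline


# Hypothetical example: one substitution and one deletion over six reference words.
wer = word_error_rate("de kat zit op de mat", "de kat zat op mat")
print(f"WER: {wer:.2%}")
```

As an illustration only: if the 10% relative hit were applied to the 10.12% figure (the abstract does not state which test set the pruning result refers to), the pruned model would land at roughly 10.12 × 1.10 ≈ 11.1% WER.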
