Title
CNN-based multichannel end-to-end speech recognition for everyday home environments
Date Issued
01 September 2019
Access level
open access
Resource Type
conference paper
Author(s)
Watanabe S.
Hori T.
Nakadai K.
Ogata T.
Waseda University
Publisher(s)
European Signal Processing Conference, EUSIPCO
Abstract
Casual conversations involving multiple speakers, together with noise from surrounding devices, are common in everyday environments and degrade the performance of automatic speech recognition (ASR) systems. These challenging environmental characteristics are the target of the CHiME-5 challenge. This study attempts to overcome the difficulties present in everyday environments by employing a convolutional neural network (CNN)-based multichannel end-to-end speech recognition system. The system consists of an attention-based encoder-decoder neural network that directly generates text from a sound input. The multichannel CNN encoder, which uses residual connections and batch renormalization, is trained with augmented data, including white noise injection. Experimental results on the CHiME-5 corpus show that the word error rate (WER) is reduced by 8.5% and 0.6% absolute compared with a single-channel end-to-end system and the best baseline (LF-MMI TDNN), respectively.
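The abstract mentions training with augmented data that includes white noise injection. The following is a minimal Python sketch of that kind of augmentation for multichannel audio. The function name `inject_white_noise`, the use of independent Gaussian noise per channel, and the parameterization by a target signal-to-noise ratio (SNR) in dB are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def inject_white_noise(channels, snr_db, seed=None):
    """Add zero-mean Gaussian white noise to every channel at a target SNR.

    Hypothetical helper for illustration only.
    channels: list of equal-length lists of float samples (one list per microphone).
    snr_db:   assumed target signal-to-noise ratio in decibels.
    seed:     optional seed so the augmentation is reproducible.
    """
    rng = random.Random(seed)
    noisy = []
    for signal in channels:
        if not signal:
            # Nothing to corrupt in an empty channel.
            noisy.append([])
            continue
        # Mean power of this channel's clean signal.
        power = sum(s * s for s in signal) / len(signal)
        # Noise power for the requested SNR: P_noise = P_signal / 10^(SNR / 10).
        noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
        # Assumption: each microphone receives an independent noise draw.
        noisy.append([s + rng.gauss(0.0, noise_std) for s in signal])
    return noisy
```

For example, `inject_white_noise(clean_channels, snr_db=10.0, seed=0)` returns a noisy copy of each channel while leaving the input unchanged. Scaling the noise per channel keeps the SNR consistent even when microphones record at different levels. In practice the SNR would typically be sampled from a range during training rather than fixed.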
Volume
2019-September
Language
English
OECD Knowledge area
Electrical engineering, Electronic engineering
Scopus EID
2-s2.0-85075609674
Resource of which it is part
European Signal Processing Conference
ISSN of the container
2219-5491
ISBN of the container
978-908279703-9
Conference
27th European Signal Processing Conference, EUSIPCO 2019
Sponsor(s)
This work was supported by MEXT (Ministry of Education, Culture, Sports, Science and Technology) Grant-in-Aid for Scientific Research (A), No. 15H01710, except for the contribution of Mitsubishi Electric Research Laboratories (MERL).
Sources of information: Directorio de Producción Científica (Scientific Production Directory), Scopus