심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식

Compromised feature normalization method for deep neural network based speech recognition

Phonetics and Speech Sciences (말소리와 음성과학), Vol. 12, No. 3, 2020, pp. 65-71
Kim Min-Sik, Kim Hyung-Soon
Affiliations
Kim Min-Sik (김민식) - Department of Electrical Engineering, Pusan National University
Kim Hyung-Soon (김형순) - Department of Electrical Engineering, Pusan National University

Abstract


Feature normalization reduces the effect of environmental mismatch between training and test conditions by normalizing the statistical characteristics of acoustic feature parameters. It yields substantial performance improvements in traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition systems. However, in deep neural network (DNN)-based speech recognition systems, minimizing the effects of environmental mismatch does not necessarily yield the best performance. In this paper, we attribute this phenomenon to information loss caused by excessive feature normalization. We investigate whether there is a feature normalization method that maximizes speech recognition performance by suitably reducing the impact of environmental mismatch while preserving information useful for training acoustic models. To this end, we introduce mean and exponentiated variance normalization (MEVN), a compromise between mean normalization (MN) and mean and variance normalization (MVN), and compare the performance of a DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that, depending on the degree of variance normalization, MEVN yields a slight performance improvement over MN and MVN.
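
The relationship between MN, MVN, and MEVN described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the form used here is an assumption: each feature dimension is mean-subtracted and then divided by its standard deviation raised to an exponent `alpha`, so that `alpha = 0` reduces to MN and `alpha = 1` reduces to MVN. The function name `mevn`, the parameter names, the per-utterance statistics, and the small `eps` guard are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def mevn(features, alpha, eps=1e-8):
    """Hypothetical sketch of mean and exponentiated variance normalization.

    features: array of shape (num_frames, num_dims), e.g. one utterance.
    alpha:    assumed degree of variance normalization in [0, 1];
              0 gives mean normalization (MN), 1 gives mean and
              variance normalization (MVN).
    eps:      small constant added to the standard deviation to avoid
              division by zero (an assumption, not from the source).
    """
    features = np.asarray(features, dtype=np.float64)
    # Per-dimension statistics computed over the frames of this utterance.
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # Subtract the mean, then divide by the standard deviation raised to alpha.
    return (features - mean) / np.power(std + eps, alpha)
```

Under this assumed form, an intermediate value such as `alpha = 0.5` leaves each dimension with a standard deviation equal to the square root of its original value, which is one way to read "a compromise" between removing no variance information and removing all of it.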

Keywords

speech recognition; feature normalization; environmental mismatch; deep neural network
