
Voice-to-voice conversion using transformer network

Phonetics and Speech Sciences (말소리와 음성과학), 2020, Vol. 12, No. 3, pp. 55-63
Kim June-Woo, Jung Ho-Young
Affiliations
Kim June-Woo (김준우) - Department of Artificial Intelligence, Kyungpook National University
Jung Ho-Young (정호영) - Department of Artificial Intelligence, Kyungpook National University

Abstract


Voice conversion is useful in a range of speech processing applications and can play an important role in data augmentation for speech recognition. Conventional methods combine voice conversion with a speech synthesis architecture and use Mel filter bank features as the main parameter. Mel filter bank features are well suited to fast neural network computation, but they cannot be converted into a high-quality waveform without a vocoder, and they are not effective for obtaining speech recognition data. In this paper, we focus on voice-to-voice conversion using only the raw spectrum. We propose a deep learning model based on the transformer network, which quickly learns voice conversion properties through an attention mechanism between source and target spectral components. Experiments were performed on the TIDIGITS data, a corpus of number sequences spoken in English. The converted voices were evaluated for naturalness and similarity using mean opinion scores (MOS) obtained from 30 participants. The final results were 3.52±0.22 for naturalness and 3.89±0.19 for similarity.
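
As a rough illustration of an attention mechanism between source and target spectral components, the following is a minimal sketch of scaled dot-product cross-attention in NumPy. It is not the authors' model: the function name `cross_attention`, the toy frame counts, and the number of spectral bins are assumptions chosen for illustration, and the learned query, key, and value projections of a full transformer are omitted.

```python
import numpy as np


def cross_attention(target_frames, source_frames):
    """Scaled dot-product attention with target frames as queries
    and source frames as both keys and values (no learned projections)."""
    d = source_frames.shape[-1]
    # Similarity between every target frame and every source frame.
    scores = target_frames @ source_frames.T / np.sqrt(d)  # (T_tgt, T_src)
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over source frames
    # Each output frame is a weighted mix of source spectral frames.
    context = weights @ source_frames  # (T_tgt, d)
    return context, weights


# Toy data: random values standing in for spectral frames (hypothetical sizes).
rng = np.random.default_rng(0)
source = rng.standard_normal((6, 8))  # 6 source frames, 8 spectral bins
target = rng.standard_normal((4, 8))  # 4 target frames, 8 spectral bins
context, weights = cross_attention(target, source)
```

Each row of `weights` sums to 1 and indicates how strongly one target frame attends to each source frame; in a trained model these weights would be shaped by learned projections rather than by raw spectral similarity.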

Keywords

voice conversion; transformer network; signal-to-signal conversion
