AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

Patton, Brian; Agiomyrgiannakis, Yannis; Terry, Michael; Wilson, Kevin; Saurous, Rif A.; Sculley, D.

Computer Science > Computation and Language

arXiv:1611.09207 (cs)

[Submitted on 28 Nov 2016]

Abstract: Developers of text-to-speech (TTS) synthesizers often make use of human raters to assess the quality of synthesized speech. We demonstrate that we can model human raters' mean opinion scores (MOS) of synthesized speech using a deep recurrent neural network whose inputs consist solely of a raw waveform. Our best models provide utterance-level estimates of MOS only moderately inferior to sampled human ratings, as shown by Pearson and Spearman correlations. When multiple utterances are scored and averaged, a scenario common in synthesizer quality assessment, AutoMOS achieves correlations approaching those of human raters. The AutoMOS model has a number of applications, such as the ability to explore the parameter space of a speech synthesizer without requiring a human in the loop.
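The abstract measures agreement with human MOS through Pearson and Spearman correlations, both per utterance and after averaging scores over multiple utterances. The sketch below shows one way such an evaluation could be computed in plain Python. It is illustrative only: the toy scores and system labels are hypothetical, not data from the paper, and grouping utterances by synthesizer configuration before averaging is an assumption about how the averaging is done.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def system_means(system_ids, scores):
    """Average per-utterance scores within each (hypothetical) system label."""
    groups = defaultdict(list)
    for s, v in zip(system_ids, scores):
        groups[s].append(v)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]


# Hypothetical toy data: three synthesizer configurations, three utterances each.
systems = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
human = [3.0, 4.0, 3.5, 2.0, 2.5, 3.0, 4.5, 4.0, 5.0]
predicted = [3.4, 3.2, 3.6, 2.8, 2.2, 2.6, 4.1, 4.6, 4.3]

utt_pearson = pearson(human, predicted)
utt_spearman = spearman(human, predicted)

_, human_sys = system_means(systems, human)
_, pred_sys = system_means(systems, predicted)
sys_pearson = pearson(human_sys, pred_sys)
sys_spearman = spearman(human_sys, pred_sys)

print(f"utterance-level: Pearson={utt_pearson:.3f} Spearman={utt_spearman:.3f}")
print(f"system-level:    Pearson={sys_pearson:.3f} Spearman={sys_spearman:.3f}")
```

On this toy data, averaging over utterances removes per-utterance noise, so the system-level correlations come out higher than the utterance-level ones. That pattern is consistent with the abstract's observation, but nothing here reproduces the paper's actual numbers.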

Comments:	4 pages, 2 figures, 2 tables, NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1611.09207 [cs.CL]
	(or arXiv:1611.09207v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1611.09207

Submission history

From: Brian Patton
[v1] Mon, 28 Nov 2016 15:51:25 UTC (370 KB)