Intro
One of the most popular measures of synthesized speech is naturalness: how natural the speech sounds to a human listener. A widely used method for evaluating the naturalness of speech/audio is the mean opinion score (MOS). Listeners rate each sample on a scale that typically ranges from 1 (lowest naturalness, like an old robot) to 5 (highest naturalness, like a human) in steps of 1, which is called a five-point Likert scale, and the ratings are averaged. With podonos, you can evaluate the naturalness of your speech/audio in a fully managed way.
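To make the definition concrete, here is a minimal sketch of how a MOS could be computed from raw listener ratings. It is an illustration only, not how podonos computes its results: the function name `mean_opinion_score`, the sample ratings, and the normal-approximation 95% confidence interval are all assumptions made for this example.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for ratings on a 1-5 scale.

    Hypothetical helper for illustration; not part of podonos.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")

    # The MOS is the arithmetic mean of all listener ratings.
    mos = statistics.mean(ratings)

    # A single rating has no sample variance, so no interval can be given.
    if len(ratings) < 2:
        return mos, float("nan")

    # Normal approximation: 1.96 standard errors on each side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight listeners for one synthesized clip.
ratings = [4, 5, 3, 4, 4, 5, 2, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

The interval narrows as more listeners rate the same clip, which is one reason a MOS is usually reported together with the number of ratings behind it.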
Example
Our first example uses AWS Polly to generate a synthesized human voice and podonos to evaluate it. Of course, you can use your own TTS (text-to-speech) model, or even your own voice. The following steps walk through an example that you can run right away.
1. Create a Client
First, create a new instance of Client.
2. Create an Evaluator
Then, create a new instance of Evaluator.
3. Add files
Now, add every synthesized speech file to the evaluator.
4. Close
Finally, close the Evaluator object.