Intro

A preference test is an effective way to evaluate subjective preference between two speech or audio samples. It determines which of the two samples evaluators favor.

The Preference Test captures listener preferences on a five-point scale from -2 to 2, in steps of 1:

  • 2: A is strongly preferred
  • 1: A is preferred
  • 0: About the same
  • -1: B is preferred
  • -2: B is strongly preferred

Through the Preference Test, you can gather insights into which audio sample is more appealing to your audience, allowing for more informed decisions in audio production and refinement.
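
To make the scale concrete, here is a minimal sketch of how a set of ratings on it could be summarized. The helper name `summarize_preferences`, the result keys, and the sample ratings are assumptions made for illustration. They are not part of the podonos package and do not reflect how your Workspace computes its results.

```python
from collections import Counter
from statistics import mean

# Labels copied from the scale above. Everything else in this sketch is
# hypothetical and is not part of the podonos package.
SCALE = {
    2: "A is strongly preferred",
    1: "A is preferred",
    0: "About the same",
    -1: "B is preferred",
    -2: "B is strongly preferred",
}


def summarize_preferences(scores):
    """Summarize a list of ratings on the -2..2 scale."""
    if not scores:
        raise ValueError("no ratings to summarize")
    invalid = [s for s in scores if s not in SCALE]
    if invalid:
        raise ValueError(f"ratings outside the scale: {invalid}")

    counts = Counter(scores)
    avg = mean(scores)
    # A positive mean leans toward A, a negative mean toward B.
    if avg > 0:
        verdict = "A"
    elif avg < 0:
        verdict = "B"
    else:
        verdict = "tie"

    return {
        "mean": avg,
        "prefers_a": sum(1 for s in scores if s > 0),
        "prefers_b": sum(1 for s in scores if s < 0),
        "no_preference": counts[0],
        "verdict": verdict,
    }


# Made-up ratings from ten evaluators, for illustration only.
ratings = [2, 1, 1, 0, -1, 1, 2, 0, 1, -2]
summary = summarize_preferences(ratings)
print(summary)
```

With these made-up ratings, six evaluators lean toward A, two toward B, and two see no difference, so the mean of 0.5 points to a mild preference for A. A mean close to 0 with few evaluators should be read with caution, since a single rating can flip the sign.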

Example

In this example, let’s assume you are developing a new speech synthesis model named “myTTS”. You will compare the version of the model developed yesterday with the one developed today. Here is a code example that you can execute immediately:

import podonos
from podonos import File
# myTTS stands in for your own synthesis package; adjust this import to match it.
from myTTS import myTTS

text = "Hello, how is your day going?"
language = 'en-gb'

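# Output of yesterday's model, assumed to already exist on disk
audio1_path = 'model1_speech.wav'
# Output of today's model, synthesized below
audio2_path = 'model2_speech.mp3'

# Synthesize the text with today's model and save the result
speech = myTTS(text=text, lang=language, slow=False)
speech.save(audio2_path)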

client = podonos.init()
etor = client.create_evaluator(
    name="Speech AI Preferences Test",
    desc="Preference test between speech synthesis models",
    type="PREF",
    lan=language,
    num_eval=10
)

etor.add_files(
    file0=File(path=audio1_path, model_tag='myTTS v1', tags=["model1", "epoch5"]),
    file1=File(path=audio2_path, model_tag='myTTS v2', tags=["model2", "epoch5"])
)
etor.close()

Once the evaluation steps finish, you can check the preferences in your Workspace.