Intro

The Comparative Similarity (CSMOS) evaluation is designed to assess which of two audio samples is more similar to a reference audio. This evaluation is particularly useful in scenarios where the goal is to match or mimic a reference audio, such as in voice cloning or audio restoration tasks.

  • Objective: Determine which of two audio samples is more similar to a reference audio.
  • Use Case: Ideal for applications requiring audio matching or quality assessment against a standard.
  • Type: Specified as "CSMOS" in the SDK.

Example

1. Initialize the Client

Begin by initializing the Podonos client with your API key.

import podonos

client = podonos.init("<API_KEY>")
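
A key pasted directly into source code is easy to leak through version control. The following is a minimal sketch that reads the key from an environment variable instead. The variable name PODONOS_API_KEY and the helper load_api_key are assumptions made for this illustration and are not part of the SDK.

```python
import os


def load_api_key(env_var: str = "PODONOS_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is an assumption for this sketch; use whatever
    name your own environment defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage (requires the podonos package and a valid key):
#   import podonos
#   client = podonos.init(load_api_key())
```

Failing early with a clear message keeps a missing key from surfacing later as a less obvious authentication error.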

2. Create the Evaluator

Set up the evaluator for a CSMOS evaluation.

evaluator = client.create_evaluator(
    name="Comparative Similarity Test",
    desc="Evaluate similarity of audio samples to a reference",
    type="CSMOS"
)

Note: A CSMOS evaluation can only be created with the create_evaluator method.

3. Add Files for Evaluation

Add two audio samples and one reference audio. The reference file must be specified with is_ref=True.

from podonos import File

evaluator.add_files(
    file0=File(path="audio_sample1.wav", model_tag="Sample 1", tags=["test"], is_ref=False),
    file1=File(path="audio_sample2.wav", model_tag="Sample 2", tags=["test"], is_ref=False),
    file2=File(path="reference_audio.wav", model_tag="Reference", tags=["reference"], is_ref=True)
)

  • File Order: Pass the reference file as the third argument (file2) of the add_files method.
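
The ordering and is_ref rules above can be checked locally before anything is uploaded. The sketch below is one possible pre-flight check. FileSpec is a hypothetical stand-in that mirrors only the path and is_ref fields shown above, and check_csmos_files is an illustrative helper, not an SDK function.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Sequence


@dataclass(frozen=True)
class FileSpec:
    """Hypothetical stand-in mirroring the path and is_ref fields used above."""
    path: str
    is_ref: bool = False


def check_csmos_files(files: Sequence[FileSpec], require_exists: bool = True) -> None:
    """Raise ValueError unless `files` is two samples followed by one reference."""
    if len(files) != 3:
        raise ValueError(f"CSMOS needs exactly 3 files, got {len(files)}")
    ref_flags = [f.is_ref for f in files]
    if ref_flags != [False, False, True]:
        raise ValueError("Expected is_ref flags (False, False, True) in that order")
    if require_exists:
        # Catch typos in paths before starting an upload.
        missing = [f.path for f in files if not Path(f.path).is_file()]
        if missing:
            raise ValueError(f"Missing audio files: {missing}")
```

Calling the check with the three entries in the same order as file0, file1, file2 catches a misplaced or unmarked reference before add_files is reached.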

4. Finalize the Evaluation

Close the evaluator to complete the setup.

evaluator.close()
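
The four steps can be gathered into one function. The sketch below uses only the calls shown in the steps above. The wrapper run_csmos and its parameter names are hypothetical; the client and the File class are passed in as arguments so the function can be tried with stand-ins before using a real key.

```python
from typing import Any, Callable


def run_csmos(client: Any, make_file: Callable[..., Any],
              sample_a: str, sample_b: str, reference: str,
              name: str = "Comparative Similarity Test") -> Any:
    """Create a CSMOS evaluator, add one triplet, and close it.

    `client` is the object returned by podonos.init and `make_file` is
    podonos.File. close() is not called if an earlier step raises.
    """
    evaluator = client.create_evaluator(
        name=name,
        desc="Evaluate similarity of audio samples to a reference",
        type="CSMOS",
    )
    evaluator.add_files(
        file0=make_file(path=sample_a, model_tag="Sample 1", tags=["test"], is_ref=False),
        file1=make_file(path=sample_b, model_tag="Sample 2", tags=["test"], is_ref=False),
        # The reference goes last and is the only file with is_ref=True.
        file2=make_file(path=reference, model_tag="Reference", tags=["reference"], is_ref=True),
    )
    return evaluator.close()


# Usage (requires the podonos package and a valid key):
#   import podonos
#   from podonos import File
#   run_csmos(podonos.init("<API_KEY>"), File,
#             "audio_sample1.wav", "audio_sample2.wav", "reference_audio.wav")
```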

Key Considerations

  • File Configuration: Mark the reference file with is_ref=True and pass it as the last file (file2) in the add_files method.
  • Evaluation Logic: CSMOS compares the two audio samples against the reference to determine which one is more similar to it.
  • Applications: Useful for tasks like voice cloning, audio restoration, and quality assurance where matching a reference is critical.

Use Case

Consider a scenario where you are developing a new speech synthesis model and want to evaluate how closely its generated audio matches a reference recording. With CSMOS, you can compare the output of two versions of the model and assess which one is more similar to the desired reference.
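
For a scenario like this, the outputs of two model versions usually need to be matched to their reference recordings first. The sketch below assumes a hypothetical layout in which both versions and the references live in three folders and share file names. The helper build_triplets and that layout are illustrative assumptions, not something the SDK prescribes.

```python
from pathlib import Path
from typing import List, Tuple


def build_triplets(dir_a: str, dir_b: str, ref_dir: str,
                   pattern: str = "*.wav") -> List[Tuple[str, str, str]]:
    """Return (sample_a, sample_b, reference) paths for names present in all three folders."""
    triplets = []
    for ref in sorted(Path(ref_dir).glob(pattern)):
        a = Path(dir_a) / ref.name
        b = Path(dir_b) / ref.name
        # Skip utterances that one of the two model versions did not produce.
        if a.is_file() and b.is_file():
            triplets.append((str(a), str(b), str(ref)))
    return triplets
```

Each returned tuple is already in sample, sample, reference order, so its elements map onto file0, file1, and file2, assuming the evaluator accepts one add_files call per triplet.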