
Podonos

This is the base module. Import it with:
import podonos
from podonos import *

init()

Initialize the module and return an instance of Client.
api_key
string
API key obtained from your Workspace. For details, see Get API key. If this is not set, the package reads the PODONOS_API_KEY environment variable. An error is raised if neither is available.
api_url
string
default:"https://prod.podonosapi.com"
API base URL. You usually do not need to set this unless you are using a private, staging, or development API endpoint.
Returns an instance of Client.
client = podonos.init(api_key="<API_KEY>")

# Optional: pass the production API endpoint explicitly
client = podonos.init(api_key="<API_KEY>", api_url="https://prod.podonosapi.com")
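The api_key fallback described above works like a small resolution function. This is a hypothetical sketch, not the SDK's source. resolve_api_key is an illustrative name, and raising ValueError is an assumption, since the package may use a different exception type.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented lookup order for init()."""
    # 1. An explicitly passed key wins.
    if api_key:
        return api_key
    # 2. Otherwise fall back to the PODONOS_API_KEY environment variable.
    env_key = os.environ.get("PODONOS_API_KEY")
    if env_key:
        return env_key
    # 3. Neither is available: fail, as init() does.
    #    (ValueError is an assumption; the SDK's exception type may differ.)
    raise ValueError("No API key: pass api_key or set PODONOS_API_KEY")
```

In practice, this means you can export PODONOS_API_KEY in your shell and call podonos.init() without arguments.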

Client

Client manages one or more Evaluator instances and evaluation history.

create_evaluator()

Create a new instance of Evaluator. One evaluator supports a single type of evaluation throughout its life cycle. To run multiple types of evaluation, create one evaluator per type by calling create_evaluator() multiple times.
name
string
Name of this evaluation session. If empty, a random name is automatically generated and used.
desc
string
Description of this evaluation session. This field is for your records, so later you can see how you generated the output files or trained your model.
type
string
default:"NMOS"
Evaluation type. One of the following:
Type | Description
NMOS | Naturalness Mean Opinion Score
QMOS | Quality Mean Opinion Score
CMOS | Reference-based comparison (compare target audio against reference)
CSMOS | Comparative similarity test between two target audios and one reference
SMOS | Similarity Mean Opinion Score
P808 | Speech quality test following ITU-T P.808
PREF | Preference test between two audio samples
CUSTOM_SINGLE | Custom single-stimulus evaluation
CUSTOM_DOUBLE | Custom double-stimulus evaluation
RANKING | Ranking evaluation across multiple stimuli
lan
string
default:"en-us"
Specific language and locale of the speech. Currently we support:
Code | Description
en-us | English (United States)
en-gb | English (United Kingdom)
en-au | English (Australia)
en-ca | English (Canada)
en-in | English (India)
ko-kr | Korean (Korea)
zh-cn | Mandarin (China)
es-es | Spanish (Spain)
es-mx | Spanish (Mexico)
fr-fr | French (France)
fr-ca | French (Canada)
de-de | German (Germany)
ja-jp | Japanese (Japan)
it-it | Italian (Italy)
pl-pl | Polish (Poland)
pt-pt | Portuguese (Portugal)
pt-br | Portuguese (Brazil)
id-id | Indonesian (Indonesia)
hi-in | Hindi (India)
ta-in | Tamil (India)
kn-in | Kannada (India)
ml-in | Malayalam (India)
si-lk | Sinhala (Sri Lanka)
ar-eg | Arabic (Egypt)
ar-ae | Arabic (UAE)
ar-sa | Arabic (Saudi Arabia)
audio | General audio file
More languages will be added; please check back later.
granularity
float
default:"1.0"
Granularity of evaluation scales. Supported values are 1.0 and 0.5.
num_eval
int
default:"10"
Number of evaluations per sample. For example, if num_eval is 10 in an NMOS evaluation, each audio file is assigned to 10 human evaluators, and statistics computed over their ratings are presented in the final report.
due_hours
int
default:"12"
Expected time until the final report is delivered, in hours. Must be at least 12. Pricing may vary depending on this value.
use_annotation
bool
default:"False"
Enable annotation to collect free-form text feedback from evaluators.
use_loudness_normalization
bool
default:"True"
Enable loudness normalization to ensure consistent audio volume levels during evaluation.
auto_start
bool
default:"False"
If True, the evaluation starts automatically when you finish uploading the files. If False, go to the Workspace, confirm the evaluation session, and start the evaluation manually.
max_upload_workers
int
default:"20"
Maximum number of upload worker threads. If uploads are slow, try increasing this value.
verify_batch_size
int
default:"100"
Batch size for file verification API calls. Must be between 1 and 1000.
api_timeout
tuple[float, float]
default:"(5, 30)"
Timeout tuple for API requests, in seconds.
verify_timeout
tuple[float, float]
default:"(5, 120)"
Timeout tuple for file verification requests, in seconds.
upload_timeout
tuple[float, float]
default:"(10, 300)"
Timeout tuple for direct file upload requests, in seconds.
resume_upload
bool
default:"False"
Enable SDK-local upload ledger/resume support. The ledger records upload progress and the original evaluation contract so interrupted uploads can be resumed safely.
upload_state_path
string
default:"None"
Optional SQLite upload ledger path when resume_upload=True. Use the same path with resume_evaluator() if the upload is interrupted.
Returns an instance of Evaluator.
etor = client.create_evaluator()
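Several create_evaluator() parameters have documented limits. If you want to catch bad values before a long upload starts, a minimal pre-flight check might look like the following. check_evaluator_options is a hypothetical helper. This page does not say whether the SDK checks these limits on the client or on the server.

```python
def check_evaluator_options(
    granularity: float = 1.0,
    due_hours: int = 12,
    max_upload_workers: int = 20,
    verify_batch_size: int = 100,
) -> None:
    """Hypothetical pre-flight check of the documented create_evaluator() limits."""
    # Only two scale granularities are supported.
    if granularity not in (1.0, 0.5):
        raise ValueError("granularity must be 1.0 or 0.5")
    # The final report cannot be due in less than 12 hours.
    if due_hours < 12:
        raise ValueError("due_hours must be at least 12")
    # At least one upload worker is needed.
    if max_upload_workers < 1:
        raise ValueError("max_upload_workers must be a positive integer")
    # Verification batches are capped at 1000 files per API call.
    if not 1 <= verify_batch_size <= 1000:
        raise ValueError("verify_batch_size must be between 1 and 1000")
```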

resume_evaluator()

Resume uploads for an existing evaluation using an SDK upload ledger. Add files again in the same order as the interrupted run so the ledger can recover each file’s remote object identity.
Resume is safety-checked against the local ledger. The ledger must contain the original evaluation contract and session configuration for evaluation_id; otherwise the SDK refuses to resume. The SDK restores the original session details from the ledger where available, so do not use resume_evaluator() to change the evaluation type, language, template, batch size, or file identities.
During resume, completed ledger rows are reused only when the current local file still matches the recorded path, content hash, size, and upload manifest. If a completed metadata or verification row no longer matches the local file, start a fresh evaluation or remove the stale ledger row before resuming.
evaluation_id
string
required
Evaluation ID to resume.
upload_state_path
string
required
Path to the SDK upload ledger created by a previous upload run. It must contain the original evaluation contract for evaluation_id.
name
string
Optional session name.
desc
string
Optional session description.
type
string
default:"NMOS"
Evaluation type. Uses the same supported values as create_evaluator().
lan
string
default:"en-us"
Language code. Uses the same supported values as create_evaluator().
granularity
float
default:"1.0"
Granularity of evaluation scales.
num_eval
int
default:"10"
Number of evaluations per sample.
due_hours
int
default:"12"
Expected due time of the final report in hours.
use_annotation
bool
default:"False"
Enable annotation to collect free-form text feedback from evaluators.
use_loudness_normalization
bool
default:"True"
Enable loudness normalization for evaluation.
auto_start
bool
default:"False"
If True, the evaluation automatically starts after files are uploaded.
max_upload_workers
int
default:"20"
Maximum number of upload worker threads.
verify_batch_size
int
default:"100"
Batch size for file verification API calls.
api_timeout
tuple[float, float]
default:"(5, 30)"
Timeout tuple for API requests, in seconds.
verify_timeout
tuple[float, float]
default:"(5, 120)"
Timeout tuple for file verification requests, in seconds.
upload_timeout
tuple[float, float]
default:"(10, 300)"
Timeout tuple for direct file upload requests, in seconds.
etor = client.resume_evaluator(
    evaluation_id="<EVALUATION_ID>",
    upload_state_path="./podonos-upload-ledger.sqlite3",
)
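The rule that completed ledger rows are reused only if the local file still matches can be sketched as follows. This page does not document the ledger's real schema or hash algorithm. LedgerRow, row_still_matches, and the use of SHA-256 are assumptions for illustration, and the upload-manifest comparison is omitted.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class LedgerRow:
    """Hypothetical stand-in for one completed ledger entry."""
    path: str
    sha256: str
    size: int


def file_sha256(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large audio files are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def row_still_matches(row: LedgerRow, path: str) -> bool:
    """True if the local file matches the recorded path, size, and content hash."""
    if os.path.abspath(path) != os.path.abspath(row.path):
        return False
    if not os.path.isfile(path) or os.path.getsize(path) != row.size:
        return False
    return file_sha256(path) == row.sha256
```

Checking the cheap properties (path and size) first avoids hashing a file that has obviously changed.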

create_evaluator_from_template()

Create a new Evaluator from an existing template. All questions and options defined in the template are automatically assigned to the new evaluation, which keeps evaluations consistent and saves time by reusing predefined content.
name
string
required
Name of this evaluation session. Required and must be non-empty.
template_id
string
required
The unique identifier of the template to base the new evaluation on.
desc
string
Description of this evaluation session. This field is for your records.
num_eval
int
default:"10"
Number of evaluations per sample.
use_annotation
bool
default:"False"
Enable annotation to collect free-form text feedback from evaluators. Cannot be used together with an annotations array in template JSON.
use_loudness_normalization
bool
default:"True"
Enable loudness normalization to ensure consistent audio volume levels during evaluation.
auto_start
bool
default:"False"
If True, the evaluation automatically starts when you finish uploading the files.
max_upload_workers
int
default:"20"
Maximum number of upload worker threads.
verify_batch_size
int
default:"100"
Batch size for file verification API calls.
api_timeout
tuple[float, float]
default:"(5, 30)"
Timeout tuple for API requests, in seconds.
verify_timeout
tuple[float, float]
default:"(5, 120)"
Timeout tuple for file verification requests, in seconds.
upload_timeout
tuple[float, float]
default:"(10, 300)"
Timeout tuple for direct file upload requests, in seconds.
resume_upload
bool
default:"False"
Enable SDK-local upload ledger/resume support. The ledger records upload progress and the original evaluation contract so interrupted uploads can be resumed safely.
upload_state_path
string
default:"None"
Optional SQLite upload ledger path when resume_upload=True. Use the same path with resume_evaluator() if the upload is interrupted.
etor = client.create_evaluator_from_template(
    name="Voice naturalness evaluation",
    desc="new_model_vs_competitor_model",
    num_eval=10,
    template_id="abcdef",
    use_loudness_normalization=True,
)

create_evaluator_from_template_json()

Create a new evaluation using a JSON template. This allows you to define custom evaluation structures programmatically.
json
Dict
Template JSON as a dictionary. Optional if json_file is provided.
json_file
string
Path to the JSON template file. Optional if json is provided.
name
string
Name of this evaluation session. Optional; if omitted, the SDK generates a name.
custom_type
string | CustomType
default:"SINGLE"
Type of evaluation. Accepts either a string or CustomType enum value.
Value | Description | batch_size | File configuration
SINGLE | Single stimulus evaluation | 1 | 1 stimulus
DOUBLE | Double stimulus evaluation | 2 | 2 stimuli (no reference)
SINGLE_REF | Reference-based comparison (CMOS style) | 2 | 1 reference + 1 stimulus
RANKING | Ranking evaluation | 2+ | Multiple stimuli
You can use string values ("SINGLE", "DOUBLE", "SINGLE_REF", "RANKING") or the CustomType enum from podonos.common.enum:
from podonos.common.enum import CustomType

# Either form can be passed as the custom_type argument:
custom_type = CustomType.SINGLE_REF
custom_type = "SINGLE_REF"
desc
string
Description of this evaluation session. Optional.
lan
string
default:"en-us"
Language for evaluation. See supported languages in create_evaluator().
num_eval
int
default:"10"
Number of evaluations per sample.
use_annotation
bool
default:"False"
Enable annotation to collect free-form text feedback from evaluators. Cannot be used together with an annotations array in template JSON. When using custom annotation questions, define them in the annotations array instead.
use_loudness_normalization
bool
default:"True"
Enable loudness normalization to ensure consistent audio volume levels during evaluation.
auto_start
bool
default:"False"
If True, the evaluation automatically starts when you finish uploading the files.
max_upload_workers
int
default:"20"
Maximum number of upload workers. Must be a positive integer.
verify_batch_size
int
default:"100"
Batch size for file verification API calls.
api_timeout
tuple[float, float]
default:"(5, 30)"
Timeout tuple for API requests, in seconds.
verify_timeout
tuple[float, float]
default:"(5, 120)"
Timeout tuple for file verification requests, in seconds.
upload_timeout
tuple[float, float]
default:"(10, 300)"
Timeout tuple for direct file upload requests, in seconds.
resume_upload
bool
default:"False"
Enable SDK-local upload ledger/resume support. The ledger records upload progress and the original evaluation contract so interrupted uploads can be resumed safely.
upload_state_path
string
default:"None"
Optional SQLite upload ledger path when resume_upload=True. Use the same path with resume_evaluator() if the upload is interrupted.
# Using JSON dictionary
template = {
    "questions": [
        {
            "type": "SCORED",
            "question": "How natural is the voice?",
            "description": "Rate the quality of the voice",
            "options": [
                {"label_text": "Excellent"},
                {"label_text": "Good"},
                {"label_text": "Fair"},
                {"label_text": "Poor"},
                {"label_text": "Bad"},
            ],
        }
    ]
}

evaluator = client.create_evaluator_from_template_json(
    json=template,
    name="Quality Test",
    custom_type="SINGLE",
)
Returns an instance of Evaluator. Here’s the JSON template for reference:
Question: Represents the main question posed to evaluators about the audio being assessed. It guides evaluators on the specific aspect of the audio they should focus on during the evaluation.
Parameter | Description | Required | Notes
type | Type of question. Options: SCORED, NON_SCORED, COMPARISON | Yes | Determines the structure and requirements of the question
question | The main question text | Yes | Must be provided for all question types
description | Additional details or context for the question | No | Optional for all question types
options | List of possible options. Only for SCORED and NON_SCORED types | Conditional | Must have between 1 and 9 options for SCORED and NON_SCORED types
scale | Scale for comparison. Only for COMPARISON type | Conditional | Must be an integer between 2 and 9 for COMPARISON type
allow_multiple | Allows multiple selections. Only for NON_SCORED type | Yes for NON_SCORED | Enables multiple choice selection
has_other | Includes an “Other” option. Only for NON_SCORED type | No | Adds an option for evaluators to specify an unlisted choice
has_none | Includes a “None” option. Only for NON_SCORED type | No | Adds an option for evaluators to select none of the listed choices
related_model | Related model for the question. Only for Double Evaluation type. | Conditional | Select which model the question is related to.
anchor_label | Labels for the ends of the comparison scale. Only for COMPARISON type. | Conditional | Provides context for what each end of the scale represents.
Important Notes:
  • SCORED and NON_SCORED questions can have a maximum of 9 options.
  • NON_SCORED questions must specify allow_multiple.
  • COMPARISON type questions must have a scale between 2 and 9.
  • related_model accepts ALL, MODEL_A, or MODEL_B, and defaults to ALL. It applies only to questions, not to instructions.
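You can check the template rules in the table and notes above locally before calling create_evaluator_from_template_json(). This is a hypothetical linter; template_problems is not part of the SDK. The service performs its own validation, which may be stricter.

```python
from typing import Any, Dict, List

QUESTION_TYPES = {"SCORED", "NON_SCORED", "COMPARISON"}
RELATED_MODELS = {"ALL", "MODEL_A", "MODEL_B"}


def template_problems(template: Dict[str, Any]) -> List[str]:
    """Hypothetical linter for the documented question rules."""
    problems: List[str] = []
    for i, q in enumerate(template.get("questions", [])):
        qtype = q.get("type")
        if qtype not in QUESTION_TYPES:
            problems.append(f"question {i}: unknown type {qtype!r}")
            continue
        if not q.get("question"):
            problems.append(f"question {i}: 'question' text is required")
        if qtype in ("SCORED", "NON_SCORED"):
            n = len(q.get("options", []))
            if not 1 <= n <= 9:
                problems.append(f"question {i}: needs 1-9 options, got {n}")
        if qtype == "NON_SCORED" and "allow_multiple" not in q:
            problems.append(f"question {i}: NON_SCORED requires allow_multiple")
        if qtype == "COMPARISON":
            scale = q.get("scale")
            if not isinstance(scale, int) or not 2 <= scale <= 9:
                problems.append(f"question {i}: scale must be an integer 2-9")
        # related_model defaults to ALL when omitted.
        if q.get("related_model", "ALL") not in RELATED_MODELS:
            problems.append(f"question {i}: invalid related_model")
    return problems
```

An empty list means no problems were found under these rules.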

flash_eval()

Run automatic evaluation on an audio file and return a FlashEvalResult.
file_path
string
required
Path to the audio file to evaluate.
language
string
Language code for model routing. Currently available: en-us, es-es. When omitted, the default en-us model is used. Ignored when category="noise_quality".
category
string
Evaluation category. Defaults to naturalness when omitted. Currently available: naturalness, noise_quality.
Returns a FlashEvalResult with these fields:
Field | Description
naturalness | Naturalness score when category="naturalness"
noise_quality | Noise quality score when category="noise_quality"
file | The evaluated File object
id | Optional evaluation ID
message | Optional service message
result = client.flash_eval(file_path="path/to/audio.wav")
print(result.naturalness)

spanish_result = client.flash_eval(file_path="path/to/audio_es.wav", language="es-es")
noise_result = client.flash_eval(file_path="path/to/audio.wav", category="noise_quality")
print(noise_result.noise_quality)
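The language and category rules for flash_eval() interact, because language is ignored for noise_quality. The following sketch shows a hypothetical argument builder that encodes the values currently listed. These values may expand over time, and flash_eval_kwargs is not an SDK function.

```python
from typing import Dict, Optional

FLASH_CATEGORIES = {"naturalness", "noise_quality"}
FLASH_LANGUAGES = {"en-us", "es-es"}


def flash_eval_kwargs(
    file_path: str,
    language: Optional[str] = None,
    category: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper that builds flash_eval() arguments per the documented rules."""
    kwargs = {"file_path": file_path}
    if category is not None:
        if category not in FLASH_CATEGORIES:
            raise ValueError(f"unsupported category: {category}")
        kwargs["category"] = category
    # language is ignored for noise_quality, so only pass it otherwise.
    if language is not None and category != "noise_quality":
        if language not in FLASH_LANGUAGES:
            raise ValueError(f"unsupported language: {language}")
        kwargs["language"] = language
    return kwargs
```

The populated result field has the same name as the category. As a result, getattr(result, kwargs.get("category", "naturalness")) would read the relevant score.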

get_evaluation_list()

Returns a JSON containing all your evaluations.
evaluations = client.get_evaluation_list()
print(evaluations)
The output JSON looks like:
[
  {
    "id": "<UUID>",
    "title": "How natural my synthetic voices are",
    "internal_name": null,
    "description": "Used latest internal model. Epoch 10, alpha 0.1",
    "batch_size": 1,
    "status": "ACTIVE",
    "created_time": "2024-06-25T01:40:43.429Z",
    "updated_time": "2024-06-26T13:21:34.801Z"
  }
]
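Because the list is plain JSON, ordinary Python can filter and sort it. This sketch assumes only the fields shown in the sample output. newest_active is a hypothetical helper, and this page does not document status values other than ACTIVE.

```python
from datetime import datetime
from typing import Any, Dict, List


def parse_time(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def newest_active(evaluations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return ACTIVE evaluations, most recently updated first."""
    active = [e for e in evaluations if e.get("status") == "ACTIVE"]
    return sorted(active, key=lambda e: parse_time(e["updated_time"]), reverse=True)
```

For example, newest_active(client.get_evaluation_list())[0]["id"] would give the ID of the most recently updated active evaluation.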

get_eval_template_info()

Gets detailed information about an evaluation template by its ID.
template_id
string
The unique identifier of the evaluation template to retrieve information for.
Returns a Python dictionary containing detailed template information.
template_info = client.get_eval_template_info("abcdef")
print(template_info)
The returned dictionary has this shape:
{
  "id": "<UUID>",
  "code": "abcdef",
  "title": "Voice Quality Assessment",
  "description": "Template for evaluating voice naturalness and quality",
  "language": Language.ENGLISH_AMERICAN,
  "eval_type": "Single",
  "created_time": datetime.datetime(2024, 6, 25, 1, 40, 43, 429000, tzinfo=datetime.timezone.utc),
  "updated_time": datetime.datetime(2024, 6, 26, 13, 21, 34, 801000, tzinfo=datetime.timezone.utc)
}
Field | Description
id | Unique identifier (UUID) of the template
code | Template code used for identification
title | Display name of the template
description | Detailed description of the template’s purpose
language | Language enum value from the template, such as Language.ENGLISH_AMERICAN for en-us
eval_type | Type of evaluation: Single, Double, or Triple
created_time | datetime.datetime value for when the template was created
updated_time | datetime.datetime value for when the template was last modified
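Unlike the output of get_evaluation_list(), this dictionary contains datetime values, so json.dumps() fails on it as-is. A minimal conversion sketch follows. to_jsonable is hypothetical, and this page does not document what the Language enum's .value holds.

```python
from datetime import datetime
from enum import Enum
from typing import Any, Dict


def to_jsonable(info: Dict[str, Any]) -> Dict[str, Any]:
    """Convert datetime and Enum values so the dict can be passed to json.dumps()."""
    out: Dict[str, Any] = {}
    for key, value in info.items():
        if isinstance(value, datetime):
            out[key] = value.isoformat()  # e.g. "2024-06-25T01:40:43.429000+00:00"
        elif isinstance(value, Enum):
            out[key] = value.value  # whatever the enum member's value is
        else:
            out[key] = value
    return out
```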

get_stats_json_by_id()

Returns a list of JSON objects containing the statistics of each stimulus in the evaluation identified by evaluation_id.
evaluation_id
string
Evaluation id. See get_evaluation_list().
group_by
string
default:"question"
Group by criteria. Options are “question”, “script”, or “model”. Default is “question”. Note that “script” and “model” are only available for single-question evaluations.
evaluations = client.get_evaluation_list()
for evaluation in evaluations:  # avoid shadowing the built-in eval()
    stats = client.get_stats_json_by_id(evaluation["id"], group_by="question")
    print(stats)
Field | Description | SCORED | NON_SCORED
frequency | List of score counts: {"score": number, "count": number}[] | Yes | -
mean | Average score | Yes | -
median | Median score | Yes | -
std | Standard deviation | Yes | -
sem | Standard error of the mean | Yes | -
ci_95 | 95% confidence interval | Yes | -
options | Each option name as key with integer value | - | Yes
OTHER | The number of evaluators who selected “Other” | - | Yes
For NON_SCORED questions:
  • The integer value is the number of evaluators who selected the option.
  • All options are included in the response, regardless of their count.
You can get the statistics of each question by calling get_stats_json_by_id() with group_by set to question, script, or model.
{
  "question": string,
  "description": string,
  "order": int,
  "responses": [
    {
      "name": string,
      "model_tag": string,
      "tags": string[],
      "type": "A" | "B" | "REF",
      "script": string | null,
      "frequency": [
        {
          "score": number,
          "count": number
        }
      ],
      "mean": float | null, // null if the question is not SCORED
      "median": float | null, // null if the question is not SCORED
      "std": float | null, // null if the question is not SCORED
      "sem": float | null, // null if the question is not SCORED
      "ci_95": float | null, // null if the question is not SCORED
    }
  ]
}
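For SCORED questions, you can recompute the summary fields from frequency, for example after merging counts across responses. This sketch uses textbook definitions. The service's exact choices are not stated, so the sample standard deviation and the use of 1.96 * sem as a normal-approximation half-width for ci_95 are assumptions. The service may use different definitions, and the results may differ slightly.

```python
import math
import statistics
from typing import Dict, List, Optional


def summarize_frequency(frequency: List[Dict[str, float]]) -> Dict[str, Optional[float]]:
    """Recompute SCORED summary statistics from a frequency list (sketch)."""
    # Expand [{"score": 5, "count": 2}, ...] into individual ratings.
    ratings = [f["score"] for f in frequency for _ in range(int(f["count"]))]
    n = len(ratings)
    empty = {"mean": None, "median": None, "std": None, "sem": None, "ci_95": None}
    if n == 0:
        return empty
    mean = statistics.mean(ratings)
    median = statistics.median(ratings)
    if n < 2:
        # Spread statistics are undefined for a single rating.
        return {**empty, "mean": mean, "median": median}
    std = statistics.stdev(ratings)  # sample standard deviation (assumption)
    sem = std / math.sqrt(n)
    ci_95 = 1.96 * sem  # normal-approximation half-width (assumption)
    return {"mean": mean, "median": median, "std": std, "sem": sem, "ci_95": ci_95}
```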

download_evaluation_files_by_evaluation_id()

Downloads all files associated with the evaluation identified by evaluation_id from the Podonos evaluation service, saves them under output_dir on the local file system, and writes a metadata file describing the downloaded files. Returns a string describing the result: a success message, or an error message if the download fails.
evaluation_id
string
Evaluation id. See get_evaluation_list().
output_dir
string
The directory path where the downloaded files will be saved. This should be a valid path on the local file system where the user has write permissions.
client.download_evaluation_files_by_evaluation_id(
  evaluation_id="12345",
  output_dir="./output",
)
Each entry in the metadata file has the following fields:
Field | Description
file_path | The path produced by joining output_dir with {model_tag}/{hashed_file_name}. It can be relative or absolute depending on the output_dir you pass.
original_name | The original name of the file, before it was renamed with a hash.
model_tag | The model tag associated with the file, used for categorization.
tags | A list of tags associated with the file, providing additional context or categorization.
File naming convention: each downloaded file is saved as {output_dir}/{model_tag}/{hashed_file_name}. Files are organized into subdirectories named after their model_tag, and each original file name is replaced with a hashed name.
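With the {output_dir}/{model_tag}/{hashed_file_name} layout, you can group downloaded files by model without reading the metadata file. This sketch uses a hypothetical helper, files_by_model_tag. It assumes every subdirectory of output_dir is a model_tag folder and skips top-level files, because this page does not specify the metadata file's name or location.

```python
import os
from typing import Dict, List


def files_by_model_tag(output_dir: str) -> Dict[str, List[str]]:
    """Group downloaded files by their {model_tag} subdirectory."""
    groups: Dict[str, List[str]] = {}
    for entry in sorted(os.listdir(output_dir)):
        subdir = os.path.join(output_dir, entry)
        # Top-level files (such as the metadata file) are skipped.
        if os.path.isdir(subdir):
            groups[entry] = sorted(
                name for name in os.listdir(subdir)
                if os.path.isfile(os.path.join(subdir, name))
            )
    return groups
```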

File

A class representing a single file, used to add files to an Evaluator.
path
string
required
Path to the file to evaluate. For audio files, we support wav, mp3, and flac formats.
model_tag
string
required
Name of your model (e.g., WhisperTTS) or any unique name (e.g., human).
tags
list[string]
A list of string tags for the file at path. Use tags to record properties of the file, such as original, synthesized, tom, or maria. You can later look up or group files by tag in the output report.
script
string
Text script of the input audio file.
is_ref
bool
default:"False"
True if this file works as a reference in a comparative evaluation.
meta_data
Dict[str, str | int | float | bool | None]
Optional metadata dictionary for the file. Keys must be strings, and values must be JSON-primitive values.
script_tags
list[string]
Optional tags for the script. These are useful for ranking and script-level grouping.
file = File(
    path="/path/to/speech.wav",
    model_tag="my_model",
    tags=["synthesized", "male"],
    script="hello there",
    meta_data={"speaker_id": "spk-001"},
    script_tags=["greeting"],
)
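The meta_data rule (string keys and JSON-primitive values) is easy to check before you construct a File. check_meta_data is a hypothetical helper, and the SDK may report violations differently.

```python
from typing import Any, Dict

# JSON-primitive value types allowed in meta_data.
_PRIMITIVES = (str, int, float, bool, type(None))


def check_meta_data(meta_data: Dict[Any, Any]) -> None:
    """Raise TypeError if meta_data breaks the documented key/value rules for File."""
    for key, value in meta_data.items():
        if not isinstance(key, str):
            raise TypeError(f"meta_data key {key!r} must be a string")
        if not isinstance(value, _PRIMITIVES):
            raise TypeError(f"meta_data[{key!r}] must be str, int, float, bool, or None")
```

Nested structures such as lists or dicts are rejected, so flatten them into separate keys (for example "speaker_id" and "speaker_age") instead.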

Evaluator

Evaluator manages a single type of evaluation.

add_file()

Add one file to evaluate in a single-stimulus evaluation question, such as NMOS.
file
File
required
Input File. This field is required if type is NMOS, QMOS, P808, or CUSTOM_SINGLE.
etor.add_file(
    file=File(
        path="/path/to/speech_0_0.wav",
        model_tag="my_model",
        tags=["synthesized", "male", "ver1234"],
    )
)
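The Evaluator method you call depends on the evaluation type. The mapping below is assembled from this page: add_file() for the single-stimulus types, the add_files() requirements table, and add_ranking_set() for RANKING. ADD_METHOD_BY_TYPE is a hypothetical name, not something the SDK exports.

```python
ADD_METHOD_BY_TYPE = {
    # Single stimulus: one File per call.
    "NMOS": "add_file",
    "QMOS": "add_file",
    "P808": "add_file",
    "CUSTOM_SINGLE": "add_file",
    # Paired or reference-based: file0/file1 (plus file2 for CSMOS).
    "PREF": "add_files",
    "SMOS": "add_files",
    "CMOS": "add_files",
    "CSMOS": "add_files",
    "CUSTOM_DOUBLE": "add_files",
    # One ranking group of stimuli per call.
    "RANKING": "add_ranking_set",
}
```

For example, getattr(etor, ADD_METHOD_BY_TYPE[eval_type]) would select the method for a given type.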

add_files()

Add multiple files for evaluations that require comparison.
file0
File
required
First input file.
file1
File
required
Second input file.
file2
File
Optional third input file. Used for CSMOS evaluations with one reference and two stimuli.
The order and reference requirements depend on evaluation type:
Type | File requirements
PREF, CUSTOM_DOUBLE | Two ordered stimulus files
SMOS | Two unordered stimulus files
CMOS | One reference file and one stimulus file
CSMOS | Two stimulus files followed by one reference file (file2)
file0 = File(path="/path/to/speech0.wav", model_tag="human", tags=["original", "male"])
file1 = File(path="/path/to/speech1.wav", model_tag="my_model", tags=["synthesized", "male", "ver1234"])
etor.add_files(file0=file0, file1=file1)
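You can check the reference requirements in the table above before calling add_files(). In this sketch, FileSpec is a hypothetical stand-in for File that keeps only path and is_ref. The table does not specify the file order for CMOS, so the check accepts either order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileSpec:
    """Hypothetical stand-in for File with only the fields this check needs."""
    path: str
    is_ref: bool = False


def check_add_files(eval_type: str, file0: FileSpec, file1: FileSpec,
                    file2: Optional[FileSpec] = None) -> None:
    """Check reference placement against the add_files() requirements table."""
    refs = [f.is_ref for f in (file0, file1)]
    if eval_type in ("PREF", "CUSTOM_DOUBLE", "SMOS"):
        if any(refs) or file2 is not None:
            raise ValueError(f"{eval_type} takes two stimulus files and no reference")
    elif eval_type == "CMOS":
        # Exactly one of file0/file1 must be the reference; order is not enforced.
        if sorted(refs) != [False, True] or file2 is not None:
            raise ValueError("CMOS takes one reference and one stimulus")
    elif eval_type == "CSMOS":
        if any(refs) or file2 is None or not file2.is_ref:
            raise ValueError("CSMOS takes two stimuli in file0/file1 and a reference in file2")
    else:
        raise ValueError(f"{eval_type} does not use add_files()")
```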

add_ranking_set()

Add one ranking set for a RANKING evaluation.
files
list[File]
required
Ordered candidate files for one ranking group. Files must be stimuli, not references.
Constraints enforced across calls:
  • All groups must have the same number of files.
  • The order of model_tag must be identical across groups.
  • Files must be stimuli; do not set is_ref=True.
etor = client.create_evaluator(type="RANKING", name="Ranking test")

etor.add_ranking_set([
    File(path="/path/to/model_a_001.wav", model_tag="model_a", script_tags=["script_001"]),
    File(path="/path/to/model_b_001.wav", model_tag="model_b", script_tags=["script_001"]),
    File(path="/path/to/model_c_001.wav", model_tag="model_c", script_tags=["script_001"]),
])
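The cross-call constraints on ranking sets amount to remembering the first group's model_tag order. The following is a hypothetical checker. RankingSetChecker and RankFile are illustrative names, not SDK classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RankFile:
    """Hypothetical stand-in for File with the fields the ranking rules use."""
    path: str
    model_tag: str
    is_ref: bool = False


class RankingSetChecker:
    """Track ranking groups and enforce the documented cross-call constraints."""

    def __init__(self) -> None:
        self.tag_order: Optional[Tuple[str, ...]] = None

    def check(self, files: List[RankFile]) -> None:
        if any(f.is_ref for f in files):
            raise ValueError("ranking files must be stimuli, not references")
        order = tuple(f.model_tag for f in files)
        if self.tag_order is None:
            self.tag_order = order  # the first group fixes group size and model order
        elif order != self.tag_order:
            raise ValueError(f"expected model_tag order {self.tag_order}, got {order}")
```

Comparing tuples of model_tag values enforces the same group size and the same model order in one step.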

close()

Close the evaluation session. Once this function is called, all evaluation files are sent to the Podonos evaluation service, processed, and delivered to evaluators. Returns a JSON object containing the upload status.
status = etor.close()