Podonos
This is the base module. You can import it with import podonos.
init()
Initialize the module and return an instance of Client.
API key you obtained from the workspace. For details, see Get API key. If this is not set, the package tries to read PODONOS_API_KEY from the environment. Throws an error if neither is available.
API base URL. You usually do not need to set this unless you are using a private, staging, or development API endpoint.
Returns an instance of Client.
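The key lookup order described above (explicit value first, then the PODONOS_API_KEY environment variable, then an error) can be sketched as follows. This is a minimal illustration, not the SDK's own code: the helper name resolve_api_key and its parameters are hypothetical, and the exception type the SDK actually raises is not stated in this page.

```python
import os

ENV_VAR = "PODONOS_API_KEY"  # environment variable named in the text above


def resolve_api_key(api_key=None, environ=None):
    """Return the explicit key if given, otherwise the environment value.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    environ = os.environ if environ is None else environ
    key = api_key or environ.get(ENV_VAR)
    if not key:
        # Assumption: ValueError stands in for whatever error the SDK raises.
        raise ValueError(f"No API key: pass one explicitly or set {ENV_VAR}")
    return key
```

Passing a mapping as environ makes the lookup easy to exercise without touching the real environment.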
Client
Client manages one or more Evaluator instances and evaluation history.
create_evaluator()
Create a new instance of Evaluator. One evaluator supports a single type of evaluation throughout its life cycle.
If you want multiple types of evaluation, create multiple evaluators by calling create_evaluator() multiple times.
Name of this evaluation session. If empty, a random name is automatically generated and used.
Description of this evaluation session. This field is for your records, so later you can see how you generated the output files or trained your model.
Evaluation type. One of the following:
| Type | Description |
|---|---|
| NMOS | Naturalness Mean Opinion Score |
| QMOS | Quality Mean Opinion Score |
| CMOS | Reference-based comparison (compare target audio against reference) |
| CSMOS | Comparative similarity test between two target audios and one reference |
| SMOS | Similarity Mean Opinion Score |
| P808 | Speech quality by ITU-T P.808 |
| PREF | Preference test between two audios/speeches |
| CUSTOM_SINGLE | Custom single-stimulus evaluation |
| CUSTOM_DOUBLE | Custom double-stimulus evaluation |
| RANKING | Ranking evaluation across multiple stimuli |
Specific language and locale of the speech. We currently support the following codes and will add more soon, so please check back later:
| Code | Description |
|---|---|
| en-us | English (United States) |
| en-gb | English (United Kingdom) |
| en-au | English (Australia) |
| en-ca | English (Canada) |
| en-in | English (India) |
| ko-kr | Korean (Korea) |
| zh-cn | Mandarin (China) |
| es-es | Spanish (Spain) |
| es-mx | Spanish (Mexico) |
| fr-fr | French (France) |
| fr-ca | French (Canada) |
| de-de | German (Germany) |
| ja-jp | Japanese (Japan) |
| it-it | Italian (Italy) |
| pl-pl | Polish (Poland) |
| pt-pt | Portuguese (Portugal) |
| pt-br | Portuguese (Brazil) |
| id-id | Indonesian (Indonesia) |
| hi-in | Hindi (India) |
| ta-in | Tamil (India) |
| kn-in | Kannada (India) |
| ml-in | Malayalam (India) |
| si-lk | Sinhala (Sri Lanka) |
| ar-eg | Arabic (Egypt) |
| ar-ae | Arabic (UAE) |
| ar-sa | Arabic (Saudi Arabia) |
| audio | General audio file |
Granularity of evaluation scales. Supported values are 1.0 and 0.5.
Number of evaluations per sample. For example, if this is 10 for an NMOS evaluation, each audio file is assigned to 10 human evaluators, and the statistics of their ratings are computed and presented in the final report.
Expected due time of the final report in hours. Must be at least 12. Pricing may change depending on the number of hours.
Enable annotation to collect free-form text feedback from evaluators.
Enable loudness normalization to ensure consistent audio volume levels during evaluation.
If True, the evaluation automatically starts when you finish uploading the files. If False, you go to Workspace, confirm the evaluation session, and manually start the evaluation.
Maximum number of upload worker threads. If you experience a slow upload, please increase the number of workers.
Batch size for file verification API calls. Must be between 1 and 1000.
Timeout tuple for API requests, in seconds.
Timeout tuple for file verification requests, in seconds.
Timeout tuple for direct file upload requests, in seconds.
Enable SDK-local upload ledger/resume support. The ledger records upload progress and the original evaluation contract so interrupted uploads can be resumed safely.
Optional SQLite upload ledger path when resume_upload=True. Use the same path with resume_evaluator() if the upload is interrupted.
Returns an instance of Evaluator.
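The numeric limits stated for create_evaluator() (granularity of 1.0 or 0.5, a due time of at least 12 hours, a verification batch size between 1 and 1000) can be checked before a session is created. The sketch below is a hypothetical pre-flight helper; the function name and its keyword names are assumptions chosen for readability, not the SDK's parameter names.

```python
def check_session_options(*, granularity, due_hours, batch_size):
    """Raise ValueError if an option violates a limit documented above.

    Hypothetical helper; the keyword names are illustrative assumptions.
    """
    errors = []
    if granularity not in (1.0, 0.5):
        errors.append("granularity must be 1.0 or 0.5")
    if due_hours < 12:
        errors.append("due time must be at least 12 hours")
    if not 1 <= batch_size <= 1000:
        errors.append("batch size must be between 1 and 1000")
    if errors:
        raise ValueError("; ".join(errors))
```

Collecting all violations before raising reports every problem in one pass instead of one per run.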
resume_evaluator()
Resume uploads for an existing evaluation using an SDK upload ledger. Add files again in the same order as the interrupted run so the ledger can recover each file’s remote object identity.
Resume is safety-checked against the local ledger. The ledger must contain the original evaluation contract and session configuration for evaluation_id; otherwise the SDK refuses to resume. The SDK restores the original session details from the ledger where available, so do not use resume_evaluator() to change the evaluation type, language, template, batch size, or file identities.
During resume, completed ledger rows are reused only when the current local file still matches the recorded path, content hash, size, and upload manifest. If a completed metadata or verification row no longer matches the local file, start a fresh evaluation or remove the stale ledger row before resuming.
Evaluation ID to resume.
Path to the SDK upload ledger created by a previous upload run. It must contain the original evaluation contract for evaluation_id.
Optional session name.
Optional session description.
Evaluation type. Uses the same supported values as create_evaluator().
Language code. Uses the same supported values as create_evaluator().
Granularity of evaluation scales.
Number of evaluations per sample.
Expected due time of the final report in hours.
Enable annotation to collect free-form text feedback from evaluators.
Enable loudness normalization for evaluation.
If True, the evaluation automatically starts after files are uploaded.
Maximum number of upload worker threads.
Batch size for file verification API calls.
Timeout tuple for API requests, in seconds.
Timeout tuple for file verification requests, in seconds.
Timeout tuple for direct file upload requests, in seconds.
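To see why a changed local file blocks reuse of a completed ledger row, it can help to picture the match as a comparison of a recorded fingerprint against the file on disk. The sketch below is an assumption-laden illustration: the SDK's ledger schema and hash algorithm are not documented here, so SHA-256 and the (path, hash, size) tuple are stand-ins, and the upload manifest comparison is left out.

```python
import hashlib
import os


def file_fingerprint(path):
    """Return (absolute path, SHA-256 hex digest, size in bytes) for a file.

    SHA-256 is an assumption; the SDK's actual content hash is not specified.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large audio files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return os.path.abspath(path), digest.hexdigest(), os.path.getsize(path)


def row_still_matches(recorded, path):
    """True if the local file still matches a recorded fingerprint tuple."""
    return tuple(recorded) == file_fingerprint(path)
```

Any edit to the file, or a move to a different path, changes the fingerprint, which corresponds to the case where the text above advises starting a fresh evaluation or removing the stale row.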
create_evaluator_from_template()
When you create an evaluation using a template, all the questions and options defined in the template are automatically assigned to the new evaluation. This ensures consistency and saves time by reusing pre-defined content.
Name of this evaluation session. Required and must be non-empty.
The unique identifier of the template to base the new evaluation on.
Description of this evaluation session. This field is for your records.
Number of evaluations per sample.
Enable annotation to collect free-form text feedback from evaluators. Cannot be used together with an annotations array in template JSON.
Enable loudness normalization to ensure consistent audio volume levels during evaluation.
If True, the evaluation automatically starts when you finish uploading the files.
Maximum number of upload worker threads.
Batch size for file verification API calls.
Timeout tuple for API requests, in seconds.
Timeout tuple for file verification requests, in seconds.
Timeout tuple for direct file upload requests, in seconds.
Enable SDK-local upload ledger/resume support. The ledger records upload progress and the original evaluation contract so interrupted uploads can be resumed safely.
Optional SQLite upload ledger path when resume_upload=True. Use the same path with resume_evaluator() if the upload is interrupted.
create_evaluator_from_template_json()
Create a new evaluation using a JSON template. This allows you to define custom evaluation structures programmatically.
Template JSON as a dictionary. Optional if json_file is provided.
Path to the JSON template file. Optional if json is provided.
Name of this evaluation session. Optional; if omitted, the SDK generates a name.
Type of evaluation. Accepts either a string or a CustomType enum value.
| Value | Description | batch_size | File Configuration |
|---|---|---|---|
| SINGLE | Single stimulus evaluation | 1 | 1 stimulus |
| DOUBLE | Double stimulus evaluation | 2 | 2 stimuli (no reference) |
| SINGLE_REF | Reference-based comparison (CMOS style) | 2 | 1 reference + 1 stimulus |
| RANKING | Ranking evaluation | 2+ | Multiple stimuli |
You can use string values ("SINGLE", "DOUBLE", "SINGLE_REF", "RANKING") or the CustomType enum from podonos.common.enum.
Description of this evaluation session. Optional.
Language for evaluation. See supported languages in create_evaluator().
Number of evaluations per sample.
Enable annotation to collect free-form text feedback from evaluators. Cannot be used together with an annotations array in template JSON. When using custom annotation questions, define them in the annotations array instead.
Enable loudness normalization to ensure consistent audio volume levels during evaluation.
If True, the evaluation automatically starts when you finish uploading the files.
Maximum number of upload workers. Must be a positive integer.
Batch size for file verification API calls.
Timeout tuple for API requests, in seconds.
Timeout tuple for file verification requests, in seconds.
Timeout tuple for direct file upload requests, in seconds.
Enable SDK-local upload ledger/resume support. The ledger records upload progress and the original evaluation contract so interrupted uploads can be resumed safely.
Optional SQLite upload ledger path when resume_upload=True. Use the same path with resume_evaluator() if the upload is interrupted.
Returns an instance of Evaluator.
The JSON template is built from the following components:
- Question
- Option
- Anchor Label
- Instruction
- Annotation
Question: Represents the main question posed to evaluators about the audio being assessed. It guides evaluators on the specific aspect of the audio they should focus on during the evaluation.
| Parameter | Description | Required | Notes |
|---|---|---|---|
| type | Type of question. Options: SCORED, NON_SCORED, COMPARISON | Yes | Determines the structure and requirements of the question |
| question | The main question text | Yes | Must be provided for all question types |
| description | Additional details or context for the question | No | Optional for all question types |
| options | List of possible options. Only for SCORED and NON_SCORED types | Conditional | Must have between 1 and 9 options for SCORED and NON_SCORED types |
| scale | Scale for comparison. Only for COMPARISON type | Conditional | Must be an integer between 2 and 9 for COMPARISON type |
| allow_multiple | Allows multiple selections. Only for NON_SCORED type | Yes for NON_SCORED | Enables multiple choice selection |
| has_other | Includes an “Other” option. Only for NON_SCORED type | No | Adds an option for evaluators to specify an unlisted choice |
| has_none | Includes a “None” option. Only for NON_SCORED type | No | Adds an option for evaluators to select none of the listed choices |
| related_model | Related model for the question. Only for Double Evaluation type. | Conditional | Select which model the question is related to. |
| anchor_label | Labels for the ends of the comparison scale. Only for COMPARISON type. | Conditional | Provides context for what each end of the scale represents. |
Important Notes:
- SCORED and NON_SCORED questions can have a maximum of 9 options.
- NON_SCORED questions must specify allow_multiple.
- COMPARISON type questions must have a scale between 2 and 9.
- related_model is one of ALL, MODEL_A, and MODEL_B. Default is ALL. The related_model is only used for questions (not for instructions).
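The question rules above can be checked locally before a template is submitted. The following is a minimal sketch that validates one question given as a Python dictionary keyed by the parameter names in the table. The function name validate_question is hypothetical, and the shape of each entry in options is not specified here, so only the number of options is checked.

```python
def validate_question(q):
    """Return a list of problems found in one question dict (empty if none).

    Hypothetical helper based only on the rules listed above.
    """
    qtype = q.get("type")
    if qtype not in ("SCORED", "NON_SCORED", "COMPARISON"):
        return [f"unknown type: {qtype!r}"]
    problems = []
    if not q.get("question"):
        problems.append("question text is required")
    if qtype in ("SCORED", "NON_SCORED"):
        options = q.get("options") or []
        if not 1 <= len(options) <= 9:
            problems.append("options must contain between 1 and 9 entries")
    if qtype == "NON_SCORED" and "allow_multiple" not in q:
        problems.append("NON_SCORED questions must specify allow_multiple")
    if qtype == "COMPARISON":
        scale = q.get("scale")
        if not isinstance(scale, int) or not 2 <= scale <= 9:
            problems.append("scale must be an integer between 2 and 9")
    # related_model defaults to ALL when it is not given.
    if q.get("related_model", "ALL") not in ("ALL", "MODEL_A", "MODEL_B"):
        problems.append("related_model must be ALL, MODEL_A, or MODEL_B")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in a template at once.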
flash_eval()
Run automatic evaluation on an audio file and return a FlashEvalResult.
Path to the audio file to evaluate.
Language code for model routing. Currently available: en-us, es-es. When omitted, the default en-us model is used. Ignored when category="noise_quality".
Evaluation category. Defaults to naturalness when omitted. Currently available: naturalness, noise_quality.
Returns a FlashEvalResult with these fields:
| Field | Description |
|---|---|
| naturalness | Naturalness score when category="naturalness" |
| noise_quality | Noise quality score when category="noise_quality" |
| file | The evaluated File object |
| id | Optional evaluation ID |
| message | Optional service message |
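The defaulting rules for flash_eval() (category falls back to naturalness, language falls back to en-us, and language is ignored for noise_quality) can be written out as a small function. This is a hedged sketch of the documented behavior only: the helper name is hypothetical, the keyword name language is an assumption, and the real SDK may handle unsupported values differently.

```python
CATEGORIES = ("naturalness", "noise_quality")
LANGUAGES = ("en-us", "es-es")


def effective_flash_eval_args(language=None, category=None):
    """Return the (category, language) pair implied by the documented defaults.

    Hypothetical helper; not the SDK's routing code.
    """
    category = category or "naturalness"
    if category not in CATEGORIES:
        raise ValueError(f"unsupported category: {category!r}")
    if category == "noise_quality":
        # Language is ignored for this category, so report it as None.
        return category, None
    language = language or "en-us"
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return category, language
```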
get_evaluation_list()
Returns a JSON containing all your evaluations.
get_eval_template_info()
Gets detailed information about an evaluation template by its ID.
The unique identifier of the evaluation template to retrieve information for.
| Field | Description |
|---|---|
| id | Unique identifier (UUID) of the template |
| code | Template code used for identification |
| title | Display name of the template |
| description | Detailed description of the template’s purpose |
| language | Language enum value from the template, such as Language.ENGLISH_AMERICAN for en-us |
| eval_type | Type of evaluation: Single, Double, or Triple |
| created_time | datetime.datetime value for when the template was created |
| updated_time | datetime.datetime value for when the template was last modified |
get_stats_json_by_id()
Returns a list of JSONs containing the statistics of each stimulus for the evaluation referenced by the id.
Evaluation id. See get_evaluation_list().
Group by criteria. Options are “question”, “script”, or “model”. Default is “question”. Note that “script” and “model” are only available for single-question evaluations.
| Field | Description | SCORED | NON_SCORED |
|---|---|---|---|
| frequency | List of score counts: {"score": number, "count": number}[] | ✓ | - |
| mean | Average score | ✓ | - |
| median | Median score | ✓ | - |
| std | Standard deviation | ✓ | - |
| sem | Standard error of the mean | ✓ | - |
| ci_95 | 95% confidence interval | ✓ | - |
| options | Each option name as key with integer value | ✓ | ✓ |
| OTHER | The number of evaluators who selected “Other” | ✓ | ✓ |
For NON_SCORED questions:
- The integer value is the number of evaluators who selected the option.
- All options are included in the response regardless of their value.
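To show how the SCORED fields relate to one another, the sketch below recomputes mean, median, std, sem, and ci_95 from a frequency list of the shape given in the table. It rests on assumptions that this page does not confirm: the sample standard deviation (n - 1 denominator) is used, the interval is the normal approximation mean ± 1.96 × sem, and ci_95 is returned as a (low, high) tuple. The service may compute or format these values differently.

```python
import math
import statistics


def summarize(frequency):
    """Recompute summary statistics from [{"score": s, "count": c}, ...].

    Illustrative only; the formulas are assumptions, not the service's code.
    """
    # Expand the histogram back into one entry per rating.
    scores = [row["score"] for row in frequency for _ in range(row["count"])]
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    sem = std / math.sqrt(len(scores))
    half_width = 1.96 * sem  # normal approximation for a 95% interval
    return {
        "mean": mean,
        "median": statistics.median(scores),
        "std": std,
        "sem": sem,
        "ci_95": (mean - half_width, mean + half_width),
    }
```

A recomputation like this can serve as a sanity check on a downloaded report, as long as small differences caused by the assumed formulas are expected.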
You can get the statistics of each question by calling get_stats_json_by_id() with group_by set to question, script, or model.
download_evaluation_files_by_evaluation_id()
Download all files associated with a specific evaluation, identified by its evaluation_id, from the Podonos evaluation service. The files are saved to a specified directory on the local file system, and a metadata file describing the downloaded files is generated.
Returns a string indicating the status of the download operation: a success message, or an error message if the download fails.
Evaluation id. See get_evaluation_list().
The directory path where the downloaded files will be saved. This should be a valid path on the local file system where the user has write permissions.
| Field | Description |
|---|---|
| file_path | The path produced by joining output_dir with {model_tag}/{hashed_file_name}. It can be relative or absolute depending on the output_dir you pass. |
| original_name | The original name of the file before downloading. |
| model_tag | The model tag associated with the file, used for categorization. |
| tags | A list of tags associated with the file, providing additional context or categorization. |
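The file_path field in the table is a plain path join, and the metadata entries can be regrouped by model_tag once loaded. The sketch below illustrates both. The helper names are hypothetical, the on-disk format of the metadata file is not described here (the entries are assumed to be already loaded as a list of dictionaries with the fields above), and how the hashed file name is derived is not shown.

```python
import os


def downloaded_file_path(output_dir, model_tag, hashed_file_name):
    """Join output_dir with {model_tag}/{hashed_file_name}."""
    return os.path.join(output_dir, model_tag, hashed_file_name)


def group_paths_by_model_tag(entries):
    """Map each model_tag to the file_path values recorded for it.

    entries is assumed to be a list of dicts with the fields in the table.
    """
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry["model_tag"], []).append(entry["file_path"])
    return grouped
```

Because the join keeps output_dir as given, a relative output_dir yields relative paths and an absolute one yields absolute paths, matching the note in the table.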
File Naming Convention: Each downloaded file is saved in the format {output_dir}/{model_tag}/{file_name}. Files are organized into subdirectories named after their model_tag, and the saved file name is a hashed form of the original file name.
File
A class representing one file, used for adding files in Evaluator.
Path to the file to evaluate. For audio files, we support wav, mp3, and flac formats.
Name of your model (e.g., WhisperTTS) or any unique name (e.g., human).
A list of string tags for the file designated by path. You can use this field for properties of the file such as original, synthesized, tom, maria, and so on. Later you can look up or group files with particular tags in the output report.
Text script of the input audio file.
True if this file works as a reference in a comparative evaluation.
Optional metadata dictionary for the file. Keys must be strings, and values must be JSON-primitive values.
Optional tags for the script. These are useful for ranking and script-level grouping.
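The metadata constraint above (string keys, JSON-primitive values) can be verified before constructing a File. The sketch below is a hypothetical check: it treats strings, numbers, booleans, and None as JSON primitives, which is an assumption about what the SDK accepts, and it rejects nested lists and dictionaries.

```python
JSON_PRIMITIVES = (str, int, float, bool, type(None))


def check_metadata(metadata):
    """Return metadata unchanged, or raise TypeError on an invalid entry.

    Hypothetical helper; the accepted value types are an assumption.
    """
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} is not a string")
        if not isinstance(value, JSON_PRIMITIVES):
            raise TypeError(f"metadata value for {key!r} is not a JSON primitive")
    return metadata
```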
Evaluator
Evaluator manages a single type of evaluation.
add_file()
Add one file to evaluate in a single evaluation question. For a single-file evaluation like NMOS, one file to evaluate is added.
Input File. This field is required if type is NMOS, QMOS, P808, or CUSTOM_SINGLE.
add_files()
Add multiple files for evaluations that require comparison.
First input file.
Second input file.
Optional third input file. Used for CSMOS evaluations with one reference and two stimuli.
| Type | File requirements |
|---|---|
| PREF, CUSTOM_DOUBLE | Two ordered stimulus files |
| SMOS | Two unordered stimulus files |
| CMOS | One reference file and one stimulus file |
| CSMOS | Two stimulus files followed by one reference file (file2) |
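The file requirements in the table can be expressed as a count of files and references per type, plus the CSMOS ordering rule. The sketch below checks a sequence of reference flags (one boolean per file, in the order passed) against those requirements. It is a hypothetical helper and assumes that a stimulus is simply a file that is not marked as a reference.

```python
# Evaluation type -> (number of files, number of reference files)
REQUIREMENTS = {
    "PREF": (2, 0),
    "CUSTOM_DOUBLE": (2, 0),
    "SMOS": (2, 0),
    "CMOS": (2, 1),
    "CSMOS": (3, 1),
}


def roles_match(eval_type, is_ref_flags):
    """True if the reference flags fit the table's requirement for eval_type.

    Hypothetical helper; not part of the SDK.
    """
    n_files, n_refs = REQUIREMENTS[eval_type]
    flags = [bool(flag) for flag in is_ref_flags]
    if len(flags) != n_files or sum(flags) != n_refs:
        return False
    if eval_type == "CSMOS":
        # The table lists the reference after the two stimuli.
        return flags[-1]
    return True
```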
add_ranking_set()
Add one ranking set for a RANKING evaluation.
Ordered candidate files for one ranking group. Files must be stimuli, not references.
- All groups must have the same number of files.
- The order of model_tag must be identical across groups.
- Files must be stimuli; do not set is_ref=True.
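The three ranking-set rules above can be checked across all groups before any of them is added. In the sketch below each group is represented as a list of (model_tag, is_ref) pairs rather than File objects, so the example runs without the SDK; the function name and this representation are assumptions made for illustration.

```python
def check_ranking_sets(groups):
    """Return a list of rule violations across ranking groups (empty if none).

    groups: list of groups, each a list of (model_tag, is_ref) pairs.
    Hypothetical helper; not part of the SDK.
    """
    if not groups:
        return []
    problems = []
    first_tags = [tag for tag, _ in groups[0]]
    for i, group in enumerate(groups):
        if len(group) != len(first_tags):
            problems.append(
                f"group {i}: expected {len(first_tags)} files, got {len(group)}"
            )
        elif [tag for tag, _ in group] != first_tags:
            problems.append(f"group {i}: model_tag order differs from group 0")
        if any(is_ref for _, is_ref in group):
            problems.append(f"group {i}: contains a reference file")
    return problems
```

Using the first group as the baseline means any later group that differs in size or model_tag order is reported with its index.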
close()
Close the evaluation session. Once this function is called, all the evaluation files are sent to the Podonos evaluation service, go through a series of processing steps, and are delivered to evaluators. Returns a JSON object containing the upload status.

