minimax/voice-clone

POST /openapi/v2/rhart-audio/text-to-audio/voice-clone

MiniMax Voice Clone is a speech synthesis pipeline built on the Speech-02 and Speech 2.6 HD/Turbo models. It turns a few seconds of reference audio into a reusable Voice ID that preserves the speaker's timbre, accent, and prosody, with no transcript required. It supports 40+ languages, including cross-lingual code-switching and expressive narration. The Turbo model delivers sub-250 ms latency, which makes it suitable for real-time dialogue, games, and branded voice applications.

Request

Authorization

Header Params

Body Params application/json (required)

Example

{
    "audio": "https://www.runninghub.cn/view?filename=8ff07bf7a789afcbe91a8da77a07d2ef8d8137a65a6e60bb956a1d0fcbf319b7.wav&type=input&subfolder=&Rh-Comfy-Auth=eyJ1c2VySWQiOiIzZjY1MTNlNWEwNjY1N2I4OGYyNjU5NTEzYmU3ZDM0YyIsInNpZ25FeHBpcmUiOjE3NzE0MDg4OTQ3MjksInRzIjoxNzcwODA0MDk0NzI5LCJzaWduIjoiZGI3MmMwZTgxYjM5ZmNkYzMxNzlkNDBmYTczNDE0ZWEifQ==&Rh-Identify=3f6513e5a06657b88f2659513be7d34c&rand=0.06611614675835809",
    "custom_voice_id": "Elegant_Man",
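    "text": "A cutting-edge voice cloning engine built on Speech-02 and the latest Speech 2.6 HD/Turbo series. It needs only a few seconds of sample audio to achieve high-fidelity zero-shot cloning, accurately reproducing the target speaker's timbre, accent, and distinctive narrative style.",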
    "accuracy": 0.7,
    "need_noise_reduction": false,
    "need_volume_normalization": false,
    "model": "speech-02-hd"
}

Request Code Samples

Shell

curl --location --request POST 'https://www.runninghub.ai/openapi/v2/rhart-audio/text-to-audio/voice-clone' \
--header 'Authorization: Bearer [Your API KEY]' \
--header 'Content-Type: application/json' \
--data-raw '{
    "audio": "https://www.runninghub.cn/view?filename=8ff07bf7a789afcbe91a8da77a07d2ef8d8137a65a6e60bb956a1d0fcbf319b7.wav&type=input&subfolder=&Rh-Comfy-Auth=eyJ1c2VySWQiOiIzZjY1MTNlNWEwNjY1N2I4OGYyNjU5NTEzYmU3ZDM0YyIsInNpZ25FeHBpcmUiOjE3NzE0MDg4OTQ3MjksInRzIjoxNzcwODA0MDk0NzI5LCJzaWduIjoiZGI3MmMwZTgxYjM5ZmNkYzMxNzlkNDBmYTczNDE0ZWEifQ==&Rh-Identify=3f6513e5a06657b88f2659513be7d34c&rand=0.06611614675835809",
    "custom_voice_id": "Elegant_Man",
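    "text": "A cutting-edge voice cloning engine built on Speech-02 and the latest Speech 2.6 HD/Turbo series. It needs only a few seconds of sample audio to achieve high-fidelity zero-shot cloning, accurately reproducing the target speaker'\''s timbre, accent, and distinctive narrative style.",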
    "accuracy": 0.7,
    "need_noise_reduction": false,
    "need_volume_normalization": false,
    "model": "speech-02-hd"
}'
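
The same request can be built in Python with only the standard library. This is a minimal sketch, not an official client: `build_voice_clone_request` is a hypothetical helper name, the reference audio URL in the usage section is a placeholder, and the field names and default values are simply copied from the example body above. The request is only constructed here; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Endpoint taken from the curl sample above.
API_URL = "https://www.runninghub.ai/openapi/v2/rhart-audio/text-to-audio/voice-clone"


def build_voice_clone_request(api_key, audio_url, text, custom_voice_id,
                              model="speech-02-hd", accuracy=0.7):
    """Build (but do not send) the POST request for the voice-clone endpoint.

    Hypothetical helper: field names mirror the example body, and the
    defaults are the values shown in that example, not documented defaults.
    """
    body = {
        "audio": audio_url,
        "custom_voice_id": custom_voice_id,
        "text": text,
        "accuracy": accuracy,
        "need_noise_reduction": False,
        "need_volume_normalization": False,
        "model": model,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_voice_clone_request(
        api_key="YOUR_API_KEY",
        audio_url="https://example.com/reference.wav",  # placeholder URL
        text="Hello from a cloned voice.",
        custom_voice_id="Elegant_Man",
    )
    print(req.get_method(), req.full_url)
    # To submit: urllib.request.urlopen(req), then decode the JSON reply.
```

Keeping construction separate from sending makes it possible to inspect the headers and body before any API call is made.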

Responses

🟢 200 Success

application/json

Task result query endpoint: /openapi/v2/query

Body

Examples

Submit response example

{
    "taskId": "2013508786110730241",
    "status": "RUNNING",
    "errorCode": "",
    "errorMessage": "",
    "results": null,
    "clientId": "f828b9af25161bc066ef152db7b29ccc",
    "promptTips": "{\"result\": true, \"error\": null, \"outputs_to_execute\": [\"4\"], \"node_errors\": {}}"
}
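
Two details of this response are easy to miss: `results` is `null` while the task is still `RUNNING`, and `promptTips` is a JSON document encoded as a string, so it needs a second decode. The sketch below illustrates both, along with a simple polling loop. It rests on assumptions this page does not confirm: `fetch_status` is a hypothetical caller-supplied function standing in for a call to `/openapi/v2/query` (whose request format is not shown here), and any status other than `RUNNING` is treated as final.

```python
import json
import time

# The submit response example from above, verbatim.
SUBMIT_RESPONSE = r"""{
    "taskId": "2013508786110730241",
    "status": "RUNNING",
    "errorCode": "",
    "errorMessage": "",
    "results": null,
    "clientId": "f828b9af25161bc066ef152db7b29ccc",
    "promptTips": "{\"result\": true, \"error\": null, \"outputs_to_execute\": [\"4\"], \"node_errors\": {}}"
}"""


def parse_submit_response(raw):
    """Decode the response and the nested JSON string in promptTips."""
    response = json.loads(raw)
    tips = response.get("promptTips")
    # promptTips arrives as a string; decode it a second time when present.
    response["promptTips"] = json.loads(tips) if tips else None
    return response


def wait_for_task(task_id, fetch_status, interval=2.0, max_attempts=30):
    """Poll until the task leaves RUNNING, or give up after max_attempts.

    fetch_status is a hypothetical callable that queries /openapi/v2/query
    for task_id and returns the decoded response as a dict.
    """
    for _ in range(max_attempts):
        response = fetch_status(task_id)
        # Assumption: any status other than RUNNING is a final state.
        if response.get("status") != "RUNNING":
            return response
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still RUNNING after {max_attempts} polls")


if __name__ == "__main__":
    submitted = parse_submit_response(SUBMIT_RESPONSE)
    print(submitted["taskId"], submitted["status"])
    print(submitted["promptTips"]["outputs_to_execute"])
```

Passing the fetcher in as an argument keeps the loop independent of the HTTP details, which have to come from the query endpoint's own documentation.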

Modified at 2026-03-12 20:00:01
