MiniMax Voice Clone is a speech synthesis pipeline built on the Speech-02 and Speech 2.6 HD/Turbo model families. It needs only a few seconds of reference audio and no transcript. From that sample it produces a reusable Voice ID that preserves the speaker's timbre, accent, and prosody. It supports more than 40 languages and handles cross-lingual code-switching and expressive storytelling. The Turbo model targets latency under 250 ms, which makes the API suitable for real-time interactive dialogue, games, and branded voice experiences.
Request Body Params (application/json, required)
{
"audio": "https://www.runninghub.cn/view?filename=8ff07bf7a789afcbe91a8da77a07d2ef8d8137a65a6e60bb956a1d0fcbf319b7.wav&type=input&subfolder=&Rh-Comfy-Auth=eyJ1c2VySWQiOiIzZjY1MTNlNWEwNjY1N2I4OGYyNjU5NTEzYmU3ZDM0YyIsInNpZ25FeHBpcmUiOjE3NzE0MDg4OTQ3MjksInRzIjoxNzcwODA0MDk0NzI5LCJzaWduIjoiZGI3MmMwZTgxYjM5ZmNkYzMxNzlkNDBmYTczNDE0ZWEifQ==&Rh-Identify=3f6513e5a06657b88f2659513be7d34c&rand=0.06611614675835809",
"custom_voice_id": "Elegant_Man",
"text": "A cutting-edge voice cloning engine built on Speech-02 and the latest Speech 2.6 HD/Turbo series. With only a few seconds of sample audio, it achieves high-fidelity zero-shot cloning, precisely replicating the target speaker's timbre, accent, and distinctive narrative style.",
"accuracy": 0.7,
"need_noise_reduction": false,
"need_volume_normalization": false,
"model": "speech-02-hd"
}
Request Code Samples
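The sketch below sends the same request in Python and uses only the standard library. It mirrors the URL, headers, and body fields of the curl sample on this page. The helper names `build_voice_clone_request` and `submit` are hypothetical. The accuracy check (0 to 1) is also an assumption, since the page only shows the value 0.7.

```python
import json
import urllib.request

# Endpoint taken from the curl sample on this page.
VOICE_CLONE_URL = "https://www.runninghub.ai/openapi/v2/rhart-audio/text-to-audio/voice-clone"


def build_voice_clone_request(api_key: str, audio_url: str, text: str,
                              custom_voice_id: str, model: str = "speech-02-hd",
                              accuracy: float = 0.7,
                              need_noise_reduction: bool = False,
                              need_volume_normalization: bool = False) -> urllib.request.Request:
    """Build (but do not send) the voice-clone POST request (hypothetical helper)."""
    # Assumption: accuracy lies in [0, 1]; only 0.7 is documented.
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy is assumed to be between 0 and 1")
    body = {
        "audio": audio_url,
        "custom_voice_id": custom_voice_id,
        "text": text,
        "accuracy": accuracy,
        "need_noise_reduction": need_noise_reduction,
        "need_volume_normalization": need_volume_normalization,
        "model": model,
    }
    return urllib.request.Request(
        VOICE_CLONE_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON task record it returns."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a real network call):
# req = build_voice_clone_request("YOUR_API_KEY", "https://example.com/sample.wav",
#                                 "Hello there.", "Elegant_Man")
# task = submit(req)
# print(task["taskId"], task["status"])
```

Building the request separately from sending it lets you inspect or unit-test the payload before any network call is made.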
curl --location --request POST 'https://www.runninghub.ai/openapi/v2/rhart-audio/text-to-audio/voice-clone' \
--header 'Authorization: Bearer [Your API KEY]' \
--header 'Content-Type: application/json' \
--data-raw '{
"audio": "https://www.runninghub.cn/view?filename=8ff07bf7a789afcbe91a8da77a07d2ef8d8137a65a6e60bb956a1d0fcbf319b7.wav&type=input&subfolder=&Rh-Comfy-Auth=eyJ1c2VySWQiOiIzZjY1MTNlNWEwNjY1N2I4OGYyNjU5NTEzYmU3ZDM0YyIsInNpZ25FeHBpcmUiOjE3NzE0MDg4OTQ3MjksInRzIjoxNzcwODA0MDk0NzI5LCJzaWduIjoiZGI3MmMwZTgxYjM5ZmNkYzMxNzlkNDBmYTczNDE0ZWEifQ==&Rh-Identify=3f6513e5a06657b88f2659513be7d34c&rand=0.06611614675835809",
"custom_voice_id": "Elegant_Man",
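"text": "A cutting-edge voice cloning engine built on Speech-02 and the latest Speech 2.6 HD/Turbo series. With only a few seconds of sample audio, it achieves high-fidelity zero-shot cloning, precisely replicating the target speaker's timbre, accent, and distinctive narrative style.",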
"accuracy": 0.7,
"need_noise_reduction": false,
"need_volume_normalization": false,
"model": "speech-02-hd"
}'
Responses (application/json)
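The call is asynchronous. It returns a task record like the one below, with status RUNNING and results null. Retrieve the finished result from the task result query endpoint: /openapi/v2/query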
{
"taskId": "2013508786110730241",
"status": "RUNNING",
"errorCode": "",
"errorMessage": "",
"results": null,
"clientId": "f828b9af25161bc066ef152db7b29ccc",
"promptTips": "{\"result\": true, \"error\": null, \"outputs_to_execute\": [\"4\"], \"node_errors\": {}}"
}
Modified at 2026-03-12 20:00:01
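The submit call returns a task record rather than audio, so a client has to poll until the task leaves the RUNNING state. The sketch below assumes the query endpoint accepts a POST with a JSON body of the form {"taskId": ...} and the same Bearer header. That request shape, the helper names, and the polling interval and timeout are all hypothetical. RUNNING is the only status shown on this page, so the sketch treats it as in-progress and any other status as final. It also decodes promptTips, which arrives as a JSON-encoded string.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable

# Assumption: the query endpoint lives on the same host as the submit endpoint.
QUERY_URL = "https://www.runninghub.ai/openapi/v2/query"


def parse_task(record: dict) -> dict:
    """Normalize a task record; promptTips is itself a JSON-encoded string."""
    tips = record.get("promptTips") or "{}"
    return {
        "task_id": record["taskId"],
        "status": record["status"],
        # Empty errorMessage/errorCode strings collapse to None.
        "error": record.get("errorMessage") or record.get("errorCode") or None,
        "results": record.get("results"),
        "prompt_tips": json.loads(tips),
    }


def query_once(api_key: str, task_id: str) -> dict:
    """Hypothetical query call: POST {"taskId": ...} to QUERY_URL (shape assumed)."""
    req = urllib.request.Request(
        QUERY_URL,
        data=json.dumps({"taskId": task_id}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_result(fetch: Callable[[], dict], in_progress: Iterable[str] = ("RUNNING",),
                    interval_s: float = 2.0, timeout_s: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch() until the status leaves the in-progress set or time runs out."""
    pending = set(in_progress)
    waited = 0.0
    while True:
        task = parse_task(fetch())
        if task["status"] not in pending:
            return task  # final state: inspect task["error"] and task["results"]
        if waited >= timeout_s:
            raise TimeoutError(f"task {task['task_id']} still {task['status']}")
        sleep(interval_s)
        waited += interval_s


# Usage (hypothetical; task ID taken from the sample response):
# task = wait_for_result(lambda: query_once("YOUR_API_KEY", "2013508786110730241"))
```

`fetch` and `sleep` are passed in as parameters so the polling loop can be exercised with canned responses and no real waiting.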