# RunningHub-API

## Docs

- [Log of Update](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287462.md)
- Getting Started [Instructions for Use](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287463.md)
- Getting Started [About nodeInfoList](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287464.md)
- Getting Started [About Enterprise ComfyUI API](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287465.md)
- Getting Started [Native ComfyUI API Integration Guide](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287466.md)
- Getting Started [API Error Code Reference](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287467.md)
- Quick Create [About Quick Create Invocation](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287468.md)
- Integration Example [Complete integration example](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287469.md)
- Integration Example [Complete Integration Example – Advanced Edition](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287470.md)
- Integration Example [Task Progress Display Example](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287471.md)
- Integration Example [Full Workflow Integration Example](https://www.runninghub.ai/runninghub-api-doc-en/doc-8287472.md)

## API Docs

- Quick Create [Obtain quick create - model library style parameter data](https://www.runninghub.ai/runninghub-api-doc-en/api-425761031.md): This interface serves the Text to Image module under the Quick Create menu. It retrieves style data from the model library, which is used to fill in the parameters when calling the Quick Create API.
- Quick Create [Initiate Quick Create Task](https://www.runninghub.ai/runninghub-api-doc-en/api-425761032.md): Under the Quick Create menu, select the module page you want to use and click the "API Access" option to view the examples; input parameters such as webappId, quickCreateCode, and nodeInfoList can be obtained there. For more details, please refer to the "About Quick Create Invocation" instructions. A request sketch appears at the end of this index.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-image-to-video-q3-pro-fast](https://www.runninghub.ai/runninghub-api-doc-en/api-432893794.md): Vidu Q3-pro-fast image-to-video model generates high-quality video from a reference image as the start frame. Features audio-video synchronization at the fastest speed with Q3-pro level quality.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-start-end-to-video-q3-pro-fast](https://www.runninghub.ai/runninghub-api-doc-en/api-432893793.md): Vidu Q3-pro-fast start-end-to-video model generates smooth transition videos from specified start and end frame images. Features audio-video sync at the fastest speed with Q3-pro level quality.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-start-end-to-video-q2-turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-425766774.md): By leveraging bi-frame guidance, it anchors subject identity and layout between start and end images with exceptional efficiency. The Turbo pipeline optimizes temporal smoothing to eliminate flicker while maintaining structural integrity at high throughput. It intelligently respects depth and parallax, offering adaptive camera movements and ultra-fast turnarounds. This makes it the perfect choice for creators who require cinematic coherence and logical motion during high-frequency creative iterations.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-start-end-to-video-q2-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766775.md): Vidu Q2 Start End Pro is a professional-grade model engineered for seamless video interpolation between a start and end frame. Utilizing advanced bi-frame guidance, it infers natural, object-aware motion while maintaining strict consistency in identity, lighting, and layout. It is the definitive tool for bridging shots and complex transitions, expertly simulating cinematic camera paths while preserving intricate details like faces and hands. The result is a flicker-free, high-fidelity sequence that balances sharp textures with authentic, smooth cinematic movement.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-image-to-video-q2-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766776.md): Vidu Q2 Pro is a high-performance Image-to-Video AI designed to breathe cinematic life into still frames. It excels at generating natural camera pans and fluid character animations while maintaining rigorous structural integrity. Unlike traditional tools, Q2 Pro focuses on "Identity Preservation," ensuring faces, hands, and intricate textures remain sharp and distortion-free. With its layout-aware dynamics, it delivers realistic parallax and consistent lighting, making backgrounds and foregrounds move in perfect harmony. Optimized for speed and quality, it empowers creators to produce social-ready, professional-grade clips that bridge the gap between photography and high-end cinematography.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-image-to-video-q2-turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-425766777.md): Vidu Q2 Turbo is a high-speed Image-to-Video engine optimized for rapid creative iteration. It transforms static references into fluid, cinematic clips with unparalleled efficiency. The "Turbo Temporal Smoothing" technology ensures stable motion by eliminating flicker, while preserving intricate details like faces and textures. Featuring depth-aware motion and natural camera paths (such as pans and dollies), it maintains structural integrity without "rubbery" distortion. Designed for creators who demand fast turnarounds, Q2 Turbo is the ideal solution for high-throughput workflows, seamless transitions, and dynamic social storytelling.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-image-to-video-q3-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766778.md): Vidu Q3 Image-to-Video breathes "audio-visual life" into static imagery. It inherits native aspect ratios and styles while intelligently synthesizing matching ambient soundscapes and dialogue. Its core strength lies in deep compositional understanding, utilizing Smart Cutting to evolve a single frame into a 16-second narrative sequence. With up to 2K fidelity and precise text rendering, it ensures every generated movement and sound remains in perfect harmony with the original image's physical and aesthetic logic.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-image-to-video-q2-pro-fast](https://www.runninghub.ai/runninghub-api-doc-en/api-425766779.md): Vidu Q2 Pro Fast bridges the gap between cinematic quality and rapid production. By leveraging the high-fidelity DNA of Q2 Pro with accelerated processing, it transforms static images into fluid, professional videos in seconds. With superior object-aware consistency and intuitive camera path estimation, it empowers creators to iterate faster without compromising on detail, making it the premier choice for agile, high-end content creation.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-start-end-to-video-q2-pro-fast](https://www.runninghub.ai/runninghub-api-doc-en/api-425766780.md): Vidu Q2 Pro Fast (Start-End) excels in creating seamless video transitions by anchoring motion between two keyframes. Integrating the cinematic fidelity of Q2 Pro with an accelerated processing engine, it ensures absolute consistency in identity, lighting, and layout. This model minimizes temporal flickering while simulating natural camera paths and complex object motion, making it an essential tool for creators who demand professional-grade continuity and rapid iteration in visual storytelling.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-image-to-video-q3-turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-425766781.md): Vidu-Image-to-Video-q3-turbo specializes in expanding static concepts into narratively rich 16-second sequences with unparalleled subject consistency. Utilizing the Q3 native multimodal architecture, it transforms images into coherent narratives with logical cause-and-effect movements. The model significantly reduces flickering and structural distortion, delivering cinematic transitions and synchronized audio directly from a single visual reference. It is optimized for speed, enabling creators to rapidly iterate from keyframes to production-ready narrative units.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-start-end-to-video-q3-turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-425766782.md): Vidu-Start-End-Frame-q3-turbo is a high-efficiency trajectory-controlled engine designed for rapid industrial content creation. It enables 16-second synchronized audio-visual output by intelligently mapping the evolution between an initial and a final image. Optimized for speed, the turbo version ensures fluid motion and consistent narrative logic, making it a robust tool for creators needing quick iterations without sacrificing cinematic quality. Its "Director's Mindset" handles camera cuts and audio alignment with professional precision.
- Standard Model API > Video Generation & Processing > image-to-video > Vidu [Vidu-start-end-to-video-q3-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766783.md): Vidu-Start-End-Frame-q3-pro is the flagship model within the Q3 series, pushing the boundaries of physical simulation and visual fidelity. It provides surgical precision over 16-second video trajectories, allowing for intricate narrative arcs between two distinct frames. Engineered for production-grade results, the Pro version excels in rendering complex lighting, textures, and synchronized multimodal audio. It acts as a professional digital cinematographer, ensuring that every transition is logically rigorous, aesthetically breathtaking, and physically accurate.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-v3.0-pro-image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766791.md): Kling V3.0 Pro Image-to-Video is Kuaishou’s flagship image-to-video model, offering the highest tier of visual fidelity and motion realism. Specifically engineered for high-end production, it delivers superior visual detail and cinematic rendering that surpasses the Standard tier. The model introduces advanced start-to-end frame guidance, allowing creators to strictly control transitions between two reference images. With integrated synchronized sound generation and support for dual-character voices, it empowers users to transform static concepts into professional-grade, multimodal cinematic narratives with unparalleled precision.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-video-o3-pro/reference-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-427096755.md): Kling Video O3 Pro Reference-to-Video is a premium engine engineered for unmatched identity consistency across cinematic frames. By extracting features from up to 7 reference images, it ensures that characters, props, and scenes remain perfectly recognizable throughout complex sequences. As the highest fidelity model in the Kling family, it supports video-guided motion for precise action transfer while offering flexible audio options, including original sound retention or AI-generated effects. It provides a robust, professional-grade solution for transforming static references into production-ready narratives with absolute visual integrity.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-video-o1/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766784.md): Kling Omni Video O1 (I2V Standard) transforms static images into high-quality dynamic videos while strictly preserving subject identity and visual consistency. It excels in generating natural motion and realistic physics, ensuring smooth scene dynamics throughout the sequence. Built for professional production, the model offers flexible clip durations when reference frames are provided. With its optimized REST API, fast response times, and predictable pricing, it provides a reliable and cost-effective solution for creators requiring stable, high-frequency video generation without the hindrance of cold starts.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-v3.0-std-image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766792.md): Kling V3.0 Standard Image-to-Video represents Kuaishou's latest advancement in transforming static visuals into high-fidelity, cinematic narratives. Building upon its predecessor V2.6, this version delivers a significant upgrade in motion consistency and visual realism. A standout feature is the start-to-end frame guidance, enabling creators to define precise transitions by uploading two reference points. Complementing its visual prowess, the model integrates synchronized sound effects and dual-voice character support, offering a truly multimodal production suite. With granular CFG scale adjustments, it provides professional-grade control for seamless and expressive animation.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-video-o3-std/reference-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-427096754.md): Kling Video O3 Standard Reference-to-Video is a professional-grade engine engineered for unmatched character and style consistency. By leveraging reference images and optional motion-guiding videos, the model preserves identities and specific aesthetic elements across dynamic scenes. Its standout multi-reference support allows for the seamless blending of diverse characters and elements into a single coherent narrative. With flexible durations of up to 15 seconds and versatile audio options—including original audio retention or new SFX generation—it provides a robust solution for creators demanding high-precision visual storytelling.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-video-o1/start-to-end](https://www.runninghub.ai/runninghub-api-doc-en/api-425766785.md): By specifying both the starting and ending visual states, the model utilizes its advanced spatiotemporal reasoning to synthesize logically consistent intermediate motion and transitions. This capability effectively eliminates the unpredictability of AI video endings, enabling precise execution of character actions, physical transformations, and complex camera movements. Whether for a 5- or 10-second sequence, it ensures cinematic continuity and natural blending, transforming random synthesis into a predictable, professional storytelling engine.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-video-o3-pro-image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766790.md): Kling Video O3 Pro Image-to-Video stands as Kuaishou's most formidable engine for high-fidelity storytelling. Powered by MVL (Multi-modal Visual Language) technology, it excels in maintaining strict subject consistency while integrating advanced physics simulations and natural motion. This Pro-tier model supports flexible durations up to 15 seconds and features optional start-to-end frame guidance for precise narrative control. With synchronized sound generation and cinematic rendering, it transforms static imagery into breathtakingly realistic, production-ready video masterpieces with unparalleled artistic precision.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-video-o3-std-image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766789.md): Kling Video O3 Standard Image-to-Video balances the powerful O3 architecture with exceptional cost-efficiency. It transforms static images into smooth, natural videos with flexible durations ranging from 3 to 15 seconds. By supporting optional start-to-end frame guidance, it ensures precise control over complex motion transitions. Combined with synchronized sound generation, this model provides an affordable yet high-quality solution for creators looking to produce cinematic content with enhanced motion realism.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-elements](https://www.runninghub.ai/runninghub-api-doc-en/api-427096753.md): A multi-element locking feature launched by Kuaishou Kling, designed specifically for video generation. Supports simultaneous locking of 1-4 visual elements across different categories—including human characters, animals, object props, and scene environments. After uploading reference images, the system extracts core features of each element and maintains their visual identity throughout video generation, ensuring consistent appearance regardless of camera movement or lighting changes. Control interactions between elements using prompt tags like "Figure 1/2/3" to achieve complex narratives such as character dialogues, animal performances, and object manipulations. Ideal for AI short dramas, virtual IP operations, product showcases, and creative storytelling requiring cross-shot consistency of multiple elements.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-v2.5-turbo-std/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766788.md): Kling V2.5 Turbo Standard delivers high-quality image-to-video generation at lower cost. From one image and a short prompt, it creates smooth, cinematic 720p clips that faithfully preserve style, lighting, and mood. Despite the resolution, refined motion synthesis ensures clean, stable, and detailed animation suitable for most use cases. Built with optimized pipelines, it offers fast inference for high-volume workflows. Crucially, its text comprehension and narrative coherence match the premium Turbo Pro version—producing well-timed, semantically accurate motion—making it ideal for cost-conscious creators.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-v2.5-turbo-pro/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766786.md): Kling 2.5 Turbo Pro (Image-to-Video) turns one image and a text prompt into cinematic video with fluid motion and strong intent alignment. It features first–last frame control: provide a start and end image, and the model generates a smooth transition between them. A new text-timing engine interprets multi-step instructions for coherent, well-paced scenes. Improved dynamics deliver realistic high-speed action and complex camera moves with minimal jitter or tearing. Refined conditioning preserves color, lighting, and mood—ensuring style consistency even during intense motion. Ideal for ads, social clips, and creative previews requiring speed, realism, and visual fidelity.
- Standard Model API > Video Generation & Processing > image-to-video > kling [kling-v2.6-pro-image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766787.md): Kling 2.6 Audio Image-to-Video elevates static frames into immersive cinematic clips through joint audio-video co-generation. By using an input image as the anchor, it synthesizes realistic motion alongside synchronized soundscapes—including character voices, ambient noise, and SFX—in a single pass. This model ensures perfect alignment between on-screen movements and auditory timing, providing a streamlined, all-in-one solution for creators seeking high-fidelity storytelling, product explainers, and professional-grade social media content.
- Standard Model API > Video Generation & Processing > image-to-video > wan [wan-2.7/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555147.md): Wan 2.7 Image-to-Video transforms reference images into smooth, high-quality cinematic videos. You can guide the generation using a single start frame or define precise transitions by providing both start and end frames. Combined with text prompts, it outputs stunning 720P or 1080P videos up to 15 seconds in duration. Additionally, it supports audio input for synchronized rhythm and pacing, alongside negative prompts to avoid unwanted elements, giving you absolute creative control.
- Standard Model API > Video Generation & Processing > image-to-video > wan [wan-2.2/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-430967128.md): Alibaba's open-source Image-to-Video model (2025), built on pioneering MoE architecture (27B total / 14B active params) for efficient, high-fidelity animation from a single reference image + prompt. Outputs smooth 480P/720P/1080P videos at 24fps (typically 5/8s clips) with excellent motion realism, strong content consistency, and cinematic aesthetic control (lighting, composition, color via prompts). Excels in complex, natural movements without artifacts, ideal for creative shorts, product demos, and effects.
- Standard Model API > Video Generation & Processing > image-to-video > wan [wan-2.6-reference-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766794.md): Wan2.6-r2v is a reference-driven video generation model from Alibaba's Tongyi Wanxiang 2.6 series, supporting multimodal inputs (text/image/video). It supports 720P/1080P resolutions. The model can restore character appearance from reference images or videos, support single-character performance or multi-character interaction, and features intelligent multi-shot scheduling capabilities.
- Standard Model API > Video Generation & Processing > image-to-video > wan [wan-2.6-reference-to-video-flash](https://www.runninghub.ai/runninghub-api-doc-en/api-425766795.md): Wan2.6-r2v-flash is the fast reference-driven video generation model from Alibaba's Tongyi Wanxiang 2.6 series. It supports uploading up to 5 URLs to generate new videos based on character identity, style, and scene layout from the references. This version offers faster generation speed, supports both 720P and 1080P resolutions, offers 2-10 seconds duration options, can generate videos with or without audio, and supports both single-shot and multi-shot narrative modes.
- Standard Model API > Video Generation & Processing > image-to-video > wan [wan-2.6-image-to-video-flash](https://www.runninghub.ai/runninghub-api-doc-en/api-425766793.md): Wan 2.6 I2V Flash is Alibaba's high-speed video generation engine optimized for rapid iteration and narrative depth. It transforms static images into fluid 15-second clips at 1080p resolution, featuring optional synchronized AI audio or custom audio uploads. A standout feature is its Multi-Shot mode, enabling cinematic scene transitions within a single generation. With a built-in prompt enhancer and high-speed Flash architecture, it delivers professional-grade motion and structural consistency with industry-leading turnaround times.
- Standard Model API > Video Generation & Processing > image-to-video > wan [wan-2.2-video/start-to-end](https://www.runninghub.ai/runninghub-api-doc-en/api-430967129.md): Wan 2.2 Image-to-Video is a streamlined model engineered for professional-grade, cinematic production. Delivering sharp, high-quality outputs suitable for final deliverables, it excels in generating stunning visuals, including complex sci-fi scenes. Beyond standard image-to-video capabilities, it offers advanced start-to-end frame interpolation for seamless transitions. With straightforward parameters and robust negative prompt support to exclude unwanted elements, Wan 2.2 provides creators with precise control and a highly efficient workflow.
- Standard Model API > Video Generation & Processing > image-to-video > alibaba > wan [alibaba/wan-2.6/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766796.md): WAN 2.6 Image-to-Video turns an image and prompt into a cinematic 5–15s clip (up to 1080p). It animates while preserving identity, outfit, and style, and can auto-split ideas into consistent multi-shot sequences. Guided by both image and text, it delivers coherent, filmic motion—ideal for social clips, ads, or prototyping.
- Standard Model API > Video Generation & Processing > image-to-video > seedance [seedance-2.0/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555149.md): Seedance 2.0 Image-to-Video, the highest-quality tier. Supports first-frame and first-last-frame modes, transforming static images into 4-15 second dynamic videos with audio generation.
- Standard Model API > Video Generation & Processing > image-to-video > seedance [seedance-2.0-fast/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555148.md): Seedance 2.0 Fast Image-to-Video is optimized for speed and cost efficiency. Supports first-frame and first-last-frame modes for quick image-to-video transformation.
- Standard Model API > Video Generation & Processing > image-to-video > seedance [seedance-v1.5-pro-image-to-video-fast](https://www.runninghub.ai/runninghub-api-doc-en/api-425766797.md): Seedance V1.5 Pro Fast is a high-speed Image-to-Video engine by ByteDance, optimized for rapid iteration and cinematic polish. By anchoring visuals to a first-frame image, it preserves subject identity, lighting, and composition while synthesizing coherent motion and precise camera behaviors like orbits or dollies. Featuring optional audio generation and seeded repeatability, this model delivers a professional, live-action aesthetic, making it a definitive tool for creators requiring quick turnarounds for commercial shorts and high-fidelity social media content.
- Standard Model API > Video Generation & Processing > image-to-video > seedance [seedance-v1.5-pro-image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766798.md): Seedance V1.5 Pro (I2V) by ByteDance is a specialized model designed to "bring keyframes to life" with precision. It transforms a single reference image into a coherent video clip, preserving the original composition while introducing prompt-guided motion and camera dynamics. From subtle portraits to cinematic product pans, it offers expert control over movements like dolly-ins or handheld shakes. With flexible aspect ratios and tunable resolution, it provides a practical and efficient workflow for creators targeting social feeds, stories, and professional digital banners.
- Standard Model API > Video Generation & Processing > image-to-video > hailuo [hailuo-02-i2v-standard](https://www.runninghub.ai/runninghub-api-doc-en/api-425766804.md): Hailuo 02 (Standard, I2V) is an advanced image-to-video model built on the evolving MiniMax framework. It is engineered to deliver native 768p clarity, ensuring crisp frames and high-fidelity visuals without upscaling. The model excels in simulating robust physics and complex motion, making it ideal for action-heavy scenes involving realistic debris, cloth dynamics, and handheld camera shakes. With its high prompt responsiveness and superior cinematic continuity, it faithfully follows detailed scene directions while maintaining smooth frame-to-frame transitions with minimal artifacts. It bridges the gap between static imagery and dynamic, believable cinematography.
- Standard Model API > Video Generation & Processing > image-to-video > hailuo [hailuo-02-standard](https://www.runninghub.ai/runninghub-api-doc-en/api-425766799.md): Hailuo 02 Standard is a versatile unified model within the MiniMax framework, offering both Text-to-Video and Image-to-Video capabilities. It delivers native 768p resolution for crisp, high-quality frames without upscaling. The model is highly regarded for its realistic physics and motion, accurately simulating natural effects like debris, water, and cloth alongside authentic handheld camera shakes. Supporting 6s and 10s durations, it ensures strong prompt adherence and stable temporal transitions. With its consistent and repeatable outputs, Hailuo 02 Standard is the perfect tool for creators requiring reliable motion and cinematic continuity.
- Standard Model API > Video Generation & Processing > image-to-video > hailuo [hailuo-2.3/i2v-standard](https://www.runninghub.ai/runninghub-api-doc-en/api-425766800.md): MiniMax’s premier image-to-video model, crafted to animate static images into seamless cinematic clips. It merges natural motion synthesis with high physical realism, bringing still visuals to life with precision. The model excels in cinematic camera moves like panning and zooming while realistically simulating dynamics such as wind and light reflections. By strictly preserving the original composition, lighting, and character details, it ensures structural consistency throughout the animation. Offering flexible 6- or 10-second durations, it delivers professional-grade fidelity ideal for storytelling, advertising, and high-end product demonstrations.
- Standard Model API > Video Generation & Processing > image-to-video > hailuo [hailuo-2.3-fast/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766802.md): A high-speed evolution of MiniMax’s premier video generation model, delivering cinematic quality 30% to 50% faster than the standard version. It allows creators to produce 6–10 second clips up to twice as fast, significantly reducing costs while maintaining impressive visual stability and smooth motion. Ideal for batch content creation and rapid iteration, the model includes a built-in safety checker and automatic prompt enhancement to optimize 768p outputs. It perfectly balances speed and quality, empowering creators to maintain high-throughput workflows without sacrificing professional-grade fidelity.
- Standard Model API > Video Generation & Processing > image-to-video > hailuo [hailuo-2.3-fast-pro/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766803.md): MiniMax’s high-speed 1080p video engine, engineered for creators who demand professional-grade quality with rapid turnaround. It delivers cinematic videos 30% to 50% faster than standard models, offering up to twice the generation speed for 6-second clips. Despite its accelerated performance, it maintains exceptional visual stability, clear details, and balanced lighting. With features like automatic prompt enhancement and a built-in safety checker, Fast Pro optimizes high-throughput workflows and batch creation. It is the ultimate tool for achieving high-fidelity 1080p results in half the time.
- Standard Model API > Video Generation & Processing > image-to-video > hailuo [hailuo-2.3/i2v-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766801.md): Hailuo 2.3 Pro represents the pinnacle of MiniMax’s image-to-video technology, transforming static images into stunning, native 1080p cinematic videos. Designed for professional creators and studios, it features next-generation motion rendering that accurately simulates complex lighting shifts, realistic physics, and organic fabric movements. The model ensures impeccable visual fidelity and stable composition across every frame, delivering film-grade warmth and depth in just 5 seconds. It bridges the gap between still photography and high-end cinematography, offering a seamless and efficient solution for professional-grade digital storytelling.
- Standard Model API > Video Generation & Processing > image-to-video > hailuo [hailuo-02-i2v-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766805.md): Hailuo 02 (I2V Pro) represents a milestone in AI video generation, engineered for high-end cinematic realism and physical accuracy. It delivers native 1080P HD output directly from the model, ensuring professional-grade clarity and fine textures across all frames. With enhanced motion dynamics and physics simulation, it captures complex actions and lighting transitions with lifelike precision. Featuring 5-second duration options and intelligent scene transitions, Hailuo 02 ensures seamless continuity and strong prompt adherence. It is the premier choice for creators seeking predictable, high-fidelity results for advertising and cinematic storytelling.
- Standard Model API > Video Generation & Processing > image-to-video > hailuo [hailuo-02-fast](https://www.runninghub.ai/runninghub-api-doc-en/api-425766806.md): Hailuo 02 Fast is the high-throughput variant of the Hailuo 02 engine, optimized for rapid iteration and creative exploration. It animates single static images into smooth 6s or 10s clips with impressive prompt-aware motion and robust physics. Designed for speed and cost-efficiency, it allows creators to quickly prototype story beats and perform batch A/B testing. Despite its focus on velocity, the model maintains a stable temporal flow with minimal flicker, ensuring realistic dynamics for cloth, debris, and camera shakes. It is the definitive creator-friendly tool for predictable, high-speed video production.
- Standard Model API > Video Generation & Processing > image-to-video > midjourney [midjourney-image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766807.md): Midjourney’s video capability excels at capturing the "cinematic essence" of static art. It transforms single frames or dual "start-end" anchors into 5-second high-fidelity sequences with precise motion tracking. Supporting both 480p and 720p resolutions, it maintains absolute stylistic consistency while allowing creators to toggle between subtle ambient shifts and dramatic camera movements. This tool bridges the gap between illustration and motion design, delivering professional-grade visual flow that remains perfectly true to the original artwork’s aesthetic soul.
- Standard Model API > Video Generation & Processing > image-to-video > sora [sora-2/image-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766808.md): OpenAI Sora 2 — Image-to-Video enables users to convert a single reference image into a coherent video clip with perfectly synchronized audio. Leveraging Sora 2’s core technological advances, its image-to-video pipeline can perfectly preserve the subject identity, lighting effects and scene composition, while intelligently synthesizing realistic motion effects and professional camera dynamics for stunning visual presentation. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Video Generation & Processing > image-to-video > sora [sora-2/image-to-video-pro-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766813.md): Sora 2 Image-to-Video Pro transforms a single reference image into a high-fidelity video clip with seamless audio-visual synchronization. Building on Sora 2’s core architecture, it features an advanced "Identity Lock" to preserve faces, style, and composition with unmatched precision. The model infers 3D structures for convincing parallax and realistic physics-aware motion, ensuring secondary elements like cloth and hair behave naturally. Supporting professional resolutions up to 1080p and durations of 4s, 8s, 12s, 16s or 20s, it offers elite steerability for cinematic camera dynamics and synchronized soundscapes, making it a definitive tool for high-end digital storytelling and production. Utilizes official native API protocol, which currently lacks access to personal Cameo libraries from Web/App versions; and does not support @ syntax for character referencing. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Video Generation & Processing > image-to-video > sora [sora-2/image-to-video-pro-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766809.md): OpenAI Sora 2 — Image-to-Video-Pro empowers the conversion of a single reference image into a smooth, coherent video clip with highly synchronized audio. Based on Sora 2’s core advanced algorithms, its pipeline perfectly retains identity, lighting and composition of the reference image, and synthesizes ultra-believable motion trajectories and professional cinematic camera dynamics for premium video output. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Video Generation & Processing > image-to-video > sora [sora-2/text-to-video-pro-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766812.md): Sora 2 Pro is a state-of-the-art video and audio synthesizer that redefines cinematic realism through advanced physics-aware motion and synchronized acoustics. Building on its predecessor, it masters complex scene reasoning, ensuring stable identities and fluid camera movements without warping. The Pro version excels in preserving high-frequency textures and delivering perfectly aligned lip-sync and ambient audio. With flexible options for 4s, 8s, 12s, 16s or 20s durations and professional resolutions, it offers creators unmatched steerability and temporal consistency, transforming text into high-fidelity narratives with flawless physical and structural integrity. Utilizes official native API protocol, which currently lacks access to personal Cameo libraries from Web/App versions; and does not support @ syntax for character referencing. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Video Generation & Processing > image-to-video > sora [sora-2/image-to-video-realistic-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766810.md): Enables converting a single reference image into a coherent, ultra-realistic video clip with perfectly synchronized audio. It integrates all core advantages of the four Sora 2 series products, including identity lock-in, accurate physics, 3D depth perception, cinematic camera moves, fine detail retention and strong steerability. With official direct connection for stable performance, it exclusively supports real-person subject generation, ensuring natural motion and true-to-life visual effects. Utilizes official native API protocol, which currently lacks access to personal Cameo libraries from Web/App versions; and does not support @ syntax for character referencing. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Video Generation & Processing > image-to-video > sora [sora-2/image-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766811.md): Sora 2 (I2V) transforms a single reference image into a coherent, high-fidelity video clip with synchronized audio. It features an advanced "Identity Lock" that preserves faces, textures, and lighting with remarkable precision. By inferring 3D structures, the model delivers convincing parallax and depth transitions. Its physics-aware engine ensures that secondary motions, such as flowing hair or fabric, behave naturally. With support for 4s, 8s, or 12s durations and 720P flexible resolutions, Sora 2 (I2V) offers creators strong steerability and temporal consistency, turning static compositions into vivid, cinematic narratives without sacrificing the integrity of the original reference. Utilizes official native API protocol, which currently lacks access to personal Cameo libraries from Web/App versions; and does not support @ syntax for character referencing. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Video Generation & Processing > image-to-video > sora [sora-2/image-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-439502919.md): OpenAI Sora 2 — Image-to-Video enables users to convert a single reference image into a coherent video clip with perfectly synchronized audio. Leveraging Sora 2’s core technological advances, its image-to-video pipeline can perfectly preserve the subject identity, lighting effects and scene composition, while intelligently synthesizing realistic motion effects and professional camera dynamics for stunning visual presentation. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Video Generation & Processing > image-to-video > sora [sora-2/image-to-video-pro-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-439801955.md): OpenAI Sora 2 — Image-to-Video-Pro empowers the conversion of a single reference image into a smooth, coherent video clip with highly synchronized audio. Based on Sora 2’s core advanced algorithms, its pipeline perfectly retains identity, lighting and composition of the reference image, and synthesizes ultra-believable motion trajectories and professional cinematic camera dynamics for premium video output. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Video Generation & Processing > image-to-video > xai > grok [xai/grok-imagine/image-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766814.md): It is engineered to breathe life into static concepts while maintaining absolute subject identity. By deeply analyzing the geometric structure and material properties of a reference image, it synthesizes motion that aligns perfectly with physical intuition. The model excels in temporal stability and lighting inheritance, ensuring that core elements remain undistorted during intense transitions. It provides a seamless bridge from a single visual reference to a high-tension, cinematic sequence characterized by fluid motion and professional-grade rendering. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Video Generation & Processing > image-to-video > xai > grok [xai/grok-imagine/image-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766815.md): Grok Imagine Video by xAI is a powerful image-to-video generation model designed to bring static images to life. By simply uploading a reference image and providing a motion prompt, users can create cinematic videos featuring smooth, natural movements, seamless scene continuity, and synchronized audio. It is the ultimate tool for transforming still moments into dynamic visual narratives. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Video Generation & Processing > image-to-video > google > veo3.1 [google/veo3.1-pro/start-end-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766816.md): Veo 3.1 Pro represents the next evolution in cinematic video synthesis from DeepMind. It transforms still images or start-and-end frame pairs into high-fidelity 1080p motion sequences with stunning temporal continuity. Standing out with its native audio generation, the model automatically crafts synchronized soundscapes that breathe life into the visuals. Whether executing complex camera dollies or seamless scene morphing, Veo 3.1 delivers professional-grade consistency and narrative depth for storyboarding and creative production. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Video Generation & Processing > image-to-video > google > veo3.1 [google/veo3.1-fast/image-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766819.md): Google Veo3.1 I2V converts static images into cinematic dynamic videos with smooth realistic motion and natural lighting, delivering results 30% faster than the standard version. Preserves original image composition and visual style, enables native synchronized audio generation, supports dialogue & lip-sync, ideal for social content creation, concept visualization and casual creative storytelling with high cost performance. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Video Generation & Processing > image-to-video > google > veo3.1 [google/veo3.1-fast/image-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766821.md): Google Veo 3.1 I2V Fast is the high-speed, cost-optimized variant of DeepMind’s image-to-video model. Delivering results up to 30% faster, it transforms static images into cinematic motion sequences with realistic lighting and synchronized native audio. Uniquely supporting resolutions up to 4K, it features advanced dialogue lip-sync and strict style preservation, making it the perfect tool for rapid social content creation and concept visualization. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Video Generation & Processing > image-to-video > google > veo3.1 [google/veo3.1-pro/image-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766817.md): Google Veo 3.1 I2V is DeepMind’s latest image-to-video evolution, transforming still images or start-end frame pairs into high-fidelity cinematic sequences. Whether animating a single shot or interpolating between two frames, it delivers realistic motion, lighting, and synchronized native audio. With support for up to 4K resolution and flexible aspect ratios, it creates fluid transitions while preserving the original artistic style. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Video Generation & Processing > image-to-video > google > veo3.1 [google/veo3.1-fast/start-end-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766820.md): Veo 3.1 Fast is engineered for creators who prioritize speed and rapid iteration without sacrificing structural control. In Start & End Frame mode, it delivers near-instantaneous interpolation, bridging two visual anchors with fluid motion in seconds. This model is optimized for low-latency workflows and high-volume prototyping. While maximizing throughput, it maintains impressive consistency in geometry and motion logic. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Video Generation & Processing > image-to-video > google > veo3.1 [google/veo3.1-pro/reference-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766818.md): Powered by Google DeepMind’s next-generation architecture, Veo 3.1 Reference-to-Video transforms up to three static images into coherent 8-second cinematic clips. Delivering up to 4K resolution with synchronized native audio, this model excels at visual consistency—preserving subject identity, lighting, and texture across frames. It seamlessly interprets text and visual cues to create smooth, narrative-driven motion for characters and products. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Video Generation & Processing > image-to-video > skyreels [skyreels-v4/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-430967130.md): SkyReels V4 Image-to-Video transforms static images into dynamic short videos at 1080p resolution. Supporting JPG, PNG, GIF, and BMP inputs, it uses text prompts to precisely control motion and camera movement, ideal for animating designs, products, and creative assets.
- Standard Model API > Video Generation & Processing > image-to-video > ltx [ltx-2.3/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-434990842.md): Lightricks' next-generation image-to-video foundation model, delivering comprehensive quality improvements over the LTX-2 series. A completely rebuilt VAE architecture significantly enhances sharpness in hair, text, and edge details while greatly reducing frozen frames and static Ken Burns effects for more authentic motion. Native 9:16 portrait support enables direct creation of social media-native content without cropping. Generates matching ambient sound effects and visual motion in a single pass, achieving perfect audio-visual alignment within 5-20 second durations, truly bringing static photos to life.
- Standard Model API > Video Generation & Processing > image-to-video > ltx [ltx-2.3/image-to-video-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-434990843.md): A LoRA inference version designed for users needing personalized visual styles in image-to-video generation. Building on LTX-2.3's foundation, it supports simultaneous loading of up to three custom LoRA adapters to directly inject brand-specific aesthetics, character likenesses, or camera languages into the generation pipeline. Whether maintaining consistent product visuals, ensuring character continuity across shots, or achieving specific cinematic camera movements, lightweight LoRA modules provide precise control without retraining the entire model. Ideal for scaled brand content production, IP character animation, and stylized commercial video creation.
- Standard Model API > Video Generation & Processing > image-to-video > pixverse [pixverse-v6/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-439502920.md): PixVerse V6 Image-to-Video animates a reference image into a cinematic video clip with natural motion. Upload a photo, describe the movement, and get high-quality animation preserving subject appearance and composition. Supports 360p-1080p, 1-15s duration, and optional audio. The Thinking mode allows the model to apply extended reasoning to complex or detailed scene descriptions, and the built-in prompt enhancer automatically optimizes action descriptions to improve output quality.
- Standard Model API > Video Generation & Processing > reference-to-video [wan-2.7-reference-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555155.md): Wan 2.7 Reference-to-Video is an advanced AI model that transforms character, prop, or scene references from existing media into entirely new video shots. By uploading reference materials (requiring at least 1 reference image or video, up to a combined maximum of 5), alongside text prompts, the model generates smooth videos that strictly preserve the original identity and visual style. Supporting 720P and 1080P resolutions and equipped with negative prompt capabilities, it seamlessly brings your visual assets into fresh contexts with remarkable consistency.
- Standard Model API > Video Generation & Processing > reference-to-video [skyreels-v3/reference-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-430967133.md): SkyReels V3 Reference-to-Video supports 1-4 reference images for precise subject and style control. Through multi-image guidance, it generates videos that closely match your reference materials, ideal for brand-consistent marketing, character animation, and IP visualization.
- Standard Model API > Video Generation & Processing > reference-to-video [Vidu-reference-to-video-q3](https://www.runninghub.ai/runninghub-api-doc-en/api-437377721.md): Shengshu Technology's latest reference-to-video model from the Vidu Q3 series, designed for professional video generation scenarios. Supports uploading 1-7 images as subject references and 3-16 second audio-visual synchronized output. Excels in intelligent scene cutting and multi-camera consistency, maintaining visual coherence across complex multi-angle scenes. Offers multiple resolution options from 540p to 1080p, compatible with both subject library calls and temporary subject references. Suitable for cinematic content production requiring precise camera control. - Standard Model API > Video Generation & Processing > reference-to-video [seedance-2.0/multimodal-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555154.md): seedance 2.0 Multimodal Video for highest quality. Supports multimodal reference, video editing, and extension. Combine text, images (up to 9), videos (up to 3), and audio (up to 3) to produce 4-15 second high-quality videos. - Standard Model API > Video Generation & Processing > reference-to-video [Vidu-reference-to-video-q2](https://www.runninghub.ai/runninghub-api-doc-en/api-425766822.md): Vidu Q2 Reference-to-Video transforms static concepts into expressive, cinematic narratives with exceptional fidelity. It excels at capturing subtle micro-expressions, rhythmic breathing, and authentic eye movements, breathing life into portraits with remarkable realism. Supporting up to seven reference images for multi-angle guidance, it provides granular control over motion amplitude and complex camera dynamics like pans and zooms. With professional-grade identity preservation and flexible aspect ratios, it is the premier solution for creators seeking to bridge the gap between high-end static imagery and fluid, emotional storytelling. - Standard Model API > Video Generation & Processing > reference-to-video [seedance-2.0-fast/multimodal-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555153.md): seedance 2.0 Fast Multimodal Video, optimized for speed and cost efficiency. Supports multimodal reference, video editing, and extension with flexible multi-modal inputs. - Standard Model API > Video Generation & Processing > reference-to-video [kling-video-o1-std/refrence-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766823.md): Kling Omni Video O1 (Reference-to-Video) is a groundbreaking multi-modal model from Kuaishou, designed to generate creative scenarios while maintaining absolute subject identity. By utilizing advanced feature extraction from multiple reference viewpoints, it preserves the consistent appearance of characters, props, and scene elements throughout the video. This "Identity Lock" technology allows creators to place familiar subjects into entirely new environments with fresh poses and camera movements. It offers unparalleled creative freedom by balancing stable character replication with dynamic motion control, making it an essential tool for high-end, subject-driven video production. - Standard Model API > Video Generation & Processing > reference-to-video [seedance-v1-lite-reference-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766824.md): Seedance V1 Lite is a cutting-edge multi-reference video model from ByteDance, uniquely designed to integrate up to 4 distinct reference images into a single coherent scene. 
It excels at maintaining the identity of humans, animals, and objects while facilitating natural, prompt-guided interactions between them. Featuring an AI-powered prompt enhancer and precise camera control for stable shots, the model empowers creators to build complex multi-subject narratives with high visual fidelity. With seed-based reproducibility and flexible duration, it is a versatile tool for professional-grade storytelling and creative visual production. - Standard Model API > Video Generation & Processing > reference-to-video [skyreels-v4/omni-reference](https://www.runninghub.ai/runninghub-api-doc-en/api-430967134.md): SkyReels Omni Reference is a versatile AI video generation model supporting keyframe guidance, character consistency, motion reference, subject replacement, background replacement, object removal, and more. Using @tag references to flexibly combine image and video inputs, it delivers precise multi-modal video generation for creative advertising, character animation, and professional video editing. - Standard Model API > Video Generation & Processing > reference-to-video [Vidu-reference-to-video-q3-mix](https://www.runninghub.ai/runninghub-api-doc-en/api-437377720.md): The balanced optimization version of Shengshu Technology's Vidu Q3 series reference-to-video model, delivering strong performance in visual quality and motion dynamics. Supports intelligent scene cutting and audio-visual synchronized generation. Generates 1-16 second videos at 720p and 1080p resolutions. Note: Current version does not support subject library calls. Ideal for creative scenarios prioritizing balanced image quality and motion performance without subject library requirements. - Standard Model API > Video Generation & Processing > text-to-video > Vidu [Vidu-text-to-video-q3-pro-fast](https://www.runninghub.ai/runninghub-api-doc-en/api-432893792.md): Vidu Q3-pro-fast text-to-video model delivers Q3-pro level quality at significantly faster generation speed. Supports audio-video synchronization and storyboard generation, producing 1-16s high-quality videos for rapid creative iteration. - Standard Model API > Video Generation & Processing > text-to-video > Vidu [Vidu-text-to-video-q2](https://www.runninghub.ai/runninghub-api-doc-en/api-425766825.md): Vidu is a premier AI text-to-video tool designed to transform prompts into high-quality, 720p cinematic sequences. It excels in delivering fluid motion, realistic lighting, and natural camera movements with professional depth of field. Unlike standard models, Vidu ensures superior temporal consistency, minimizing flicker for smooth, coherent transitions. By deeply understanding complex semantic descriptions, it empowers creators to produce expressive characters and dynamic scenes with ease. Whether for storytelling or visual effects, Vidu bridges the gap between imagination and professional-grade cinematography, offering unparalleled creative flexibility and visual impact. - Standard Model API > Video Generation & Processing > text-to-video > Vidu [Vidu-text-to-video-q3-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766826.md): Vidu Q3 Text-to-Video redefines AI cinematography by integrating synchronized audio and visual generation. It empowers creators to embed dialogue and SFX directly via prompts, ensuring perfect audio-visual alignment. With "Smart Shot Cutting," the model orchestrates professional-grade transitions automatically. 
Supporting up to 2K resolution and customizable 1-16s durations, Vidu Q3 bridges the gap between script and screen, offering built-in text rendering for a truly end-to-end production experience. - Standard Model API > Video Generation & Processing > text-to-video > Vidu [Vidu-text-to-video-q3-turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-425766827.md): Vidu-Text-to-Video-q3-turbo is a pioneering "Drama-First" engine designed for industrial-grade content production. It is the first to achieve 16-second synchronized audio-visual output, allowing for complete narrative arcs within a single shot. Featuring a "Director's Mindset," it automatically manages camera cuts and synchronizes dialogue, ambient sounds, and emotional beats in real time. The turbo version optimizes generation speed while maintaining cinematic fidelity, transforming AI video from a creative toy into a robust narrative productivity tool. - Standard Model API > Video Generation & Processing > text-to-video > kling [kling-video-o3-std-text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766831.md): Kling Video O3 Standard is a flagship text-to-video engine within Kuaishou's O3 lineup, pushing visual fidelity and motion realism beyond the V3.0 models. It masterfully balances high-end cinematic quality with cost-effective production, offering flexible durations from 3 to 15 seconds. With built-in synchronized sound generation and support for various aspect ratios (16:9, 9:16, 1:1), it provides a versatile and immersive solution for creators demanding professional-grade, multi-platform video assets. - Standard Model API > Video Generation & Processing > text-to-video > kling [kling-video-o1/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766828.md): Kling Omni Video O1 is Kuaishou's unified multi-modal video engine, optimized for stable production and cost efficiency. Powered by Multi-Modal Visual Language (MVL), it accurately interprets text prompts, visual contexts, and subject identities to deliver high-quality videos with coherent motion. The model supports a comprehensive creative workflow, including text-to-video, image-to-video, and professional video editing. By maintaining exceptional subject consistency and temporal stability, it provides a reliable, all-in-one solution for creators seeking a perfect balance between visual quality, generation speed, and operational cost. - Standard Model API > Video Generation & Processing > text-to-video > kling [kling-v3.0-pro-text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766833.md): Kling V3.0 Pro is Kuaishou’s flagship text-to-video model, engineered for ultimate visual fidelity and motion realism. Surpassing the Standard tier, the Pro version delivers cinematic-grade rendering with unparalleled detail and fluid dynamics, capturing the most intricate lighting and physical interactions. It features a fully integrated multimodal workflow, supporting synchronized sound generation and dual-character voiceovers. With professional-grade controls like negative prompts and CFG scaling, it provides the precision required for high-end production, transforming complex text descriptions into breathtaking, production-ready cinematic masterpieces.
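The Kling text-to-video entries above document a consistent set of controls: 3-15 second durations, 16:9/9:16/1:1 aspect ratios, negative prompts, and CFG scaling. A minimal client-side sketch of validating such a payload before submission is shown below; the field names are illustrative assumptions, not the documented RunningHub schema.

```python
import json

def build_kling_t2v_payload(prompt: str,
                            duration_s: int = 5,
                            aspect_ratio: str = "16:9",
                            negative_prompt: str = "",
                            cfg_scale: float = 0.5) -> dict:
    """Check the documented ranges before submitting a text-to-video task."""
    if not 3 <= duration_s <= 15:
        raise ValueError("Kling O3 durations are documented as 3-15 seconds")
    if aspect_ratio not in {"16:9", "9:16", "1:1"}:
        raise ValueError("unsupported aspect ratio")
    return {
        "prompt": prompt,
        "duration": duration_s,
        "aspectRatio": aspect_ratio,        # assumed field name
        "negativePrompt": negative_prompt,  # professional-grade control per the entries above
        "cfgScale": cfg_scale,              # assumed field name
    }

print(json.dumps(build_kling_t2v_payload("a foggy harbor at dawn"), indent=2))
```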
- Standard Model API > Video Generation & Processing > text-to-video > kling [kling-v3.0-std-text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766834.md): Kling V3.0 Standard is Kuaishou's latest frontier in text-to-video synthesis, designed to deliver high-fidelity, cinematic visuals from simple descriptions. This version brings a quantum leap in motion dynamics and visual clarity over V2.6. It stands out by offering synchronized sound effect generation and support for dual-character voices, enabling lifelike dialogue directly within the video. With professional tools like negative prompts and CFG scale control, it allows creators to masterfully balance prompt precision with cinematic artistry. - Standard Model API > Video Generation & Processing > text-to-video > kling [kling-video-o3-pro-text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766832.md): Kling Video O3 Pro is Kuaishou's flagship text-to-video engine, powered by advanced MVL (Multi-modal Visual Language) technology. It delivers top-tier visual fidelity and cinematic motion by integrating natural physics simulations and precise semantic understanding. The model excels in maintaining strict subject consistency over flexible durations from 3 to 15 seconds. With professional-grade controls for multiple aspect ratios and optional synchronized sound, it transforms text prompts into production-ready masterpieces with unparalleled realism and artistic precision. - Standard Model API > Video Generation & Processing > text-to-video > kling [kling-v2.5-turbo-pro/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766829.md): Kling 2.5 Turbo Pro is a fast, high-fidelity text-to-video model that generates cinematic clips with ultra-smooth motion and strong prompt alignment. It interprets multi-step instructions via a new text-and-timing controller to create coherent, dynamic scenes. Improved training reduces artifacts like jitter or frame drops, even during complex motion. Optimized pipelines enable faster generation without quality loss, while enhanced style conditioning locks in color, lighting, and mood—ensuring visual consistency throughout. Ideal for ads, social content, and creative prototyping where speed, realism, and style matter. - Standard Model API > Video Generation & Processing > text-to-video > kling [kling-v2.6-pro-text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766830.md): Kling 2.6 Audio T2V redefines AI production through native joint audio-video synthesis. Instead of post-generation dubbing, it synthesizes visuals and soundscapes simultaneously, ensuring perfect alignment between camera motion, character actions, and sound effects. Featuring character-aware voices and scene-driven audio design, it transforms plain text into cinematic clips with built-in narration and professional ambience. This integrated script-to-scene pipeline offers a seamless, high-fidelity solution for social media, storyboarding, and immersive digital storytelling. - Standard Model API > Video Generation & Processing > text-to-video > alibaba > wan [alibaba/wan-2.6/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766835.md): Alibaba Wan 2.6 Text-to-Video (alibaba/wan-2.6/text-to-video) is Alibaba’s text-to-video generation model for creating high-quality visuals from a single natural-language prompt.
It’s built for practical creative workflows—concept art, product visuals, portraits, and stylized imagery—where you want strong prompt adherence plus flexible custom sizing. - Standard Model API > Video Generation & Processing > text-to-video > seedance [seedance-2.0/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555151.md): seedance 2.0 Text-to-Video for highest quality. Generate 4-15 second videos from text prompts with multiple aspect ratios, audio generation, and web search enhancement. - Standard Model API > Video Generation & Processing > text-to-video > seedance [seedance-2.0-fast/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555150.md): seedance 2.0 Fast Text-to-Video, optimized for speed and cost efficiency. Quickly generate 4-15 second videos from text prompts with multiple aspect ratios, audio generation, and web search enhancement. - Standard Model API > Video Generation & Processing > text-to-video > seedance [seedance-v1.5-pro-text-to-video-fast](https://www.runninghub.ai/runninghub-api-doc-en/api-425766836.md): Seedance V1.5 Pro Fast (T2V) is ByteDance’s production-grade engine optimized for rapid text-to-video workflows. It excels at transforming natural language into cinematic clips with exceptional prompt adherence and expressive motion. Designed for fast iteration, it maintains aesthetic stability across diverse aspect ratios while offering optional audio generation and seed control. This model is a definitive tool for creators requiring quick turnarounds for professional-quality social media content, storyboarding, and high-fidelity advertising concepts. - Standard Model API > Video Generation & Processing > text-to-video > seedance [seedance-v1.5-pro-text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766837.md): Seedance 1.5 Pro (T2V) is a production-tier model by ByteDance Seed, engineered for cinematic realism and high-impact motion. It excels in prompt alignment, accurately capturing complex emotional tones and shot instructions for professional ad creatives and short dramas. Featuring fine-grained facial acting and stable aesthetic harmony, it delivers a natural, live-action look with remarkable consistency. With flexible duration control (4–12s) and multiple aspect ratios, it serves as a versatile powerhouse for generating premium, emotionally resonant content directly from text. - Standard Model API > Video Generation & Processing > text-to-video > hailuo [hailuo-02-t2v-standard](https://www.runninghub.ai/runninghub-api-doc-en/api-425766841.md): Hailuo 02 is a versatile Text-to-Video model from MiniMax, designed to balance cinematic quality with production efficiency. It delivers crisp 768p video clips with exceptional prompt adherence and believable physical simulations. From the natural movement of cloth and water to realistic camera shakes, the model ensures smooth temporal flow and minimal artifacts. With support for 6s or 10s durations and a highly repeatable generation process, Hailuo 02 empowers creators to ideate faster and achieve professional-grade motion continuity at a lower cost. It is the go-to tool for stable, high-fidelity creative storytelling. - Standard Model API > Video Generation & Processing > text-to-video > hailuo [hailuo-02-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766838.md): Hailuo 02 Pro is the premium endpoint within the MiniMax framework, engineered for creators who demand elite cinematic realism and physical accuracy. 
It delivers sharper native 1080p output with superior color depth and intricate micro-details. The model excels in complex motion and physics simulation—rendering debris, cloth dynamics, and collisions naturally while ensuring cleaner temporal coherence with minimal flicker. Supporting both T2V and I2V modes with optional end-frame guidance, Hailuo 02 Pro offers reliable prompt adherence and seamless camera transitions for professional-grade digital production. - Standard Model API > Video Generation & Processing > text-to-video > hailuo [hailuo-2.3-t2v-standard](https://www.runninghub.ai/runninghub-api-doc-en/api-425766839.md): Hailuo 2.3 Standard is the cutting-edge AI video generation model from MiniMax, engineered to deliver cinematic-grade results with advanced physics rendering. It excels in simulating complex dynamics like water flow and debris movement with remarkable physical consistency. The model is distinguished by its seamless scene transitions and high reliability, ensuring reproducible results for precise creative control. Offering flexible 6-second or 10-second durations, it provides professional-level fidelity at a highly competitive price point. Hailuo 2.3 Standard empowers creators to achieve high-end production quality with a perfect balance of reliability and cost-efficiency. - Standard Model API > Video Generation & Processing > text-to-video > hailuo [hailuo-2.3-t2v-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766840.md): Hailuo 2.3 Pro is the flagship text-to-video engine from MiniMax, designed for professional creators seeking cinematic realism and superior visual coherence. It transforms complex text prompts into high-fidelity 1080p videos, seamlessly merging professional quality with advanced physical simulations. The model excels in modeling realistic lighting, shadows, and intricate camera movements while maintaining exceptional semantic accuracy. By ensuring character consistency and offering a refined film-like aesthetic, Hailuo 2.3 Pro delivers visually stunning 5-second clips that meet the highest standards of modern digital storytelling. - Standard Model API > Video Generation & Processing > text-to-video > hailuo [hailuo-02-t2v-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766842.md): Hailuo 02 T2V Pro is a premium text-to-video model engineered to transform plain text into cinematic 1080p videos with exceptional visual fidelity. It delivers native Full-HD frames directly from the model, ensuring sharp, production-ready clarity without upscaling. Distinguished by its enhanced motion and physics engine, it masterfully simulates complex dynamics like debris, cloth, and realistic camera shakes. With superior temporal consistency and strong prompt adherence, the model minimizes flicker and ensures reliable, repeatable outputs. It is the definitive tool for creators seeking film-like sequences with professional-grade motion and structural stability. - Standard Model API > Video Generation & Processing > text-to-video > sora [sora-2/text-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766843.md): OpenAI Sora 2 — Text-to-Video is a state-of-the-art integrated video and audio generator, built on the original Sora technical foundation. It surpasses all previous video models with more accurate physical motion, ultra-sharp realistic visual effects, perfectly synchronized audio, stronger controllability and a much wider stylistic expression range for diverse creative needs.
Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Video Generation & Processing > text-to-video > sora [sora-2/text-to-video-pro-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766844.md): OpenAI Sora 2 — Text-to-Video-Pro is an industry-leading video and audio generation model built on the original Sora framework. It achieves breakthrough upgrades with ultra-accurate physical simulation, extreme realism, perfectly synchronized audio, enhanced steerability and an expanded stylistic range, delivering top-tier video creation performance for all scenarios. Ultra-realistic. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Video Generation & Processing > text-to-video > sora [sora-2/text-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766845.md): Sora 2 is OpenAI’s cutting-edge video-and-audio generator, setting a new benchmark for cinematic realism. Building on the original Sora foundation, it features physics-aware motion that accurately simulates momentum and collisions. Key advancements include full audio synchronization (lip-sync and ambient sounds) and superior temporal consistency across complex, multi-subject scenes. Sora 2 excels in high-frequency detail, preserving lifelike textures without artificial sharpening. With its cinematic camera literacy and wide stylistic range, it offers creators unprecedented steerability, transforming detailed prompts into coherent, high-fidelity narratives with flawless audio-visual alignment. Supports 4s/8s/12s clips. Utilizes the official native API protocol, which currently lacks access to personal Cameo libraries from Web/App versions, and does not support @ syntax for character referencing. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Video Generation & Processing > text-to-video > sora [sora-2/text-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-439801953.md): OpenAI Sora 2 — Text-to-Video is a state-of-the-art integrated video and audio generator, built on the original Sora technical foundation. It surpasses all previous video models with more accurate physical motion, ultra-sharp realistic visual effects, perfectly synchronized audio, stronger controllability and a much wider stylistic expression range for diverse creative needs. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Video Generation & Processing > text-to-video > sora [sora-2/text-to-video-pro-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-439801954.md): OpenAI Sora 2 — Text-to-Video-Pro is an industry-leading video and audio generation model built on the original Sora framework. It achieves breakthrough upgrades with ultra-accurate physical simulation, extreme realism, perfectly synchronized audio, enhanced steerability and an expanded stylistic range, delivering top-tier video creation performance for all scenarios. Ultra-realistic. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
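The official-stable Sora 2 entry above documents 4s/8s/12s clip lengths. A tiny sketch, assuming only that those are the accepted values, of snapping an arbitrary requested length to a supported duration before submitting a task:

```python
# Documented Sora 2 clip lengths per the entry above; everything else here
# is client-side convenience, not part of the API itself.
SORA2_DURATIONS_S = (4, 8, 12)

def nearest_sora2_duration(requested_s: float) -> int:
    """Snap a requested length to the closest supported Sora 2 duration."""
    return min(SORA2_DURATIONS_S, key=lambda d: abs(d - requested_s))

assert nearest_sora2_duration(9) == 8
assert nearest_sora2_duration(11) == 12
```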
- Standard Model API > Video Generation & Processing > text-to-video > xai > grok [xai/grok-imagine/text-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766846.md): Leveraging xAI’s advanced reasoning capabilities, Grok Imagine excels at translating intricate prompts into visually stunning, logically coherent narratives. The model demonstrates superior performance in managing cinematic camera movements and complex physical interactions, ensuring that every frame adheres to realistic physics and lighting. It is designed for creators who demand high-precision storytelling and unparalleled visual fidelity without semantic loss. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Video Generation & Processing > text-to-video > xai > grok [xai/grok-imagine/text-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766847.md): Grok Imagine Video Text-to-Video by xAI is an advanced model that creates high-quality videos entirely from text descriptions. By simply detailing the desired scene, motion, and visual style, users can generate cinematic footage featuring realistic movements and rich atmospheric depth. With customizable settings—including flexible durations, multiple aspect ratios like 16:9 and 9:16, and 480p or 720p resolution outputs—this tool empowers creators to seamlessly bring their imagination to life from scratch. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Video Generation & Processing > text-to-video > google > veo3.1 [google/veo3.1-pro/text-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766848.md): Google's flagship advanced AI Text-to-Video model, Veo3.1 Premium mode. Enables native text-to-video with fully synchronized ambient sound, dialogue and music; supports dialogue lip-sync, subject consistency and video interpolation. Generates top-tier cinematic videos with natural lighting, smooth camera transitions, and strong narrative consistency; offers the full flagship feature set for professional storytelling and marketing, with ultra-premium quality at ultra-high pricing. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Video Generation & Processing > text-to-video > google > veo3.1 [google/veo3.1-pro/text-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766849.md): Google Veo 3.1 T2V is DeepMind’s latest flagship model, designed to bring cinematic storytelling to life. It generates high-fidelity 4K videos with synchronized native audio, including realistic dialogue and lip-sync. Featuring advanced subject consistency (via reference images) and seamless video interpolation, Veo 3.1 offers precise control over motion and lighting. With flexible options for duration (4s/6s/8s) and aspect ratio, it stands as one of the most advanced generative video systems available. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Video Generation & Processing > text-to-video > google > veo3.1 [google/veo3.1-fast/video-extend-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766850.md): Veo 3.1 Video Extend (Fast) is optimized for high-speed video continuation, prioritizing low latency and rapid iteration.
It allows users to seamlessly append 7-second segments to existing Veo clips, returning a single merged file. Sharing the same logic as the standard model, it supports up to 20 chained extensions (max 148s). This endpoint is ideal for rapid story development, testing ad variations, and accelerating production feedback loops. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Video Generation & Processing > text-to-video > google > veo3.1 [google/veo3.1-pro/video-extend-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766851.md): Veo 3.1 Video Extend enables the seamless continuation of existing Veo-generated clips by appending a new 7-second segment. Unlike a restart, it offers "true continuation," preserving the original style, motion, and framing. Users can chain up to 20 extensions to create a single, merged video file up to 148 seconds long. This endpoint is essential for evolving narratives, ensuring cinematic coherence without disjointed transitions. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Video Generation & Processing > text-to-video > google > veo3.1 [google/veo3.1-fast/text-to-video-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766852.md): Google's latest advanced AI Text-to-Video model, Veo3.1 Fast mode. Features native text-to-video with synchronized audio and video generation, delivering high-quality videos with solid cinematic realism and smooth motion. With natural scene presentation and accurate audio-visual synchronization, it offers outstanding quality at an ultra-low price, making it the most cost-effective choice for daily creative needs and casual video generation scenarios. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Video Generation & Processing > text-to-video > google > veo3.1 [google/veo3.1-fast/text-to-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766853.md): Google Veo 3.1 T2V Fast is the high-speed, cost-optimized iteration of DeepMind’s generative model. Delivering cinematic 4K video up to 30% faster than the standard version, it excels in natural motion, realistic lighting, and synchronized native audio. With unique support for character dialogue and precise lip-sync, it is the ideal solution for creators requiring rapid, high-quality output for storytelling, marketing, and short-form content. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Video Generation & Processing > text-to-video > cinematic [cinematic-video-generator](https://www.runninghub.ai/runninghub-api-doc-en/api-429524493.md): The Cinematic Video Generator is a dual-mode AI model designed for high-end creative productions. It delivers Hollywood-grade videos with stunning visual fidelity, professional color grading, and dramatic lighting. Whether using pure text-to-video (T2V) or guiding the output with up to four reference images (I2V), it ensures smooth, naturally directed motion and camera control. With support for multiple aspect ratios, it is the ultimate engine for professional cinematic storytelling.
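The two Veo 3.1 Video Extend entries above give concrete numbers: each extension appends a 7-second segment, and up to 20 chained extensions yield a merged file of at most 148 seconds. The arithmetic is consistent if the base clip is 8 seconds (the longest documented Veo 3.1 duration): 8 + 20 * 7 = 148. A minimal sketch under that assumption:

```python
# Numbers from the Video Extend entries above; the 8-second base clip is an
# assumption taken from the documented Veo 3.1 duration options (4s/6s/8s).
SEGMENT_S = 7
MAX_EXTENSIONS = 20

def extended_length_s(base_clip_s: int = 8, extensions: int = MAX_EXTENSIONS) -> int:
    """Total length of the merged file after chaining extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError("Veo 3.1 is documented at up to 20 chained extensions")
    return base_clip_s + extensions * SEGMENT_S

assert extended_length_s() == 148  # 8s base + 20 x 7s segments, matching the documented cap
```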
- Standard Model API > Video Generation & Processing > text-to-video > wan [wan-2.7/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-438555152.md): Alibaba's Wan 2.7 Text-to-Video is an advanced model that transforms natural language prompts into high-quality, cinematic clips with crisp details and stable motion. Known for its strong instruction-following capabilities, it is ideal for ads, explainer videos, and social media content. The model supports 720P and 1080P resolutions across flexible aspect ratios. With additional features like audio track synchronization, negative prompt control, and an optional prompt expansion mode, it delivers precise and professional results for diverse creative workflows. - Standard Model API > Video Generation & Processing > text-to-video > wan [wan-2.2/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-430967131.md): A specialized image-to-video model based on the Wan-2.2 architecture: upload first- and last-frame images to generate 5-second or 8-second dynamic videos at multiple output resolutions. Employs an MoE dual-expert system (a high-noise expert for structure/layout, a low-noise expert for detail refinement), generating natural camera movements and object dynamics while preserving the input image's subject features, lighting, and composition. Particularly suitable for portrait photo animation, product showcase videos, and creative concept visualization, enabling professional video storytelling from a single image. - Standard Model API > Video Generation & Processing > text-to-video > skyreels [skyreels-v4/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-430967132.md): SkyReels V4 Text-to-Video is a next-generation AI video engine that generates up to 15-second videos at 1080p resolution. With flexible aspect ratios, AI-powered sound effects, and multiple quality modes, it transforms text descriptions into cinematic visual content for creative shorts, ads, and social media. - Standard Model API > Video Generation & Processing > text-to-video > ltx [ltx-2.3/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-434990844.md): Lightricks' open-source text-to-video foundation model released March 2026. A new 4x larger text connector significantly improves understanding of complex prompts, with greatly enhanced accuracy in rendering multiple subjects, spatial relationships, and stylistic instructions. The rebuilt VAE delivers sharper details while an upgraded vocoder enables clearer synchronized audio generation. Supports native 1080p in both portrait and landscape formats, with multiple frame rates (24/48fps), outputting complete 5-20 second audio-visual clips without post-production dubbing. - Standard Model API > Video Generation & Processing > text-to-video > ltx [ltx-2.3/text-to-video-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-434990845.md): A text-to-video LoRA customization version for professional creators and brands, opening deep personalization capabilities on top of LTX-2.3's powerful text understanding. Supports up to three LoRA adapters working simultaneously to codify specific visual styles, signature characters, or proprietary camera techniques into the generation pipeline. Through the dual-drive mode of "text description + LoRA style," it achieves precise alignment between creative intent and brand visual identity.
Ideal for series content production requiring visual consistency, cross-project character reuse, and stylized advertising campaigns, letting text-driven video generation genuinely build long-term brand assets. - Standard Model API > Video Generation & Processing > text-to-video > pixverse [pixverse-v6/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-439502923.md): PixVerse V6 text-to-video is the latest text-to-video model from PixVerse, producing high-fidelity cinematic video with accurate motion, lighting, and composition. Supports 360p to 1080p resolution, 1-15s flexible duration, eight aspect ratios, optional audio generation, and a thinking mode for complex scenes. - Standard Model API > Video Generation & Processing > video-edit [wan-2.7/video-edit](https://www.runninghub.ai/runninghub-api-doc-en/api-438555157.md): Alibaba's Wan 2.7 Video Edit is a powerful multi-modal model designed for prompt-driven video modifications. By processing text prompts, a single source video, and up to three reference images, it effortlessly executes instruction-based editing, reference-guided modifications, and video transitions. The model outputs 30fps MP4 videos at either 720P or 1080P resolution, with customizable durations ranging from 2 to 10 seconds. Additionally, users can leverage negative prompts for precise control and choose to retain the original audio track or allow the model to auto-generate it. - Standard Model API > Video Generation & Processing > video-edit [kling-video-o3-pro/video-edit](https://www.runninghub.ai/runninghub-api-doc-en/api-427096757.md): Kling Video O3 Pro Video Edit represents the pinnacle of AI-powered video post-production, offering professional-grade transformations through simple natural language prompts. By moving beyond manual timelines and masks, it allows creators to effortlessly swap objects, alter scenes, and shift styles while preserving original motion and structure. The Pro version supports up to 4 reference images, providing precise visual guidance for complex tasks. With its advanced scene-level understanding and superior temporal coherence, it delivers seamless, high-fidelity results that maintain cinematic integrity across every frame. - Standard Model API > Video Generation & Processing > video-edit [kling-video-o3-std/video-edit](https://www.runninghub.ai/runninghub-api-doc-en/api-427096756.md): Kling Omni Video O3 Video-Edit (Standard) enables precise, natural-language-driven video transformations. It specializes in localized 3-15s edits, allowing users to seamlessly remove/replace objects, swap backgrounds, restyle scenes, and adjust lighting or weather conditions. The model is engineered with strong temporal consistency, ensuring that modifications remain stable across frames. - Standard Model API > Video Generation & Processing > video-edit [kling-video-o1-std/edit-video](https://www.runninghub.ai/runninghub-api-doc-en/api-425766854.md): Kling Omni Video O1 (Video-Edit) is a revolutionary video editing model that enables pixel-level semantic reconstruction through natural language commands. Powered by the MVL system, it accurately interprets creative intent to add, remove, or modify elements within a video. From swapping character attire and removing background distractions to transforming lighting, weather, and camera perspectives, the model ensures context-aware, coherent multi-frame changes.
By simplifying complex post-production into intuitive text instructions, Kling Video O1 empowers creators to reshape environments and styles with unprecedented ease and cinematic precision. - Standard Model API > Video Generation & Processing > video-edit [xai/grok-imagine/edit-video-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766855.md): Grok Imagine Video Edit by xAI is an innovative model that transforms existing videos using intuitive text prompts. By uploading a source video and describing the desired changes, users can seamlessly apply new aesthetics—such as anime, cartoon, or cinematic looks. The model excels at maintaining strict temporal consistency across all frames, ensuring smooth, flicker-free results. Offering outputs in 480p or 720p, it empowers creators to seamlessly modify and upgrade their footage using natural language instructions. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Video Generation & Processing > motion-control [kling-v3.0-pro-motion-control](https://www.runninghub.ai/runninghub-api-doc-en/api-427096752.md): The professional motion control version of the Kling V3.0 series, delivering comprehensive upgrades in image quality and motion accuracy over the Std tier. Upload a character image and driving video to make the character accurately replicate dances, gestures, or movement trajectories from the video. Employs the same 3D Spacetime Joint Attention architecture with Chain-of-Thought reasoning, but significantly enhances character detail preservation, motion fluidity, and physical realism. Supports 1080p high-resolution output with precise reproduction of clothing textures, facial expressions, and complex hand gestures. Dual-mode support (Image Mode 10s/Video Mode 30s) combined with audio preservation enables direct generation of fully synchronized audio-visual content. Ideal for cinematic-grade professional production, high-end commercial advertising, and IP character animation. - Standard Model API > Video Generation & Processing > motion-control [kling-v3.0-std-motion-control](https://www.runninghub.ai/runninghub-api-doc-en/api-427096751.md): The motion control base version of Kuaishou's Kling V3.0 series, designed for users needing to transfer motion from reference videos to static images. Upload a character image and driving video to make the character accurately replicate dances, gestures, or movement trajectories from the video. Employs 3D Spacetime Joint Attention mechanism to achieve physically realistic motion transfer while preserving character identity. Supports dual orientation modes—"Image Mode" (up to 10s, maintaining original perspective) and "Video Mode" (up to 30s, following driving video perspective)—providing cost-effective motion generation for social media content, virtual presenters, and creative short videos. - Standard Model API > Video Generation & Processing > motion-control [kling-v2.6-pro-motion-control](https://www.runninghub.ai/runninghub-api-doc-en/api-425766857.md): Kling v2.6 Pro Motion Control is Kuaishou’s advanced motion transfer model designed to animate static references with high-fidelity dynamics extracted from video clips. By capturing intricate postures, limb movements, and gestures from 3 to 30-second source videos, it applies seamless animation to any character while preserving strict identity and temporal consistency. 
The model offers flexible orientation controls—choosing between the reference image’s aspect ratio and the source video’s framing—and supports optional audio preservation. Enhanced by prompt-guided refinement, it empowers creators to adjust lighting, textures, and atmosphere for production-ready cinematic outputs. - Standard Model API > Video Generation & Processing > motion-control [bytedance/dreamactor-v2](https://www.runninghub.ai/runninghub-api-doc-en/api-428583622.md): As ByteDance's advanced motion transfer model, it effortlessly brings any static image to life. By simply inputting an image and a driving video, the model flawlessly transfers complex body gestures, subtle facial expressions, and precise lip movements to your character. Breaking previous boundaries, it now supports multi-person synchronization, anime illustrations, and even pets. Delivering smooth, timing-accurate animations, it ensures exceptional consistency in character features and background details through an incredibly simple two-input workflow. - Standard Model API > Video Generation & Processing > motion-control [kling-v2.6-std-motion-control](https://www.runninghub.ai/runninghub-api-doc-en/api-425766856.md): Kling V2.6 Standard Motion Control is an efficient motion transfer engine designed to breathe life into static images with professional precision. By mapping the movement path from a source video onto a reference image, it creates seamless animations for dance, action, and character-driven sequences. The model excels in identity preservation, ensuring the subject's appearance remains consistent even during complex maneuvers. Supporting extended durations of up to 30 seconds and optional audio retention, it provides a versatile solution for high-quality, long-form character animation. - Standard Model API > Video Generation & Processing > video-tools [sora-upload-character-official](https://www.runninghub.ai/runninghub-api-doc-en/api-431303963.md): Official stable edition of Sora 2 Characters. Upload a 2-4 second video clip with a character name to extract identity features and generate a reusable character ID. Reference the character in Sora 2 Text-to-Video to maintain consistent appearance across scenes and shots. Ideal for episodic content, brand mascots, and multi-scene storytelling. Official edition with high stability. - Standard Model API > Video Generation & Processing > video-tools [pixverse-v6/extend](https://www.runninghub.ai/runninghub-api-doc-en/api-439502921.md): PixVerse V6 Extend continues an existing video clip with AI-generated footage that seamlessly matches the original motion and style. Describe what happens next to get a natural, motion-consistent extension with optional style control, negative prompting, and synchronized audio. - Standard Model API > Video Generation & Processing > video-tools [pixverse-v6/transition](https://www.runninghub.ai/runninghub-api-doc-en/api-439502922.md): PixVerse V6 Transition creates smooth AI-generated video transitions between a start image and an optional end image. Describe the transformation and get a natural, motion-consistent clip with optional style control, multi-clip mode, and synchronized audio. - Standard Model API > Video Generation & Processing > video-tools [sora-upload-character-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766858.md): Channel edition of Sora 2 Characters: upload a short video clip to extract identity features and generate a reusable character ID. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
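The Kling motion-control entries above describe two orientation modes: "Image Mode" keeps the reference image's perspective (up to 10s) and "Video Mode" follows the driving video's perspective (up to 30s). A hedged sketch of a client-side guard for those documented limits; every field name here is a hypothetical placeholder, not the actual API schema:

```python
# Mode limits come from the motion-control entries above; payload field names
# are illustrative assumptions only.
MODE_LIMITS_S = {"image": 10, "video": 30}

def build_motion_control_payload(image_url: str, driving_video_url: str,
                                 mode: str, duration_s: int,
                                 keep_audio: bool = True) -> dict:
    """Validate the documented mode/duration combination before submission."""
    limit = MODE_LIMITS_S.get(mode)
    if limit is None:
        raise ValueError("mode must be 'image' or 'video'")
    if duration_s > limit:
        raise ValueError(f"{mode} mode is documented up to {limit}s")
    return {
        "characterImage": image_url,        # assumed field name
        "drivingVideo": driving_video_url,  # assumed field name
        "orientationMode": mode,
        "duration": duration_s,
        "keepAudio": keep_audio,  # optional audio preservation per the entries above
    }
```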
- Standard Model API > Video Generation & Processing > video-tools [rh-video-upscaler](https://www.runninghub.ai/runninghub-api-doc-en/api-437131897.md): Ultimate Video Upscaler is the world's most advanced AI video super-resolution model, converting low-resolution videos into crisp 720p, 1080p, 2K, or 4K footage. It delivers exceptional frame-to-frame consistency, minimizes flicker and artifacts, reconstructs fine details like hair and fabric textures, and preserves smooth motion dynamics across fast action and camera pans. - Standard Model API > Video Generation & Processing > video-tools [rh-video-fps-increaser](https://www.runninghub.ai/runninghub-api-doc-en/api-437131898.md): AI Video FPS Increaser doubles your video frame rate through intelligent frame interpolation, eliminating stutters and judder for fluid, natural motion. Works on any footage — action shots, cinematic clips, animations, and screen recordings. Simple one-input workflow with affordable per-second pricing. - Standard Model API > Video Generation & Processing > video-effects [skyreels-v3/video-restyling](https://www.runninghub.ai/runninghub-api-doc-en/api-430967135.md): SkyReels V3 Video Restyling transforms any video into stunning artistic styles including Cyberpunk, Anime, Van Gogh, Lego, Pixel Art, and more. Supporting input videos up to 30 seconds, it preserves original motion while applying breathtaking new visual aesthetics. - Standard Model API > Video Generation & Processing > video-extend [wan-2.7/video-extend](https://www.runninghub.ai/runninghub-api-doc-en/api-438555156.md): Wan 2.7 Video Extend seamlessly continues your existing video clips with high-quality, AI-generated footage. By providing a source video and a text prompt, you can direct the narrative and action of the extended segment. The model supports 720P and 1080P resolutions and outputs MP4 videos up to 15 seconds long. It also features optional audio track input to guide pacing and negative prompts for precise visual control, ensuring a natural and consistent cinematic extension. - Standard Model API > Video Generation & Processing > video-extend [skyreels-v3/single-shot-video-extension](https://www.runninghub.ai/runninghub-api-doc-en/api-430967136.md): SkyReels V3 Single-shot Video Extension naturally continues your video with 5-10 seconds of seamless footage. Guided by text prompts, it maintains strong scene and character continuity, perfect for extending short clips and expanding narrative content. - Standard Model API > Video Generation & Processing > video-extend [skyreels-v3/shot-switching-video-extension](https://www.runninghub.ai/runninghub-api-doc-en/api-430967137.md): SkyReels V3 Shot Switching Video Extension combines video extension with professional camera techniques including Cut In, Cut Out, Shot/Reverse Shot, Multi-Angle, and Cut Away. It brings cinematic narrative rhythm and camera language to creative video production. - Standard Model API > Video Generation & Processing > audio-to-video [kling-lip-sync/identify-face](https://www.runninghub.ai/runninghub-api-doc-en/api-432893796.md): Kling's foundational model designed for facial feature extraction and identity consistency. It performs face detection on videos, returning face data including face ID, face screenshot URL, lip-sync compatible time intervals, and session ID for identity locking in subsequent lip-sync video generation. 
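The kling-lip-sync/identify-face entry above returns a face ID, a face screenshot URL, lip-sync-compatible time intervals, and a session ID that locks identity for the follow-on generation step (the lip-sync model described in the next entry). A sketch of that two-step flow; the base URL, endpoint paths, and JSON fields below are illustrative assumptions, not the documented API:

```python
# Two-step lip-sync flow per the entries above and below. All paths and field
# names are hypothetical placeholders for illustration.
import requests

API_BASE = "https://www.example.com/api"  # placeholder, not a real endpoint

def lip_sync_video(video_url: str, audio_url: str, api_key: str) -> dict:
    headers = {"Authorization": f"Bearer {api_key}"}
    # Step 1: detect faces and obtain a session ID for identity locking.
    faces = requests.post(f"{API_BASE}/kling-lip-sync/identify-face",
                          json={"video": video_url},
                          headers=headers, timeout=60).json()
    session_id = faces["sessionId"]        # assumed field name
    face_id = faces["faces"][0]["faceId"]  # assumed shape: pick the first detected face
    # Step 2: drive that face's lips with the audio, reusing the locked session.
    return requests.post(f"{API_BASE}/kling-lip-sync/lip-sync-video",
                         json={"sessionId": session_id, "faceId": face_id,
                               "audio": audio_url},
                         headers=headers, timeout=60).json()
```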
- Standard Model API > Video Generation & Processing > audio-to-video [kling-lip-sync/lip-sync-video](https://www.runninghub.ai/runninghub-api-doc-en/api-432893795.md): Kling AI's lip-sync video generation model that achieves frame-level synchronization between character lip movements and audio content, using the face-recognition results from the identify-face model together with the input video and audio. Supports real humans, 3D and 2D animated characters, processing both local audio uploads and online synthesized voiceovers. Employs audio-aligned frame interpolation strategies to ensure accurate lip shape restoration even for phonetically challenging syllables, with generation duration extendable to the minute level. - Standard Model API > 3D Generation & Processing > text-to-3D [hunyuan3d-v3.1/text-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-425766859.md): Tencent Hunyuan Text-to-3D V3.1 is a next-generation production-grade engine designed for high-fidelity 3D asset creation. It features a massive 3.6 billion voxel scale with a 1536³ geometric resolution, ensuring intricate details in every mesh. The standout PartGen 1.5 technology enables automatic semantic segmentation, allowing for functional component separation right out of the box. By decoupling geometry from texture, it delivers professional-grade structural accuracy and material realism, making it a powerful tool for game development and industrial design. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-v15/image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716809.md): Math Magic's general-purpose image-to-3D model supporting single-image reconstruction at multiple resolutions. Offers four tiers: 512³, 1024³, 1536³, and 1536³ Pro for enhanced geometric detail. Provides both geometry-only and all-in-one (geometry+texture) generation modes for gaming, 3D printing, and film production workflows. - Standard Model API > 3D Generation & Processing > image-to-3D [hunyuan3d-v3.1/image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-425766860.md): Tencent Hunyuan Image-to-3D V3.1 is a professional-grade engine designed to transform 2D images into high-fidelity 3D assets. The defining feature of this version is its support for 8-view synchronous input, which eliminates geometric blind spots and ensures flawless restoration of complex, asymmetrical objects. Powered by a 3.6 billion voxel scale and 1536³ resolution, the model achieves industry-leading consistency between the reference image and the generated mesh. It provides a seamless, production-ready workflow for e-commerce, digital twins, and character design, maintaining exquisite detail from every conceivable angle. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-v2/image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716810.md): Hitem3D v2.0 is an architectural upgrade to v1.5 featuring an improved texture synthesis pipeline. Delivers enhanced geometric fidelity, texture consistency, and material generation. Specifically optimized for full-color 3D printing with better color accuracy and surface quality. Shares the same four resolution tiers as v1.5 while producing superior structural detail and visual realism. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-v15/multi-image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716811.md): Multi-view reconstruction variant based on the v1.5 architecture, accepting 2-4 images of the same object from different angles.
Fuses multi-perspective information to improve 360-degree geometric consistency and reduce inference uncertainty for occluded areas and back-side structures. Retains the same four resolution tiers and dual generation modes as the single-image version. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-v2/multi-image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716812.md): Multi-view reconstruction variant built on the v2.0 architecture, combining the improved texture pipeline with multi-perspective inputs. Further enhances geometric fidelity and texture surface consistency over the v1.5 multi-image version, with specific optimization for material coherence across complex objects in multi-view scenarios. Supports 2-4 image inputs and full-color 3D printing workflows. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-portrait-v21/image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716813.md): Single-image reconstruction model from Math Magic specifically optimized for portrait generation, upgraded from the v2.0 general architecture. Specially trained for facial structures, strand-level hair details, and micro-structures like eyelashes. Capable of reconstructing high-precision human geometry at high resolution tiers. Designed for digital humans, collectible figurines, and avatars requiring high-fidelity facial reproduction. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-portrait-v21/multi-image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716814.md): Multi-view variant of Portrait v2.1, supporting 2-4 portrait photos. Combines multi-perspective inputs with the portrait-specialized architecture to improve 360-degree head geometric consistency and facial feature accuracy. Specifically applicable to commissioned figurine production requiring precise likeness reproduction of specific individuals, reducing inference errors through multi-angle inputs. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-portrait-v20/image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716815.md): Second-generation portrait-specific single-image reconstruction model from Math Magic, based on the v2.0 general architecture. Optimized for foundational head structure reconstruction and facial proportion accuracy, supporting hair and facial detail generation. As the predecessor to v2.1, it provides reliable portrait generation capabilities for avatar and bust 3D asset creation. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-portrait-v20/multi-image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716816.md): Multi-view variant of Portrait v2.0 from Math Magic, supporting 2-4 portrait photos. Enhances head geometric completeness and facial feature accuracy with supplementary multi-perspective information. Suitable for scenarios requiring more stable facial reconstruction results. Combines the v2.0 architecture's multi-view fusion capabilities to provide a more reliable geometric foundation for human 3D digitization. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-portrait-v15/image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716817.md): Math Magic's first portrait-specific model, developed on the v1.5 general architecture. Specifically optimized for face and bust generation with the same four resolution tiers as the general version.
Specially trained for human head structures, capable of generating textured realistic facial models. Applicable to digital humans, sculptures, and virtual avatars. - Standard Model API > 3D Generation & Processing > image-to-3D [hitem3d-portrait-v15/multi-image-to-3d](https://www.runninghub.ai/runninghub-api-doc-en/api-426716818.md): Multi-view variant of Portrait v1.5, supporting 2-4 portrait photos. Improves facial 360-degree consistency and geometric stability through multi-angle information fusion, addressing inference challenges for side and back structures in single-image portrait reconstruction. Suitable for creative scenarios requiring high-consistency head models, with the same resolution tiers as the single-image version. - Standard Model API > Audio Generation & Processing > text-to-audio [minimax/speech-2.8-hd](https://www.runninghub.ai/runninghub-api-doc-en/api-425766861.md): MiniMax Speech 2.8 HD is a premium TTS engine engineered for studio-grade vocal production. It excels in accurately restoring the subtle nuances of real human speech while comprehensively enhancing timbre similarity for an authentic listening experience. Moving beyond standard synthesis, the HD processing ensures richer, cleaner audio suitable for professional broadcasting. With 17+ diverse voice presets, lifelike interjections (like laughs or gasps), and granular control over pitch and bitrate, it delivers a high-fidelity solution where every breath and inflection is rendered with breathtaking precision. - Standard Model API > Audio Generation & Processing > text-to-audio [minimax/music-2.5](https://www.runninghub.ai/runninghub-api-doc-en/api-425766868.md): MiniMax Music 2.5 represents a major leap in AI music synthesis, focusing on high-fidelity output and granular control. It delivers significant enhancements across four pillars: instrumentation, vocal realism, structural precision, and stylized sound design. By leveraging humanized timbre simulation, it achieves a "real voice" texture with natural emotional flow. A standout feature is its structural accuracy, supporting 14+ segment markers (like Build-up, Interlude, and Hook) for complex compositions. With automated sound design filters for genres like Rock or Jazz, it transforms lyrics into studio-quality productions with professional mixing. - Standard Model API > Audio Generation & Processing > text-to-audio [minimax/speech-02-hd](https://www.runninghub.ai/runninghub-api-doc-en/api-425766865.md): MiniMax Speech-02-HD is a premium TTS model characterized by excellent rhythm, stability, and high restoration similarity. Engineered for creators and developers, it delivers studio-grade clarity across multiple languages, including Chinese, English, and Japanese. With advanced emotional nuance capture and real-time streaming capabilities, it ensures a seamless, human-like listening experience. Its ability to handle up to 10,000 characters with professional-grade articulation makes it perfect for long-form content and interactive applications. - Standard Model API > Audio Generation & Processing > text-to-audio [minimax/speech-02-turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-425766866.md): MiniMax Speech-02-Turbo is a high-performance TTS model built for speed and rhythmic precision. It maintains superior stability and rhythm while featuring enhanced multilingual capabilities, providing a seamless experience for global applications. 
With 17+ preset voices and support for custom voice cloning, it allows for highly personalized and emotionally resonant audio production. Its excellent performance ensures low-latency generation without sacrificing the natural, human-like intonation required for professional-grade content creation. - Standard Model API > Audio Generation & Processing > text-to-audio [minimax/speech-2.6-hd](https://www.runninghub.ai/runninghub-api-doc-en/api-425766862.md): MiniMax Speech 2.6 HD is a professional-grade TTS engine optimized for ultra-low latency and exceptional naturalness. Featuring a major normalization upgrade, it delivers crisp articulation and fluid rhythm across 40+ global languages, including specialized dialects. The model excels in maintaining cross-lingual similarity and accent fidelity, preserving "age" timbre and regional nuances with high precision. Designed for real-time streaming, it ensures seamless audio generation for live meetings and podcasts, creating an immersive, lifelike interactive experience. - Standard Model API > Audio Generation & Processing > text-to-audio [minimax/speech-2.6-turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-425766863.md): MiniMax Speech 2.6 Turbo is a high-performance TTS engine optimized for extreme speed and cost-effectiveness. Ideal for voice chat and digital human applications, it delivers crisp articulation and natural pronunciation across 40+ global languages. The model represents a significant leap in multilingual rhythm and accuracy, preserving regional accents and unique age-based timbres with industry-leading nuance. With its advanced real-time streaming capabilities, it ensures ultra-low latency, making it the premier choice for seamless, high-frequency interactive experiences worldwide. - Standard Model API > Audio Generation & Processing > text-to-audio [minimax/speech-2.8-turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-425766864.md): MiniMax Speech 2.8 Turbo is a cutting-edge TTS engine designed to deliver broadcast-quality, highly expressive audio. Moving beyond traditional synthesis, it integrates 17+ diverse voice presets with sophisticated emotional intelligence, allowing for seamless transitions between moods. The model’s standout feature is its ability to inject lifelike interjections—such as laughs, sighs, and gasps—making AI interactions indistinguishable from human speech. With granular control over audio parameters and a custom pronunciation dictionary, it provides a versatile solution for high-fidelity vocal production. - Standard Model API > Audio Generation & Processing > text-to-audio [minimax/voice-clone](https://www.runninghub.ai/runninghub-api-doc-en/api-425766867.md): MiniMax Voice Clone is a premier synthesis pipeline powered by the advanced Speech-02 and Speech 2.6 HD/Turbo architectures. It transforms a few seconds of reference audio into a highly consistent Voice ID, preserving precise timbre, accents, and nuanced prosody without requiring transcripts. Supporting 40+ languages, it excels in cross-lingual code-switching and emotive storytelling. With the Turbo model delivering sub-250ms latency, it offers a production-ready, low-latency solution for real-time interactive dialogue, gaming, and high-fidelity branded voice experiences. - Standard Model API > Audio Generation & Processing > text-to-audio [kling-lip-sync/tts](https://www.runninghub.ai/runninghub-api-doc-en/api-432893797.md): A text-to-speech generation model from Kling AI, supporting multilingual and multi-dialect synthesis.
It generates online voiceovers from text descriptions or replicates specific voices through custom voice features. Supports speech speed adjustment (0.8-2x), multiple emotional style selections, and integrates with the lip-sync model to achieve audio-visual synchronized lip driving. - Standard Model API > Image Generation & Processing > reference-to-image [Vidu-reference-to-video-q2-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-425766869.md): Vidu Q2 Pro is the flagship evolution for high-precision video synthesis. It redefines "Reference-to-Video" by supporting multi-modal inputs, including up to 7 images or 2 video clips as control sources. Featuring its signature "AI Acting" capability, it delivers cinematic 1080p visuals with flawless subject consistency and subtle emotional nuances. It is the definitive tool for professionals seeking complete control over video editing and recurring character consistency. - Standard Model API > Image Generation & Processing > image-to-image > midjourney [midjourney-text-to-image-niji6](https://www.runninghub.ai/runninghub-api-doc-en/api-425766870.md): Niji 6 is a specialized model for anime aesthetics, blending Japanese art styles with illustration techniques. It excels in stylized lighting (Tyndall effect/Cel-shading) and creates expansive, narrative-driven anime scenes. - Standard Model API > Image Generation & Processing > image-to-image > midjourney [midjourney-text-to-image-v61](https://www.runninghub.ai/runninghub-api-doc-en/api-425766871.md): V6.1 refines the V6 architecture, enhancing clarity and speed. Maintaining strong semantic understanding, it significantly reduces noise, delivering exceptional purity and sharpness in macro photography and minimalist designs. - Standard Model API > Image Generation & Processing > image-to-image > midjourney [midjourney-text-to-image-v6](https://www.runninghub.ai/runninghub-api-doc-en/api-425766872.md): V6 is a milestone for "precise prompting," supporting natural language over complex keywords. It accurately embeds text within images and establishes a serious, authentic aesthetic for commercial and realistic art. - Standard Model API > Image Generation & Processing > image-to-image > midjourney [midjourney-text-to-image-v7](https://www.runninghub.ai/runninghub-api-doc-en/api-425766873.md): V7, the 2025 flagship, marks the era of "physical realism." It resolves complex anatomical distortions and introduces a global illumination algorithm. Its imagery rivals 4K cinematography in dynamic range and texture fidelity. - Standard Model API > Image Generation & Processing > image-to-image > seedream [seedream-v4.5/image-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-425766875.md): ByteDance’s premium image editing model, engineered for professional-grade, prompt-driven retouching. Unlike standard AI filters, it maintains high fidelity to the original image—preserving facial identity, poses, lighting, and color palettes with precision. Key highlights include batch processing for up to 10 images to ensure aesthetic consistency and 4K-ready output for ultra-crisp detail. With its exceptional prompt adherence, the model accurately interprets nuanced instructions for clothing, backgrounds, and moods. It is the definitive tool for creators seeking sophisticated edits with minimal artifacts and realistic textures. 
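The kling-lip-sync/tts entry above documents speech-speed adjustment between 0.8x and 2x. A one-function sketch of clamping a requested speed into that documented range before building a request; the range comes from the entry, while everything else is illustrative:

```python
# 0.8-2x range per the Kling TTS entry above; the clamp itself is just a
# client-side convenience, not part of any documented schema.
def clamp_speech_speed(speed: float) -> float:
    """Clamp a requested speed multiplier to the documented 0.8-2x range."""
    return max(0.8, min(2.0, speed))

assert clamp_speech_speed(2.5) == 2.0
assert clamp_speech_speed(1.0) == 1.0
```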
- Standard Model API > Image Generation & Processing > image-to-image > seedream [seedream-v4/image-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-425766876.md): It excels at swapping outfits, adjusting makeup, and re-materializing products while maintaining perfect subject identity, lighting, and composition. The model delivers exceptional fidelity for human skin tones, fabric textures, and intricate brand logos, making it ideal for high-end e-commerce and influencer workflows. By utilizing structured prompts, Seedream 4.0 ensures production-ready consistency across multiple variants, empowering creative teams to execute rapid A/B testing and marketing campaigns with professional-grade visual accuracy and efficiency. - Standard Model API > Image Generation & Processing > image-to-image > seedream [seedream-v5-lite/image-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-425766874.md): Focuses on ultimate visual control and deep image editing. Beyond basic repainting, it introduces powerful multi-image feature fusion and "Reference-to-Sequential Generation." Whether seamlessly reconstructing styles, materials, and elements from multiple inputs, or expanding references into perfectly consistent visual series, this API enables high-degree artistic manipulation and industrial-grade design delivery while preserving the core characteristics of the original references. - Standard Model API > Image Generation & Processing > image-to-image > nano [nano-banana2-gemini31flash/image-to-image-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766878.md): An image-to-image and editing endpoint powered by a highly efficient visual engine. It enables rapid style transfer, inpainting, or background replacement via image and text inputs. Nano Banana 2 maintains the core structure and reference features of the original image while applying extensive modifications, making it ideal for dynamic, interactive design tools. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Image Generation & Processing > image-to-image > nano [nano-banana-pro/edit-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766879.md): Google Nano Banana Pro (Gemini 3.0 Pro Image) Edit supports professional image editing with high-quality 4K-capable ultra-clear output, driven by the advanced Gemini 3.0 Pro Image model to ensure perfect visual effects. It offers an out-of-the-box REST inference API, achieves leading industry performance, has no coldstart latency at all, and provides cost-effective & affordable pricing for all scenarios. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Image Generation & Processing > image-to-image > nano [nano-banana2-gemini31flash/image-to-image-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766880.md): Nano Banana 2 Edit redefines visual manipulation by merging Google’s advanced CV research with intuitive semantic control. Capable of professional 4K outputs, it excels at translating natural language into precise pixel-level modifications. With a unique capacity for 14-image multi-reference compositing, it ensures seamless subject consistency and intelligent localization. From complex re-lighting to sophisticated text translation within visuals, it offers a fast, flexible, and context-aware editing workflow for modern creators.
Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Image Generation & Processing > image-to-image > nano [nano-banana/edit-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766882.md): Nano-Banana is an advanced all-in-one model for image generation and professional editing, which can produce stunning photorealistic visuals and customized stylized graphics, while achieving ultra-precise inpainting, outpainting and one-click background replacement. It comes with a ready-to-use REST inference API, delivers unrivaled performance, has zero coldstarts, and features highly affordable pricing. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Image Generation & Processing > image-to-image > nano [nano-banana/edit-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-426017751.md): Google Nano-Banana Edit is an advanced AI-powered image editing model that transforms complex visual manipulation into intuitive natural language commands. Built on cutting-edge computer vision, it accurately interprets spatial relationships to execute precise edits—like object replacement or color tuning—while flawlessly preserving the original lighting, texture, and tone. It delivers professional-grade, seamless results for concept art, photography, and everyday design. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Image Generation & Processing > image-to-image > nano [nano-banana-pro/edit-ultra-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766877.md): Nano Banana Pro Edit (Gemini 3.0 Pro Image) redefines visual transformation by making professional editing as intuitive as natural conversation. Built on Google’s advanced computer vision research, it excels in context-aware modifications, allowing users to reshape scenes while preserving complex object relationships and lighting integrity. With native 4K output, professional camera-style controls, and automated multilingual text rendering, it bridges the gap between raw imagination and production-ready design. It is the definitive tool for maintaining brand consistency across diverse aspect ratios, offering unmatched precision and semantic intelligence in every edit. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Image Generation & Processing > image-to-image > nano [nano-banana-pro/edit-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766881.md): Google Nano Banana Pro (Gemini 3.0 Pro Image) Edit empowers professional-level image editing with ultra-high-resolution output, leveraging the powerful technical support of the Gemini 3.0 Pro Image model. It provides a fully ready-to-use REST inference API with top-tier industry performance, zero cold start latency, and highly competitive & affordable pricing that fits diverse business and individual usage scenarios perfectly. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider.
- Standard Model API > Image Generation & Processing > image-to-image > gpt [gpt-image-1.5/edit-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766883.md): A cost-efficient image editing model powered by OpenAI’s GPT image technology. It enables users to refine, modify, or transform existing images via natural language instructions while preserving the original style, composition, and visual integrity. Features strong visual understanding, targeted edits, multi-image support, and context-aware refinement, delivering professional-quality results at low cost for rapid prototyping and creative workflows. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Image Generation & Processing > image-to-image > grok [grok-image/image-to-image/channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-427096758.md): In the image-to-image mode, Grok 4.2 transforms into a highly controllable visual design engine. By uploading basic line art, composition sketches, or existing photos as visual anchors, users can achieve precise style transfers, targeted inpainting, and texture upgrades via text commands. This workflow strictly preserves the original image's core spatial structure, significantly enhancing design efficiency. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Image Generation & Processing > image-to-image > qwen [qwen-image/edit-2511-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-430967115.md): The LoRA inference version of Qwen-Image-Edit-2511, supporting custom LoRA adapters for personalized editing. Inherits the base model's 20-billion parameter architecture and character consistency while allowing injection of specific styles, characters, or visual concepts through custom LoRAs. Supports up to three simultaneous modules for style composition and fine control, maintaining bilingual text rendering capabilities. - Standard Model API > Image Generation & Processing > image-to-image > qwen [qwen-image/edit-2511](https://www.runninghub.ai/runninghub-api-doc-en/api-430967116.md): A 20-billion parameter image editing model from Alibaba's Qwen team, built on MMDiT architecture. Version 2511 delivers significant improvements over 2509 in character consistency, multi-subject scene stability, and editing controllability. Supports dual semantic and appearance editing modes with built-in community LoRA capabilities, enabling background replacement, style transfer, and clothing modifications while preserving facial structure and identity. - Standard Model API > Image Generation & Processing > image-to-image > qwen [qwen-image-2.0-pro/image-edit](https://www.runninghub.ai/runninghub-api-doc-en/api-428583625.md): A professional-grade image editing model by Alibaba's Qwen team, offering the highest processing quality in the 2.0 edit family. This version advances beyond the standard tier in complex instruction comprehension and output fidelity, supporting up to 2K resolution for precise editing control in professional image processing and commercial visual production workflows. - Standard Model API > Image Generation & Processing > image-to-image > qwen [qwen-image-2.0/image-edit](https://www.runninghub.ai/runninghub-api-doc-en/api-428583626.md): An intelligent image editing model from Alibaba's Qwen team that allows users to modify uploaded images through text instructions.
It features improved understanding of editing commands and enhanced quality preservation during modifications, supporting up to 2K resolution processing for tasks like style adjustments, element additions or removals, and detail refinement based on existing images. - Standard Model API > Image Generation & Processing > image-to-image > z [z-image-turbo/image-to-image-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-430967117.md): The LoRA version of image-to-image generation, launched by Alibaba's Tongyi Lab for style customization needs, supports the simultaneous loading of up to three custom LoRA adapters on top of image transformation. By adjusting the transformation intensity (0.0-1.0), it enables continuous control from subtle enhancements to complete reimagining, allowing LoRA modules to inject specific artistic styles, brand visuals, or character identities. Ideal for creative workflows requiring structural preservation of input images while enabling style transfer, brand content serialization, or character-consistent variant generation. - Standard Model API > Image Generation & Processing > image-to-image > z [z-image-turbo/image-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-430967118.md): The base image-to-image version of Z-Image Turbo from Alibaba's Tongyi Lab, achieving full-spectrum control from quality enhancement to creative reinterpretation through a single strength parameter. Low strength (0.0-0.3) functions as an intelligent enhancer, sharpening details and improving texture without altering content; high strength (0.8-1.0) uses the input as loose inspiration for artistic recreation. 8-step sampling delivers sub-second response with custom output dimensions and bilingual text rendering, providing a minimal yet powerful tool for photographer retouching, designer concept iteration, and rapid prototyping. - Standard Model API > Image Generation & Processing > image-to-image > wan [wan-2.2/image-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-430967119.md): The image-to-image transformation model in the Wan-2.2 series, leveraging 14-billion parameter MoE architecture for high-quality image redrawing and style transfer. Supports precise generation control through text prompts, achieving various creative effects including quality enhancement, style conversion, and element replacement while maintaining input image structural information. Employs dual-expert collaborative mechanism to ensure optimal balance between detail richness and semantic consistency, providing designers with an efficient visual iteration tool. - Standard Model API > Image Generation & Processing > image-to-image > f [f-kontext-dev-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-430967120.md): An open-source image editing model developed by Black Forest Labs specifically for developers, researchers, and advanced users, supporting LoRA adapters. Kontext can handle both text and image inputs. Using rectified flow architecture, it enables precise editing of existing images through natural language instructions while maintaining character and object consistency across multiple rounds without fine-tuning. Supports style transfer, background replacement, and inpainting. 
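The single strength control described in the z-image-turbo/image-to-image entry above lends itself to a worked illustration. The sketch below is a hypothetical sketch only: the endpoint URL, the request field names (`apiKey`, `imageUrl`, `prompt`, `strength`), and the response handling are assumptions for illustration, not the documented contract; take the real parameter names from the linked API page.

```python
# Hypothetical sketch of the z-image-turbo strength semantics described above.
# ENDPOINT and all field names are placeholders/assumptions; the linked
# z-image-turbo/image-to-image page documents the real request format.
import requests

API_KEY = "your-api-key"
ENDPOINT = "https://www.runninghub.ai/..."  # placeholder, not the real path

def redraw(image_url: str, prompt: str, strength: float) -> dict:
    """Submit one image-to-image request at a given transformation intensity."""
    # The entry documents strength as a 0.0-1.0 range.
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    payload = {
        "apiKey": API_KEY,      # assumed field name
        "imageUrl": image_url,  # assumed field name
        "prompt": prompt,
        "strength": strength,   # 0.0-0.3: enhance only; 0.8-1.0: loose reinterpretation
    }
    resp = requests.post(ENDPOINT, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()

# Low strength: sharpen details without altering content.
# redraw("https://example.com/photo.jpg", "crisp product shot", 0.2)
# High strength: use the input as loose inspiration for artistic recreation.
# redraw("https://example.com/photo.jpg", "watercolor cityscape", 0.9)
```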
- Standard Model API > Image Generation & Processing > image-to-image > f [f-2-dev/edit-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-434990846.md): The LoRA-customized edition of the FLUX.2 Dev editing model, combining 32-billion parameter high-precision editing capabilities with lightweight adapter flexibility. Supports injection of specific styles or brand visuals through LoRA modules, achieving personalized editing while maintaining multi-reference consistency and 4MP high resolution. Ideal for professional teams needing to batch process product image stylization, maintain cross-project character consistency, and conduct seasonal marketing campaign asset updates, providing an integrated editing solution that combines high precision, high efficiency, and personalization. - Standard Model API > Image Generation & Processing > image-to-image > f [f-2-dev/edit](https://www.runninghub.ai/runninghub-api-doc-en/api-434990847.md): The image editing specialized version of FLUX.2 Dev, based on 32-billion parameter architecture for high-precision prompt-driven editing. Supports single and multi-reference editing workflows, precisely executing clothing changes, color adjustments, pose tweaks, and element replacement while preserving character core identity, product geometry, and material textures. 4MP resolution output combined with professional-grade control capabilities provides production-level solutions for game LiveOps, e-commerce product variants, and marketing asset iteration. - Standard Model API > Image Generation & Processing > image-to-image > f [f-2-klein-9b/edit](https://www.runninghub.ai/runninghub-api-doc-en/api-434990848.md): The 9-billion parameter flagship image editing model of the FLUX.2 Klein family. It delivers a significant improvement in detail richness and editing precision over the 4B version, supporting more complex multi-reference blending and advanced semantic editing. 4-step distillation maintains sub-second inference, ideal for professional design workflows requiring extreme image quality, cinematic concept art, and high-end commercial advertising production. - Standard Model API > Image Generation & Processing > image-to-image > f [f-2-klein-4b/edit](https://www.runninghub.ai/runninghub-api-doc-en/api-434990849.md): The base image editing version of FLUX.2 Klein 4B, with unified architecture supporting both text-to-image and image-to-image editing tasks. It offers precise editing control through text prompts, achieving style transfer, element replacement, and detail enhancement while preserving original subject features, lighting, and composition. 4-step distillation delivers sub-second response, providing designers with efficient visual iteration tools. - Standard Model API > Image Generation & Processing > image-to-image > f [f-2-klein-4b/edit-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-434990850.md): An image editing LoRA version based on FLUX.2 Klein 4B, designed for users needing to preserve original image structure while applying style transformations. Supports both single-reference and multi-reference editing workflows, injecting specific artistic styles or brand visuals through LoRA adapters, completing precise edits at sub-second speeds. Ideal for e-commerce product image batch stylization, brand asset rapid iteration, and creative concept exploration, achieving flexible combinations of original structure with custom styles.
- Standard Model API > Image Generation & Processing > text-to-image > midjourney [midjourney-text-to-image-niji7](https://www.runninghub.ai/runninghub-api-doc-en/api-425766884.md): Built on V7, Niji 7 elevates anime creation to "theatrical movie" standards. It supports complex perspective and dynamic composition. Its breakthrough lies in capturing motion, making stills feel like frames from high-end feature films. - Standard Model API > Image Generation & Processing > text-to-image > nano [nano-banana-pro/text-to-image-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766887.md): Google's Nano Banana Pro (Gemini 3.0 Pro Image) is an industry-leading cutting-edge text-to-image model, which can generate high-definition 4K images with excellent quality and has been fully optimized for mobile phone devices to ensure smooth operation. It is equipped with a ready-to-use REST inference API, has the best performance, zero coldstarts, and extremely affordable pricing. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Image Generation & Processing > text-to-image > nano [nano-banana2-gemini31flash/text-to-image-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766889.md): Nano Banana 2 (Gemini 3.1 Flash Image) is Google’s high-performance generative model, blending lightning-fast speeds with professional 4K visual fidelity. Engineered for creators, it excels in complex text rendering, cinematic lighting, and multi-character consistency. Supporting diverse aspect ratios and real-world knowledge integration, it transforms simple prompts into breathtaking, photorealistic masterpieces in seconds—offering the perfect synergy of agility and artistic precision. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Image Generation & Processing > text-to-image > nano [nano-banana/text-to-image-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766890.md): Google Nano Banana is a state-of-the-art text-to-image model that can intelligently generate high-quality images flexibly based on diverse natural language prompts, covering multiple visual styles and scene requirements. It provides a fully ready-to-use REST inference API, guarantees the best performance, completely avoids cold start delays, and adopts a highly affordable pricing strategy for all users. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Image Generation & Processing > text-to-image > nano [nano-banana-pro/text-to-image-ultra-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766885.md): Google’s lightweight powerhouse designed for near-instant, native 4K visual synthesis. It redefines creative workflows by merging high-fidelity realism with context-aware natural language editing. Standout features include intelligent multilingual text rendering with auto-translation and professional camera-style controls over focus and depth of field. With robust character consistency and extreme aspect ratio flexibility, it provides a production-ready engine for marketing, social media, and high-end design, delivering unmatched clarity and stylistic depth. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. 
- Standard Model API > Image Generation & Processing > text-to-image > nano [nano-banana2-gemini31flash/text-to-image-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766886.md): A lightweight text-to-image endpoint engineered for high concurrency and rapid response. As the core of Nano Banana 2, it balances visual fidelity with high throughput, instantly transforming text prompts into high-quality assets. Ideal for modern API platforms requiring real-time previews, rapid iteration, or large-scale batch generation. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed. - Standard Model API > Image Generation & Processing > text-to-image > nano [nano-banana-pro/text-to-image-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766888.md): Google's Nano Banana Pro (Gemini 3.0 Pro Image) is a cutting-edge text-to-image generation model, boasting superior high-res image rendering capabilities and undergoing in-depth optimization for mobile devices. It delivers a ready-to-use REST inference API, ensures the best running performance in the market, eliminates cold start issues thoroughly, and offers cost-effective & affordable pricing for global users. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Image Generation & Processing > text-to-image > nano [nano-banana/text-to-image-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-426017750.md): Google Nano Banana Text-to-Image is a lightweight yet powerful AI model designed for creators needing rapid, high-quality visuals. In seconds, it transforms simple text prompts into expressive, realistic images with remarkable clarity and composition. Supporting diverse styles from photorealism to illustration, it accurately interprets subject-background relationships and produces clean, balanced lighting. Optimized for speed and low compute costs, it is ideal for rapid prototyping and generating social content efficiently without requiring design skills. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Image Generation & Processing > text-to-image > gpt [gpt-image-1.5/text-to-image-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766892.md): A multimodal text-to-image generation model balancing low latency with cost efficiency. Features strong prompt comprehension for rapid high-fidelity image generation, suitable for UI design, concept art, product mockups, and creative visualization workflows. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Image Generation & Processing > text-to-image > gpt [gpt-image-1.5/text-to-image-channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-425766891.md): A cost-efficient multimodal text-to-image generation model powered by OpenAI’s GPT image technology. It combines strong prompt understanding with optimized image synthesis to produce high-quality visuals from natural language. Ideal for UI design, concept art, product mockups, and creative visualization, offering low latency and scalable cost-effectiveness for rapid iteration and production workflows. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Image Generation & Processing > text-to-image > gpt [gpt-image-1.5/image-to-image-official-stable](https://www.runninghub.ai/runninghub-api-doc-en/api-425766893.md): A cost-efficient image editing model enabling modifications to existing images through text instructions. Interprets complex descriptions to perform adjustments from detailed refinements to complete style transformations, while preserving original lighting, color tones, and structural composition. Supports multiple image inputs for reference comparison. Official-stable: stable and highly efficient, with pricing lower than purchasing directly from the official model provider. - Standard Model API > Image Generation & Processing > text-to-image > seedream [seedream-v4.5/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-425766895.md): ByteDance’s premier high-resolution image generation model, refined through advanced architecture and large-scale training. It sets a new standard for typography and poster composition, rendering sharp, legible text for professional marketing layouts. The model excels in designer-level spatial hierarchy, ensuring clear placement of titles, logos, and body text. With superior prompt adherence and customizable output up to 4K resolution, Seedream 4.5 delivers exceptional aesthetic quality. It is an ideal solution for creators requiring high-fidelity branded visuals and complex, text-heavy creative designs. - Standard Model API > Image Generation & Processing > text-to-image > seedream [seedream-v5-lite/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-425766896.md): A next-generation intelligent visual creation engine. It supports rapid, high-precision single image generation from pure text and features breakthrough sequential image creation. Empowered by Chain-of-Thought (CoT) reasoning and real-time web search (RAG), the API understands complex long-text contexts to produce unified, logically coherent series of illustrations or time-sensitive posters with real-time data, offering developers a highly scalable automated content solution. - Standard Model API > Image Generation & Processing > text-to-image > seedream [seedream-v4/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-425766894.md): ByteDance’s layout-focused text-to-image model generates structured grids, triptychs, and comic layouts with whitespace for text/CTAs. Ensures consistent aesthetics across multi-frame series and delivers 4K outputs (4096×4096) for marketing assets. - Standard Model API > Image Generation & Processing > text-to-image > grok [grok-image/text-to-image/channel-low-price](https://www.runninghub.ai/runninghub-api-doc-en/api-427096759.md): Grok 4.2's text-to-image mode empowers creators to build magnificent visual worlds entirely from scratch. By simply inputting natural language descriptions, the model accurately parses semantics to generate images with ultra-high definition, rich details, and perfect lighting. From complex text typography and photorealistic shots to imaginative fantasy scenes, a single prompt instantly turns creative concepts into vivid reality. Channel-low-price: priced significantly lower than the Official Stable Edition, but stability is not guaranteed.
- Standard Model API > Image Generation & Processing > text-to-image > qwen [qwen-image/text-to-image-2512](https://www.runninghub.ai/runninghub-api-doc-en/api-430967121.md): Qwen Image 2512 is Alibaba's advanced text-to-image model, distinguished by its exceptional text rendering capabilities. It accurately generates legible text across multiple languages and layouts, making it the perfect tool for creating posters, logos, and typography-heavy designs. The model excels at comprehending complex prompts, effortlessly managing intricate spatial relationships. Offering custom aspect ratios and consistent, high-quality outputs across diverse artistic styles, it provides creators with unparalleled flexibility and professional-grade results. - Standard Model API > Image Generation & Processing > text-to-image > qwen [qwen-image/text-to-image-2512-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-430967122.md): The LoRA-customized version of Qwen-Image-2512, supporting custom adapters for personalized style injection. Maintains the base model's realism and text capabilities while allowing lightweight LoRA modules to codify specific artistic styles, brand visuals, or character likenesses. Ideal for brand teams requiring cross-project visual consistency, IP content developers, and style explorers, perfectly combining high-quality generation with personalization needs. - Standard Model API > Image Generation & Processing > text-to-image > qwen [qwen-image-2.0/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-428583624.md): A high-efficiency text-to-image model by Alibaba's Qwen team, delivering a significant speed boost while maintaining image quality. Supports complex Chinese/English text rendering and diverse art styles, with output up to 2K resolution (2048x2048) and 1-6 batch generation. Achieves an optimal balance between quality and performance, ideal for rapid creative iteration and content production. - Standard Model API > Image Generation & Processing > text-to-image > qwen [qwen-image-2.0-pro/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-428583623.md): A professional-grade text-to-image model by Alibaba's Qwen team, generating high-quality images from text descriptions. Excels at text rendering, realistic textures, and semantic instruction following. Supports complex Chinese/English text rendering, multi-line layouts, and paragraph-level text generation. Outputs up to 2K resolution (2048x2048) with 1-6 batch generation, ideal for poster design, commercial visual production, and premium content creation. - Standard Model API > Image Generation & Processing > text-to-image > z [z-image/turbo-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-430967123.md): The LoRA inference version of Z-Image Turbo for text-to-image generation from Alibaba's Tongyi Lab, supporting custom LoRA adapters for personalized visual creation. Maintains sub-second generation speed while injecting specific styles, characters, or brand aesthetics through lightweight LoRA modules (18-150MB) without modifying the 6-billion parameter base model. Ideal for creators and commercial teams needing rapid iteration of specific visual concepts, cross-scene character consistency, and brand-exclusive generation workflows. 
- Standard Model API > Image Generation & Processing > text-to-image > z [z-image/turbo](https://www.runninghub.ai/runninghub-api-doc-en/api-430967124.md): The ultra-fast text-to-image foundation model from Alibaba's Tongyi Lab, delivering sub-second generation with just 6 billion parameters. Uses innovative S3-DiT single-stream architecture requiring only 8 sampling steps to produce photorealistic quality comparable to much larger models. Specializes in solving bilingual text rendering challenges with superior accuracy for embedded Chinese and English text. Runs smoothly on 16GB VRAM, enabling high-frequency content production, real-time interactive applications, and scalable commercial deployment with extreme cost-effectiveness. - Standard Model API > Image Generation & Processing > text-to-image > wan [wan-2.2/text-to-image-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-430967125.md): The text-to-image LoRA customized version in the Wan-2.2 ecosystem, supporting personalized image generation through custom adapters. Building on efficient MoE architecture inference, it allows users to inject specific artistic styles, brand visual languages, or proprietary character likenesses, achieving precise combination of text descriptions with personalized aesthetics. Supports multiple LoRA module stacking, providing flexible customization solutions for creative teams needing rapid visual concept iteration and cross-project style consistency. - Standard Model API > Image Generation & Processing > text-to-image > wan [wan-2.7/text-to-image-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-438555143.md): Wan 2.7 Text-to-Image Pro generates images up to 4K resolution with built-in thinking mode for superior prompt understanding. Ideal for print-ready assets, large-format displays, fashion lookbooks, and production-grade creative work requiring maximum detail and fidelity. - Standard Model API > Image Generation & Processing > text-to-image > wan [wan-2.7/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-438555144.md): Wan 2.7 Text-to-Image generates high-quality, richly detailed images from text descriptions with built-in thinking mode for enhanced prompt understanding and coherent compositions. Supports custom size output and multiple aspect ratios for social media, marketing, concept art, and creative exploration. - Standard Model API > Image Generation & Processing > text-to-image > f [f-krea-dev-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-430967126.md): A collaboration between Black Forest Labs and Krea AI, fine-tuned for aesthetic quality on the FLUX.1-dev architecture. Eliminates common AI artifacts like plastic skin textures and oversaturated colors, delivering natural film-like lighting and authentic details. Maintains full LoRA ecosystem compatibility with FLUX.1-dev adapters while offering a distinctive aesthetic, suited for commercial creative projects requiring photorealism and cinematic visuals. - Standard Model API > Image Generation & Processing > text-to-image > f [f-dev-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-430967127.md): FLUX.1 [dev] with integrated LoRA (Low-Rank Adaptation) support, enabling personalized generation via pre-trained adapters without retraining all 12 billion parameters. Supports multiple LoRA weight blending for rapid style switching.
- Standard Model API > Image Generation & Processing > text-to-image > f [f-2-dev/text-to-image-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-434990851.md): The text-to-image LoRA customized version of FLUX.2 Dev, supporting deep personalization on top of 32-billion parameter extreme quality. Codifies specific artistic styles, brand visuals, or character likenesses through LoRA adapters, combined with multi-reference consistency preservation, achieving perfect fusion of "high-fidelity generation + personalized style". Ideal for creative teams and commercial brands needing to maintain brand asset consistency, conduct character cross-scene operations, and produce high-end customized content. - Standard Model API > Image Generation & Processing > text-to-image > f [f-2-dev/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-434990852.md): Black Forest Labs' latest 32-billion parameter open-source text-to-image model. flux-2-dev is the 32B open-weight version based on the FLUX.2 base model, currently the strongest open-source image generation and editing model available. It can perform both text-to-image generation and multi-input image editing tasks within a single checkpoint. The model can generate, edit, and composite images based on text instructions, offering excellent cost-performance advantages. - Standard Model API > Image Generation & Processing > text-to-image > f [f-2-klein-9b/text-to-image-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-434990853.md): The text-to-image LoRA customized version of FLUX.2 Klein 9B, opening deep personalization capabilities on top of 9-billion parameter high-quality generation. Supports injection of specific artistic styles, brand visual languages, or proprietary character likenesses through LoRA adapters, achieving precise alignment between creative intent and visual aesthetics. Sub-second generation speed combined with multi-reference input provides flexible customization solutions for professional teams requiring cross-project style consistency, IP content developers, and high-end commercial brands. - Standard Model API > Image Generation & Processing > text-to-image > f [f-2-klein-9b/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-434990854.md): A 9-billion parameter ultra-fast text-to-image model from Black Forest Labs. As a derivative series of the professional-grade FLUX.2, Klein maintains near-top-tier generation quality through architectural optimization and distillation while significantly reducing hardware requirements and inference latency. Applications cover real-time creative design, social media content generation, rapid UI/UX prototyping, game art previews, educational visualization, and other fields—particularly suitable for interactive applications requiring low latency and cost-friendly hardware. - Standard Model API > Image Generation & Processing > text-to-image > f [f-2-klein-4b/text-to-image](https://www.runninghub.ai/runninghub-api-doc-en/api-434990855.md): The ultra-fast text-to-image base model of the FLUX.2 Klein family, featuring a 4-billion parameter flow transformer architecture with 4-step distillation for sub-second inference. As a derivative series of the professional-grade FLUX.2, Klein maintains near-top-tier generation quality through architectural optimization and distillation while significantly reducing hardware requirements and inference latency. Generates high-quality images quickly from natural language descriptions with excellent cost-performance ratio. 
- Standard Model API > Image Generation & Processing > text-to-image > f [f-2-klein-4b/text-to-image-lora](https://www.runninghub.ai/runninghub-api-doc-en/api-434990856.md): A 4-billion parameter ultra-fast text-to-image LoRA customization version from Black Forest Labs. As a derivative series of the professional-grade FLUX.2, Klein maintains near-top-tier generation quality through architectural optimization and distillation while significantly reducing hardware requirements and inference latency. This model supports loading custom LoRA adapters for personalized style injection. Open-source license allows commercial use, requiring only 13GB VRAM to run smoothly. Ideal for creators and small-to-medium teams needing rapid iteration of specific visual styles and brand consistency. - Standard Model API > Image Generation & Processing > text-to-image > f [f-dev](https://www.runninghub.ai/runninghub-api-doc-en/api-435844478.md): A 12-billion parameter text-to-image model by Black Forest Labs using rectified flow transformer architecture. Distilled from FLUX.1 [pro] via guidance distillation, it achieves near-flagship quality with improved efficiency. Supports text-to-image, image-to-image, and inpainting modes up to 1536×1536 resolution. - Standard Model API > Image Generation & Processing > image-tools [wan-2.7/image-edit-pro](https://www.runninghub.ai/runninghub-api-doc-en/api-438555145.md): Wan 2.7 Image Edit Pro delivers professional-grade prompt-driven image editing with up to 2K output resolution and 1-9 reference image support. Designed for product retouching, high-resolution background swaps, detailed style transfers, and production workflows requiring maximum fidelity. - Standard Model API > Image Generation & Processing > image-tools [wan-2.7/image-edit](https://www.runninghub.ai/runninghub-api-doc-en/api-438555146.md): Wan 2.7 Image Edit is a prompt-driven image editing model supporting 1-9 reference images. Describe edits in natural language while preserving composition, subject identity, and structure. Ideal for fashion editing, background replacement, product retouching, and creative iteration. - Standard Model API > Other [pixverse-v5.6/text-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-432893798.md): PixVerse V5.6 text-to-video model delivers upgraded visual fidelity, smoother motion, and sharper detail over V5.5. Supports 360p-1080p resolution, 5/8/10s duration, built-in AI Thinking for complex prompts, and optional audio co-generation—ideal for social content, ads, and creative visualization. - Standard Model API > Other [pixverse-v5.6/image-to-video](https://www.runninghub.ai/runninghub-api-doc-en/api-432893799.md): PixVerse V5.6 image-to-video model transforms a single image into cinematic video clips with upgraded subject fidelity, sharper detail, and smoother motion over V5.5. Supports 360p-1080p, 5/8/10s duration, AI Thinking optimization, and audio co-generation—ideal for character animation, product showcases, and social content. - Task Query & webhook [Check Task Status](https://www.runninghub.ai/runninghub-api-doc-en/api-425761033.md): - Task Query & webhook [Check Task Output](https://www.runninghub.ai/runninghub-api-doc-en/api-425761034.md): - Task Query & webhook [Get Webhook Event Details](https://www.runninghub.ai/runninghub-api-doc-en/api-425761035.md): This API is designed to assist in debugging user webhooks. It allows you to query the detailed status of the current webhook event using a taskId. 
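Several of the Task Query & webhook entries above (Check Task Status, Check Task Output, Get Webhook Event Details) revolve around the `taskId` returned when a task is created, so a small polling sketch may help orient integration work. Everything below is a hedged illustration: the endpoint URLs, the request field names, and the status strings are assumptions, and the linked pages define the actual contract.

```python
# Hypothetical polling/debugging sketch for the Task Query & webhook endpoints.
# STATUS_URL, EVENT_URL, the field names, and the status strings are assumptions;
# the linked "Check Task Status" and "Get Webhook Event Details" pages are
# authoritative.
import time
import requests

API_KEY = "your-api-key"
STATUS_URL = "https://www.runninghub.ai/..."  # placeholder: Check Task Status
EVENT_URL = "https://www.runninghub.ai/..."   # placeholder: Get Webhook Event Details

def wait_for_task(task_id: str, interval: float = 3.0, max_polls: int = 100) -> dict:
    """Poll a task by taskId until it leaves a running state."""
    for _ in range(max_polls):
        resp = requests.post(STATUS_URL, json={"apiKey": API_KEY, "taskId": task_id},
                             timeout=30)
        resp.raise_for_status()
        body = resp.json()
        if body.get("data") not in ("QUEUED", "RUNNING"):  # assumed status values
            return body
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} still running after {max_polls} polls")

def webhook_event(task_id: str) -> dict:
    """Fetch webhook delivery details for a task while debugging a webhook."""
    resp = requests.post(EVENT_URL, json={"apiKey": API_KEY, "taskId": task_id},
                         timeout=30)
    resp.raise_for_status()
    # Per the docs below, the returned id field is the webhookId accepted by
    # "Resend Specific Webhook Event".
    return resp.json()
```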
- Task Query & webhook [Resend Specific Webhook Event](https://www.runninghub.ai/runninghub-api-doc-en/api-425761036.md): The webhookId refers to the id field returned by the Get Webhook Event Details API. - Task Query & webhook [Query generation result (V2)](https://www.runninghub.ai/runninghub-api-doc-en/api-425767807.md): - ComfyUI Workflows [Start ComfyUI Task 1 - Basic](https://www.runninghub.ai/runninghub-api-doc-en/api-425761092.md): > This method of running a workflow is equivalent to pressing the "Run" button without altering any of the original parameters. - ComfyUI Workflows [Start ComfyUI Task 2 - Advanced](https://www.runninghub.ai/runninghub-api-doc-en/api-425761093.md): # Start ComfyUI Task (Advanced) - ComfyUI Workflows [Get Workflow JSON](https://www.runninghub.ai/runninghub-api-doc-en/api-425761094.md): - ComfyUI Workflows [Cancel ComfyUI Task](https://www.runninghub.ai/runninghub-api-doc-en/api-425761095.md): - AI App [Start AI App Task](https://www.runninghub.ai/runninghub-api-doc-en/api-425761096.md): Supports specifying a `webhookUrl`. You can view a sample `nodeInfoList` on the AI App details page; a minimal request sketch also appears at the end of this index. - AI App [Get API call examples for AI application](https://www.runninghub.ai/runninghub-api-doc-en/api-425761097.md): We provide a demo of invoking the AI App API; you can refer to it to get your own calls running quickly. - Resource Upload [File Upload (New)](https://www.runninghub.ai/runninghub-api-doc-en/api-425761098.md): - Resource Upload [Upload Resource (image/video/audio/compressed files)](https://www.runninghub.ai/runninghub-api-doc-en/api-425761099.md): # RunningHub Resource Upload Guide (Images, Audio, Video, Compressed Files) - Resource Upload [Upload Lora](https://www.runninghub.ai/runninghub-api-doc-en/api-425761100.md): ### ⚠️ Important Notice - [Get Account Information](https://www.runninghub.ai/runninghub-api-doc-en/api-425761030.md): ## Schemas - [Get Workflow JSON Request](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340065.md): - [TaskRunWebappByKeyRequest](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340066.md): - [Generate task submission results](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340067.md): - [Get Workflow JSON Response](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340068.md): - [Start ComfyUI Task Request 1](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340069.md): - [Start ComfyUI Task Request 2](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340070.md): - [Start ComfyUI Task Request -webhook](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340071.md): - [Start ComfyUI Task Response](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340072.md): - [TaskCreateResponse](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340073.md): - [Check Task Status Request](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340074.md): - [Node Input Information](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340075.md): - [Get Account Information Request](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340076.md): - [Upload Resource Request](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340077.md): - [Get Webhook Event Details Request](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340078.md): - [Resend Specific Webhook Request](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340079.md): - [R?](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340080.md): - 
[RWorkflowDuplicateResponse](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340081.md): - [RAccountStatusResponse](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340082.md): - [WorkflowDuplicateResponse](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340083.md): - [AccountStatusResponse](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340084.md): - [WorkflowDuplicateRequest](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340085.md): - [ApiUploadLoraRequest](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340086.md): - [RString](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340087.md): - [RTaskUploadResponse](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340088.md): - [TaskUploadResponse](https://www.runninghub.ai/runninghub-api-doc-en/schema-252340089.md):
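To close the index, here is the minimal request sketch referenced in the Start AI App Task entry above. It is a hedged illustration: the endpoint URL is a placeholder and the exact `nodeInfoList` entry shape is an assumption; each AI App's details page shows the authoritative sample, and the real `webappId` and `nodeInfoList` come from there.

```python
# Hypothetical sketch for "Start AI App Task" with nodeInfoList and webhookUrl.
# RUN_URL and the nodeInfoList field names are assumptions; copy the real
# sample from the AI App details page.
import requests

API_KEY = "your-api-key"
RUN_URL = "https://www.runninghub.ai/..."  # placeholder: Start AI App Task endpoint

payload = {
    "apiKey": API_KEY,
    "webappId": "your-webapp-id",  # hypothetical AI App identifier
    "nodeInfoList": [              # per-node parameter overrides
        # assumed entry shape: which node to touch, which field, and the new value
        {"nodeId": "6", "fieldName": "text", "fieldValue": "a cat in a spacesuit"},
    ],
    "webhookUrl": "https://example.com/runninghub-callback",  # optional callback
}

resp = requests.post(RUN_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())  # the returned taskId feeds Check Task Status / Check Task Output
```

Omitting the overrides altogether mirrors the "Start ComfyUI Task 1 - Basic" behavior described above: the workflow runs exactly as saved, as if the "Run" button had been pressed without altering any parameters.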