*Luisa Crawford | April 03, 2026*
---
Alibaba's cutting-edge Wan 2.7 AI video generation suite has made its debut on Together AI's cloud platform, with text-to-video functionality now live at a striking $0.10 per second of footage. This deployment represents the first significant cloud release since Alibaba unveiled the four-model suite in late March, signaling a major expansion of accessible AI video tools for creators and production teams worldwide.
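Because billing is a flat per-second rate, cost scales linearly with clip length. A quick sketch of the arithmetic (the price comes from the article; the helper name is illustrative):

```python
# Per-second price for Wan 2.7 text-to-video on Together AI, per the article.
PRICE_PER_SECOND = 0.10

def clip_cost(duration_seconds: float) -> float:
    """Estimated cost in USD for a clip of the given length."""
    return round(duration_seconds * PRICE_PER_SECOND, 2)

# The model supports clips from 2 to 15 seconds:
print(clip_cost(2))   # shortest supported clip -> 0.2
print(clip_cost(15))  # longest supported clip  -> 1.5
```

So even the longest supported clip comes in at $1.50, which is what makes the pricing notable for iterative creative work.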
The text-to-video model, accessible through the endpoint `Wan-AI/wan2.7-t2v`, supports both 720p and 1080p output and generates clips ranging from 2 to 15 seconds in length. Notably, the system accepts audio input to drive generation, a capability that sets it apart from more basic offerings. Perhaps most significantly, multi-shot narrative control operates directly through prompt language, letting filmmakers and content creators maintain cohesive storytelling without resorting to the fragmented workflows that plague many current AI video platforms.
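The article doesn't reproduce Together AI's request schema, but a request built around those parameters might look roughly like the following. The URL path and payload field names here are assumptions for illustration only; only the model id (`Wan-AI/wan2.7-t2v`), the resolution options, and the duration range come from the article. Consult Together AI's documentation for the real API.

```python
import json
import urllib.request

# Hypothetical text-to-video request. Field names and the endpoint path
# are assumed, not documented; they sketch how the article's parameters
# (model id, 720p/1080p, 2-15 s clips, prompt-driven multi-shot control)
# might map onto a JSON payload.
payload = {
    "model": "Wan-AI/wan2.7-t2v",
    "prompt": (
        "Shot 1: a fishing boat leaves a foggy harbor at dawn. "
        "Shot 2: close-up of the captain checking a paper chart."
    ),  # multi-shot narrative control expressed in prompt language
    "resolution": "1080p",    # 720p or 1080p, per the article
    "duration_seconds": 8,    # must fall in the 2-15 second range
}

req = urllib.request.Request(
    "https://api.together.xyz/v1/videos/generations",  # assumed path
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $TOGETHER_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would submit the job; response handling
# (polling for the finished clip, downloading it) is omitted here.
```

Keeping the request a plain JSON POST means the same sketch would adapt to the image-to-video and reference-to-video endpoints once they ship, presumably by swapping the model id and adding media inputs.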
At present, only the text-to-video functionality is available, though Together AI has confirmed that image-to-video and reference-to-video capabilities are scheduled for release in the near future. The upcoming image-to-video model will support first-frame, first-and-last-frame, and continuation generation workflows—features particularly valuable for storyboarding professionals. A 3×3 grid-to-video feature aims to serve teams constructing structured content from static assets. The reference-to-video model promises to be especially valuable for production work, accepting both reference images and videos as inputs while handling multi-character interactions and complex scene composition at resolutions up to 1080p for clips lasting up to 10 seconds.
The fourth model in the suite, Video Edit, tackles what many consider the most persistent challenge in AI video production: the inability to revise content without starting entirely from scratch. Together AI's implementation will support instruction-based editing through text prompts, reference image-based modifications, style transfer, and temporal feature cloning, which lets motion, camera work, and effects be extracted from source media and applied to new creations. For creative teams, consolidating these capabilities behind a single API surface removes the coordination overhead of today's typical pipeline, in which generating in one tool, editing in another, and manually patching the results has become the industry norm.
