Qwen3.5-Omni: all-modal audio, video and vibe coding
Qwen3.5-Omni is presented as an all-modal model covering text, image, audio, video, speech and real-time interaction. Its headline claims include state-of-the-art results on 215 reported tasks, speech recognition in 113 languages, speech generation in 36 languages, long audio and video understanding, and audio-video vibe coding. This page reads it as a workflow-expansion release: voice, camera and video become direct inputs for coding, content operations and enterprise assistants.