Alibaba's Qwen team has launched Qwen3.5-Omni, a full-modal large model released as Instruct versions in three sizes: Plus, Flash, and Light. The model supports a 256k-token context window, more than 10 hours of audio input, and more than 400 seconds of 720p audio-video input sampled at 1 FPS. It is natively pre-trained as a full-modal model on massive amounts of text and visual data plus over 100 million hours of audio-video data, and according to the team it delivers strong perception and generation across all modalities. Compared with Qwen3-Omni, Qwen3.5-Omni substantially expands multilingual support: speech recognition now covers 113 languages and dialects, and speech generation covers 36. It also adds real-time interaction features such as semantic interruption, voice cloning, and voice control, which make conversations feel more human-like.