Well-known Chinese and global AI models and applications by category
This SmarToken catalog tracks well-known global and Chinese AI models by where the teams are based, whether the release is proprietary or open, and application categories such as general LLMs, reasoning, image generation, video, music, audio and world-generation models. It was last updated on June 1, 2026.
Key takeaways
01 This SmarToken catalog tracks well-known AI models and applications as of June 1, 2026.
02 We separate models from outside China and China-based teams, proprietary and open releases, and application categories such as general models, reasoning, image, video, music, audio and world-generation models.
03 The gap between Chinese and overseas models keeps narrowing, although overseas teams still hold an overall lead, especially in general-purpose models.
Well-known Chinese and global AI models and applications by category video guide. A short SmarToken video for AI Model Applications Catalog: China and Global Models 2026, focused on model evaluation, tradeoffs and the current discussion.
Weekly update: June 1 to June 5, 2026
This week's update adds Step-3.7-Flash, Step's open reasoning model, and MiniMax M3, an agent model that MiniMax plans to release openly.
Agent models, including models optimized for OpenClaw, are still an emerging category and are temporarily grouped with all-purpose multimodal or general models. Hybrid reasoning models are now mainstream, but many model families still release their general and thinking versions separately, so this catalog lists them as separate general and reasoning models. OCR models and 3D models are not yet treated as full categories in this version.
A SmarToken editorial graphic based on the LMSYS Arena snapshot discussed in the June 2026 update. It keeps the ranking signal without reusing a third-party screenshot.
Step: Step-3.7-Flash, a reasoning model, was added to the China-based open model group.
MiniMax: MiniMax M3, an agent model with a planned open release, was added to the China-based open model group.
Agent, OCR and 3D model categories are still handled cautiously in this catalog.
| Catalog note | How we classify it | Reader caveat |
| --- | --- | --- |
| Agent models are still emerging. | They are temporarily grouped with all-purpose multimodal or general models. | The category may change as agent models mature. |
| Hybrid reasoning models dominate the market. | General and thinking versions are still listed separately when releases differ. | Compare the exact variant, not only the model family. |
| OCR and 3D models are not yet included. | They are not considered mature enough as standalone categories in this snapshot. | Future versions may add them. |
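The classification rules above can be sketched as a small lookup. This is an illustrative sketch only: the names `CatalogEntry`, `catalog_category`, `list_variants`, `TEMPORARY_ALIASES` and `NOT_YET_LISTED` are hypothetical and are not part of any SmarToken tooling. The category labels follow the catalog text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Category labels used by this catalog snapshot.
CATALOG_CATEGORIES = {
    "general", "reasoning", "image", "video", "music", "audio", "world-generation",
}
# Assumption: agent models are folded into "general" for now, per the note above.
TEMPORARY_ALIASES = {"agent": "general"}
# OCR and 3D models are not standalone categories in this snapshot.
NOT_YET_LISTED = {"ocr", "3d"}


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    region: str    # "china" or "overseas"
    release: str   # "proprietary" or "open"
    category: str


def catalog_category(raw_label: str) -> Optional[str]:
    """Map a raw label to a catalog category, or None if it is not listed yet."""
    label = raw_label.strip().lower()
    if label in NOT_YET_LISTED:
        return None
    label = TEMPORARY_ALIASES.get(label, label)
    if label not in CATALOG_CATEGORIES:
        raise ValueError(f"unknown category label: {raw_label!r}")
    return label


def list_variants(region: str, release: str,
                  variants: List[Tuple[str, str]]) -> List[CatalogEntry]:
    """Create one entry per variant, so general and thinking versions stay separate."""
    entries = []
    for name, raw_label in variants:
        category = catalog_category(raw_label)
        if category is not None:
            entries.append(CatalogEntry(name, region, release, category))
    return entries


entries = list_variants(
    "overseas", "proprietary",
    [("GPT-5.5", "General"), ("GPT-5.5 Thinking", "Reasoning")],
)
for entry in entries:
    print(entry.name, "->", entry.category)
```

Keeping the alias and exclusion sets separate from the main category set means a later snapshot could promote agent, OCR or 3D models to full categories by editing data, not logic.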
Models outside China: overall lead remains, especially in general models
After two years of rapid development, the gap between Chinese AI models and models from outside China is shrinking, but overseas teams still keep an overall lead, especially in general-purpose models.
Google, OpenAI and Anthropic not only trade the lead in model performance but also still shape the main direction of the industry. There is also a practical warning for China-based companies: many overseas models require network workarounds to access from China, and their less restricted output may create compliance risks in Chinese enterprise use.
Google, OpenAI and Anthropic are the most important overseas trend-setters.
Overseas models are often harder to access from China.
Chinese enterprise use may need extra compliance review because output policies differ.
Overseas proprietary group: general and reasoning models
The overseas proprietary section lists the main general and reasoning models from OpenAI, Google DeepMind, Anthropic, xAI and Mistral.
The table below keeps a catalog style. It is a dated market snapshot, so benchmark claims and release details need checking against current vendor pages before production use.
GPT-5.5 shifts from chat interaction toward autonomous agents.
Gemini 3.5 Flash focuses on fast iterative coding, multimodal understanding and long workflows.
Claude Opus 4.8 is strong in long-cycle, high-risk enterprise workflows.
| Model or family | Category | Summary |
| --- | --- | --- |
| GPT-5.5 / GPT-5.5 Instant | General | OpenAI's April 2026 general model shifts toward autonomous agents, native multimodality, 400K Codex context and stronger planning; Instant later became the default ChatGPT model with lower hallucination and more natural language. |
| Gemini 3.5 Flash | General | Google DeepMind's May 2026 model supports long-context work and focuses on fast iterative coding, advanced multimodal understanding, long workflows and multi-step tool use. |
| Claude Opus 4.8 / Sonnet 4.6 / Haiku 4.5 | General | Anthropic's line is especially strong for coding, agent work, multimodal reasoning and enterprise workflows; Chinese users should also consider policy and account-risk issues before relying on it. |
| Grok 4.1 | General | xAI's model follows a closed-latest, open-older-version strategy. The analysis says 4.1 improved emotional intelligence, produced more human-like writing and lowered the hallucination rate. |
| Mistral 3 Medium / Mistral Large 2 | General | Mistral is treated as Europe's main AI lab, currently less active in frontier general-model competition and more focused on smaller models and niche innovations. |
| GPT-5.5 Thinking / GPT-5.5 Pro | Reasoning | Thinking is positioned for fast logical breakdowns and concise answers; Pro is the high-precision version for difficult tasks such as FrontierMath and Expert-SWE. |
| Gemini 3.1 Pro / Gemini 3 Deep Think | Reasoning | Major improvements include code-based animation, complex system synthesis, interactive design and difficult multimodal reasoning. |
| Claude Opus 4.8 Thinking | Reasoning | The key feature is Effort Control, allowing users to adjust thinking depth across low, high, extra and maximum settings. |
| Grok 4.1 Thinking / Magistral Medium v1.2 | Reasoning | Grok is xAI's thinking model; Magistral is a fast second-tier reasoning option with multimodal support. |
Overseas proprietary group: image, video, music, audio and world-generation models
This catalog gives separate categories to image, video, music, audio and world-generation models because applications are now part of the model landscape. Here, world-generation models means systems that create or simulate interactive 3D environments.
This section follows the same catalog style: model name first, category second, and a short practical description. Some products are listed as applications rather than base models.
Google, OpenAI, Midjourney and Black Forest Labs appear in the image category.
Google Veo/Omni, Runway, Pika, Luma, Stable Video Diffusion and Midjourney appear in the video category.
World-generation models are treated as a new category, led by Genie 3 and World Labs products.
Google DeepMind's image line is stronger in text rendering, multilingual localization, complex instructions, character consistency and overall generation quality.
| Model or product | Category | Summary |
| --- | --- | --- |
| GPT-Image 2 / DALL·E 3 | Image | OpenAI's image model is a GPT-native visual system with thinking mode, internet-aware generation, self-checking and improved commercial production output. |
| Midjourney v7 / Flux 2 | Image | Midjourney remains an older image pioneer with slow updates; Flux 2 combines high-quality generation and editing in one architecture. |
| Veo 3.1 / Google Omni | Video | Veo improves audio, instruction following and realism; Google Omni combines Gemini reasoning with video editing and generation grounded in physical consistency. |
| Runway Gen-4.5 / Pika 2.5 / Luma AI / Stable Video Diffusion / Midjourney video | Video | These are major video-generation products, with limitations such as object disappearance, causal errors or local hardware requirements. |
| Lyria 3 / Suno 5.0 | Music | Lyria 3 supports text, image and video inputs and automatic lyric generation; Suno 5.0 improves studio-level quality, track separation and style control. |
| Stable Audio / MuseNet / V2A | Audio | These are audio-generation systems. V2A generates background audio for video from input footage and text prompts. |
| Genie 3 / World Labs Marble / World Labs RTFM | World-generation model | Genie 3 renders interactive virtual worlds in real time; World Labs focuses on persistent 3D environments and real-time generative worlds with spatial memory. |
Overseas open model group
The overseas open model section is shorter. It lists Mistral, Gemma, Phi, gpt-oss, Magistral, Muse Spark, Flux, Stable Diffusion and a few media models.
These open models form part of the wider international baseline, although Chinese open models occupy more of the practical comparison space in this catalog.
Mistral Large 3 is listed as a sparse MoE model with 675B total parameters and 41B active parameters.
Gemma 4 is efficient for personal hardware, agent use and multimodal tasks.
Muse Spark is small and fast, with strengths in science, mathematics, health and multimodal perception.
| Model or family | Category | Summary |
| --- | --- | --- |
| Mistral Large 3 / Ministral 3 | General | Mistral Large 3 is listed as a 675B sparse MoE model with 256K context; Ministral 3 provides 3B, 8B and 14B edge models with multimodal ability. |
| Gemma 4 / Phi-4 | General | Gemma 4 emphasizes compute and memory efficiency, local deployment, function calling and multimodal understanding; Phi-4 is treated as a compact high-performing Microsoft model. |
| gpt-oss / Phi-reasoning / Magistral Small / Muse Spark | Reasoning | This group includes OpenAI's gpt-oss models, Microsoft's Phi reasoning line, Mistral's open Magistral Small and Meta's Muse Spark. |
| Flux 2 dev / Stable Diffusion | Image | Flux 2 dev is a strong open-weight image generation and editing model; Stable Diffusion remains the open local-deployment reference. |
| Hunyuan Video 1.5 / Wan 2.2 / audio and music models | Media | Open video, image, audio, music and world-generation entries appear later in the Chinese open model section. |
Chinese models: the gap is shrinking, especially in specialist categories
Many early Chinese large language models were rushed to market, but after more than a year of development the strongest Chinese players have narrowed the gap with leading overseas teams and become serious contenders in several categories, especially music, image generation, video generation and reasoning models.
The Chinese proprietary group includes Doubao, Qwen, GLM, Kimi, Tencent Yuanbao/Hunyuan, Step, ERNIE and MiMo.
The Chinese open model group includes DeepSeek, Qwen, GLM, Kimi, MiniMax, ERNIE, MiMo, Hunyuan and Step.
The page repeatedly treats Chinese progress by category, not as a single national ranking.
Chinese proprietary group: general and reasoning models
The Chinese proprietary section starts with consumer and platform model families such as Doubao, Qwen, GLM, Kimi, Tencent Yuanbao, Step, ERNIE and MiMo.
The table below follows the same category order used throughout this catalog and keeps the main practical distinctions clear.
Doubao Seed 2.0 is a first-tier Chinese model family after steady progress in 2025.
Qwen is more application-oriented and broader in scope than DeepSeek, which focuses on performance first.
Kimi was once China's long-document reading leader, but it became quieter in 2025.
| Model or family | Category | Summary |
| --- | --- | --- |
| Doubao Seed 2.0 | General | ByteDance's general model application line. Seed 2.0 improves visual reasoning, temporal and motion perception, instruction following and complex agent-task performance. |
| Qwen3.6 Plus / Qwen3.5 Omni | General / All-purpose multimodal | Alibaba's Qwen line is application-oriented and broad. Qwen3.6 Plus emphasizes agentic coding and 1M context; Qwen3.5 Omni supports long audio/video input, real-time interaction and multilingual speech. |
| GLM-5V-Turbo / GLM-5-Turbo | Agent / General | Zhipu AI's models are described as agent-oriented, with GLM-5V-Turbo focused on visual programming and GLM-5-Turbo focused on tool use, instruction following and long-chain execution. |
| Kimi | General | Moonshot's Kimi was once China's long-document reading king, though its market presence became quieter in 2025. |
| Tencent Yuanbao / Hunyuan HY2.0 | General | Tencent's Yuanbao, formerly Hunyuan, is a MoE model family with improved reasoning, coding, agent and instruction-following ability, though with weaker visibility after integrating DeepSeek-R1. |
| ERNIE 5.1 | General | Baidu's model is an ultra-sparse MoE with much lower pretraining cost than comparable models and a four-stage post-training pipeline. |
| MiMo-V2.5-Pro / MiMo-V2.5 | Agent / General | Xiaomi's MiMo models emphasize long-cycle autonomous planning, multimodal action and 1M context in a unified model. |
| Doubao Seed 2.0 Pro / Qwen3.7 Max / Step R1 v-mini / X1 Turbo / HY2.0 Think | Reasoning | These are major Chinese closed reasoning models, with Qwen3.7 Max especially framed around long autonomous execution and cross-framework generalization. |
Chinese proprietary group: video, music, audio, image and world-generation models
The Chinese application section is broad. It includes Seedance, Kling, Hailuo, Qingying, Wan, MiniMax Music, Mureka, StepAudio, Speech, Qwen TTS, Kling Image, Seedream, Wan image models and Happy Oyster.
This part is closer to an application catalog than a base-model comparison, so the table keeps a product-oriented structure.
Seedance 2.0 is presented as the strongest video model after trailing Kling and Veo for more than a year.
MiniMax Music 2.6 is described through four real music-production scenarios.
Image and video generation are areas where Chinese teams are catching up or overtaking.
| Model or product | Category | Summary |
| --- | --- | --- |
| Seedance 2.0 / Kling 3.0 Omni / Kling 3.0 | Video | Seedance 2.0 improves reference control and editing; Kling 3.0 Omni focuses on multimodal consistency, while Kling 3.0 focuses on professional video generation and narrative control. |
| Hailuo 2.3 / Qingying / Wan 2.7 Video / HappyHorse | Video | MiniMax Hailuo improves body movement, facial expression and physics; Zhipu Qingying supports cinematic parameters; Wan 2.7 covers text-to-video, image-to-video, reference video and editing; HappyHorse is listed as a placeholder dark-horse entry. |
| MiniMax Music 2.6 / Mureka V7 | Music | MiniMax Music improves Chinese-style music, epic low-end sound, lo-fi or indie-folk looseness and cover workflows; Mureka V7 uses MusicCoT to plan the musical structure before filling content. |
| StepAudio / MiniMax Speech / Qwen3-TTS | Audio | StepAudio focuses on contextual TTS and emotional delivery; MiniMax Speech reduces latency and improves imperfect voice cloning; Qwen3-TTS emphasizes Chinese and English stability and many expressive voices. |
| Kling Image 3.0 Omni / Seedream 4.5 / Wan 2.7 Image | Image | Kling Image focuses on cinematic narrative visuals; Seedream is optimized for Chinese semantics and advanced editing instructions; Wan 2.7 strengthens virtual-avatar design, color palette control and long text/formula rendering. |
| Happy Oyster | World-generation model | Alibaba's token team is running a small internal test for a world-generation model, with limited access at the time of this update. |
Chinese open model group: general and reasoning models
The Chinese open model section is the most detailed part of this catalog. It lists DeepSeek, Qwen, GLM, Kimi, MiniMax, ERNIE, MiMo, Hunyuan and Step.
Open Chinese models are a major part of the market. This section pays close attention to total parameters, active parameters, context length, agent ability, multimodal support and deployment efficiency.
DeepSeek V4 Pro and V4 Flash are framed as the main April 2026 open-model updates.
Qwen's open models cover dense multimodal models, sparse MoE models, VL models, Omni models and the Qwen3-Next architecture.
MiniMax M3 is listed as an upcoming open agent model with 1M context and native multimodal input.
| Model or family | Category | Summary |
| --- | --- | --- |
| DeepSeek V4 Pro / V4 Flash | General | DeepSeek V4 Pro is a 1.6T total, 49B active hybrid reasoning flagship with 1M context, compressed sparse attention and stronger agent ability; V4 Flash is smaller, faster and cheaper with similar simple-task ability but weaker hard-task performance. |
The reasoning section gathers open Chinese thinking models, with Step-3.7-Flash newly highlighted for production agent workflows, multimodal understanding, search and compatibility with mainstream agent frameworks.
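Because this catalog compares sparse MoE models by total and active parameters, it can help to turn the two figures into a ratio. The sketch below uses only the parameter counts listed in this catalog (Mistral Large 3: 675B total, 41B active; DeepSeek V4 Pro: 1.6T total, 49B active). The helper name `active_fraction` is hypothetical, and reading "active" as "used per token" is the usual MoE convention, not something this catalog states.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters that are active, given counts in billions."""
    if total_params_b <= 0 or not 0 < active_params_b <= total_params_b:
        raise ValueError("expected 0 < active <= total")
    return active_params_b / total_params_b


# Figures as listed in this catalog, in billions of parameters (total, active).
LISTED = {
    "Mistral Large 3": (675, 41),
    "DeepSeek V4 Pro": (1600, 49),  # 1.6T total
}

for name, (total, active) in LISTED.items():
    share = active_fraction(total, active)
    print(f"{name}: {share:.1%} of parameters active")
# Mistral Large 3: 6.1% of parameters active
# DeepSeek V4 Pro: 3.1% of parameters active
```

A lower active share usually points to lower compute per token relative to model size, but it says nothing about quality or memory footprint, so it should be read alongside the other columns.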
Chinese open model group: media and world-generation models
The final catalog section lists Chinese open models for video, image, audio, music and world generation.
This is where the catalog moves beyond LLMs into generation systems and world-building tools. The category split matters because these products should not be judged only by text-model benchmarks.
Hunyuan Video 1.5 and Wan 2.2 are listed as open video models.
Z-Image, Qwen-Image, Hunyuan Image, GLM-Image and CogView are listed as open image models.
HunyuanVideo-Foley, ACE-Step and HY-World 2.0 are listed for audio, music and world generation.
| Model or family | Category | Summary |
| --- | --- | --- |
| Hunyuan Video 1.5 / Wan 2.2 | Video | Hunyuan Video 1.5 is a smaller open video model with improved motion, aesthetics and preference alignment; Wan 2.2 supports 720P, 24fps text-to-video and image-to-video generation on consumer GPUs. |
| Z-Image / Qwen-Image | Image | Z-Image is a 6B scalable single-stream DiT image model with Turbo, Base and Edit variants; Qwen-Image is a 20B image foundation model strong in complex text rendering and precise image editing. |
| Hunyuan Image / GLM-Image / CogView | Image | Hunyuan Image is a large open image MoE model; GLM-Image combines autoregressive and diffusion decoding; CogView remains Zhipu's bilingual DiT image model. |
| HunyuanVideo-Foley | Audio | Tencent's open 3B audio-generation model creates layered sound effects by understanding video frames and text descriptions together. |
| ACE-Step | Music | StepFun and ACE Studio's 3.5B open music model is listed as the open music entry. |
| HY-World 2.0 | World-generation model | Tencent Hunyuan's world-generation model accepts text, single-view images, multi-view images and video, then generates 3D world representations such as meshes or 3D Gaussian splats. |
Update log: why this page is a time-stamped snapshot
This catalog includes a long update log covering 2026 and 2025. The log is part of its value because it shows how quickly model categories and release details change.
The latest update entries add Claude Opus 4.8, Qwen3.7-Max, Gemini 3.5 Flash, Gemini Omni, ERNIE 5.1, GPT-5.5 Instant, DeepSeek V4, GPT-Image 2, Hy3-preview, MiMo, StepAudio, Kimi K2.6, Qwen3.6, GLM-5V-Turbo, Wan 2.7, Gemma 4 and other model families. Older 2025 entries track updates to DeepSeek, Qwen, GLM, MiniMax, Kimi, Hunyuan, Claude, Gemini, Grok, Flux, Runway, Kling, Wan and many others.
The catalog should be read with its update date visible.
Release names, pricing, access routes and benchmark positions may change after publication.
The update log explains why the page is useful as a living catalog, not a permanent verdict.
| Update period | Examples from the update log | Meaning |
| --- | --- | --- |
| June 2026 | Step-3.7-Flash and MiniMax M3. | The latest update emphasizes reasoning and agent models. |
| May 2026 | Claude Opus 4.8, Qwen3.7-Max, Gemini 3.5 Flash, Gemini Omni, ERNIE 5.1 and GPT-5.5 Instant. | Top overseas and Chinese labs keep refreshing general, reasoning and multimodal models. |
| April 2026 | DeepSeek V4, GPT-5.5, GPT-Image 2, Hy3-preview, MiMo, StepAudio, Kimi K2.6, Qwen3.6 and new world-model entries. | The April log is one of the densest updates in the catalog. |
| 2025 updates | DeepSeek, Qwen, GLM, MiniMax, Kimi, Hunyuan, Claude, Gemini, Grok, Flux, Runway, Kling, Wan and others. | The catalog tracks model changes month by month rather than treating the market as static. |
Common mistakes to avoid
| Mistake | Why it hurts | Better move |
| --- | --- | --- |
| Treating one article as a final ranking | Model releases, pricing, quotas and benchmark positions can change quickly. | Use the analysis as a shortlist, then run current checks against your own workload. |
| Choosing by brand instead of task | A strong chat model may still be weak for long documents, coding agents, multimodal work or low-latency routes. | Define the job first, then compare models with prompts, files or media that match that job. |
| Copying claims without a current verification check | Benchmark numbers, context windows, API names and prices may be dated or provider-specific. | Confirm high-impact details against official docs, model cards or live provider pages. |
Use this page to understand the model family, the evaluation angle and the current conversation around it. Then choose one or two realistic prompts, documents or media tasks and test whether the model behaves well in your own workflow.
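The shortlist-then-test advice above can be sketched as a tiny harness. This is a minimal sketch under stated assumptions: `run_shortlist` is a hypothetical helper, each model is represented by a plain Python callable that maps a prompt to a response, and the two stub models stand in for real provider clients, which are deliberately not shown because API names differ by vendor.

```python
from typing import Callable, Dict, List, Tuple

ModelFn = Callable[[str], str]   # prompt -> response text
Check = Callable[[str], bool]    # response -> did it meet the task's bar?


def run_shortlist(models: Dict[str, ModelFn],
                  tasks: List[Tuple[str, Check]]) -> Dict[str, float]:
    """Return the pass rate of each shortlisted model over the same tasks."""
    if not tasks:
        raise ValueError("provide at least one task")
    scores: Dict[str, float] = {}
    for name, call_model in models.items():
        passed = 0
        for prompt, check in tasks:
            try:
                passed += bool(check(call_model(prompt)))
            except Exception:
                pass  # a crashed call counts as a failed task
        scores[name] = passed / len(tasks)
    return scores


# Stub models for illustration only; replace with wrappers around real clients.
def echo_model(prompt: str) -> str:
    return prompt.upper()


def empty_model(prompt: str) -> str:
    return ""


tasks = [
    ("Summarize: invoices are due in 30 days.", lambda r: "30" in r),
    ("Reply with the word OK.", lambda r: "OK" in r.upper()),
]
scores = run_shortlist({"stub-echo": echo_model, "stub-empty": empty_model}, tasks)
print(scores)  # {'stub-echo': 1.0, 'stub-empty': 0.0}
```

In practice the prompts, files and pass criteria should come from your own workload; two or three realistic tasks are often more informative than a borrowed benchmark score.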
FAQ
These questions reflect recurring reader concerns around Chinese model knowledge, evaluation and fast-moving model releases.
What is the main point of this catalog of well-known Chinese and global AI models and applications?
This SmarToken catalog tracks well-known global and Chinese AI models by where the teams are based, whether the release is proprietary or open, and application categories such as general LLMs, reasoning, image generation, video, music, audio and world-generation models. It was last updated on June 1, 2026.
How should readers use the Chinese model context here?
Use it as market and product context, then verify technical claims, pricing, quotas and release details against official pages or your own tests before making a decision.
Why is there a short video with the page?
The video gives a fast visual summary of the model story, while the written page carries the caveats, comparisons and practical checks.
References and verification
SmarToken tracks public model releases, technical reports, product announcements and market signals to keep this catalog useful.
Technical claims need to be treated as dated unless they are confirmed by current official model cards, technical reports or provider announcements.
Pricing, quota, availability and benchmark details can change after the review date, so production decisions should use current vendor pages and direct workload tests.
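One way to act on the "treat claims as dated" rule is to record the review date next to any figure copied from this catalog and flag it once it is too old. The sketch below is illustrative: `needs_recheck` and the 30-day default are assumptions chosen for the example, not a SmarToken policy. Only the June 1, 2026 review date comes from this page.

```python
from datetime import date

# Review date stated by this catalog.
REVIEW_DATE = date(2026, 6, 1)


def needs_recheck(review_date: date, today: date, max_age_days: int = 30) -> bool:
    """Return True when a copied claim is older than the allowed age."""
    if today < review_date:
        raise ValueError("today is earlier than the review date")
    return (today - review_date).days > max_age_days


print(needs_recheck(REVIEW_DATE, date(2026, 6, 15)))  # False (14 days old)
print(needs_recheck(REVIEW_DATE, date(2026, 8, 1)))   # True (61 days old)
```

A shorter window may suit pricing and quotas, which tend to change faster than architecture details such as parameter counts.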