Evaluation / 8 min read / Updated 2026-05-25

Well-known Chinese and global AI models and applications by category

This SmarToken catalog tracks well-known global and Chinese AI models along three axes: where the team is based, whether the release is proprietary or open, and which application category it serves, such as general LLMs, reasoning, image generation, video, music, audio and world-generation models. It was last updated on June 1, 2026.

Key takeaways

01 This SmarToken catalog tracks well-known AI models and applications as of June 1, 2026.
02 We separate models from teams outside China and from China-based teams, proprietary and open releases, and application categories such as general models, reasoning, image, video, music, audio and world-generation models.
03 The gap between Chinese and overseas models keeps narrowing, although overseas teams still hold an overall lead, especially in general-purpose models.

Well-known Chinese and global AI models and applications by category video guide. A short SmarToken video for AI Model Applications Catalog: China and Global Models 2026, focused on model evaluation, tradeoffs and the current discussion.

Weekly update: June 1 to June 5, 2026

This week's update adds Step-3.7-Flash, Step's open reasoning model, and MiniMax M3, an agent model that MiniMax plans to release openly.

Agent models, including models optimized for OpenClaw, are still an emerging category and are temporarily grouped with all-purpose multimodal or general models. Hybrid reasoning models are now mainstream, but many model families still release their general and thinking versions separately, so this catalog lists them as separate general and reasoning models. OCR models and 3D models are not yet treated as full categories in this version.

SmarToken editorial chart summarizing selected LMSYS Arena ranking signals from May 28, 2026 for Claude, Muse Spark, Qwen3.7-Max and GLM-5.1. The graphic is based on the LMSYS Arena snapshot discussed in the June 2026 update and keeps the ranking signal without reusing a third-party screenshot.

Official reference: LMArena leaderboard space. Check the current leaderboard space before treating a dated ranking snapshot as current.
SmarToken guide: Chinese AI model capability guide. Use this companion guide to compare Chinese model strengths by task type.

Step: the Step-3.7-Flash reasoning model was updated in the China-based open model group.
MiniMax: the MiniMax M3 agent model, planned for open release, was updated in the China-based open model group.
Agent, OCR and 3D model categories are still handled cautiously in this catalog.

Catalog note	How we classify it	Reader caveat
Agent models are still emerging.	They are temporarily grouped with all-purpose multimodal or general models.	The category may change as agent models mature.
Hybrid reasoning models dominate the market.	General and thinking versions are still listed separately when releases differ.	Compare the exact variant, not only the model family.
OCR and 3D models are not yet included.	They are not considered mature enough as standalone categories in this snapshot.	Future versions may add them.
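The classification rules in the table above can be sketched as a small lookup. This is a minimal illustration, assuming a hypothetical `ModelRelease` record and `catalog_category` helper; neither name comes from SmarToken, and the `kind` labels are assumptions chosen to mirror the categories named in this catalog.

```python
from dataclasses import dataclass
from typing import Optional

# Sections this snapshot treats as full categories (taken from the text above).
CATALOG_CATEGORIES = {
    "general", "reasoning", "image", "video", "music", "audio", "world-generation",
}
# Kinds this snapshot does not yet list as standalone categories.
EXCLUDED_KINDS = {"ocr", "3d"}


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical record with the three axes the catalog uses."""
    name: str
    region: str   # "china" or "overseas"
    release: str  # "proprietary" or "open"
    kind: str     # vendor-described kind, e.g. "agent", "reasoning", "ocr"


def catalog_category(model: ModelRelease) -> Optional[str]:
    """Return the catalog section for a release, or None if it is not listed."""
    kind = model.kind.lower()
    if kind in EXCLUDED_KINDS:
        return None          # OCR and 3D are not full categories in this snapshot
    if kind == "agent":
        return "general"     # agent models are temporarily grouped with general models
    if kind in CATALOG_CATEGORIES:
        return kind
    raise ValueError(f"unknown kind: {model.kind!r}")


# General and thinking variants are separate records, so they land in separate sections.
entries = [
    ModelRelease("MiniMax M3", "china", "open", "agent"),
    ModelRelease("Step-3.7-Flash", "china", "open", "reasoning"),
    ModelRelease("GPT-5.5", "overseas", "proprietary", "general"),
    ModelRelease("GPT-5.5 Thinking", "overseas", "proprietary", "reasoning"),
]
for entry in entries:
    print(entry.name, "->", catalog_category(entry))
```

Keeping the agent rule in one place means a later version of the catalog could promote agent models to their own category by changing a single branch.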

Models outside China: overall lead remains, especially in general models

After two years of rapid development, the gap between Chinese AI models and models from outside China is shrinking, but overseas teams still keep an overall lead, especially in general-purpose models.

Google, OpenAI and Anthropic not only trade the lead in model performance but also still set the main direction of the industry. There is also a practical warning for China-based companies: many overseas models require network workarounds to access from China, and their less restricted output style may create compliance risks in Chinese enterprise use.

Google, OpenAI and Anthropic are the most important overseas trend-setters.
Overseas models are often harder to access from China.
Chinese enterprise use may need extra compliance review because output policies differ.

Overseas proprietary group: general and reasoning models

The overseas proprietary section lists the main general and reasoning models from OpenAI, Google DeepMind, Anthropic, xAI and Mistral.

The table below keeps a catalog style. It is a dated market snapshot, so benchmark claims and release details need checking against current vendor pages before production use.

Official reference: OpenAI model documentation. Verify the latest OpenAI model names, modalities and availability directly with OpenAI.
Official reference: Claude models overview. Check Anthropic's current Claude family, lifecycle and model details before production use.
Official reference: Gemini API models. Use Google's Gemini model docs to confirm current model IDs and capability notes.

GPT-5.5 shifts from chat interaction toward autonomous agents.
Gemini 3.5 Flash focuses on fast iterative coding, multimodal understanding and long workflows.
Claude Opus 4.8 is strong in long-cycle, high-risk enterprise workflows.

Model or family	Category	Summary
GPT-5.5 / GPT-5.5 Instant	General	OpenAI's April 2026 general model shifts toward autonomous agents, native multimodality, 400K Codex context and stronger planning; Instant later became the default ChatGPT model with lower hallucination and more natural language.
Gemini 3.5 Flash	General	Google DeepMind's May 2026 model supports long-context work and focuses on fast iterative coding, advanced multimodal understanding, long workflows and multi-step tool use.
Claude Opus 4.8 / Sonnet 4.6 / Haiku 4.5	General	Anthropic's line is especially strong for coding, agent work, multimodal reasoning and enterprise workflows; Chinese users should also consider policy and account-risk issues before relying on it.
Grok 4.1	General	xAI's model follows a closed-latest, open-older-version strategy. The analysis says 4.1 improved emotional intelligence, made its writing more human-like and lowered the hallucination rate.
Mistral 3 Medium / Mistral Large 2	General	Mistral is treated as Europe's main AI lab, currently less active in frontier general-model competition and more focused on smaller models and niche innovations.
GPT-5.5 Thinking / GPT-5.5 Pro	Reasoning	Thinking is positioned for fast logical breakdowns and concise answers; Pro is the high-precision version for difficult tasks such as FrontierMath and Expert-SWE.
Gemini 3.1 Pro / Gemini 3 Deep Think	Reasoning	Major improvements include code-based animation, complex system synthesis, interactive design and difficult multimodal reasoning.
Claude Opus 4.8 Thinking	Reasoning	The key feature is Effort Control, allowing users to adjust thinking depth across low, high, extra and maximum settings.
Grok 4.1 Thinking / Magistral Medium v1.2	Reasoning	Grok is xAI's thinking model; Magistral is a fast second-tier reasoning option with multimodal support.

Overseas proprietary group: image, video, music, audio and world-generation models

This catalog gives separate categories to image, video, music, audio and world-generation models because applications are now part of the model landscape. Here, world-generation models means systems that create or simulate interactive 3D environments.

This section follows the same catalog style: model name first, category second, and a short practical description. Some products are listed as applications rather than base models.

Google, OpenAI, Midjourney and Black Forest Labs appear in the image category.
Google Veo/Omni, Runway, Pika, Luma, Stable Video Diffusion and Midjourney appear in the video category.
World-generation models are treated as a new category, led by Genie 3 and World Labs products.

Model or product	Category	Summary
Gemini 3.1 Flash Image / Gemini 3 Pro Image / Imagen 4	Image	Google DeepMind's image line is stronger in text rendering, multilingual localization, complex instructions, character consistency and overall generation quality.
GPT-Image 2 / DALL·E 3	Image	OpenAI's image model is a GPT-native visual system with thinking mode, internet-aware generation, self-checking and improved commercial production output.
Midjourney v7 / Flux 2	Image	Midjourney remains an older image pioneer with slow updates; Flux 2 combines high-quality generation and editing in one architecture.
Veo 3.1 / Google Omni	Video	Veo improves audio, instruction following and realism; Google Omni combines Gemini reasoning with video editing and generation grounded in physical consistency.
Runway Gen-4.5 / Pika 2.5 / Luma AI / Stable Video Diffusion / Midjourney video	Video	These are major video-generation products, with limitations such as object disappearance, causal errors or local hardware requirements.
Lyria 3 / Suno 5.0	Music	Lyria 3 supports text, image and video inputs and automatic lyric generation; Suno 5.0 improves studio-level quality, track separation and style control.
Stable Audio / MuseNet / V2A	Audio	These are audio-generation systems. V2A generates background audio for video from input footage and text prompts.
Genie 3 / World Labs Marble / World Labs RTFM	World-generation model	Genie 3 renders interactive virtual worlds in real time; World Labs focuses on persistent 3D environments and real-time generative worlds with spatial memory.

Overseas open model group

The overseas open model section is shorter. It lists Mistral, Gemma, Phi, gpt-oss, Magistral, Muse Spark, Flux, Stable Diffusion and a few media models.

These open models form part of the wider international baseline, although Chinese open models occupy more of the practical comparison space in this catalog.

Mistral Large 3 is listed as a sparse MoE model with 675B total parameters and 41B active parameters.
Gemma 4 is efficient for personal hardware, agent use and multimodal tasks.
Muse Spark is small and fast, with strengths in science, mathematics, health and multimodal perception.

Model or family	Category	Summary
Mistral Large 3 / Ministral 3	General	Mistral Large 3 is listed as a 675B sparse MoE model with 256K context; Ministral 3 provides 3B, 8B and 14B edge models with multimodal ability.
Gemma 4 / Phi-4	General	Gemma 4 emphasizes compute and memory efficiency, local deployment, function calling and multimodal understanding; Phi-4 is treated as a compact high-performing Microsoft model.
gpt-oss / Phi-reasoning / Magistral Small / Muse Spark	Reasoning	This group includes OpenAI's gpt-oss models, Microsoft's Phi reasoning line, Mistral's open Magistral Small and Meta's Muse Spark.
Flux 2 dev / Stable Diffusion	Image	Flux 2 dev is a strong open-weight image generation and editing model; Stable Diffusion remains the open local-deployment reference.
Hunyuan Video 1.5 / Wan 2.2 / audio and music models	Media	Open video, image, audio, music and world-generation entries appear later in the Chinese open model section.

Chinese models: the gap is shrinking, especially in specialist categories

Many early Chinese large language models were rushed to market, but after more than a year of development the gap with leading overseas teams has narrowed, especially in music, image generation, video generation and reasoning models. The strongest Chinese players have since become serious contenders in several categories.

SmarToken guide: Chinese model routing matrix. Compare candidate model families by workload, routing role and API decision point.
SmarToken guide: SmarToken model catalog. Browse the model routes currently exposed through SmarToken before shortlisting APIs.

The Chinese proprietary group includes Doubao, Qwen, GLM, Kimi, Tencent Yuanbao/Hunyuan, Step, ERNIE and MiMo.
The Chinese open model group includes DeepSeek, Qwen, GLM, Kimi, MiniMax, ERNIE, MiMo, Hunyuan and Step.
The page repeatedly treats Chinese progress by category, not as a single national ranking.

Chinese proprietary group: general and reasoning models

The Chinese proprietary section starts with consumer and platform model families such as Doubao, Qwen, GLM, Kimi, Tencent Yuanbao, Step, ERNIE and MiMo.

The table below follows the same category order used throughout this catalog and keeps the main practical distinctions clear.

Official reference: ByteDance Seed. Review ByteDance's official Seed page for current model-family positioning.
Official reference: Qwen3 official blog. Use Alibaba's Qwen page for model-family context and deployment notes.
Official reference: Z.ai GLM-5 release notes. Check Z.ai's release notes for GLM agent, coding and platform details.

Doubao Seed 2.0 is a first-tier Chinese model family after steady progress in 2025.
Qwen is closer to applications and broader than DeepSeek's performance-first focus.
Kimi was once China's long-document reading leader, but it has been quieter since the start of 2025.

Model or family	Category	Summary
Doubao Seed 2.0	General	ByteDance's general model application line. Seed 2.0 improves visual reasoning, temporal and motion perception, instruction following and complex agent-task performance.
Qwen3.6 Plus / Qwen3.5 Omni	General / All-purpose multimodal	Alibaba's Qwen line is application-oriented and broad. Qwen3.6 Plus emphasizes agentic coding and 1M context; Qwen3.5 Omni supports long audio/video input, real-time interaction and multilingual speech.
GLM-5V-Turbo / GLM-5-Turbo	Agent / General	Zhipu AI's models are described as agent-oriented, with GLM-5V-Turbo focused on visual programming and GLM-5-Turbo focused on tool use, instruction following and long-chain execution.
Kimi	General	Moonshot's Kimi was once China's long-document reading king, though its market presence has been quieter since the start of 2025.
Tencent Yuanbao / Hunyuan HY2.0	General	Tencent's Yuanbao, formerly Hunyuan, is a MoE model family with improved reasoning, coding, agent and instruction-following ability, though with weaker visibility after integrating DeepSeek-R1.
ERNIE 5.1	General	Baidu's model is an ultra-sparse MoE with much lower pretraining cost than comparable models and a four-stage post-training pipeline.
MiMo-V2.5-Pro / MiMo-V2.5	Agent / General	Xiaomi's MiMo models emphasize long-cycle autonomous planning, multimodal action and 1M context in a unified model.
Doubao Seed 2.0 Pro / Qwen3.7 Max / Step R1 v-mini / X1 Turbo / HY2.0 Think	Reasoning	These are major Chinese closed reasoning models, with Qwen3.7 Max especially framed around long autonomous execution and cross-framework generalization.

Chinese proprietary group: video, music, audio, image and world-generation models

The Chinese application section is broad. It includes Seedance, Kling, Hailuo, Qingying, Wan, MiniMax Music, Mureka, StepAudio, Speech, Qwen TTS, Kling Image, Seedream, Wan image models and Happy Oyster.

This part is closer to an application catalog than a base-model comparison, so the table keeps a product-oriented structure.

Seedance 2.0 is presented as the strongest video model after trailing Kling and Veo for more than a year.
MiniMax Music 2.6 is described through four real music-production scenarios.
Image and video generation are areas where Chinese teams are catching up or overtaking.

Model or product	Category	Summary
Seedance 2.0 / Kling 3.0 Omni / Kling 3.0	Video	Seedance 2.0 improves reference control and editing; Kling 3.0 Omni focuses on multimodal consistency, while Kling 3.0 focuses on professional video generation and narrative control.
Hailuo 2.3 / Qingying / Wan 2.7 Video / HappyHorse	Video	MiniMax Hailuo improves body movement, facial expression and physics; Zhipu Qingying supports cinematic parameters; Wan 2.7 covers text-to-video, image-to-video, reference video and editing; HappyHorse is listed as a placeholder dark-horse entry.
MiniMax Music 2.6 / Mureka V7	Music	MiniMax Music improves Chinese-style music, epic low-end sound, lo-fi or indie-folk looseness and cover workflows; Mureka V7 uses MusicCoT to plan the musical structure before filling content.
StepAudio 2.5 TTS / MiniMax Speech 2.6 / Qwen3-TTS	Audio	StepAudio focuses on contextual TTS and emotional delivery; MiniMax Speech reduces latency and improves imperfect voice cloning; Qwen3-TTS emphasizes Chinese and English stability and many expressive voices.
Kling Image 3.0 Omni / Seedream 4.5 / Wan 2.7 Image	Image	Kling Image focuses on cinematic narrative visuals; Seedream is optimized for Chinese semantics and advanced editing instructions; Wan 2.7 strengthens virtual-avatar design, color palette control and long text/formula rendering.
Happy Oyster	World-generation model	Alibaba's token team is running a small internal test for a world-generation model, with limited access at the time of this update.

Chinese open model group: general and reasoning models

The Chinese open model section is the most detailed part of this catalog. It lists DeepSeek, Qwen, GLM, Kimi, MiniMax, ERNIE, MiMo, Hunyuan and Step.

Open Chinese models are a major part of the market. This section pays close attention to total parameters, active parameters, context length, agent ability, multimodal support and deployment efficiency.

SmarToken guide: DeepSeek API route. Start with the SmarToken model page if DeepSeek is a candidate route for your workload.
Official reference: DeepSeek-R1 GitHub. Check DeepSeek's official repository for reasoning-model details and release context.
Official reference: Kimi-K2-Instruct model card. Use Moonshot's Hugging Face card for Kimi K2 architecture, context and deployment details.
Official reference: MiniMax M2 release page. Check MiniMax's official release page for agent-model positioning and benchmark context.

DeepSeek V4 Pro and V4 Flash are framed as the main April 2026 open-model updates.
Qwen's open models cover dense multimodal models, sparse MoE models, VL models, Omni models and the Qwen3-Next architecture.
MiniMax M3 is listed as an upcoming open agent model with 1M context and native multimodal input.

Model or family	Category	Summary
DeepSeek V4 Pro / V4 Flash	General	DeepSeek V4 Pro is a 1.6T total, 49B active hybrid reasoning flagship with 1M context, compressed sparse attention and stronger agent ability; V4 Flash is smaller, faster and cheaper with similar simple-task ability but weaker hard-task performance.
Qwen3.6-27B / Qwen3.6-Flash / Qwen3.5 / Qwen3-VL / Qwen3-Omni / Qwen3-Next	General / Multimodal	Qwen's open line is broad, from 27B dense multimodal models to 35B-A3B sparse models, 397B-A17B Qwen3.5, VL, Omni and new architecture routes.
GLM-5.1 / GLM-4.6V	General	Zhipu's open models emphasize long work sessions, planning-execution-iteration loops, agentic coding, multimodal output and rich content creation.
Kimi K2.6	General	Moonshot's open model is a 1T total, 32B active model with 256K context and stronger general agent, code and visual understanding ability than K2.5.
MiniMax M3	Agent / General	MiniMax M3 is a 1M-context, native multimodal agent model with coding, agent collaboration and computer-use ability, using MiniMax Sparse Attention.
ERNIE 4.5 / MiMo V2 Flash / Hy3-preview / Step3-VL	General / Multimodal	Baidu, Xiaomi, Tencent Hunyuan and Step open models are important Chinese entries, each with different MoE, multimodal, reasoning or agent strengths.
DeepSeek R1 / Qwen thinking models / GLM thinking models / Seed OSS / Hunyuan A13B / Kimi thinking models / Step-3.7-Flash / MiniMax M2.1	Reasoning	The reasoning section gathers open Chinese thinking models, with Step-3.7-Flash newly highlighted for production agent workflows, multimodal understanding, search and compatibility with mainstream agent frameworks.
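Because this section compares total and active parameters, it can help to see the ratio worked out. The sketch below uses only the figures quoted in this catalog (Mistral Large 3 from the overseas open group, DeepSeek V4 Pro and Kimi K2.6 from the table above); the `active_share` helper is a hypothetical name. The ratio is a rough indicator of how sparse a MoE model is, not a quality or cost measurement.

```python
# (total parameters, active parameters) in billions, as quoted in this catalog.
MOE_MODELS = {
    "Mistral Large 3": (675, 41),
    "DeepSeek V4 Pro": (1600, 49),   # 1.6T total
    "Kimi K2.6": (1000, 32),         # 1T total
}


def active_share(total_b: float, active_b: float) -> float:
    """Return the fraction of parameters that are active, e.g. 0.032 for 32B of 1T."""
    if total_b <= 0 or not 0 < active_b <= total_b:
        raise ValueError("expected 0 < active_b <= total_b")
    return active_b / total_b


for name, (total_b, active_b) in MOE_MODELS.items():
    share = active_share(total_b, active_b)
    print(f"{name}: {active_b}B of {total_b}B active ({share:.1%})")
```

On these figures, Mistral Large 3 activates roughly 6.1% of its parameters, DeepSeek V4 Pro about 3.1% and Kimi K2.6 3.2%, so the two Chinese flagships are listed as sparser than the Mistral entry.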

Chinese open model group: media and world-generation models

The final catalog section lists Chinese open models for video, image, audio, music and world generation.

This is where the catalog moves beyond LLMs into generation systems and world-building tools. The category split matters because these products should not be judged only by text-model benchmarks.

Hunyuan Video 1.5 and Wan 2.2 are listed as open video models.
Z-Image, Qwen-Image, Hunyuan Image, GLM-Image and CogView are listed as open image models.
HunyuanVideo-Foley, ACE-Step and HY-World 2.0 are listed for audio, music and world generation.

Model or family	Category	Summary
Hunyuan Video 1.5 / Wan 2.2	Video	Hunyuan Video 1.5 is a smaller open video model with improved motion, aesthetics and preference alignment; Wan 2.2 supports 720P, 24fps text-to-video and image-to-video generation on consumer GPUs.
Z-Image / Qwen-Image	Image	Z-Image is a 6B scalable single-stream DiT image model with Turbo, Base and Edit variants; Qwen-Image is a 20B image foundation model strong in complex text rendering and precise image editing.
Hunyuan Image / GLM-Image / CogView	Image	Hunyuan Image is a large open image MoE model; GLM-Image combines autoregressive and diffusion decoding; CogView remains Zhipu's bilingual DiT image model.
HunyuanVideo-Foley	Audio	Tencent's open 3B audio-generation model creates layered sound effects by understanding video frames and text descriptions together.
ACE-Step	Music	StepFun and ACE Studio's 3.5B open music model is listed as the open music entry.
HY-World 2.0	World-generation model	Tencent Hunyuan's world-generation model accepts text, single-view images, multi-view images and video, then generates 3D world representations such as meshes or 3D Gaussian splats.

Update log: why this page is a time-stamped snapshot

This catalog includes a long update log covering 2026 and 2025. The log is part of its value because it shows how quickly model categories and release details change.

The latest update entries add Claude Opus 4.8, Qwen3.7-Max, Gemini 3.5 Flash, Gemini Omni, ERNIE 5.1, GPT-5.5 Instant, DeepSeek V4, GPT-Image 2, Hy3-preview, MiMo, StepAudio, Kimi K2.6, Qwen3.6, GLM-5V-Turbo, Wan 2.7, Gemma 4 and other model families. Older 2025 entries track updates to DeepSeek, Qwen, GLM, MiniMax, Kimi, Hunyuan, Claude, Gemini, Grok, Flux, Runway, Kling, Wan and many others.

The catalog should be read with its update date visible.
Release names, pricing, access routes and benchmark positions may change after publication.
The update log explains why the page is useful as a living catalog, not a permanent verdict.

Update period	Examples from the update log	Meaning
June 2026	Step-3.7-Flash and MiniMax M3.	The latest update emphasizes reasoning and agent models.
May 2026	Claude Opus 4.8, Qwen3.7-Max, Gemini 3.5 Flash, Gemini Omni, ERNIE 5.1 and GPT-5.5 Instant.	Top overseas and Chinese labs keep refreshing general, reasoning and multimodal models.
April 2026	DeepSeek V4, GPT-5.5, GPT-Image 2, Hy3-preview, MiMo, StepAudio, Kimi K2.6, Qwen3.6 and new world-model entries.	The April log is one of the densest updates in the catalog.
2025 updates	DeepSeek, Qwen, GLM, MiniMax, Kimi, Hunyuan, Claude, Gemini, Grok, Flux, Runway, Kling, Wan and others.	The catalog tracks model changes month by month rather than treating the market as static.
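One way to keep the update date visible in practice is to check the snapshot's age before relying on it. This is a minimal sketch: the 30-day threshold and the `is_stale` helper are assumptions for illustration, not a SmarToken policy.

```python
from datetime import date

SNAPSHOT_DATE = date(2026, 6, 1)  # "last updated" date stated in this catalog
MAX_AGE_DAYS = 30                 # hypothetical threshold; choose one that fits your release cadence


def is_stale(snapshot: date, today: date, max_age_days: int = MAX_AGE_DAYS) -> bool:
    """Return True when the snapshot is older than max_age_days on the given day."""
    return (today - snapshot).days > max_age_days


if is_stale(SNAPSHOT_DATE, date.today()):
    print("Snapshot is dated: re-check vendor pages before relying on it.")
else:
    print("Snapshot is recent, but still verify high-impact details.")
```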

Common mistakes to avoid

Mistake	Why it hurts	Better move
Treating one article as a final ranking	Model releases, pricing, quotas and benchmark positions can change quickly.	Use the analysis as a shortlist, then run current checks against your own workload.
Choosing by brand instead of task	A strong chat model may still be weak for long documents, coding agents, multimodal work or low-latency routes.	Define the job first, then compare models with prompts, files or media that match that job.
Copying claims without a current verification check	Benchmark numbers, context windows, API names and prices may be dated or provider-specific.	Confirm high-impact details against official docs, model cards or live provider pages.

Read it as a model briefing, not a setup guide

Use this page to understand the model family, the evaluation angle and the current conversation around it. Then choose one or two realistic prompts, documents or media tasks and test whether the model behaves well in your own workflow.
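The shortlist-then-test advice can be sketched as a tiny harness. Everything here is hypothetical: `run_shortlist` is an illustrative helper, and the two stub functions stand in for real provider calls, which you would replace with your own client code and prompts from your actual workload.

```python
from typing import Callable, Dict, List, Tuple

ModelFn = Callable[[str], str]   # takes a prompt, returns the model's text output
Check = Callable[[str], bool]    # returns True when an output is acceptable


def run_shortlist(models: Dict[str, ModelFn],
                  cases: List[Tuple[str, Check]]) -> Dict[str, float]:
    """Run every test case against every model and return the pass rate per model."""
    if not cases:
        raise ValueError("at least one test case is required")
    scores = {}
    for name, call in models.items():
        passed = sum(1 for prompt, check in cases if check(call(prompt)))
        scores[name] = passed / len(cases)
    return scores


# Stub "models" so the sketch runs without any network access.
def stub_upper(prompt: str) -> str:
    return prompt.upper()


def stub_empty(prompt: str) -> str:
    return ""


cases = [
    ("summarize: hello", lambda out: len(out) > 0),
    ("translate: hi", lambda out: "HI" in out),
]
scores = run_shortlist({"stub-upper": stub_upper, "stub-empty": stub_empty}, cases)
for name, rate in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {rate:.0%} of cases passed")
```

Pass or fail checks are deliberately crude; for real evaluation you would likely add task-specific scoring, latency and cost alongside them.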

FAQ

These questions reflect recurring reader concerns around Chinese model knowledge, evaluation and fast-moving model releases.

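What is the main point of Well-known Chinese and global AI models and applications by category?

It is a dated catalog of well-known global and Chinese AI models, organized by where the team is based, whether the release is proprietary or open, and application category. Its main finding is that the gap between Chinese and overseas models keeps narrowing, although overseas teams still hold an overall lead, especially in general-purpose models.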

How should readers use the Chinese model context here?

Use it as market and product context, then verify technical claims, pricing, quotas and release details against official pages or your own tests before making a decision.

Why is there a short video with the page?

The video gives a fast visual summary of the model story, while the written page carries the caveats, comparisons and practical checks.

References and verification

SmarToken tracks public model releases, technical reports, product announcements and market signals to keep this catalog useful.

Technical claims need to be treated as dated unless they are confirmed by current official model cards, technical reports or provider announcements.

Pricing, quota, availability and benchmark details can change after the review date, so production decisions should use current vendor pages and direct workload tests.

DeepSeek-R1 official repository and technical report links: used for R1 release context, reinforcement-learning positioning and distillation caveats.
Qwen3 official announcement: used for Qwen3 model-family context, hybrid thinking and multilingual/app workflow claims.
Kimi K2 model card: used for Kimi K2 long-context, sparse MoE and agent-workflow context.
GLM-4.5 official announcement: used for GLM-4.5 agent, reasoning and coding positioning.
MiniMax M2 announcement: used for MiniMax M2 coding-agent and task-level evaluation context.
ERNIE 4.5 technical report: used for ERNIE 4.5 multimodal heterogeneous MoE and active-parameter context.
Hunyuan TurboS technical report: used for Hunyuan TurboS efficient reasoning, hybrid architecture and context-window claims.
ByteDance Seed model publications: used for Doubao/Seed model-family direction, product context and multimodal model signals.