GLM-4.7-Flash: MLA, 3B active parameters and local agent use
GLM-4.7-Flash is presented as a lightweight open model for local coding and agent assistants: 30B total parameters, about 3B active per token, a 200K-token context window and the first use in the GLM family of DeepSeek-style Multi-head Latent Attention (MLA). This page reads it as an efficiency and deployment story: a mixture-of-experts (MoE) design with a small active-parameter count, combined with MLA, can make local and low-cost agent workflows more realistic, but throughput, latency and current pricing still need verification.
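To make the efficiency argument concrete, here is a back-of-envelope sketch in Python. Only the 30B total, 3B active and 200K context figures come from the text. The layer count, head count, head size and latent sizes are hypothetical placeholders, not GLM-4.7-Flash's published configuration, and the baseline is plain multi-head attention that caches a full key and value per head. Treat the output as an illustration of how the terms scale, not as a measurement.

```python
"""Back-of-envelope memory arithmetic for a small-active MoE model with MLA."""

TOTAL_PARAMS = 30e9        # from the text: 30B total parameters
ACTIVE_PARAMS = 3e9        # from the text: about 3B active parameters
CONTEXT_TOKENS = 200_000   # "200K context", read here as 200,000 tokens (assumption)


def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the parameters that take part in computing one token."""
    return active_params / total_params


def mha_kv_bytes_per_token(n_layers: int, n_heads: int, head_dim: int,
                           bytes_per_value: int = 2) -> int:
    """KV cache per token when every head stores a full key and value."""
    return 2 * n_layers * n_heads * head_dim * bytes_per_value


def mla_kv_bytes_per_token(n_layers: int, latent_dim: int, rope_dim: int,
                           bytes_per_value: int = 2) -> int:
    """KV cache per token when each layer stores one compressed latent
    plus a small positional (RoPE) key, as in DeepSeek-style MLA."""
    return n_layers * (latent_dim + rope_dim) * bytes_per_value


if __name__ == "__main__":
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(TOTAL_PARAMS, bits)
        print(f"{label} weights: ~{gb:.0f} GB")
    print(f"active per token: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")

    # HYPOTHETICAL dimensions, chosen only to illustrate the scaling.
    n_layers, n_heads, head_dim = 48, 32, 128
    latent_dim, rope_dim = 512, 64

    mha = mha_kv_bytes_per_token(n_layers, n_heads, head_dim)
    mla = mla_kv_bytes_per_token(n_layers, latent_dim, rope_dim)
    print(f"MHA cache at full context: ~{mha * CONTEXT_TOKENS / 1e9:.1f} GB")
    print(f"MLA cache at full context: ~{mla * CONTEXT_TOKENS / 1e9:.1f} GB")
    print(f"cache reduction factor: ~{mha / mla:.1f}x")
```

The weight figures follow directly from the 30B total: roughly 60 GB at 16-bit, 30 GB at 8-bit and 15 GB at 4-bit, before the KV cache, activations and runtime overhead. All 30B parameters still have to be stored or offloaded somewhere, even though only about 10% of them are used per token. The small active count lowers compute per token, not the weight footprint, which is one reason real throughput and latency still need to be measured.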