I genuinely do not know how to feel about the AI industry's obsession with massive parameter counts anymore. For the last three years, the narrative was simple. Bigger models meant better reasoning. If you wanted top-tier performance, you needed a cluster the size of a football field.
Then Alibaba dropped Qwen 3.5 in late February 2026, and the old rules suddenly look a bit ridiculous.
The Qwen team rolled out their new generation of models in three quiet waves. They started with a massive 397B flagship model. But what caught my attention was not the big one. It was the 35B parameter model beating their previous 235B giant on core benchmarks.
Half the developer community is losing their minds over how fast these models run locally, while the other half is trying to reverse-engineer the architecture. The truth is that Qwen 3.5 represents a fundamental shift in how we build AI. Brute force scaling is out. Architectural efficiency is in.
The death of brute force scaling
We have been conditioned to look at the total parameter count and judge a model's intelligence. Qwen 3.5 proves this heuristic is broken.
The most interesting model in the new lineup is Qwen3.5-35B-A3B. The "A3B" part means it activates only 3 billion of its 35 billion parameters per token. Yet, it consistently outperforms Qwen3-235B-A22B. It achieves this purely through better architecture, cleaner data, and refined reinforcement learning.
There is something unsettling about realizing how much compute we wasted on older models. Qwen 3.5 uses a sparse Mixture-of-Experts (MoE) design combined with what Alibaba calls Gated Delta Networks. Instead of firing up every single neural pathway for a simple prompt, it surgically activates just the 3 billion parameters it needs to get the job done.
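To make the "surgical activation" idea concrete, here is a minimal sketch of generic top-k Mixture-of-Experts routing in NumPy. This is not the actual Qwen 3.5 architecture (the Gated Delta Network details are not reproduced here); it only illustrates how a router can fire a couple of experts per token while the rest stay idle.

```python
import numpy as np

def topk_moe_layer(x, w_gate, experts, k=2):
    """Toy top-k MoE routing: only k experts run for this token.

    x:       (d,) token hidden state
    w_gate:  (d, n_experts) router weights
    experts: list of (d, d) expert weight matrices
    """
    logits = x @ w_gate                       # router score per expert
    topk = np.argsort(logits)[-k:]            # indices of the k best experts
    # softmax over ONLY the selected experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()
    # combine the k expert outputs; the other experts never execute
    return sum(w * (x @ experts[i]) for w, i in zip(weights, topk))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.standard_normal(d)
w_gate = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]

y = topk_moe_layer(x, w_gate, experts, k=2)
print(y.shape)  # same shape as the input hidden state
```

With k=2 of 8 experts selected, 6 of the 8 expert matrices contribute zero FLOPs for this token, which is the whole efficiency argument in miniature.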
The three waves of the rollout
Alibaba did not just dump a single model on GitHub and call it a day. They released an entire ecosystem over three weeks.
First came the flagship Qwen3.5-397B-A17B on February 16. It handles a massive 256K context window and fluently speaks 201 languages. It is the heavy lifter designed for enterprise workloads.
Then on February 24, they released the medium series. This included the highly efficient 122B-A10B, the standout 35B-A3B, and a dense 27B model. They also introduced Qwen3.5-Flash, a model optimized for speed.
Finally, on March 2, they open-sourced the small model series under the Apache 2.0 license. These 0.8B, 2B, 4B, and 9B models are not dumbed-down versions. They inherit the exact same native multimodal capabilities and Gated Delta Network architecture as the massive flagship model.
Native vision and language
Most multimodal models feel glued together. You take a language model, bolt on a vision encoder, and hope they talk to each other properly.
Qwen 3.5 was built as a native vision-language foundation model from the ground up. This means the model does not have to translate an image into text before reasoning about it. It understands pixels and tokens in the same conceptual space.
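A rough sketch of what "same conceptual space" means, assuming the standard pattern of projecting both modalities into one model dimension (the projection shapes here are made up for illustration, not Qwen's actual dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 32  # hypothetical shared model dimension

# Two modalities with different raw feature sizes...
text_tokens   = rng.standard_normal((10, 64))    # 10 text token embeddings
image_patches = rng.standard_normal((49, 768))   # 7x7 grid of patch features

# ...each gets its own learned projection into the SAME d_model space.
w_text  = rng.standard_normal((64, d_model))
w_image = rng.standard_normal((768, d_model))

# One unified sequence: downstream attention layers see only positions
# in a shared space, not a "text part" and a separate "image part".
sequence = np.concatenate([text_tokens @ w_text, image_patches @ w_image])
print(sequence.shape)  # 59 tokens, all in the shared d_model space
```

The contrast with "glued together" systems is that there is no caption-generation step in the middle: image patches enter the same attention stack as text tokens from layer one.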
I keep coming back to how important this is for agents. If you want an AI to navigate a computer interface or understand a complex diagram, the translation layer of older models introduces too much latency and error. Qwen 3.5 just sees it.
What this means for local AI
The release of the small models is the real story here. A 4B parameter model that runs comfortably on an older laptop can now perform reasoning tasks that required a server farm a year ago.
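The back-of-envelope math on why these sizes fit on a laptop is simple: weight memory is roughly parameter count times bits per weight. This sketch ignores KV cache and activation memory, so treat the numbers as lower bounds:

```python
def model_memory_gb(params_billion, bits_per_weight):
    """Rough weight-only memory estimate in GB (ignores KV cache, activations)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# The open-sourced small-model sizes, at full 16-bit and quantized 4-bit.
for params in (0.8, 2, 4, 9):
    for bits in (16, 4):
        print(f"{params}B @ {bits}-bit: {model_memory_gb(params, bits):.1f} GB")
```

At 4-bit quantization, the 4B model needs about 2 GB for weights, which is why it runs comfortably on hardware that would have choked on last year's server-class models.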
Developers are already building these small models into edge devices and running them locally for privacy-first applications. You do not need an API key to build something smart anymore. The open-weight release of the 0.8B to 9B models means the barrier to entry has vanished.
Official Links
- Hugging Face Model: [Available on Hugging Face Hub]
- Project Page: [ModelScope Qwen 3.5 Release]
- GitHub Repository: [Qwen Chat GitHub]
Conclusion
We are moving past the era where the only way to get smarter AI was to throw more GPUs at it. Qwen 3.5 proves that we can get better results by building smarter architectures.
The 35B model outperforming a 235B model is not just a neat trick. It is the blueprint for the next few years of AI development. We are going to see a massive shift toward highly optimized MoE architectures that run locally and respect user privacy.
If you have been holding off on building with local AI because the models were too large or too slow, your excuse is gone. Go download one of the Qwen 3.5 small models and try it yourself.