MASSIVE
> Step-3.5-Flash by StepFun
> Agentic & Coding MONSTER
> open-source MoE, Apache-2.0
> runs with full context on 2x RTX PRO 6000 or 8x RTX 3090s
> 196B MoE, only 11B active per token
> 256K context via 3:1 sliding-window attention
> long codebases & long tasks, cost-efficient long context
> benchmarks:
> 74.4% SWE-bench Verified
> 51.0% Terminal-Bench 2.0
> strong reasoning, strong coding, stable agents
> sparse MoE + Top-8 routing (sketch below)
> with sliding-window attention (sketch below)
> MTP-3 predicts multiple tokens at once (sketch below)
> 100–300 tok/s typical, peaks ~350 tok/s
> fast enough for parallel agents, not just chatting
> open weights, runs locally
> Macs, DGX Spark, GPUs
> vLLM, SGLang, Transformers, llama.cpp (vLLM sketch below)
> this is what “Buy a GPU” tried to warn you about...
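
for the curious, a minimal sketch of what "sparse MoE + Top-8 routing" means: a router scores every expert per token, only the top 8 actually run, and that's how 196B total parameters cost only ~11B active per token. sizes below are illustrative toys, not Step-3.5-Flash's real config.

import torch
import torch.nn as nn
import torch.nn.functional as F

n_experts, d_model, k = 64, 128, 8   # toy sizes, NOT the real config
experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
router = nn.Linear(d_model, n_experts, bias=False)

def moe_forward(hidden):                          # hidden: [tokens, d_model]
    logits = router(hidden)                       # [tokens, n_experts]
    weights, idx = torch.topk(logits, k, dim=-1)  # pick top-8 experts per token
    weights = F.softmax(weights, dim=-1)          # renormalize over the chosen 8
    out = torch.zeros_like(hidden)
    for t in range(hidden.size(0)):               # naive loop; real kernels batch this
        for slot in range(k):
            e = int(idx[t, slot])
            out[t] += weights[t, slot] * experts[e](hidden[t])
    return out

print(moe_forward(torch.randn(4, d_model)).shape)   # torch.Size([4, 128])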
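
and one plausible reading of "256K via 3:1 sliding-window attention" (the post doesn't spell out the ratio): three sliding-window layers for every full-attention layer, so most layers only attend to a recent window while the occasional full layer keeps global context. window size and layer count below are made up for illustration.

import torch

def causal_mask(seq_len, window=None):
    i = torch.arange(seq_len).unsqueeze(1)      # query positions
    j = torch.arange(seq_len).unsqueeze(0)      # key positions
    mask = j <= i                               # causal: past tokens only
    if window is not None:
        mask &= (i - j) < window                # sliding window: recent past only
    return mask

seq_len, window = 8, 4                          # toy numbers
for layer in range(8):
    full = (layer % 4 == 3)                     # every 4th layer full attention (3:1)
    m = causal_mask(seq_len, None if full else window)
    kind = "full" if full else f"window={window}"
    print(f"layer {layer}: {kind}, last token sees {int(m[-1].sum())} positions")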
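
"MTP-3 predicts multiple tokens at once" is multi-token prediction; the usual inference trick with MTP heads is self-speculative decoding, where a cheap head drafts a few tokens and one verify pass keeps the matching prefix. whether Step-3.5-Flash wires it exactly this way isn't stated in the post, and both "models" below are trivial stand-ins, just to show why good drafts give ~3 tokens per model step.

def draft_head(ctx):                     # stand-in: guesses the next 3 chars
    s = "hello world, hello agents!"
    return list(s[len(ctx):len(ctx) + 3])

def main_model(ctx, n):                  # stand-in verifier, disagrees on last char
    s = "hello world, hello agents."
    return list(s[len(ctx):len(ctx) + n])

ctx, steps = "", 0
while len(ctx) < 26:
    drafted = draft_head(ctx)
    verified = main_model(ctx, len(drafted))
    keep = []
    for d, v in zip(drafted, verified):
        keep.append(v)                   # always keep the verified token
        if d != v:
            break                        # stop at the first mismatch
    ctx += "".join(keep)
    steps += 1
print(f"{len(ctx)} tokens in {steps} steps")   # ~3 tokens/step while drafts match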
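
if you want the vLLM route, a minimal offline sketch; the HF repo id, parallel size, and context length here are my guesses from the post, so check the actual model card before copy-pasting.

from vllm import LLM, SamplingParams

llm = LLM(
    model="stepfun-ai/Step-3.5-Flash",   # assumed repo id, verify on the hub
    tensor_parallel_size=2,              # e.g. the 2x RTX PRO 6000 setup from the post
    max_model_len=262144,                # the advertised 256K context
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.2, max_tokens=512)
out = llm.generate(["Write a quicksort in Python."], params)
print(out[0].outputs[0].text)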