People really don’t understand what a competitive strategic weapon open source has become, or how it works. Alfred Marshall would be proud. Always good to reread The Cathedral and the Bazaar.
Aakash Gupta, Feb 16, 02:24
The part most people will skip: NVIDIA just made every voice AI API a commodity.

OpenAI charges $0.06/min input and $0.24/min output for its Realtime API. Gemini Live bills 25 tokens/second of audio. Every startup building voice agents is hemorrhaging cash on per-minute API fees to run what is fundamentally a pipeline problem: ASR → LLM → TTS, three models stitched together with latency at every seam.

PersonaPlex replaces that entire pipeline with one 7B model. Runs on a single A100. Open weights, MIT license, commercial use permitted. Response latency: 0.170 seconds for turn-taking, 0.240 seconds for interruptions. It scores higher on dialog naturalness than Gemini (2.95 vs 2.80 MOS) and handles interruptions better than every commercial system they benchmarked.

This tells you everything about NVIDIA’s playbook. They don’t need to charge for the model. They need you to buy the GPU. Every company that self-hosts PersonaPlex instead of paying OpenAI per minute is another A100/H100 sale. Every voice agent startup that drops its API dependency is another enterprise GPU contract. NVIDIA open-sourced the fishing rod because they sell the lake.

Built on the Moshi architecture from Kyutai, fine-tuned with under 5,000 hours of data. The voice AI margin is migrating from the application layer to the hardware layer. And NVIDIA is the only company that profits no matter which model wins.

330,000 downloads in the first month. That’s infrastructure capture disguised as generosity.
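For anyone who wants to sanity-check the per-minute math in the quoted post, here is a rough back-of-envelope sketch. Only the two OpenAI prices come from the post. The 50/50 split between user and agent speech, the GPU hourly price, and the number of concurrent sessions are all placeholder assumptions, not figures from NVIDIA or OpenAI.

```python
# Prices quoted in the post for OpenAI's Realtime API (USD per minute of audio).
API_INPUT_PER_MIN = 0.06
API_OUTPUT_PER_MIN = 0.24


def api_cost_per_hour(input_share: float = 0.5) -> float:
    """API cost of one hour of conversation audio.

    input_share is the fraction of the hour that is user speech (billed as
    input); the rest is agent speech (billed as output). The 0.5 default is
    an assumption, not a measured figure.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    blended_per_min = (
        input_share * API_INPUT_PER_MIN
        + (1.0 - input_share) * API_OUTPUT_PER_MIN
    )
    return 60 * blended_per_min


def breakeven_utilization(
    gpu_cost_per_hour: float,
    input_share: float = 0.5,
    concurrent_sessions: int = 1,
) -> float:
    """Fraction of each hour a self-hosted GPU must be busy to match API spend.

    gpu_cost_per_hour and concurrent_sessions are hypothetical inputs; the
    post only says the model runs on a single A100.
    """
    if concurrent_sessions < 1:
        raise ValueError("concurrent_sessions must be at least 1")
    hourly_api_spend = api_cost_per_hour(input_share) * concurrent_sessions
    return gpu_cost_per_hour / hourly_api_spend


if __name__ == "__main__":
    # Hypothetical GPU rental price of $2.00/hour, purely for illustration.
    print(f"API cost per conversation-hour: ${api_cost_per_hour():.2f}")
    print(f"Break-even GPU utilization: {breakeven_utilization(2.00):.1%}")
```

Under these assumptions the comparison ignores engineering time, idle capacity, and quality differences, so treat the result as an order-of-magnitude check and nothing more.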
In the long run, price trends toward marginal cost. In software, that is effectively $0.