> you are a person
> who wants to understand llm inference
> you read papers
> “we use standard techniques”
> which ones? where is the code?
> open vllm
> 100k lines of c++ and python
> custom cuda kernel for printing
> close tab
> now you have this tweet
> and mini-sglang
> ~5k lines of python
> actual production features
> four processes
> api server
> tokenizer
> scheduler
> detokenizer
> talk over zeromq
> simple
> scheduler is the boss
> receives requests
> decides: prefill or decode
> batches them
> sends work to gpu
> prefill...
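
The "prefill or decode, then batch" decision above can be sketched in a few lines. This is a toy under stated assumptions, not mini-sglang's actual code: the names `Request`, `ToyScheduler`, and `max_prefill_tokens` are hypothetical, and the prefill-first policy with a token budget is just one simple choice.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    prompt_len: int       # tokens to process during prefill
    max_new_tokens: int   # decode steps before the request finishes
    generated: int = 0


class ToyScheduler:
    """Prefill-first scheduler (hypothetical policy, for illustration only)."""

    def __init__(self, max_prefill_tokens=8):
        self.max_prefill_tokens = max_prefill_tokens
        self.waiting = deque()   # received, not yet prefilled
        self.running = []        # prefilled, now decoding

    def add(self, req):
        self.waiting.append(req)

    def step(self):
        """Return (kind, request ids) for the next GPU batch, or None if idle."""
        if self.waiting:
            batch, budget = [], self.max_prefill_tokens
            # Admit requests in arrival order while they fit the token budget.
            # The first one is always admitted so a long prompt cannot stall.
            while self.waiting and (not batch or self.waiting[0].prompt_len <= budget):
                req = self.waiting.popleft()
                budget -= req.prompt_len
                batch.append(req)
            self.running.extend(batch)
            return ("prefill", [r.rid for r in batch])
        if self.running:
            batch = list(self.running)
            for req in batch:
                req.generated += 1   # one new token per request per decode step
            # Drop requests that have produced all their tokens.
            self.running = [r for r in self.running if r.generated < r.max_new_tokens]
            return ("decode", [r.rid for r in batch])
        return None
```

Calling `step()` in a loop gives the batch sequence the GPU worker would receive. A real scheduler also has to track KV cache memory, which this sketch leaves out.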

