I really like this research direction!
For a long time, I've been talking about the "brain vs. database" analogy for SSMs vs. Transformers. An extension I've mentioned offhand a few times is that I think the tradeoffs change when we start building multi-component *systems* rather than single models.
For example, if one subscribes to the intuition that modern hybrid models use the SSM as the main "brain-like" processing unit while the attention layers serve primarily as "database-like" caches for precise retrieval, then I've hypothesized that a more optimal system might be a pure SSM language model combined with explicit external knowledge databases and context caches. This is much more analogous to human intelligence, which is primarily driven by the brain (the SSM) aided by external knowledge stores (books, the internet) and tool use.
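To make the idea concrete, here's a toy sketch of what such a system could look like. This is purely illustrative: all of the names (`SSMLanguageModel`, `KnowledgeStore`, the `LOOKUP:`/`RESULT:` protocol) are made-up placeholders, not anything from the paper. The point is just the division of labor: a fixed-state recurrent "brain" that offloads precise recall to an explicit external store via tool calls, instead of attending over a long in-context history.

```python
# Hypothetical sketch: a pure SSM "brain" + explicit external knowledge store.
# All classes and the tool-call protocol below are invented for illustration.

from dataclasses import dataclass, field


@dataclass
class KnowledgeStore:
    """External memory the model queries explicitly (books, the internet)."""
    docs: dict[str, str] = field(default_factory=dict)

    def lookup(self, query: str) -> str:
        # Toy exact-match retrieval; a real system would use a search index.
        return self.docs.get(query, "[no entry found]")


class SSMLanguageModel:
    """Stand-in for a pure SSM backbone: fixed-size recurrent state, no KV cache."""

    def __init__(self) -> None:
        self.state = None  # compressed, constant-size state ("the brain")

    def step(self, text: str) -> str:
        # Placeholder for SSM generation: emit either a tool call or an answer.
        if "RESULT:" not in text:
            return f"LOOKUP: {text}"  # decide to consult external memory
        return f"ANSWER based on {text}"


def agent_loop(model: SSMLanguageModel, store: KnowledgeStore, question: str) -> str:
    """Interactive tool-use loop: precise retrieval lives in the store,
    not in the model's state."""
    output = model.step(question)
    while output.startswith("LOOKUP:"):
        retrieved = store.lookup(output.removeprefix("LOOKUP: "))
        output = model.step(f"RESULT: {retrieved}")
    return output


store = KnowledgeStore(docs={"capital of France": "Paris"})
print(agent_loop(SSMLanguageModel(), store, "capital of France"))
```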
This paper shows pretty interesting results: SSMs do seem to perform very favorably compared to Transformers in this regime of agentic models operating with interactive tool use. Glad to see the intuition validated, and I hope more research continues along these lines!
