📍 Can LLMs discover, abstract, and reuse higher-level tool skills across tasks?
Existing tool-use benchmarks test solving tasks with fixed tools. But real workflows contain recurring structures where efficiency comes from reusable tool compositions, not isolated calls.
We introduce SkillCraft: 126 tasks across 6 domains designed to test whether LLM agents can acquire compositional skills, not just call atomic tools.
We also propose Skill Mode, a lightweight protocol with four MCP primitives that let agents compose, verify, cache, and reuse tool chains at test time.
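To make the compose / verify / cache / reuse loop concrete, here is a minimal sketch in plain Python. It is not the SkillCraft implementation and does not use MCP: the class `SkillRegistry`, its method names, and the toy string tools are all hypothetical, chosen only to mirror the four verbs in the post.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[[Any], Any]


class SkillRegistry:
    """Hypothetical in-memory skill store; not the SkillCraft or MCP API."""

    def __init__(self, tools: Dict[str, Tool]) -> None:
        self.tools = tools                    # atomic tools, keyed by name
        self.skills: Dict[str, Tool] = {}     # cached composed skills

    def compose(self, steps: List[str]) -> Tool:
        # Chain atomic tools so each output feeds the next tool's input.
        missing = [s for s in steps if s not in self.tools]
        if missing:
            raise KeyError(f"unknown tools: {missing}")

        def skill(value: Any) -> Any:
            for name in steps:
                value = self.tools[name](value)
            return value

        return skill

    def verify(self, skill: Tool, example_input: Any, expected: Any) -> bool:
        # Accept a skill only if it reproduces a known-good example.
        try:
            return skill(example_input) == expected
        except Exception:
            return False

    def cache(self, name: str, skill: Tool) -> None:
        # Store a verified skill under a name for later tasks.
        self.skills[name] = skill

    def reuse(self, name: str, value: Any) -> Any:
        # Run a cached skill as one call instead of replaying each tool.
        return self.skills[name](value)


# Toy atomic tools (assumed for illustration only).
tools = {"strip": str.strip, "lower": str.lower, "split": str.split}
registry = SkillRegistry(tools)

normalize = registry.compose(["strip", "lower", "split"])
if registry.verify(normalize, "  Hello World ", ["hello", "world"]):
    registry.cache("normalize_text", normalize)

result = registry.reuse("normalize_text", " Foo BAR ")
```

In this sketch a skill is cached only after it passes verification, so a later task pays for one call rather than three.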
Our key findings from evaluating 8 SOTA models:
⚡ Skill Mode enables agents to self-discover and reuse skills, leading to higher success rates and better efficiency than agents without it. The gains are larger for stronger models.
🧠 Stronger models (e.g., Claude) discover more generalizable skills, which transfer across tasks and even across models.
🔍 Deeper composition ≠ better: shallow, well-tested skills generalize best.
🔗 Paper:
💻 Code:
🏠 Page:
(1/7)
