Gemini 3 Flash currently shows the highest success rate for OpenClaw on PinchBench, at 95.1%.

PinchBench is an open benchmark that evaluates how models perform with OpenClaw in real-world scenarios. It focuses on practical usage rather than isolated capability tests. Tasks include writing code, managing files, scheduling, and research.

PinchBench looks at things like:
- Tool usage. Can the model call the right tools with the right parameters?
- Multi-step reasoning. Can it chain actions to complete complex tasks?
- Real-world messiness. Can it handle ambiguous instructions and incomplete information?
- Practical outcomes. Did it actually create the file, send the email, or schedule the meeting?

Full leaderboard below.

1/2