Granite 4.0 H Tiny (4-bit) by @IBM running on iPhone 17 Pro at ~40 tok/s with MLX. 7B total parameters with 1B active, using less than 5GB of RAM. Extremely good in benchmarks for its memory footprint. IBM did a great job with this one: it's fast and efficient for its size.
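
Quick sanity check on the RAM number, as a rough sketch: it assumes all 7B parameters are stored at exactly 4 bits each and ignores quantization scales, the KV/state cache, and runtime overhead (none of which are given in the post). The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at `bits_per_weight` bits,
    with no extra bytes for quantization scales or runtime buffers.
    """
    return total_params * bits_per_weight / 8 / 1e9


# 7B total parameters at 4-bit, the figures from the post.
weights_gb = weight_memory_gb(7e9, 4)
print(f"~{weights_gb:.1f} GB of weights")
```

That comes to roughly 3.5 GB for the weights alone, which fits under the 5GB mentioned above with room left for everything else. The 1B active parameters would mostly help speed rather than memory, assuming the full set of weights stays loaded.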