The Mac M2 Ultra is faster than 2xH100s at running DeepSeek R1 IQ1_S.

Over on the llama.cpp GitHub, people have been benchmarking R1 IQ1_S. The M2 Ultra is faster than two H100s for token generation (TG): it gets 13.88 t/s, while the 2xH100s get 11.53 t/s in their best run. That makes the M2 Ultra about 20% faster, which is surprising.

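As for prompt processing (PP), that's all over the place on the 2xH100s, ranging from 0.41 to 137.66 t/s. For the M2 Ultra it's 24.05 t/s, so it lands somewhere in between depending on which H100 run you compare against.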
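To put those numbers side by side, here's a quick sketch that just divides the figures quoted above. The variable names are mine, and it assumes all of the figures are tokens per second as reported in the linked issue; it isn't a benchmark, only arithmetic on the posted results.

```python
# Figures quoted in this post (assumed to be tokens per second).
m2_ultra_tg = 13.88          # M2 Ultra, token generation
h100x2_tg_best = 11.53       # 2xH100, best token generation run
m2_ultra_pp = 24.05          # M2 Ultra, prompt processing
h100x2_pp_min = 0.41         # 2xH100, slowest prompt processing run
h100x2_pp_max = 137.66       # 2xH100, fastest prompt processing run

# How much faster the M2 Ultra is at TG than the best 2xH100 run.
tg_speedup = m2_ultra_tg / h100x2_tg_best

# How the M2 Ultra's PP compares with both ends of the 2xH100 range.
pp_best_h100_vs_m2 = h100x2_pp_max / m2_ultra_pp
pp_m2_vs_worst_h100 = m2_ultra_pp / h100x2_pp_min

print(f"TG: M2 Ultra is {tg_speedup:.2f}x the best 2xH100 run")
print(f"PP: best 2xH100 run is {pp_best_h100_vs_m2:.2f}x the M2 Ultra")
print(f"PP: M2 Ultra is {pp_m2_vs_worst_h100:.2f}x the worst 2xH100 run")
```

So the TG lead is roughly 1.2x, while on PP the best 2xH100 run is well ahead of the M2 Ultra and the worst one is far behind it.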

https://github.com/ggerganov/llama.cpp/issues/11474