@p Hmmm, thinking about it more, I think a lookup table might be the fastest, because the whole table could fit into L1 cache.
Manual addition would probably be best on some older CPUs.
A switch case is pretty fast, but could get slow with poor branch prediction.
The loop has too many jumps, and its iteration count varies with the input.
@p By "old" I was thinking around the turn of the century.
I don't know how to properly benchmark such tiny tasks either. I think you just have to run each one like a billion times and compare the total times.
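
Here is a rough sketch of what that "run it a billion times" test could look like in C. The task isn't restated in this comment, so counting the set bits of a 4-bit value is used purely as a stand-in, and every name here (`bits_table`, `bits_add`, `bits_switch`, `bits_loop`, `ns_per_call`, `verify_variants`) is made up for illustration. Treat the numbers it gives as a ballpark only: the timed loop also includes the cost of generating inputs and of the indirect call.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stand-in task (an assumption, not from the thread):
   count the set bits in a 4-bit value, four different ways. */

static const uint8_t NIBBLE_BITS[16] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};

/* Lookup table: one load from a 16-byte array. */
static unsigned bits_table(unsigned x) { return NIBBLE_BITS[x & 0xF]; }

/* Manual addition: branch-free shifts, masks and adds. */
static unsigned bits_add(unsigned x) {
    return (x & 1u) + ((x >> 1) & 1u) + ((x >> 2) & 1u) + ((x >> 3) & 1u);
}

/* Switch case: the compiler may emit a jump table or a branch chain. */
static unsigned bits_switch(unsigned x) {
    switch (x & 0xF) {
    case 0: return 0;
    case 1: case 2: case 4: case 8: return 1;
    case 7: case 11: case 13: case 14: return 3;
    case 15: return 4;
    default: return 2; /* 3, 5, 6, 9, 10, 12 */
    }
}

/* Loop: iteration count depends on the position of the highest set bit. */
static unsigned bits_loop(unsigned x) {
    unsigned n = 0;
    for (x &= 0xF; x != 0; x >>= 1) n += x & 1u;
    return n;
}

/* Check that all variants agree before timing anything. */
static void verify_variants(void) {
    for (unsigned x = 0; x < 16; x++) {
        assert(bits_add(x) == bits_table(x));
        assert(bits_switch(x) == bits_table(x));
        assert(bits_loop(x) == bits_table(x));
    }
}

/* Average CPU time per call in nanoseconds over `iters` calls.
   The measurement includes the input generator and call overhead. */
static double ns_per_call(unsigned (*fn)(unsigned), uint64_t iters) {
    volatile unsigned sink = 0;  /* keeps the calls from being optimized away */
    uint32_t state = 12345u;
    if (iters == 0) return 0.0;
    clock_t start = clock();
    for (uint64_t i = 0; i < iters; i++) {
        state = state * 1664525u + 1013904223u;  /* LCG, so inputs vary */
        sink += fn(state >> 28);                 /* top 4 bits: 0..15 */
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

From `main` you would call `verify_variants()` once and then print something like `ns_per_call(bits_table, 1000000000)` for each variant. Running each one a few times and comparing the lowest result is probably more trustworthy than a single run, and the inputs are pseudo-random on purpose so the branch predictor can't just memorize a fixed pattern.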