@p Whatever accesses memory the least, so probably 1.
@p Hmmm, thinking about it more, I think a lookup table might be the fastest, because the whole table could fit into L1 cache.
Manual addition would probably be best on some older CPUs.
Switch case is pretty fast, but could get slow with poor branch prediction.
The loop has too many jumps, and how long it runs is variable.
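
To make the comparison concrete, here's a rough sketch of the four approaches side by side. The actual task isn't spelled out here, so I'm assuming something like counting the set bits in a 4-bit value purely as an illustration; the function names and `kBits` are made up for this example.

```c
#include <stdint.h>

/* Hypothetical task (an assumption): count the set bits in a 4-bit value. */

/* Manual addition: no memory access and no branches. */
static unsigned count_manual(uint8_t x) {
    return (x & 1u) + ((x >> 1) & 1u) + ((x >> 2) & 1u) + ((x >> 3) & 1u);
}

/* Lookup table: 16 bytes, small enough to sit in L1 cache once touched. */
static const uint8_t kBits[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                  1, 2, 2, 3, 2, 3, 3, 4};
static unsigned count_table(uint8_t x) {
    return kBits[x & 0x0Fu];
}

/* Switch: the compiler may emit a jump table or a chain of branches,
   so the cost depends on how predictable the input is. */
static unsigned count_switch(uint8_t x) {
    switch (x & 0x0Fu) {
        case 0:
            return 0;
        case 1: case 2: case 4: case 8:
            return 1;
        case 3: case 5: case 6: case 9: case 10: case 12:
            return 2;
        case 7: case 11: case 13: case 14:
            return 3;
        default: /* 15 */
            return 4;
    }
}

/* Loop: one conditional jump per iteration, and the number of
   iterations depends on the position of the highest set bit. */
static unsigned count_loop(uint8_t x) {
    unsigned n = 0;
    for (x &= 0x0Fu; x != 0; x >>= 1) {
        n += x & 1u;
    }
    return n;
}
```

All four return the same result for inputs 0 to 15. Which one actually wins will depend on the compiler and the CPU, so it's worth timing them on the real target rather than trusting the guesses above.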