@matrix The first one is ~25 instructions. A single deref might be faster, but 3/4 also don't touch memory. They do both branch, though. 4 is a loop that boils down to maybe four instructions. I suspect 3 is the slowest but I don't know.

**matrix07012** @matrix@gameliberty.club · Mar 28, 2025, 10:02

@p Hmmm, thinking about it more, I think a lookup table might be the fastest, because the whole table could fit into L1 cache.
Manual addition would probably be best on some older CPUs.
Switch case is pretty fast, but could get slow with poor branch prediction.
The loop has too many jumps and is variable.

**pistolero** @p@fsebugoutzone.org · Mar 28, 2025, 10:14

@matrix I think 4 would be fastest on older CPUs, but it depends on whether older means "1980s" or "Pentium IV". (4 would definitely be slowest on a P4.)

It is hard to actually benchmark something that is potentially less expensive than the handful of lines that actually perform the benchmarking.

**matrix07012** @matrix@gameliberty.club · Mar 28, 2025, 10:33

@p By "old" I was thinking around turn of the century.
I don't know how to properly benchmark such tiny tasks either. I think you just have to run each one like a billion times and see the difference.
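
A rough sketch of the "run it a billion times" idea in C, for illustration only. Everything named here (`time_loop`, `ns_per_call`, `baseline`, `table_lookup`, `switch_case`) is hypothetical, and the two candidates are stand-ins for a lookup table and a switch, not the four versions being compared in the thread. It times a loop around the candidate, times the same loop around a do-nothing function, and subtracts the two to try to strip out the loop and call overhead. An optimizing compiler may still inline or fold parts of this, so the generated assembly is worth checking.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Signature shared by every candidate (an assumption for this sketch). */
typedef uint32_t (*op_fn)(uint32_t);

/* Written once per run so the compiler cannot discard the loop result. */
volatile uint32_t sink;

/* Does no real work; used to estimate loop and call overhead. */
uint32_t baseline(uint32_t x) { return x; }

/* Hypothetical candidate: a 4-entry lookup table. */
uint32_t table_lookup(uint32_t x) {
    static const uint32_t table[4] = {1, 2, 4, 8};
    return table[x & 3u];
}

/* Hypothetical candidate: the same mapping written as a switch. */
uint32_t switch_case(uint32_t x) {
    switch (x & 3u) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 4;
    default: return 8;
    }
}

/* Calls f iters times and returns the elapsed CPU time in seconds. */
double time_loop(op_fn f, uint64_t iters) {
    assert(f != NULL && iters > 0);
    uint32_t acc = 0;
    clock_t start = clock();
    for (uint64_t i = 0; i < iters; i++) {
        /* acc feeds into the next argument, so each call depends on the last. */
        acc += f((uint32_t)i + acc);
    }
    clock_t end = clock();
    sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Estimated nanoseconds per call after subtracting the baseline loop. */
double ns_per_call(op_fn f, uint64_t iters) {
    double with_op = time_loop(f, iters);
    double without_op = time_loop(baseline, iters);
    return (with_op - without_op) * 1e9 / (double)iters;
}
```

Calling `ns_per_call(table_lookup, 1000000000ull)` and `ns_per_call(switch_case, 1000000000ull)` would give rough per-call figures. The result can be noisy or even negative when the operation costs less than the measurement jitter, which is the problem raised earlier in the thread.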

**pistolero** @p@fsebugoutzone.org · Mar 28, 2025, 10:43

@matrix Yeah, but if it's buried in the branch, then you might not get an actual difference. There are CPU instructions that give you access to high-res timers but I think this might still be hard to do.
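
For the high-res timer idea, a minimal sketch in C, assuming GCC or Clang. On x86 it reads the time-stamp counter through the `__rdtsc()` intrinsic from `<x86intrin.h>`; elsewhere it falls back to POSIX `clock_gettime`. The helper names `read_ticks` and `timer_overhead` are made up for this example. Measuring two back-to-back reads shows how much the timer itself costs, which hints at why an operation of only a few instructions is hard to isolate.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Returns a raw tick count: TSC ticks on x86, nanoseconds elsewhere. */
uint64_t read_ticks(void) {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Smallest gap seen between two consecutive timer reads, in ticks. */
uint64_t timer_overhead(int repeats) {
    assert(repeats > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < repeats; i++) {
        uint64_t t0 = read_ticks();
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best) {
            best = t1 - t0;
        }
    }
    return best;
}
```

If `timer_overhead(1000)` comes back larger than the cost of the code under test, a single timed call says very little, and timing many iterations between one pair of reads is probably the safer route.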
