**スーパーディック:rising_sun:** @SuperDicq@minidisc.tokyo · Feb 23

**LukeAlmighty** @LukeAlmighty@gameliberty.club · Feb 23

@SuperDicq
Thanks, I feel 30 IQ points dumber now.

I get that the point was that the compiler can't optimize by moving instructions around in the code, but I still didn't get 80% of that text.

**Coded Artist** @coded_artist@gameliberty.club · 2025-02-23T22:49:05Z

@LukeAlmighty @SuperDicq
I understand the integer overflow part: if a + b is greater than INT_MAX, that's an issue.
Not sure how this affects prefetching, but then I'm not an expert.
The rest is Chinese to me.
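Here is a concrete version of the overflow point. In C, overflowing a signed `int` is undefined behaviour, so the first step of the add/subtract swap (`a = a + b`) is already a bug when `a + b` exceeds `INT_MAX`. This is a minimal sketch assuming a C99 compiler. The helper names `add_overflows` and `arith_swap_unsigned` are hypothetical, chosen for this example. Unsigned arithmetic wraps modulo 2^N by definition, so the same swap done on `unsigned` values stays well-defined even when the intermediate sum wraps.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b would overflow a signed int.
   It checks *before* adding, because actually overflowing is undefined
   behaviour in C. */
static bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b; /* sum would go above INT_MAX */
    return a < INT_MIN - b;     /* b <= 0 here: sum would go below INT_MIN */
}

/* Hypothetical helper: the add/subtract swap on unsigned values.
   Unsigned arithmetic wraps modulo 2^N, so a wrapped a + b cancels out
   and the swap still comes out right.
   Caveat: if a and b point to the same variable, this sets it to 0. */
static void arith_swap_unsigned(unsigned *a, unsigned *b)
{
    *a = *a + *b; /* a = a + b (may wrap, which is fine for unsigned) */
    *b = *a - *b; /* b now holds the original a */
    *a = *a - *b; /* a now holds the original b */
}
```

The aliasing caveat in the comment is another reason the plain temporary-variable swap is usually preferred.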


**Wolf480pl** @wolf480pl@mstdn.io · Feb 24

@coded_artist @LukeAlmighty @SuperDicq

Up to "and creating serial dependencies" I understand everything (can explain if you want).

But the rest is "I know some of these words" for me.

**Coded Artist** @coded_artist@gameliberty.club · Feb 25

@wolf480pl @LukeAlmighty @SuperDicq I would appreciate that, if you have the time.

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq

LLVM is a reusable compiler middle end: the programming-language-independent part that does the optimizations. It is used, e.g., by Clang, a C compiler built on top of LLVM.

LLVM is famous for popularizing SSA (static single assignment) form.

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25 *

@coded_artist @LukeAlmighty @SuperDicq
SSA form is a kind of internal representation of a program inside a compiler in which instructions never write to an existing variable. Instead, every instruction creates a new variable holding its result. Other instructions can use that variable as a source operand, but nothing can modify it.

This means there are lots of temporary variables, so LLVM has to be very good at optimizing them away, which in turn means adding a temporary variable to your code is cheap.
2/
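Real LLVM IR looks different, but you can imitate the single-assignment idea in plain C by never reassigning a variable. Below is the add/subtract swap rewritten that way. The function `swap_ssa` and the numbered names `a0`, `a1`, ... are hypothetical, chosen for illustration. Once every value has its own name, simple algebra shows that `b1` is just `a0` and `a2` is just `b0`. An SSA-based optimizer can exploit exactly that kind of fact. Unsigned arithmetic is used so the algebra holds exactly, with no overflow UB.

```c
/* Each name is assigned exactly once (const enforces it), mimicking how an
   SSA-form IR gives every result its own fresh variable. */
static void swap_ssa(unsigned a0, unsigned b0, unsigned *a_out, unsigned *b_out)
{
    const unsigned a1 = a0 + b0; /* a = a + b                           */
    const unsigned b1 = a1 - b0; /* b = a - b, i.e. (a0 + b0) - b0 = a0 */
    const unsigned a2 = a1 - b1; /* a = a - b, i.e. (a0 + b0) - a0 = b0 */
    *a_out = a2;
    *b_out = b1;
}
```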

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq

ARM and x86 are CPU architectures (more precisely, instruction set architectures or ISAs), i.e. the sets of instructions a CPU can execute.

Most CPUs implementing those architectures have a pipeline - that is, they start processing the next instruction before they're done with the previous one, kinda like an assembly line.

This is great when they know what instruction is going to be executed next. But when there is a branch in the code (e.g. an if statement), they don't know ahead of time which instruction comes next.

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq
So they have to guess which side of an if is going to be executed, start processing instructions from that side of a branch, and if it later turns out they guessed wrong, they need to undo all of that and start again with the correct side of a branch.

This is called speculative execution.

Branch prediction is the part where the CPU guesses which way a branch is going to go.
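Real branch predictors are far more elaborate and vendor-specific. The classic textbook scheme still shows the basic idea of guessing from past behaviour: keep a 2-bit saturating counter per branch. This is a sketch of that textbook model only, not of any particular CPU. The names `predictor`, `predict`, `update` and `count_mispredictions` are hypothetical.

```c
#include <stdbool.h>

/* Textbook 2-bit saturating counter: states 0-1 predict "not taken",
   states 2-3 predict "taken". A single surprise in a strongly biased
   state does not flip the prediction. */
typedef struct {
    unsigned char state; /* 0..3 */
} predictor;

static bool predict(const predictor *p)
{
    return p->state >= 2;
}

/* After the branch resolves, nudge the counter toward the actual outcome. */
static void update(predictor *p, bool taken)
{
    if (taken && p->state < 3)
        p->state++;
    else if (!taken && p->state > 0)
        p->state--;
}

/* Hypothetical helper: run a sequence of branch outcomes through the
   predictor and count how many guesses were wrong. */
static int count_mispredictions(const bool *outcomes, int n)
{
    predictor p = { 2 }; /* start in "weakly taken" */
    int misses = 0;
    for (int i = 0; i < n; i++) {
        if (predict(&p) != outcomes[i])
            misses++;
        update(&p, outcomes[i]);
    }
    return misses;
}
```

Consider a loop branch that is taken nine times and then falls through. It costs one misprediction. A pattern that alternates not-taken/taken defeats this simple counter on every single guess.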

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq

Branch misprediction is when it guesses incorrectly, and it costs a lot of time.

If a value used inside an if condition is known some time before the execution reaches the if, that might make branch prediction easier for a sufficiently smart CPU.

But when you obscure which value is which by using arithmetic to swap two variables, the CPU might not be able to see through that, resulting in frequent mispredictions, and making your code slow.

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq

Modern CPUs are also superscalar (which means they execute multiple instructions in parallel) and out-of-order (which means they can reorder instructions that don't depend on each other's results or side effects).

A serial dependency between two instructions means they have to be executed in the same order they appear in the code, because one depends on the other.

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq

If you use arithmetic tricks to swap two values, anything that later uses either of the two values depends on those swap instructions. Those in turn depend on both the instructions that calculated a and the instructions that calculated b.

That may prevent the CPU from doing things in parallel or reordering instructions, which, again, makes your code slow.
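Here are the two swaps side by side, with comments noting which inputs each result depends on. The function names are hypothetical. Unsigned arithmetic is used so the arithmetic version has no overflow UB. In the temp-variable version, each output is a plain copy of one input. In the arithmetic version, each output comes out of a chain of three dependent operations that reads both inputs. As far as the hardware is concerned, both new values depend on both old ones.

```c
/* Swap through a temporary: new *a depends only on old *b, and new *b only
   on old *a. No arithmetic, so no chain of dependent operations. */
static void swap_tmp(unsigned *a, unsigned *b)
{
    unsigned t = *a; /* t <- old a */
    *a = *b;         /* a <- old b */
    *b = t;          /* b <- old a */
}

/* Arithmetic swap: each step needs the result of the previous one, and
   the CPU cannot see that the final values are just the inputs swapped. */
static void swap_arith(unsigned *a, unsigned *b)
{
    *a = *a + *b; /* step 1: needs old a and old b   */
    *b = *a - *b; /* step 2: needs step 1 and old b  */
    *a = *a - *b; /* step 3: needs step 1 and step 2 */
}
```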

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq

Also, instruction-level parallelism (ILP) is the property of a compiled program which says how many of its instructions can be executed in parallel if you have an infinitely parallel CPU. If every instruction in your program uses the result from the instruction immediately before it, your ILP is 1, and you cannot take advantage of a CPU that can do multiple things at once.
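A common way to see ILP in practice is summing an array. With one accumulator, every addition needs the result of the previous one, so the loop is a single dependency chain. Splitting the work across two independent accumulators gives the CPU two chains it can advance in parallel. This is a sketch with hypothetical function names. Whether the second version is actually faster depends on the CPU and on what the compiler already does. For integer sums, an optimizing compiler may reassociate or vectorize the first loop on its own.

```c
#include <stddef.h>

/* One accumulator: each add depends on the previous one (ILP of about 1
   for the additions). */
static long sum_serial(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two accumulators: the adds into s0 and s1 don't depend on each other,
   so a superscalar, out-of-order CPU can overlap them. */
static long sum_two_chains(const int *v, size_t n)
{
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n) /* odd length: one element left over */
        s0 += v[i];
    return s0 + s1;
}
```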

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq

Also, a modern compiler like a recent version of LLVM (Clang) or GCC with optimizations enabled will see right through your arithmetic tricks and turn them into a temporary variable (which is then held in a CPU register and never reaches main memory).

**Coded Artist** @coded_artist@gameliberty.club · Feb 25

@wolf480pl @LukeAlmighty @SuperDicq So by using a temp variable, a and b will be used where needed and executed in parallel by the CPU, since the swap is basically optimized away by LLVM?

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25 *

@coded_artist @LukeAlmighty @SuperDicq

LLVM optimizations will make sure the temporary variable is not stored in main memory (which is slow), only in a CPU register.

They will use mov instructions to move the values around, which are very cheap on a modern out-of-order CPU because these CPUs do register renaming: instead of having a separate physical register named "eax", "ebx", etc. for each register in the ISA, they have dozens of physical registers, each of which can pretend to be "eax", etc.
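Here is a toy model of register renaming. It is heavily simplified and does not describe any real core. A rename table maps each architectural register to a physical one, and every write gets a fresh physical register. A register-to-register `mov` can then be handled by just copying a table entry, a technique known as move elimination that some modern x86 cores implement. The three-register set, the table layout and the function names are hypothetical.

```c
enum { EAX, EBX, ECX, NUM_ARCH };

typedef struct {
    int map[NUM_ARCH]; /* architectural register -> physical register number */
    int next_free;     /* next unused physical register (toy: never reclaimed) */
} rename_table;

static void rt_init(rename_table *rt)
{
    for (int r = 0; r < NUM_ARCH; r++)
        rt->map[r] = r;
    rt->next_free = NUM_ARCH;
}

/* An instruction that computes a new value into dst gets a brand-new
   physical register, so it never has to wait for older readers of dst. */
static int rt_write(rename_table *rt, int dst)
{
    rt->map[dst] = rt->next_free++;
    return rt->map[dst];
}

/* mov dst, src: no data is copied; dst simply starts pointing at the same
   physical register as src. */
static void rt_mov(rename_table *rt, int dst, int src)
{
    rt->map[dst] = rt->map[src];
}
```

In this model, swapping eax and ebx through ecx (`mov ecx, eax; mov eax, ebx; mov ebx, ecx`) only permutes table entries. No values move and no new physical registers are used.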

**Coded Artist** @coded_artist@gameliberty.club · Feb 25

@wolf480pl @LukeAlmighty @SuperDicq Ooh!
Pass by reference on the CPU level?
That's neat.

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq
yeah :D

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq
the conclusion from all of that is:

1. Instead of trying to be clever, try to write code in such a way that it's easy to understand what you're trying to do - chances are the compiler will be able to make sense of that and apply optimizations much better than you could by hand.

2. If in doubt, look at what instructions the compiler generates, e.g. with godbolt: https://godbolt.org/z/oW88b77eW

Compiler Explorer - C (godbolt.org)

int bar(int a, int b); int foo(int a, int b, int c) { if (c) { a = a + b; b = a - b; a = a - b; } return bar(a, b); } int foo2(int…

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@coded_artist @LukeAlmighty @SuperDicq
there's also uiCA, which can show how a CPU would execute a given sequence of instructions:

https://uica.uops.info/

but I don't know enough about CPUs to be able to read its output


**Coded Artist** @coded_artist@gameliberty.club · Feb 25

@wolf480pl @LukeAlmighty @SuperDicq I'm a software engineer, and while I've had to do some silly optimizations in the past, they were usually graphics-related, since I work on simulations and occasionally web.

In my experience, readability is the highest priority.

And while it's rare for me to get into optimizations of this kind, it's fascinating.

Thanks for the explanation!

**スーパーディック:rising_sun:** @SuperDicq@minidisc.tokyo · Feb 25

@wolf480pl@mstdn.io @coded_artist@gameliberty.club @LukeAlmighty@gameliberty.club Thank you for the explanation, it actually makes a lot of sense. I feel like I understood most of what you said but I couldn't quite put it into words.

Also, next time I recommend using an instance that doesn't run Mastodon, to get around its hardcoded 500-character limit, so everything can actually fit in one post.

**Wolf480pl** @wolf480pl@mstdn.io · Feb 25

@SuperDicq @LukeAlmighty @coded_artist
I actually kinda like splitting things into multiple posts, but if I had a larger character limit, I would split it less / in a more organized way

**スーパーディック:rising_sun:** @SuperDicq@minidisc.tokyo · Feb 25

@wolf480pl@mstdn.io @LukeAlmighty@gameliberty.club @coded_artist@gameliberty.club Mastodon's 500-character limit is very arbitrary. Other fedi software doesn't do this, and there's no reason to. It isn't the 1980s. Our storage space isn't limited by goddamn text.

**GNU/翠星石** @Suiseiseki@freesoftwareextremist.com · Feb 25

@coded_artist @LukeAlmighty @SuperDicq I understood every word, but I am extremely disappointed that LLVM was used instead of GCC.
