@SuperDicq
Thanks, I feel 30 IQ points dumber now.

I get that the point was that the compiler cannot optimize by moving instructions around in the code, but I still didn't get 80% of that text.

@LukeAlmighty @SuperDicq
The integer overflow I understand: if a + b is greater than INT_MAX, that's an issue.
Not sure how this affects prefetching, but then I'm not an expert.
The rest is Chinese to me.
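
A minimal sketch of the overflow concern, assuming the trick being discussed is the add/subtract swap. The names add_would_overflow and arithmetic_swap are made up for illustration. In C, signed integer overflow is undefined behavior, so the sketch checks before adding instead of letting it happen:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b does not fit in an int. */
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;
    }
    return a < INT_MIN - b;
}

/* Add/subtract swap without a temp variable.
   Assumes a and b point to different objects. */
void arithmetic_swap(int *a, int *b) {
    assert(a != b);
    assert(!add_would_overflow(*a, *b)); /* the a + b step is the risky one */
    *a = *a + *b; /* a now holds the sum */
    *b = *a - *b; /* sum minus old b gives old a */
    *a = *a - *b; /* sum minus old a gives old b */
}
```

With a = INT_MAX and b = 1 the first assert on the sum fires, which is exactly the case a plain temp variable never has to worry about.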

@coded_artist @LukeAlmighty @SuperDicq

Up to "and creating serial dependencies" I understand everything (can explain if you want).

But the rest is "I know some of these words" for me.

@coded_artist @LukeAlmighty @SuperDicq

LLVM is the reusable compiler mid-end - the programming-language-independent part that does optimizations. It is used e.g. by clang, which is a C compiler built on top of LLVM.

LLVM is famous for popularizing the SSA form.

1/

@coded_artist @LukeAlmighty @SuperDicq
The SSA form is a type of internal representation of a program inside a compiler, in which instructions do not have a destination operand. Instead, every instruction creates a new variable holding its result, and other instructions can use that variable as a source operand, but they cannot modify it.

This means there are lots of temporary variables, so LLVM must be very good at optimizing them out.
Which means adding a temp var in your code is cheap.
2/
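
To make the SSA idea a bit more concrete, here is a rough sketch in plain C. It is not actual LLVM output, just the same function written twice: once reassigning x, and once in an SSA-like style where every result gets a fresh name (x0, x1, x2 are illustrative) and is never modified. The function names are made up.

```c
#include <assert.h>

/* Ordinary style: x is overwritten twice. */
int scale_and_offset(int x) {
    x = x * 2;
    x = x + 1;
    return x;
}

/* SSA-like style: each result is a new, read-only variable. */
int scale_and_offset_ssa(int x0) {
    const int x1 = x0 * 2;
    const int x2 = x1 + 1;
    return x2;
}

/* Both versions compute the same value for a given input. */
void check_equivalent(int x) {
    assert(scale_and_offset(x) == scale_and_offset_ssa(x));
}
```

The second version has more named temporaries, but none of them costs anything extra once the optimizer has run over it.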

@coded_artist @LukeAlmighty @SuperDicq

ARM and x86 are CPU architectures, i.e. sets of instructions that a CPU can execute.

Most CPUs implementing those architectures have a pipeline - that is, they start processing the next instruction before they're done with the previous one, kinda like an assembly line.

This is great when they know what instruction is going to be executed next. But when there is a branch in the code (e.g. an if statement) they don't know ahead of time.

3/

@coded_artist @LukeAlmighty @SuperDicq
So they have to guess which side of an if is going to be executed, start processing instructions from that side of a branch, and if it later turns out they guessed wrong, they need to undo all of that and start again with the correct side of a branch.

This is called speculative execution.

Branch prediction is the part where the CPU guesses which way a branch is going to go.

4/

@coded_artist @LukeAlmighty @SuperDicq

Branch misprediction is when it guesses incorrectly, and it costs a lot of time.

If a value used inside an if condition is known some time before the execution reaches the if, that might make branch prediction easier for a sufficiently smart CPU.

But when you obscure which value is which by using arithmetic to swap two variables, the CPU might not be able to see through that, resulting in frequent mispredictions, and making your code slow.

5/

@coded_artist @LukeAlmighty @SuperDicq

Modern CPUs are also superscalar (which means they execute multiple instructions in parallel) and out-of-order (which means they can reorder instructions that don't depend on each other's results or side effects).

A serial dependency between two instructions means they have to be executed in the same order they appear in the code, because one depends on the other.

6/

@coded_artist @LukeAlmighty @SuperDicq

If you use arithmetic tricks to swap two values, anything that uses either of the two values later depends on those swap instructions, which in turn depend on both the instructions that calculated a and the instructions that calculated b.

That may prevent the CPU from doing things in parallel or reordering instructions, which, again, makes your code slow.

7/
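
A small sketch of the two swaps side by side, with the dependencies spelled out in comments. The function names are made up, and unsigned is used so the wraparound in the arithmetic version is well defined. How a given CPU actually schedules this is not something the sketch can show; it only shows which step needs which earlier result.

```c
#include <assert.h>
#include <limits.h>

/* Arithmetic swap: a chain of three steps, each needing the one before.
   Assumes a and b point to different objects. */
void swap_arith(unsigned *a, unsigned *b) {
    *a = *a + *b; /* step 1: needs old a AND old b */
    *b = *a - *b; /* step 2: needs step 1 */
    *a = *a - *b; /* step 3: needs step 1 and step 2 */
}

/* Temp swap: new a only needs old b, new b only needs old a. */
void swap_temp(unsigned *a, unsigned *b) {
    unsigned tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Both swaps give the same result, even when a + b wraps around. */
void check_swaps(void) {
    unsigned x = UINT_MAX, y = 2;
    swap_arith(&x, &y);
    assert(x == 2 && y == UINT_MAX);
    swap_temp(&x, &y);
    assert(x == UINT_MAX && y == 2);
}
```

In the temp version, code that only needs the new a has to wait for whatever computed the old b and nothing else, which is the freedom the arithmetic version takes away.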

@coded_artist @LukeAlmighty @SuperDicq

Also, instruction-level parallelism (ILP) is the property of a compiled program which says how many of its instructions can be executed in parallel if you have an infinitely parallel CPU. If every instruction in your program uses the result from the instruction immediately before it, your ILP is 1, and you cannot take advantage of a CPU that can do multiple things at once.

8/
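
A rough illustration of ILP, assuming a simple array sum (the function names are made up). In the first version every addition needs the previous result, so there is one long chain. In the second there are two independent chains that a superscalar CPU could in principle work on at the same time. A compiler may well do this kind of transformation on its own, so treat it as a picture of the idea, not as tuning advice.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each += depends on the previous +=. */
unsigned long sum_serial(const unsigned *v, size_t n) {
    unsigned long acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += v[i];
    }
    return acc;
}

/* Two accumulators: the even and odd chains do not depend on each other. */
unsigned long sum_two_chains(const unsigned *v, size_t n) {
    unsigned long even = 0, odd = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    if (i < n) {
        even += v[i]; /* leftover element when n is odd */
    }
    return even + odd;
}

/* Both versions must agree on the total. */
void check_sums(void) {
    const unsigned v[] = {1, 2, 3, 4, 5};
    assert(sum_serial(v, 5) == 15);
    assert(sum_two_chains(v, 5) == 15);
}
```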

@coded_artist @LukeAlmighty @SuperDicq

Also, a modern compiler like a recent version of LLVM or GCC with optimizations enabled will see right through your arithmetic tricks, and turn them into a temporary variable (that is then held in a CPU register and never reaches main memory).

@wolf480pl @LukeAlmighty @SuperDicq So by using a temp variable, a and b will be used where needed, executed in parallel by the CPU, as the swap is basically optimized away by LLVM? 🤔

@coded_artist @LukeAlmighty @SuperDicq

LLVM optimizations will make sure the temporary variable is not stored in main memory (which is slow), only in a CPU register.

They will use mov instructions to move the values around, which are very cheap on a modern out-of-order CPU, because these CPUs do register renaming - instead of having a separate physical register named "eax", "ebx", etc. for each register in the ISA, they have dozens of registers, each of which can pretend to be "eax", etc.

@coded_artist @LukeAlmighty @SuperDicq
the conclusion from all of that is:

1. Instead of trying to be clever, try to write code in such a way that it's easy to understand what you're trying to do - chances are the compiler will be able to make sense of that and apply optimizations much better than you could by hand.

2. If in doubt, look at what instructions the compiler generates, e.g. with godbolt: godbolt.org/z/oW88b77eW

@coded_artist @LukeAlmighty @SuperDicq
there's also this which can show how a CPU would execute specified instructions:

uica.uops.info/

but I don't know enough about CPUs to be able to read its output

@wolf480pl @LukeAlmighty @SuperDicq I'm a software engineer, and while I had to do some silly optimizations in the past, it was usually graphical, as I work in simulations, and occasionally web.

In my experience, readability is the highest priority.

And while it's rare for me to get into optimizations of this kind, it's fascinating.

Thanks for the explanation!

@wolf480pl@mstdn.io @coded_artist@gameliberty.club @LukeAlmighty@gameliberty.club Thank you for the explanation, it actually makes a lot of sense. I feel like I understood most of what you said but I couldn't quite put it into words.

Also I recommend you use an instance that is not Mastodon next time to get around the hardcoded 500 character limit so everything can actually be just one post.

@SuperDicq @LukeAlmighty @coded_artist
I actually kinda like splitting things into multiple posts, but if I had a larger character limit, I would split it less / in a more organized way

@wolf480pl@mstdn.io @LukeAlmighty@gameliberty.club @coded_artist@gameliberty.club The 500 character limit of Mastodon is very arbitrary. Other fedi software doesn't do this. There's no reason to. It isn't the 1980s. Our storage space isn't limited by goddamn text.

@coded_artist @LukeAlmighty @SuperDicq I understood every word, but I am extremely disappointed that LLVM was used instead of GCC.