Performance is rarely magic; it is usually cache lines and branch prediction.
Cache Is the First Truth
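Most latency comes from memory-access patterns, not from big-O complexity: two loops with the same asymptotic cost can run at very different speeds depending on how they walk memory.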
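As a rough illustration, the sketch below sums the same row-major buffer twice: once in storage order and once with a stride of one full row. Both loops are O(rows * cols) and return the same total; only the access pattern differs. The names (make_grid, sum_row_major, sum_column_major) and the grid size are assumptions made for this example, and in CPython the interpreter overhead hides much of the gap that a compiled language would show.

```python
import time
from array import array

ROWS, COLS = 1024, 1024  # hypothetical sizes, chosen only for illustration


def make_grid(rows, cols):
    """Return a row-major grid of doubles stored in one contiguous buffer."""
    return array("d", (float(i % 7) for i in range(rows * cols)))


def sum_row_major(grid, rows, cols):
    """Visit elements in storage order, so consecutive reads share cache lines."""
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += grid[r * cols + c]
    return total


def sum_column_major(grid, rows, cols):
    """Visit the same elements with a stride of `cols` doubles between reads."""
    total = 0.0
    for c in range(cols):
        for r in range(rows):
            total += grid[r * cols + c]
    return total


if __name__ == "__main__":
    grid = make_grid(ROWS, COLS)
    for fn in (sum_row_major, sum_column_major):
        start = time.perf_counter()
        result = fn(grid, ROWS, COLS)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: sum={result:.0f} in {elapsed:.3f}s")
```

Both functions do identical arithmetic, so any timing difference on a given machine comes from the order of the reads. Treat the printed numbers as a local measurement, not a benchmark.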
Concurrency Follows Data Shape
Adding threads amplifies lock contention unless data locality is fixed first: give each thread its own slice of the data before scaling out.
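A minimal sketch of that idea, under stated assumptions: count_shared takes one lock for every element, so each added thread just queues on the same lock, while count_local lets every thread sum its own contiguous chunk and takes the lock once to merge. The function names and the chunking scheme are hypothetical, chosen for this example. In CPython the global interpreter lock prevents a real parallel speedup, so read this as the shape of the fix rather than a benchmark.

```python
import threading


def _run(worker, data, num_threads):
    """Split data into contiguous chunks and run one worker thread per chunk."""
    size = (len(data) + num_threads - 1) // num_threads  # ceiling division
    threads = [
        threading.Thread(target=worker, args=(data[i * size:(i + 1) * size],))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def count_shared(data, num_threads):
    """Every addition takes the same lock, so threads serialize on it."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for x in chunk:
            with lock:  # contended once per element
                total += x

    _run(worker, data, num_threads)
    return total


def count_local(data, num_threads):
    """Each thread sums its own chunk privately, then merges once."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        local = 0
        for x in chunk:
            local += x  # thread-private, no lock needed
        with lock:  # contended once per thread
            total += local

    _run(worker, data, num_threads)
    return total


if __name__ == "__main__":
    data = list(range(100_000))
    print(count_shared(data, 4) == count_local(data, 4))
```

The second version acquires the lock num_threads times instead of len(data) times, which is why fixing the data layout first makes additional threads worthwhile.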