The Physics of Memory

I’ve heard that certain algorithms like sorting (or perhaps all algorithms) are improved quite a bit when the accessed data has high memory-locality (e.g. the CPU can load all of the relevant information in it’s cache for processing, instead of hitting RAM). A common way to work with this is using the Entity Component System (ECS) software architecture. However, it can be difficult to realize this for interpreted or OOP languages (Java, Python, Javscript), as they often don’t offer the developer as much control over their object memory location. My question is:

Is it possible to use an ECS-style architecture in Javascript? And can it use that to improve memory locality to get algorithmic performance improvements?

To test this, I created a complex-enough benchmark of… “balls bouncing around in a box”! This follows standard 2d physics engine techniques:

A “broad-phase” collision step (AKA “do bounding boxes intersect?”)
A “narrow-phase” collision step (AKA “given these intersecting boxes, resolve collisions!”)

And of course, I got very carried away and ended up creating a ton of benchmark types below. Try it for yourself!

Background - What is ECS?

If you’ve spent any time in game development, you’ve probably heard of Entity Component System (ECS). If OOP is a row-based database (where each object stores every attribute in one memory chunk), ECS is a column-based database, storing each attribute (or collection of attributes) in separate arrays.

You can probably find much better descriptions of ECS elsewhere, but briefly - instead of building complex class hierarchies like a GameEntity that inherits from MovingObject which inherits from PhysicalObject etc, ECS breaks everything down into three distinct concepts:

Entities: These are nothing more than unique integer IDs. They don’t contain any data or behavior.
Components: Plain data structures that represent individual facets of state (e.g. Position, Velocity, Color). They contain no logic. Each entity can have a dynamic set of components, from 0 to all.
Systems: Functions or loops that execute the game logic. They query entities given a specific set of components to then operate on that data.

Benchmark Choices

I went overboard and tested across five dimensions:

Language: Javascript (testing V8 engine optimizations & TypedArrays) vs. WASM (AssemblyScript for extreme memory control).
Architecture: Traditional OOP (arrays of object references) vs. ECS (cache-friendly Structure of Arrays).
Broad-phase: Spatial AVL Tree ( $O(N \log N)$ balanced tree) vs. Sweep & Prune (sort on X-axis, sweep for overlaps).
Sorting Strategy (for S&P): Tested Insertion Sort (the textbook favorite for nearly-sorted data), Hybrid Quicksort, a zero-copy Hybrid Mergesort, and JS Native Array.sort().
Motion Coherence: Wandering (smooth drift; array stays nearly sorted), Erratic (random teleporting; extreme cache thrashing), and Static (zero motion; isolates pure memory read speed).

Anyways, let’s see how it works! This is hosted on GitHub if you want to play around with it.

Macbook M4 Results

Running all these configurations locally on my Apple M4 Pro chip (plugged in, 200 frames) creates a lot of numbers. If you want the full dataset of all 6 runs, check out the Appendix: Full Benchmark Dataset below.

To cut through information overload, let’s focus on 3 main hypotheses:

H1: Spatial Trees vs. Sweep & Prune

Hypothesis: An $O(N \log N)$ spatial tree is always faster than an $O(N^2)$ Sweep & Prune.
Verdict: Almost always no!

While Big-O complexity dictates that hierarchical trees should need fewer operations, Sweep & Prune operates sequentially on small buffers that fit inside the CPU’s L1 cache. Contiguous memory streaming in flat arrays consistently beats pointer-chasing through tree nodes.

More importantly, when we upgrade Sweep & Prune to use a memory-local Quick Sort (or Merge Sort), it becomes universally faster than spatial trees across all workloads. Interestingly, the WebAssembly version of the spatial tree runs at almost identical speed to pure Javascript. Let’s test this head-to-head at 15,000 wandering bodies:

Broad-phase System	Architecture	Avg Frame Time	99th Percentile	Speedup vs OOP Tree
WASM ECS S&P (Quick)	Flat 1D Array	1.809 ms	2.031 ms	9.02x
Custom ECS S&P (Quick)	Flat 1D Array	4.668 ms	6.111 ms	3.50x
ECS Tree	2D Spatial BVH	9.970 ms	43.666 ms	1.64x
OOP Tree	2D Spatial BVH	16.321 ms	40.756 ms	1.00x

H2: Sorting Strategies at Scale

Hypothesis: Naive Insertion Sort ( $O(N)$ for nearly sorted data) is optimal for real-time physics.
Verdict: Kinda, but Quick Sort is the overall winner and much more consistent.

In everyday 2D simulations (under 5,000 entities), both Insertion Sort and Quick/Merge Sort execute in a tiny fraction of a millisecond. Insertion can be faster here! But Quick Sort or Merge Sort are much more consistent, avoiding worst-case behavior. While Quick Sort and Merge Sort perform virtually identically in static or wandering scenarios, Quick Sort pulls ahead when chaotic motion causes cache misses.

Under extreme gameplay events (explosions, scene transitions, high-velocity collisions), naive Insertion Sort drops into a quadratic $O(N^2)$ jank-fest. At 15,000 erratic bodies:

Naive Insertion Sort: 36.90 ms (JS ECS) / 132.58 ms (JS OOP)
JS ECS Tree (Flat BVH): 16.64 ms
Quick Sort: 5.32 ms (JS ECS)
Merge Sort: 5.67 ms (JS ECS)

Notice that under erratic conditions, naive Insertion Sort becomes nearly 2.2x slower than spatial trees (36.90 ms vs 16.64 ms). By contrast Quick Sort ( $O(N \log N)$ ) maintains rock-solid cache streaming, running ~3.1x faster than spatial trees without any risk of jank.

Sorting Strategy (JS ECS)	Algorithm	Avg Frame Time	99th Percentile
Quicksort (Hybrid)	$O(N \log N)$ in-place	5.318 ms	5.765 ms
Merge Sort (Hybrid)	$O(N \log N)$ zero-copy	5.668 ms	6.334 ms
Native V8 Sort	Timsort / Introsort	6.991 ms	8.146 ms
Insertion Sort	$O(N^2)$ worst-case	36.896 ms	41.801 ms

H3: Javascript vs. WebAssembly

Hypothesis: WebAssembly is required to realize the cache locality benefits of Data-Oriented design.
Verdict: False (though WASM does boost S&P).

Without WASM, simply switching memory layout from objects to the decomposed typed arrays (ECS) in pure Javascript alone yields a 3x to 22x speedup.

While our WebAssembly solver (WASM Quick S&P) is ~2.5x faster than pure Javascript (JS Quick S&P), one major WASM advantage is using the unchecked array access operator, bypassing pointer aliasing runtime safety checks.

Runtime & Memory Layout	Contender	Avg Frame Time	Speedup vs OOP
WebAssembly (SoA)	WASM ECS S&P (Quick)	2.141 ms	3.91x
JavaScript (SoA)	Custom ECS S&P (Quick)	5.318 ms	1.58x
JavaScript (AoS)	OOP S&P (Quick)	8.380 ms	1.00x

🏆 Grand Finale

Question: What seems to be the fastest reasonably architected code that we can come up with? Verdict: Pitting the top spatial trees (WASM Tree, JS ECS Tree) against the Sweep & Prune champions (WASM Quick S&P, JS Quick S&P) across 15,000 chaotic colliding bodies:

Contender	Paradigm	Architecture	Runtime (Erratic)	Verdict
WASM Quick S&P	ECS	1D Flat Array	2.14 ms	🏆 Winner (~7.9x faster than WASM Tree)
JS Quick S&P	ECS	1D Flat Array	5.32 ms	Best pure Javascript engine
WASM Tree	ECS	2D Spatial BVH	16.98 ms	Peak spatial tree performance
JS ECS Tree	ECS	2D Spatial BVH	16.64 ms	JS spatial tree baseline

My Takeaway: Even when compiled to WebAssembly, pointer-chasing and unpredictable tree branching cannot compete with the contiguous L1/L2 cache locality of a flat 1D array sort. And ECS wins are realizable even with plain old javascript, not requiring WASM (but it certainly can help).

Am I missing anything?

What am I missing? What other configurations should we add? Did I mess something up?

FAQs

Why not use Web Workers? Because in a microbenchmark, putting things in Web Workers can actually obscure the real bottlenecks. If we spin up multiple threads, the browser might optimize the main thread in a vacuum, leading to “microbenchmark syndrome.” Keeping it single-threaded makes sure we are directly comparing how the browser’s engine manages memory access under active simulation load.
Why are the 99th percentiles so close to the averages in ECS? Because ECS memory access patterns are highly predictable. The CPU pre-fetcher loads memory caches before the CPU executes instructions, keeping frame-to-frame timings extremely stable. In OOP, heap structures cause sporadic page misses and garbage collection pauses, leading to high 99th percentile frame spikes.
Why not C or C++ instead of AssemblyScript? I attempted to use C++ here, but there were two main hurdles that made it slower than AssemblyScript:
- Pointer Aliasing: AssemblyScript allowed me to easily use unchecked() to disable pointer aliasing protection, which was a bit harder to do with C++ and clang, but definitely not impossible.
- Optimized Math Operations: I could not figure out how to get the math library calls faster in C++ to WASM - somehow AssemblyScript links in really fast math calls for sin/cos/pow/etc, and I failed to get that working with C++.
What is ‘hybrid’ about the Hybrid Merge and Hybrid Quick sorts? Both algorithms optimize sorting performance at small scales. When a subarray recursive branch shrinks to fewer than 12 elements, they switch from recursive partitioning/splitting to a simple Insertion Sort. This eliminates the relatively high call-stack and buffer overhead of divide-and-conquer recursion for tiny arrays, where insertion sort’s near-zero setup overhead makes it much faster.

Appendix

Full Benchmark Dataset

For the statistics nerds and performance purists, here is the full, unadulterated dataset of our run of 200 frames across all 18 registered simulator configurations.

1. Static Behavior (Absolute Coherence - 5,000 Entities)

System	Avg Frame Time	99th Percentile	Speedup
WASM ECS S&P (Insertion)	0.148 ms	0.178 ms	4.43x
WASM ECS S&P (Merge)	0.182 ms	0.226 ms	3.60x
WASM ECS S&P (Quick)	0.202 ms	0.243 ms	3.24x
Custom ECS S&P (Insertion)	0.434 ms	0.484 ms	1.51x
Custom ECS S&P (Quick)	0.445 ms	0.525 ms	1.47x
Custom ECS S&P (Merge)	0.445 ms	0.507 ms	1.47x
Custom ECS S&P (Native)	0.601 ms	0.807 ms	1.09x
OOP S&P (Native)	0.604 ms	0.821 ms	1.08x
OOP S&P (Insertion)	0.654 ms	0.912 ms	1.00x
OOP S&P (Quick)	0.706 ms	1.067 ms	0.93x
OOP S&P (Merge)	0.720 ms	1.044 ms	0.91x
WASM Tree	0.742 ms	0.859 ms	0.88x
ECS Tree	0.887 ms	1.052 ms	0.74x
bitECS S&P (Insertion)	0.888 ms	0.967 ms	0.74x
bitECS S&P (Quick)	0.957 ms	1.067 ms	0.68x
bitECS S&P (Merge)	1.066 ms	2.337 ms	0.61x
OOP Tree	1.104 ms	1.782 ms	0.59x
bitECS S&P (Native)	1.351 ms	3.841 ms	0.48x

2. Wandering Behavior (High Coherence - 5,000 Entities)

System	Avg Frame Time	99th Percentile	Speedup
WASM ECS S&P (Insertion)	0.318 ms	0.355 ms	3.54x
WASM ECS S&P (Merge)	0.343 ms	0.483 ms	3.28x
WASM ECS S&P (Quick)	0.374 ms	0.472 ms	3.01x
Custom ECS S&P (Insertion)	0.845 ms	0.963 ms	1.33x
Custom ECS S&P (Merge)	0.856 ms	1.090 ms	1.31x
Custom ECS S&P (Quick)	0.858 ms	0.994 ms	1.31x
Custom ECS S&P (Native)	1.059 ms	1.723 ms	1.06x
OOP S&P (Insertion)	1.124 ms	1.676 ms	1.00x
OOP S&P (Quick)	1.201 ms	1.769 ms	0.94x
OOP S&P (Merge)	1.325 ms	1.885 ms	0.85x
OOP S&P (Native)	1.379 ms	2.104 ms	0.81x
bitECS S&P (Insertion)	1.588 ms	2.603 ms	0.71x
bitECS S&P (Quick)	1.662 ms	2.384 ms	0.68x
bitECS S&P (Merge)	1.773 ms	2.556 ms	0.63x
bitECS S&P (Native)	1.829 ms	2.673 ms	0.61x
WASM Tree	2.233 ms	2.614 ms	0.50x
ECS Tree	2.353 ms	2.941 ms	0.48x
OOP Tree	2.938 ms	11.642 ms	0.38x

3. Erratic Behavior (Low Coherence - 5,000 Entities)

System	Avg Frame Time	99th Percentile	Speedup
WASM ECS S&P (Quick)	0.462 ms	0.719 ms	28.71x
WASM ECS S&P (Merge)	0.475 ms	0.740 ms	27.91x
Custom ECS S&P (Quick)	1.070 ms	1.605 ms	12.39x
Custom ECS S&P (Merge)	1.125 ms	1.637 ms	11.79x
Custom ECS S&P (Native)	1.487 ms	2.149 ms	8.92x
OOP S&P (Quick)	1.549 ms	1.884 ms	8.56x
OOP S&P (Merge)	1.649 ms	2.101 ms	8.04x
bitECS S&P (Quick)	1.675 ms	2.174 ms	7.92x
bitECS S&P (Merge)	1.743 ms	2.373 ms	7.61x
bitECS S&P (Native)	1.966 ms	2.413 ms	6.74x
OOP S&P (Native)	2.052 ms	2.765 ms	6.46x
WASM ECS S&P (Insertion)	3.433 ms	4.153 ms	3.86x
ECS Tree	4.422 ms	5.360 ms	3.00x
WASM Tree	4.485 ms	4.973 ms	2.96x
Custom ECS S&P (Insertion)	4.511 ms	6.598 ms	2.94x
OOP Tree	5.509 ms	12.983 ms	2.41x
bitECS S&P (Insertion)	11.583 ms	19.861 ms	1.14x
OOP S&P (Insertion)	13.258 ms	18.408 ms	1.00x

4. Static Behavior (Absolute Coherence - 15,000 Entities)

System	Avg Frame Time	99th Percentile	Speedup
WASM ECS S&P (Insertion)	0.959 ms	1.085 ms	5.43x
WASM ECS S&P (Merge)	1.112 ms	1.349 ms	4.69x
WASM ECS S&P (Quick)	1.170 ms	1.449 ms	4.45x
Custom ECS S&P (Insertion)	3.162 ms	3.653 ms	1.65x
Custom ECS S&P (Quick)	3.293 ms	3.769 ms	1.58x
Custom ECS S&P (Merge)	3.395 ms	3.954 ms	1.53x
WASM Tree	3.587 ms	5.116 ms	1.45x
Custom ECS S&P (Native)	3.902 ms	4.741 ms	1.34x
ECS Tree	4.320 ms	5.095 ms	1.21x
OOP S&P (Insertion)	5.210 ms	6.260 ms	1.00x
OOP S&P (Native)	5.423 ms	7.281 ms	0.96x
OOP S&P (Quick)	5.614 ms	6.560 ms	0.93x
OOP S&P (Merge)	5.995 ms	9.221 ms	0.87x
OOP Tree	6.746 ms	9.319 ms	0.77x
bitECS S&P (Quick)	7.456 ms	10.520 ms	0.70x
bitECS S&P (Merge)	7.447 ms	9.227 ms	0.70x
bitECS S&P (Native)	7.712 ms	9.912 ms	0.68x
bitECS S&P (Insertion)	8.448 ms	14.122 ms	0.62x

5. Wandering Behavior (High Coherence - 15,000 Entities)

System	Avg Frame Time	99th Percentile	Speedup
WASM ECS S&P (Insertion)	1.557 ms	1.859 ms	4.71x
WASM ECS S&P (Merge)	1.767 ms	2.186 ms	4.15x
WASM ECS S&P (Quick)	1.809 ms	2.031 ms	4.05x
Custom ECS S&P (Insertion)	4.489 ms	6.483 ms	1.63x
Custom ECS S&P (Quick)	4.668 ms	6.111 ms	1.57x
Custom ECS S&P (Merge)	4.781 ms	5.670 ms	1.53x
Custom ECS S&P (Native)	5.658 ms	8.731 ms	1.30x
OOP S&P (Insertion)	7.335 ms	9.044 ms	1.00x
OOP S&P (Merge)	7.540 ms	9.290 ms	0.97x
OOP S&P (Quick)	7.645 ms	10.421 ms	0.96x
OOP S&P (Native)	8.036 ms	11.292 ms	0.91x
bitECS S&P (Quick)	9.077 ms	10.931 ms	0.81x
WASM Tree	9.083 ms	10.938 ms	0.81x
bitECS S&P (Merge)	9.215 ms	11.554 ms	0.80x
bitECS S&P (Native)	9.674 ms	11.345 ms	0.76x
ECS Tree	9.970 ms	43.666 ms	0.74x
bitECS S&P (Insertion)	10.730 ms	37.820 ms	0.68x
OOP Tree	16.321 ms	40.756 ms	0.45x

6. Erratic Behavior (Low Coherence - 15,000 Entities)

System	Avg Frame Time	99th Percentile	Speedup
WASM ECS S&P (Quick)	2.141 ms	2.588 ms	61.92x
WASM ECS S&P (Merge)	2.226 ms	2.548 ms	59.56x
Custom ECS S&P (Quick)	5.318 ms	5.765 ms	24.93x
Custom ECS S&P (Merge)	5.668 ms	6.334 ms	23.39x
Custom ECS S&P (Native)	6.991 ms	8.146 ms	18.97x
OOP S&P (Quick)	8.380 ms	9.820 ms	15.82x
OOP S&P (Merge)	8.826 ms	12.094 ms	15.02x
bitECS S&P (Quick)	9.692 ms	11.258 ms	13.68x
bitECS S&P (Merge)	10.037 ms	12.722 ms	13.21x
OOP S&P (Native)	10.645 ms	14.415 ms	12.45x
bitECS S&P (Native)	10.877 ms	12.864 ms	12.19x
ECS Tree	16.639 ms	19.107 ms	7.97x
WASM Tree	16.976 ms	19.827 ms	7.81x
OOP Tree	29.261 ms	74.175 ms	4.53x
WASM ECS S&P (Insertion)	32.518 ms	35.499 ms	4.08x
Custom ECS S&P (Insertion)	36.896 ms	41.801 ms	3.59x
bitECS S&P (Insertion)	101.341 ms	122.182 ms	1.31x
OOP S&P (Insertion)	132.579 ms	140.137 ms	1.00x

Published: Wed Jun 24 2026 00:00:00 GMT+0000 (Coordinated Universal Time)