The Physics of Memory
I’ve heard that certain algorithms like sorting (or perhaps all algorithms) are improved quite a bit when the accessed data has high memory-locality (e.g. the CPU can load all of the relevant information in it’s cache for processing, instead of hitting RAM). A common way to work with this is using the Entity Component System (ECS) software architecture. However, it can be difficult to realize this for interpreted or OOP languages (Java, Python, Javscript), as they often don’t offer the developer as much control over their object memory location. My question is:
Is it possible to use an ECS-style architecture in Javascript? And can it use that to improve memory locality to get algorithmic performance improvements?
To test this, I created a complex-enough benchmark of… “balls bouncing around in a box”! This follows standard 2d physics engine techniques:
- A “broad-phase” collision step (AKA “do bounding boxes intersect?”)
- A “narrow-phase” collision step (AKA “given these intersecting boxes, resolve collisions!”)
And of course, I got very carried away and ended up creating a ton of benchmark types below. Try it for yourself!
Background - What is ECS?
If you’ve spent any time in game development, you’ve probably heard of Entity Component System (ECS). If OOP is a row-based database (where each object stores every attribute in one memory chunk), ECS is a column-based database, storing each attribute (or collection of attributes) in separate arrays.
You can probably find much better descriptions of ECS elsewhere, but briefly - instead of building complex class hierarchies like a GameEntity that inherits from MovingObject which inherits from PhysicalObject etc, ECS breaks everything down into three distinct concepts:
- Entities: These are nothing more than unique integer IDs. They don’t contain any data or behavior.
- Components: Plain data structures that represent individual facets of state (e.g.
Position,Velocity,Color). They contain no logic. Each entity can have a dynamic set of components, from 0 to all. - Systems: Functions or loops that execute the game logic. They query entities given a specific set of components to then operate on that data.
Benchmark Choices
I went overboard and tested across five dimensions:
- Language: Javascript (testing V8 engine optimizations & TypedArrays) vs. WASM (AssemblyScript for extreme memory control).
- Architecture: Traditional OOP (arrays of object references) vs. ECS (cache-friendly Structure of Arrays).
- Broad-phase: Spatial AVL Tree ( balanced tree) vs. Sweep & Prune (sort on X-axis, sweep for overlaps).
- Sorting Strategy (for S&P): Tested Insertion Sort (the textbook favorite for nearly-sorted data), Hybrid Quicksort, a zero-copy Hybrid Mergesort, and JS Native
Array.sort(). - Motion Coherence: Wandering (smooth drift; array stays nearly sorted), Erratic (random teleporting; extreme cache thrashing), and Static (zero motion; isolates pure memory read speed).
Anyways, let’s see how it works! This is hosted on GitHub if you want to play around with it.
Macbook M4 Results
Running all these configurations locally on my Apple M4 Pro chip (plugged in, 200 frames) creates a lot of numbers. If you want the full dataset of all 6 runs, check out the Appendix: Full Benchmark Dataset below.
To cut through information overload, let’s focus on 3 main hypotheses:
H1: Spatial Trees vs. Sweep & Prune
Hypothesis: An spatial tree is always faster than an Sweep & Prune.
Verdict: Almost always no!
While Big-O complexity dictates that hierarchical trees should need fewer operations, Sweep & Prune operates sequentially on small buffers that fit inside the CPU’s L1 cache. Contiguous memory streaming in flat arrays consistently beats pointer-chasing through tree nodes.
More importantly, when we upgrade Sweep & Prune to use a memory-local Quick Sort (or Merge Sort), it becomes universally faster than spatial trees across all workloads. Interestingly, the WebAssembly version of the spatial tree runs at almost identical speed to pure Javascript. Let’s test this head-to-head at 15,000 wandering bodies:
| Broad-phase System | Architecture | Avg Frame Time | 99th Percentile | Speedup vs OOP Tree |
|---|---|---|---|---|
| WASM ECS S&P (Quick) | Flat 1D Array | 1.809 ms | 2.031 ms | 9.02x |
| Custom ECS S&P (Quick) | Flat 1D Array | 4.668 ms | 6.111 ms | 3.50x |
| ECS Tree | 2D Spatial BVH | 9.970 ms | 43.666 ms | 1.64x |
| OOP Tree | 2D Spatial BVH | 16.321 ms | 40.756 ms | 1.00x |
H2: Sorting Strategies at Scale
Hypothesis: Naive Insertion Sort ( for nearly sorted data) is optimal for real-time physics.
Verdict: Kinda, but Quick Sort is the overall winner and much more consistent.
In everyday 2D simulations (under 5,000 entities), both Insertion Sort and Quick/Merge Sort execute in a tiny fraction of a millisecond. Insertion can be faster here! But Quick Sort or Merge Sort are much more consistent, avoiding worst-case behavior. While Quick Sort and Merge Sort perform virtually identically in static or wandering scenarios, Quick Sort pulls ahead when chaotic motion causes cache misses.
Under extreme gameplay events (explosions, scene transitions, high-velocity collisions), naive Insertion Sort drops into a quadratic jank-fest. At 15,000 erratic bodies:
- Naive Insertion Sort: 36.90 ms (JS ECS) / 132.58 ms (JS OOP)
- JS ECS Tree (Flat BVH): 16.64 ms
- Quick Sort: 5.32 ms (JS ECS)
- Merge Sort: 5.67 ms (JS ECS)
Notice that under erratic conditions, naive Insertion Sort becomes nearly 2.2x slower than spatial trees (36.90 ms vs 16.64 ms). By contrast Quick Sort () maintains rock-solid cache streaming, running ~3.1x faster than spatial trees without any risk of jank.
| Sorting Strategy (JS ECS) | Algorithm | Avg Frame Time | 99th Percentile |
|---|---|---|---|
| Quicksort (Hybrid) | in-place | 5.318 ms | 5.765 ms |
| Merge Sort (Hybrid) | zero-copy | 5.668 ms | 6.334 ms |
| Native V8 Sort | Timsort / Introsort | 6.991 ms | 8.146 ms |
| Insertion Sort | worst-case | 36.896 ms | 41.801 ms |
H3: Javascript vs. WebAssembly
Hypothesis: WebAssembly is required to realize the cache locality benefits of Data-Oriented design.
Verdict: False (though WASM does boost S&P).
Without WASM, simply switching memory layout from objects to the decomposed typed arrays (ECS) in pure Javascript alone yields a 3x to 22x speedup.
While our WebAssembly solver (WASM Quick S&P) is ~2.5x faster than pure Javascript (JS Quick S&P), one major WASM advantage is using the unchecked array access operator, bypassing pointer aliasing runtime safety checks.
| Runtime & Memory Layout | Contender | Avg Frame Time | Speedup vs OOP |
|---|---|---|---|
| WebAssembly (SoA) | WASM ECS S&P (Quick) | 2.141 ms | 3.91x |
| JavaScript (SoA) | Custom ECS S&P (Quick) | 5.318 ms | 1.58x |
| JavaScript (AoS) | OOP S&P (Quick) | 8.380 ms | 1.00x |
🏆 Grand Finale
Question: What seems to be the fastest reasonably architected code that we can come up with?
Verdict: Pitting the top spatial trees (WASM Tree, JS ECS Tree) against the Sweep & Prune champions (WASM Quick S&P, JS Quick S&P) across 15,000 chaotic colliding bodies:
| Contender | Paradigm | Architecture | Runtime (Erratic) | Verdict |
|---|---|---|---|---|
| WASM Quick S&P | ECS | 1D Flat Array | 2.14 ms | 🏆 Winner (~7.9x faster than WASM Tree) |
| JS Quick S&P | ECS | 1D Flat Array | 5.32 ms | Best pure Javascript engine |
| WASM Tree | ECS | 2D Spatial BVH | 16.98 ms | Peak spatial tree performance |
| JS ECS Tree | ECS | 2D Spatial BVH | 16.64 ms | JS spatial tree baseline |
My Takeaway: Even when compiled to WebAssembly, pointer-chasing and unpredictable tree branching cannot compete with the contiguous L1/L2 cache locality of a flat 1D array sort. And ECS wins are realizable even with plain old javascript, not requiring WASM (but it certainly can help).
Am I missing anything?
What am I missing? What other configurations should we add? Did I mess something up?
FAQs
- Why not use Web Workers? Because in a microbenchmark, putting things in Web Workers can actually obscure the real bottlenecks. If we spin up multiple threads, the browser might optimize the main thread in a vacuum, leading to “microbenchmark syndrome.” Keeping it single-threaded makes sure we are directly comparing how the browser’s engine manages memory access under active simulation load.
- Why are the 99th percentiles so close to the averages in ECS? Because ECS memory access patterns are highly predictable. The CPU pre-fetcher loads memory caches before the CPU executes instructions, keeping frame-to-frame timings extremely stable. In OOP, heap structures cause sporadic page misses and garbage collection pauses, leading to high 99th percentile frame spikes.
- Why not C or C++ instead of AssemblyScript? I attempted to use C++ here, but there were two main hurdles that made it slower than AssemblyScript:
- Pointer Aliasing: AssemblyScript allowed me to easily use
unchecked()to disable pointer aliasing protection, which was a bit harder to do with C++ and clang, but definitely not impossible. - Optimized Math Operations: I could not figure out how to get the math library calls faster in C++ to WASM - somehow AssemblyScript links in really fast math calls for sin/cos/pow/etc, and I failed to get that working with C++.
- Pointer Aliasing: AssemblyScript allowed me to easily use
- What is ‘hybrid’ about the Hybrid Merge and Hybrid Quick sorts? Both algorithms optimize sorting performance at small scales. When a subarray recursive branch shrinks to fewer than 12 elements, they switch from recursive partitioning/splitting to a simple Insertion Sort. This eliminates the relatively high call-stack and buffer overhead of divide-and-conquer recursion for tiny arrays, where insertion sort’s near-zero setup overhead makes it much faster.
Appendix
Full Benchmark Dataset
For the statistics nerds and performance purists, here is the full, unadulterated dataset of our run of 200 frames across all 18 registered simulator configurations.
1. Static Behavior (Absolute Coherence - 5,000 Entities)
| System | Avg Frame Time | 99th Percentile | Speedup | |
|---|---|---|---|---|
| WASM ECS S&P (Insertion) | 0.148 ms | 0.178 ms | 4.43x | |
| WASM ECS S&P (Merge) | 0.182 ms | 0.226 ms | 3.60x | |
| WASM ECS S&P (Quick) | 0.202 ms | 0.243 ms | 3.24x | |
| Custom ECS S&P (Insertion) | 0.434 ms | 0.484 ms | 1.51x | |
| Custom ECS S&P (Quick) | 0.445 ms | 0.525 ms | 1.47x | |
| Custom ECS S&P (Merge) | 0.445 ms | 0.507 ms | 1.47x | |
| Custom ECS S&P (Native) | 0.601 ms | 0.807 ms | 1.09x | |
| OOP S&P (Native) | 0.604 ms | 0.821 ms | 1.08x | |
| OOP S&P (Insertion) | 0.654 ms | 0.912 ms | 1.00x | |
| OOP S&P (Quick) | 0.706 ms | 1.067 ms | 0.93x | |
| OOP S&P (Merge) | 0.720 ms | 1.044 ms | 0.91x | |
| WASM Tree | 0.742 ms | 0.859 ms | 0.88x | |
| ECS Tree | 0.887 ms | 1.052 ms | 0.74x | |
| bitECS S&P (Insertion) | 0.888 ms | 0.967 ms | 0.74x | |
| bitECS S&P (Quick) | 0.957 ms | 1.067 ms | 0.68x | |
| bitECS S&P (Merge) | 1.066 ms | 2.337 ms | 0.61x | |
| OOP Tree | 1.104 ms | 1.782 ms | 0.59x | |
| bitECS S&P (Native) | 1.351 ms | 3.841 ms | 0.48x |
2. Wandering Behavior (High Coherence - 5,000 Entities)
| System | Avg Frame Time | 99th Percentile | Speedup | |
|---|---|---|---|---|
| WASM ECS S&P (Insertion) | 0.318 ms | 0.355 ms | 3.54x | |
| WASM ECS S&P (Merge) | 0.343 ms | 0.483 ms | 3.28x | |
| WASM ECS S&P (Quick) | 0.374 ms | 0.472 ms | 3.01x | |
| Custom ECS S&P (Insertion) | 0.845 ms | 0.963 ms | 1.33x | |
| Custom ECS S&P (Merge) | 0.856 ms | 1.090 ms | 1.31x | |
| Custom ECS S&P (Quick) | 0.858 ms | 0.994 ms | 1.31x | |
| Custom ECS S&P (Native) | 1.059 ms | 1.723 ms | 1.06x | |
| OOP S&P (Insertion) | 1.124 ms | 1.676 ms | 1.00x | |
| OOP S&P (Quick) | 1.201 ms | 1.769 ms | 0.94x | |
| OOP S&P (Merge) | 1.325 ms | 1.885 ms | 0.85x | |
| OOP S&P (Native) | 1.379 ms | 2.104 ms | 0.81x | |
| bitECS S&P (Insertion) | 1.588 ms | 2.603 ms | 0.71x | |
| bitECS S&P (Quick) | 1.662 ms | 2.384 ms | 0.68x | |
| bitECS S&P (Merge) | 1.773 ms | 2.556 ms | 0.63x | |
| bitECS S&P (Native) | 1.829 ms | 2.673 ms | 0.61x | |
| WASM Tree | 2.233 ms | 2.614 ms | 0.50x | |
| ECS Tree | 2.353 ms | 2.941 ms | 0.48x | |
| OOP Tree | 2.938 ms | 11.642 ms | 0.38x |
3. Erratic Behavior (Low Coherence - 5,000 Entities)
| System | Avg Frame Time | 99th Percentile | Speedup | |
|---|---|---|---|---|
| WASM ECS S&P (Quick) | 0.462 ms | 0.719 ms | 28.71x | |
| WASM ECS S&P (Merge) | 0.475 ms | 0.740 ms | 27.91x | |
| Custom ECS S&P (Quick) | 1.070 ms | 1.605 ms | 12.39x | |
| Custom ECS S&P (Merge) | 1.125 ms | 1.637 ms | 11.79x | |
| Custom ECS S&P (Native) | 1.487 ms | 2.149 ms | 8.92x | |
| OOP S&P (Quick) | 1.549 ms | 1.884 ms | 8.56x | |
| OOP S&P (Merge) | 1.649 ms | 2.101 ms | 8.04x | |
| bitECS S&P (Quick) | 1.675 ms | 2.174 ms | 7.92x | |
| bitECS S&P (Merge) | 1.743 ms | 2.373 ms | 7.61x | |
| bitECS S&P (Native) | 1.966 ms | 2.413 ms | 6.74x | |
| OOP S&P (Native) | 2.052 ms | 2.765 ms | 6.46x | |
| WASM ECS S&P (Insertion) | 3.433 ms | 4.153 ms | 3.86x | |
| ECS Tree | 4.422 ms | 5.360 ms | 3.00x | |
| WASM Tree | 4.485 ms | 4.973 ms | 2.96x | |
| Custom ECS S&P (Insertion) | 4.511 ms | 6.598 ms | 2.94x | |
| OOP Tree | 5.509 ms | 12.983 ms | 2.41x | |
| bitECS S&P (Insertion) | 11.583 ms | 19.861 ms | 1.14x | |
| OOP S&P (Insertion) | 13.258 ms | 18.408 ms | 1.00x |
4. Static Behavior (Absolute Coherence - 15,000 Entities)
| System | Avg Frame Time | 99th Percentile | Speedup | |
|---|---|---|---|---|
| WASM ECS S&P (Insertion) | 0.959 ms | 1.085 ms | 5.43x | |
| WASM ECS S&P (Merge) | 1.112 ms | 1.349 ms | 4.69x | |
| WASM ECS S&P (Quick) | 1.170 ms | 1.449 ms | 4.45x | |
| Custom ECS S&P (Insertion) | 3.162 ms | 3.653 ms | 1.65x | |
| Custom ECS S&P (Quick) | 3.293 ms | 3.769 ms | 1.58x | |
| Custom ECS S&P (Merge) | 3.395 ms | 3.954 ms | 1.53x | |
| WASM Tree | 3.587 ms | 5.116 ms | 1.45x | |
| Custom ECS S&P (Native) | 3.902 ms | 4.741 ms | 1.34x | |
| ECS Tree | 4.320 ms | 5.095 ms | 1.21x | |
| OOP S&P (Insertion) | 5.210 ms | 6.260 ms | 1.00x | |
| OOP S&P (Native) | 5.423 ms | 7.281 ms | 0.96x | |
| OOP S&P (Quick) | 5.614 ms | 6.560 ms | 0.93x | |
| OOP S&P (Merge) | 5.995 ms | 9.221 ms | 0.87x | |
| OOP Tree | 6.746 ms | 9.319 ms | 0.77x | |
| bitECS S&P (Quick) | 7.456 ms | 10.520 ms | 0.70x | |
| bitECS S&P (Merge) | 7.447 ms | 9.227 ms | 0.70x | |
| bitECS S&P (Native) | 7.712 ms | 9.912 ms | 0.68x | |
| bitECS S&P (Insertion) | 8.448 ms | 14.122 ms | 0.62x |
5. Wandering Behavior (High Coherence - 15,000 Entities)
| System | Avg Frame Time | 99th Percentile | Speedup | |
|---|---|---|---|---|
| WASM ECS S&P (Insertion) | 1.557 ms | 1.859 ms | 4.71x | |
| WASM ECS S&P (Merge) | 1.767 ms | 2.186 ms | 4.15x | |
| WASM ECS S&P (Quick) | 1.809 ms | 2.031 ms | 4.05x | |
| Custom ECS S&P (Insertion) | 4.489 ms | 6.483 ms | 1.63x | |
| Custom ECS S&P (Quick) | 4.668 ms | 6.111 ms | 1.57x | |
| Custom ECS S&P (Merge) | 4.781 ms | 5.670 ms | 1.53x | |
| Custom ECS S&P (Native) | 5.658 ms | 8.731 ms | 1.30x | |
| OOP S&P (Insertion) | 7.335 ms | 9.044 ms | 1.00x | |
| OOP S&P (Merge) | 7.540 ms | 9.290 ms | 0.97x | |
| OOP S&P (Quick) | 7.645 ms | 10.421 ms | 0.96x | |
| OOP S&P (Native) | 8.036 ms | 11.292 ms | 0.91x | |
| bitECS S&P (Quick) | 9.077 ms | 10.931 ms | 0.81x | |
| WASM Tree | 9.083 ms | 10.938 ms | 0.81x | |
| bitECS S&P (Merge) | 9.215 ms | 11.554 ms | 0.80x | |
| bitECS S&P (Native) | 9.674 ms | 11.345 ms | 0.76x | |
| ECS Tree | 9.970 ms | 43.666 ms | 0.74x | |
| bitECS S&P (Insertion) | 10.730 ms | 37.820 ms | 0.68x | |
| OOP Tree | 16.321 ms | 40.756 ms | 0.45x |
6. Erratic Behavior (Low Coherence - 15,000 Entities)
| System | Avg Frame Time | 99th Percentile | Speedup | |
|---|---|---|---|---|
| WASM ECS S&P (Quick) | 2.141 ms | 2.588 ms | 61.92x | |
| WASM ECS S&P (Merge) | 2.226 ms | 2.548 ms | 59.56x | |
| Custom ECS S&P (Quick) | 5.318 ms | 5.765 ms | 24.93x | |
| Custom ECS S&P (Merge) | 5.668 ms | 6.334 ms | 23.39x | |
| Custom ECS S&P (Native) | 6.991 ms | 8.146 ms | 18.97x | |
| OOP S&P (Quick) | 8.380 ms | 9.820 ms | 15.82x | |
| OOP S&P (Merge) | 8.826 ms | 12.094 ms | 15.02x | |
| bitECS S&P (Quick) | 9.692 ms | 11.258 ms | 13.68x | |
| bitECS S&P (Merge) | 10.037 ms | 12.722 ms | 13.21x | |
| OOP S&P (Native) | 10.645 ms | 14.415 ms | 12.45x | |
| bitECS S&P (Native) | 10.877 ms | 12.864 ms | 12.19x | |
| ECS Tree | 16.639 ms | 19.107 ms | 7.97x | |
| WASM Tree | 16.976 ms | 19.827 ms | 7.81x | |
| OOP Tree | 29.261 ms | 74.175 ms | 4.53x | |
| WASM ECS S&P (Insertion) | 32.518 ms | 35.499 ms | 4.08x | |
| Custom ECS S&P (Insertion) | 36.896 ms | 41.801 ms | 3.59x | |
| bitECS S&P (Insertion) | 101.341 ms | 122.182 ms | 1.31x | |
| OOP S&P (Insertion) | 132.579 ms | 140.137 ms | 1.00x |
Published: Wed Jun 24 2026 00:00:00 GMT+0000 (Coordinated Universal Time)