Optimising memory for Aggregate and Join operators in Apache Impala

Hash Table

How do we remove these booleans when they need to be present for every Bucket or DuplicateNode?

tl;dr: We decided to remove all bool members by folding them into a pointer that is already part of the struct.

Folding data into pointers

Figure 1. 64-bit memory address under 5-level paging
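
To make the idea concrete, here is a minimal C++ sketch of folding boolean flags into the unused upper bits of a pointer. It is an illustration under stated assumptions, not Impala's actual class: the name TaggedPointer, its methods, and the flag names in BucketFlag are all hypothetical, and the sketch assumes a 64-bit platform where addresses fit in the low 57 bits (as with 5-level paging), leaving the top 7 bits free.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: pointers are 64 bits wide and only the low 57 bits carry
// address information, so the top 7 bits can hold flags.
static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes 64-bit pointers");

// Hypothetical helper (not Impala's API): stores a T* and up to 7 boolean
// flags in a single 64-bit word.
template <typename T>
class TaggedPointer {
 public:
  static constexpr int kTagBits = 7;
  static constexpr int kPtrBits = 64 - kTagBits;  // 57 address bits
  static constexpr std::uintptr_t kPtrMask =
      (std::uintptr_t{1} << kPtrBits) - 1;

  TaggedPointer() = default;
  explicit TaggedPointer(T* ptr) { SetPtr(ptr); }

  // Replaces the address while preserving the flag bits.
  void SetPtr(T* ptr) {
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(ptr);
    // The address must not already use the bits reserved for flags.
    assert((raw & ~kPtrMask) == 0);
    data_ = (data_ & ~kPtrMask) | raw;
  }

  // Masks the flags off before handing the pointer back; the stored word
  // must never be dereferenced directly.
  T* GetPtr() const { return reinterpret_cast<T*>(data_ & kPtrMask); }

  void SetTagBit(int bit) {
    assert(bit >= 0 && bit < kTagBits);
    data_ |= std::uintptr_t{1} << (kPtrBits + bit);
  }

  void UnsetTagBit(int bit) {
    assert(bit >= 0 && bit < kTagBits);
    data_ &= ~(std::uintptr_t{1} << (kPtrBits + bit));
  }

  bool IsTagBitSet(int bit) const {
    assert(bit >= 0 && bit < kTagBits);
    return ((data_ >> (kPtrBits + bit)) & 1) != 0;
  }

 private:
  std::uintptr_t data_ = 0;
};

// Hypothetical flag positions for a hash table bucket.
enum BucketFlag { kFilled = 0, kMatched = 1, kHasDuplicates = 2 };

// The flags cost no extra space: the wrapper is the size of a raw pointer.
static_assert(sizeof(TaggedPointer<int>) == sizeof(int*),
              "flags are folded into the pointer word");
```

A caller would construct the wrapper from an existing pointer, set or clear flags with SetTagBit and UnsetTagBit, and always go through GetPtr before dereferencing. The trade-off is one extra mask on every pointer read in exchange for dropping the separate bool members.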

Other requirements

Experimental evaluation

Microbenchmark

Figure 2a. Memory Benchmark
Figure 2b. Runtime Benchmark

Billion-Row benchmark

Figure 3a. Memory usage in GroupBy query over 1 billion rows
Figure 3b. Probe times with 1 billion rows

TPC-DS 10000 scale

Figure 4a. TPC-DS queries with memory reduction at operator level
Figure 4b. Average peak memory reduction across all nodes
Figure 4c. Reduction in max peak memory across all nodes

Conclusion
