Optimising memory for Aggregates and Join operators in Apache Impala.

Hash Table

  1. Memory address for individual members starts at memory address divisible by its size i.e., pointer will start at memory divisible by 8, bool at 1 and uint32_t by 4. Members will be preceded by padding if needed to make sure the starting address is divisible by their size.
  2. Size of struct will be aligned to the it’s largest member. For instance, in both the struct above the largest member is a pointer of size 8 bytes. Hence, the size of struct will be multiple of 8 too.

Folding data into pointers

Figure 1. Level 5 64-bit memory address

Other requirements

  1. Internal memory allocator allocates memory in power of 2 to avoid internal fragmentation. Hence, number of buckets * sizeof(Bucket) should be power of 2.
  2. Number of buckets (‘N’) being the power of 2 enables faster modulo operation.

Experimental evaluation:

  1. Microbenchmark: We ran the build and probe methods multiple times on a smaller number of rows around 60 times to evaluate the performance and memory consumed.
  2. Billion-Row benchmark: On a single daemon, we ran build and probe benchmark for a billion rows to measure the performance and memory consumed.
  3. TPC-DS-10000: Entire TPC-DS benchmark of scale 10000 was run on a 17-node cluster to measure the performance. It also measured peak memory consumed at node and operator level.


Figure 2a. Memory Benchmark
Figure 2b. Runtime Benchmark

Billion-Row benchmark

Build Benchmark

Figure 3a. Memory usage in GroupBy query over 1 billion rows

Probe Benchmark

  1. sales_base: It has randomly generated 1 billion rows same as what was used in Build Benchmarks
  2. sales_30: It has 1 billion rows with 30% of the rows unique.
  3. sales_60: It has 1 Billion rows with 60% of the rows unique.
Figure 3b. Probe times with 1 Billion rows

TPCDS-10000 scale

Figure 4a. TPC-DS queries with memory reduction at operator level

Per-Operator Reduction:

Per-Node memory reduction:

Figure 4b. Average peak memory reduction across all nodes
Figure 4c. Reduction in max peak memory across all nodes





Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Changing Laravel’s App Directory

The history of the Pixels Camp hackathon voting system

8 Steps To Deploy AFlask App on Heroku

Install, Upgrade, and Downgrade Python 2.7 Packages on Mac and Linux

Implementing JWT Authentication to your API Platform application

HorseAnalytics. Dressage. iBeacons Approach. Part 2

Wordle # 300 today eliminates more stripes in an ironic way

Which Is The Best Free API To Know Platinum Rates?

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Amogh Margoor

Amogh Margoor

More from Medium

Import/Export Using Apache Sqoop

Somewhat complex reporting and data analysis in MySql 8 — Part 1


Can SQL Be a Library Language?