Enabling Spark SQL MERGE via optimized ACID Data Source v0.6.0 | Qubole

MERGE Command:

SQL Syntax

MERGE INTO <target table> [AS T]
USING <source table> [AS S]
ON <boolean merge expression>
WHEN MATCHED [AND <boolean expression1>] THEN <match_clause>
WHEN MATCHED [AND <boolean expression2>] THEN <match_clause>
WHEN NOT MATCHED [AND <boolean expression3>] THEN INSERT VALUES ( <insert value list> )

Possible match clauses can be
<match_clause> ::
    UPDATE SET <update set list>
    DELETE

Possible insert value list can be
<insert value list> ::
    value 1 [, value 2, value 3, ...]
    [value 1, value 2, ...] * [, value n, value n+1, ...]

Possible update set list can be
<update set list> ::
    target_col1 = value 1 [, target_col2 = value 2 ...]

Illustration

MERGE INTO driver AS t
USING source AS s
ON t.id = s.id
WHEN MATCHED AND t.city = 'closed' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.city = s.city, t.ratings = s.ratings
WHEN NOT MATCHED THEN INSERT VALUES (*)
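To make the clause ordering in the illustration concrete, here is a minimal pure-Python sketch of what the statement does to the rows. It models the semantics only and is not how Spark ACID executes a MERGE. The function name `merge_driver`, the sample rows, and the assumptions that `id` is unique in both tables and that `source` has the same columns as `driver` (so that `INSERT VALUES (*)` copies the row as-is) are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_driver(target: List[Row], source: List[Row]) -> List[Row]:
    """Model the illustrated MERGE on in-memory rows (assumes unique ids)."""
    source_by_id = {s["id"]: s for s in source}
    target_ids = {t["id"] for t in target}
    result: List[Row] = []

    for t in target:
        s = source_by_id.get(t["id"])
        if s is None:
            # No source row joins on id: the target row is left untouched.
            result.append(dict(t))
        elif t["city"] == "closed":
            # WHEN MATCHED AND t.city = 'closed' THEN DELETE
            # (checked first, so it wins over the unconditional UPDATE).
            continue
        else:
            # WHEN MATCHED THEN UPDATE SET city, ratings from the source row.
            result.append({**t, "city": s["city"], "ratings": s["ratings"]})

    for s in source:
        if s["id"] not in target_ids:
            # WHEN NOT MATCHED THEN INSERT VALUES (*)
            result.append(dict(s))

    return result


# Hypothetical sample data, not taken from the article.
driver = [
    {"id": 1, "city": "closed", "ratings": 3},
    {"id": 2, "city": "Pune", "ratings": 4},
    {"id": 3, "city": "Delhi", "ratings": 5},
]
source = [
    {"id": 1, "city": "Mumbai", "ratings": 2},
    {"id": 2, "city": "Bangalore", "ratings": 5},
    {"id": 4, "city": "Chennai", "ratings": 1},
]

merged = merge_driver(driver, source)
```

In this sketch, row 1 is deleted because the first `WHEN MATCHED` clause applies, row 2 is updated from the source, row 3 has no match and stays unchanged, and row 4 is inserted.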

Optimized Write Performance:

Optimized Split Computation:

Release Notes and Acknowledgement:

We needed a tool that allowed us to stream data into a mutable Hive table. Spark Acid was able to help us achieve this goal. Amogh and the team at Qubole went out of their way to help us complete our project successfully. It really was above and beyond what was necessary. We wish everyone involved the very best with the release of version 0.6.0.
