[PECO-1260] Support results compression #216
Conversation
Signed-off-by: Levko Kravets <levko.ne@gmail.com>
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##              main     #216      +/-   ##
==========================================
+ Coverage    90.55%   90.59%   +0.03%
==========================================
  Files           62       62
  Lines         1429     1435       +6
  Branches       241      245       +4
==========================================
+ Hits          1294     1300       +6
  Misses          84       84
  Partials        51       51

View full report in Codecov by Sentry.
Looks good!
Overall looks good. A few comments about unifying with pysql and golang, plus a thought about making lz4 on/off part of a test matrix so we have more complete test coverage.
lib/DBSQLClient.ts
Outdated
@@ -84,6 +84,8 @@ export default class DBSQLClient extends EventEmitter implements IDBSQLClient, I
   useCloudFetch: false,
   cloudFetchConcurrentDownloads: 10,
+  useResultsCompression: true,
NP, will rename 👍
this.context = context;
this.source = source;
this.arrowSchema = arrowSchema;
this.isLZ4Compressed = isLZ4Compressed ?? false;
Here's a difference between pysql and golang: pysql enables lz4 compression by default, golang does not. Is there a compelling reason not to enable this by default in nodejs?
No good reason. I'll change it to be enabled by default
Actually, I just realized that it already defaults to true: the default value is set in DBSQLClient.getDefaultConfig. This ?? false handles the case when the server responds with metadata.lz4Compressed = undefined (which means the actual result is not compressed).
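A minimal sketch of that fallback behavior (the helper name is hypothetical; only the ?? false expression comes from the PR itself):

```javascript
// Hypothetical helper illustrating the `?? false` fallback: when the server
// omits metadata.lz4Compressed (undefined), the result is treated as uncompressed.
function resolveCompression(lz4Compressed) {
  return lz4Compressed ?? false;
}

console.log(resolveCompression(true));      // true: server says compressed
console.log(resolveCompression(false));     // false: server says uncompressed
console.log(resolveCompression(undefined)); // false: flag omitted, treat as uncompressed
```

Note that ??, unlike ||, only falls back for null/undefined, so an explicit false from the server is preserved as-is.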
async (session) => {
  const operation = await session.executeStatement(`SELECT * FROM ${tableName}`);
  const result = await operation.fetchAll();
  expect(fixArrowResult(result)).to.deep.equal(expectedArrow);
Just confirming: this is the assertion that verifies the decompressed output generated by fixArrowResult(result) is equivalent to expectedArrow?
Yes, exactly. fixArrowResult does very minor processing of the result to fix floating point fields.
tests/e2e/cloudfetch.test.js
Outdated
- const session = await openSession({ cloudFetchConcurrentDownloads });
+ const session = await openSession({
+   cloudFetchConcurrentDownloads,
+   useResultsCompression: false,
Do we have a way to run the other e2e tests with lz4 both true and false, like a test matrix?
There is a test for useResultsCompression: true. I agree that it's a good idea to have some sort of test matrix, and even to use it more widely in tests. But I think I won't do it right now. I have some test improvement tasks in my backlog and will add this one as well.
@@ -86,4 +89,38 @@ describe('CloudFetch', () => {
     expect(fetchedRowCount).to.be.equal(queriedRowsCount);
   });

+  it('should handle LZ4 compressed data', async () => {
imo it would be useful to make lz4=true|false part of a matrix for this test so that the setup/teardown is not duplicated between tests. Less copy and paste, and a more trustworthy test result.
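A rough sketch of the matrix idea. Mocha's it and the e2e openSession helper are stubbed out with a generic registerTest callback so the snippet is self-contained; only the useResultsCompression option name comes from the PR:

```javascript
// Sketch: generate one test case per compression setting instead of
// copy-pasting setup/teardown. `registerTest` stands in for Mocha's `it`;
// a real suite would open a session with the given options inside each case.
function registerCloudFetchTests(registerTest) {
  for (const useResultsCompression of [true, false]) {
    registerTest(
      `should fetch all rows (useResultsCompression=${useResultsCompression})`,
      { useResultsCompression },
    );
  }
}

// Demo: collect the generated cases rather than opening real sessions.
const generated = [];
registerCloudFetchTests((name, options) => generated.push({ name, options }));
console.log(generated.map((t) => t.name));
```

With real Mocha the same loop works directly around describe/it, since Mocha registers tests at load time.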
Signed-off-by: Levko Kravets <levko.ne@gmail.com>
LGTM.
This PR adds support for LZ4-compressed results (both Arrow-based and CloudFetch). This almost doesn't affect CPU and memory usage (LZ4 is focused on compression and decompression speed), but noticeably decreases the amount of data being downloaded. For example, in synthetic tests uncompressed CloudFetch batches are 15371496 bytes each, while with LZ4 compression enabled their size varies between 5041380 and 5042153 bytes (approx. 33% of the uncompressed size). Of course, this ratio depends on the actual data.
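The quoted ratio can be sanity-checked from the batch sizes above (all numbers taken from the PR description):

```javascript
// Sanity check of the ~33% compression ratio quoted in the PR description.
const uncompressedBytes = 15371496; // size of one uncompressed CloudFetch batch
const compressedMin = 5041380;      // smallest observed compressed batch
const compressedMax = 5042153;      // largest observed compressed batch

const ratioMin = compressedMin / uncompressedBytes;
const ratioMax = compressedMax / uncompressedBytes;
console.log(ratioMin.toFixed(3), ratioMax.toFixed(3)); // both round to 0.328, i.e. ~33%
```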
Profiler reports (synthetic tests, 10000000 records, CloudFetch with single download thread):
Before
After