Skip to content

implement new DDB streams provider using DynamoDB-Local #11688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Oct 18, 2024
Merged

Conversation

bentsku
Copy link
Contributor

@bentsku bentsku commented Oct 15, 2024

Motivation

First commit is based on 6f294e8

As reported by some of our users and also in #11599, DynamoDB-local supports DynamoDB Streams out of the box. This PR remove the current logic we had to support the stream functionality (using Kinesis under the hood) and which had to often query DDB-local multiple times for one request to get the old and/or new images.

This adds a considerable performance boost: we don't need to query ddb, and we reduce our internal usage of kinesis which reduces load on our gateway.

We only need to manipulate the headers + do some payload manipulation to modify the returned region, and update it before forwarding it to ddb local.

These are quite big, so we are releasing them under a new provider for now behind a feature flag (the usual PROVIDER_OVERRIDE_DYNAMODBSTREAMS=v2). Question is, is it DynamoDB v2 or DynamoDBStreams v2? The 2 implementations are tied together, and DynamoDB is shorter, so I'd maybe be inclined to choose that one instead.

Here's the successful run with the provider flag being forced: https://app.circleci.com/pipelines/github/localstack/localstack/28834/workflows/52d80e8f-6354-4764-b073-4f36f10fc121

Changes

@dfangl changes:

  • update the DynamoDB server to have a singleton method so that it can be shared between DDB and DDBStreams provider
  • remove lot of related code to Streams in the DDB Provider
    • the modified provider is a copy-paste of the default one, without all the stream related code, it might be easier to do a diff between those 2 in order to review (dynamodb/provider.py <-> dynamodb/v2/provider.py)
  • update all the code in the DynamoDB Streams provider to forward requests to ddb-local instead of accessing kinesis

@bentsku changes:

  • migrated the code to be in 2 different providers and create the providers definitions
  • updated the regex and added one in order to catch all possible cases with ddb streams
  • update the tests to not have kinesis assumptions
  • skip some fields because of ddb-local for ESM
  • remove even more code in the DDB provider related to streams

Limitations

This new implementation does not support DynamoDB.EnableKinesisStreamingDestination API action and logic about dispatching event to Kinesis directly yet.

TODO

What's left to do:

  • remove the commit 8431fa2 forcing the usage of the new implementation!

@bentsku bentsku added aws:dynamodb Amazon DynamoDB aws:dynamodbstreams AWS DynamoDB Streams semver: minor Non-breaking changes which can be included in minor releases, but not in patch releases labels Oct 15, 2024
@bentsku bentsku self-assigned this Oct 15, 2024
@bentsku bentsku changed the title Ddb stream native implement new DDB streams provider using DynamoDB-Local Oct 15, 2024
Copy link

github-actions bot commented Oct 15, 2024

S3 Image Test Results (AMD64 / ARM64)

  2 files    2 suites   3m 50s ⏱️
423 tests 369 ✅  54 💤 0 ❌
846 runs  738 ✅ 108 💤 0 ❌

Results for commit e1f4bd9.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Oct 15, 2024

LocalStack Community integration with Pro

    2 files      2 suites   1h 42m 31s ⏱️
3 508 tests 3 095 ✅ 413 💤 0 ❌
3 510 runs  3 095 ✅ 415 💤 0 ❌

Results for commit e1f4bd9.

♻️ This comment has been updated with latest results.

@thrau
Copy link
Member

thrau commented Oct 16, 2024

hey folks! i helped narrow down the reviewers to @dfangl and @viren-nadkarni 🙏

Copy link
Member

@viren-nadkarni viren-nadkarni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fantastic change 👏 Kudos!

I'm not sure I like the v1 provider code copied but it makes practical sense. We should deprecate the v1 provider at the earliest and make v2 the default to reduce any maintenance burden.

We also have the upgrade to DDBLocal v2 in the pipeline #11064. This will be a low-level change and should not cause confusion with regards to the v2 marker.

def modify_stream_arn_for_ddb_local(self, stream_arn: str) -> str:
parsed_arn = parse_arn(stream_arn)

return f"arn:aws:dynamodb:ddblocal:000000000000:{parsed_arn['resource']}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea how well this plays when non-default account/region is used in the request context?

Would the ARN in the response returned by DDBLocal need to be updated too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea no, I'll give it a try quickly to see, thanks for pointing it out!

Copy link
Contributor Author

@bentsku bentsku Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ARN returned by DDBLocal are being updated by the following service response handler: localstack.services.dynamodb.utils.modify_ddblocal_arns.

I've checked and the needed StreamArn is always the 000000000000 account id and ddblocal region, works properly with non default account and region 👍 thanks for checking! (sharding seems to be done on the credentials level)

@bentsku
Copy link
Contributor Author

bentsku commented Oct 18, 2024

I'm not sure I like the v1 provider code copied but it makes practical sense. We should deprecate the v1 provider at the earliest and make v2 the default to reduce any maintenance burden.

@viren-nadkarni I definitely agree, I don't like it very much either... but the changes are so big that subclassing it made it difficult, and the other way would have been to introduce a feature flag to disable the streams feature in the DDB provider and only implement the DDB Streams v2 provider, but it would have been a lot of weird code paths. 😕

Also, we're still missing a feature in v2 with the KinesisStreamDestination that we need to think about how to implement, with maybe a poller on the DDB streams themselves, but we would need to change the overall configuration and modify the resulting payload from GetRecords... fun times ahead 😄

@bentsku bentsku merged commit 07b1504 into master Oct 18, 2024
41 checks passed
@bentsku bentsku deleted the ddb-stream-native branch October 18, 2024 13:08
@bentsku bentsku mentioned this pull request Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
aws:dynamodb Amazon DynamoDB aws:dynamodbstreams AWS DynamoDB Streams semver: minor Non-breaking changes which can be included in minor releases, but not in patch releases
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants