Conversation

@sdangol sdangol commented Sep 3, 2025

Summary

This PR integrates the Parser functionality with the Batch Processor so that customers can parse and validate payloads before they're passed to the record handler. It supports both the extended schemas for SQSRecord, KinesisRecord, and DynamoDBRecord and passing just the inner payload schema.
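
For illustration, a minimal sketch of how this might look from a handler, assuming the schema is passed via the new constructor config described below (the inner payload schema and handler names are hypothetical):

import { BatchProcessor, EventType, processPartialResponse } from '@aws-lambda-powertools/batch';
import { z } from 'zod';
import type { Context, SQSEvent, SQSRecord } from 'aws-lambda';

// Hypothetical inner payload schema; the processor extends the built-in
// SqsRecordSchema with it before records reach the record handler.
const orderSchema = z.object({ orderId: z.string(), amount: z.number() });

const processor = new BatchProcessor(EventType.SQS, { schema: orderSchema });

const recordHandler = async (record: SQSRecord): Promise<void> => {
  // by this point the record has been parsed and validated
};

export const handler = async (event: SQSEvent, context: Context) =>
  processPartialResponse(event, recordHandler, processor, { context });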

Changes

  • Added parser as dev dependency
  • Added a config to the constructor of BasePartialBatchProcessor to set the schema property
  • Added a parseRecord method to the BatchProcessor to do the parsing by dynamically importing the parse function and the appropriate schema

Issue number: closes #4394


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@boring-cyborg boring-cyborg bot added batch This item relates to the Batch Processing Utility dependencies Changes that touch dependencies, e.g. Dependabot, etc. tests PRs that add or change tests labels Sep 3, 2025
@pull-request-size pull-request-size bot added the size/L PRs between 100-499 LOC label Sep 3, 2025
@sdangol sdangol requested review from dreamorosi and svozza September 3, 2025 08:37
@sdangol sdangol self-assigned this Sep 3, 2025
const { SqsRecordSchema } = await import(
  '@aws-lambda-powertools/parser/schemas/sqs'
);
const extendedSchema = SqsRecordSchema.extend({
  // biome-ignore lint/suspicious/noExplicitAny: at least for now, we need to broaden the type because the JSONStringified helper method is not typed with StandardSchemaV1 but with ZodSchema
Contributor

In practice, I don't think we can support this use case at runtime anyway.

If someone passes us a complete StandardSchema schema from another library, we'll be able to use it as-is (try block).

For us to be able to extend the built-in SqsRecordSchema, however, we must receive a Zod schema, or both the extend and JSONStringified calls will fail at runtime - so the type error is correct here.

With this in mind, we might want to use the vendor key on the schema (see spec) to check whether it's a Zod schema, and either 1/ throw an error or 2/ log a warning and skip parsing if the schema is from another library we can't extend.

If instead we find this conditional behavior confusing, we'll have to restrict the type of schema to Zod schemas only.
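
For reference, a minimal sketch of that vendor check, assuming the ~standard.vendor key from the Standard Schema spec (the helper name is hypothetical):

import type { StandardSchemaV1 } from '@standard-schema/spec';

// Hypothetical guard: only Zod schemas can be extended, so reject anything
// whose vendor key (per the Standard Schema spec) is not 'zod'.
const assertExtendable = (schema: StandardSchemaV1): void => {
  if (schema['~standard'].vendor !== 'zod') {
    throw new Error(
      `Cannot extend schema from vendor "${schema['~standard'].vendor}"; only Zod schemas are supported`
    );
  }
};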

Contributor

I'm in favour of throwing here; if we just log a warning then we run the risk of passing on invalid events.

Contributor

Both throwing an error and continuing to process an invalid event would result in the item being marked as failed by the Batch Processor and sent back to the source.

Most likely we should also log a warning, otherwise all operators would see is the item being reprocessed and eventually sent to a DLQ or lost.
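
As a sketch of that combined behavior (a hypothetical helper, not part of this diff):

import type { SQSRecord } from 'aws-lambda';

// Log the failure for operators, then rethrow so the Batch Processor still
// marks the record as failed and it is sent back to the source.
const parseOrFail = <T>(record: SQSRecord, doParse: (record: SQSRecord) => T): T => {
  try {
    return doParse(record);
  } catch (error) {
    console.warn(`Parsing failed for record ${record.messageId}; it will be marked as failed`);
    throw error;
  }
};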

Contributor

Yeah, that's a good point, it will be much easier to diagnose if we emit a log statement.

Contributor Author

In order to cast the schema to a Zod schema, do we need to dynamically import Zod as well?

Contributor

I'm testing the package on a function and it works, but I think we forgot to emit some kind of warning/log/error when the parsing fails.

Is the Python implementation doing anything in this regard?

Contributor Author

I think this is the Python implementation for handling parsing failures. Should we do something similar?

Contributor Author

I was trying to test it for Kinesis using https://github.com/dreamorosi/kinesis-batch and it was failing; I'm trying to debug it.

Contributor

Message me, I got it working fine

Contributor

Should we do something similar?

Yes, but in order to do that we'll need to add a logger feature, which we currently don't have.

Let's put a pin on this and I'll open a separate issue.

@dreamorosi (Contributor)

Can you also please address all the Sonar findings?

@sdangol sdangol marked this pull request as draft September 3, 2025 12:47
@sdangol (Contributor Author) commented Sep 3, 2025

@dreamorosi I'm still a bit confused about extending the KinesisDataStreamRecord.

if (eventType === EventType.KinesisDataStreams) {
  const extendedSchemaParsing = parse(record, undefined, schema, true);
  if (extendedSchemaParsing.success)
    return extendedSchemaParsing.data as KinesisStreamRecord;
  if (schema['~standard'].vendor === SchemaType.Zod) {
    const { JSONStringified } = await import(
      '@aws-lambda-powertools/parser/helpers'
    );
    const { KinesisDataStreamRecord } = await import(
      '@aws-lambda-powertools/parser/schemas/kinesis'
    );
    const extendedSchema = KinesisDataStreamRecord.extend({
      // biome-ignore lint/suspicious/noExplicitAny: The vendor field in the schema is verified that the schema is a Zod schema
      data: // Decompress and decode the data to match against schema
    });
    return parse(record, undefined, extendedSchema);
  }
  console.warn(
    'The schema provided is not supported. Only Zod schemas are supported for extension.'
  );
  throw new Error('Unsupported schema type');
}

To extend it, should I create another helper which does the decompression and decoding?
Or should I just use Envelopes for the whole thing? But using Envelopes would return only the internal payload, while our record handler expects the whole Record. We could maybe update the Record with the parsed payload and return it.
What would you suggest?

@dreamorosi (Contributor)

I've been thinking about your question for a while and there's no straightforward way to do it with our current feature set.

We can't use envelopes because of what you mentioned, and doing the parsing in two parts, while possible, is suboptimal for two reasons:

  • we'd be iterating through the entire batch in order to parse each record and then we'd have to recompose the object
  • in case of parsing errors the path of the failed field(s) would be messed up since we'd no longer be parsing the whole event with a single schema (e.g. if there's an error at the 2nd item in the batch, the error path should be something like Records.1.kinesis.data.foo)

With this in mind, I think we'll have to extract the transform logic that we have here into its own helper (with a dedicated PR) and then use the helper in the Batch Processing utility when extending.
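
For illustration, such a transform helper might look roughly like this sketch (the gzip detection and helper name are assumptions, not the final API):

import { gunzipSync } from 'node:zlib';
import { z } from 'zod';

// Hypothetical transform: base64-decode the Kinesis data field, gunzip it
// when it's compressed, JSON-parse it, then validate with the user's schema.
const kinesisData = (userSchema: z.ZodTypeAny) =>
  z
    .string()
    .transform((value, ctx) => {
      try {
        const buffer = Buffer.from(value, 'base64');
        // gzip streams start with the magic bytes 0x1f 0x8b
        const text =
          buffer[0] === 0x1f && buffer[1] === 0x8b
            ? gunzipSync(buffer).toString('utf8')
            : buffer.toString('utf8');
        return JSON.parse(text);
      } catch {
        ctx.addIssue({ code: z.ZodIssueCode.custom, message: 'Invalid Kinesis data payload' });
        return z.NEVER;
      }
    })
    .pipe(userSchema);

This would then be used as the data field when extending KinesisDataStreamRecord, keeping error paths intact since the whole event is still parsed with one schema.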

@pull-request-size pull-request-size bot added size/XL PRs between 500-999 LOC, often PRs that grown with feedback and removed size/L PRs between 100-499 LOC labels Sep 3, 2025
@sdangol sdangol changed the title feat(parser): integrate parser with Batch Processing for SQSRecord feat(parser): integrate parser with Batch Processing Sep 3, 2025
@dreamorosi (Contributor)

The SonarCloud message still shows an issue

…:aws-powertools/powertools-lambda-typescript into feat/parser-integration-batch-processing
@sdangol sdangol requested a review from svozza September 4, 2025 14:46
svozza previously approved these changes Sep 4, 2025
@sdangol sdangol added the do-not-merge This item should not be merged label Sep 4, 2025
@pull-request-size pull-request-size bot added size/XXL PRs with 1K+ LOC, largely documentation related and removed size/XL PRs between 500-999 LOC, often PRs that grown with feedback labels Sep 4, 2025
@sdangol sdangol requested a review from svozza September 5, 2025 07:57
@pull-request-size pull-request-size bot added size/XL PRs between 500-999 LOC, often PRs that grown with feedback and removed size/XXL PRs with 1K+ LOC, largely documentation related labels Sep 5, 2025
sonarqubecloud bot commented Sep 5, 2025

/**
 * Enum of supported schema types for the utility
 */
const SchemaType = {
Contributor

Suggested change
const SchemaType = {
const SchemaVendor = {

Contributor

For new methods, I'd prefer creating signatures with objects rather than positional args since it makes it easier to read/understand at the call site.

For example, with the current signatures calling parseRecord looks like: await this.parseRecord(record, this.eventType, this.schema); which is not too bad because the args being passed are named correctly.

Calling createExtendedSchema instead looks much more opaque: await this.createExtendedSchema(eventType, schema, false) making it unclear what false stands for.

If the signature was:

private async createExtendedSchema(options: {
  eventType: keyof typeof EventType,
  schema: StandardSchemaV1,
  useTransformers: boolean
})

calling it would look like: await this.createExtendedSchema({ eventType, schema, useTransformers: false }), which is much clearer.

Also, unless the methods are public or protected, let's use #methodName instead of private methodName to make them truly private at runtime.
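
Combining both suggestions, a sketch of the resulting shape (the class and body are illustrative only):

import type { StandardSchemaV1 } from '@standard-schema/spec';
import { EventType } from '@aws-lambda-powertools/batch';

class ExampleProcessor {
  // ECMAScript #-private method: enforced at runtime, unlike TypeScript's
  // compile-time-only `private` modifier.
  async #createExtendedSchema(options: {
    eventType: keyof typeof EventType;
    schema: StandardSchemaV1;
    useTransformers: boolean;
  }): Promise<StandardSchemaV1> {
    const { schema } = options;
    // ...extend and return the schema here...
    return schema;
  }
}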

schema,
true
);
return parse(
Contributor

Using safeParse: true in this call will make it easier to add debug logs later since right now a failed parsing exits this scope.

Similarly, for the parse call above we could achieve the same without generating a stack trace, which can be more performant.
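
A sketch of that change, relying on the safeParse flag used earlier in this thread:

import { parse } from '@aws-lambda-powertools/parser';
import type { ZodType } from 'zod';

// safeParse returns a result object instead of throwing, so a failure can be
// logged before leaving this scope and the happy path avoids building a stack trace.
const parseExtended = (record: unknown, extendedSchema: ZodType): unknown => {
  const result = parse(record, undefined, extendedSchema, true);
  if (!result.success) {
    // a debug log could be emitted here once the logger feature lands
    throw new Error('Failed to parse record');
  }
  return result.data;
};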

*
* Only Zod Schemas are supported for automatic schema extension
*
* @param record The record to be parsed
Contributor

Suggested change
* @param record The record to be parsed
* @param record - The record to be parsed

Contributor

Same for the other @param tags added in this diff.

* If the passed schema is already an extended schema,
* it directly uses the schema to parse the record
*
* Only Zod Schemas are supported for automatic schema extension
Contributor

Let's use punctuation correctly throughout the docstring.

@@ -89,6 +91,10 @@ type PartialItemFailures = { itemIdentifier: string };
*/
type PartialItemFailureResponse = { batchItemFailures: PartialItemFailures[] };

type BasePartialBatchProcessorConfig = {
schema: StandardSchemaV1;
Contributor

We need to document schema so that customers know what it does and all its nuances.

@dreamorosi (Contributor) left a comment

Only a couple of final styling and documentation details and then we're good to merge!

return extendedSchemaParsing.data as EventSourceDataClassTypes;
}
// Only proceed with schema extension if it's a Zod schema
if (schema['~standard'].vendor !== SchemaType.Zod) {
Contributor

Should we not do this check as the very first thing in this function? We're wasting work doing the parsing if all we're going to do is throw based on information we had before we began parsing.

Labels

  • batch This item relates to the Batch Processing Utility
  • dependencies Changes that touch dependencies, e.g. Dependabot, etc.
  • do-not-merge This item should not be merged
  • size/XL PRs between 500-999 LOC, often PRs that grown with feedback
  • tests PRs that add or change tests
Development

Successfully merging this pull request may close these issues.

Feature request: Parser integration for Batch Processing