Skip to content

Migrate to orjson (JSON library) for better performance #2016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fangnx
Copy link
Member

@fangnx fangnx commented Aug 6, 2025

What

Checklist

  • Contains customer facing changes? Including API/behavior changes
  • Did you add sufficient unit test and/or integration test coverage for this PR?
    • If not, please explain why it is not required

References

JIRA: https://confluentinc.atlassian.net/browse/DGS-21747

Issue: #1177

Test & Review

Open questions / Follow-ups

@Copilot Copilot AI review requested due to automatic review settings August 6, 2025 20:37
@fangnx fangnx requested review from MSeal and a team as code owners August 6, 2025 20:37
@confluent-cla-assistant
Copy link

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR migrates the codebase from Python's standard json library to orjson for improved JSON parsing and serialization performance. The migration involves replacing json.loads() and json.dumps() calls with their orjson equivalents across multiple schema registry and serialization modules.

  • Replaces all json imports with orjson imports
  • Updates all json.loads() calls to orjson.loads()
  • Updates all json.dumps() calls to orjson.dumps()

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/confluent_kafka/schema_registry/common/avro.py Updates JSON parsing in Avro schema handling
src/confluent_kafka/schema_registry/_sync/schema_registry_client.py Updates JSON serialization in sync schema registry client
src/confluent_kafka/schema_registry/_sync/json_schema.py Updates JSON operations in sync JSON schema handling
src/confluent_kafka/schema_registry/_sync/avro.py Updates JSON parsing in sync Avro schema handling
src/confluent_kafka/schema_registry/_async/schema_registry_client.py Updates JSON serialization in async schema registry client
src/confluent_kafka/schema_registry/_async/json_schema.py Updates JSON operations in async JSON schema handling
src/confluent_kafka/schema_registry/_async/avro.py Updates JSON parsing in async Avro schema handling
src/confluent_kafka/kafkatest/verifiable_client.py Updates JSON serialization in test client
src/confluent_kafka/avro/serializer/message_serializer.py Updates JSON parsing in Avro message serializer
requirements/requirements-schemaregistry.txt Adds orjson dependency

Comment on lines 64 to 65
self.dbg('SEND: %s' % orjson.dumps(d))
sys.stdout.write('%s\n' % orjson.dumps(d))
Copy link
Preview

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orjson.dumps() returns bytes, not a string. This will cause a TypeError when used with string formatting. Convert to string using .decode() or use orjson.dumps(d).decode().

Suggested change
self.dbg('SEND: %s' % orjson.dumps(d))
sys.stdout.write('%s\n' % orjson.dumps(d))
self.dbg('SEND: %s' % orjson.dumps(d).decode())
sys.stdout.write('%s\n' % orjson.dumps(d).decode())

Copilot uses AI. Check for mistakes.

self.dbg('SEND: %s' % json.dumps(d))
sys.stdout.write('%s\n' % json.dumps(d))
self.dbg('SEND: %s' % orjson.dumps(d))
sys.stdout.write('%s\n' % orjson.dumps(d))
Copy link
Preview

Copilot AI Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orjson.dumps() returns bytes, not a string. This will cause a TypeError when writing to stdout. Convert to string using .decode() or use orjson.dumps(d).decode().

Suggested change
sys.stdout.write('%s\n' % orjson.dumps(d))
sys.stdout.write('%s\n' % orjson.dumps(d).decode())

Copilot uses AI. Check for mistakes.

@sonarqube-confluent

This comment has been minimized.

@fangnx fangnx added the component:schema-registry Any schema registry related isues rather than kafka isolated ones label Aug 6, 2025
Copy link
Member

@rayokota rayokota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fangnx , you need to replace orjson.dumps(x) with orjson.dumps(x).decode('utf-8')

@@ -2,3 +2,4 @@ attrs>=21.2.0
cachetools>=5.5.0
httpx>=0.26
authlib>=1.0.0
orjson >= 3.10,<4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
orjson >= 3.10,<4
orjson >= 3.10

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rayokota! I updated the PR to fix it. I am surprised none of the unit tests caught it

@sonarqube-confluent
Copy link

Passed

Analysis Details

4 Issues

  • Bug 0 Bugs
  • Vulnerability 0 Vulnerabilities
  • Code Smell 4 Code Smells

Coverage and Duplications

  • Coverage 86.20% Coverage (66.10% Estimated after merge)
  • Duplications No duplication information (5.50% Estimated after merge)

Project ID: confluent-kafka-python

View in SonarQube

@fangnx fangnx requested a review from rayokota August 7, 2025 20:40
Copy link
Member

@rayokota rayokota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fangnx , LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:schema-registry Any schema registry related isues rather than kafka isolated ones
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants