The goal is to compare the performance of various JSON parsing libraries in Java, focusing on the three key operations we need for last9 ClickHouse queries (see the sketch after this list):
- JSON Validation
- Key Existence Check
- Value Extraction by Path
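For reference, a minimal sketch of how these three operations might be exposed to the query layer; the interface name and signatures below are illustrative assumptions, not the actual last9 code.

```java
// Hypothetical interface for the three operations under test (names assumed for illustration).
public interface JsonOps {

    /** Returns true if the input string is syntactically valid JSON. */
    boolean isValidJson(String json);

    /** Returns true if the top-level object contains the given key, e.g. "correlationId". */
    boolean hasJsonKey(String json, String key);

    /** Returns the value at a JSONPath-style path, e.g. "$.logger", or null if absent or invalid. */
    String getJsonValue(String json, String path);
}
```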
We chose both DOM and streaming implementations of the fastest/most popular JSON libraries from https://github.com/fabienrenaud/java-json-benchmark (a DOM vs. streaming sketch follows this list):
- Jackson
  - DOM (ObjectMapper)
  - Streaming (JsonParser)
- GSON
  - DOM
  - Streaming
- FastJSON 2.x (https://www.iteye.com/blog/wenshao-1142031)
  - DOM
  - Streaming
- JsonIter (https://jsoniter.com/)
  - DOM-based implementation
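To make the DOM vs. streaming distinction concrete, here is a sketch of the key-existence check in both styles using Jackson. This is illustrative only, assuming the DOM variant materializes the full tree with ObjectMapper while the streaming variant walks tokens with JsonParser and stops at the first top-level match; it is not the exact benchmark code.

```java
import java.io.IOException;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JacksonVariants {

    private static final ObjectMapper MAPPER = new ObjectMapper();
    private static final JsonFactory FACTORY = new JsonFactory();

    // DOM: parse the whole document into a tree, then look up the key.
    static boolean hasKeyDom(String json, String key) throws IOException {
        JsonNode root = MAPPER.readTree(json);
        return root.has(key);
    }

    // Streaming: scan tokens and return as soon as a top-level field name matches.
    static boolean hasKeyStreaming(String json, String key) throws IOException {
        try (JsonParser parser = FACTORY.createParser(json)) {
            if (parser.nextToken() != JsonToken.START_OBJECT) {
                return false;
            }
            int depth = 1;
            while (depth > 0) {
                JsonToken token = parser.nextToken();
                if (token == null) {
                    return false;
                }
                if (token == JsonToken.START_OBJECT || token == JsonToken.START_ARRAY) {
                    depth++;
                } else if (token == JsonToken.END_OBJECT || token == JsonToken.END_ARRAY) {
                    depth--;
                } else if (token == JsonToken.FIELD_NAME && depth == 1
                        && key.equals(parser.getCurrentName())) {
                    return true;   // early exit: no need to read the rest of the document
                }
            }
            return false;
        }
    }
}
```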
Benchmark configuration (JMH annotations):

```java
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(value = 2)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
@Timeout(time = 10, timeUnit = TimeUnit.MINUTES)
```
- Sample Size: 100,000 JSON rows per test
- Data Sources:
  - Valid JSON inputs from last9 Parquet files (game logs)
  - Invalid JSON inputs from last9 Parquet files
- Data Pattern:
  ```
  {
    "correlationId": "gameplayed__X54ww__1734897498952",  // present in ~59% of records
    "tm": "01:28:19.560",
    "logger": "com.games24x7.offerservice.execution.step.RTSendToConsumer",
    // ... other fields
  }
  ```
- Test Keys/Paths:
  - Key Check: "correlationId" (tests both presence and absence)
  - Path Extraction: "$.logger" (tests value extraction and error handling)
- Data Distribution:
  - ~59% of records have correlationId (tests positive case)
  - ~41% of records missing correlationId (tests negative case)
  - This distribution helps evaluate both successful parsing and error handling
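A sketch of how the benchmark state could be populated from these inputs; the state class name, the file names, and the idea of pre-extracting rows from the Parquet files into line-delimited JSON files are assumptions for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Hypothetical state class; in the actual benchmark the inputs live in JsonParsingBenchmark itself.
@State(Scope.Benchmark)
public class JsonInputsState {

    static final int SAMPLE_SIZE = 100_000;

    public List<String> validJsonInputs;
    public List<String> invalidJsonInputs;

    @Setup(Level.Trial)
    public void setup() throws IOException {
        // File names are assumed; the rows ultimately come from last9 Parquet game logs.
        List<String> valid = Files.readAllLines(Paths.get("data/valid_inputs.jsonl"));
        List<String> invalid = Files.readAllLines(Paths.get("data/invalid_inputs.jsonl"));
        validJsonInputs = valid.subList(0, Math.min(SAMPLE_SIZE, valid.size()));
        invalidJsonInputs = invalid.subList(0, Math.min(SAMPLE_SIZE, invalid.size()));
    }
}
```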
For each parser, we run the following benchmarks (see the sketch after this list):
- JSON Validation Tests
  - [parser]_ValidInputs()   // Tests isValidJson() with valid game logs
  - [parser]_InvalidInputs() // Tests isValidJson() with invalid JSON
- Key Check Test
  - [parser]_HasKey()        // Tests hasJsonKey() with key "correlationId"
- Value Extraction Test
  - [parser]_GetValue()      // Tests getJsonValue() with path "$.logger"
- Early Exit Strategies
  - Fast-fail validation for invalid JSON
  - Early return in key checking operations
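A sketch of what one [parser]_* benchmark and the fast-fail strategy could look like, using Jackson streaming as the example; the method names and Blackhole plumbing follow standard JMH conventions, and JsonInputsState is the hypothetical state class sketched above, so the actual implementation may differ.

```java
import java.io.IOException;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.infra.Blackhole;

public class JacksonStreamingBenchmarks {

    private static final JsonFactory FACTORY = new JsonFactory();

    // Fast-fail validation: advance token by token and bail out at the first syntax error,
    // without ever building a DOM tree.
    static boolean isValidJson(String json) {
        try (JsonParser parser = FACTORY.createParser(json)) {
            while (parser.nextToken() != null) {
                // just advance; JsonParser throws on the first malformed token
            }
            return true;
        } catch (IOException e) {
            return false;   // early exit for invalid JSON
        }
    }

    // Representative shape of a [parser]_ValidInputs() benchmark method.
    @Benchmark
    public void jacksonStreaming_ValidInputs(JsonInputsState state, Blackhole bh) {
        for (String json : state.validJsonInputs) {
            bh.consume(isValidJson(json));
        }
    }

    // Representative shape of a [parser]_InvalidInputs() benchmark method.
    @Benchmark
    public void jacksonStreaming_InvalidInputs(JsonInputsState state, Blackhole bh) {
        for (String json : state.invalidJsonInputs) {
            bh.consume(isValidJson(json));
        }
    }
}
```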
The benchmark will help evaluate:
- Performance differences between DOM and streaming approaches
- Effectiveness of library-specific optimizations
- Trade-offs between memory usage and parsing speed
- Impact of different JSON operations on performance
It might make sense to use different libraries for specific JSON operations; a hypothetical sketch of such a mixed setup follows.
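A facade could route each operation to whichever library benchmarks best. The JsonOps interface is the earlier sketch, and the pairings hinted at in the comments are placeholders, not benchmark conclusions.

```java
// Hypothetical facade that delegates each operation to a (possibly different) library-backed
// implementation; the pairings are placeholders, not results from the benchmark.
public final class MixedJsonOps implements JsonOps {

    private final JsonOps validator;   // e.g. the fastest streaming validator
    private final JsonOps keyChecker;  // e.g. the fastest key-existence check
    private final JsonOps extractor;   // e.g. the fastest path extractor

    public MixedJsonOps(JsonOps validator, JsonOps keyChecker, JsonOps extractor) {
        this.validator = validator;
        this.keyChecker = keyChecker;
        this.extractor = extractor;
    }

    @Override
    public boolean isValidJson(String json) {
        return validator.isValidJson(json);
    }

    @Override
    public boolean hasJsonKey(String json, String key) {
        return keyChecker.hasJsonKey(json, key);
    }

    @Override
    public String getJsonValue(String json, String path) {
        return extractor.getJsonValue(json, path);
    }
}
```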
Benchmark class skeleton (fields and JMH configuration; the @Setup and benchmark methods follow in the full source):

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(value = 2)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
@Timeout(time = 10, timeUnit = TimeUnit.MINUTES)
public class JsonParsingBenchmark {

    private static final Logger logger = LoggerFactory.getLogger(JsonParsingBenchmark.class);

    // Inputs and test targets shared by all benchmark methods
    private static final int SAMPLE_SIZE = 100000;
    private static List<String> validJsonInputs;
    private static List<String> invalidJsonInputs;
    private static final String jsonKey = "correlationId";
    private static final String jsonPath = "$.logger";

    // ... @Setup and benchmark methods follow
```
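For completeness, one common way to launch a JMH class like this is from a main method via the JMH Runner API; whether the project actually runs the benchmarks this way or through a Maven-built uber-jar is an assumption.

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        // Run every benchmark method declared in JsonParsingBenchmark with its annotated settings.
        Options options = new OptionsBuilder()
                .include(JsonParsingBenchmark.class.getSimpleName())
                .build();
        new Runner(options).run();
    }
}
```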