CONTRIBUTING.md
2 additions & 2 deletions
@@ -6,7 +6,7 @@ In particular, the following contributions are invited:
 - The library is focused on performance. Well-documented performance optimization are invited.
 - Fixes to known or newly discovered bugs are always welcome. Typically, a bug fix should come with a test demonstrating that the bug has been fixed.
-- The simdjson library is advanced software and maintanability and flexibility are always a concern. Specific contributions to improve maintanability and flexibility are invited.
+- The simdjson library is advanced software and maintainability and flexibility are always a concern. Specific contributions to improve maintainability and flexibility are invited.
@@ -28,5 +28,5 @@ Contributors are encouraged to
-Though we do not have a formal code of conduct, we will not tolerate bullying, bigotery or intimidation. Everyone is welcome to contribute.
+Though we do not have a formal code of conduct, we will not tolerate bullying, bigotry or intimidation. Everyone is welcome to contribute.
JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh approach. simdjson uses commonly available SIMD instructions and microparallel algorithms to parse JSON 2.5x faster than anything else out there.
+
+* **Ludicrous Speed:** Over 2.5x faster than other production-grade JSON parsers.
+* **Delightfully Easy:** First-class, easy to use API.
+* **Complete Validation:** Full JSON and UTF-8 validation, with no compromises.
+* **Rock-Solid Reliability:** From memory allocation to error handling, simdjson's design avoids surprises.
+
+This library is part of the [Awesome Modern C++](https://awesomecpp.com) list.
## A C++ library to see how fast we can parse JSON with complete validation.
+simdjson is easily consumable with a single .h and .cpp file.
-JSON documents are everywhere on the Internet. Servers spend a lot of time parsing these documents. We want to accelerate the parsing of JSON per se using commonly available SIMD instructions as much as possible while doing full validation (including character encoding). This library is part of the [Awesome Modern C++](https://awesomecpp.com) list.
+0. Prerequisites: `g++` or `clang++`.
+1. Pull [simdjson.h](singleheader/simdjson.h) and [simdjson.cpp](singleheader/simdjson.cpp) into a directory, along with the sample file [twitter.json](jsonexamples/twitter.json).
@@ -110,7 +138,7 @@ be concerned with computed gotos.
 ## Thread safety
-The simdjson library is mostly single-threaded. Thread safety is the responsability of the caller: it is unsafe to reuse a document::parser object between different threads.
+The simdjson library is mostly single-threaded. Thread safety is the responsibility of the caller: it is unsafe to reuse a document::parser object between different threads.
 If you are on an x64 processor, the runtime dispatching assigns the right code path the first time that parsing is attempted. The runtime dispatching is thread-safe.
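One way to honor the "do not share a parser between threads" rule is to give every thread its own instance. The following is a minimal sketch under stated assumptions: `ToyParser` and `local_parser()` are hypothetical stand-ins invented for illustration and are not part of simdjson. The sketch only shows the `thread_local` pattern that could be applied to any stateful parser object.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a parser that reuses internal buffers between
// calls. This is NOT the simdjson class; it only mimics "stateful, so it
// must not be shared between threads".
struct ToyParser {
  std::string scratch;    // internal buffer reused by every call
  std::size_t calls = 0;  // number of documents handled by this instance

  // Counts the commas in `json`, going through the reused scratch buffer.
  std::size_t count_commas(const std::string &json) {
    scratch.assign(json);
    ++calls;
    std::size_t n = 0;
    for (char c : scratch) {
      if (c == ',') {
        ++n;
      }
    }
    return n;
  }
};

// Returns the calling thread's own parser. Because the variable is
// thread_local, each thread gets a separate instance, created on first use.
inline ToyParser &local_parser() {
  thread_local ToyParser parser;
  return parser;
}
```

Every thread that calls `local_parser()` works on its own buffers, so no locking is needed around the parse call itself.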
@@ -136,23 +164,23 @@ All examples below use use `#include "simdjson.h"`, `#include "simdjson.cpp"` an
 The simplest API to get started is `document::parse()`, which allocates a new parser, parses a string, and returns the DOM. This is less efficient if you're going to read multiple documents, but as long as you're only parsing a single document, this will do just fine.
 ```c++
-auto [doc, error] = document::parse(string("[ 1, 2, 3 ]"));
If you're using exceptions, it gets even simpler (simdjson won't use exceptions internally, so you'll only pay the performance cost of exceptions in your own calling code):
The simdjson library requires SIMDJSON_PADDING extra bytes at the end of a string (it doesn't matter if the bytes are initialized). The `padded_string` class is an easy way to ensure this is accomplished up front and prevent the extra allocation:
+If you're wondering why the examples above use `_padded`, it's because the simdjson library requires SIMDJSON_PADDING extra bytes at the end of a string (it doesn't matter if the bytes are initialized). `_padded`
+is a way of creating a `padded_string` class, which assures us we have enough allocation.
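To make the padding requirement concrete, here is a minimal sketch of what "enough allocation" means. It does not use simdjson at all: `copy_with_padding` is a hypothetical helper, and `kAssumedPadding = 32` is a placeholder value chosen for illustration. Real code should rely on the library's own `SIMDJSON_PADDING` constant and `padded_string` class.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: placeholder padding size for illustration only. Real code
// should use the library's SIMDJSON_PADDING constant instead.
constexpr std::size_t kAssumedPadding = 32;

// Hypothetical helper: copies `json` into a buffer that has `padding`
// extra bytes after the document, so reads slightly past the end of the
// JSON text stay inside the allocation.
inline std::vector<char> copy_with_padding(
    const std::string &json, std::size_t padding = kAssumedPadding) {
  // The vector value-initializes its elements, so the padding is zeroed
  // here, although the text above says its contents do not matter.
  std::vector<char> buffer(json.size() + padding);
  std::copy(json.begin(), json.end(), buffer.begin());
  return buffer;
}
```

The document itself still occupies only the first `json.size()` bytes; the padding is extra capacity and is not part of the JSON text.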
You can also load from a file with `parser.load()`:
@@ -463,7 +491,7 @@ You then have access to the following methods on the resulting `simdjson::docume
 * `bool move_to_key(const char *key, uint32_t length)`: as above except that the target can contain NULL characters
 * `void move_to_value()`: when at a key location within an object, this moves to the accompanying, value (located next to it). This is equivalent but much faster than calling `next()`.
 * `bool move_to_index(uint32_t index)`: when at `[`, go one level deep, and advance to the given index, if successful, we are left pointing at the value,i f not, we are still pointing at the array
-* `bool move_to(const char *pointer, uint32_t length)`: Moves the iterator to the value correspoding to the json pointer. Always search from the root of the document. If successful, we are left pointing at the value, if not, we are still pointing the same value we were pointing before the call. The json pointer follows the rfc6901 standard's syntax: https://tools.ietf.org/html/rfc6901
+* `bool move_to(const char *pointer, uint32_t length)`: Moves the iterator to the value corresponding to the json pointer. Always search from the root of the document. If successful, we are left pointing at the value, if not, we are still pointing the same value we were pointing before the call. The json pointer follows the rfc6901 standard's syntax: https://tools.ietf.org/html/rfc6901
 * `bool move_to(const std::string &pointer) `: same as above but with a std::string parameter
 * `bool next()`: Within a given scope (series of nodes at the same depth within either an array or an object), we move forward. Thus, given [true, null, {"a":1}, [1,2]], we would visit true, null, { and [. At the object ({) or at the array ([), you can issue a "down" to visit their content. valid if we're not at the end of a scope (returns true).
 * `bool prev()`: Within a given scope (series of nodes at the same depth within either an
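Since `move_to` takes a pointer in RFC 6901 syntax, it may help to see how such a pointer breaks down into reference tokens. The sketch below illustrates the syntax only and says nothing about how simdjson implements `move_to`. `split_json_pointer` is a hypothetical helper written for this illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of simdjson): splits an RFC 6901 JSON
// pointer into its reference tokens, decoding "~1" as '/' and "~0" as '~'.
inline std::vector<std::string> split_json_pointer(const std::string &pointer) {
  std::vector<std::string> tokens;
  if (pointer.empty()) {
    return tokens;  // the empty pointer refers to the whole document
  }
  if (pointer[0] != '/') {
    throw std::invalid_argument("JSON pointer must be empty or start with '/'");
  }
  std::string current;
  for (std::size_t i = 1; i < pointer.size(); ++i) {
    const char c = pointer[i];
    if (c == '/') {
      // A slash ends the current token and starts the next one.
      tokens.push_back(current);
      current.clear();
    } else if (c == '~') {
      // '~' must be followed by '0' or '1'.
      if (i + 1 >= pointer.size()) {
        throw std::invalid_argument("dangling '~' in JSON pointer");
      }
      const char next = pointer[++i];
      if (next == '0') {
        current += '~';
      } else if (next == '1') {
        current += '/';
      } else {
        throw std::invalid_argument("invalid escape in JSON pointer");
      }
    } else {
      current += c;
    }
  }
  tokens.push_back(current);
  return tokens;
}
```

For example, the pointer `/a~1b/0/m~0n` splits into the tokens `a/b`, `0`, and `m~n`. A token such as `0` is an array index when the current value is an array and a key when it is an object.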
@@ -567,15 +595,15 @@ make allparsingcompetition
 ```
 Both the `parsingcompetition` and `allparsingcompetition` tools take a `-t` flag which produces
-a table-oriented output that can be conventiently parsed by other tools.
+a table-oriented output that can be conveniently parsed by other tools.
 ## Docker
 One can run tests and benchmarks using docker. It especially makes sense under Linux. A privileged access may be needed to get performance counters.
doc/tape.md
3 additions & 3 deletions
@@ -84,12 +84,12 @@ Simple JSON nodes are represented with one tape element:
 ## Integer and Double values
 Integer values are represented as two 64-bit tape elements:
-- The 64-bit value `('l' << 56)` followed by the 64-bit integer value litterally. Integer values are assumed to be signed 64-bit values, using two's complement notation.
-- The 64-bit value `('u' << 56)` followed by the 64-bit integer value litterally. Integer values are assumed to be unsigned 64-bit values.
+- The 64-bit value `('l' << 56)` followed by the 64-bit integer value literally. Integer values are assumed to be signed 64-bit values, using two's complement notation.
+- The 64-bit value `('u' << 56)` followed by the 64-bit integer value literally. Integer values are assumed to be unsigned 64-bit values.
 Float values are represented as two 64-bit tape elements:
-- The 64-bit value `('d' << 56)` followed by the 64-bit double value litterally in standard IEEE 754 notation.
+- The 64-bit value `('d' << 56)` followed by the 64-bit double value literally in standard IEEE 754 notation.
 Performance consideration: We store numbers of the main tape because we believe that locality of reference is helpful for performance.
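The two-element layout described above can be sketched in a few lines. This is an illustration of the encoding only, not the library's actual code: the `Tape` alias and the `append_*` and `tag_of` helpers are hypothetical names. The tag character is widened to 64 bits before shifting, because shifting a plain `int` by 56 is not valid C++.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical names; only the layout (tag << 56, then the raw 64-bit
// payload) comes from the description above.
using Tape = std::vector<std::uint64_t>;

// Builds the first tape element: the tag character in the top byte.
inline std::uint64_t tag_word(char tag) {
  return static_cast<std::uint64_t>(static_cast<unsigned char>(tag)) << 56;
}

// Signed integer: 'l' tag, then the two's complement bits of the value.
inline void append_int64(Tape &tape, std::int64_t value) {
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  tape.push_back(tag_word('l'));
  tape.push_back(bits);
}

// Unsigned integer: 'u' tag, then the value itself.
inline void append_uint64(Tape &tape, std::uint64_t value) {
  tape.push_back(tag_word('u'));
  tape.push_back(value);
}

// Double: 'd' tag, then the IEEE 754 bit pattern of the value.
inline void append_double(Tape &tape, double value) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "this sketch assumes a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits);
  tape.push_back(tag_word('d'));
  tape.push_back(bits);
}

// Recovers the tag character from the first element of a pair.
inline char tag_of(std::uint64_t word) {
  return static_cast<char>(word >> 56);
}
```

A reader of such a tape would inspect `tag_of` on the first element of each pair to decide how to interpret the 64 bits that follow.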