Commit 1e32897

Merge pull request simdjson#986 from simdjson/issue984
Fixing issue 984
2 parents 4ec5648 + 4c9f11b commit 1e32897

3 files changed: 66 additions, 7 deletions

doc/basics.md

Lines changed: 22 additions & 0 deletions
@@ -8,6 +8,7 @@ An overview of what you need to know to use simdjson, with examples.
 * [Using simdjson as a CMake dependency](#using-simdjson-as-a-cmake-dependency)
 * [The Basics: Loading and Parsing JSON Documents](#the-basics-loading-and-parsing-json-documents)
 * [Using the Parsed JSON](#using-the-parsed-json)
+* [C++11 Support and string_view](#c++11-support-and-string_view)
 * [C++17 Support](#c++17-support)
 * [Minifying JSON strings without parsing](#minifying-json-strings-without-parsing)
 * [UTF-8 validation (alone)](#utf-8-validation-alone)
@@ -192,6 +193,27 @@ And another one:
 cout << "number: " << v << endl;
 ```

+
+C++11 Support and string_view
+-------------
+
+The simdjson library builds on compilers supporting the [C++11 standard](https://en.wikipedia.org/wiki/C%2B%2B11). It is also a strict requirement: we have no plan to support older C++ compilers.
+
+We represent parsed strings in simdjson using the `std::string_view` class. It avoids
+the need to copy the data, as would be necessary with the `std::string` class. It also
+avoids the pitfalls of null-terminated C strings.
+
+The `std::string_view` class has become standard as part of C++17 but it is not always available
+on compilers which only support C++11. When we detect that `string_view` is natively
+available, we define the macro `SIMDJSON_HAS_STRING_VIEW`.
+
+When we detect that it is unavailable,
+we use [string-view-lite](https://github.com/martinmoene/string-view-lite) as a
+substitute. In such cases, we use the type alias `using string_view = nonstd::string_view;` to
+offer the same API, irrespective of the compiler and standard library. The macro
+`SIMDJSON_HAS_STRING_VIEW` will be *undefined* to indicate that we emulate `string_view`.
+
+
 C++17 Support
 -------------
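
Note on the `string_view` fallback described in the section added above: the selection between a native and an emulated `string_view` can be sketched roughly as follows. This is not simdjson's actual detection logic. The `DEMO_HAS_STRING_VIEW` macro and the `demo` namespace are hypothetical names, and testing `__cplusplus` is an assumption made for brevity (MSVC, for instance, only reports an accurate value when `/Zc:__cplusplus` is set). The fallback branch assumes the string-view-lite header `nonstd/string_view.hpp` is on the include path.

```cpp
#include <cstddef>

// Hypothetical detection: treat C++17 or later as providing std::string_view.
#if defined(__cplusplus) && __cplusplus >= 201703L
#include <string_view>
// Hypothetical macro playing the role that SIMDJSON_HAS_STRING_VIEW plays in the docs.
#define DEMO_HAS_STRING_VIEW 1
namespace demo {
using string_view = std::string_view;
}
#else
// Fallback: string-view-lite must be available on the include path.
#include "nonstd/string_view.hpp"
namespace demo {
using string_view = nonstd::string_view;
}
#endif

namespace demo {

// Compiles against either implementation, because both expose size().
inline std::size_t byte_length(string_view s) { return s.size(); }

// Reports which implementation was selected at compile time.
inline bool has_native_string_view() {
#ifdef DEMO_HAS_STRING_VIEW
  return true;
#else
  return false;
#endif
}

}  // namespace demo
```

User code written against `demo::string_view` does not need to know which branch was taken, which is the point of the alias described in the documentation change.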

include/simdjson/dom/element.h

Lines changed: 31 additions & 7 deletions
@@ -62,21 +62,42 @@ class element {
    */
   inline simdjson_result<object> get_object() const noexcept;
   /**
-   * Cast this element to a string.
+   * Cast this element to a null-terminated C string.
+   *
+   * The string is guaranteed to be valid UTF-8.
    *
-   * Equivalent to get<const char *>().
+   * The get_c_str() function is equivalent to get<const char *>().
+   *
+   * The length of the string is given by get_string_length(). Because JSON strings
+   * may contain null characters, it may be incorrect to use strlen to determine the
+   * string length.
    *
-   * @returns An pointer to a null-terminated string. This string is stored in the parser and will
+   * It is possible to get a single string_view instance which represents both the string
+   * content and its length: see get_string().
+   *
+   * @returns A pointer to a null-terminated UTF-8 string. This string is stored in the parser and will
    *          be invalidated the next time it parses a document or when it is destroyed.
    *          Returns INCORRECT_TYPE if the JSON element is not a string.
    */
   inline simdjson_result<const char *> get_c_str() const noexcept;
   /**
-   * Cast this element to a string.
+   * Gives the length in bytes of the string.
+   *
+   * It is possible to get a single string_view instance which represents both the string
+   * content and its length: see get_string().
+   *
+   * @returns A string length in bytes.
+   *          Returns INCORRECT_TYPE if the JSON element is not a string.
+   */
+  inline simdjson_result<size_t> get_string_length() const noexcept;
+  /**
+   * Cast this element to a string.
+   *
+   * The string is guaranteed to be valid UTF-8.
    *
    * Equivalent to get<std::string_view>().
    *
-   * @returns A string. The string is stored in the parser and will be invalidated the next time it
+   * @returns A UTF-8 string. The string is stored in the parser and will be invalidated the next time it
    *          parses a document or when it is destroyed.
    *          Returns INCORRECT_TYPE if the JSON element is not a string.
    */
@@ -253,7 +274,9 @@ class element {
   inline operator bool() const noexcept(false);

   /**
-   * Read this element as a null-terminated string.
+   * Read this element as a null-terminated UTF-8 string.
+   *
+   * Be mindful that JSON allows strings to contain null characters.
    *
    * Does *not* convert other types to a string; requires that the JSON type of the element was
    * an actual string.
@@ -264,7 +287,7 @@ class element {
   inline explicit operator const char*() const noexcept(false);

   /**
-   * Read this element as a null-terminated string.
+   * Read this element as a null-terminated UTF-8 string.
    *
    * Does *not* convert other types to a string; requires that the JSON type of the element was
    * an actual string.
@@ -464,6 +487,7 @@ struct simdjson_result<dom::element> : public internal::simdjson_result_base<dom
   really_inline simdjson_result<dom::array> get_array() const noexcept;
   really_inline simdjson_result<dom::object> get_object() const noexcept;
   really_inline simdjson_result<const char *> get_c_str() const noexcept;
+  really_inline simdjson_result<size_t> get_string_length() const noexcept;
   really_inline simdjson_result<std::string_view> get_string() const noexcept;
   really_inline simdjson_result<int64_t> get_int64() const noexcept;
   really_inline simdjson_result<uint64_t> get_uint64() const noexcept;
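
Note on why `get_string_length()` matters: the JSON string `"a\u0000b"` decodes to three bytes, one of which is a null character, so `strlen` on the C string would stop early. The pitfall can be sketched with the standard library alone. The buffer below is a hand-written stand-in for the parser's storage, no simdjson calls are made, and `decoded`, `decoded_length`, and the helper names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Stand-in for the bytes a parser might store for the JSON string "a\u0000b":
// 'a', NUL, 'b', followed by the terminating NUL.
static const char decoded[] = {'a', '\0', 'b', '\0'};
// Stand-in for the value a length query would report (content bytes only).
static const std::size_t decoded_length = 3;

// Length as seen by C string functions: counting stops at the first null byte.
inline std::size_t length_via_strlen(const char *s) {
  assert(s != nullptr);
  return std::strlen(s);
}

// Pairing the pointer with an explicit length keeps the full content.
inline std::string_view full_view(const char *s, std::size_t length) {
  assert(s != nullptr);
  return std::string_view(s, length);
}
```

Here `length_via_strlen(decoded)` sees only the bytes before the embedded null, while `full_view(decoded, decoded_length)` covers all three. With the API in this diff, the corresponding pairing would be the pointer from `get_c_str()` with the value from `get_string_length()`, or a single call to `get_string()`.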

include/simdjson/inline/element.h

Lines changed: 13 additions & 0 deletions
@@ -50,6 +50,10 @@ really_inline simdjson_result<const char *> simdjson_result<dom::element>::get_c
   if (error()) { return error(); }
   return first.get_c_str();
 }
+really_inline simdjson_result<size_t> simdjson_result<dom::element>::get_string_length() const noexcept {
+  if (error()) { return error(); }
+  return first.get_string_length();
+}
 really_inline simdjson_result<std::string_view> simdjson_result<dom::element>::get_string() const noexcept {
   if (error()) { return error(); }
   return first.get_string();
@@ -190,6 +194,15 @@ inline simdjson_result<const char *> element::get_c_str() const noexcept {
       return INCORRECT_TYPE;
   }
 }
+inline simdjson_result<size_t> element::get_string_length() const noexcept {
+  switch (tape.tape_ref_type()) {
+    case internal::tape_type::STRING: {
+      return tape.get_string_length();
+    }
+    default:
+      return INCORRECT_TYPE;
+  }
+}
 inline simdjson_result<std::string_view> element::get_string() const noexcept {
   switch (tape.tape_ref_type()) {
     case internal::tape_type::STRING:
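
Note on the pattern used by the two new `get_string_length()` definitions: the element-level function returns `INCORRECT_TYPE` unless the element is a string, and the `simdjson_result` overload either propagates an existing error or forwards to the wrapped element. A simplified, standard-library-only sketch of that pattern follows. Every type in the `demo` namespace is a hypothetical stand-in, not simdjson's definition.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace demo {

enum class error_code { SUCCESS, INCORRECT_TYPE };

// Minimal value-or-error pair, loosely modelled on the first/second layout seen in the diff.
template <typename T>
struct result {
  T first{};
  error_code second{error_code::SUCCESS};
  result(T value) : first(std::move(value)) {}
  result(error_code e) : second(e) {}
  error_code error() const { return second; }
};

// Stand-in for a DOM element: either a string or some other JSON type.
struct element {
  bool is_string;
  std::string text;

  // Type check first, then report the length in bytes.
  result<std::size_t> get_string_length() const {
    if (!is_string) { return error_code::INCORRECT_TYPE; }
    return text.size();
  }
};

// Forward-or-propagate: return the existing error if there is one,
// otherwise call the same operation on the wrapped element.
inline result<std::size_t> get_string_length(const result<element> &r) {
  if (r.error() != error_code::SUCCESS) { return r.error(); }
  return r.first.get_string_length();
}

}  // namespace demo
```

The benefit of this shape is that callers can chain lookups without checking each intermediate step: an error produced early simply flows through to the final result.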
