TDA357-L11-JSON2
{"title": "Filesystem",
"description": "A system for the organization of files",
"type": "object" }
Why use a Schema?
• We use a schema to regain some structure, even though we're using a
semi-structured model.
• The schema tells us what to expect from the document, such as
which parts are optional and which are required, and the general
structure.
• Allows us to validate (at any time!) data coming from outside sources,
such as user data, or external API data.
• Validation takes a schema and a document and determines whether the
document fulfills the requirements of the schema.
JSON Schemas
• Each JSON schema is itself a JSON object (or in some cases, a Boolean)
• The structure of a schema is highly recursive, containing lots of sub-schemas
• We use special "keywords" as object keys, and the value of each keyword tells
us something about the schema.
• The empty object `{}` (and true) accepts every JSON document as valid.
• Conversely, the schema false rejects all documents, no matter what.
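The trivial schemas above can be sketched in a few lines of Python. This is only an illustration: the function name `accepts` is made up for this example, and it handles nothing beyond `true`, `false`, and the empty object.

```python
def accepts(schema, document):
    """Handle only the trivial schemas: true, false, and the empty object."""
    if schema is True or schema == {}:
        return True   # accepts every document
    if schema is False:
        return False  # rejects every document
    raise NotImplementedError("only trivial schemas are handled in this sketch")

print(accepts({}, {"anything": [1, 2, 3]}))  # True
print(accepts(True, "a string"))             # True
print(accepts(False, 42))                    # False
```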
Example of a schema
• The following schema says (informally) that every branch has a name
and a program:
{"type": "object", "title": "Branch",
"properties": {"name": {"type": "string"},
"program": {"type": "string"}},
"required": ["name", "program"]}
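To make "validation" concrete, here is a minimal sketch of a validator that understands only the keywords used in the Branch schema. The function `is_valid` and the sample documents are assumptions for illustration; a real validator covers far more keywords.

```python
import json

BRANCH_SCHEMA = json.loads("""
{"type": "object", "title": "Branch",
 "properties": {"name": {"type": "string"},
                "program": {"type": "string"}},
 "required": ["name", "program"]}
""")

def is_valid(schema, doc):
    """Check only: type (object/string), properties, required."""
    expected = schema.get("type")
    if expected == "object" and not isinstance(doc, dict):
        return False
    if expected == "string" and not isinstance(doc, str):
        return False
    if isinstance(doc, dict):
        # Every key listed in "required" must be present.
        if any(key not in doc for key in schema.get("required", [])):
            return False
        # Each present property must satisfy its sub-schema (recursion).
        for key, sub_schema in schema.get("properties", {}).items():
            if key in doc and not is_valid(sub_schema, doc[key]):
                return False
    return True

print(is_valid(BRANCH_SCHEMA, {"name": "B1", "program": "P1"}))  # True
print(is_valid(BRANCH_SCHEMA, {"name": "B1"}))                   # False, program missing
print(is_valid(BRANCH_SCHEMA, {"name": 5, "program": "P1"}))     # False, name not a string
```

Note that title is never consulted: it is an annotation and plays no role in validation.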
• title and description are annotations that are used to identify the
schema in question, but are not used for validation. Example:
Schema: {"title": "Character",
"description": "A Lord of the Rings character"}
Valid: everything
Invalid: nothing
Provides documentation for the schema
type restricts the type of the JSON value, and can be any of
array, boolean, integer, null, number, object, or string.
Example:
Schema: {"type": "number"}
Valid: 1
2
5.9
6.022e+10
…
Invalid: "a"
true
{"as": "hey"}
[2]
…
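A rough sketch of how the type keyword maps onto values decoded by Python's `json` module. The helper `matches_type` is hypothetical. Treating `1.0` as an integer follows my reading of recent JSON Schema drafts and should be taken as an assumption.

```python
def matches_type(type_name, value):
    """Return True if a decoded JSON value has the given JSON Schema type."""
    # bool is a subclass of int in Python, so exclude it from the numeric types.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "integer":
        # Assumption: a number with zero fractional part counts as an integer.
        return is_number and (isinstance(value, int) or value.is_integer())
    if type_name == "null":
        return value is None
    if type_name == "number":
        return is_number
    if type_name == "object":
        return isinstance(value, dict)
    if type_name == "string":
        return isinstance(value, str)
    raise ValueError(f"unknown type: {type_name}")

valid = [1, 2, 5.9, 6.022e+10]
invalid = ["a", True, {"as": "hey"}, [2]]
print(all(matches_type("number", v) for v in valid))    # True
print(any(matches_type("number", v) for v in invalid))  # False
```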
enum accepts only a specified list of values
Example:
Schema: {"type": "string","enum": ["u", "3", "4","5"]}
Valid: "u"
"3"
"4"
"5"
Invalid: 3
4
"uu"
…
Note: Specifying type here is a bit redundant, but good practice.
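The enum example can be mirrored in Python as follows. The name `valid_grade` is made up for illustration, and the sketch handles only this one schema.

```python
import json

schema = json.loads('{"type": "string", "enum": ["u", "3", "4", "5"]}')

def valid_grade(value):
    """Accept only strings that appear in the enum list."""
    return isinstance(value, str) and value in schema["enum"]

print([valid_grade(v) for v in ["u", "3", "4", "5"]])  # [True, True, True, True]
print([valid_grade(v) for v in [3, 4, "uu"]])          # [False, False, False]

# The membership test alone already rejects the number 3, since 3 != "3".
print(3 in schema["enum"])  # False
```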
minimum and maximum are specific to numbers, and specify the minimum and
maximum (inclusive) bounds for the number. Example:
Schema: {"type": "integer","minimum": 1,"maximum": 6}
Valid: 1
2
3
4
5
6
Invalid: 0
7
100
"asd"
{"number": 5}
…
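A minimal sketch of the same check in Python, assuming the document has already been decoded with `json.loads`. The name `valid_die_roll` is hypothetical.

```python
def valid_die_roll(value):
    """Mirror {"type": "integer", "minimum": 1, "maximum": 6}."""
    # Exclude bool explicitly, since True and False are ints in Python.
    is_integer = isinstance(value, int) and not isinstance(value, bool)
    return is_integer and 1 <= value <= 6  # both bounds are inclusive

print([valid_die_roll(v) for v in [1, 6]])                            # [True, True]
print([valid_die_roll(v) for v in [0, 7, 100, "asd", {"number": 5}]])
# [False, False, False, False, False]
```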
Strings
minLength and maxLength are specific to strings, and specify the
minimum and maximum length of the string. Example:
Schema: {"type": "string","minLength": 10,"maxLength": 10}
Valid: "abde284320"
"1234567890"
…
Invalid: "123"
"1asd"
25
{"idnr": "1234567890"}
…
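Setting minLength equal to maxLength forces an exact length. A small Python sketch of this schema follows. The name `valid_idnr` is an assumption. Python's `len` counts characters (code points), which, as far as I know, matches how JSON Schema defines string length.

```python
def valid_idnr(value):
    """Mirror {"type": "string", "minLength": 10, "maxLength": 10}."""
    return isinstance(value, str) and 10 <= len(value) <= 10

print([valid_idnr(v) for v in ["abde284320", "1234567890"]])  # [True, True]
print([valid_idnr(v) for v in ["123", "1asd", 25, {"idnr": "1234567890"}]])
# [False, False, False, False]
```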
properties is used to define schemas for the properties of objects. Example:
Schema: {"type": "object",
"properties": {"name": {"type": "string"},
"age": {"type":"integer"}}}
• '$' is the root object, which we usually start our expressions with. Example:
'$'
[{"name": "/", "contents": […]}]
• '[]' is the subscript operator, which is used to access elements in arrays. Example:
'$.contents[1].contents[0].name'
["file2"]
'$.contents[2].contents[0].contents[0].size'
[400]
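For readers more used to programming languages than path expressions, the two paths above correspond to ordinary indexing on the decoded document. This is only an analogy in Python, not how Postgres evaluates paths. The variable name `filesystem` is made up for this sketch.

```python
import json

filesystem = json.loads("""
{"name": "/", "contents": [
  {"name": "file1", "filetype": "txt", "size": 100},
  {"name": "a/", "contents": [
    {"name": "file2", "filetype": "jpg", "size": 200},
    {"name": "file3", "filetype": "mp4", "size": 600},
    {"name": "file4", "filetype": "png", "size": 300}]},
  {"name": "b/", "contents": [
    {"name": "c/", "contents": [
      {"name": "file5", "filetype": "jpg", "size": 400}]}]}]}
""")

# '$' is the whole document.
root = filesystem

# '$.contents[1].contents[0].name'
name = filesystem["contents"][1]["contents"][0]["name"]

# '$.contents[2].contents[0].contents[0].size'
size = filesystem["contents"][2]["contents"][0]["contents"][0]["size"]

print(name, size)  # file2 400
```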
Files:
/file1.txt
/a/file2.jpg
/a/file3.mp4
/a/file4.png
/b/c/file5.jpg
Data:
{"name": "/", "contents": [
  {"name": "file1", "filetype": "txt", "size": 100},
  {"name": "a/", "contents": [
    {"name": "file2", "filetype": "jpg", "size": 200},
    {"name": "file3", "filetype": "mp4", "size": 600},
    {"name": "file4", "filetype": "png", "size": 300}]},
  {"name": "b/", "contents": [
    {"name": "c/", "contents": [
      {"name": "file5", "filetype": "jpg", "size": 400}]}]}]}
• '*' is the wildcard operator, which returns every value in the current object or array. Example:
'$.*'
["/", [{"name": "file1", "filetype": "txt", "size": 100}, …]]
Note: 2 results!
'$.contents[1].*'
["a/", [{"name": "file2",…}, {"name": "file3",…}, {"name": "file4",…}]]
'$.contents[*]'
[{"name": "file1",…}, {"name": "a/",…}, {"name": "b/",…}]
Note: 3 results!
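As a rough Python analogy (not Postgres's implementation), `.*` corresponds to taking all values of a dictionary and `[*]` to taking all elements of a list. The variable names are made up for this sketch.

```python
import json

filesystem = json.loads("""
{"name": "/", "contents": [
  {"name": "file1", "filetype": "txt", "size": 100},
  {"name": "a/", "contents": [
    {"name": "file2", "filetype": "jpg", "size": 200},
    {"name": "file3", "filetype": "mp4", "size": 600},
    {"name": "file4", "filetype": "png", "size": 300}]},
  {"name": "b/", "contents": [
    {"name": "c/", "contents": [
      {"name": "file5", "filetype": "jpg", "size": 400}]}]}]}
""")

top_level = list(filesystem.values())            # '$.*'
in_a = list(filesystem["contents"][1].values())  # '$.contents[1].*'
entries = list(filesystem["contents"])           # '$.contents[*]'

print(len(top_level), len(in_a), len(entries))  # 2 2 3
print([entry["name"] for entry in entries])     # ['file1', 'a/', 'b/']
```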
• '**' is the recursive descent operator, which is a wildcard for a whole path (not
just a single key)
'$.**'
All 32 values from the data! 9 objects (includes original), 4 arrays, 14 strings, 5 numbers
'strict $.contents[1].**.name'
["a/", "file2", "file3", "file4"]
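The counts above can be checked with a short recursive walk in Python. This is a sketch of the idea behind recursive descent, not Postgres's implementation. The generator `descend` is a hypothetical helper.

```python
import json

filesystem = json.loads("""
{"name": "/", "contents": [
  {"name": "file1", "filetype": "txt", "size": 100},
  {"name": "a/", "contents": [
    {"name": "file2", "filetype": "jpg", "size": 200},
    {"name": "file3", "filetype": "mp4", "size": 600},
    {"name": "file4", "filetype": "png", "size": 300}]},
  {"name": "b/", "contents": [
    {"name": "c/", "contents": [
      {"name": "file5", "filetype": "jpg", "size": 400}]}]}]}
""")

def descend(value):
    """Yield the value itself, then every value nested inside it."""
    yield value
    if isinstance(value, dict):
        for child in value.values():
            yield from descend(child)
    elif isinstance(value, list):
        for child in value:
            yield from descend(child)

everything = list(descend(filesystem))  # analogous to '$.**'
counts = {
    "objects": sum(isinstance(v, dict) for v in everything),
    "arrays": sum(isinstance(v, list) for v in everything),
    "strings": sum(isinstance(v, str) for v in everything),
    "numbers": sum(isinstance(v, int) for v in everything),
}
print(len(everything), counts)

# Analogous to 'strict $.contents[1].**.name'
names = [v["name"] for v in descend(filesystem["contents"][1])
         if isinstance(v, dict) and "name" in v]
print(names)  # ['a/', 'file2', 'file3', 'file4']
```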
Strict and lax mode in Postgres Paths
• Postgres has two modes for JSON Paths, lax (default) and strict.
• Strict gives errors instead of pruning branches for things like $.*.x if
some objects are missing x (you usually don't want this)
• However, when using the recursive descent operator (**), lax behaves
very oddly (giving duplicate values) and strict does exactly what I
would expect lax to do (never giving errors)
• Workaround: Add strict before $ in paths involving **
• On the exam you don't need to specify this
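The duplicates in lax mode come from arrays being unwrapped automatically before a key lookup: `**` yields both an array and each of its elements, so every element gets looked up twice. The following Python sketch is a simplified model of that behaviour, not Postgres's code. All function names and the variable `dir_a` are assumptions for illustration.

```python
import json

dir_a = json.loads("""
{"name": "a/", "contents": [
  {"name": "file2", "filetype": "jpg", "size": 200},
  {"name": "file3", "filetype": "mp4", "size": 600},
  {"name": "file4", "filetype": "png", "size": 300}]}
""")

def descend(value):
    """Yield the value itself, then every value nested inside it ('**')."""
    yield value
    children = value.values() if isinstance(value, dict) else value
    if isinstance(value, (dict, list)):
        for child in children:
            yield from descend(child)

def member_strict(values, key):
    """'.key' after '**' in strict mode: anything without the key is skipped."""
    for value in values:
        if isinstance(value, dict) and key in value:
            yield value[key]

def member_lax(values, key):
    """'.key' in lax mode: arrays are unwrapped one level before the lookup."""
    for value in values:
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict) and key in item:
                yield item[key]

strict_names = list(member_strict(descend(dir_a), "name"))
lax_names = list(member_lax(descend(dir_a), "name"))
print(strict_names)  # ['a/', 'file2', 'file3', 'file4']
print(lax_names)     # file2, file3 and file4 each appear twice
```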
{"category":"Salads",
"contents":[
{"dish":"Caesar", "price":8.50},
{"dish":"Chicken", "price":9.25}]}
{"category":"Burgers",
"contents":[
{"dish":"Standard", "price":9},
{"dish":"Bacon", "price":10},
{"category":"Vegetarian Burgers",
"contents":[
{"dish":"Haloumi", "price":13},
{"dish":"Mushroom", "price":10}]}]}
'$[*]', to operate on each of the elements
Narrows down into a single result
Now, we have the right category.
And since we're in Postgres, we can do fun stuff like aggregate and sum up the numbers!
… but we need to do an explicit type cast, since the resulting numbers are still jsonb values.