You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/cosmos-db/modeling-data.md
+36-36
Original file line number
Diff line number
Diff line change
@@ -1,7 +1,7 @@
1
1
---
2
2
title: Modeling data in Azure Cosmos DB
3
3
titleSuffix: Azure Cosmos DB
4
-
description: Learn about data modeling in NoSQL databases, differences between modeling data in a relational database and a document database.
4
+
description: Learn about data modeling in NoSQL databases, differences between modeling data in a relational database and an item database.
5
5
author: rimman
6
6
ms.service: cosmos-db
7
7
ms.topic: conceptual
@@ -25,7 +25,7 @@ After reading this article, you will be able to answer the following questions:
25
25
26
26
## Embedding data
27
27
28
-
When you start modeling data in Azure Cosmos DB try to treat your entities as **self-contained items** represented as JSON documents.
28
+
When you start modeling data in Azure Cosmos DB try to treat your entities as **self-contained items** represented as JSON items.
29
29
30
30
For comparison, let's first see how we might model data in a relational database. The following example shows how a person might be stored in a relational database.
31
31
@@ -64,7 +64,7 @@ Now let's take a look at how we would model the same data as a self-contained en
64
64
]
65
65
}
66
66
67
-
Using the approach above we have **denormalized** the person record, by **embedding** all the information related to this person, such as their contact details and addresses, into a *single JSON*document.
67
+
Using the approach above we have **denormalized** the person record, by **embedding** all the information related to this person, such as their contact details and addresses, into a *single JSON*item.
68
68
In addition, because we're not confined to a fixed schema we have the flexibility to do things like having contact details of different shapes entirely.
69
69
70
70
Retrieving a complete person record from the database is now a **single read operation** against a single container and for a single item. Updating a person record, with their contact details and addresses, is also a **single write operation** against a single item.
@@ -165,19 +165,19 @@ Take this JSON snippet.
165
165
]
166
166
}
167
167
168
-
This could represent a person's stock portfolio. We have chosen to embed the stock information into each portfolio document. In an environment where related data is changing frequently, like a stock trading application, embedding data that changes frequently is going to mean that you are constantly updating each portfolio document every time a stock is traded.
168
+
This could represent a person's stock portfolio. We have chosen to embed the stock information into each portfolio item. In an environment where related data is changing frequently, like a stock trading application, embedding data that changes frequently is going to mean that you are constantly updating each portfolio item every time a stock is traded.
169
169
170
-
Stock *zaza* may be traded many hundreds of times in a single day and thousands of users could have *zaza* on their portfolio. With a data model like the above we would have to update many thousands of portfolio documents many times every day leading to a system that won't scale well.
170
+
Stock *zaza* may be traded many hundreds of times in a single day and thousands of users could have *zaza* on their portfolio. With a data model like the above we would have to update many thousands of portfolio items many times every day leading to a system that won't scale well.
171
171
172
172
## Referencing data
173
173
174
174
Embedding data works nicely for many cases but there are scenarios when denormalizing your data will cause more problems than it is worth. So what do we do now?
175
175
176
-
Relational databases are not the only place where you can create relationships between entities. In a document database, you can have information in one document that relates to data in other documents. We do not recommend building systems that would be better suited to a relational database in Azure Cosmos DB, or any other document database, but simple relationships are fine and can be useful.
176
+
Relational databases are not the only place where you can create relationships between entities. In an item database, you can have information in one item that relates to data in other items. We do not recommend building systems that would be better suited to a relational database in Azure Cosmos DB, or any other item database, but simple relationships are fine and can be useful.
177
177
178
-
In the JSON below we chose to use the example of a stock portfolio from earlier but this time we refer to the stock item on the portfolio instead of embedding it. This way, when the stock item changes frequently throughout the day the only document that needs to be updated is the single stock document.
178
+
In the JSON below we chose to use the example of a stock portfolio from earlier but this time we refer to the stock item on the portfolio instead of embedding it. This way, when the stock item changes frequently throughout the day the only item that needs to be updated is the single stock item.
179
179
180
-
Person document:
180
+
Person item:
181
181
{
182
182
"id": "1",
183
183
"firstName": "Thomas",
@@ -188,7 +188,7 @@ In the JSON below we chose to use the example of a stock portfolio from earlier
188
188
]
189
189
}
190
190
191
-
Stock documents:
191
+
Stock items:
192
192
{
193
193
"id": "1",
194
194
"symbol": "zaza",
@@ -210,14 +210,14 @@ In the JSON below we chose to use the example of a stock portfolio from earlier
210
210
"pe": 75.82
211
211
}
212
212
213
-
An immediate downside to this approach though is if your application is required to show information about each stock that is held when displaying a person's portfolio; in this case you would need to make multiple trips to the database to load the information for each stock document. Here we've made a decision to improve the efficiency of write operations, which happen frequently throughout the day, but in turn compromised on the read operations that potentially have less impact on the performance of this particular system.
213
+
An immediate downside to this approach though is if your application is required to show information about each stock that is held when displaying a person's portfolio; in this case you would need to make multiple trips to the database to load the information for each stock item. Here we've made a decision to improve the efficiency of write operations, which happen frequently throughout the day, but in turn compromised on the read operations that potentially have less impact on the performance of this particular system.
214
214
215
215
> [!NOTE]
216
216
> Normalized data models **can require more round trips** to the server.
217
217
218
218
### What about foreign keys?
219
219
220
-
Because there is currently no concept of a constraint, foreign-key or otherwise, any inter-document relationships that you have in documents are effectively "weak links" and will not be verified by the database itself. If you want to ensure that the data a document is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.
220
+
Because there is currently no concept of a constraint, foreign-key or otherwise, any inter-item relationships that you have in items are effectively "weak links" and will not be verified by the database itself. If you want to ensure that the data an item is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.
221
221
222
222
### When to reference
223
223
@@ -233,18 +233,18 @@ In general, use normalized data models when:
233
233
234
234
### Where do I put the relationship?
235
235
236
-
The growth of the relationship will help determine in which document to store the reference.
236
+
The growth of the relationship will help determine in which item to store the reference.
237
237
238
238
If we look at the JSON below that models publishers and books.
239
239
240
-
Publisher document:
240
+
Publisher item:
241
241
{
242
242
"id": "mspress",
243
243
"name": "Microsoft Press",
244
244
"books": [ 1, 2, 3, ..., 100, ..., 1000]
245
245
}
246
246
247
-
Book documents:
247
+
Book items:
248
248
{"id": "1", "name": "Azure Cosmos DB 101" }
249
249
{"id": "2", "name": "Azure Cosmos DB for RDBMS Users" }
250
250
{"id": "3", "name": "Taking over the world one JSON doc at a time" }
@@ -253,17 +253,17 @@ If we look at the JSON below that models publishers and books.
253
253
...
254
254
{"id": "1000", "name": "Deep Dive into Azure Cosmos DB" }
255
255
256
-
If the number of the books per publisher is small with limited growth, then storing the book reference inside the publisher document may be useful. However, if the number of books per publisher is unbounded, then this data model would lead to mutable, growing arrays, as in the example publisher document above.
256
+
If the number of the books per publisher is small with limited growth, then storing the book reference inside the publisher item may be useful. However, if the number of books per publisher is unbounded, then this data model would lead to mutable, growing arrays, as in the example publisher item above.
257
257
258
258
Switching things around a bit would result in a model that still represents the same data but now avoids these large mutable collections.
259
259
260
-
Publisher document:
260
+
Publisher item:
261
261
{
262
262
"id": "mspress",
263
263
"name": "Microsoft Press"
264
264
}
265
265
266
-
Book documents:
266
+
Book items:
267
267
{"id": "1","name": "Azure Cosmos DB 101", "pub-id": "mspress"}
268
268
{"id": "2","name": "Azure Cosmos DB for RDBMS Users", "pub-id": "mspress"}
269
269
{"id": "3","name": "Taking over the world one JSON doc at a time"}
@@ -272,49 +272,49 @@ Switching things around a bit would result in a model that still represents the
272
272
...
273
273
{"id": "1000","name": "Deep Dive into Azure Cosmos DB", "pub-id": "mspress"}
274
274
275
-
In the above example, we have dropped the unbounded collection on the publisher document. Instead we just have a reference to the publisher on each book document.
275
+
In the above example, we have dropped the unbounded collection on the publisher item. Instead we just have a reference to the publisher on each book item.
276
276
277
277
### How do I model many:many relationships?
278
278
279
279
In a relational database *many:many* relationships are often modeled with join tables, which just join records from other tables together.
You might be tempted to replicate the same thing using documents and produce a data model that looks similar to the following.
283
+
You might be tempted to replicate the same thing using items and produce a data model that looks similar to the following.
284
284
285
-
Author documents:
285
+
Author items:
286
286
{"id": "a1", "name": "Thomas Andersen" }
287
287
{"id": "a2", "name": "William Wakefield" }
288
288
289
-
Book documents:
289
+
Book items:
290
290
{"id": "b1", "name": "Azure Cosmos DB 101" }
291
291
{"id": "b2", "name": "Azure Cosmos DB for RDBMS Users" }
292
292
{"id": "b3", "name": "Taking over the world one JSON doc at a time" }
293
293
{"id": "b4", "name": "Learn about Azure Cosmos DB" }
294
294
{"id": "b5", "name": "Deep Dive into Azure Cosmos DB" }
295
295
296
-
Joining documents:
296
+
Joining items:
297
297
{"authorId": "a1", "bookId": "b1" }
298
298
{"authorId": "a2", "bookId": "b1" }
299
299
{"authorId": "a1", "bookId": "b2" }
300
300
{"authorId": "a1", "bookId": "b3" }
301
301
302
-
This would work. However, loading either an author with their books, or loading a book with its author, would always require at least two additional queries against the database. One query to the joining document and then another query to fetch the actual document being joined.
302
+
This would work. However, loading either an author with their books, or loading a book with its author, would always require at least two additional queries against the database. One query to the joining item and then another query to fetch the actual item being joined.
303
303
304
304
If all this join table is doing is gluing together two pieces of data, then why not drop it completely?
Now, if I had an author, I immediately know which books they have written, and conversely if I had a book document loaded I would know the IDs of the author(s). This saves that intermediary query against the join table reducing the number of server round trips your application has to make.
317
+
Now, if I had an author, I immediately know which books they have written, and conversely if I had a book item loaded I would know the IDs of the author(s). This saves that intermediary query against the join table reducing the number of server round trips your application has to make.
318
318
319
319
## Hybrid data models
320
320
@@ -326,7 +326,7 @@ Based on your application's specific usage patterns and workloads there may be c
326
326
327
327
Consider the following JSON.
328
328
329
-
Author documents:
329
+
Author items:
330
330
{
331
331
"id": "a1",
332
332
"firstName": "Thomas",
@@ -350,7 +350,7 @@ Consider the following JSON.
350
350
]
351
351
}
352
352
353
-
Book documents:
353
+
Book items:
354
354
{
355
355
"id": "b1",
356
356
"name": "Azure Cosmos DB 101",
@@ -367,29 +367,29 @@ Consider the following JSON.
367
367
]
368
368
}
369
369
370
-
Here we've (mostly) followed the embedded model, where data from other entities are embedded in the top-level document, but other data is referenced.
370
+
Here we've (mostly) followed the embedded model, where data from other entities are embedded in the top-level item, but other data is referenced.
371
371
372
-
If you look at the book document, we can see a few interesting fields when we look at the array of authors. There is an `id` field that is the field we use to refer back to an author document, standard practice in a normalized model, but then we also have `name` and `thumbnailUrl`. We could have stuck with `id` and left the application to get any additional information it needed from the respective author document using the "link", but because our application displays the author's name and a thumbnail picture with every book displayed we can save a round trip to the server per book in a list by denormalizing **some** data from the author.
372
+
If you look at the book item, we can see a few interesting fields when we look at the array of authors. There is an `id` field that is the field we use to refer back to an author item, standard practice in a normalized model, but then we also have `name` and `thumbnailUrl`. We could have stuck with `id` and left the application to get any additional information it needed from the respective author item using the "link", but because our application displays the author's name and a thumbnail picture with every book displayed we can save a round trip to the server per book in a list by denormalizing **some** data from the author.
373
373
374
374
Sure, if the author's name changed or they wanted to update their photo we'd have to go and update every book they ever published but for our application, based on the assumption that authors don't change their names often, this is an acceptable design decision.
375
375
376
-
In the example, there are **pre-calculated aggregates** values to save expensive processing on a read operation. In the example, some of the data embedded in the author document is data that is calculated at run-time. Every time a new book is published, a book document is created **and** the countOfBooks field is set to a calculated value based on the number of book documents that exist for a particular author. This optimization would be good in read heavy systems where we can afford to do computations on writes in order to optimize reads.
376
+
In the example, there are **pre-calculated aggregates** values to save expensive processing on a read operation. In the example, some of the data embedded in the author item is data that is calculated at run-time. Every time a new book is published, a book item is created **and** the countOfBooks field is set to a calculated value based on the number of book items that exist for a particular author. This optimization would be good in read heavy systems where we can afford to do computations on writes in order to optimize reads.
377
377
378
-
The ability to have a model with pre-calculated fields is made possible because Azure Cosmos DB supports **multi-document transactions**. Many NoSQL stores cannot do transactions across documents and therefore advocate design decisions, such as "always embed everything", due to this limitation. With Azure Cosmos DB, you can use server-side triggers, or stored procedures, that insert books and update authors all within an ACID transaction. Now you don't **have** to embed everything into one document just to be sure that your data remains consistent.
378
+
The ability to have a model with pre-calculated fields is made possible because Azure Cosmos DB supports **multi-item transactions**. Many NoSQL stores cannot do transactions across items and therefore advocate design decisions, such as "always embed everything", due to this limitation. With Azure Cosmos DB, you can use server-side triggers, or stored procedures, that insert books and update authors all within an ACID transaction. Now you don't **have** to embed everything into one item just to be sure that your data remains consistent.
379
379
380
-
## Distinguishing between different document types
380
+
## Distinguishing between different item types
381
381
382
-
In some scenarios, you may want to mix different document types in the same collection; this is usually the case when you want multiple, related documents to sit in the same [partition](partitioning-overview.md). For example, you could put both books and book reviews in the same collection and partition it by `bookId`. In such situation, you usually want to add to your documents with a field that identifies their type in order to differentiate them.
382
+
In some scenarios, you may want to mix different item types in the same collection; this is usually the case when you want multiple, related items to sit in the same [partition](partitioning-overview.md). For example, you could put both books and book reviews in the same collection and partition it by `bookId`. In such situation, you usually want to add to your items with a field that identifies their type in order to differentiate them.
0 commit comments