Commit ddab8d1

Merge pull request MicrosoftDocs#85781 from SnehaGunda/master
Addressing TOC and modeling data doc feedback
2 parents a46f82f + b62e36d commit ddab8d1

File tree

3 files changed (+46 -45 lines)


articles/cosmos-db/TOC.yml (+8 -8)

@@ -179,15 +179,21 @@
       - name: SQL query execution
         displayName: query performance, query execution
         href: sql-query-execution.md
+      - name: Data types
+        items:
+          - name: DateTime
+            href: working-with-dates.md
+          - name: Geospatial
+            href: geospatial.md
       - name: Work with Azure Cosmos account
         href: account-overview.md
       - name: Containers and items
         items:
           - name: Work with databases, containers, and items
             displayName: collection, document
             href: databases-containers-items.md
-          - name: Model document data
-            displayName: document
+          - name: Modeling data
+            displayName: model document, model json
             href: modeling-data.md
       - name: Indexing
         items:
@@ -197,12 +203,6 @@
           - name: Indexing policies
             displayName: consistent, none
             href: index-policy.md
-      - name: Data types
-        items:
-          - name: DateTime
-            href: working-with-dates.md
-          - name: Geospatial
-            href: geospatial.md
       - name: Time to live
         displayName: delete, delete data, ttl
         href: time-to-live.md

articles/cosmos-db/modeling-data.md (+36 -36)
@@ -1,7 +1,7 @@
 ---
 title: Modeling data in Azure Cosmos DB
 titleSuffix: Azure Cosmos DB
-description: Learn about data modeling in NoSQL databases, differences between modeling data in a relational database and a document database.
+description: Learn about data modeling in NoSQL databases, differences between modeling data in a relational database and an item database.
 author: rimman
 ms.service: cosmos-db
 ms.topic: conceptual
@@ -25,7 +25,7 @@ After reading this article, you will be able to answer the following questions:
 
 ## Embedding data
 
-When you start modeling data in Azure Cosmos DB try to treat your entities as **self-contained items** represented as JSON documents.
+When you start modeling data in Azure Cosmos DB try to treat your entities as **self-contained items** represented as JSON items.
 
 For comparison, let's first see how we might model data in a relational database. The following example shows how a person might be stored in a relational database.
 
@@ -64,7 +64,7 @@ Now let's take a look at how we would model the same data as a self-contained en
     ]
 }
 
-Using the approach above we have **denormalized** the person record, by **embedding** all the information related to this person, such as their contact details and addresses, into a *single JSON* document.
+Using the approach above we have **denormalized** the person record, by **embedding** all the information related to this person, such as their contact details and addresses, into a *single JSON* item.
 In addition, because we're not confined to a fixed schema we have the flexibility to do things like having contact details of different shapes entirely.
 
 Retrieving a complete person record from the database is now a **single read operation** against a single container and for a single item. Updating a person record, with their contact details and addresses, is also a **single write operation** against a single item.
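The single-read property described in this hunk can be sketched in a few lines of plain Python. This is an illustration, not the Cosmos DB SDK: the `container` dict stands in for a real container, and the field values are illustrative echoes of the article's person example.

```python
# Illustrative sketch of embedding: one self-contained person item,
# so a single point read returns the person with contacts and addresses.
# The dict below stands in for a real Cosmos container.

person = {
    "id": "1",
    "firstName": "Thomas",
    "lastName": "Andersen",
    "addresses": [
        {"line1": "100 Some Street", "city": "Seattle", "state": "WA"},
    ],
    "contactDetails": [
        {"email": "thomas@example.com"},   # illustrative value
        {"phone": "+1 555 555-5555"},      # illustrative value
    ],
}

container = {person["id"]: person}  # id -> item

def read_item(item_id):
    """One read operation yields the whole entity, no joins needed."""
    return container[item_id]

record = read_item("1")
print(len(record["addresses"]), len(record["contactDetails"]))
```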
@@ -165,19 +165,19 @@ Take this JSON snippet.
     ]
 }
 
-This could represent a person's stock portfolio. We have chosen to embed the stock information into each portfolio document. In an environment where related data is changing frequently, like a stock trading application, embedding data that changes frequently is going to mean that you are constantly updating each portfolio document every time a stock is traded.
+This could represent a person's stock portfolio. We have chosen to embed the stock information into each portfolio item. In an environment where related data is changing frequently, like a stock trading application, embedding data that changes frequently is going to mean that you are constantly updating each portfolio item every time a stock is traded.
 
-Stock *zaza* may be traded many hundreds of times in a single day and thousands of users could have *zaza* on their portfolio. With a data model like the above we would have to update many thousands of portfolio documents many times every day leading to a system that won't scale well.
+Stock *zaza* may be traded many hundreds of times in a single day and thousands of users could have *zaza* on their portfolio. With a data model like the above we would have to update many thousands of portfolio items many times every day leading to a system that won't scale well.
 
 ## Referencing data
 
 Embedding data works nicely for many cases but there are scenarios when denormalizing your data will cause more problems than it is worth. So what do we do now?
 
-Relational databases are not the only place where you can create relationships between entities. In a document database, you can have information in one document that relates to data in other documents. We do not recommend building systems that would be better suited to a relational database in Azure Cosmos DB, or any other document database, but simple relationships are fine and can be useful.
+Relational databases are not the only place where you can create relationships between entities. In an item database, you can have information in one item that relates to data in other items. We do not recommend building systems that would be better suited to a relational database in Azure Cosmos DB, or any other item database, but simple relationships are fine and can be useful.
 
-In the JSON below we chose to use the example of a stock portfolio from earlier but this time we refer to the stock item on the portfolio instead of embedding it. This way, when the stock item changes frequently throughout the day the only document that needs to be updated is the single stock document.
+In the JSON below we chose to use the example of a stock portfolio from earlier but this time we refer to the stock item on the portfolio instead of embedding it. This way, when the stock item changes frequently throughout the day the only item that needs to be updated is the single stock item.
 
-Person document:
+Person item:
 {
     "id": "1",
     "firstName": "Thomas",
@@ -188,7 +188,7 @@ In the JSON below we chose to use the example of a stock portfolio from earlier
     ]
 }
 
-Stock documents:
+Stock items:
 {
     "id": "1",
     "symbol": "zaza",
@@ -210,14 +210,14 @@
     "pe": 75.82
 }
 
-An immediate downside to this approach though is if your application is required to show information about each stock that is held when displaying a person's portfolio; in this case you would need to make multiple trips to the database to load the information for each stock document. Here we've made a decision to improve the efficiency of write operations, which happen frequently throughout the day, but in turn compromised on the read operations that potentially have less impact on the performance of this particular system.
+An immediate downside to this approach though is if your application is required to show information about each stock that is held when displaying a person's portfolio; in this case you would need to make multiple trips to the database to load the information for each stock item. Here we've made a decision to improve the efficiency of write operations, which happen frequently throughout the day, but in turn compromised on the read operations that potentially have less impact on the performance of this particular system.
 
 > [!NOTE]
 > Normalized data models **can require more round trips** to the server.
 
 ### What about foreign keys?
 
-Because there is currently no concept of a constraint, foreign-key or otherwise, any inter-document relationships that you have in documents are effectively "weak links" and will not be verified by the database itself. If you want to ensure that the data a document is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.
+Because there is currently no concept of a constraint, foreign-key or otherwise, any inter-item relationships that you have in items are effectively "weak links" and will not be verified by the database itself. If you want to ensure that the data an item is referring to actually exists, then you need to do this in your application, or through the use of server-side triggers or stored procedures on Azure Cosmos DB.
 
 ### When to reference
 
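Because the hunk above notes that references are "weak links" the database will not verify, the check belongs in application code (or a server-side stored procedure). A hypothetical sketch in plain Python, with made-up helper and store names:

```python
# Referential integrity is the application's job: before writing a
# portfolio that references stock symbols, confirm each referenced
# stock item exists. Names and shapes here are illustrative.

stocks = {
    "zaza": {"id": "1", "symbol": "zaza"},
    "wxyz": {"id": "2", "symbol": "wxyz"},
}

def missing_references(portfolio, stock_store):
    """Return referenced symbols with no backing stock item."""
    return [h["symbol"] for h in portfolio["holdings"]
            if h["symbol"] not in stock_store]

portfolio = {"id": "p1", "holdings": [{"symbol": "zaza"}, {"symbol": "gone"}]}
missing = missing_references(portfolio, stocks)
print(missing)  # a dangling "weak link" the database itself would not catch
```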
@@ -233,18 +233,18 @@ In general, use normalized data models when:
 
 ### Where do I put the relationship?
 
-The growth of the relationship will help determine in which document to store the reference.
+The growth of the relationship will help determine in which item to store the reference.
 
 If we look at the JSON below that models publishers and books.
 
-Publisher document:
+Publisher item:
 {
     "id": "mspress",
     "name": "Microsoft Press",
     "books": [ 1, 2, 3, ..., 100, ..., 1000]
 }
 
-Book documents:
+Book items:
 {"id": "1", "name": "Azure Cosmos DB 101" }
 {"id": "2", "name": "Azure Cosmos DB for RDBMS Users" }
 {"id": "3", "name": "Taking over the world one JSON doc at a time" }
@@ -253,17 +253,17 @@ If we look at the JSON below that models publishers and books.
 ...
 {"id": "1000", "name": "Deep Dive into Azure Cosmos DB" }
 
-If the number of the books per publisher is small with limited growth, then storing the book reference inside the publisher document may be useful. However, if the number of books per publisher is unbounded, then this data model would lead to mutable, growing arrays, as in the example publisher document above.
+If the number of the books per publisher is small with limited growth, then storing the book reference inside the publisher item may be useful. However, if the number of books per publisher is unbounded, then this data model would lead to mutable, growing arrays, as in the example publisher item above.
 
 Switching things around a bit would result in a model that still represents the same data but now avoids these large mutable collections.
 
-Publisher document:
+Publisher item:
 {
     "id": "mspress",
     "name": "Microsoft Press"
 }
 
-Book documents:
+Book items:
 {"id": "1","name": "Azure Cosmos DB 101", "pub-id": "mspress"}
 {"id": "2","name": "Azure Cosmos DB for RDBMS Users", "pub-id": "mspress"}
 {"id": "3","name": "Taking over the world one JSON doc at a time"}
@@ -272,49 +272,49 @@ Switching things around a bit would result in a model that still represents the
 ...
 {"id": "1000","name": "Deep Dive into Azure Cosmos DB", "pub-id": "mspress"}
 
-In the above example, we have dropped the unbounded collection on the publisher document. Instead we just have a reference to the publisher on each book document.
+In the above example, we have dropped the unbounded collection on the publisher item. Instead we just have a reference to the publisher on each book item.
 
 ### How do I model many:many relationships?
 
 In a relational database *many:many* relationships are often modeled with join tables, which just join records from other tables together.
 
 ![Join tables](./media/sql-api-modeling-data/join-table.png)
 
-You might be tempted to replicate the same thing using documents and produce a data model that looks similar to the following.
+You might be tempted to replicate the same thing using items and produce a data model that looks similar to the following.
 
-Author documents:
+Author items:
 {"id": "a1", "name": "Thomas Andersen" }
 {"id": "a2", "name": "William Wakefield" }
 
-Book documents:
+Book items:
 {"id": "b1", "name": "Azure Cosmos DB 101" }
 {"id": "b2", "name": "Azure Cosmos DB for RDBMS Users" }
 {"id": "b3", "name": "Taking over the world one JSON doc at a time" }
 {"id": "b4", "name": "Learn about Azure Cosmos DB" }
 {"id": "b5", "name": "Deep Dive into Azure Cosmos DB" }
 
-Joining documents:
+Joining items:
 {"authorId": "a1", "bookId": "b1" }
 {"authorId": "a2", "bookId": "b1" }
 {"authorId": "a1", "bookId": "b2" }
 {"authorId": "a1", "bookId": "b3" }
 
-This would work. However, loading either an author with their books, or loading a book with its author, would always require at least two additional queries against the database. One query to the joining document and then another query to fetch the actual document being joined.
+This would work. However, loading either an author with their books, or loading a book with its author, would always require at least two additional queries against the database. One query to the joining item and then another query to fetch the actual item being joined.
 
 If all this join table is doing is gluing together two pieces of data, then why not drop it completely?
 Consider the following.
 
-Author documents:
+Author items:
 {"id": "a1", "name": "Thomas Andersen", "books": ["b1, "b2", "b3"]}
 {"id": "a2", "name": "William Wakefield", "books": ["b1", "b4"]}
 
-Book documents:
+Book items:
 {"id": "b1", "name": "Azure Cosmos DB 101", "authors": ["a1", "a2"]}
 {"id": "b2", "name": "Azure Cosmos DB for RDBMS Users", "authors": ["a1"]}
 {"id": "b3", "name": "Learn about Azure Cosmos DB", "authors": ["a1"]}
 {"id": "b4", "name": "Deep Dive into Azure Cosmos DB", "authors": ["a2"]}
 
-Now, if I had an author, I immediately know which books they have written, and conversely if I had a book document loaded I would know the IDs of the author(s). This saves that intermediary query against the join table reducing the number of server round trips your application has to make.
+Now, if I had an author, I immediately know which books they have written, and conversely if I had a book item loaded I would know the IDs of the author(s). This saves that intermediary query against the join table reducing the number of server round trips your application has to make.
 
 ## Hybrid data models
 
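The dual id arrays in the hunk above drop the join items entirely: resolving one side of the relationship needs only the ids already present on the item. An illustrative in-memory walk-through of that lookup, using the author and book data from the diff:

```python
# Author and book items carry each other's ids, so no join item
# (and no intermediary query) is needed to resolve the relationship.

authors = {
    "a1": {"name": "Thomas Andersen", "books": ["b1", "b2", "b3"]},
    "a2": {"name": "William Wakefield", "books": ["b1", "b4"]},
}
books = {
    "b1": {"name": "Azure Cosmos DB 101", "authors": ["a1", "a2"]},
    "b2": {"name": "Azure Cosmos DB for RDBMS Users", "authors": ["a1"]},
    "b3": {"name": "Learn about Azure Cosmos DB", "authors": ["a1"]},
    "b4": {"name": "Deep Dive into Azure Cosmos DB", "authors": ["a2"]},
}

def books_for(author_id):
    """Reading the author already yields the book ids to fetch."""
    return [books[b]["name"] for b in authors[author_id]["books"]]

print(books_for("a2"))  # ['Azure Cosmos DB 101', 'Deep Dive into Azure Cosmos DB']
```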
@@ -326,7 +326,7 @@ Based on your application's specific usage patterns and workloads there may be c
 
 Consider the following JSON.
 
-Author documents:
+Author items:
 {
     "id": "a1",
     "firstName": "Thomas",
@@ -350,7 +350,7 @@ Consider the following JSON.
     ]
 }
 
-Book documents:
+Book items:
 {
     "id": "b1",
     "name": "Azure Cosmos DB 101",
@@ -367,29 +367,29 @@ Consider the following JSON.
     ]
 }
 
-Here we've (mostly) followed the embedded model, where data from other entities are embedded in the top-level document, but other data is referenced.
+Here we've (mostly) followed the embedded model, where data from other entities are embedded in the top-level item, but other data is referenced.
 
-If you look at the book document, we can see a few interesting fields when we look at the array of authors. There is an `id` field that is the field we use to refer back to an author document, standard practice in a normalized model, but then we also have `name` and `thumbnailUrl`. We could have stuck with `id` and left the application to get any additional information it needed from the respective author document using the "link", but because our application displays the author's name and a thumbnail picture with every book displayed we can save a round trip to the server per book in a list by denormalizing **some** data from the author.
+If you look at the book item, we can see a few interesting fields when we look at the array of authors. There is an `id` field that is the field we use to refer back to an author item, standard practice in a normalized model, but then we also have `name` and `thumbnailUrl`. We could have stuck with `id` and left the application to get any additional information it needed from the respective author item using the "link", but because our application displays the author's name and a thumbnail picture with every book displayed we can save a round trip to the server per book in a list by denormalizing **some** data from the author.
 
 Sure, if the author's name changed or they wanted to update their photo we'd have to go and update every book they ever published but for our application, based on the assumption that authors don't change their names often, this is an acceptable design decision.
 
-In the example, there are **pre-calculated aggregates** values to save expensive processing on a read operation. In the example, some of the data embedded in the author document is data that is calculated at run-time. Every time a new book is published, a book document is created **and** the countOfBooks field is set to a calculated value based on the number of book documents that exist for a particular author. This optimization would be good in read heavy systems where we can afford to do computations on writes in order to optimize reads.
+In the example, there are **pre-calculated aggregates** values to save expensive processing on a read operation. In the example, some of the data embedded in the author item is data that is calculated at run-time. Every time a new book is published, a book item is created **and** the countOfBooks field is set to a calculated value based on the number of book items that exist for a particular author. This optimization would be good in read heavy systems where we can afford to do computations on writes in order to optimize reads.
 
-The ability to have a model with pre-calculated fields is made possible because Azure Cosmos DB supports **multi-document transactions**. Many NoSQL stores cannot do transactions across documents and therefore advocate design decisions, such as "always embed everything", due to this limitation. With Azure Cosmos DB, you can use server-side triggers, or stored procedures, that insert books and update authors all within an ACID transaction. Now you don't **have** to embed everything into one document just to be sure that your data remains consistent.
+The ability to have a model with pre-calculated fields is made possible because Azure Cosmos DB supports **multi-item transactions**. Many NoSQL stores cannot do transactions across items and therefore advocate design decisions, such as "always embed everything", due to this limitation. With Azure Cosmos DB, you can use server-side triggers, or stored procedures, that insert books and update authors all within an ACID transaction. Now you don't **have** to embed everything into one item just to be sure that your data remains consistent.
 
-## Distinguishing between different document types
+## Distinguishing between different item types
 
-In some scenarios, you may want to mix different document types in the same collection; this is usually the case when you want multiple, related documents to sit in the same [partition](partitioning-overview.md). For example, you could put both books and book reviews in the same collection and partition it by `bookId`. In such situation, you usually want to add to your documents with a field that identifies their type in order to differentiate them.
+In some scenarios, you may want to mix different item types in the same collection; this is usually the case when you want multiple, related items to sit in the same [partition](partitioning-overview.md). For example, you could put both books and book reviews in the same collection and partition it by `bookId`. In such situation, you usually want to add to your items with a field that identifies their type in order to differentiate them.
 
-Book documents:
+Book items:
 {
     "id": "b1",
     "name": "Azure Cosmos DB 101",
     "bookId": "b1",
     "type": "book"
 }
 
-Review documents:
+Review items:
 {
     "id": "r1",
     "content": "This book is awesome",
