MongoDB Exercises
Introduction
Chapter 1. Introduction to MongoDB, key concepts
Concepts
What were the big differences in hardware over the last few decades that MongoDB attempted to
address?
Scaling
Q: When scaling out horizontally (adding more servers to contain your data), what are the problems
that arise as you go from, say, 1 commodity server to a few dozen?
Any one server is likelier to fail per given unit of time. The servers must communicate
with one another, eating up network bandwidth. The need for redundancy increases as the
likelihood of some failure in the system per unit of time increases. Hardware cost per server
is likely to increase.
Documents Overview
What are some advantages of representing our data using a JSON-like format?
JSON presents a flexible and concise framework for specifying queries as well as storing
records. JSON syntax is similar to that of the common data structures used in many
programming languages.
JSON Types
1 (just strings), 2, 4, 6
1. String
2. Integer
3. Boolean
4. Null
5. Array
6. Object or documents (subdocuments)
JSON Syntax
What is the corresponding JSON for the following XML document?
<person>
<name>John</name>
<age>25</age>
<address>
<city>New York</city>
<postalCode>10021</postalCode>
</address>
<phones>
<phone type="home">212-555-1234</phone>
<phone type="mobile">646-555-1234</phone>
</phones>
</person>
{"name":"John",
"age":25,
"address":{"city":"New York","postalCode":"10021"},
"phones":[
{"phone":"212-555-1234","type":"home"},
{"phone":"646-555-1234","type":"mobile"}
]
}
JSON Syntax 2
For the following XML, Is the corresponding JSON example legal json?
<things>
<hat>one</hat>
<coat>z</coat>
<hat>two</hat>
</things>
{
"hat" : "one",
"coat" : "z",
"hat" : "two"
}
Yes No Maybe
Binary JSON
Why do we represent our data as BSON rather than JSON in the system?
Fast machine scannability. Human readability. Stronger typing (and more types)
than JSON
For a typical client (a python client, for example) that is receiving the results of a query in BSON,
would we convert from BSON to JSON to the client's native data structures (for example, nested
dictionaries and lists in Python), or would we convert from BSON straight to those native data
structures?
BSON -> Native data structures BSON -> JSON -> Native data structures
Dynamic Schema
True or False: MongoDB is schemaless because a schema isn't very important in MongoDB
True False
Cursors Introduction
In order to query a collection in the mongo shell, we can type which of the following?
You have a collection where every document has the same fields, and you want to look at the
value of the “_id”, "name", and “email” fields in order to see their format. Furthermore, you want to
eliminate all other fields from the query results. What query might you write?
If you want to add a new key: value pair to the documents in the “shapes” collection, what
methods could you use?
db.shapes.find()
db.shapes.save()
Shell: Queries
db.products.find({"for":"ac9","type":"case"})
Sorting
If you want to run a query on the collection, “books,” and sort ASCIIbetically by title on the query,
which of the following will work?
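For reference, one sort that would work here, ascending (ASCIIbetical) by title, looks like this in the shell (using the collection and field names from the question):
db.books.find().sort( { "title" : 1 } )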
db.scores.find({"type":"exam"}).sort({"score":-1}).skip(50).limit(20)
db.products.count()
// check it worked:
db.products.count()
// should print 11
If you have any issues you can restore from "products_bak"; or,
you can re-import with mongoimport. (You would perhaps need
in that situation to empty the collection first or drop it; see the
--drop option on mongoimport --help.) At the shell ">" prompt
type:
homework.a()
3.05
Homework 2.2
Add a new product to the products collection of this form:
"_id" : "ac9",
"brand" : "ACME",
"type" : "phone",
"price" : 333,
"warranty_years" : 0.25,
"available" : true
> myobj = { "_id" : "ac9", "name" : "AC9 Phone", "brand" : "ACME", "type" : "phone",
"price" : 333, "warranty_years" : 0.25, "available" : true }
> db.products.insert(myobj)
WriteResult({ "nInserted" : 1 })
_id : ObjectId("507d95d5719dbef170f15c00")
bb = db.products.find({"_id": ObjectId("507d95d5719dbef170f15c00")})
db.products.update({"_id":ObjectId("507d95d5719dbef170f15c00")},
{"$set":{"limits.sms.over_rate":0.01}}
)
homework.b()
0.050.019031
Homework 2.3
How many products have a voice limit? (That is, have a voice
field present in the limits array.)
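One way to get that count in the shell, a sketch assuming the limits field layout described in the handout:
db.products.find( { "limits.voice" : { "$exists" : true } } ).count()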
Homework 2.4
Create an index on the field for. You might want to first run the
following to get some context on what is present in that field in
the documents of our collection:
db.products.find({},{for:1})
db.products.ensureIndex({"for":1},{name:"for"})
db.products.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "pcat.products"
},
{
"v" : 1,
"key" : {
"for" : 1
},
"name" : "for",
"ns" : "pcat.products"
}
]
After creating the index, query products that work with an "ac3"
phone; that is, "ac3" is present in the product's "for" field.
db.products.find({"for":"ac3"}).count()
Q2: Run an explain plan on the above query. How many
records were scanned?
db.products.find({"for":"ac3"}).explain()
{
"cursor" : "BtreeCursor for",
"isMultiKey" : true,
"n" : 4,
"nscannedObjects" : 4,
"nscanned" : 4,
"nscannedObjectsAllPlans" : 4,
"nscannedAllPlans" : 4,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"for" : [
[
"ac3",
"ac3"
]
]
},
"server" : "SERVER:27017",
"filterSet" : false
db.products.find({"for":"ac3"})
89.5954.5
MongoDB for DBA's 3/7. Performance. Homeworks
Homework 3.1
Start a mongod server instance (if you still have a replica set,
that would work too).
Next, download the handout and run:
Homework 3.2
In a mongo shell run homework.b() . This will run in an infinite
loop printing some output as it runs various statements
against the server.
We'll now imagine that on this system a user has complained
of slowness and we suspect there is a slow operation running.
Find the slow operation and terminate it.
In order to do this, you'll want to open a second window (or
tab) and there, run a second instance of the mongo shell, with
something like:
$ mongo --shell localhost/performance performance.js
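In that second shell, one way to spot the long-running operation before killing it (a sketch; the 5-second threshold, and the opid 101 used below, are just examples from this run):
> db.currentOp().inprog.forEach( function(op) { if (op.secs_running > 5) printjson(op); } )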
> db.killOp(101)
> db.currentOp()
{ "inprog" : [ ] }
homework.c()
and enter the output below. Once you have it right and are
ready to move on, ctrl-c (terminate) the shell that is still
running the homework.b() function.
> homework.c()
12
Homework 3.3
Compact the performance.sensor_readings collection,
with paddingFactor set to 1.0. Then
run homework.d() and enter the result below.
Note: If you happen to be running a replica
set, just compact on the primary and run
homework.d() on the primary. You may need
to use the force:true option to run compact
on a primary. You may also want to look
at paddingFactor in the docs. Prior to 2.6,
you would have been able to do this problem
without paddingFactor, but now that
powerOfTwoSizes is on by default, you'll
need to use it in order to fully compact the
collection.
> db.runCommand({"compact" :
"sensor_readings", "paddingFactor" : 1 })
{ "ok" : 1 }
> homework.d()
21
MongoDB for DBA's 4/7. Replication Part 1. Homeworks
Homework 4.1
Create the extra data directories that the later replication homeworks will use, and stop the mongod from the previous section:
mkdir 2
mkdir 3
kill pid
Start a mongod on the first data directory (--dbpath 1) and run homework.init() in the shell.
This will load a small amount of test data into the database.
Now run:
> homework.a()
and enter the result. This will simply confirm all the above
happened ok.
result: 5001
Homework 4.2
Now convert the mongod instance (the one in the problem 4.1
above, which uses “--dbpath 1”) to a single server replica set.
To do this, you’ll need to stop the mongod (NOT the mongo
shell instance) and restart it with “--replSet” on its command
line. Give the set any name you like.
ps -Aef | grep mongod
kill 3260
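A restart command along these lines would do it (the set name "abc", port 27001, and file names here just mirror the rs.status() output shown below; any consistent choices work):
mongod --replSet abc --dbpath 1 --port 27001 --smallfiles --logpath log.1 --logappend --fork --oplogSize 50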
> rs.initiate()
{
"info2" : "no configuration explicitly specified -- making one",
"me" : "SERVER:27001",
"info" : "Config now saved locally. Should come online in about a minute.",
"ok" : 1
}
rs.status()
{
"set" : "abc",
"date" : ISODate("2015-01-30T08:55:02Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "SERVER:27001",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 364,
"optime" : Timestamp(1422607970, 1),
"optimeDate" : ISODate("2015-01-30T08:52:50Z"),
"electionTime" : Timestamp(1422607970, 2),
"electionDate" : ISODate("2015-01-30T08:52:50Z"),
"self" : true
}
],
"ok" : 1
}
abc:PRIMARY> rs.conf()
{
"_id" : "abc",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "SERVER:27001"
}
]
}
When you first ran homework.init(), we loaded some data into
the mongod. You should see it in the replication database. You
can confirm with:
> use replication
> db.foo.find()
Homework 4.3
Now add two more members to the set. Use the 2/ and 3/
directories we created in homework 4.1. Run those two
mongod’s on ports 27002 and 27003 respectively (the exact
numbers could be different).
Remember to use the same replica set name as you used for
the first member.
kill 3272 3484
mongod --replSet abc --dbpath 2 --port 27002 --smallfiles --logpath log.2 --logappend --fork
--oplogSize 50
mongod --replSet abc --dbpath 3 --port 27003 --smallfiles --logpath log.3 --logappend --fork
--oplogSize 50
You will need to add these two new members to your replica
set, which will initially have only one member. In the shell
running on the first member, you can see your replica set
status with
> rs.status()
Initially it will have just that first member. Connecting to the
other members will involve using
rs.add()
. For example,
> rs.add("localhost:27002")
Your machine may or may not be OK with 'localhost'. If it isn't,
try using the name in the "members.name" field in the
document you get by calling rs.status() (but remember to use
the correct port!).
abc:PRIMARY> rs.conf()
{
"_id" : "abc",
"version" : 3,
"members" : [
{
"_id" : 0,
"host" : "SERVER:27001"
},
{
"_id" : 1,
"host" : "SERVER:27002"
},
{
"_id" : 2,
"host" : "SERVER:27003"
}
]
}
Once a secondary has spun up, you can connect to it with a
new mongo shell instance.
mongo --port 27002
Use rs.slaveOk() on the secondary to allow it to serve reads, then run:
> homework.c()
result: 5
Homework 4.4
We will now remove the first member (@ port 27001) from the
set.
As a first step to doing this we will shut it down. (Given the
rest of the set can maintain a majority, we can still do a
majority reconfiguration if it is down.)
We could simply terminate its mongod process, but if we use
the replSetStepDown command, the failover may be faster.
That is a good practice, though not essential. Connect to
member 1 (port 27001) in the shell and run:
> rs.stepDown()
Homework 4.5
Note our replica set now has an even number of members, and
that is not a best practice. However, to keep the homework
from getting too long we’ll leave it at that for now, and instead
do one more exercise below involving the oplog.
To get the right answer on this problem, you must perform the
homework questions in order. Otherwise, your oplog may look
different than we expect.
Go to the secondary in the replica set. The shell should say
SECONDARY at the prompt if you've done everything correctly.
mongo --port 27002 --shell replication.js
Switch to the local database and then look at the oplog:
> db.oplog.rs.find()
If you get a blank result, you are not on the right database.
Note: as the local database doesn’t replicate, it will let you
query it without entering “rs.slaveOk()” first.
Next look at the stats on the oplog to get a feel for its size:
> db.oplog.rs.stats()
result: R
MongoDB for DBA's 5/7. Replication Part 2.
Homeworks
Homework 5.1
Set up a replica set that includes an arbiter.
To demonstrate that you have done this, what is the text in the
"state" field for the arbiter when you run rs.status()?
Status : 7
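For reference, an arbiter can be added to an existing set from the primary with the rs.addArb helper (the host and port here are just an example):
> rs.addArb("localhost:27004")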
Homework 5.2
You have a replica set with two members, having decided that
a total of two copies of your data is sufficient.
You are concerned about robustness, however.
Which of the following are options that will allow you to ensure
that failover can occur if the first server goes down?
Check all that apply.
Homework 5.3
At the beginning of the section, "W Timeout and Capacity
Planning", Dwight mentioned something that hasn't yet been
touched upon in this class: Batch Inserts.
He mentions that you can find information on them in the
docs, but he hasn't used them in the shell during this course.
Your job is to look this functionality up in the documentation,
and use it for the following problem:
Please insert into the m102 collection, the following 3
documents, using only one shell query (please use double
quotes around each key in each document in your answer):
{ "a" : 1 }
{ "b" : 2 }
{ "c" : 3 }
Hints: This is not the same as the Bulk() operations, which
were discussed earlier in this course. Also, this does not
involve semicolons, so don't put any into the text box. You
probably want to test this on your machine, and then copy and
paste the insert query in the box below. Do not include the >
sign from the beginning of the shell.
Result: db.m102.insert([{"a":1},{"b":2},{"c":3}])
Homework 5.4
You would like to create a replica set that is robust to data
center failure.
You only have two data centers available. Which
arrangement(s) of servers will allow you to be stay up (as in,
still able to elect a primary) in the event of a failure of either
data center (but not both at once)? Check all that apply.
Homework 5.5
Consider the following scenario: You have a two member
replica set, a primary, and a secondary.
The data center with the primary goes down, and is expected
to remain down for the foreseeable future. Your secondary is
now the only copy of your data, and it is not accepting writes.
You want to reconfigure your replica set config to exclude the
primary, and allow your secondary to be elected, but you run
into trouble. Find out the optional parameter that you'll need, and input it into the box
below for your rs.reconfig(new_cfg, OPTIONAL PARAMETER).
Hint: You may want to use this documentation page to solve
this problem.
Your answer should be of the form { key : value } (including
brackets).
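For reference, a forced reconfiguration issued from the surviving member would look something like this, with new_cfg being the config document that excludes the dead member:
> rs.reconfig(new_cfg, { force : true })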
MongoDB for DBA's 6/7. Scalability. Homeworks
Homework 6.1
For this week's homework we will start with a standalone
MongoDB database, turn it into a sharded cluster with two
shards, and shard one of the collections. We will create a
"dev" environment on our local box: no replica sets, and only
one config server. In production you would almost always use
three config servers and replica sets as part of a sharded
cluster. In the final of the course we'll set up a larger cluster
with replica sets and three config servers.
Download week6.js from Download Handout link.
Start an initially empty mongod database instance.
Connect to it with the shell and week6.js loaded:
db.trades.stats()
Note that with --shardsvr specified the default port for mongod
becomes 27018.
Start a mongo config server:
mongod --configsvr …
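A mongos router also needs to be running and pointed at that config server before sh.addShard() will work; assuming the config server is on the --configsvr default port 27019, that would be something like:
mongos --configdb localhost:27019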
sh.addShard("localhost:27018")
{ "shardAdded" : "shard0000", "ok" : 1 }
> db.trades.count()
> db.trades.stats()
Run homework.a() and enter the result below. This method will
simply verify that this simple cluster is up and running and
return a result key.
Result: 1000001
Homework 6.2
Now enable sharding for the week6 database. (See sh.help()
for details.)
sh.enableSharding("week6")
{ "ok" : 1 }
Then shard the trades collection on the compound shard key
ticker plus time. Note to shard a collection, you must have an
index on the shard key, so you will need to create the index
first:
> db.trades.ensureIndex( { ticker:1, time:1 } )
> // can now shard the trades collection on the shard key { ticker:1, time:1 }
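> // sharding the collection itself (the command is not captured in these notes) would then be:
> sh.shardCollection( "week6.trades", { ticker : 1, time : 1 } )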
> db.chunks.find()
> // or:
Result : 3
Homework 6.3
Let's now add a new shard. Run another mongod as the new
shard on a new port number. Use --shardsvr.
sudo mongod --shardsvr --dbpath /var/lib/mongodb/a1 --logpath
/var/log/mongodb/mongod01.log --port 27020 --fork --logappend --smallfiles --oplogSize
50 --journal
Then add the shard to the cluster (see sh.help()).
sh.addShard( "localhost:27020")
You can confirm the above worked by running:
homework.check1()
mongos> homework.check1()
db.getSisterDB("config").shards.count() :
2
There are 2 shards in the cluster as expected
Now wait for the balancer to move data among the two shards more
evenly. Periodically run:
and/or:
db.chunks.aggregate( [
{ $match : { ns : "week6.trades" } } ,
{ $group : { _id : "$shard", n : { $sum : 1 } } }
] )
Result: 2
Overview of Security
There are a few different ways to run MongoDB and secure it:
1. Run it in a trusted environment: no one has access except for trusted clients, and
those clients have full access to the database. Lock down, at the network layer, the relevant TCP
ports for the processes on the machines.
2. Enable MongoDB authentication with two command-line options:
1. --auth : for securing client connections and access
2. --keyFile : intra-cluster security; the machines in the cluster
authenticate each other among themselves. You can also optionally layer SSL
on top of this. To get encryption of all the communications occurring, you have to
build MongoDB with SSL support (the --ssl build option). More information on
SSL: docs.mongodb.org/manual/administration/ssl
To run MongoDB with encryption over the wire, you need to:
To turn on the security facilities we use the --auth command-line option.
mongo localhost/test --> still lets you connect to the database on localhost, because no users or
roles have been created yet and authentication is therefore not yet enforced for the cluster.
To create users and roles, we switch to the admin database (a reserved name: it is a special
database which holds the users and roles for administrators):
use admin
var me = { user : "bob", pwd : "robert", roles : [ "userAdminAnyDatabase" ] }
db.createUser( me )
At that point any attempt to access a database is not authorized until we authenticate as the user
we just created. Also, this user has no privileges for reading or writing databases, only for
database administration.
db.createUser( you )
Quick summary of some of the standard roles that are available from v2.6+:
1. read: read-only access to a database (readAnyDatabase for all databases)
2. readWrite: read and write access to a database (readWriteAnyDatabase for all databases)
3. dbAdmin: database administration of a database (dbAdminAnyDatabase for all databases)
4. userAdmin: user administration of a database (userAdminAnyDatabase for all databases)
5. clusterAdmin: gives one authorization to use the cluster-related MongoDB
commands: adding a shard, managing replica sets, etc.
You can also create custom, user-defined roles. More information on the built-in roles is in
the MongoDB documentation: docs.mongodb.org/manual/security
Beyond --auth on the command line: when we are working with a sharded cluster or replica set, we
use another parameter, --keyFile <filename>, to tell the processes making up the cluster
(the mongod and mongos processes) how to authenticate among themselves, intra-cluster. The
file contains a shared secret key that all members of the cluster have available to them to
cross-authenticate, so they can coordinate their actions. Using --keyFile implies --auth, but it is
recommended to list both out explicitly in the config file or on the command line.
We can use SSL with MongoDB, but it requires us to build it ourselves using scons --ssl.
More information at docs.mongodb.org/manual/tutorial/configure-ssl
There are a couple of different kinds of users in terms of the system authorizations.
1. Admin users:
1. can perform admin operations
2. are created in the admin database
3. can access all databases; they are superusers
2. Regular users
1. access a specific database
2. can be read/write or read only
To create an admin user:
var userAdmin = { user : "<name>", pwd : "<password>", roles : [ "userAdminAnyDatabase" ] }
db.createUser( userAdmin )
db.auth( "theAdmin" , "admin" ) --> only for admin database. We can have the same username in
two different databases with different passwords.
db.system.users.find()
use test
db.createUser( { user : "pat", pwd : "123", roles : [ "readWrite" ] } ) --> a regular user of the test
database with read and write permission.
All the drivers provide APIs where you can provide credentials to authenticate the client as a
particular user when connecting to the database.
Roles are much more finely grained, and custom user permissions can be
created. More information at docs.mongodb.org/manual/reference/built-in-roles
For MongoDB 2.2 and 2.4, which are true?
The user's password is sent over the network unencrypted. The user's password is
stored on disk unencrypted. You can give a user read-only access to one database and write
access to another database.
Intra-cluster Security
To use the --keyFile option, we put a key into a file: a text string made out of characters that are
legal in base64 (upper- and lower-case letters, numbers, and a few symbols). It is important to
choose a strong key.
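One common way to generate such a key file and lock down its permissions (just an example approach; the file name matches the mongod command below):
openssl rand -base64 64 > filename.txt
chmod 600 filename.txt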
rs.initiate()
mongod --port 27002 --dbpath data2 --auth --replSet abc --keyFile filename.txt
> rs.status()
{
"ok" : 0,
"errmsg" : "not authorized on admin to execute command
{ replSetGetStatus: 1.0 }",
"code" : 13
}
>
Overview of Backing Up
Methods to back up an individual server or an individual replica set in MongoDB:
Mongodump
Makes a dump of all databases on a given server, or of a particular database.
Use the --oplog option.
It can be done while the system is servicing normal load and operations, but it creates some
additional load on the system.
The mongorestore utility restores from such a backup later if we need to. Use the --oplogReplay
option: it allows you to achieve a true point-in-time restore of the particular replica set we
dumped.
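A minimal sketch of that pair of commands (the host names and the output directory are placeholders):
mongodump --host myReplSetMember:27017 --oplog --out /backups/mydump
mongorestore --host myNewServer:27017 --oplogReplay /backups/mydump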
Filesystem Snapshotting
If we have a true snapshot capability, this is a good option for a mongod instance that is up,
running, and hot:
1. we need journaling enabled: it is used for crash recovery in the event of an
unexpected crash of a mongod, and it also means that any snapshot which includes the journal
will be valid, provided it is a true point-in-time snapshot by the underlying file system
1. if journaling is not enabled, you could get a snapshot from a point in time
where an operation was mid-stream in terms of its writes to the data files, and the
snapshot would not necessarily be consistent
2. we need to snapshot the entire data directory / file system
2. alternatively, we can use db.fsyncLock(): it flushes everything to disk and
then locks the system, preventing writes
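A sketch of that flow in the shell (the snapshot itself is taken outside of mongo, at the file-system level):
> db.fsyncLock()
> // take the filesystem snapshot here
> db.fsyncUnlock()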
Snapshots are generally very fast.
1. turn off the balancer: it moves data around, and if there are chunk
migrations in progress during our backup, that could be problematic. In the shell, we
stop the balancer with sh.stopBalancer(); if it is in the middle of doing
something, it may take a minute before it returns
2. back up the config database
1. using mongodump
2. or stop one of our three config servers and copy its files
3. back up each replica set
4. start the balancer again: sh.startBalancer()
Answer: sh.stopBalancer()
Backup Strategies
To backup a sharded cluster:
1. stop the balancer
1. mongo --host nameMongos --eval "sh.stopBalancer()" --> (make sure that
worked)
2. backup config database / a config server
1. mongodump --host nameMongos_or_nameConfigServer --db config
3. backup all the shards
1. Two ways:
1. shut down a secondary in each replica set and grabbing its data
files. if we have snapshotting, just grabbing a snapshot from each replica
set of one node
2. do a mongodump of each shard:
1. mongodump --host shard_1_srv --oplog --out
/backups/clusters/shard_1
2. mongodump --host shard_2_srv --oplog --out
/backups/clusters/shard_2
3. ...
4. mongodump --host shard_n_srv --oplog --out
/backups/clusters/shard_n
4. turn the balancer back on
1. mongo --host nameMongos --eval "sh.startBalancer()"
To keep in mind:
1. we might want to check that the cluster is healthy before we even begin
2. if we have replica sets, they can stay up while a single server in the set is down
GridFS
If we have a 100-terabyte object or file, it can be stored in GridFS.
The drivers for mongo understand gridFS and have support for it.
There are command line tools for it.
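For example, the mongofiles command-line tool can store, list, and retrieve GridFS files (the database and file names here are just placeholders):
mongofiles -d mydb put bigdataset.tar
mongofiles -d mydb list
mongofiles -d mydb get bigdataset.tar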
Hardware Tips
a fast CPU clock is more helpful than more cores (faster cores rather
than more cores)
RAM is good; mongos does not require a lot of RAM
we definitely want 64-bit, because MongoDB uses memory-mapped files
virtualization is OK but certainly not required; it runs pretty well
on Amazon EC2 and it runs fine on VMware
disable NUMA (Non-Uniform Memory Access) on the machine
SSDs (solid state disks) are good (reserve some empty, unpartitioned
space on the disk)
the file system cache is most of mongod's memory usage
check the readahead setting (use a small value)
The production notes are found
at: docs.mongodb.org/manual/administration/production-notes
128 GB RAM and rotating disk storage
32 GB RAM and solid state disk storage
Additional Resources
1. docs: mongodb.org
2. driver docs: docs.mongodb.org/ecosystem/drivers/
3. bug database / features: jira.mongodb.org
4. support forum: Google Groups
5. IRC: irc.freenode.net/#mongodb (webchat.freenode.net)
6. github.com: source code
7. blog: blog.mongodb.org
8. @mongodb
9. MMUGs (mongo meetup groups) in various cities around the world
10. MMS (MongoDB Monitoring Service)
MongoDB for DBA's 6/7. Scalability
Chapter 6. SCALABILITY: Sharding setup, sharding monitoring,
shard key selection, inserting large amounts of data
In MongoDB, data distribution is based on a shard key. Within a given collection, documents that
have the same shard key value will be on the same shard, and documents whose shard key values
are close together tend to be on the same shard, because each shard holds one or more ranges of
shard key values (range-based partitioning). A given key range lives on a particular shard.
Example: shard key = customer name
True False
1. "Split": split the key range in two different key ranges because the limit of data
has exceeded. That will make sure that there are no huge chunks and the data will be
moved. This operations is unexpensive.
2. "Migrate": it will do when "the balancer" sees there is a lack of balance between
the number of shards on the different shards and decides to do migrations. This
operations is more expensive because the data transfer between members. While the
data is transfered, the system is still live and reads and writes of documents that were in
the key range during the migration will execute.
Sharding Processes
Each shard will run one or more mongod processes storing data (a replica set, for
instance). We will have multiple mongod processes on multiple servers within a given shard.
In addition, we have config servers (small mongod's) which store the metadata of the cluster
(the chunk ranges of the system).
There is one more component: the mongos process, the routing component. We can have
any number of these, and our clients connect to them to perform operations on the cluster. The
mongos gives a client connection a view of the whole cluster as a single logical entity; the client
does not need to worry about whether replica sets or shards exist.
The mongos talks to everything as needed. It talks to the config servers to get metadata: when a
query comes in from a client and the mongos has just started up, it fetches the relevant metadata it
needs to decide where to send the query, and then it performs the operation against the mongod's.
If it communicates with more than one of those, it merges the data coming back on an
as-needed basis.
1. The mongos processes: have no persistent state and no data files.
They are really kind of like routers / load balancers: they get the information they need on
how to find and route the data from the config servers, and they cache
that in RAM.
2. The mongod processes: are the data stores (the database)
3. The config servers: are the metadata store (they contain which shard has which
data). We can run one config server if we are running a development setup on a
laptop. There are three config servers in a production MongoDB cluster so that the cluster
keeps going if one of them fails. They have identical data, a copy of the metadata
that they keep in sync. As long as one config server is up, the cluster is alive; if all three
were down, the cluster would be down, and if all three are not up, metadata-changing
operations like splits and migrates cannot happen.
db.foo.find(
True False
Cluster Topology
A minimal cluster setup will be running on localhost.
Example with 4 shards, each a replica set of 3 members, plus 3 config servers, all on one machine:
mkdir a0
mkdir a1
mkdir a2
mkdir b0
mkdir b1
mkdir b2
mkdir c0
mkdir c1
mkdir c2
mkdir d0
mkdir d1
mkdir d2
mkdir cfg0
mkdir cfg1
mkdir cfg2
mongod --configsvr --dbpath cfg0 --port 26050 --fork --logpath log.cfg0 --logappend
mongod --configsvr --dbpath cfg1 --port 26051 --fork --logpath log.cfg1 --logappend
mongod --configsvr --dbpath cfg2 --port 26052 --fork --logpath log.cfg2 --logappend
mongod --shardsvr --replSet a --dbpath a0 --logpath log.a0 --port 27000 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet a --dbpath a1 --logpath log.a1 --port 27001 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet a --dbpath a2 --logpath log.a2 --port 27002 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b0 --logpath log.b0 --port 27100 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b1 --logpath log.b1 --port 27101 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet b --dbpath b2 --logpath log.b2 --port 27102 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c0 --logpath log.c0 --port 27200 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c1 --logpath log.c1 --port 27201 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet c --dbpath c2 --logpath log.c2 --port 27202 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d0 --logpath log.d0 --port 27300 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d1 --logpath log.d1 --port 27301 --fork --logappend
--smallfiles --oplogSize 50
mongod --shardsvr --replSet d --dbpath d2 --logpath log.d2 --port 27302 --fork --logappend
--smallfiles --oplogSize 50
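To complete the topology, a mongos would be started on the default port and pointed at all three config servers, along these lines (the ports match the config servers started above):
mongos --configdb localhost:26050,localhost:26051,localhost:26052 --fork --logpath log.mongos --logappend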
It is important that clients talk to mongos on the default MongoDB TCP port 27017, not directly
to the shard servers. Best practices:
Run mongos on the standard MongoDB TCP port 27017. That lets the rest of the world connect
to the cluster.
Do not run shard-server mongod's or config servers on that default port 27017.
One per config server
One per mongod process
One per shard
One
However many you want but usually much more than one
>mongo
>use config
>show collections
chunks
databases
lockpings
locks
mongos
settings
shards
system.indexes
version
Which are true?
To check via the mongo shell what's in the config database of a MongoDB cluster, you must
connect directly to one of the config servers.
Three config servers is typical in a MongoDB cluster with 1,000 total machines.
All the config servers have exactly the same data.
On the primary shard and not sharded Not sharded but homed on different shards at
messages
--------
{_id:<message_id>, mailbox_id:___,
sender_id:___, subject:___, date:___,
body:___, read:<bool>, ...
}
While we could use more if we wanted, 9 machines would be just fine. 9 machines works
but is too low for best practice usage
1. sometimes, with bulk initial loads, we need to pre-split the initial key range
manually, because loading data into shard server 1 can be faster than migrating data out of
it, creating a backlog on that first shard. A sketch of a manual pre-split follows.
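A manual pre-split can be done with the sh.splitAt helper; a sketch for the week6.trades shard key used earlier (the split point values are just examples):
sh.splitAt( "week6.trades", { ticker : "MSFT", time : ISODate("2010-01-01") } )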