
NGT Solution November 2018

(2½ Hours)
[Total Marks: 75]

N. B.: (1) All questions are compulsory.


(2) Make suitable assumptions wherever necessary and state the assumptions made.
(3) Answers to the same question must be written together.
(4) Numbers to the right indicate marks.
(5) Draw neat labeled diagrams wherever necessary.
(6) Use of Non-programmable calculators is allowed.

1. Attempt any three of the following: 15


a. What is Big Data? What are the different sources of Big Data?
Ans. Big data is data that has high volume, is generated at high velocity, and has multiple varieties.
Big data is a term used to describe data that has massive volume, comes in a variety
of structures, and is generated at high velocity. This kind of data poses challenges to the
traditional RDBMS systems used for storing and processing data. Big data is paving way for
newer approaches of processing and storing data.
The major sources of this Big data are
1. Enterprises, which are collecting data with more granularities now, attaching more
details with every transaction in order to understand consumer behavior.
2. Increase in multimedia usage across industries such as health care, product companies, etc.
3. Increased popularity of social media sites such as Facebook, Twitter, etc.
4. Rapid adoption of smartphones, which enable users to actively use social media sites and other
Internet applications.
5. Increased usage of sensors and devices in the day-to-day world, which are connected by
networks to computing resources.
b. Compare ACID Vs BASE.
Ans. The RDBMS systems are based on the concept of ACID transactions. ACID stands for Atomic,
Consistent, Isolated, and Durable, where
1. Atomic implies either all changes of a transaction are applied completely or not
applied at all.
2. Consistent means the data is in a consistent state after the transaction is applied.
This means after a transaction is committed, the queries fetching a particular data
will see the same result.
3. Isolated means the transactions that are applied to the same set of data are
independent of each other. Thus, one transaction will not interfere with another
transaction.
4. Durable means the changes are permanent in the system and will not be lost in case of any
failures
Whereas ACID defines the key characteristics required for reliable transaction processing, the
NoSQL world requires different characteristics to enable flexibility and scalability. These opposing
characteristics are cleverly captured in the acronym BASE. BASE can be explained as
1. Basically Available means the system will be available in terms of the CAP theorem.
2. Soft state indicates that even if no input is provided to the system, the state will
change over time. This is in accordance to eventual consistency.
3. Eventual consistency means the system will attain consistency in the long run,
provided no input is sent to the system during that time.
c. With the help of a neat diagram, explain the CAP theorem.
Ans.

The CAP theorem states that when designing an application in a distributed environment there are
three basic requirements that exist, namely consistency, availability, and partition tolerance.
1. Consistency means that the data remains consistent after any operation is performed that
changes the data, and that all users or clients accessing the application see the same updated
data.
2. Availability means that the system is always available.
3. Partition Tolerance means that the system will continue to function even if it is partitioned
into groups of servers that are not able to communicate with one another.
The CAP theorem states that at any point in time a distributed system can fulfil only two of these three guarantees.
d. What are the advantages and disadvantages of NoSQL databases?
Ans. Advantages of NoSQL.
1. High scalability : This scaling up approach fails when the transaction rates and
fast response requirements increase. In contrast to this, the new generation of
NoSQL databases is designed to scale out (i.e. to expand horizontally using low-end commodity
servers).
2. Manageability and administration : NoSQL databases are designed to mostly work with
automated repairs, distributed data, and simpler data models, leading to low manageability and
administration.
3. Low cost : NoSQL databases are typically designed to work with a cluster of cheap commodity
servers, enabling the users to store and process more data at a low cost.
4. Flexible data models : NoSQL databases have a very flexible data model, enabling them to
work with any type of data; they don’t comply with the rigid RDBMS data models. As a result,
any application changes that involve updating the database schema can be easily implemented.
Disadvantages of NoSQL
1. Maturity : Most NoSQL databases are pre-production versions with key features that are
still to be implemented. Thus, when deciding on a NoSQL database, you should analyze the
product properly to ensure the features are fully implemented and not still on the To-do list .
2. Support : Support is one limitation that you need to consider. Most NoSQL databases are
from start-ups which were open sourced. As a result, support is very minimal as compared to
the enterprise software companies and may not have global reach or support resources.
3. Limited Query Capabilities : Since NoSQL databases are generally developed to meet the
scaling requirement of the web-scale applications, they provide limited querying capabilities.
A simple querying requirement may involve significant programming expertise.
4. Administration : Although NoSQL is designed to provide a no-admin solution, it still
requires skill and effort for installing and maintaining the solution.
5. Expertise : Since NoSQL is an evolving area, expertise on the technology is limited in the
developer and administrator community.
e. What are the different categories of NoSQL database? Explain each with an example.
Ans. NoSQL databases are generally classified into four categories:
1. Key-value stores: each item is stored as a key together with its value, for example Redis and Riak.
2. Document databases: data is stored as documents (JSON/BSON-like structures), for example MongoDB and CouchDB.
3. Column-family (column-oriented) databases: data is stored in column families rather than rows, suited for large, sparse data sets, for example Cassandra and HBase.
4. Graph databases: data is stored as nodes and edges, suited for highly connected data, for example Neo4j.
f. What are the different challenges Big Data poses?


Ans. Big Data Challenges
1. Policies and Procedures
As more and more data is gathered, digitized, and moved around the globe, the policy and
compliance issues become increasingly important. Data privacy, security, intellectual property, and
protection are of immense importance to organizations.
Compliance with various statutory and legal requirements poses a challenge in data handling. Issues
around ownership and liabilities around data are important legal aspects that need to be dealt with
in cases of big data. Moreover, many big data projects leverage the scalability features of public
cloud computing providers. This poses a challenge for compliance.
Policy questions on who owns the data, what is defined as fair use of data, and who is responsible
for accuracy and confidentiality of data also need to be answered.
2. Access to Data
Accessing data for consumption is a challenge for big data projects. Some of the data may be
available to third parties, and gaining access can be a legal, contractual challenge.
Data about a product or service is available on Facebook, Twitter feeds, reviews, and blogs, so how
does the product owner access this data from various sources owned by various providers?
Likewise, contractual clauses and economic incentives for accessing big data need to be tied in to
enable the availability of data by the consumer.
3. Technology and Techniques
New tools and technologies built specifically to address the needs of big data must be leveraged,
rather than trying to address the aforementioned issues through legacy systems. The inadequacy of
legacy systems to deal with big data on one hand and the lack of experienced resources in newer
technologies is a challenge that any big data project has to manage.

2. Attempt any three of the following: 15


a. Consider a Collection users containing the following fields
{
id: ObjectID(),
FName: "First Name",
LName: "Last Name",
Age: 30,
Gender: "M",
Country: "Country"
}
Where Gender value can be either "M" or "F" or “Other”.
Country can be either "UK" or "India" or "USA".
Based on above information write the MongoDB query for the following.

i. Update the country to UK for all female users.


db.users.update({"Gender":"F"}, {$set:{"Country":"UK"}})

ii. Add the new field company to all the documents.


db.users.update({},{$set:{"Company":"TestComp"}},{multi:true})

iii. Delete all the documents where Gender = ‘M’.


db.users.remove({"Gender":"M"})

iv. Find out a count of female users who stay in either India or USA.
db.users.find({"Gender":"F",$or:[{"Country":"India"},
{"Country":"USA"}]}).count()

v. Display the first name and age of all female employees.


db.users.find({"Gender":"F"}, {"Name":1,"Age":1})

b. Write the Mongo dB command to create the following with an example:


(i) Database
MongoDB use DATABASE_NAME is used to create database. The command will create
a new database if it doesn't exist, otherwise it will return the existing database.
Syntax
Basic syntax of use DATABASE statement is as follows
use DATABASE_NAME
Example
If you want to use a database with name <mydb>, then use DATABASE statement would
be as follows −
use mydb
(ii) Collection
MongoDB db.createCollection(name, options) is used to create collection.
Syntax
Basic syntax of createCollection() command is as follows −
db.createCollection(name, options)
In the command, name is the name of the collection to be created. options is a document and is
used to specify the configuration of the collection.
Basic syntax of createCollection() method without options is as follows −
use test
switched to db test
db.createCollection("mycollection")
(iii) Document
To create a document into MongoDB collection, you need to use MongoDB's insert() or
save() method.
Syntax
The basic syntax of insert() command is as follows −
db.COLLECTION_NAME.insert(document)
Example
db.mycol.insert({
_id: ObjectId("7df78ad8902cde1234567890"),
title: 'MongoDB Overview',
description: 'MongoDB is no sql database',
url: 'http://www.MongoDB.com',
tags: ['mongodb', 'database', 'NoSQL'],
likes: 100
})
(iv) Drop Collection
MongoDB's db.collection.drop() is used to drop a collection from the database.
Syntax
Basic syntax of drop() command is as follows −
db.COLLECTION_NAME.drop()
Example
db.mycollection.drop()
(v) Drop Database
MongoDB db.dropDatabase() command is used to drop an existing database.
Syntax
Basic syntax of dropDatabase() command is as follows −
db.dropDatabase()
(vi) Index
To create an index you need to use ensureIndex() method of MongoDB.
Syntax
The basic syntax of the ensureIndex() method is as follows.
db.COLLECTION_NAME.ensureIndex({KEY:1})
Example
db.mycol.ensureIndex({"title":1})
c. Write a short note on Capped collection.
Ans. 1. Capped collections are fixed-size circular collections that follow the insertion order to support
high performance for create, read, and delete operations. By circular, it means that when the
fixed size allocated to the collection is exhausted, it will start deleting the oldest document in
the collection without providing any explicit commands.
2. Capped collections restrict updates to the documents if the update results in increased document
size. Since capped collections store documents in the order of the disk storage, it ensures that
the document size does not increase the size allocated on the disk. Capped collections are best
for storing log information, cache data, or any other high volume data.
3. Creating Capped Collection
To create a capped collection, we use the normal createCollection command but with capped
option as true and specifying the maximum size of collection in bytes.
1. db.createCollection("cappedLogCollection",{capped:true,size:10000})
In addition to collection size, we can also limit the number of documents in the collection
using the max parameter −
2. db.createCollection("cappedLogCollection",{capped:true,size:10000,max:1000})
If you want to check whether a collection is capped or not, use the following isCapped
command −
3. db.cappedLogCollection.isCapped()
If there is an existing collection which you are planning to convert to capped, you can do it
with the following code −
4. db.runCommand({"convertToCapped":"posts",size:10000})
This code would convert our existing collection posts to a capped collection.
d. List and explain the different conditional operators in MongoDB.
Ans. Different conditional operators are $lt, $lte, $gt, $gte, $in, $nin, and $not.
1. $lt and $lte
They stand for "less than" and "less than or equal to," respectively.

If you want to find all students who are younger than 25 (Age < 25), you can execute the
following find with a selector:
db.students.find({"Age":{"$lt":25}})
2. $gt and $gte
The $gt and $gte operators stand for "greater than" and "greater than or equal to,"
respectively. Let's find out all of the students with Age > 25. This can be achieved by executing
the following command:
db.students.find({"Age":{"$gt":25}})
3. $in and $nin
Let’s find all students who belong to either class C1 or C2 . The command for the same is
db.students.find({"Class":{"$in":["C1","C2"]}})
The inverse of this can be returned by using $nin .
To find students who don’t belong to class C1 or C2 . The command is
db.students.find({"Class":{"$nin":["C1","C2"]}})
e. Explain the two ways MongoDB enables distribution of the data in Sharding.
Ans There are two ways MongoDB enables distribution of the data:
range-based partitioning and hash-based partitioning.

1. Range-Based Partitioning
In range-based partitioning , the shard key values are divided into ranges. Say you consider a
timestamp field as the shard key. In this way of partitioning, the values are considered as a straight
line starting from a Min value to Max value where Min is the starting period (say, 01/01/1970) and
Max is the end period (say, 12/31/9999). Every document in the collection will have timestamp
value within this range only, and it will represent some point on the line.
Based on the number of shards available, the line will be divided into ranges, and documents will
be distributed based on them

Fig: Range-based partitioning

2. Hash-Based Partitioning
In hash-based partitioning , the data is distributed on the basis of the hash value of the shard field.
If selected, this will lead to a more random distribution compared to range-based partitioning. It’s
unlikely that the documents with close shard key will be part of the same chunk. For example, for
ranges based on the hash of the id field, there will be a straight line of hash values, which will again
be partitioned on basis of the number of shards. On the basis of the hash values, the documents will
lie in either of the shards.

Fig: Hash-based partitioning
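As an illustrative sketch (assuming a logs collection in a database testdb for which sharding has already been enabled with sh.enableSharding("testdb")), the two strategies differ only in the shard key specification passed to shardCollection:
sh.shardCollection("testdb.logs", { timestamp: 1 })
sh.shardCollection("testdb.logs", { _id: "hashed" })
The first command partitions documents by ranges of the timestamp field; the second distributes them according to the hash of the _id field.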
f. List and explain the 3 core components in the MongoDB package.
Ans. The core components in the MongoDB package are
1) mongod: which is the core database process
2) mongos: which is the controller and query router for sharded clusters
3) mongo: which is the interactive MongoDB shell

1. mongod
The primary daemon in a MongoDB system is known as mongod. This daemon handles all the
data requests, manages the data format, and performs operations for background
management. When mongod is run without any arguments, it connects to the default data
directory, which is C:\data\db or /data/db, and the default port 27017, where it listens for socket
connections. It's important to ensure that the data directory exists and you have write permissions
to the directory before the mongod process is started.
2. mongo
mongo provides an interactive JavaScript interface for the developer to test queries and operations
directly on the database and for the system administrators to manage the database. This is all done
via the command line. When the mongo shell is started, it will connect to the default database called
test. This database connection value is assigned to the global variable db.
3. mongos
mongos is used in MongoDB sharding. It acts as a routing service that processes queries from the
application layer and determines where in the sharded cluster the requested data is located.
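For example (a hypothetical setup, assuming the data directory already exists), the three processes are typically started from the command line as follows:
mongod --dbpath C:\data\db --port 27017
mongo --port 27017
mongos --configdb configReplSet/cfg1.example.net:27019 --port 27020
Here mongod serves the data, mongo opens an interactive shell against it, and mongos (used only in sharded clusters) routes queries to the shards using the metadata held on the config servers.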

3. Attempt any three of the following: 15


a. List and explain the limitations of Sharding.
Ans. Sharding Limitations
Sharding is the mechanism of splitting data across shards. The following are the limitations
when dealing with sharding.
1. Shard Early to Avoid Any Issues
Using the shard key, the data is split into chunks, which are then automatically distributed amongst
the shards. However, if sharding is implemented late, it can cause slowdowns of the servers because
the splitting and migration of chunks takes time and resources.

A simple solution is to monitor your MongoDB instance capacity using tools such as MongoDB
Cloud Manager (flush time, lock percentages, queue lengths, and faults are good measures) and
shard before reaching 80% of the estimated capacity.
2. Shard Key Can’t Be Updated
The shard key can’t be updated once the document is inserted in the collection because MongoDB
uses shard keys to determine to which shard the document should be routed. If you want to change
the shard key of a document, the suggested solution is to remove the document and reinsert it
once the change has been made.
3. Shard Collection Limit
The collection should be sharded before it reaches 256GB.
4. Select the Correct Shard Key
It’s very important to choose a correct shard key because once the key is chosen it’s not easy to
correct it.
b. What is Data Storage engine? Which is the default storage engine in MongoDB? Also compare
MMAP and Wired Tiger storage engines.
Ans. A storage engine is the component of the database that defines how data is stored on the disk. There
may be different storage engines for a particular database, and each one is productive in different
aspects.
MongoDB used MMAPv1 as its default storage engine in versions prior to 3.2; from version 3.2 onwards, WiredTiger is the default.
MMAP vs Wired Tiger storage engines
1. Concurrency
MMAP uses collection-level locking. In MongoDB 3.0 and later, if a client acquires a lock
on a document to modify its content, then no other client can write to the collection which
currently holds that document; in earlier versions a single write operation acquired a lock on
the entire database. WiredTiger uses document-level locking, so multiple clients can
access the same collection concurrently, since the lock is acquired only for the particular document.
2. Consistency
Journaling is a feature that helps the database recover after a failure. MongoDB
uses write-ahead logging to on-disk journal files.
MMAP uses this feature to recover from failure.
In WiredTiger, a consistent view of the data is provided by means of checkpoints, so that in
case of failure it can roll back to the previous checkpoint. However, journaling is still required if the
changes made after the last checkpoint also need to be recovered. It is left to the user's choice to
enable or disable journaling.
3. Compression
Data compression is needed in places where the data growth is extremely fast which can be
used to reduce the disk space consumed.
Data compression facilities are not present in the MMAP storage engine.
In the WiredTiger storage engine, data compression is achieved using two methods: Snappy
compression and zlib.
Snappy is faster but achieves a lower compression ratio, whereas zlib compresses more at a higher cost.
Again, it is the user's preference whether to enable compression or not.
4. Memory Constraints
WiredTiger can make use of multithreading, and hence multi-core CPUs can be exploited.
In MMAP, increasing the size of the RAM decreases the number of page faults,
which in turn increases the performance.
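As a small, hedged illustration (paths are hypothetical and the exact default depends on the installed MongoDB version), the storage engine can be chosen when mongod is started, and the engine currently in use can be checked from the shell:
mongod --dbpath /data/db --storageEngine wiredTiger
db.serverStatus().storageEngine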
c. “With the rise of the Smartphone, it’s becoming very common to query for things near a current
location”. Explain the different indexes used by MongoDB to support such location-based queries.
Ans. To support such location-based queries, MongoDB provides geospatial indexes.
Geospatial indexes
To create a geospatial index, a coordinate pair in the following forms must exist in the documents:
• Either an array with two elements
• Or an embedded document with two keys (the key names can be anything).
The following are valid examples:
{ "userloc" : [ 0, 90 ] }
{ "loc" : { "x" : 30, "y" : -30 } }
{ "loc" : { "latitude" : -30, "longitude" : 180 } }
{"loc" : {"a1" : 0, "b1" : 1}}. db.userplaces.ensureIndex( { userloc : "2d" } )
A geospatial index assumes that the values will range from -180 to 180 by default. If this needs to
be changed, it can be specified along with ensureIndex as follows:
db.userplaces.ensureIndex({"userloc" : "2d"}, {"min" : -1000, "max" : 1000})
The following can be used to create a geospatial index on the userloc field:
Let’s understand with an example how this index works. Say you have documents that are of the
following type:
{"loc":[0,100], "desc":"coffeeshop"}
{"loc":[0,1], "desc":"pizzashop"}
If the query of a user is to find all coffee shops near her location, the following compound index
can help:
db.ensureIndex({"userloc" : "2d", "desc" : 1})

Geohaystack Indexes

Geohaystack indexes are bucket-based geospatial indexes (also called geospatial haystack indexes
). They are useful for queries that need to find out locations in a small area and also need to be
filtered along another dimension, such as finding documents with coordinates within 10 miles and
a type field value as restaurant .
While defining the index, it’s mandatory to specify the bucketSize parameter as it determines the
haystack index granularity. For example,
db.userplaces.ensureIndex({ userpos : "geoHaystack", type : 1 }, { bucketSize : 1 })
This example creates an index wherein keys within 1 unit of latitude or longitude are stored together
in the same bucket. You can also include an additional category in the index, which means that
information will be looked up at the same time as finding the location details.

d. What is Journaling? Explain the importance of Journaling with the help of a neat diagram.
Ans. Journaling is MongoDB's write-ahead logging mechanism, which protects the data against failures
such as an abrupt shutdown or power loss.
1. When a write operation occurs, mongod first applies the change to the private view in memory.
After a specified interval, called the 'journal commit interval', the operations accumulated in the
private view are written to the journal directory (residing on the disk).
2. Once the journal commit happens, mongod pushes the data into the shared view, and from the
shared view it is written to the actual data directory (as this process happens in the background).
The basic advantage is that the durability cycle is reduced from 60 seconds to about 200
milliseconds.
3. If an abrupt failure occurs at any point of time, or the data files have not been flushed for up to
the last 59 seconds, the write operations still exist in the journal directory. The next time mongod
starts, it replays all the write operation logs from the journal and writes them into the actual data
directory.

e. Write a short note on Replication Lag.
Ans. Replication Lag
Replication lag is the primary administrative concern behind monitoring replica sets. Replication
lag for a given secondary is the difference in time when an operation is written in primary and the
time when the same was replicated on the secondary. Often, the replication lag remedies itself and
is transient. However, if it remains high and continues to rise, there might be a problem with the
system. You might end up either shutting down the system until the problem is resolved, or it might
require manual intervention for reconciling the mismatch, or you might even end up running the
system with outdated data.
The following command can be used to determine the current replication lag of the replica set:
testset:PRIMARY> rs.printSlaveReplicationInfo()

Further, you can use the rs.printReplicationInfo() command to fill in the missing piece:
testset:PRIMARY> rs.printReplicationInfo()

MongoDB Cloud Manager can also be used to view recent and historical replication lag
information. The repl lag graph is available from the Status tab of each SECONDARY node.
Here are some ways to reduce this time:
1. In scenarios with a heavy write load, you should have a secondary as powerful as the
primary node so that it can keep up with the primary and the writes can be applied
on the secondary at the same rate. Also, you should have enough network bandwidth
so that the ops can be retrieved from the primary at the same rate at which they are
getting created.
2. Adjust the application write concern.
3. If the secondary is used for index builds, this can be planned to be done when there
are low write activities on the primary.
4. If the secondary is used for taking backups, consider taking backups without
blocking.
5. Check for replication errors. Run rs.status() and check the errmsg field.
Additionally, the secondary’s log files can be checked for any existing error messages.
f. Explain “ GridFS – The MongoDB File System” with the help of a neat diagram.
Ans. 1. MongoDB stores data in BSON documents. BSON documents have a document size limit of
16MB. GridFS is MongoDB's specification for handling large files that exceed BSON's
document size limit.
2. GridFS uses two collections for storing the file. One collection maintains the metadata of the
file and the other collection stores the file’s data by breaking it into small pieces called chunks.
This means the file is divided into smaller chunks and each chunk is stored as a separate
document. By default the chunk size is limited to 255KB. This approach not only makes the
storing of data scalable and easy but also makes the range queries easier to use when a specific

Page 10 of 20
part of files are retrieved.Whenver a file is queried in GridFS, the chunks are reassembled as
required by the client. This also provides the user with the capability to access arbitrary sections
of the files. For example, the user can directly move to the middle of a video file.
3. The GridFS specification is useful in cases where the file size exceeds the default 16MB
limitation of MongoDB BSON document. It’s also used for storing files that you need to access
without loading the entire file in memory.GridFS enables you to store large files by splitting
them up into smaller chunks and storing each of the chunks as separate documents. In addition
to these chunks, there’s one more document that contains the metadata about the file. Using
this metadata information, the chunks are grouped together, forming the complete file. The
storage overhead for the chunks can be kept to a minimum, as MongoDB supports storing
binary data in documents.
4. The two collections that are used by GridFS for storing large files are by default named
fs.files and fs.chunks, although a bucket name different from fs can be chosen. The chunks are
stored by default in the fs.chunks collection; if required, this can be overridden. All of the
chunk data is contained in the fs.chunks collection, while the metadata resides in fs.files.
5. The structure of the individual documents in the chunks collection is pretty simple:
{
"_id" : ObjectId("..."),
"files_id" : ObjectId("..."),
"n" : 0,
"data" : BinData("...")
}
The chunk document has the following important keys.
1. "_id" : This is the unique identifier.
2. "files_id" : This is unique identifier of the document that contains the metadata
related to the chunk.
3. "n" : This is basically depicting the position of the chunk in the original file.
4. "data" : This is the actual binary data that constitutes this chunk.

Fig: GridFS
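As a hedged illustration (the database and file names here are hypothetical), a file can be stored through the bundled mongofiles utility, and the two GridFS collections can then be inspected from the mongo shell:
mongofiles -d media put lecture.mp4
db.fs.files.findOne()
db.fs.chunks.find({}, { data: 0 }).sort({ n: 1 })
The first command runs at the operating system prompt; the two queries show the metadata document and the chunk documents (with the binary data field suppressed) for the stored file.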

4. Attempt any three of the following: 15


a Explain TimesTen Architecture with the help of a neat diagram.
Ans. 1. TimesTen is a relatively early in-memory database system that aspires to support workloads
similar to a traditional relational system, but with better performance. Oracle offers it as a
standalone in-memory database or as a caching database supplementing the traditional disk-
based Oracle RDBMS.

2. TimesTen implements a fairly familiar SQL-based relational model. Subsequent to the purchase
by Oracle, it implemented ANSI standard SQL, but in recent years the effort has been to make
the database compatible with the core Oracle database, to the extent of supporting Oracle's
stored procedure language PL/SQL. In a TimesTen database, all data is memory resident.
3. Persistence is achieved by writing periodic snapshots of memory to disk, as well as writing to
a disk-based transaction log following a transaction commit.

Fig: TimesTen Architecture


Above Figure illustrates the TimesTen architecture. When the database is started, all data is
loaded from checkpoint files into main memory (1). The application interacts with TimesTen
via SQL requests that are guaranteed to find all relevant data inside that main memory (2).
Periodically or when required database data is written to checkpoint files (3). An application
commit triggers a write to the transaction log (4), though by default this write will be
asynchronous so that the application will not need to wait on disk. The transaction log can be
used to recover the database in the event of failure (5).
b Write a jQuery code to change text contents of the elements on button click.
Ans. <!DOCTYPE html>
<html>
<head>
<title></title>
<script src="jquery.js"></script>
<script>
$(document).ready(
function()
{
$("button").click(
function()
{

document.write("hello world");
}
);
}
);
</script>
</head>
<body>
<p>Hello! Welcome in Jquery Language!!</p>
<button>Click me</button>
</body>
</html>

c Explain how we can create our own custom event in jQuery with an example.
Ans. A seldom-used but very useful feature of jQuery's events is the ability to trigger and bind to your own
custom events. We can use jQuery's on() method to attach event handlers to elements. For example, in the
code below we have created a custom event named "myOwnEvent" which gets triggered on click
of the button.
Code:
<html>
<head>
<script src="jquery-3.3.1.min.js"></script>
<script>
$(document).ready(function(){
$("p").on("myOwnEvent", function(event, showName){
$(this).text(showName + "! It is a Javascript Library!");
});
$("button").click(function(){
$("p").trigger("myOwnEvent", ["Jquery"]);
});
});
</script>
</head>
<body>

<button>Trigger custom event</button>

<p>Click the button to attach a customized event on this p element.</p>

</body>
</html>

d What is Ajax? What is the use of Ajax? Explain how Ajax can be used with jQuery.
Ans. 1. Ajax stands for Asynchronous JavaScript and XML. Ajax is just a means of loading data from
the server to the web browser without reloading the whole page.
2. Basically, what Ajax does is make use of the JavaScript-based XMLHttpRequest object to send
and receive information to and from a web server asynchronously, in the background, without
interfering with the user's experience.
3. Ajax has become so popular that you hardly find an application that doesn't use Ajax to some
extent. The example of some large-scale Ajax-driven online applications is: Gmail, Google
Maps, Google Docs, YouTube, Facebook, Flickr, etc.
Ajax with jQuery
4. Different browsers implement Ajax differently, which means that if we adopt the typical
JavaScript way of implementing Ajax, we have to write different code for different browsers
to ensure that Ajax works cross-browser.

5. But, fortunately jQuery simplifies the process of implementing Ajax by taking care of those
browser differences. It offers simple methods such as load(), $.get(), $.post(), etc. to implement
the Ajax that works seamlessly across all the browsers.
For example jQuery load() Method
6. The jQuery load() method loads data from the server and places the returned HTML into the
selected element. This method provides a simple way to load data asynchronously from a web
server. The basic syntax of this method can be given with:
$(selector).load(URL, data, complete);
The parameters of the load() method have the following meanings:
1) The required URL parameter specifies the URL of the file you want to load.
2) The optional data parameter specifies a set of query string (i.e. key/value pairs) that is
sent to the web server along with the request.
3) The optional complete parameter is basically a callback function that is executed when
the request completes. The callback is fired once for each selected element.
Code:
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script>
$(document).ready(function(){
$("button").click(function(){
$("#div1").load("demo_test.txt", function(responseTxt, statusTxt, xhr){
if(statusTxt == "success")
alert("External content loaded successfully!");
if(statusTxt == "error")
alert("Error: " + xhr.status + ": " + xhr.statusText);
});
});
});
</script>
</head>
<body>

<div id="div1"><h2>Let jQuery AJAX Change This Text</h2></div>

<button>Get External Content</button>

</body>
</html>
It is also possible to load just a fragment of the file: appending a jQuery selector to the URL (for example
"demo_test.txt #p1") loads only the element with id="p1" from the file into a specific <div> element.
e Explain how to add and remove elements to DOM in jQuery with an example
Ans. Adding Elements to the DOM
1) jQuery provides several methods that allow us to insert new content inside an existing
element. There are three ways to insert elements into the DOM:
1. DOM Insertion, Around: These methods let you insert elements around
existing ones.(wrap(),wrapAll(),wrapInner())
The wrap() method wraps specified HTML element(s) around each selected element.
Example
Wrap a <div> element around each <p> element:
$("button").click(function(){
$("p").wrap("<div></div>");
});
wrapAll():Wraps HTML element(s) around all selected elements
wrapInner():Wraps HTML element(s) around the content of each selected element
2. DOM Insertion, Inside: These methods let you insert elements within existing
ones.(append(),appendTo(), html(),prepend(),prependTo(),text())
The append() method inserts specified content at the end of the selected elements.
Example
Insert content at the end of all <p> elements:
$("button").click(function(){
$("p").append("<b>Appended text</b>");
});
The prepend() method inserts specified content at the beginning of the selected
elements.
Example
Insert content at the beginning of all <p> elements:

$("button").click(function(){
$("p").prepend("<b>Prepended text</b>");
});
The html() method sets or returns the content (innerHTML) of the selected elements.
Example
Change the content of all <p> elements:

$("button").click(function(){
$("p").html("Hello <b>world</b>!");
});
3. DOM Insertion, Outside: These methods let you insert elements outside existing
ones that are completely separate( after(),before(),insertAfter(),insertBefore() )

The after() method inserts specified content after the selected elements.
Example
Insert content after each <p> element:
$("button").click(function(){
$("p").after("<p>Hello world!</p>");
});

The before() method inserts specified content in front of (before) the selected elements.
Example
Insert content before each <p> element:
$("button").click(function(){
$("p").before("<p>Hello world!</p>");
});

Removing Elements from the DOM

2) jQuery provides a handful of methods, such as empty(), remove(), unwrap(), etc., to remove
existing HTML elements or contents from the document.
The empty() method removes all child nodes and content from the selected elements
Example
Remove the content of all <div> elements:
$("button").click(function(){
$("div").empty();
});

The remove() method removes the selected elements, including all text and child nodes.
This method also removes data and events of the selected elements.
Example
Remove all <p> elements:
$("button").click(function(){
$("p").remove();
});
f Write a jQuery code to add a CSS class to the HTML elements.
Ans. <!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script>
$(document).ready(function(){
$("button").click(function(){
$("p:first").addClass("intro");
});
});
</script>
<style>
.intro {
font-size: 150%;
color: red;
}
</style>
</head>
<body>

<h1>This is a heading</h1>

<p>This is a paragraph.</p>
<p>This is another paragraph.</p>

<button>Add a class name to the first p element</button>

</body>
</html>

5. Attempt any three of the following: 15


a. Explain the use of json_encode and json_decode function with an example.
Ans. 1. A common use of JSON is to read data from a web server, and display the data in a web page. We can
exchange JSON data between the client and a PHP server. PHP has some built-in functions to handle
JSON.
2. Objects in PHP can be converted into JSON by using the PHP function json_encode()
For Example:

<?php
$myObj = new stdClass();   // create an empty object before assigning properties
$myObj->name = "John";
$myObj->age = 30;
$myObj->city = "New York";
$myJSON = json_encode($myObj);
echo $myJSON;
?>

3. PHP's json_decode function takes a JSON string and converts it into a PHP variable. Typically, the
JSON data will represent a JavaScript array or object literal which json_decode will convert into a PHP
array or object. The following two examples demonstrate, first with an array, then with an object:
Example 1:
$json = '["apple","orange","banana","strawberry"]';
$ar = json_decode($json);
// access first element of $ar array
echo $ar[0]; // apple

Example 2:
$json = '{
"title": "JavaScript: The Definitive Guide",
"author": "David Flanagan",
"edition": 6
}';
$book = json_decode($json);
// access title of $book object
echo $book->title; // JavaScript: The Definitive Guide
b. List and explain any 5 XMLHttpRequest Event Handlers used for Monitoring the Progress of the
HTTP Request.
Ans. The XMLHttpRequest object exposes a set of event handlers that can be used to monitor the progress
of an HTTP request, including the following:
1. onloadstart: fired once, when the request starts (after send() is called).
2. onprogress: fired periodically while the response is being received; the event carries the loaded and total byte counts.
3. onabort: fired when the request is aborted, for example by calling abort().
4. onerror: fired when the request fails because of a network-level error.
5. ontimeout: fired when the request does not complete within the specified timeout period.
6. onload: fired when the request completes successfully.
7. onloadend: fired after the request has finished, whether it succeeded, failed, or was aborted.

c. What is the use of Stringify function? What are the different parameters that can be passed in
Stringify function? Explain with an example.
Ans. 1. A common use of JSON is to exchange data to/from a web server. When sending data to a web server,
the data has to be a string.
2. We can convert a JavaScript object into a string with JSON.stringify().
Stringify a JavaScript Object
Imagine we have this object in JavaScript:
var obj = { name: "John", age: 30, city: "New York" };
Use the JavaScript function JSON.stringify() to convert it into a string.
var myJSON = JSON.stringify(obj);

The result will be a string following the JSON notation.
3. Syntax of the JSON stringify Method
JSON.stringify(value[, replacer [, space]]);
4. The value parameter of the stringify method is the only required parameter of the three outlined by the
signature.The argument supplied to the method represents the JavaScript value intended to be serialized.
This can be that of any object, primitive, or even a composite of the two.
5. The optional replacer parameter is either a function that alters the way objects and arrays are stringified
or an array of strings and numbers that acts as a white list for selecting the object properties that will be
stringified.
6. The third parameter, space, is also optional and allows you to specify the amount of padding that
separates each value from one another within the produced JSON text. This padding provides an added
layer of readability to the produced string.
7. Code:

<html>
<head>
<title>JSON programs </title>
</head>
<body>
<script type="text/javascript">
var data={
"Bname":"JSON",
"Publisher": "TataMcgraw",
"author": "Smith",
"price":250,
"ISBN":"1256897912345"
};
document.writeln(JSON.stringify(data));

document.writeln(JSON.stringify(data,["Bname","author","price"]));

document.write(JSON.stringify(data,["Bname","author","price"],5));

</script>
</body>
</html>

d. Write a short note on JSON Arrays.


Ans. 1. Arrays in JSON are almost the same as arrays in JavaScript. In JSON, array values must be of
type string, number, object, array, boolean or null. In JavaScript, array values can be all of the
above, plus any other valid JavaScript expression, including functions, dates, and undefined.
2. Arrays in JSON Objects
Arrays can be values of an object property:
Example
{
"name":"John",
"age":30,
"cars":[ "Ford", "BMW", "Fiat" ]
}

3. Accessing Array Values


We can access the array values by using the index number:
Example
x = myObj.cars[0];
4. Looping Through an Array
We can access array values by using a for-in loop:
var myObj, i, x = "";
myObj = {
"name":"John",
"age":30,
"cars":[ "Ford", "BMW", "Fiat" ]
};
for (i in myObj.cars) {
x += myObj.cars[i] + "<br>";
}

5. We can use the index number to modify an array:


Example
myObj.cars[1] = "Mercedes";

6. We can use the delete keyword to delete items from an array:


Example
delete myObj.cars[1];
e. List and explain the different methods of a Cradle Wrapper.
Ans.

f. “JSON is better than XML”.Comment.


Ans. 1. JSON is Unlike XML Because
a) JSON doesn't use end tag
b) JSON is shorter
c) JSON is quicker to read and write
d) JSON can use arrays
2. The biggest difference is: XML has to be parsed with an XML parser, whereas JSON can be parsed by
a standard JavaScript function. XML is much more difficult to parse than JSON.
3. JSON is parsed into a ready-to-use JavaScript object.For AJAX applications, JSON is faster
and easier than XML:
4. Using XML
a) Fetch an XML document
b) Use the XML DOM to loop through the document
c) Extract values and store in variables
5. Using JSON
a) Fetch a JSON string

b) Parse the JSON string with JSON.parse().
6. The following JSON and XML examples both define an employees object, with an array of 3
employees:

JSON Example
{"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}
XML Example
<employees>
<employee>
<firstName>John</firstName> <lastName>Doe</lastName>
</employee>
<employee>
<firstName>Anna</firstName> <lastName>Smith</lastName>
</employee>
<employee>
<firstName>Peter</firstName> <lastName>Jones</lastName>
</employee>
</employees>
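For completeness, a minimal sketch of step b) above, parsing a JSON string into a ready-to-use JavaScript object:
var txt = '{"employees":[{"firstName":"John","lastName":"Doe"}]}';
var obj = JSON.parse(txt);
// obj.employees[0].firstName now evaluates to "John"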
_____________________________

BSc.(Information Technology)
(Semester V)
2019-20

New Generation
Technology
(USIT507 Elective)
University Paper Solution

By
Ms. Seema Bhatkar



Question 1

Q1a. Explain the three Vs of Big Data.

Ans:

1. Volume
Volume in big data means the size of the data. As businesses become
more transaction-oriented, the ever-growing number of transactions
generates huge amounts of data. This huge volume of data is the biggest
challenge for big data technologies. The storage and processing power
needed to store, process, and make the data accessible in a timely and cost-
effective manner is massive.

2. Variety
The data generated from various devices and sources follows no fixed format
or structure. Unlike structured text, CSV, or RDBMS data, it varies from text files, log
files, streaming videos, photos, meter readings, stock ticker data, PDFs, audio,
and various other unstructured formats.
New sources and structures of data are being created at a rapid pace. So the
onus is on technology to find a solution to analyze and visualize the huge
variety of data that is out there. As an example, to provide alternate routes for
commuters, a traffic analysis application needs data feeds from millions of
smartphones and sensors to provide accurate analytics on traffic conditions
and alternate routes.

3. Velocity
Velocity in big data is the speed at which data is created and the speed at
which it is required to be processed. If data cannot be processed at the
required speed, it loses its significance. Due to data streaming in from social
media sites, sensors, tickers, metering, and monitoring, it is important for the
organizations to speedily process data both when it is in motion and when it is
at rest.



Q1b. Briefly explain the CAP theorem.

Ans: CAP Theorem ( Brewer’s Theorem )


Eric Brewer outlined the CAP theorem in 2000. The theorem states that
when designing an application in a distributed environment there are three basic
requirements that exist, namely consistency, availability, and partition tolerance.

• Consistency means that the data remains consistent after any operation is
performed that changes the data, and that all users or clients accessing the
application see the same updated data.
• Availability means that the system is always available.
• Partition Tolerance means that the system will continue to function even if it
is partitioned into groups of servers that are not able to communicate with
one another.

The CAP theorem states that at any point in time a distributed system can fulfil only
two of the above three guarantees.

Q 1c. What is MongoDB Design Philosophy? Explain.


Ans:
1. Speed, Scalability, and Agility
To achieve speed and horizontal scalability in a partitioned database, as
explained in the CAP theorem, the consistency and transactional support have to
be compromised. Thus, per this theorem, MongoDB provides high availability,
scalability, and partitioning at the cost of consistency and transactional support.
In practical terms, this means that instead of tables and rows, MongoDB uses
documents to make it flexible, scalable, and fast.



2. Non-Relational Approach
There are following issues in RDBMS systems
 In order to scale out, the RDBMS database needs to link the data available in
two or more systems in order to report back the result. This is difficult to
achieve in RDBMS systems since they are designed to work when all the data
is available for computation together. Thus the data has to be available for
processing at a single location.
 In case of multiple Active-Active servers, when both are getting updated from
multiple sources there is a challenge in determining which update is correct.
 When an application tries to read data from the second server, and the
information has been updated on the first server but has yet to be
synchronized with the second server, the information returned may be stale.
MongoDB takes a non-relational approach and stores its data in
BSON documents where all the related data is placed together, which means
everything is in one place. The queries in MongoDB are based on keys in the
document, so the documents can be spread across multiple servers. Querying
each server means it will check its own set of documents and return the result.
This enables linear scalability and improved performance.
MongoDB has a primary-secondary replication where the primary
accepts the write requests. If the write performance needs to be improved,
then sharding can be used; this splits the data across multiple machines and
enables these multiple machines to update different parts of the datasets.
Sharding is automatic in MongoDB; as more machines are added, data is
distributed automatically.

3. JSON-Based Document Store


MongoDB uses a JSON-based (JavaScript Object Notation) document store
for the data. JSON/BSON offers a schema-less model, which provides flexibility in
terms of database design.
This design also makes for high performance by providing for grouping of
relevant data together internally and making it easily searchable.
A JSON document contains the actual data and is comparable to a row in SQL.
However, in contrast to RDBMS rows, documents can have dynamic schema. This
means documents within a collection can have different fields or structure, or
common fields can have different type of data.
A document contains data in form of key-value pairs.

example:
{
"Name": "ABC",
"Phone": ["1111111", "222222"],
"Fax": ...
}



4. Performance vs. Features
MongoDB is a document-oriented DBMS where data is stored as
documents. It does not support JOINs, and it does not have fully generalized
transactions. However, it does provide support for secondary indexes, it enables
users to query using query documents, and it provides support for atomic
updates at a per document level. It provides a replica set, which is a form of
master-slave replication with automated failover, and it has built-in horizontal
scaling.
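As an illustrative sketch (the collection and field names here are hypothetical), a secondary index and a per-document atomic update look as follows:
db.orders.ensureIndex({ "customer": 1 })
db.orders.update({ "_id": 1 }, { "$inc": { "qty": 1 } })
The second command atomically increments a field of a single document, which is the level at which MongoDB guarantees atomicity.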

5. Running the Database Anywhere


The language used for implementing MongoDB is C++, which enables
MongoDB to run on servers, VMs, or even on the cloud. The 10gen site provides
binaries for different OS platforms, enabling MongoDB to run on almost any type
of machine.

Q1d. Differentiate between SQL and NoSQL Databases

Ans:
SQL NoSQL
Types All types support SQL standard. Multiple types exists, such as
document stores, key value
stores, column databases, etc.
Development Developed in 1970. Developed in 2000s.
History
Examples SQL Server, Oracle, MySQL. MongoDB, HBase, Cassandra.
Data Storage Data is stored in rows and The data model depends on the
Model columns in a table, where each database type. Say data is stored
column is of a specific type. as a key-value pair for key-value
The tables generally are created stores. In documentbased
on principles of normalization. databases, the data is stored as
Joins are used to retrieve data documents.
from multiple tables. The data model is flexible, in
contrast to the rigid table model
of the RDBMS.
Schemas Fixed structure and schema, so Dynamic schema, new data types,
any change to schema involves or structures can be
altering the database. accommodated by
expanding or altering the current
schema.
New fields can be added
dynamically.
Scalability Scale up approach is used; this Scale out approach is used; this
means as the load increases, means distributing the data load
bigger, across inexpensive commodity
expensive servers are bought to servers.
accommodate the data.
Supports Supports ACID and transactions. Supports partitioning and
Transactions availability, and compromises on

Ms. Seema Bhatkar Page 5


transactions.
Transactions exist at certain level,
such as the database level or
document level.
Consistency Strong consistency. Dependent on the product. Few
chose to provide strong
consistency whereas few provide
eventual consistency.
Support High level of enterprise support Open source model. Support
is through third parties or
provided. companies building the open
source products.
Maturity Have been around for a long Some of them are mature; others
time. are evolving.
Querying Available through easy-to-use Querying may require
Capabilities GUI programming expertise and
interfaces. knowledge. Rather than an UI,
focus is on functionality and
programming interfaces.
Expertise Large community of developers Small community of developers
who have been leveraging the working on these open source
SQL tools.
language and RDBMS concepts
to architect and develop
applications.

Q1e. Write a short note on Non-Relational Approach.


Traditional RDBMS platforms provide scalability using a scale-up approach,
which requires a faster server to increase performance. The following issues in
RDBMS systems led to why MongoDB and other NoSQL databases were designed
the way they are designed:
 In order to scale out, the RDBMS database needs to link the data available in
two or more systems in order to report back the result. This is difficult to
achieve in RDBMS systems since they are designed to work when all the data
is available for computation together. Thus the data has to be available for
processing at a single location.
 In case of multiple Active-Active servers, when both are getting updated from
multiple sources there is a challenge in determining which update is correct.
 When an application tries to read data from the second server, and the
information has been updated on the first server but has yet to be
synchronized with the second server, the information returned may be stale.
The MongoDB team decided to take a non-relational approach to
solving these problems. As mentioned, MongoDB stores its data in BSON
documents where all the related data is placed together, which means
everything is in one place. The queries in MongoDB are based on keys in the
document, so the documents can be spread across multiple servers. Querying



each server means it will check its own set of documents and return the result.
This enables linear scalability and improved performance.
MongoDB has a primary-secondary replication where the primary
accepts the write requests. If the write performance needs to be improved,
then sharding can be used; this splits the data across multiple machines and
enables these multiple machines to update different parts of the datasets.
Sharding is automatic in MongoDB; as more machines are added, data is
distributed automatically.

Q1f. Discuss the various applications of Big Data.


Ans:
Big data has found many applications in various fields today. The major fields where
big data is being used are as follows.
1. Government
Big data analytics has proven to be very useful in the government
sector. Big data analysis played a large role in Barack Obama’s successful 2012
re-election campaign. Also most recently, Big data analysis was majorly
responsible for the BJP and its allies to win a highly successful Indian General
Election 2014.
2. Social Media Analytics
Social media can provide valuable real-time insights into how the
market is responding to products and campaigns. With the help of these
insights, the companies can adjust their pricing, promotion, and campaign
placements accordingly. Before utilizing the big data there needs to be some
preprocessing to be done on the big data in order to derive some intelligent
and valuable results.
3. Technology
The technological applications of big data comprise companies which
deal with huge amounts of data every day and put them to use for business
decisions as well. For example, eBay.com uses two data
warehouses at 7.5 petabytes and 40PB as well as a 40PB Hadoop cluster for
search, consumer recommendations, and merchandising.
4. Agriculture
A biotechnology firm uses sensor data to optimize crop efficiency. It
plants test crops and runs simulations to measure how plants react to various
changes in condition. Its data environment constantly adjusts to changes in
the attributes of various data it collects, including temperature, water levels,
soil composition, growth, output, and gene sequencing of each plant in the
test bed. These simulations allow it to discover the optimal environmental
conditions for specific gene types.
5. Marketing
Marketers have begun to use facial recognition software to learn how
well their advertising succeeds or fails at stimulating interest in their products.
6. Smart phones
Perhaps more impressive, people now carry facial recognition
technology in their pockets. Users of iPhone and Android smartphones have
applications at their fingertips that use facial recognition technology for
various tasks.



Question 2

Q2a. Explain the Capped Collection.


Ans Capped Collection
 MongoDB has a concept of capping the collection. This means it stores the
documents in the collection in the inserted order.
 As the collection reaches its limit, the documents will be removed from the collection
in FIFO (first in, first out) order. This means that the least recently inserted documents
will be removed first.
 This is good for use cases where the order of insertion is required to be maintained
automatically, and deletion of records after a fixed size is required.
 One such use cases is log files that get automatically truncated after a certain size.
 MongoDB itself uses capped collections for maintaining its replication logs. Capped
collection guarantees preservation of the data in insertion order, so queries retrieving
data in the insertion order return results quickly and don’t need an index.
 Updates that change the document size are not allowed.

Creating Capped Collection

Syntax:
db.createCollection(name, options)

Option (Type): Description

capped (Boolean): (Optional) If true, enables a capped collection. A capped collection is a fixed-size collection that automatically overwrites its oldest entries when it reaches its maximum size. If you specify true, you need to specify the size parameter also.

autoIndexId (Boolean): (Optional) If true, automatically creates an index on the _id field. Deprecated since version 3.2.

size (number): (Optional) Specifies a maximum size in bytes for a capped collection. If capped is true, then you need to specify this field also.

max (number): (Optional) Specifies the maximum number of documents allowed in the capped collection.
Example:
db.createCollection("VSIT", {capped:true, size:20000, max:100})

Q2b. What are the various tools available in MongoDB? Explain.


Ans
There are various tools that are available as part of the MongoDB installation:
 mongodump : This utility is used as part of an effective backup strategy. It creates a
binary export of the database contents.

 mongorestore : The binary database dump created by the mongodump utility is


imported to a new or an existing database using the mongorestore utility.



 bsondump : This utility converts the BSON files into human-readable formats
such as JSON and CSV. For example, this utility can be used to read the output file
generated by mongodump.

 mongoimport , mongoexport : mongoimport provides a method for taking data in


JSON , CSV, or T SV formats and importing it into a mongod instance. mongoexport
provides a method to export data from a mongod instance into JSON, CSV, or TSV
formats.

 mongostat , mongotop , mongosniff : These utilities provide diagnostic


information related to the current operation of a mongod instance.
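For illustration only (the database name and paths below are assumptions, not part of the
original answer), a typical backup, restore, and export cycle with these utilities might look
like this:

mongodump --db mydb --out /backup/                              # binary export of the mydb database
mongorestore --db mydb /backup/mydb                             # restore the dump into mydb
mongoexport --db mydb --collection users --out users.json       # export one collection as JSON
mongoimport --db mydb --collection users --file users.json      # import the JSON back into a collection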

Q2c. Discuss the points to be considered while importing data in a sharded environment.
Ans
1. Pre-Splitting of the Data
Instead of leaving the choice of chunks creation with MongoDB, you can tell
MongoDB how to do so using the following command:

db.runCommand( { split : "practicalmongodb.mycollection" , middle : { shardkey : value } } );

2. Deciding on the Chunk Size


You need to keep the following points in mind when deciding on the chunk
size:
• If the size is too small, the data will be distributed evenly but it will end up having
more frequent migrations, which will be an expensive operation at the mongos
layer.
• If the size is large, it will lead to less migration, reducing the expense at the mongos
layer, but you will end up with uneven data distribution.
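As an illustrative sketch (the 64MB value is only an example), the default chunk size can be
changed from a mongos shell by updating the settings collection of the config database:

use config
db.settings.save({ _id: "chunksize", value: 64 })   // chunk size in MB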

3. Choosing a Good Shard Key


It’s very essential to pick a good shard key for good distribution of data
among nodes of the shard cluster .

4. Monitoring for Sharding


In addition to the normal monitoring and analysis that is done for other
MongoDB instances, the sharding cluster requires an additional monitoring to ensure
that all its operations are functioning appropriately and the data is distributed effectively
among the nodes.

5. Monitoring the Config Servers


The config server stores the metadata of the sharded cluster. The mongos caches
the data and routes the request to the respective shards. If the config server goes down
but there’s a running mongos instance, there’s no immediate impact on the shard cluster
and it will remain available for a while. However, you won't be able to perform operations
like chunk migration or restart a new mongos.

6. Monitoring the Shard Status Balancing and Chunk Distribution


For the most effective sharded cluster deployment, it's required that the chunks be
distributed evenly among the shards. Although this is done automatically by MongoDB using a
background process, you need to monitor the shard status to ensure that the process is
working effectively. For this, you can use the db.printShardingStatus() or sh.status()
command in the mongos mongo shell.

7. Monitoring the Lock Status


In almost all cases the balancer releases its locks automatically after completing
its process, but you need to check the lock status of the database in order to ensure
there’s no long lasting lock because this can block future balancing, which will affect the
availability of the cluster. Issue the following from the mongos mongo shell to check the lock
status:
use config
db.locks.find()

Q2d. Explain the concept of Inserting by Explicitly Specifying _id.


Ans
• MongoDB stores data in documents. Documents are made up of key-value pairs. A
key is used for querying data from the documents.
• Hence, like an RDBMS primary key, you need to have a key that uniquely identifies
each document within a collection. This is referred to as _id in MongoDB.
• If the _id field is not specified, it is implicitly added when the document is inserted.
• In the following example, you will see how to explicitly specify the _id field when
inserting the documents within a collection.
• While explicitly specifying the _id field, you have to keep in mind the uniqueness of
the field; otherwise the insert will fail.
The following command explicitly specifies the _id field:
> db.users.insert({"_id":1, "Name": "vsit"})

The insert operation creates the following document in the users collection:
{ "_id" : 1, "Name" : "vsit" }
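To illustrate the uniqueness point above, inserting a second document with the same _id
fails with a duplicate key error (the output shown is only indicative; the exact wording
varies by version):

> db.users.insert({"_id": 1, "Name": "vsit2"})
WriteResult({
  "nInserted" : 0,
  "writeError" : { "code" : 11000, "errmsg" : "E11000 duplicate key error ..." }
})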

Q2e. Discuss Indexes and its types.


Ans
Indexes are used to provide high performance read operations for queries that are
used frequently. By default, whenever a collection is created and documents are added to it,
an index is created on the _id field of the document.

1. Single Key Index

To create an index on a single field, for example the Name field of the document,
use ensureIndex() to create the index.
> db.testindx.ensureIndex({"Name":1})

2. Compound Index
When creating an index, you should keep in mind that the index covers most of
your queries. If you sometimes query only the Name field and at times you query both
the Name and the Age field, creating a compound index on the Name and Age fields will
be more beneficial than an index that is created on either of the fields because the
compound index will cover both queries.
The following command creates a compound index on fields Name and Age of
the collection testindx .
> db.testindx.ensureIndex({"Name":1, "Age": 1})

3. Unique Index
Creating index on a field doesn’t ensure uniqueness, so if an index is created on
the Name field, then two or more documents can have the same names. However, if
uniqueness is one of the constraints that needs to be enabled, the unique property needs
to be set to true when creating the index.
First, drop the existing indexes.
>db.testindx.dropIndexes()
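Then the index can be recreated with the unique property set (the same testindx collection
is used; this is the standard option form):

> db.testindx.ensureIndex({"Name": 1}, {"unique": true})

Any subsequent insert of a document whose Name duplicates an existing one will now be rejected.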

Q2f. Write a short note on Data Distribution Process.


Ans
In MongoDB, the data is sharded or distributed at the collection level. The collection
is partitioned by the shard key.

Shard Key
Any indexed single/compound field that exists within all documents of the collection
can be a shard key. You specify that this is the field basis which the documents of the
collection need to be distributed. Internally, MongoDB divides the documents based on the
value of the field into chunks and distributes them across the shards.
There are two ways MongoDB enables distribution of the data: range-based
partitioning and hash-based partitioning.

1. Range-Based Partitioning
In range-based partitioning , the shard key values are divided into ranges. Say you
consider a timestamp field as the shard key. In this way of partitioning, the values are
considered as a straight line starting from a Min value to Max value where Min is the starting
period (say, 01/01/1970) and Max is the end period (say, 12/31/9999). Every document in the
collection will have timestamp value within this range only, and it will represent some point
on the line.
Based on the number of shards available, the line will be divided into ranges, and
documents will be distributed based on them.
The documents where the values of the shard key are nearby are likely to fall on the
same shard. This can significantly improve the performance of the range queries.

2. Hash-Based Partitioning

In hash-based partitioning, the data is distributed on the basis of the hash value of
the shard field. If selected, this will lead to a more random distribution compared to range-
based partitioning.
It's unlikely that documents with close shard keys will be part of the same chunk.
For example, for ranges based on the hash of the _id field, there will be a straight line of hash
values, which will again be partitioned on the basis of the number of shards. On the basis of
the hash values, the documents will lie in either of the shards.
In contrast to range-based partitioning, this ensures that the data is evenly
distributed, but it happens at the cost of efficient range queries.
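As an illustrative sketch (the database, collection, and field names are assumptions), sharding
is enabled for a database and a collection is then partitioned either by a range-based shard key
or by a hashed shard key as follows:

sh.enableSharding("practicalmongodb")                                  // enable sharding for the database
sh.shardCollection("practicalmongodb.mycollection", { shardkey: 1 })   // range-based partitioning
sh.shardCollection("practicalmongodb.logs", { _id: "hashed" })         // hash-based partitioning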

Question 3

Q3a. What is Wired Tiger Storage Engine?


Ans
• When the storage option selected is WiredTiger, data, journals, and indexes are
compressed on disk. The compression is done based on the compression algorithm
specified when starting the mongod.
• Snappy is the default compression option. Under the data directory, there are
separate compressed wt files corresponding to each collection and its indexes. Journals
have their own folder under the data directory.
• The compressed files are actually created when data is inserted in the collection (the
files are allocated at write time, no preallocation).
• For example, if you create a collection called users, it will be stored in the
collection-0—2259994602858926461 file and the associated indexes will be stored in
index-1—2259994602858926461, index-2—2259994602858926461, and so on.
• In addition to the collection and index compressed files, there is an mdb_catalog file
that stores metadata mapping collections and indexes to the files in the data directory.
• Internally, WiredTiger uses the traditional B+ tree structure for storing and managing
data, but that's where the similarity ends. Unlike a B+ tree, it doesn't support in-place
updates. The WiredTiger cache is used for any read/write operations on the data. The
trees in cache are optimized for in-memory access.
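For reference, the storage engine in use can be verified from the mongo shell; the output
below is an indicative sketch of what serverStatus() reports:

> db.serverStatus().storageEngine
{ "name" : "wiredTiger", "supportsCommittedReads" : true, ... }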

Q3b. List and explain the MongoDB limitations.
Ans
1. MongoDB Space Is Too Large (Applicable for MMAPv1)
MongoDB (with storage engine MMAPv1 ) space is too large; in other words,
the data directory files are larger than the database’s actual data. This is because of
preallocated data files. This is by design in order to prevent file system fragmentation.

2. Memory Issues (Applicable for Storage Engine MMAPv1)


In MongoDB, memory is managed by memory mapping the entire data set. It
allows the OS to control the memory mapping and allocate the maximum amount of
RAM. The result is that the performance is non-optimal and the memory usage cannot be
effectively reasoned about.

3. 32-bit vs. 64-bit


MongoDB comes with two versions, 32-bit and 64-bit. Since MongoDB uses
memory mapped files, the 32-bit versions are limited to storing only about 2GB of data. If
you need more data to be stored, you should use the 64-bit build.
Starting from version 3.0, commercial support for 32-bit versions is no longer
provided by MongoDB. Also, the 32-bit version of MongoDB does not support the
WiredTiger storage engine.

4. BSON Documents
This section covers the limitations of BSON documents.
• Size limits: The current versions support documents up to 16MB in size. This
maximum size ensures that a document cannot use excessive RAM or excessive
bandwidth while in transmission.
• Nested depth limit: In MongoDB, no more than 100 levels of nesting are supported
for BSON documents.
• Field names: If you store 1,000 documents with the key "col1", the key is stored that
many times in the data set. Although arbitrary field names are supported in
MongoDB, in practice most of the field names are the same.

5. Security Limitations
• No Authentication by Default: Although authentication is not enabled by
default, it's fully supported and can be enabled easily.
• Traffic to and from MongoDB Isn't Encrypted: By default the connections
to and from MongoDB are not encrypted. Communications on a public
network can be encrypted using the SSL-supported build of MongoDB, which
is available in the 64-bit version only.
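As a hedged sketch of addressing the first limitation (the user name and password below are
placeholders), an administrative user is created and the server is then restarted with access
control enabled:

> use admin
> db.createUser({ user: "admin", pwd: "secret", roles: ["root"] })
// then restart the server with authentication enforced:
// mongod --auth --dbpath /data/db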

Q3c. Explain the hardware requirements for MongoDB.


Ans
The following are only intended to provide high-level guidance for a MongoDB
deployment. The actual hardware configuration depends on your data, availability
requirements, queries, performance criteria, and the selected hardware components'
capabilities.
• Memory: Since memory is used extensively by MongoDB for better performance,
the more memory, the better the performance.
• Storage: MongoDB can use SSDs (solid state drives) or local attached storage. Since
MongoDB's disk access patterns don't have sequential properties, SSD usage can
enable customers to experience substantial performance gains. Another benefit of
using an SSD is that if the working set no longer fits in memory, it provides a gentle
degradation of performance.
Most MongoDB deployments should use RAID-10. When using the WiredTiger
storage engine, the use of the XFS file system is highly recommended due to
performance issues seen with other file systems. Also, do not use huge pages because
MongoDB performs better with default virtual memory pages.
• CPU: Since MongoDB with the MMAPv1 storage engine rarely encounters workloads
needing a large number of cores, it's preferable to use servers with a faster clock
speed than ones with multiple cores but a slower clock speed. However, the
WiredTiger storage engine is CPU bound, so using a server with multiple cores will
offer a significant performance improvement.

Q3d. Write a short note on GridFS.


Ans
• GridFS is MongoDB's specification for handling large files that exceed BSON's
document size limit. The rationale of GridFS: by design, a MongoDB document (i.e. a
BSON object) cannot be larger than 16MB.
• This is to keep performance at an optimum level, and the size is well suited for most
needs. For example, 4MB of space might be sufficient for storing a sound clip or a
profile picture. However, if the requirement is to store high quality audio or movie
clips, or even files that are more than several hundred megabytes in size, MongoDB
has you covered by using GridFS.
• GridFS uses two collections for storing the file. One collection maintains the metadata
of the file and the other collection stores the file's data by breaking it into small
pieces called chunks. This means the file is divided into smaller chunks and each
chunk is stored as a separate document.
• GridFS under the Hood
GridFS enables you to store large files by splitting them up into smaller
chunks and storing each of the chunks as separate documents. In addition to these
chunks, there's one more document that contains the metadata about the file. Using
this metadata information, the chunks are grouped together, forming the complete
file.
The storage overhead for the chunks can be kept to a minimum, as MongoDB
supports storing binary data in documents.
The two collections that are used by GridFS for storing large files are by
default named fs.files and fs.chunks.
The chunks are stored by default in the fs.chunks collection.
The chunk document has the following important keys.
• "_id" : This is the unique identifier.
• "files_id" : This is unique identifier of the document that contains the metadata
related to the chunk.
• "n" : This is basically depicting the position of the chunk in the original file.
• "data" : This is the actual binary data that constitutes this chunk.
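For illustration, once a file has been stored through GridFS (for example with the mongofiles
utility or a driver), the two collections can be inspected from the shell; the queries below are
a small sketch:

> db.fs.files.find().pretty()                            // one metadata document per stored file
> db.fs.chunks.find({}, { data: 0 }).sort({ n: 1 })      // chunk documents in order, omitting the binary data field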

Q3e. How are read and write operations performed in MongoDB?


Ans
• When MongoDB updates and reads from the DB, it is actually reading and writing to
memory. If a modification operation in the MongoDB MMAPv1 storage engine
increases the record size beyond the space allocated for it, then the entire record
will be moved to a much bigger space with extra padding bytes.
• By default, MongoDB uses power-of-2 sized allocations so that every document in
MongoDB is stored in a record that contains the document itself and extra space
(padding). Padding allows the document to grow as the result of updates while
minimizing the likelihood of reallocations. Once the record is moved, the space that
was originally occupied is freed up and will be tracked as free lists of different sizes. As
mentioned, it's the $freelist namespace in the .ns file.
• In WiredTiger, the data in the cache is stored in a B+ tree structure which is
optimized for in-memory access. The cache maintains an on-disk page image in association
with an index, which is used to identify where the data being asked for actually
resides within the page.
• The write operation in WiredTiger never updates in-place. Whenever an operation is
issued to WiredTiger, internally it's broken into multiple transactions wherein each
transaction works within the context of an in-memory snapshot. The snapshot is of
the committed version before the transactions started. Writers can create new
versions concurrent with the readers.
• The write operations do not change the page; instead the updates are layered on top
of the page.
• A skipList data structure is used to maintain all the updates, where the most recent
update is on the top. Thus, whenever a user reads/writes the data, the index checks
whether a skiplist exists.
• If a skiplist is not there, it returns data from the on-disk page image. If a skiplist exists,
the data at the head of the list is returned to the threads, which then update the data.
Once a commit is performed, the updated data is added to the head of the list and
the pointers are adjusted accordingly.

Q3f. Discuss how data is written using Journaling.


Ans
• MongoDB disk writes are lazy, which means if there are 1,000 increments in one
second, the data will only be written once. The physical write occurs a few seconds after the
operation. We will now see how an update actually happens in mongod.
• In the MongoDB system, mongod is the primary daemon process. The disk has the
data files and the journal files.
• When the mongod is started, the data files are mapped to a shared view. In other
words, the data file is mapped to a virtual address space.
• Basically, the OS recognizes that your data file is 2000 bytes on disk, so it maps this to
memory addresses 1,000,000 – 1,002,000. The files are still backing the memory, so
any change in memory will be flushed to the underlying files by the OS.
• This is how mongod works when journaling is not enabled: every 60 seconds the
in-memory changes are flushed by the OS. When journaling is enabled, mongod additionally
maps a second, private view of the data files and first writes the changes to the on-disk
journal, which is why the virtual memory amount used by mongod doubles when
journaling is enabled.

Question 4

Q4a. Diagrammatically explain the Spark architecture.


Ans
• In Spark, data is represented as resilient distributed datasets (RDD). RDDs are
collections of objects that can be partitioned across multiple nodes of the cluster. The
partitioning and subsequent distribution of processing are handled automatically by
the Spark framework.
• RDDs are described as immutable: Spark operations on RDDs return new RDDs, rather
than modifying the original RDD. So, for instance, sorting an RDD creates a new RDD
that contains the sorted data.
• The Spark API defines high-level methods that perform operations on RDDs.
Operations such as joining, filtering, and aggregation, which would entail hundreds of lines of
Java code in MapReduce, are expressed as simple method calls in Spark; the level of
abstraction is similar to that found with Hadoop's Pig scripting language.
• The data within an RDD can be simple types such as strings or any other Java/Scala
object type. However, it's common for the RDD to contain key-value pairs, and Spark
provides specific data manipulation operations such as aggregation and joins that
work only on key-value oriented RDDs.
• Under the hood, Spark RDD methods are implemented by directed acyclic graph
(DAG) operations.
The figure below shows some essential features of Spark processing.

Q4b.Explain the concept of Oracle 12c “in-Memory Databases”.


Ans
• Oracle RDBMS version 12.1 introduced the "Oracle database in-memory" feature. This
wording is potentially misleading, since the database as a whole is not held in
memory. Rather, Oracle has implemented an in-memory column store to supplement
its disk-based row store.
• The figure below illustrates the essential elements of the Oracle in-memory column store
architecture.
• OLTP applications work with the database in the usual manner. Data is maintained in
disk files (1), but cached in memory (2).
• An OLTP application primarily reads and writes from memory (3), but any committed
transactions are written immediately to the transaction log on disk (4).

Q4c. What is jQuery? Explain its features.


Ans
• jQuery is a lightweight, "write less, do more", JavaScript library.
• The purpose of jQuery is to make it much easier to use JavaScript on your website.
• jQuery takes a lot of common tasks that require many lines of JavaScript code to
accomplish, and wraps them into methods that you can call with a single line of code.
• jQuery also simplifies a lot of the complicated things from JavaScript, like AJAX calls
and DOM manipulation.

jQuery Features
• DOM manipulation − jQuery makes it easy to select DOM elements, traverse
them and modify their content by using the cross-browser open source selector
engine called Sizzle.
• Event handling − jQuery offers an elegant way to capture a wide variety of
events, such as a user clicking on a link, without the need to clutter the HTML code
itself with event handlers.
• AJAX Support − jQuery helps you a lot to develop a responsive and feature-rich
site using AJAX technology.
• Animations − jQuery comes with plenty of built-in animation effects which you
can use in your websites.
• Lightweight − jQuery is a very lightweight library - about 19KB in size (minified
and gzipped).
• Cross Browser Support − jQuery has cross-browser support, and works well in IE
6.0+, FF 2.0+, Safari 3.0+, Chrome and Opera 9.0+.
• Latest Technology − jQuery supports CSS3 selectors and basic XPath syntax.

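A minimal sketch tying these features together (the element IDs are assumptions): selecting an
element, handling a click event, changing content, and using a built-in animation:

<script>
$(document).ready(function(){
  // DOM manipulation + event handling + built-in animation in one chain
  $("#showBtn").click(function(){
    $("#message").text("Hello from jQuery").fadeIn(500);
  });
});
</script>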
Q4d. Discuss the concept of Disk Economics.
Ans
• The promise of SSD has led some to anticipate a day when all magnetic disks are
replaced by SSD.
• While this day may come, in the near term the economics of storage and the
economics of I/O are at odds: magnetic disk technology provides a more economical
medium per unit of storage, while flash technology provides a more economical
medium for delivering high I/O rates and low latencies.
• And although the cost of a solid state disk is dropping rapidly, so is the cost of a
magnetic disk. An examination of the trend for the cost per gigabyte of SSD and
HDD is shown in the figure below.
• While SSD continues to be an increasingly economic solution for small databases or
for performance-critical systems, it is unlikely to become a universal solution for
massive databases, especially for data that is infrequently accessed.
• We are, therefore, likely to see combinations of solid state disk, traditional hard
drives, and memory providing the foundation for next-generation databases.

Q4e. Explain the jQuery DOM Filter Methods.


Ans
• jQuery is a very powerful tool which helps us to incorporate a variety of DOM
traversal methods to select elements in a document randomly or in sequential order.
• Most of the DOM traversal methods do not modify the elements; instead they filter
them out based on the given conditions.
• The filter() method is used to filter out all the elements that do not match the
selected criteria; the elements that do match will be returned.

Syntax: $(selector).filter(criteria, function(index))

Parameters:
1. criteria : It specifies a selector expression, a jQuery object or one or more elements
to be returned from a group of selected elements.
2. function(index) : It specifies a function to run for each element in the set. If the
function returns true, the element is kept. Otherwise, it is removed.
3. index : The index position of the element in the set.

Example: Return all <p> elements with class name "intro" and set their background to yellow:
<script>
$(document).ready(function(){
  $("p").filter(".intro").css("background-color", "yellow");
});
</script>
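The function(index) form of the criteria can be sketched as follows; it keeps only the
elements at even positions (the selector and styling are illustrative):

<script>
$(document).ready(function(){
  $("li").filter(function(index){
    return index % 2 === 0;   // keep list items at even positions
  }).css("background-color", "lightgray");
});
</script>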

Q4f. Write a short note on jQuery Event Handling.


Ans
Events are actions that can be detected by your Web Application.

Following are examples of events −

• A mouse click
• A web page loading
• Taking the mouse over an element
• Submitting an HTML form
• A keystroke on your keyboard, etc.

When these events are triggered, you can then use a custom function to do pretty much
whatever you want with the event. These custom functions are called Event Handlers.

• Binding Event Handlers

Using the jQuery Event Model, we can establish event handlers on DOM elements
with the bind() method.

Syntax:

selector.bind( eventType[, eventData], handler)

Following is the description of the parameters −

• eventType − A string containing a JavaScript event type, such as click or submit.
• eventData − This optional parameter is a map of data that will be passed to
the event handler.
• handler − A function to execute each time the event is triggered.

• Removing Event Handlers

jQuery provides the unbind() method to remove an existing event handler.

Syntax:

selector.unbind(eventType, handler)

or

selector.unbind(eventType)

Following is the description of the parameters −

1. eventType − A string containing a JavaScript event type, such as click or submit.
2. handler − If provided, identifies the specific listener that is to be removed.
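A small illustrative sketch of bind() with eventData and a later unbind() (the element ID and
message value are assumptions):

<script>
$(document).ready(function(){
  function showMsg(event){
    alert(event.data.message);            // eventData is exposed on event.data
  }
  $("#saveBtn").bind("click", { message: "Saved!" }, showMsg);
  // later, remove only this specific handler:
  $("#saveBtn").unbind("click", showMsg);
});
</script>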

Question 5

Q5a. Explain JSON data types.


Ans

In JSON, values must be one of the following data types:

1. a string
2. a number
3. an object (JSON object)
4. an array
5. a boolean
6. null

1. JSON Strings
Strings in JSON must be written in double quotes.

Example:- { "name":"John" }

2. JSON Numbers
Numbers in JSON must be an integer or a floating point.

Example:- { "age":30 }

3. JSON Objects
Values in JSON can be objects.
Example
{
"employee":{ "name":"John", "age":30, "city":"New York" }
}

4. JSON Arrays
Values in JSON can be arrays.
Example
{
"employees":[ "John", "Anna", "Peter" ]
}
5. JSON Booleans
Values in JSON can be true/false.
Example
{ "sale":true }

6. JSON null
Values in JSON can be null.
Example
{ "middlename":null }

Q5b. Discuss JSON schema with validation libraries.


Ans
JSON Schema is a specification, in a JSON-based format, for defining the
structure of JSON data. JSON Schema −

• Describes your existing data format.
• Provides clear, human- and machine-readable documentation.
• Offers complete structural validation, useful for automated testing.
• Offers complete structural validation for validating client-submitted data.

JSON Schema Example


{
"$schema": "http://json-schema.org/draft-04/schema#",
"title": "Product",
"description": "A product from Acme's catalog",
"type": "object",

"properties": {

"id": {
"description": "The unique identifier for a product",
"type": "integer"
},

"name": {
"description": "Name of the product",
"type": "string"
},

"price": {
"type": "number",
"minimum": 0,
}
},

"required": ["id", "name", "price"]


}

Sr.No. Keyword & Description

1 $schema
The $schema keyword states that this schema is written according to the draft
v4 specification.

2 Title
You will use this to give a title to your schema.

3 Description
A little description of the schema.

4 Type
The type keyword defines the first constraint on our JSON data: it has to be a
JSON Object.

5 Properties
Defines various keys and their value types, minimum and maximum values to be
used in JSON file.

6 Required
This keeps a list of required properties.

7 Minimum
This is the constraint to be put on the value and represents minimum acceptable
value.

8 Maximum
This is the constraint to be put on the value and represents maximum
acceptable value.

9 maxLength
The length of a string instance is defined as the maximum number of its
characters.

10 minLength
The length of a string instance is defined as the minimum number of its
characters.

Example (a document that is valid against the above schema):
{
"id": 2,
"name": "soap",
"price": 12.50
}
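Validation libraries exist for most languages. As an illustrative JavaScript sketch, the Ajv
library (assumed to be installed via npm) can compile a schema like the one above and validate
a document against it; note that recent Ajv versions target newer drafts, so a draft-04 $schema
may need the ajv-draft-04 package or the $schema keyword updated:

const Ajv = require("ajv");
const ajv = new Ajv();

const validate = ajv.compile(productSchema);        // productSchema = the Product schema shown above
const valid = validate({ id: 2, name: "soap", price: 12.50 });
if (!valid) console.log(validate.errors);           // lists which constraints failed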

Q5c. Differentiate between JSON and XML.
Ans
JSON | XML
JSON stands for JavaScript Object Notation. | XML stands for eXtensible Markup Language.
It is based on the JavaScript language. | It is derived from SGML.
JSON is data-oriented. | XML is document-oriented.
JSON doesn't provide display capabilities. | XML provides the capability to display data because it is a markup language.
JSON supports arrays. | XML doesn't support arrays.
JSON is less secured than XML. | XML is more secured.
JSON files are very easy to read as compared to XML. | XML documents are comparatively difficult to read and interpret.
It doesn't use end tags. | It has start and end tags.
It doesn't support comments. | It supports comments.
It supports only UTF-8 encoding. | It supports various encodings.

Q5d. How do we do encoding and decoding JSON in Python?


Ans
1. Decoding JSON with Python
• Python comes pre-equipped with a JSON encoder and decoder (the json module) to make it
very simple to work with JSON in your applications.
• To convert a JSON string into a Python object (decoding), the json.loads() function is used.
Example:

import json

json_data = '{"name": "Brian", "city": "Seattle"}'

python_obj = json.loads(json_data)
print(python_obj["name"])
print(python_obj["city"])

Output:-
Brian
Seattle

2. Encoding JSON with Python

• Use json.dumps() to convert a Python object (such as a dictionary) into a JSON string (encoding).

import json
data = {
    "a": 0,
    "b": 9.6,
    "c": "Hello World",
    "d": {
        "e": [89, 90]
    }
}
json_data = json.dumps(data)
print(json_data)

Output:-
{"a": 0, "b": 9.6, "c": "Hello World", "d": {"e": [89, 90]}}

Q5e. What is JSON Grammar? Explain.


Ans
JSON Grammar

JSON, in a nutshell, is a textual representation defined by a small set of governing rules in
which data is structured. The JSON specification states that data can be structured in either of
the two following compositions:

1. A collection of name/value pairs
2. An ordered list of values (an array)

1. A collection of name/value pairs


The figure below illustrates the grammatical representation for a collection of
string/value pairs

As the diagram outlines, a collection begins with the use of the opening brace ({), and
ends with the use of the closing brace (}). The content of the collection can be
composed of any of the following possible three designated paths:
i. The top path illustrates that the collection can remain devoid of any string/value
pairs.
ii. The middle path illustrates that our collection can be that of a single string/value
pair.
iii. The bottom path illustrates that after a single string/value pair is supplied, the
collection needn’t end but, rather, allow for any number of string/value pairs,
before reaching the end. Each string/value pair possessed by the collection must
be delimited or separated from one another by way of a comma (,).
Example:-

//Empty Collection Set

{};

//Single string/value pair

{"name":"Bob"};

//Multiple string/value pairs

{"name": "Bob", "age": 30, "Gender": "Male"};

2. An ordered list of values(Array)

The figure below illustrates the grammatical representation for an ordered list of
values

The diagram outlines following three paths:


i. The top path illustrates that our list can remain devoid of any value(s).
ii. The middle path illustrates that our ordered list can possess a singular value.
iii. The bottom path illustrates that the length of our list can possess any number of
values, which must be delimited, that is, separated, with the use of a comma (,).
//Empty Ordered List

[];

//Ordered List of single value

["abc"];

//Ordered List of multiple values

["0",1,2,3,4,100];

Q5f. Write a short note on Persisting JSON.


Ans
• A technique that is used for persistence of JSON is the HTTP cookie.
• The HTTP cookie was created as a means to string together the actions taken by the user per
"isolated" request and provide a convenient way to persist the state of one page into that of
another.
• The cookie is simply a chunk of data that the browser has been notified to retain.
Furthermore, the browser will supply, with each subsequent request, the retained cookie
to the server for the domain that set it, thereby providing state to a stateless protocol.
• The cookie can be utilized on the client side of an application with JavaScript.
• Additionally, it is available to the server, supplied within the header of each request made by
the browser.

Syntax
The cookie is simply a string of ASCII encoded characters composed of one or more
attribute-value pairs, separated by a semicolon (;) token.

cookie = NAME "=" VALUE *(";" cookie-av)

cookie-av = "expires" "=" value | "max-age" "=" value | "domain" "=" value |
            "path" "=" value | "secure" | "httponly"

This syntax outlines how to set some cookie, specified by the indicated NAME, to possess the
assigned VALUE.
i. expires informs the browser of the date and time after which it is no longer necessary to
store said cookie.
ii. max-age specifies how long (in seconds) a cookie should persist.
iii. The domain attribute explicitly defines the domain(s) to which the cookie is to be made
available.
iv. The path attribute further restricts to which subdirectories a cookie is available.
v. The secure attribute does not itself provide security. It informs the browser to send the
cookie to the server only if the connection over which it is to be sent is a secure connection,
such as HTTPS.
vi. The httponly attribute, when specified, limits the availability of the cookie to the server
and the server alone. This means the cookie will not be available to the client side,
thereby preventing client-side JavaScript from referencing, deleting, or updating the
cookie.

Example

Creating a cookie
document.cookie= "ourFirstCookie=abc123";
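Because a cookie value is just a string, a JSON object can be persisted by serializing it with
JSON.stringify and later parsing it back (a minimal sketch; the cookie name and data are
illustrative, and encodeURIComponent is used so that characters such as ';' survive):

var profile = { "name": "Bob", "visits": 3 };
document.cookie = "profile=" + encodeURIComponent(JSON.stringify(profile)) + "; max-age=3600";

// later, read it back
var match = document.cookie.split("; ").find(function(c){ return c.indexOf("profile=") === 0; });
var saved = match ? JSON.parse(decodeURIComponent(match.substring("profile=".length))) : null;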

BSc.(Information Technology)
(Semester V)
2019-20

New Generation
Technology
(USIT5P7 Core)
University Paper Solution

By
Mrs. Spruha More

Question 1

a. What is big data? list the different uses of big data?

Ans Big data is data that has high volume, is generated at high velocity, and has multiple
varieties. Let’s look at few facts and figures of big data.

Usage of Big Data


1. Visibility
Accessibility to data in a timely fashion to relevant stakeholders generates a tremendous
amount of value.
Example:- Consider a manufacturing company that has R&D, engineering, and
manufacturing departments dispersed geographically. If the data is accessible across all
these departments and can be readily integrated, it can not only reduce the search and
processing time but will also help in improving the product quality according to the present
needs.

2. Discover and Analyze Information


Most of the value of big data comes from when the data collected from outside sources can
be merged with the organization’s internal data. Organizations are capturing detailed data
on inventories, employees, and customers. Using all of this data, they can discover and
analyze new information and patterns; as a result, this information and knowledge can be
used to improve processes and performance.

3. Segmentation and Customizations


Big data enables organizations to create tailor-made products and services to meet specific
segment needs. This can also be used in the social sector to accurately segment populations
and target benefit schemes for specific needs. Segmentation of customers based on various
parameters can aid in targeted marketing campaigns and tailoring of products to suit the
needs of customers.

4. Aiding Decision Making


Big data can substantially minimize risks, improve decision making , and uncover valuable
insights. Automated fraud alert systems in credit card processing and automatic fine-tuning
of inventory are examples of systems that aid or automate decision-making based on big
data analytics.
5. Innovation
Big data enables innovation of new ideas in the form of products and services. It enables
innovation in the existing ones in order to reach out to large segments of people. Using data
gathered for actual products, the manufacturers can not only innovate to create the next
generation product but they can also innovate sales offerings.

b. Briefly explain how is MONGODB different from SQL .


Ans SQL Comparison

The following are the ways in which MongoDB is different from SQL.

1. MongoDB uses documents for storing its data, which offer a flexible schema
(documents in same collection can have different fields). This enables the users
to store nested or multi-value fields such as arrays, hashes, etc. In contrast,
RDBMS systems offer a fixed schema where a column’s value should have a
similar data type. Also, it’s not possible to store arrays or nested values in a cell.

2. MongoDB doesn’t provide support for JOIN operations, like in SQL. However, it
enables the user to store all relevant data together in a single document, avoiding
at the periphery the usage of JOINs. It has a workaround to overcome this issue

3. MongoDB doesn’t provide support for transactions in the same way as SQL.
However, it guarantees atomicity at the document level. Also, it uses an isolation
operator to isolate write operations that affect multiple documents, but it does
not provide “all-or-nothing” atomicity for multi-document write operations.
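As an illustrative sketch of points 2 and 3 (the collection and field names are assumptions),
related data can be embedded in a single document, and an update of that document is atomic,
so neither a JOIN nor a multi-row transaction is needed:

> db.users.insert({ "name": "John", "orders": [ { "item": "pen", "qty": 2 } ] })
> db.users.update(
    { "name": "John" },
    { "$push": { "orders": { "item": "book", "qty": 1 } } }   // atomic at the document level
  )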

c. What is MongoDB? Explain the features of MongoDB?


Ans
“MongoDB is one of the leading NoSQL document store databases. It enables organizations
to handle and gain meaningful insights from Big Data.”

The features of MongoDB

1. Speed, Scalability, and Agility


The design team's goal when designing MongoDB was to create a database that was fast,
massively scalable, and easy to use. To achieve speed and horizontal scalability in a
partitioned database, certain RDBMS features had to be traded off, as explained in point 3 below.

2. JSON-Based Document Store


MongoDB uses a JSON-based (JavaScript Object Notation) document store for the data.
JSON/BSON offers a schema-less model, which provides flexibility in terms of database
design. Unlike in RDBMSs, changes can be done to the schema seamlessly.

3. Performance vs. Features


In order to make MongoDB high performance and fast, certain features commonly
available in RDBMS systems are not available in MongoDB. MongoDB is a document-
oriented DBMS where data is stored as documents. It does not support JOINs, and it does
not have fully generalized transactions.

4. Running the Database Anywhere
The language used for implementing MongoDB is C++, which enables MongoDB to
run on servers, VMs, or even on the cloud. The 10gen site provides binaries for different OS
platforms, enabling MongoDB to run on almost any type of machine.

d. Explain how volume, velocity and variety are important component of bigdata.
Ans Three Vs of Big Data

1. Volume
Volume in big data means the size of the data. As businesses are becoming more transaction
oriented, the ever-increasing numbers of transactions are generating huge amounts of data. This
huge volume of data is the biggest challenge for big data technologies. The storage and
processing power needed to store, process, and make the data accessible in a timely and
cost-effective manner is massive.

2. Variety
The data generated from various devices and sources follows no fixed format or structure.
Unlike text, CSV, or RDBMS data, it ranges from text files, log files, streaming videos,
photos, meter readings, stock ticker data, PDFs, and audio to various other unstructured
formats. New sources and structures of data are being created at a rapid pace, so the onus is
on technology to find a solution to analyze and visualize the huge variety of data that is out
there.

3. Velocity
Velocity in big data is the speed at which data is created and the speed at which it is required
to be processed. If data cannot be processed at the required speed, it loses its significance.
Due to data streaming in from social media sites, sensors, tickers, metering, and monitoring,
it is important for the organizations to speedily process data both when it is on move and
when it is static.

e. Write short notes on Cap theorem.

Ans CAP Theorem ( Brewer’s Theorem )

Eric Brewer outlined the CAP theorem in 2000. The theorem states that when designing
an application in a distributed environment there are three basic requirements that exist,
namely

consistency, availability, and partition tolerance.


• Consistency means that the data remains consistent after any operation is performed that
changes the data, and that all users or clients accessing the application see the same
updated data.

• Availability means that the system is always available.

• Partition Tolerance means that the system will continue to function even if it is partitioned
into groups of servers that are not able to communicate with one another.
The CAP theorem states that at any point in time a distributed system can fulfil only two of
the above three guarantees.

f. Discuss the various categories of NOSQL databases.
Ans

The NoSQL databases are categorized on the basis of how the data is stored. NoSQL mostly
follows a horizontal structure because of the need to provide curated information from large
volumes, generally in near real-time. They are optimized for insert and retrieve operations on
a large scale with built-in capabilities for replication and clustering.
Table briefly provides a feature comparison between the various categories of NoSQL
Databases

Question 2

a. Explain binary JSON(BSON).


Ans MongoDB is a document-based database. It uses Binary JSON for storing its data.
JSON stands for JavaScript Object Notation. It’s a standard used for data interchange in
today’s modern Web (along with XML). The format is human and machine readable. It is not
only a great way to exchange data but also a nice way to store data. All the basic data types
(such as strings, numbers, Boolean values, and arrays) are supported by JSON.
The following code shows what a JSON document looks like:
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Doe" },
"publications" : [
{
"title" : "First Book",
"year" : 1989,
"publisher" : "publisher1"
},
{
"title" : "Second Book",
"year" : 1999,
"publisher" : "publisher2"
}
]
}

Binary JSON (BSON)


MongoDB stores the JSON document in a binary-encoded format. This is termed as BSON.
The BSON data model is an extended form of the JSON data model. MongoDB’s
implementation of a BSON document is fast, highly traversable, and lightweight. It supports
embedding of arrays and objects within other arrays, and
also enables MongoDB to reach inside the objects to build indexes and match objects
against queried expressions, both on top-level and nested BSON keys.
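For example, the nested keys of the document above can be queried and indexed directly with
dot notation (the collection name users is assumed for illustration):

> db.users.find({ "name.first": "John" })                  // match on a nested BSON key
> db.users.ensureIndex({ "publications.year": 1 })         // index on a key inside an embedded array of objects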

b. Explain with example the process of deleting documents in a collection.


Ans Delete
To delete documents in a collection, use the remove () method . If you specify a selection
criterion, only the documents meeting the criteria will be deleted. If no criteria is specified, all
of the documents will be deleted.

The following command will delete the documents where Gender = ‘M’ :
> db.users.remove({"Gender":"M"})
>
The same can be verified by issuing the find() command on Users :
> db.users.find({"Gender":"M"})

Finally, if you want to drop the collection, the following command will drop the collection:
> db.users.drop()
true
>

In order to validate whether the collection is dropped or not, issue the show collections
command.
> show collections
system.indexes

c. Discuss the various tools in MongoDB.


Ans MongoDB Tools
Apart from the core services, there are various tools that are available as part of the
MongoDB installation:
• mongodump : This utility is used as part of an effective backup strategy. It creates a
binary export of the database contents.
• mongorestore : The binary database dump created by the mongodump utility is
imported to a new or an existing database using the mongorestore utility.
• bsondump : This utility converts the BSON files into human-readable formats
such as JSON and CSV. For example, this utility can be used to read the output file
generated by mongodump.
• mongoimport , mongoexport : mongoimport provides a method for taking data in
JSON , CSV, or T SV formats and importing it into a mongod instance. mongoexport
provides a method to export data from a mongod instance into JSON, CSV, or TSV
formats.
• mongostat , mongotop , mongosniff : These utilities provide diagnostic information
related to the current operation of a mongod instance.

d. Explain the concept of sharding in detail.


Ans One of the important factors when designing the application model is whether to
partition the data or not. This is implemented using sharding in MongoDB.
Sharding is also referred to as partitioning of data. In MongoDB, a collection is partitioned with
its documents distributed across a cluster of machines, which are referred to as shards. This can
have a significant impact on performance.

A page fault happens when data which is not there in memory is accessed by MongoDB. If
there’s free memory available, the OS will directly load the requested page into memory;
however, in the absence of free memory, the page in memory is written to the disk and then
the requested page is loaded in the memory, slowing down the process. Few operations
accidentally purge large portion of the working set from the memory, leading to an adverse
effect on the performance. One example is a query scanning through all documents of a
database where the size exceeds the server memory. This leads to loading of the documents
in memory and moving the working set out to disk.

Sharding Components
The components that enable sharding in MongoDB. Sharding is enabled in MongoDB via
sharded clusters.
The following are the components of a sharded cluster:
• Shards
• mongos
• Config servers

e. Differentiate between single key and compound key.


Ans: Single Key Index
A single key index is created on one field of the document.
Use ensureIndex() to create the index.
> db.testindx.ensureIndex({"Name":1})

The index creation will take a few minutes depending on the server and the collection size.
Let's run the same query that you ran earlier with explain() to check what steps the
database executes post index creation. Check the n, nscanned, and millis fields in the
output.
>db.testindx.find({"Name":"user101"}).explain("allPlansExecution")
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "mydbproc.testindx",
"indexFilterSet" : false,
"parsedQuery" : {
"Name" : {
"$eq" : "user101"
}
},

Compound Index
When creating an index, you should keep in mind that the index covers most of your queries.
If you sometimes query only the Name field and at times you query both the Name and the
Age field, creating a compound index on the Name and Age fields will be more beneficial
than an index that is created on either of the fields because the compound index will cover
both queries.
The following command creates a compound index on fields Name and Age of the collection
testindx .
> db.testindx.ensureIndex({"Name":1, "Age": 1})
Compound indexes help MongoDB execute queries with multiple clauses more efficiently.
When creating a compound index, it is also very important to keep in
mind that the fields that will be used for exact matches (e.g. Name : "S1" ) come first,
followed by fields that are used in ranges (e.g. Age : {"$gt":20} ).

f. Write short note on master/slave replication of MongoDB.


Ans In MongoDB, the traditional master/slave replication is available but it is
recommended only for more than 50 node replications. In this type of replication, there is
one master and a number of slaves that replicate the data from the master. The only
advantage with this type of replication is that there’s no restriction on the number of slaves
within a cluster. However, thousands of slaves will overburden the master node, so in
practical scenarios it’s better to have less than dozen slaves. In addition, this type of
replication doesn’t automate failover and provides less
redundancy.
In a basic master/slave setup, you have two types of mongod instances: one instance is in
the master mode and the remaining are in the slave mode, as shown in the figure below. Since
the slaves are replicating from the master, all slaves need to be aware of the master's address.

The master node maintains a capped collection (oplog) that stores an ordered history of
logical writes to the database.
The slaves replicate the data using this oplog collection. Since the oplog is a capped
collection, if the slave's state is far behind the master's state, the slave may become out of
sync. In that scenario, the replication will stop and manual intervention will be needed to
re-establish the replication.
There are two main reasons behind a slave becoming out of sync:
• The slave shuts down or stops and restarts later. During this time, the oplog may have
deleted the log of operations required to be applied on the slave.
• The slave is slow in executing the updates that are available from the master.

Question 3

a. Discuss the fields used for sharding.


Ans The shard key controls how data is distributed and the resulting system’s capacity for
queries and writes. Ideally, the shard key should have the following two characteristics:
• Insertions are balanced across the shard cluster.
• Most queries can be routed to a subset of shards to be satisfied.
Let’s see which fields can be used for sharding.

1. Time field : In choosing this option, although the data will be distributed evenly among the
shards, neither the inserts nor the reads will be balanced.
As in the case of performance data, the time field is in an upward direction, so all the
inserts will end up going to a single shard and the write throughput will end up being same
as in a standalone instance.
Most reads will also end up on the same shard, assuming you are interested in
viewing the most recent data frequently.

2. Hashes : You can also consider using a random value to cater to the above situations; a
hash of the id field can be considered the shard key.
Although this will cater to the write situation of the above (that is, the writes will be
distributed), it will affect querying. In this case, the queries must be broadcasted to all the
shards and will not be routable.

3. Use a key that is evenly distributed, such as Host. This has the following advantages: if the
query selects the host field, the reads will be selective and local to a single shard, and the
writes will be balanced. However, the biggest potential drawback is that all data collected for
a single host must go to the same chunk since all the documents in it have the same shard
key. This will not be a problem if the data is getting collected across all the hosts, but if the
monitoring collects a disproportionate amount of data for one host, you can end up with a
large chunk that will be completely unsplittable, causing an unbalanced load on one shard.

4. Combining the best of options 2 and 3, you can have a compound shard key, such as
{host:1, ssk: 1} where host is the host field of the document and ssk is _id field’s hash value.
In this case, the data is distributed largely by the host field making queries, accessing
the host field local to either one shard or group of shards. At the same time, using ssk
ensures that data is distributed evenly across the cluster.
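A hedged sketch of option 4 (all names are assumptions; ssk is an application-maintained field
that stores the hash of the document's _id):

sh.enableSharding("monitordb")
sh.shardCollection("monitordb.perfdata", { host: 1, ssk: 1 })   // compound shard key {host, ssk}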

b. List and explain the limitation of indexes.


Ans Behaviors and Limitations
Finally, the following are a few behaviors and limitations that you need to be aware of:
• More than 64 indexes are not allowed in a collection.
• Index keys cannot be larger than 1024 bytes.
• A document cannot be indexed if its fields' values are greater than this size.
• The following command can be used to query documents that are too large to index:
db.practicalCollection.find({<key>: <too large to index>}).hint({$natural: 1})
• An index name (including the namespace) must be less than 128 characters.
• The insert/update speeds are impacted to some extent by an index.
• Do not maintain indexes that are not used or will not be used.
• Since each clause of an $or query executes in parallel, each can use a different index.
• The queries that use the sort() method and the $or operator will not be able to use
the indexes on the $or fields.
• 2d geospatial queries cannot be used within queries that use the $or operator.

c. Explain the MongoDB limitation from security perspective Security Limitations.


Ans: MongoDB limitations are as follows:
MongoDB Space Is Too Large (Applicable for MMAPv1)
MongoDB (with storage engine MMAPv1) space is too large; in other words, the data directory
files are larger than the database's actual data.

Memory Issues (Applicable for Storage Engine MMAPv1)


Indexes are memory-heavy; in other words, indexes take up lot of RAM. Since these are Btree
indexes, defining many indexes can lead to faster consumption of system resources.

32-bit vs. 64-bit


MongoDB comes with two versions, 32-bit and 64-bit. Since MongoDB uses memory
mapped files, the 32-bit versions are limited to storing only about 2GB of data. If you need
more data to be stored, you should use the 64-bit build. The 32-bit version of MongoDB
does not support the WiredTiger storage engine.
Security Limitations:
Security is an important matter when it comes to databases.
1. No Authentication by Default
Although authentication is not enabled by default, it’s fully supported and can be
enabled easily.

2. Traffic to and from MongoDB Isn’t Encrypted


By default the connections to and from MongoDB are not encrypted. Communications on a
public network can be encrypted using the SSL-supported build of MongoDB, which is
available in the 64-bit version only.

d. Write short note on deployment.


Ans Deployment
While deciding on the deployment strategy, keep the following tips in mind so that the
hardware sizing is done appropriately. These tips will also help you decide whether to use
sharding and replication.

1. Data set size: The most important thing is to determine the current and anticipated data
set size. This not only lets you choose resources for individual physical nodes, but it also
helps when planning your sharding plans (if any).

2. Data importance: The second most important thing is to determine data importance, to
determine how important the data is and how tolerant you can be to any data loss or data
lagging (especially in case of replication) .

3. Memory sizing: The next step is to identify memory needs and accordingly take care of
the
RAM. If possible, you should always select a platform that has memory greater than
your working set size.

4. Disk Type: If speed is not a primary concern or if the data set is larger than what any
in-memory strategy can support, it’s very important to select a proper disk type. IOPS
(input/output operations per second) is the key for selecting a disk type; the higher
the IOPS, the better the MongoDB performance. If possible, local disks should be
used because network storage can cause poor performance and high latency. It is
also advised to use RAID 10 when creating disk arrays (wherever possible).

5. CPU: Clock speed can also have a major impact on the overall performance when you are
running a mongod with most data in memory. In circumstances where you want to
maximize the operations per second, you must consider including a CPU with a high
clock/bus speed in your deployment strategy.

6. Replication is used if high availability is one of the requirements. In any MongoDB
deployment it should be standard to set up a replica set with at least three nodes.

e. Define monitoring. Explain the factors to be considered while using monitoring
services.
Ans MongoDB system should be proactively monitored to detect unusual behaviors so that
necessary actions can be taken to resolve issues.
• A free hosted monitoring service named MongoDB Cloud Manager is provided by
MongoDB developers. MongoDB Cloud Manager offers a dashboard view of the entire

cluster metrics. Alternatively, you can use nagios, SNMP, or munin to build your own tool.
• MongoDB also provides several tools such as mongostat and mongotop to gain insights
into the performance.
When using monitoring services, the following should be watched closely:
1. Op counters: Includes inserts, delete, reads, updates and cursor usage.

2. Resident memory: An eye should always be kept on the allocated memory. This counter
value should always be lower than the physical memory.

3. Working set size: The active working set should fit into memory for a good performance.
You can either optimize the queries so that the working set fits inside the memory or
increase the memory when the working set is expected to increase.

4. Queues: Prior to the release of MongoDB 3.0, a reader-writer lock was used for
simultaneous reads and exclusive access was used for writes. In such scenario, you might end
up with queues behind a single writer. Starting from Version 3.0, collection level locking (in
the MMAPv1 storage engine) and document level locking (in the WiredTiger storage engine)
have been introduced.

5. Whenever there’s a hiccup in the application, the CRUD behavior, indexing patterns, and
indexes can help you better understand the application’s flow.
6. It's recommended to run the entire performance test against a full-size database, such as
a copy of the production database, because performance characteristics are often highlighted
only when dealing with the actual data.

f. What is a data storage engine? Differentiate between the MMAP and WiredTiger storage
engines.
Ans: Data Storage Engine
MongoDB uses MMAP as its default storage engine. This engine works with memory-mapped
files. Memory-mapped files are data files that are placed by the operating system in
memory using the mmap() system call. mmap is an OS feature that maps a file on disk
into virtual memory.
MongoDB allows the OS to control the memory mapping and allocate the maximum amount
of RAM. The caching is done based on LRU behavior, wherein the least recently used pages are
moved out of the working set to disk, making space for recently and frequently used pages.
However, this method has some drawbacks:

1. MongoDB has no control over what data to keep in memory and what to remove. So after every
server restart, the first access to a page leads to a page fault, because the accessed pages are not
yet in the working set, leading to long data retrieval times.

2. MongoDB also has no control over prioritizing the content of the memory.

Differentiate between MMAP and WiredTiger storage engines


Data File (Relevant for MMAPv1)
• As you know, under the core services the default data directory used by mongod is /data/db/.
• Under this directory there are separate files for every database. Each database has a single
.ns file and multiple data files with monotonically increasing numeric extensions.



• For example, if you create a database called mydbpoc, it will be stored in the following
files: mydbpoc.ns, mydbpoc.0, mydbpoc.1, and so on.
• For each new numeric data file of a database, the size will be double the size of the
previous numbered data file. The limit on the file size is 2GB. Once a file has reached 2GB, all
subsequent numbered files will remain 2GB in size.

Data File (Relevant for WiredTiger)


• When the storage option selected is WiredTiger, data, journals, and indexes are compressed
on disk.
• The compression is done based on the compression algorithm specified when starting the
mongod. Snappy is the default compression option.
• Under the data directory, there are separate compressed wt files corresponding to each
collection and indexes.
• Journals have their own folder under the data directory.
• The compressed files are actually created when data is inserted into the collection.
• For example, if you create a collection called users, it will be stored in a
collection-0--2259994602858926461 file, and the associated indexes will be stored in
index-1--2259994602858926461, index-2--2259994602858926461, and so on.
• In addition to the collection and index compressed files, there is an mdb_catalog file that
stores metadata mapping the collections and indexes to the files in the data directory.
In the above example it will store the mapping of the collection users to the wt file
collection-0--2259994602858926461.
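As a quick way to confirm which of the two engines a running mongod is using, the serverStatus output includes a storageEngine section (on MongoDB 3.0 and later). A minimal mongo shell check is sketched below.

// Minimal sketch (mongo shell): report the storage engine of the running mongod.
var engine = db.serverStatus().storageEngine;
print("Storage engine in use: " + engine.name);   // e.g. "mmapv1" or "wiredTiger"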

Question 4
a. Define in-memory database. What are the techniques used in in-memory databases
to ensure that data is not lost?
Ans: In-Memory Databases
The solid-state disk may have had a transformative impact on database performance, but it
has resulted in only incremental changes for most database architectures. A more paradigm-
shifting trend has been the increasing practicality of storing complete databases in main
memory.
The cost of memory and the amount of memory that can be stored on a server have both
followed exponential trends since the earliest days of computing: the cost of memory per unit
of storage has fallen steadily, while the amount of storage that can fit on a single memory chip
has grown, decade after decade.

In-memory databases generally use some combination of techniques to ensure they don’t
lose data.
These include:
• Replicating data to other members of a cluster.
• Writing complete database images (called snapshots or checkpoints) to disk files.
• Writing out transaction/operation records to an append-only disk file (called a
transaction log or journal).

b. Explain how Redis uses disk files for persistence.



Ans: While TimesTen is an attempt to build an RDBMS-compatible in-memory database, Redis is at the
opposite extreme: essentially an in-memory key-value store. Redis (Remote Dictionary
Server) was originally envisaged as a simple in-memory system capable of sustaining very
high transaction rates on underpowered systems, such as virtual machine images.
Redis was created by Salvatore Sanfilippo in 2009. VMware hired Sanfilippo and sponsored
Redis development in 2010. In 2013, Pivotal Software, a Big Data spinoff from VMware's
parent company EMC, became the primary sponsor.
Redis follows a familiar key-value store architecture in which keys point to objects. In Redis,
objects consist mainly of strings and various types of collections of strings (lists, sorted lists,
hash maps, etc.). Only primary key lookups are supported; Redis does not have a secondary
indexing mechanism.
Redis uses disk files for persistence:
• Snapshot files store copies of the entire Redis system at a point in time.
Snapshots can be created on demand or can be configured to occur at scheduled
intervals or after a threshold of writes has been reached. A snapshot also occurs
when the server is shut down.
• An append-only file (AOF) can additionally journal write operations to disk, so that
on restart Redis can replay the log and reconstruct the in-memory state.
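A hedged sketch of driving these snapshots from application code is shown below, using the Node.js "redis" client (a v4-style API is assumed). BGSAVE and the CONFIG SET save directive are standard Redis commands; the connection details and key name are assumptions for illustration.

// Minimal sketch: trigger and schedule Redis snapshots from Node.js.
const { createClient } = require('redis');    // assumes the "redis" npm package (v4 API)

async function snapshotDemo() {
  const client = createClient();              // defaults to localhost:6379
  await client.connect();

  await client.set('greeting', 'hello');      // an ordinary in-memory write

  // Ask Redis to write an RDB snapshot of the whole dataset in the background.
  await client.sendCommand(['BGSAVE']);

  // Schedule automatic snapshots: every 60 seconds if at least 1000 keys changed.
  await client.sendCommand(['CONFIG', 'SET', 'save', '60 1000']);

  await client.quit();
}

snapshotDemo().catch(console.error);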

c. What is the Berkeley Analytics Data Stack? Explain its components.


Ans: Berkeley Analytics Data Stack and Spark
If SAP HANA and Oracle TimesTen represent in-memory variations on the relational database
theme, and if Redis represents an in-memory variation on the key-value store theme, then
Spark represents an in-memory variation on the Hadoop theme.
Hadoop became the de facto foundation for today’s Big Data stack by providing a flexible,
scalable, and economic framework for processing massive amounts of structured,
unstructured, and semi-structured data.
The Hadoop 1.0 MapReduce algorithm represented a relatively simple but scalable approach
to parallel processing. MapReduce is not the most elegant or sophisticated approach for all
workloads, but it can be adapted to almost any problem, and it can usually scale through the
brute-force application of many servers.

BDAS consists of a few core components:


• Spark is an in-memory, distributed, fault-tolerant processing framework.
Implemented in the Java virtual-machine-compatible programming language Scala, it
provides higher-level abstractions than MapReduce and thus improves developer
productivity. As an in-memory solution, Spark excels at tasks that cause bottlenecks on disk
IO in MapReduce. Tasks that iterate repeatedly over a dataset—typical of many machine-
learning workloads—show significant improvements.
• Mesos is a cluster management layer somewhat analogous to Hadoop’s YARN. However,
Mesos is specifically intended to allow multiple frameworks, including BDAS and Hadoop, to
share a cluster.
• Tachyon is a fault-tolerant, Hadoop-compatible, memory-centric distributed file system.
The file system allows for disk storage of large datasets but promotes aggressive caching to
provide memory-level response times for frequently accessed data.

d. What is an event? State the different types of events in jQuery.


Ans: All the different visitors' actions that a web page can respond to are called events.

An event represents the precise moment when something happens.

Examples:
• moving a mouse over an element
• selecting a radio button
• clicking on an element
The term "fires/fired" is often used with events. Example: "The keypress event is fired the
moment you press a key".

State the different type of Events in JQuery

Commonly Used jQuery Event Methods


$(document).ready()

The $(document).ready() method allows us to execute a function when the document is fully
loaded. This event is already explained in the jQuery Syntax chapter.

click()

The click() method attaches an event handler function to an HTML element.


The function is executed when the user clicks on the HTML element

dblclick()

The dblclick() method attaches an event handler function to an HTML element.


The function is executed when the user double-clicks on the HTML element:

mouseenter()

The mouseenter() method attaches an event handler function to an HTML element.



The function is executed when the mouse pointer enters the HTML element:
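A small, self-contained sketch tying these event methods together is shown below; the element IDs (#btn, #title, #box) are assumptions for illustration.

// Minimal sketch: wiring up the jQuery event methods described above.
$(document).ready(function () {
  $("#btn").click(function () {
    alert("Button was clicked");            // runs on a single click
  });

  $("#title").dblclick(function () {
    $(this).css("color", "red");            // runs on a double-click
  });

  $("#box").mouseenter(function () {
    $(this).text("Mouse entered the box");  // runs when the pointer enters the element
  });
});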

e. Write a short note on the jQuery css() method.

Ans: CSS Selectors in jQuery
The beauty of jQuery, and why it became so popular, certainly involves the fact that it’s so
easy to use. You’re probably familiar with CSS, and you know that to select an element by its
ID, you use the hash symbol (#). To select an element by its class, you use a period (.), and so
on. jQuery lets you use these selectors (and more) to select elements from the DOM. What’s
also great is that it provides backward compatibility.

CSS
jQuery's css() method is very powerful. There are actually three primary ways that you'll work
with it. The first is when determining the value of an element's property. Simply pass it one
parameter: the property whose value you want to know:
$("div").css("width");
$("div").css("margin-right");
$("div").css("color");
You can also use css() to set values. To set just one value, pass in a property and a value as
separate parameters. You used this in Chapter 3.
$("div").css("color", "red");
$("div").css("border", "1px solid red");
animate() and Animation Convenience Methods
All the animation methods you've used so far, including fadeIn() and fadeOut(), use animate().
jQuery provides these methods, known as convenience methods, to save you some typing.
Here's the code that implements fadeIn() from the jQuery source:
function (speed, easing, callback) {
    return this.animate(props, speed, easing, callback);
}
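A hedged example of calling animate() directly (rather than through a convenience method) is sketched below; the #panel selector, CSS target values, and duration are assumptions for illustration.

// Minimal sketch: animate an assumed #panel element over 400 ms,
// then run a callback when the animation completes.
$("#panel").animate(
  { opacity: 0.5, width: "300px" },   // target CSS properties
  400,                                // duration in milliseconds
  function () {
    console.log("Animation finished");
  }
);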

f. State the features of jQuery.

Ans: jQuery is a "write less, do more" JavaScript library.
- It helps us make the use of JavaScript much easier.
- It simplifies complicated things in JavaScript, such as AJAX calls and DOM manipulation.

Features of jQuery are :

1. Effects and animations.


2. Ajax.
3. Extensibility.
4. DOM element selections functions.
5. Events.
6. CSS manipulation.
7. Utilities - such as browser version and the each function.
8. JavaScript Plugins.
9. DOM traversal and modification.



Some of them are described briefly below.

DOM manipulation − jQuery makes it easy to select DOM elements, traverse them, and
modify their content by using the cross-browser open source selector engine called Sizzle.
Event handling − jQuery offers an elegant way to capture a wide variety of events, such
as a user clicking on a link, without the need to clutter the HTML code itself with event
handlers.
AJAX support − jQuery helps you a lot in developing a responsive and feature-rich site
using AJAX technology.
Animations − jQuery comes with plenty of built-in animation effects which you can use
in your websites.
Lightweight − jQuery is a very lightweight library, about 19KB in size (minified and
gzipped).
Cross-browser support − jQuery has cross-browser support, and works well in IE 6.0+,
FF 2.0+, Safari 3.0+, Chrome, and Opera 9.0+.

Question 5
a. Explain JSON grammar.
Ans: JSON Grammar
JSON, in a nutshell, is a textual representation defined by a small set of governing rules in
which data is structured. The JSON specification states that data can be structured in either
of the two following compositions:
1. A collection of name/value pairs
2. An ordered list of values(Array)

1. A collection of name/value pairs


The figure below illustrates the grammatical representation for a collection of string/value
pairs.

1. The top path illustrates that the collection can remain devoid of any string/value pairs.
2. The middle path illustrates that our collection can be that of a single string/value pair.
3. The bottom path illustrates that after a single string/value pair is supplied, the collection
needn’t end but, rather, allow for any number of string/value pairs, before reaching the
end. Each string/value pair possessed by the collection must be delimited or separated
from one another by way of a comma (,).

2. An ordered list of values(Array)


The figure below illustrates the grammatical representation for an ordered list of values



1. The top path illustrates that our list can remain devoid of any value(s).
2. The middle path illustrates that our ordered list can possess a singular value.
3. The bottom path illustrates that the length of our list can possess any number of values,
which must be delimited, that is, separated, with the use of a comma (,).
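A short example combining both compositions is sketched below: an object (a collection of name/value pairs) whose "tags" member is an ordered list of values. The document content is an assumption for illustration.

// Minimal illustration of the two JSON compositions described above.
var text = '{ "title": "NGT notes", "tags": ["mongodb", "json", "jquery"] }';

var doc = JSON.parse(text);       // object = collection of name/value pairs
console.log(doc.title);           // "NGT notes"
console.log(doc.tags[1]);         // "json" (array = ordered list of values)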

b. Differentiate between XML and JSON .


Ans: JSON vs XML
Both JSON and XML can be used to receive data from a web server.
The following JSON and XML examples both define an employees object, with an array of 3
employees:
JSON Example
{"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}

XML Example
<employees>
<employee>
<firstName>John</firstName> <lastName>Doe</lastName>
</employee>
<employee>
<firstName>Anna</firstName> <lastName>Smith</lastName>
</employee>
<employee>
<firstName>Peter</firstName> <lastName>Jones</lastName>
</employee>
</employees>
JSON is Like XML Because
• Both JSON and XML are "self describing" (human readable)
• Both JSON and XML are hierarchical (values within values)
• Both JSON and XML can be parsed and used by lots of programming languages
• Both JSON and XML can be fetched with an XMLHttpRequest
JSON is Unlike XML Because
• JSON doesn't use end tags
• JSON is shorter
• JSON is quicker to read and write
• JSON can use arrays



c. Explain request Headers .
Ans: A request header is an HTTP header that can be used in an HTTP request and that
does not relate to the content of the message.

These headers can be supplied with the request to provide the server with preferential
information that will assist in the request. Additionally, they outline the configuration of the
client making the request. Such headers may reveal information about the user agent making
the request or the preferred data type that the response should provide. By utilizing the
headers within this category, we can potentially influence the response from the server. For
this reason, the request headers are the most commonly configured headers. One very useful
header is the Accept header. It can be used to inform the server as to what MIME type or data
type the client can properly handle. This can often be set to a particular MIME type, such as
application/json or text/plain. It can even be set to */*, which informs the server that the
client can accept all MIME types. The response provided by the server is expected to reflect
one of the MIME types the client can handle (a hedged example of setting this header follows
the list of request headers below). The following are request headers:
Accept
Accept-Charset
Accept-Encoding
Accept-Language
Authorization
Expect
From
Host
If-Match
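As a hedged illustration of setting the Accept header from the browser, the snippet below uses XMLHttpRequest; the /api/employees URL is an assumption for illustration.

// Minimal sketch: tell the server the client prefers JSON responses.
var xhr = new XMLHttpRequest();
xhr.open("GET", "/api/employees");             // assumed endpoint
xhr.setRequestHeader("Accept", "application/json");
xhr.onload = function () {
  if (xhr.status === 200) {
    console.log(JSON.parse(xhr.responseText)); // parse the JSON body
  }
};
xhr.send();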

d. Write short note on JSON parsing .


Ans: Parsing is the process of analyzing a string of symbols, either in natural language or
in computer languages, according to the rules of a formal grammar. As the grammar of JSON
is a subset of JavaScript, the analysis of its tokens by the parser occurs no differently from
how the engine parses source code. Because of this, the data produced from the analysis
of the JSON grammar will be that of objects, arrays, strings, and numbers. Additionally,
the three literals (true, false, and null) are produced as well.



JSON.parse
JSON.parse converts serialized JSON into usable JavaScript values.
Syntax of the JSON.parse Method

JSON.parse(text [, reviver]);
JSON.parse can accept two parameters, text and reviver. The name of the parameter text is
indicative of the value it expects to receive. The parameter reviver is used similarly to the
replacer parameter of stringify, in that it allows custom logic to be supplied for any necessary
parsing that would otherwise not be possible by default. As indicated in the method's
signature, only the provision of text is required.
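A small sketch of both forms is shown below; the sample document and the date-handling logic in the reviver are illustrative assumptions, not a general-purpose parser.

// Minimal sketch: plain parsing, then parsing with a reviver function.
var json = '{ "name": "John", "joined": "2018-11-01T00:00:00.000Z" }';

var plain = JSON.parse(json);
console.log(plain.joined);                     // still a string

// The reviver is called for every key/value pair; here it revives the
// assumed "joined" ISO date string into a Date object.
var revived = JSON.parse(json, function (key, value) {
  if (key === "joined") {
    return new Date(value);
  }
  return value;
});
console.log(revived.joined instanceof Date);   // true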

e. Explain the stringify object for JSON objects .


Ans: stringify is used for serializing JavaScript values into valid JSON. The method
itself accepts three parameters: value, replacer, and space. The JSON object is a global object
that does not offer the ability to create any instances of itself; one must simply
access the stringify method via the global JSON object.

Syntax of the JSON stringify Method

JSON.stringify(value[, replacer [, space]]);

<!DOCTYPE html>
<html>
<body>

<h2>Create JSON string from a JavaScript object.</h2>

<p id="demo"></p>

<script>
var obj = { name: "John", age: 30, city: "New York" };
var myJSON = JSON.stringify(obj);
document.getElementById("demo").innerHTML = myJSON;
</script>

</body>
</html>
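The sketch below additionally exercises the replacer and space parameters mentioned above; the choice of which property to filter out is an assumption for illustration.

// Minimal sketch: filter out the "age" property and pretty-print with 2 spaces.
var obj = { name: "John", age: 30, city: "New York" };

var filtered = JSON.stringify(obj, function (key, value) {
  return key === "age" ? undefined : value;   // returning undefined drops the key
}, 2);

console.log(filtered);
// {
//   "name": "John",
//   "city": "New York"
// }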

f. Discuss the JSON values.

Ans: JSON Values
The figure below defines the possible values that JSON can possess.



String
String literals are any number of Unicode characters enclosed within double quotes.

Example;
{ "name":"John" }

Number
A number in JSON is an arrangement of base-10 digits, optionally combined with mathematical
notation (a sign, decimal point, or exponent), to define a real number literal.

Example: { "age":30 }



JSON Objects

Values in JSON can be objects.


Example:
{
"employee":{ "name":"John", "age":30, "city":"New York" }
}

JSON Arrays
Values in JSON can be arrays.
Example:
{
"employees":[ "John", "Anna", "Peter" ]
}

JSON Booleans
Values in JSON can be true/false.
Example:
{ "sale":true }

JSON null
Values in JSON can be null.
Example:
{ "middlename":null }

