Database replication is like having a backup copy of your data.
When changes are
made in the main database, these changes are also sent to the backup. This way, if
the main database has problems, the backup can step in without losing any data.
It’s like saving your work in two places, so you don’t lose it if something goes
wrong with one.
In database architecture, there are three main models:
1. Shared-everything: A single-node setup where all processes utilize the
same CPU, memory, and storage. It’s efficient for small-scale operations but
can bottleneck at high concurrency levels.
2. Shared-disk: Nodes operate independently with their own CPU and
memory but access a common storage pool. This allows for better
distribution of read operations across nodes.
3. Shared-nothing: Each node is self-sufficient with dedicated CPU, memory,
and storage, handling a portion of the database. This architecture excels in
parallel processing, particularly for data warehousing, as it avoids resource
contention and enables high scalability.
The shared-nothing approach underpins Massively Parallel Processing (MPP)
databases, optimizing read-intensive workloads by distributing queries across
nodes based on data locality.
Shared-nothing architecture is like a team project where each member works on a
separate part. If they need to work together on something that overlaps, it gets
tricky because they have to coordinate perfectly, which can slow things down and
sometimes lead to confusion or incomplete work.
Also, if the project isn’t divided up well from the start, some team members might
end up with too much or too little to do. And if new members join or leave, it can
take a lot of effort to redistribute the work fairly.
On the other hand, shared-disk architecture is like a group of artists sharing the
same set of paints. They can all work on their own canvases, and if one artist takes
a break, the others can continue using the paints without any problem. This setup is
more flexible and doesn’t need as much planning when adding or removing artists.
However, if they don’t communicate about which colors they’re using or if they’re
running low on a particular shade, they might end up with inconsistent results. This
need for constant communication can be challenging and can put pressure on their
ability to work smoothly together.
Oracle’s Real Application Clusters (RAC) is an example of a shared-disk system
that has managed to handle these challenges well. It’s like an art studio that’s
found a good system for sharing paints and keeping track of who’s using what.
In a shared-nothing architecture, each node operates independently with its own
data, making it challenging to manage ACID transactions across multiple nodes.
The two-phase commit protocol required for this coordination is complex and can
lead to uncertain transaction states and reduced performance.
Moreover, data partitioning must be carefully managed to prevent workload
imbalance. As the cluster changes with nodes being added or removed, significant
effort is needed for data rebalancing, which can be resource-intensive.
Conversely, a shared-disk architecture allows for more flexible scaling and
simplifies high-availability setups. Nodes share access to the same disk storage, so
if one node fails, others can take over its workload without the data becoming unavailable.
The primary challenge in shared-disk systems is maintaining cache coherence
across nodes to ensure a consistent view of the data. This requires a robust network
infrastructure and can be complex to implement effectively. Oracle’s Real
Application Clusters (RAC) is a notable example of a successful shared-disk
relational database management system (RDBMS), overcoming these obstacles
through sophisticated cache management techniques.
In nonrelational databases, keeping transactions perfectly in sync (ACID
compliance) isn’t always necessary. This leads to different design choices:
1. Availability vs. Consistency: Imagine a network that sometimes
disconnects. Should the system keep working (available) or stop until
everything can be perfectly updated (consistent)? Nonrelational databases
often choose to keep working, even if it means some updates happen later.
2. Cost-Effective Hardware: When building a large system, the cost of
servers adds up. Nonrelational databases aim to use affordable, common
hardware efficiently and adapt to new hardware without needing to upgrade
everything.
3. Staying Robust: In a big network, parts will fail. The system must handle
these failures without losing data or stopping service.
There are three main designs for these databases:
Sharding: Data is split and stored across different servers based on certain
values (shard keys).
Omniscient Master (like Hadoop HDFS/HBase): A central controller
decides where data goes based on current loads and other factors.
Consistent Hashing (like Amazon Dynamo): Data placement is calculated
using a predictable formula (hashing) based on key values.
All these designs include making copies of data (replication) to avoid loss if a
server fails.
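As a rough illustration of these ideas, here is a minimal Python sketch of hash-based data placement with replication. The helper names (`shard_for`, `replicas_for`) and the key format are made up for illustration; real systems add many refinements on top of this basic scheme.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Hash-based placement: a deterministic hash of the key picks the shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

def replicas_for(key: str, num_shards: int, replication_factor: int = 3):
    """Keep extra copies on the next shards after the primary, so a single
    server failure never destroys the only copy of a piece of data."""
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]

print(replicas_for("user:42", num_shards=8))
```

The same hash always yields the same shard, so any node can compute where a key lives without consulting a central controller.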
MONGODB
MongoDB uses sharding to spread data across multiple servers, which helps
handle more data and users. It also uses replication to create copies of data for
safety and continuous availability.
Here’s a simple breakdown:
Sharding: Think of it like organizing a library with many branches. Each
branch (shard) holds different books (data). A special database (config
server) keeps track of which book is in which branch. When you want a
book, the librarian (router process) checks the records and sends you to the
right branch.
Range-based Partitioning: Books are sorted by topics or numbers, so if you
want a range (like all books on science), you go to one branch that has them
all.
Hash-based Partitioning: Books are placed based on a deterministic
formula (hash function), so you might have to visit several branches to find
all science books. But this way, no single branch gets too crowded.
Tag-aware Sharding: If some books are more popular or need special
storage, they can be tagged and kept in specific branches.
Cluster Balancing: With hash-based sharding, new books are spread out
evenly so no branch gets overwhelmed. If the library grows, it can adjust to
keep the balance.
In short, MongoDB’s sharding makes sure data is spread out efficiently across
servers, and replication ensures there’s always a backup copy.
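The trade-off between range-based and hash-based partitioning can be sketched in a few lines of Python. The shard boundaries and helper names below are hypothetical, not MongoDB's internals.

```python
import bisect
import hashlib

# Range-based: shards own contiguous key ranges, so a range query hits few shards.
BOUNDARIES = ["g", "n", "t"]  # hypothetical split points: shard 0 holds keys < "g", etc.

def range_shard(key: str) -> int:
    return bisect.bisect_right(BOUNDARIES, key)

# Hash-based: keys scatter evenly, so a range query must fan out to every shard.
def hash_shard(key: str, num_shards: int = 4) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_shards

keys = ["apple", "grape", "mango", "zebra"]
print([range_shard(k) for k in keys])  # alphabetically close keys cluster together
print([hash_shard(k) for k in keys])   # the same keys land wherever their hashes fall
```

This is why range sharding serves range scans well but risks hotspots, while hash sharding balances load at the cost of scatter-gather reads.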
In MongoDB, replication is achieved through replica sets, which are groups of
servers that maintain the same data set. Replica sets provide redundancy and high
availability, and are the basis for all production deployments.
Here’s a technical breakdown:
Replica Set: A group of MongoDB servers that hold the same data. One
server acts as the primary node and handles all write operations. The other
servers are secondary nodes that replicate the primary’s data.
Primary Node: The only node in a replica set that receives write operations.
MongoDB writes are first recorded on the primary and then replicated to
secondaries.
Secondary Nodes: These nodes replicate the primary’s oplog and apply the
operations to their data sets in an asynchronous process.
Elections: When a primary node goes down or becomes unreachable, the
replica set holds an election to choose a new primary from the secondaries.
To be eligible, a node must be in contact with more than half of the nodes in
the replica set.
Priority Value: Administrators can influence elections by setting priority
values for nodes. A priority of 0 means a node can never become primary.
Oplog (Operation Log): A special capped collection that records all
operations that modify the data stored in databases. The primary node uses
this log to keep track of changes that need to be replicated to secondaries.
Heartbeats: Members of a replica set send regular heartbeat messages to
each other to monitor their status. If a primary stops receiving heartbeats
from a majority of secondaries, it steps down, and an election is triggered.
This replication mechanism ensures that MongoDB can provide high availability,
fault tolerance, and automatic failover.
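The majority-plus-priority eligibility rule can be sketched in Python. This is a deliberate simplification: real MongoDB elections also compare oplog recency and run actual voting rounds, and `elect` and the node dictionaries here are illustrative only.

```python
def can_become_primary(node, reachable_count, total_nodes):
    """A node is eligible only if it can contact a strict majority of the
    replica set and its priority is above 0."""
    has_majority = reachable_count > total_nodes // 2
    return has_majority and node["priority"] > 0

def elect(nodes, total_nodes):
    """Pick the eligible candidate with the highest priority (simplified)."""
    eligible = [n for n in nodes
                if can_become_primary(n, n["reachable"], total_nodes)]
    return max(eligible, key=lambda n: n["priority"], default=None)

nodes = [
    {"name": "a", "priority": 1, "reachable": 3},
    {"name": "b", "priority": 2, "reachable": 3},
    {"name": "c", "priority": 0, "reachable": 3},  # priority 0: never primary
]
print(elect(nodes, total_nodes=3))
```

Note how the priority-0 node is filtered out even though it can reach a majority, matching the rule described above.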
In MongoDB, you can adjust settings to control how strict you want to be about
your data’s consistency and availability.
Write Concern: This setting decides when a data change (write) is
considered complete. By default, it’s done as soon as the primary server
knows about it. But there’s a risk: if the primary fails right after, you might
lose that data. To be safer, you can wait until one or more backup servers
(secondaries) also have the change.
Read Preference: This tells MongoDB where to get data for read
operations. Usually, it asks the primary server. But you can set it up to ask
backup servers if the primary is busy or down, or even to just ask whichever
server responds fastest, which might give you slightly older data but faster
responses.
By default, MongoDB makes sure everyone sees the latest data (strict consistency).
If you’re okay with sometimes getting slightly older data for faster reads, you can
change the settings for a more relaxed approach (eventual consistency), especially
if you also adjust write concern to make sure writes are safe on backup servers.
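Write-concern semantics can be illustrated with a small Python sketch. The `acknowledged` helper is hypothetical; it only models when a write would count as complete under `w=1`, `w=N`, or `w='majority'`.

```python
def acknowledged(ack_count: int, replica_count: int, write_concern) -> bool:
    """Decide when a write counts as complete under a given write concern.
    w=1 (default): primary only; w='majority': most members; w=N: N members."""
    if write_concern == "majority":
        return ack_count > replica_count // 2
    return ack_count >= int(write_concern)

# A 3-member replica set where only the primary has seen the write so far:
print(acknowledged(1, 3, 1))           # True  -> fast, but lost if primary dies now
print(acknowledged(1, 3, "majority"))  # False -> must wait for a secondary too
print(acknowledged(2, 3, "majority"))  # True
```

The trade is latency for durability: waiting for a majority makes the write survive any single-node failure.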
[Figure: MongoDB sharding architecture]
[Figure: Comparison of range-based and hash-based sharding]
[Figure: MongoDB replica set and primary failover]
HBASE
Let’s dive into the technical details of HBase and its relationship with
Hadoop HDFS:
HBase: It’s a distributed, scalable, big data store, often referred to as the
“Hadoop database” because it’s built on top of the Hadoop ecosystem. It
provides random access and real-time read/write access to your big data.
Open-source BigTable: HBase is inspired by Google’s BigTable and can be
considered an open-source equivalent. It uses the Hadoop Distributed File
System (HDFS) for storage, which allows it to leverage the scalability and
fault tolerance of Hadoop.
Architecture: While HBase can theoretically work on top of any file
system, it’s predominantly used with HDFS. This is because many of its
architectural decisions are optimized for HDFS’s characteristics.
Hybrid Clustering: The architecture of HBase over HDFS is a hybrid
between shared-nothing and shared-disk architectures. Although every node
in an HBase cluster can access all data through HDFS, nodes typically have
local data that they manage, which optimizes performance.
Data Reliability: With HDFS underneath, HBase doesn’t need to handle
data replication or disk failure itself. HDFS automatically replicates data
across multiple DataNodes for fault tolerance. By default, there are three
copies of each piece of data, ideally with one copy on a different rack to
protect against rack-level failures.
DataNodes: These are the servers in a Hadoop cluster where the actual data
is stored. They are directly connected to the disks that store your files.
In summary, HBase offers a powerful way to perform real-time operations on large
datasets by leveraging the robust storage capabilities of HDFS. It simplifies
managing big data by providing a database-like interface while ensuring high
availability and fault tolerance through its integration with the broader Hadoop
ecosystem.
Let’s break down these HBase concepts into simpler terms:
Tables: In HBase, tables are like big spreadsheets that store your data.
They’re made up of rows and columns, but they can handle a massive
amount of data.
Regions: Think of regions as chunks of the table. Just like a big spreadsheet
might be too large to fit on one screen, a table is split into regions so it can
be managed more easily.
RegionServers: These are the caretakers for the regions. Each RegionServer
looks after a set of regions, much like a librarian might be responsible for
certain sections of a library.
HFiles: These are the actual files on disk where HBase stores your data.
Zookeeper: This is like an address book for HBase. When a client wants to
read or write data, it first checks with Zookeeper to find out which
RegionServer to talk to.
Catalog Tables: These are special tables that keep track of where all the
regions are located. The client uses these to find the right place to read or
write data.
HBase Master Server: This server is like a manager that makes sure all the
RegionServers are balanced and not overloaded. If new servers are added or
removed, it helps move regions around to keep everything running
smoothly.
So, when you want to get or change data in HBase, your request goes through this
process: ask Zookeeper where to go, check the catalog tables for the exact location,
and then talk to the right RegionServer to do your read or write operation.
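That lookup path can be sketched in Python. The `ZOOKEEPER` and `CATALOG` dictionaries below are toy in-memory stand-ins for ZooKeeper and the catalog table, not real HBase client code.

```python
# Hypothetical stand-ins for ZooKeeper and the catalog of region locations.
ZOOKEEPER = {"meta_location": "regionserver-1"}
CATALOG = {  # region start key -> RegionServer hosting that region
    "": "regionserver-2",   # region covering rowkeys before "m"
    "m": "regionserver-3",  # region covering rowkeys from "m" onward
}

def locate_regionserver(rowkey: str) -> str:
    """Client read path: ask ZooKeeper where the catalog lives, then ask the
    catalog which region's key range contains the rowkey."""
    _ = ZOOKEEPER["meta_location"]  # step 1: find the catalog's server
    # step 2: pick the region with the greatest start key <= rowkey
    start = max(k for k in CATALOG if k <= rowkey)
    return CATALOG[start]

print(locate_regionserver("apple"))  # a key before "m"
print(locate_regionserver("zebra"))  # a key from "m" onward
```

Real clients cache these locations, so most requests go straight to the right RegionServer without repeating the lookups.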
[Figure: HBase architecture]
Caching and data locality
Let’s simplify these terms:
Block Cache: This is like a quick-access memory area in the RegionServer
where frequently read data is kept so it can be retrieved fast, without having
to read from the disk every time.
MemStore: It’s a temporary holding area for new data that hasn’t been
saved to disk yet. Think of it as a notepad where the RegionServer jots down
data before filing it away properly.
Write Ahead Log (WAL): This is like a diary for the RegionServer. Every
change gets written here first, so if something goes wrong, it can remember
what it was doing and make sure no data is lost.
Data Locality: This means keeping the data close to where it’s being
processed. In HBase, it’s ideal to have the data on the same server as the
RegionServer, so it doesn’t have to go far to get it.
Rebalance Operations and Failovers: Sometimes, regions need to be
moved around between servers for balance or if a server fails. This can
temporarily disrupt data locality.
Compactions: This process cleans up and organizes data files on HDFS,
which helps bring back good data locality after rebalancing or failovers.
Short-circuit Reads: Normally, reading data involves asking the
NameNode where to find it. But with short-circuit reads, if the data is on the
same server as the RegionServer, it can skip asking and read directly from
the disk, which is faster.
So in essence, HBase tries to keep things running quickly and safely by storing
frequently used data in memory for quick reads, writing all changes down in a log
for safety, and keeping data close to where it’s processed for speed.
[Figure: Data locality in HBase]
Row key ordering:
In HBase, rowkeys whose values always increase, such as timestamps, funnel every
new write to a single region and can overload one server. To fix this, prefix the
rowkeys with a salt value or another attribute so the writes spread across regions.
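A common form of this fix is "salting" the rowkey. The sketch below prefixes a timestamp-style key with a hash-derived bucket number; the helper name and bucket count are arbitrary choices for illustration.

```python
import hashlib

def salted_rowkey(timestamp: str, num_buckets: int = 8) -> str:
    """Prefix a monotonically increasing rowkey with a hash-derived salt so
    consecutive writes land in different regions instead of one hot region."""
    salt = int(hashlib.md5(timestamp.encode()).hexdigest(), 16) % num_buckets
    return f"{salt:02d}-{timestamp}"

# Consecutive timestamps no longer sort next to each other:
for ts in ["20240101-000001", "20240101-000002", "20240101-000003"]:
    print(salted_rowkey(ts))
```

The cost is on reads: a time-range scan must now query all the salt buckets and merge the results.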
RegionServer
In HBase, regions are split by the RegionServer as they grow, and these splits can
be moved to balance the load across servers. The HBase master node manages this
balancing act, moving the management of regions without moving the actual data.
However, this can temporarily affect how quickly data can be accessed until the
next compaction reorganizes it.
Region replicas
HBase region replicas provide backup copies of data on different servers, allowing
quick recovery from server failures without data loss. These replicas ensure service
continuity but may not always have the latest data due to asynchronous updates.
CASSANDRA
Let’s break it down:
Master Nodes (HBase/MongoDB): Think of master nodes like school
principals who oversee everything and keep records.
Cassandra’s Approach: It’s like a classroom where every student can be
the leader; there’s no permanent principal.
Gossip Protocol: It’s like a game of telephone where Cassandra nodes
constantly update each other about the cluster’s status, ensuring everyone is
informed.
Node Responsibilities: Sometimes, a Cassandra node takes on a temporary
special task, like helping a new node learn about the cluster or coordinating
an operation.
No Single Point of Failure: Because every node in Cassandra can do any
job, there’s no single node whose failure can stop the whole system.
Failure Detection: Instead of relying on regular “heartbeat” signals which
might get lost, Cassandra nodes use a more flexible approach to figure out if
another node might be down, similar to how people might get concerned if
they haven’t heard from a friend in a while.
This setup helps Cassandra avoid major disruptions even if one node has problems.
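The idea behind this adaptive failure detection (Cassandra uses a phi-accrual failure detector) can be sketched as a suspicion score that grows with how overdue a node's messages are. The toy class below is an illustration of the concept, not Cassandra's implementation.

```python
class FailureDetectorSketch:
    """Toy accrual-style detector: rather than a hard heartbeat deadline,
    suspicion grows with how overdue a node's messages are relative to
    their usual arrival interval."""
    def __init__(self, usual_interval: float):
        self.usual_interval = usual_interval
        self.last_seen = {}

    def heard_from(self, node: str, now: float):
        self.last_seen[node] = now

    def suspicion(self, node: str, now: float) -> float:
        # ~1.0 means "about one usual interval overdue"; higher = more suspect.
        return (now - self.last_seen[node]) / self.usual_interval

fd = FailureDetectorSketch(usual_interval=1.0)
fd.heard_from("node-a", now=0.0)
print(fd.suspicion("node-a", now=0.5))  # 0.5 - probably fine
print(fd.suspicion("node-a", now=3.0))  # 3.0 - increasingly likely to be down
```

A graded score lets each node pick its own threshold for declaring a peer down, instead of a brittle all-or-nothing timeout.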
Consistent Hashing
Consistent hashing is a method used by Cassandra and other Dynamo-based
databases to distribute data across a cluster. It works by hashing the rowkey
(similar to a primary key in an RDBMS) and assigning each node a range of hash
values. The node responsible for the range that includes the hashed key value will
store the corresponding data.
Imagine the hash values are arranged in a circle, or “ring,” with each node
covering a segment of the ring. When data comes in, its rowkey is hashed, and
wherever this hash falls on the ring determines which node will store it.
When adding new nodes to the cluster, it’s crucial to maintain balance so that each
node is responsible for an approximately equal range of hash values. If you add
nodes one by one, you can insert them between existing nodes on the ring, but this
can lead to an unbalanced distribution because it effectively splits the hash range of
an existing node.
To avoid this, you could remap all hash ranges every time a new node is added,
ensuring a balanced cluster, but this process is resource-intensive. Alternatively,
you can allow the cluster to become temporarily unbalanced until enough new
nodes are added to justify a full rebalancing.
Example: Let’s say we have a Cassandra cluster with 4 nodes, and we’re using consistent
hashing to distribute data. We hash the rowkeys and assign each node a range of
these hash values.
For simplicity, let’s use letters instead of numbers for hash values. The hash range
is A to Z, and each node gets a range:
Node 1: A-F
Node 2: G-L
Node 3: M-R
Node 4: S-Z
Now, if we have data with rowkeys that hash to ‘C’, ‘H’, ‘O’, and ‘V’, they would
be stored on nodes 1, 2, 3, and 4 respectively.
If we add a new Node 5, we could place it between Node 2 and Node 3, giving it a
range of I-L. This splits Node 2’s range but keeps the cluster running without
remapping all ranges. However, now Node 2 has fewer hash values (G-H) and
Node 5 has only I-L, while the others keep their larger ranges. This is an unbalanced
situation that we’d want to avoid or fix by redistributing the ranges more evenly.
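The ring itself can be sketched in Python. `HashRing` is illustrative only; production systems (including Cassandra) typically place many virtual nodes per physical node to keep the ring balanced, which this sketch omits.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hashing sketch: the hash space is a ring, each node sits at
    its own hash position, and a key belongs to the first node clockwise
    from the key's hash."""
    def __init__(self, nodes):
        self.ring = sorted((self._hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, rowkey: str) -> str:
        i = bisect.bisect_right(self.positions, self._hash(rowkey)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node1", "node2", "node3", "node4"])
print(ring.node_for("some-rowkey"))
# Adding a node moves only the keys in one arc of the ring; all others stay put.
ring2 = HashRing(["node1", "node2", "node3", "node4", "node5"])
```

This is the key property of consistent hashing: adding or removing a node reshuffles only the keys on the affected arc, not the whole data set.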
[Figure: Consistent hashing]
Order-preserving partitioning and replicas
In Cassandra, the partitioner decides how data is spread across nodes. Order-
preserving partitioners keep rows in sorted order, which speeds up range scans but
can cause uneven load and hotspots. Replicas are additional copies of data; the
coordinator node ensures they’re stored on other nodes, and the replication factor
determines how many copies exist. Simple replication (SimpleStrategy) writes
copies to the next nodes on the ring, while topology-aware replication
(NetworkTopologyStrategy) spreads copies across different racks or data centers
for better availability.
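The two replication strategies can be contrasted with a toy Python sketch; the function names and the rack layout are made up for illustration.

```python
def simple_replicas(ring_nodes, primary_index, rf):
    """SimpleStrategy-style: copies go to the next rf nodes clockwise on the ring."""
    return [ring_nodes[(primary_index + i) % len(ring_nodes)] for i in range(rf)]

def topology_aware_replicas(ring_nodes, racks, primary_index, rf):
    """NetworkTopologyStrategy-style sketch: walk the ring clockwise but
    prefer nodes on racks that do not yet hold a copy."""
    chosen, used_racks = [], set()
    all_racks = set(racks.values())
    i = primary_index
    while len(chosen) < rf and i < primary_index + 2 * len(ring_nodes):
        node = ring_nodes[i % len(ring_nodes)]
        # Take the node if its rack is new, or once every rack already has a copy.
        if racks[node] not in used_racks or used_racks == all_racks:
            chosen.append(node)
            used_racks.add(racks[node])
        i += 1
    return chosen

nodes = ["n1", "n2", "n3", "n4"]
racks = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
print(simple_replicas(nodes, 0, 3))                 # ['n1', 'n2', 'n3']
print(topology_aware_replicas(nodes, racks, 0, 3))  # ['n1', 'n3', 'n4']
```

Note the difference: the simple strategy puts two copies on rackA, so a rack failure can take out a majority of replicas, while the topology-aware sketch covers both racks before doubling up.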