Google Bigtable


GOOGLE BIGTABLE

BY ABHINEET SUMAN 192170001
ABHIJEET INGLE 192170004
Overview of Cloud Bigtable
 Cloud Bigtable is a sparsely populated table that can scale to
billions of rows and thousands of columns, enabling you to store
terabytes or even petabytes of data.
 A single value in each row is indexed; this value is known as the
row key.
 Cloud Bigtable is ideal for storing very large amounts of single-
keyed data with very low latency.
 It supports high read and write throughput at low latency, and it
is an ideal data source for MapReduce operations.
 Cloud Bigtable is exposed to applications through multiple client
libraries, including a supported extension to the Apache HBase
library for Java. As a result, it integrates with the existing Apache
ecosystem of open-source Big Data software.
Cloud Bigtable’s Advantages
 Incredible scalability. Cloud Bigtable scales in direct proportion to the
number of machines in your cluster. A self-managed HBase installation has a
design bottleneck that limits the performance after a certain threshold is
reached. Cloud Bigtable does not have this bottleneck, so you can scale your
cluster up to handle more reads and writes.
 Simple administration. Cloud Bigtable handles upgrades and restarts
transparently, and it automatically maintains high data durability. To replicate
your data, simply add a second cluster to your instance, and replication starts
automatically. No more managing masters or regions; just design your table
schemas, and Cloud Bigtable will handle the rest for you.
 Cluster resizing without downtime. You can increase the size of a Cloud
Bigtable cluster for a few hours to handle a large load, then reduce the
cluster's size again—all without any downtime. After you change a cluster's
size, it typically takes just a few minutes under load for Cloud Bigtable to
balance performance across all of the nodes in your cluster.
What It’s Good For
 You can use Cloud Bigtable to store and query all of the following
types of data:
 Time-series data, such as CPU and memory usage over time
for multiple servers.
 Marketing data, such as purchase histories and customer
preferences.
 Financial data, such as transaction histories, stock prices, and
currency exchange rates.
 Internet of Things data, such as usage reports from energy
meters and home appliances.
 Graph data, such as information about how users are connected
to one another.
Cloud Bigtable storage model
 Cloud Bigtable stores data in massively scalable tables, each of which is a
sorted key/value map. The table is composed of rows, each of which
typically describes a single entity, and columns, which contain individual
values for each row.
 Each row is indexed by a single row key, and columns that are related to
one another are typically grouped together into a column family.
 Each column is identified by a combination of the column family and
a column qualifier, which is a unique name within the column family.
 Each row/column intersection can contain multiple cells, or versions, at
different timestamps, providing a record of how the stored data has been
altered over time.
 Cloud Bigtable tables are sparse; if a cell does not contain any data, it does
not take up any space.
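The storage model above can be sketched as a small in-memory structure. This is a toy model only, not the Cloud Bigtable client API: the class name `ToyTable`, its methods, and the sample row keys are all hypothetical and exist purely to illustrate rows, column families, qualifiers, timestamped versions, and sparsity.

```python
class ToyTable:
    """Toy model of a sparse, sorted, versioned table (not the real API)."""

    def __init__(self):
        # row key -> (family, qualifier) -> {timestamp: value}
        self._rows = {}

    def write(self, row_key, family, qualifier, value, timestamp):
        # A cell only comes into existence when something is written to it.
        columns = self._rows.setdefault(row_key, {})
        columns.setdefault((family, qualifier), {})[timestamp] = value

    def read_latest(self, row_key, family, qualifier):
        versions = self._rows.get(row_key, {}).get((family, qualifier))
        if not versions:
            return None  # sparse: an empty cell is simply absent
        return versions[max(versions)]

    def read_versions(self, row_key, family, qualifier):
        # Newest version first, mirroring a history of how the value changed.
        versions = self._rows.get(row_key, {}).get((family, qualifier), {})
        return sorted(versions.items(), reverse=True)

    def scan(self, start_key, end_key):
        # Rows are visited in row-key order within [start_key, end_key).
        return [k for k in sorted(self._rows) if start_key <= k < end_key]


table = ToyTable()
table.write(b"server1#cpu", b"metrics", b"usage", b"0.42", timestamp=1)
table.write(b"server1#cpu", b"metrics", b"usage", b"0.57", timestamp=2)
table.write(b"server2#cpu", b"metrics", b"usage", b"0.10", timestamp=1)
```

Reading `b"server1#cpu"` returns the newest version, while `read_versions` exposes both timestamps. A cell that was never written stores nothing at all.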
Cloud Bigtable Architecture
 Client requests go through a front-end server before they are sent to a Cloud
Bigtable node. The nodes are organized into a Cloud Bigtable cluster, which
belongs to a Cloud Bigtable instance, a container for the cluster.
 Each node in the cluster handles a subset of the requests to the cluster. By
adding nodes to a cluster, you can increase the number of simultaneous
requests that the cluster can handle, as well as the maximum throughput for
the entire cluster. If you enable replication by adding a second cluster, you
can also send different types of traffic to different clusters, and you can fail
over to one cluster if the other cluster becomes unavailable.
 A Cloud Bigtable table is sharded into blocks of contiguous rows,
called tablets, to help balance the workload of queries. (Tablets are similar to
HBase regions.) Tablets are stored on Colossus, Google's file system,
in SSTable format. An SSTable provides a persistent, ordered immutable map
from keys to values, where both keys and values are arbitrary byte strings.
 Each tablet is associated with a specific Cloud Bigtable node. In addition to the
SSTable files, all writes are stored in Colossus's shared log as soon as they are
acknowledged by Cloud Bigtable, providing increased durability.
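To make the SSTable description concrete, here is a minimal sketch of a persistent-style, ordered, immutable map from byte-string keys to byte-string values. It assumes everything fits in memory and uses binary search for lookups. The class name `SSTable` and its methods are illustrative and say nothing about Google's actual file format.

```python
import bisect


class SSTable:
    """Toy immutable, ordered map from byte-string keys to byte-string values."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples are used so the contents cannot be modified after creation.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key):
        # Binary search works because the keys are kept in sorted order.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, start, end):
        # Contiguous key ranges are cheap to read from a sorted layout.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


sst = SSTable({b"b": b"2", b"a": b"1", b"c": b"3"})
```

Because the keys are sorted, a tablet (a block of contiguous rows) corresponds naturally to a key range such as `sst.range(b"a", b"c")`.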
Load balancing
 Each Cloud Bigtable zone is managed by a master process, which
balances workload and data volume within clusters. The master
splits busier/larger tablets in half and merges less-
accessed/smaller tablets together, redistributing them between
nodes as needed.
 If a certain tablet gets a spike of traffic, the master will split the
tablet in two, then move one of the new tablets to another node.
Cloud Bigtable manages all of the splitting, merging, and
rebalancing automatically, saving users the effort of manually
administering their tablets.
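The split-and-move behaviour described above can be sketched as follows. This is a simplified illustration, not the master's real algorithm: the `Tablet` fields, the function names, and the assumption that traffic divides evenly between the two halves are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    start_key: str          # inclusive
    end_key: str            # exclusive
    node: str               # node currently serving this tablet
    requests_per_sec: int   # observed load (assumed metric)


def least_loaded_node(tablets, nodes):
    """Pick the node with the lowest total load across its tablets."""
    load = {n: 0 for n in nodes}
    for t in tablets:
        load[t.node] += t.requests_per_sec
    return min(nodes, key=lambda n: load[n])


def split_hot_tablet(tablet, split_key, target_node):
    """Split a tablet at split_key and move the upper half to target_node."""
    if not (tablet.start_key < split_key < tablet.end_key):
        raise ValueError("split_key must fall strictly inside the tablet's range")
    half = tablet.requests_per_sec // 2  # assumption: load splits evenly
    lower = Tablet(tablet.start_key, split_key, tablet.node, half)
    upper = Tablet(split_key, tablet.end_key, target_node,
                   tablet.requests_per_sec - half)
    return lower, upper


nodes = ["node-1", "node-2", "node-3"]
tablets = [Tablet("a", "m", "node-1", 900), Tablet("m", "z", "node-2", 100)]
target = least_loaded_node(tablets, nodes)
lower, upper = split_hot_tablet(tablets[0], "g", target)
```

Here the hot tablet on `node-1` is split at key `"g"`, and the upper half moves to the idle `node-3`. Only the assignment changes; the data itself is not copied between nodes in this model.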
Supported data types
 Cloud Bigtable treats all data as raw byte strings for most
purposes. The only time Cloud Bigtable tries to determine the
type is for increment operations, where the target must be a 64-
bit integer encoded as an 8-byte big-endian value.
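The encoding required for increment targets can be illustrated with Python's standard `struct` module. This sketch assumes a signed 64-bit integer (format `">q"`); the helper names are hypothetical, and the code only mimics the byte layout locally rather than calling Cloud Bigtable.

```python
import struct


def encode_int64(value):
    """Encode an integer as an 8-byte big-endian value."""
    return struct.pack(">q", value)


def increment_cell(cell_bytes, amount):
    """Decode an 8-byte big-endian integer, add amount, and re-encode it."""
    if len(cell_bytes) != 8:
        raise ValueError("increment target must be an 8-byte big-endian integer")
    (current,) = struct.unpack(">q", cell_bytes)
    # struct.pack raises struct.error if the result does not fit in 64 bits.
    return struct.pack(">q", current + amount)


counter = encode_int64(41)
counter = increment_cell(counter, 1)
```

Any other cell value remains an opaque byte string; the application decides how to interpret it.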
Memory and disk usage
 Empty cells in a Cloud Bigtable table do not take up any space.
Each row is essentially a collection of key/value entries, where
the key is a combination of the column family, column qualifier
and timestamp. If a row does not include a value for a specific
key, the key/value entry is simply not present.
 Column qualifiers take up space in a row, since each column
qualifier used in a row is stored in that row. As a result, it is often
efficient to use column qualifiers as data.
 Cloud Bigtable periodically rewrites your tables to remove
deleted entries, and to reorganize your data so that reads and
writes are more efficient. This process is known as a compaction.
There are no configuration settings for compactions—Cloud
Bigtable compacts your data automatically.
 Mutations, or changes, to a row take up extra storage space,
because Cloud Bigtable stores mutations sequentially and
compacts them only periodically. When Cloud Bigtable compacts
a table, it removes values that are no longer needed. If you
update the value in a cell, both the original value and the new
value will be stored on disk for some amount of time until the
data is compacted.
 Deletions also take up extra storage space, at least in the short
term, because deletions are actually a specialized type of
mutation. Until the table is compacted, a deletion uses extra
storage rather than freeing up space.
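The effect of mutations, deletions, and compaction on storage can be sketched with a toy log of entries. This is a deliberate simplification: it keeps only the newest version per key and uses `None` as a deletion marker, whereas the real service retains versions according to its own rules. The function name `compact` and the entry layout are hypothetical.

```python
def compact(entries):
    """Collapse a list of (key, timestamp, value) entries.

    A value of None marks a deletion. Only the newest entry per key is kept,
    and keys whose newest entry is a deletion are dropped entirely.
    """
    latest = {}
    for key, ts, value in entries:
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return sorted((k, ts, v) for k, (ts, v) in latest.items() if v is not None)


# Before compaction: an update and a deletion both ADD entries.
entries = [
    (b"r1", 1, b"old"),
    (b"r1", 2, b"new"),   # update: the old value is still stored
    (b"r2", 1, b"x"),
    (b"r2", 2, None),     # deletion: stored as one more entry
]
compacted = compact(entries)
```

Four entries are stored before compaction even though only one live value remains afterwards, which is why updates and deletions temporarily increase disk usage.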
Data compression
 Cloud Bigtable compresses your data automatically using an
intelligent algorithm. You cannot configure compression settings
for your table. However, it is useful to know how to store data so
that it can be compressed efficiently:
 Random data cannot be compressed as efficiently as
patterned data. Patterned data includes text, such as the page
you're reading right now.
 Compression works best if identical values are near each
other, either in the same row or in adjoining rows. If you arrange
your row keys so that rows with identical chunks of data are next
to each other, the data can be compressed efficiently.
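The difference between patterned and random data is easy to observe with a general-purpose compressor. The sketch below uses `zlib` from the Python standard library purely as a stand-in; the algorithm Cloud Bigtable uses is not specified in this material, and the sample payload is made up.

```python
import os
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Patterned data: the same chunk repeated many times.
patterned = b"status=OK;region=us-east1;" * 400
# Random data of the same length.
random_bytes = os.urandom(len(patterned))

patterned_ratio = compression_ratio(patterned)
random_ratio = compression_ratio(random_bytes)
```

The patterned input shrinks to a small fraction of its size, while the random input stays at roughly its original size. The same intuition applies to arranging row keys so that similar values sit next to each other.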
Data durability
 When you use Cloud Bigtable, your data is stored on Colossus,
Google's internal, highly durable file system, using storage
devices in Google's data centres. You do not need to run an HDFS
cluster or any other file system to use Cloud Bigtable. If your
instance uses replication, Cloud Bigtable maintains one copy of
your data in Colossus for each cluster in the instance. Each copy
is located in a different zone or region, further improving
durability.
 Behind the scenes, Google uses proprietary storage methods to
achieve data durability above and beyond what's provided by
standard HDFS three-way replication. In addition, Google creates
backups of your data to protect against catastrophic events and
provide for disaster recovery.
Security
 Access to your Cloud Bigtable tables is controlled by your Google
Cloud project and the Cloud Identity and Access Management
roles that you assign to users.
 For example, you can assign Cloud IAM roles that prevent
individual users from reading from tables, writing to tables, or
creating new instances.
 If someone does not have access to your project or does not
have a Cloud IAM role with appropriate permissions for Cloud
Bigtable, they cannot access any of your tables.
 You can manage security at the project, instance, and table
levels. Cloud Bigtable does not support row-level, column-level,
or cell-level security restrictions.
Instances, clusters, and nodes

 A Cloud Bigtable instance is mostly just a container for
your clusters and nodes, which do all of the real work.
 An instance has a few important properties that you need to
know about:
 The instance type (production or development)
 The storage type (SSD or HDD)
 The application profiles, for instances that use replication
Clusters
 A cluster represents the actual Cloud Bigtable service. Each
cluster belongs to a single Cloud Bigtable instance, and an
instance can have up to 4 clusters. When your application sends
requests to a Cloud Bigtable instance, those requests are actually
handled by one of the clusters in the instance.
 Each cluster is located in a single zone. An instance's clusters
must be in unique zones. You can create an additional cluster in
any zone where Cloud Bigtable is available.
Nodes
 Each cluster in a production instance has 3 or more nodes, which
are compute resources that Cloud Bigtable uses to manage your
data.
 Behind the scenes, Cloud Bigtable splits all of the data from your
tables into smaller tablets.
 Tablets are stored on disk, separate from the nodes but in the
same zone as the nodes. Each node is responsible for keeping
track of specific tablets on disk; handling incoming reads and
writes for its tablets; and performing maintenance tasks on its
tablets, such as periodic compactions.
 Each tablet is associated with a single node.
CPU usage
Disk usage
THANK YOU…
