11 Managed Services
Managed services are partial or complete solutions offered as a service. They exist on
a continuum between platform as a service and software as a service, depending on
how much of the internal methods and controls are exposed. Using a managed
service allows you to outsource a lot of the administrative and maintenance overhead
to Google, if your application requirements fit within the service offering.
Proprietary + Confidential
Agenda
01 BigQuery
02 Dataflow
03 Dataprep
04 Dataproc
01 BigQuery
● Fully managed
● Petabyte scale
● SQL interface
● Very fast
BigQuery is Google Cloud’s serverless, highly scalable, and cost-effective cloud data
warehouse.
It is a petabyte-scale data warehouse that allows for super-fast queries using the
processing power of Google's infrastructure. Because there is no infrastructure for
you to manage, you can focus on uncovering meaningful insights using familiar SQL
without the need for a database administrator.
Query example
WITH groceries AS
(SELECT "milk" AS dairy,
"eggs" AS protein,
"bread" AS grain)
SELECT g.*
FROM groceries AS g;
+-------+---------+-------+
| dairy | protein | grain |
+-------+---------+-------+
| milk  | eggs    | bread |
+-------+---------+-------+
You can access BigQuery by using the Google Cloud console, by using a
command-line tool, or by making calls to the BigQuery REST API using client
libraries for languages such as Java, .NET, or Python. There are also several
third-party tools that you can use to interact with BigQuery, for tasks such as
visualizing or loading data.
Here is an example of a Standard SQL query. The WITH clause defines a temporary
table called groceries inline, and the query produces one output column for each
column in groceries, which is aliased as g.
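As a rough sketch of the client-library route, the helper below runs a query through any client object that exposes `query(sql)` and returns a job with a `result()` method, which is the shape the `google-cloud-bigquery` Python library's `Client` follows. The names `run_query`, `StubClient`, and `_StubJob` are hypothetical; the stubs exist only so the snippet runs without credentials or network access.

```python
# The query from the slide, unchanged.
GROCERIES_SQL = """
WITH groceries AS
  (SELECT "milk" AS dairy,
          "eggs" AS protein,
          "bread" AS grain)
SELECT g.*
FROM groceries AS g
"""


def run_query(client, sql):
    """Run sql through a BigQuery-style client and return rows as dicts."""
    job = client.query(sql)  # submits the query job
    # result() waits for the job and yields row objects that support .items()
    return [dict(row.items()) for row in job.result()]


class _StubJob:
    """Hypothetical stand-in for a query job (assumption, not the real class)."""

    def __init__(self, rows):
        self._rows = rows

    def result(self):
        return iter(self._rows)


class StubClient:
    """Hypothetical offline client that always returns the slide's one row."""

    def query(self, sql):
        return _StubJob([{"dairy": "milk", "protein": "eggs", "grain": "bread"}])


rows = run_query(StubClient(), GROCERIES_SQL)
print(rows)
```

With the real library installed and credentials configured, passing `bigquery.Client()` (from `google.cloud import bigquery`) in place of `StubClient()` should return the same single row.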
02 Dataflow
Dataflow supports fast, simplified pipeline development via expressive SQL, Java,
and Python APIs in the Apache Beam SDK, which provides a rich set of windowing
and session analysis primitives as well as an ecosystem of source and sink
connectors. Dataflow is also tightly coupled with other Google Cloud services like
Google Cloud’s operations suite, so you can set up priority alerts and notifications to
monitor your pipeline and the quality of data coming in and out.
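To make the idea of windowing concrete, here is a minimal plain-Python sketch of fixed (tumbling) windows: each timestamped event is assigned to the window that contains it, and values are summed per window. This is not the Apache Beam API; the function name `fixed_windows` and the sample events are assumptions made for illustration only. In a real pipeline the Beam SDK's own windowing primitives would do this work.

```python
from collections import defaultdict


def fixed_windows(events, size_s):
    """Sum values per fixed window.

    events: iterable of (timestamp_seconds, value) pairs.
    size_s: window length in seconds.
    Returns a dict mapping window start time to the summed value.
    """
    totals = defaultdict(int)
    for ts, value in events:
        # Round the timestamp down to the start of its window.
        window_start = (ts // size_s) * size_s
        totals[window_start] += value
    return dict(sorted(totals.items()))


# Hypothetical events: (seconds since start, measurement)
events = [(1, 10), (4, 5), (61, 7), (119, 1), (120, 2)]
print(fixed_windows(events, 60))  # {0: 15, 60: 8, 120: 2}
```

The same grouping applies to batch and streaming input. The difference in a streaming system is that windows are emitted as time advances rather than after all the data has been read.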
[Diagram: Dataflow example use cases. Sources: Pub/Sub, Datastore, Apache Kafka. Processing: Dataflow (stream). Destinations: BigQuery (data warehouse), AI Platform, Cloud Bigtable (caching and serving), Data Studio, third-party tools.]
This diagram shows some example use cases of Dataflow. Dataflow processes both
stream and batch data. This data could come from other Google Cloud services like
Datastore or Pub/Sub, which is Google’s messaging and publishing service. The data
could also be ingested from third-party sources like Apache Avro files and
Apache Kafka.
After you transform the data with Dataflow, you can analyze it in BigQuery, Vertex AI,
or even Cloud Bigtable. Using Data Studio, you can even build real-time dashboards
for IoT devices.
03 Dataprep
Dataprep is an intelligent data service for visually exploring, cleaning, and preparing
structured and unstructured data for analysis, reporting, and machine learning.
With automatic detection of schemas, data types, possible joins, and anomalies, you
can skip time-consuming data profiling and focus on data analysis.
Dataprep architecture
[Diagram: raw data (Cloud Storage, BigQuery) → Dataprep → Dataflow → refined data (BigQuery) → BigQuery/BigQuery ML, Data Studio, Looker, Vertex AI.]
04 Dataproc
Let’s learn a little bit about Dataproc.
● Managed service
Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark
and Apache Hadoop clusters in a simpler way. You only pay for the resources you
use with per-second billing. If you leverage preemptible instances in your cluster, you
can reduce your costs even further.
Without Dataproc, it can take from 5 to 30 minutes to create Spark and
Hadoop clusters on-premises or through other Infrastructure-as-a-Service providers.
Dataproc clusters are quick to start, scale, and shut down, with each of these
operations taking 90 seconds or less, on average. This means you can spend less
time waiting for clusters and more hands-on time working with your data.
Dataproc has built-in integration with other Google Cloud services, such as BigQuery,
Cloud Storage, Cloud Bigtable, Cloud Logging, and Cloud Monitoring. This provides
you with a complete data platform rather than just a Spark or Hadoop cluster.
Because Dataproc is a managed service, you can create clusters quickly, manage them
easily, and save money by turning clusters off when you don't need them. With less
time and money spent on administration, you can focus on your jobs and your data.
If you’re already using Spark, Hadoop, Pig, or Hive, you don’t even need to learn new
tools or APIs to use Dataproc. This makes it easy to move existing projects into
Dataproc without redevelopment.
[Decision tree: Dependencies on Apache Hadoop/Spark tools or packages? Yes → Dataproc. No → Manual or automatic provisioning? Manual → Dataproc. Automatic → Dataflow.]
Now, Dataproc and Dataflow can both be used for data processing, and there’s
overlap in their batch and streaming capabilities. So, how do you decide which
product is a better fit for your environment?
Well, first, ask yourself whether you have dependencies on specific tools or packages
in the Apache Hadoop or Spark ecosystem. If that’s the case, you’ll want to use
Dataproc.
If not, ask yourself whether you prefer manual provisioning of clusters. If you do,
choose Dataproc. If you prefer serverless, automatic provisioning of clusters, then
choose Dataflow.
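The two questions above can be written as a small function. This is only a sketch of the decision logic as described in these notes; the function and parameter names are hypothetical.

```python
def choose_service(needs_hadoop_spark_tools: bool,
                   prefers_manual_provisioning: bool) -> str:
    """Pick Dataproc or Dataflow using the two questions from the notes."""
    # Question 1: dependencies on the Apache Hadoop or Spark ecosystem?
    if needs_hadoop_spark_tools:
        return "Dataproc"
    # Question 2: manual cluster provisioning, or serverless and automatic?
    if prefers_manual_provisioning:
        return "Dataproc"
    return "Dataflow"


print(choose_service(needs_hadoop_spark_tools=False,
                     prefers_manual_provisioning=False))  # Dataflow
```

Note that the second question only matters when the first answer is no: an existing Hadoop or Spark dependency decides the outcome on its own.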
For a quick walkthrough of how to create a Dataproc cluster, modify the number of
workers in the cluster, and submit a simple Apache Spark job, refer to this video.
Quiz
Question #1
Question
C. If you have an existing infrastructure service, Google will manage it for you if you purchase
a Managed Services contract
Question #1
Answer
C. If you have an existing infrastructure service, Google will manage it for you if you purchase
a Managed Services contract
Explanation:
Managed Services in this class are presented as a possible alternative to building
your own infrastructure data processing solution.
Question #2
Question
D. It doesn’t integrate with Cloud Monitoring, but it has its own monitoring system
Question #2
Answer
D. It doesn’t integrate with Cloud Monitoring, but it has its own monitoring system
Explanation:
Fast to start a cluster.
Review:
Managed Services
In this module, we provided you with an overview of managed services for data
processing in Google Cloud, namely BigQuery, Dataflow, Dataprep, and Dataproc.
Managed services allow you to outsource a lot of the administrative and maintenance
overhead to Google, so you can focus on your workloads, instead of the
infrastructure. Speaking of infrastructure, most of the services that we covered are
serverless. Now, this doesn’t mean that there aren’t any actual servers processing
your data. Serverless means that the servers or Compute Engine instances are
abstracted away so that you don’t have to worry about the infrastructure.
Dataproc isn’t a serverless service, because you are able to view and manage the
underlying master and worker instances.
1. Click on My Learning in the left-hand navigation bar.
2. Select the class from the Completed Courses list.
The “Architecting with Google Compute Engine” course is part of the Cloud
Infrastructure learning path. This path is designed for IT professionals who are
responsible for implementing, deploying, migrating, and maintaining applications in
the cloud. Next, we recommend taking the Architecting with Google Cloud: Design
and Process course, which is part of the learning path for the Professional Cloud
Architect Certification.
● Quests:
a. Kubernetes in Google Cloud
b. Google Kubernetes Engine Best Practices
● Self-Paced Labs:
a. Cloud Functions - Qwik Start
b. Deploying the Application into App Engine Flexible Environment - Java
OR Deploying the Application into App Engine Flexible Environment -
Python
To help you structure your preparation for the Associate Cloud Engineer exam, we
recommend the Preparing for the Associate Cloud Engineer Examination course.
You can also prepare using the Official Google Cloud Certified Associate Cloud
Engineer Study Guide, published by Wiley. Visit the Google Cloud Certification
website for more information and to register.
Good luck!
Thank you for taking the “Architecting with Google Compute Engine” course!
Now it’s your turn. Go ahead and apply what you have learned by architecting your
own infrastructure in Google Cloud.