Cloudera Data Platform


CLOUDERA DATA PLATFORM
Introduction

COURSE OBJECTIVE
• The functions of the core technologies of Hadoop
• How Cloudera Manager simplifies Hadoop installation and administration
• How to deploy a Hadoop cluster using Cloudera Manager
• How to plan your Hadoop cluster hardware and software
• How to maintain your cluster
• How to monitor, troubleshoot, and optimize the cluster
Big Picture
[Figure: Data Source, ETL, Data Warehouse, and Big Data, consumed by Data Analytics, Data Science, and BI Presentation]
Big Data 3 V's
• Volume: the amount of data is very large
• Variety: data is stored in many different formats
• Velocity: data is processed at very high speed
Hadoop's Origin
• Increased internet usage, lots of new websites and searches every day
• In 2007 Google collected around 270PB of data every month
• By 2009 that amount had increased by 20,000PB
• Google needed a better platform to process this large volume of data
• Google runs MapReduce operations on its own storage layer, GFS (Google File System)
Typical Hadoop Component
[Figure: sources (flat file, RDBMS, NoSQL, IoT, public cloud, streaming and event) feed a platform made up of Distributed Storage, SQL and Data Processing, Streaming and Event Processing, and Analytic and Machine Learning, handling batch, streaming, and real-time data, with Data Governance, Security, and Automation alongside; outputs are reporting, cloud dashboards, and real-time apps and dashboards]
HADOOP EVOLUTION
EVOLUTION TO THE HYBRID ARCHITECTURE
[Figure: three stages, each on commodity infrastructure.
• On Premise: bare metal, workload clusters
• Public Cloud: virtual machines
• Multi-Public/Private/Hybrid Cloud: containers, disaggregated storage/compute plus SaaS experiences
Each stage runs the same workloads (Streaming & Data Flow, Data Engineering, Data Warehouse, Operational Database, Machine Learning) over security, metadata | catalog | schema, and storage layers; the hybrid stage adds migration and governance.]
ABOUT CLOUDERA
• The leader in Apache Hadoop-based software and services
• Founded by Hadoop experts from Facebook, Yahoo, Google, and Oracle
• Provides support, consulting, training, and certification for Hadoop users
• Staff includes committers to virtually all Hadoop projects
• Many authors of authoritative books on Apache Hadoop projects
CDP Private Cloud Base

The two best open-source data analytics platforms fused together with the
addition of new capabilities

Enterprise Data Hub + HDP Enterprise Plus + New Features = CDP Private Cloud Base
✓ CDP product on-premises
✓ 30+ components
✓ Highly customizable
COMPONENT LIST

CDP Private Cloud Base 7.1

• Cloudera Manager 7.1 • HBase 2.2 • Key HSM 7.1 • Kafka Schema Registry 0.8
• Hadoop 3.1 • Phoenix 5.0 • Knox 1.3 • Streams Messaging Mgr 1.0
• Spark 2.4 • Kudu 1.12 • Livy 0.7 • Streams Replication Mgr 2.1
• Hive 3.1 • Sqoop 1.4.7 • Navigator Encrypt 7.1 • Ozone (Beta) 0.6
• Impala 3.2 • Parquet 1.10 • Ranger KMS 7.1 • Kafka Connect 2.4
• Oozie 5.1 • Avro 1.8 • Zeppelin • Cruise Control 2.0
• Hue 4.5 • ORC 1.5 • Hive Warehouse Connector 1.0 • Tez 0.9
• Ranger 2.0 • Zookeeper 3.5 • Kafka 2.4 • Key Trustee Server 7
• Atlas 2.0 • Solr 8.4

• Operating systems and Java: RHEL/CentOS/OEL 7.7, JDK 8, JDK 11 Runtime
• Databases: PostgreSQL 10, MySQL 5.7, Oracle DB 12 (Fresh Install Only), MariaDB 10.2
• Upgrades from CDP DC 7.0, CDH 5.13-5.16, and HDP 2.6.5
CLOUDERA MANAGER

• In this chapter, you will learn:
─ The rationale for, and benefits of, a cluster management solution
─ Cloudera Manager features, options, and requirements
─ How to install Cloudera Manager
─ CDH cluster installation options
SIMPLIFIED MANAGEMENT
Cloudera Manager
• Management of new services
─ Knox, Ranger, Atlas, Hive-on-Tez, DAS
• CDP Look-and-Feel
• Cluster-level configuration history
• Improved global search
• Resume after errors when enabling Kerberos
• Minor scalability improvements (hosts page)
• Improved alerts configuration
• JQuery 3.4 (improved security)
• Upgrade Support
• Support for Private Cloud
CLOUDERA MANAGER FEATURE
• Automated deployment
─ Automatically install and configure Hadoop services on hosts
─ Cloudera Manager sets recommended default parameters
─ Easy to stop and start services on master and worker nodes
• Manage a wide range of Hadoop and Hadoop "ecosystem" services
• Diagnose and resolve issues more quickly
• Manage user and group access to the cluster(s)
• Monitor cluster health and performance, track events
─ Real-time monitoring, charts, custom reporting, email alerts
TERMINOLOGY
CM Architecture
TOPOLOGY
• Cloudera Manager Server
─ Installed outside the cluster on dedicated hardware
─ Runs service monitor and activity monitor daemons
─ Stores cluster configuration information in a database
─ Sends configuration information and commands to the CM agents over HTTP(S)
• Cloudera Manager Agents
─ Receive updated configurations from server
─ Start and stop Hadoop daemons, collect statistics
─ Heartbeat status to the server
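
The heartbeat relationship between agents and server can be illustrated with a toy model. This is not Cloudera Manager code: the class name `HeartbeatTracker`, the 15-second interval, and the "three missed heartbeats" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    """Toy server-side record of the last heartbeat seen from each agent."""
    interval_s: float = 15.0   # assumed heartbeat interval (hypothetical)
    missed_allowed: int = 3    # assumed tolerance before a host counts as stale
    last_seen: dict = field(default_factory=dict)

    def record(self, host: str, now: float) -> None:
        """Store the time at which `host` last reported in."""
        self.last_seen[host] = now

    def stale_hosts(self, now: float) -> list:
        """Return hosts whose last heartbeat is older than the tolerance."""
        limit = self.interval_s * self.missed_allowed
        return sorted(h for h, t in self.last_seen.items() if now - t > limit)


tracker = HeartbeatTracker()
tracker.record("worker1", now=0.0)
tracker.record("worker2", now=0.0)
tracker.record("worker1", now=40.0)   # worker1 keeps reporting
print(tracker.stale_hosts(now=50.0))  # worker2 has been silent for 50 s
```

Passing the clock value in as `now` keeps the sketch deterministic; a real monitor would read the system clock instead.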
CLOUDERA DEPLOYMENT
• Install and configure the database, install the JDK
─ Database should be external for production deployments
• Ensure access to the Cloudera software repositories
─ For Cloudera Manager
─ For CDH
• Install Cloudera Manager and agents
• Install the CDH parcel services or RPMs for the functionality required on each host in the cluster
CM REQUIREMENT (1)
Supported 64-bit Operating Systems, recent versions of
• Red Hat Enterprise Linux/CentOS
• Oracle Enterprise Linux
• SUSE Linux Enterprise Server
• Debian
• Ubuntu

The latest versions of these browsers are supported


• Google Chrome
• Safari
• Firefox
• Internet Explorer
CM REQUIREMENT (2)
Supported JDKs
• Oracle JDK 1.7.0_55, 1.7.0_67 or higher, 1.8_31 or higher
Supported databases
• MySQL 5.1, 5.5 and 5.6
• Oracle 11g Release 2, 12c
• PostgreSQL 8.4, 9.2, and 9.3
Supported Cloudera Distribution of Hadoop
• CDP 7.0 or later
• CDH 6.0 or later
• CDH 5.0 or later (will be end-of-life, and therefore deprecated)

See http://tiny.cloudera.com/supported-versions for version specific details


INSTALLING CM
• Cloudera Manager installation options
─ Package distribution (recommended)
─ Package repository available at http://archive.cloudera.com/p/cm7/
─ Install using yum, zypper, or apt-get
─ Binary distribution
─ Download binary from http://www.cloudera.com/downloads
─ Run the installer from the command line
• After installation, complete subsequent configuration through the Web UI
─ Access via http://<manager_host>:7180
• Documentation
─ For detailed information about installing Cloudera Manager, refer to http://www.cloudera.com/documentation.html
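
Before opening the Web UI it can help to confirm that the server is listening on port 7180. The sketch below builds the address given above and probes the port with a plain TCP connection. The helper names and the host `cm.example.com` are placeholders, and it assumes the default, non-TLS port from the slide.

```python
import socket


def admin_console_url(manager_host: str, port: int = 7180) -> str:
    """Build the Web UI address for a given Cloudera Manager host."""
    return f"http://{manager_host}:{port}"


def port_is_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


# cm.example.com is a placeholder host name
print(admin_console_url("cm.example.com"))
```

Calling `port_is_open("cm.example.com", 7180)` from a workstation then shows whether the port is reachable through any firewall in between.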
BASIC CLUSTER CONFIGURATION
CLUSTER GROWTH
Basing your cluster growth on storage capacity is often a good method

Example:
─ Data grows by approximately 3TB per week
─ HDFS set up to replicate each block three times
─ Therefore, 9TB of extra storage space required per week
─ Plus some overhead: say, 25% of all disk space
─ Assuming machines with 16 x 3TB hard drives, this equates to a new machine required every four weeks
─ Alternatively: two years of data (1.2 petabytes) will require approximately 26 such machines
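
The sizing arithmetic above can be checked with a few lines of Python. This is a sketch that reads "25% of all disk space" as 25% of raw disk being reserved; the function names are illustrative. Under that reading, 16 x 3TB machines give 26 machines for two years, while 12 x 3TB machines would give about 35.

```python
import math


def raw_tb_per_week(ingest_tb: float, replication: int, overhead: float) -> float:
    """Raw disk needed per week: replicated data plus the reserved overhead share."""
    return ingest_tb * replication / (1.0 - overhead)


def weeks_per_machine(drives: int, drive_tb: float, weekly_raw_tb: float) -> float:
    """How many weeks of growth one new machine absorbs."""
    return drives * drive_tb / weekly_raw_tb


def machines_needed(weeks: int, drives: int, drive_tb: float, weekly_raw_tb: float) -> int:
    """Machines required to hold `weeks` of growth, rounded up."""
    return math.ceil(weeks * weekly_raw_tb / (drives * drive_tb))


weekly = raw_tb_per_week(ingest_tb=3, replication=3, overhead=0.25)
print(weekly)                                 # 12.0 TB of raw disk per week
print(weeks_per_machine(16, 3, weekly))       # 4.0 weeks per 16 x 3TB machine
print(104 * weekly)                           # 1248.0 TB, about 1.2 PB in two years
print(machines_needed(104, 16, 3, weekly))    # 26 machines with 16 x 3TB drives
print(machines_needed(104, 12, 3, weekly))    # 35 machines with 12 x 3TB drives
```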
NODES
Typical Configurations For Worker Nodes

─ Midline: deep storage, 1Gb Ethernet
• 12 x 2TB SATA II hard drives, in a non-RAID, JBOD configuration
• 1 or 2 of the drives for the OS, with RAID-1 mirroring
• 2 x 8-core 2.0GHz CPUs, 15MB cache
• 256GB RAM
• 2x1 Gigabit Ethernet

─ High-end: high memory, spindle dense, 10Gb Ethernet
• 24 x 2TB Nearline/MDL SAS hard drives, in a non-RAID, JBOD configuration
• 2 x 12-core 3.0GHz CPUs, 15MB cache
• 512GB RAM (or more)
• 2x10 Gigabit Ethernet
CDP INSTALLATION OVERVIEW

01 Install Cloudera Manager Server
02 Specify the version of Cloudera Data Platform
03 Create a CDP cluster
CDP SOFTWARE DISTRIBUTION

• Cloudera Manager will install the selected Hadoop services on cluster nodes for you
• Installation is via packages or parcels
• Packages
─ RPM packages (Red Hat, CentOS), Ubuntu packages, etc.
─ Automatically create needed users/groups, init scripts
• Parcels (recommended)
─ Cloudera Manager-specific. All the benefits of packages, plus:
─ Allows easy upgrading with minimal down-time
─ Allows easy rolling upgrades (Cloudera Enterprise)
─ Installation of CDH without sudo – installation handled by the Cloudera Manager Agent
DISTRIBUTE THE DAEMONS

• Cloudera Manager CDP installation wizard will suggest specific Hadoop daemons be installed to specific cluster nodes
─ Based on available resources
─ It is easy to override suggestions
• Not all daemons run on each machine
─ NameNode, ResourceManager, JobHistoryServer ("master" daemons): one per cluster, unless running in an HA configuration
─ Secondary NameNode: one per cluster in a non-HA environment
─ DataNodes, NodeManagers: on each data node in the cluster
─ Exception: for small clusters (fewer than 10 to 20 nodes), it is acceptable for more than one of the master daemons to run on the same physical node
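
The placement rules above can be sketched as a small function. This is not the wizard's actual algorithm, which also weighs available resources. It is a simplified non-HA layout that spreads master daemons round-robin over the master hosts and puts the worker daemons on every worker. The function name and the host names are hypothetical.

```python
from itertools import cycle

MASTER_DAEMONS = ["NameNode", "SecondaryNameNode", "ResourceManager", "JobHistoryServer"]
WORKER_DAEMONS = ["DataNode", "NodeManager"]


def place_daemons(master_hosts, worker_hosts):
    """Return {host: [daemons]} for a non-HA cluster.

    Master daemons are dealt round-robin across master_hosts, so a single
    master host (small cluster) ends up running all of them.
    """
    master_hosts, worker_hosts = list(master_hosts), list(worker_hosts)
    if not master_hosts:
        raise ValueError("at least one master host is required")
    layout = {host: [] for host in master_hosts + worker_hosts}
    for daemon, host in zip(MASTER_DAEMONS, cycle(master_hosts)):
        layout[host].append(daemon)
    for host in worker_hosts:
        layout[host].extend(WORKER_DAEMONS)
    return layout


small = place_daemons(["m1"], ["w1", "w2"])
print(small["m1"])   # all master daemons share one node in a small cluster

larger = place_daemons(["m1", "m2", "m3"], ["w1", "w2", "w3", "w4"])
print(larger["m2"])  # the Secondary NameNode lands on a different host than the NameNode
```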
Managing Cloudera Manager
Accessing the Cloudera Manager Admin Console

To access the Cloudera Manager Admin Console:
• Open the Cloudera Manager Admin Console in a Web browser using the following URL: http://<Cloudera Manager Server URL>:7180
• Enter your user name.
• Enter your password.
• Click the Sign In button.
The Cloudera Manager Admin Console opens.
Automatic Logout

For security purposes, Cloudera Manager automatically logs out a user session after 30 minutes. You can change this session logout period.
• Click Administration > Settings.
• Click Category > Security.
• Edit the Session Timeout property.
• Enter a Reason for change, and then click Save Changes to commit the changes.
Starting, Stopping, and Restarting the Cloudera Manager Server

• To start the Cloudera Manager Server:
sudo service cloudera-scm-server start
• You can stop (for example, to perform maintenance on its host) or
restart the Cloudera Manager Server without affecting the other
services running on your cluster. Statistics data used by activity
monitoring and service monitoring will continue to be collected during
the time the server is down.
• To stop the Cloudera Manager Server:
sudo service cloudera-scm-server stop
• To restart the Cloudera Manager Server:
sudo service cloudera-scm-server restart
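
When these commands are scripted, a thin wrapper can guard against typos in the action name. The sketch below only assembles the same three commands shown above. The helper names are illustrative, and `run_server_command` is meant to be called on the Cloudera Manager host by a user with sudo rights.

```python
import subprocess

SERVICE_NAME = "cloudera-scm-server"
ALLOWED_ACTIONS = ("start", "stop", "restart")


def server_command(action: str) -> list:
    """Return the argv list for one of the three documented actions."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action {action!r}; expected one of {ALLOWED_ACTIONS}")
    return ["sudo", "service", SERVICE_NAME, action]


def run_server_command(action: str) -> int:
    """Run the command and return its exit code (0 means success)."""
    return subprocess.run(server_command(action), check=False).returncode


# Only prints the command; nothing is executed here.
print(" ".join(server_command("restart")))
```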
Managing User Roles
CLOUDERA MANAGER USER ACCOUNTS

Cloudera Manager Authentication:
• Internal Authentication
• External Authentication

User Roles:
• Auditor
• Read-Only
• Operator
• Cluster Administrator
• BDR Administrator
• Navigator Administrator
• User Administrator
• Key Administrator
• Full Administrator
Default User Roles
ADD USER

Adding an Internal User Account
1. Select Administration > Users.
2. Click the Add User button.
3. Enter a username and password.
4. In the Role drop-down menu, select a role for the new user.
5. Click Add.
CONSOLIDATED HEALTH MONITORING
MONITORING CLUSTER
• To display a cluster Status page, click the cluster name on the Home > Status tab.
• The cluster Status page displays a table containing links to the Hosts page and the status pages of the services running in the cluster.
ACTIVITY LAB
Starting, Stopping, and Restarting a Cluster
Starting a Cluster
On the Home > Status tab, click the menu icon to the right of the cluster name and select Start.
Click Start in the next screen to confirm. The Command Details window shows the progress of starting services.

Stopping a Cluster
On the Home > Status tab, click the menu icon to the right of the cluster name and select Stop.
Click Stop in the confirmation screen. The Command Details window shows the progress of stopping services.

Restarting a Cluster
On the Home > Status tab, click the menu icon to the right of the cluster name and select Restart.
Click Restart in the next screen to confirm. If you have enabled high availability for HDFS, you can choose Rolling Restart instead to minimize cluster downtime. The Command Details window shows the progress of restarting services.
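
The same cluster actions can also be scripted. The sketch below assumes the Cloudera Manager REST API's cluster command endpoints (a POST to `.../clusters/<name>/commands/start`, `stop`, or `restart`) and only builds the URL. The host, the API version `v41`, and the cluster name are placeholders that should be checked against the API documentation for your release.

```python
from urllib.parse import quote

CLUSTER_ACTIONS = ("start", "stop", "restart")


def cluster_command_url(base_url: str, api_version: str, cluster_name: str, action: str) -> str:
    """Build the URL that a POST request would target for a cluster command."""
    if action not in CLUSTER_ACTIONS:
        raise ValueError(f"unsupported action {action!r}")
    # Cluster names may contain spaces, so percent-encode the path segment.
    cluster = quote(cluster_name, safe="")
    return f"{base_url.rstrip('/')}/api/{api_version}/clusters/{cluster}/commands/{action}"


# Placeholder host, API version, and cluster name
print(cluster_command_url("http://cm.example.com:7180", "v41", "Cluster 1", "restart"))
```

The request itself would be sent as an authenticated POST by a Cloudera Manager user whose role permits the action.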

Renaming a Cluster
On the Home > Status tab, click the menu icon to the right of the cluster name and select Rename Cluster.
Type the new cluster name and click Rename Cluster.

Viewing Host Status
To display summary information about all the hosts managed by
Cloudera Manager, click Hosts > All Hosts in the left menu.
The All Hosts page displays with a list of all the hosts managed by
Cloudera Manager.

Viewing Host Role Assignments
In the left menu, click Hosts > Roles.
Click a cluster name or All Clusters.

Host Templates
Creating a Host Template
Click Hosts > Host Templates.
From the Host Templates page, click Create.
The Create New Host Template pop-up window appears.
Type a name for the template.
For each role, select the appropriate role group. There may be
multiple role groups for a given role type — you want to select the
one with the configuration that meets your needs.
Click Create to create the host template.

Editing a Host Template
Click Hosts > Host Templates.
Pull down the Actions menu for the template you want to modify,
and click Edit.
The Edit Host Template window appears. This page is identical to
the Create New Host Template page. You can modify the template
name or any of the role group selections.
Click OK when you have finished.

Adding a Host to a Cluster
Click the Hosts tab.
Click the Add Hosts button.
Select Add hosts to cluster.
If the cluster uses Kerberos authentication, ensure that the Kerberos packages are installed on the new
hosts.
If necessary, use the package commands provided on the Add Hosts screen to install these packages.
Select the cluster where you want to add the host from the drop-down list.
Click Continue.

Add a new host:
On the Specify Hosts page, enter a host name or pattern (click "using patterns" for more information) to
search for new hosts to add to the cluster.
A list of matching hosts displays.
Select the hosts that you want to add.
Click Continue.
Select the Repository Location where Cloudera Manager can find the software to install on the new hosts.
Select Public Cloudera Repository or Custom Repository and enter the URL of a custom repository available
on your local network.
Click Continue.
Follow the instructions in the wizard to install the Oracle JDK.
Enter Login Credentials:
Select root for the root account, or select Another user and enter the username for an account that has
password-less sudo privileges.

Select an authentication method:
If you choose password authentication, enter and confirm the password.
If you choose public-key authentication, provide a passphrase and path to the required key files.
You can modify the default SSH port if necessary.
Specify the maximum number of host installations to run at once. The default and recommended
value is 10. You can adjust this based on your network capacity.
Click Continue.
The Install Agents page displays and Cloudera Manager installs the Agent software on the new hosts.
When the agent installation finishes, click Continue.
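
The "maximum number of host installations to run at once" setting is a cap on parallelism. The sketch below shows the idea with a thread pool. `install_on_hosts` and `fake_install` are hypothetical stand-ins, not Cloudera Manager functions, and the default of 10 mirrors the value quoted above.

```python
from concurrent.futures import ThreadPoolExecutor


def install_on_hosts(hosts, install_one, max_parallel=10):
    """Run install_one(host) for every host, at most max_parallel at a time."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(install_one, hosts))  # preserves input order
    return dict(zip(hosts, results))


def fake_install(host):
    """Stand-in for the real agent installation step."""
    return f"agent installed on {host}"


hosts = [f"node{i:02d}.example.com" for i in range(1, 26)]
report = install_on_hosts(hosts, fake_install)
print(len(report), "hosts processed")
```

Lowering `max_parallel` trades total install time for less load on the network and the package repository.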

Deleting a Host from a Cluster
In the Cloudera Manager Admin Console, click the Hosts tab.
Select the hosts to delete.
Select Actions for Selected > Remove From Cluster. The Remove Hosts From Cluster dialog box
displays.
Leave the selections to decommission roles and skip removing the Cloudera Management Service
roles. Click Confirm to proceed with removing the selected hosts.

Stopping All the Roles on a Host
In the left menu, click Clusters > Hosts or Hosts > All Hosts.
Select one or more hosts on which to stop all roles.
Select Actions for Selected > Stop Roles on Hosts.

Starting All the Roles on a Host
Click the Hosts tab.
Select one or more hosts on which to start all roles.
Select Actions for Selected > Start Roles on Hosts.

Starting, Stopping, and Restarting Role Instances
Go to the service that contains the role instances to start, stop, or restart.
Click the Instances tab.
Check the checkboxes next to the role instances to start, stop, or restart (such as a DataNode instance).
Select Actions for Selected > Start, Stop, or Restart, and then click Start, Stop, or Restart again to start the
process. When you see a Finished status, the process has finished.

Adding a Role Instance
Go to the service for which you want to add a role instance. For example, to add an Atlas Server role instance, go to the Atlas service.
Click the Instances tab.
Click the Add Role Instances button.
Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the
hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to
which the Atlas Server role is assigned. You can reassign role instances.

Click a field below a role to display a dialog box containing a list of hosts. If you click a field containing multiple
hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the hosts dialog box.
The following shortcuts for specifying hostname patterns are supported:

• Range of hostnames (without the domain portion)
• IP addresses
• Rack name
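
A hostname range shortcut stands for a list of individual hosts. The sketch below expands one bracketed numeric range. The `[start-end]` bracket syntax, the zero-padding behaviour, and the example host names are assumptions for illustration, so check the "using patterns" help in the wizard for the exact notation.

```python
import re

_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_range(pattern: str) -> list:
    """Expand the first [start-end] range in pattern; keep zero padding."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range present: the pattern is a single host
    start, end = match.group(1), match.group(2)
    if int(start) > int(end):
        raise ValueError(f"range start exceeds end in {pattern!r}")
    # Preserve leading zeros, e.g. [07-10] -> 07, 08, 09, 10
    width = len(start) if start.startswith("0") else 0
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [prefix + str(n).zfill(width) + suffix
            for n in range(int(start), int(end) + 1)]


print(expand_host_range("host[1-3]"))
print(expand_host_range("host[07-10]"))
print(expand_host_range("10.1.1.[1-4]"))
```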
Click the View By Host button for an overview of the role assignment by hostname ranges.
Click Continue.
In the Review Changes page, review the configuration changes to be applied.
Confirm the settings entered for file system paths. The file paths required vary based on the services to be
installed.
Click Continue.

Deleting a Role Instance
Click the service instance that contains the role instance you want to delete. For example, if you want to delete a
DataNode role instance, click an HDFS service instance.
Click the Instances tab.
Check the checkboxes next to the role instances you want to delete.
If the role instance is running, select Actions for Selected > Stop and click Stop to confirm the action.
Select Actions for Selected > Delete. Click Delete to confirm the deletion.

Managing Cloudera Runtime Services

Cloudera Manager's Service Monitoring feature:
- Presents health and performance data in a variety of formats including interactive charts
- Monitors metrics against configurable thresholds
- Generates events related to system and service health and critical log entries and makes them available for searching and alerting
- Maintains a complete record of service-related actions and configuration changes
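
Monitoring a metric against configurable thresholds can be pictured as a two-level comparison. The sketch below is illustrative only: the function name, the example metric (disk usage in percent), and the threshold values are assumptions, and the three state names are simply labels for healthy, warning, and critical.

```python
def health_for(value: float, warning: float, critical: float) -> str:
    """Map a metric value to a health state using two thresholds."""
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if value >= critical:
        return "BAD"
    if value >= warning:
        return "CONCERNING"
    return "GOOD"


# Hypothetical disk-usage thresholds: warn at 80%, critical at 90%
for used_percent in (40, 85, 95):
    print(used_percent, health_for(used_percent, warning=80, critical=90))
```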
Adding a Service

1. On the Home > Status tab, click the menu icon to the right of the cluster name and select Add a Service.
2. Select a service and click Continue.
3. Select the services on which the new service should depend.
4. Customize the assignment of role instances to hosts.
5. Review and modify the configuration settings.
6. Click Continue, then click Finish.
7. Verify that the new service started properly.
Starting and Stopping a Cloudera Runtime Service on All Hosts

Start
• In the left menu, click Clusters and select a service.
• Click the menu icon to the right of the service name and select Start.
• Click Start in the next screen to confirm.
• When you see a Finished status, the service has started.
Stop
• In the left menu, click Clusters and select a service.
• Click the menu icon to the right of the service name and select Stop.
• Click Stop in the next screen to confirm.
• When you see a Finished status, the service has stopped.
Thank You
