Cloudera Data Platform
Introduction
COURSE OBJECTIVE
• The functions of the core technologies of Hadoop
• How Cloudera Manager simplifies Hadoop installation and administration
• How to deploy a Hadoop cluster using Cloudera Manager
• How to plan your Hadoop cluster hardware and software
• How to maintain your cluster
[Figure labels: Data Source, ETL, Data Warehouse, Big Data, Consume, BI Presentation]
Big Data 3 V’s
• Volume: very large amounts of data
• Variety: data is stored in many different formats
• Velocity: data is processed at very high speed
Hadoop’s Origin
• Increased internet usage: lots of new websites and searches every day
• Data volume increased by 20,000 PB in 2009
• Google runs MapReduce operations on its own storage layer, called GFS (Google File System)
Typical Hadoop Component
[Figure labels: Flat file, Streaming & Event Processing, Analytic and Machine Learning, Analytic Dashboard, Cloud, Data Governance, NoSQL, Public Cloud, IoT, Security, Automation, Realtime Apps, Distributed Storage]
HADOOP EVOLUTION
EVOLUTION TO THE HYBRID ARCHITECTURE
[Figure: On Premise, Public Cloud, and Multi-Public/Private/Hybrid Cloud (Disaggregated Storage/Compute + SaaS Experiences). Rows: WORKLOADS, CLUSTERS, EXPERIENCES, METADATA | CATALOG | SCHEMA, SECURITY, STORAGE. Workloads in each row: Streaming & Data Flow, Data Engineering, Data Warehouse, Operational Database, Machine Learning. Deployment: VIRTUAL MACHINE, CONTAINER.]
• The leader in Apache Hadoop-based software and services
• Founded by Hadoop experts from Facebook, Yahoo, Google, and Oracle
• Provides support, consulting, training, and certification for Hadoop users
• Staff includes committers to virtually all Hadoop projects
• Many authors of authoritative books on Apache Hadoop projects
CDP Private Cloud Base
The two best open-source data analytics platforms fused together with the
addition of new capabilities
• Cloudera Manager 7.1 • HBase 2.2 • Key HSM 7.1 • Kafka Schema Registry 0.8
• Hadoop 3.1 • Phoenix 5.0 • Knox 1.3 • Streams Messaging Mgr 1.0
• Spark 2.4 • Kudu 1.12 • Livy 0.7 • Streams Replication Mgr 2.1
• Hive 3.1 • Sqoop 1.4.7 • Navigator Encrypt 7.1 • Ozone (Beta) 0.6
• Impala 3.2 • Parquet 1.10 • Ranger KMS 7.1 • Kafka Connect 2.4
• Oozie 5.1 • Avro 1.8 • Zeppelin • Cruise Control 2.0
• Hue 4.5 • ORC 1.5 • Hive Warehouse Connector 1.0 • Tez 0.9
• Ranger 2.0 • Zookeeper 3.5 • Kafka 2.4 • Key Trustee Server 7
• Atlas 2.0 • Solr 8.4
Install the CDH Parcel services or RPMs for the functionality required on each host in the
cluster
CM REQUIREMENT (1)
Supported 64-bit operating systems (recent versions of):
• Red Hat Enterprise Linux/CentOS
• Oracle Enterprise Linux
• SUSE Linux Enterprise Server
• Debian
• Ubuntu
Example:
─ Assuming machines with 16 x 3 TB hard drives, this equates to a new machine required every four weeks
─ Alternatively: two years of data (1.2 petabytes) will require approximately 35 machines
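The arithmetic behind an estimate like this can be sketched in a few lines. Only the 16 x 3 TB drive configuration and the 1.2 PB total come from the example above; the ingest rate (3 TB per week), the replication factor (3) and the share of disk reserved for non-HDFS use (25%) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def usable_tb_per_machine(disks: int, disk_tb: float, overhead: float) -> float:
    """Raw disk capacity minus the fraction reserved for non-HDFS use."""
    return disks * disk_tb * (1.0 - overhead)

def weeks_per_machine(usable_tb: float, ingest_tb_per_week: float, replication: int) -> float:
    """Weeks until one machine's usable space is consumed by replicated data."""
    return usable_tb / (ingest_tb_per_week * replication)

def machines_for(total_tb: float, usable_tb: float) -> int:
    """Machines needed to hold total_tb of data, rounded up."""
    return math.ceil(total_tb / usable_tb)

# From the example: 16 x 3 TB drives. Assumed: 25% overhead, 3 TB/week, 3x replication.
usable = usable_tb_per_machine(disks=16, disk_tb=3, overhead=0.25)
print("usable TB per machine:", usable)
print("weeks per machine:", weeks_per_machine(usable, ingest_tb_per_week=3, replication=3))
print("machines for 1.2 PB:", machines_for(1200, usable))
```

Under these assumptions the sketch gives 36 TB usable per machine, one new machine every 4 weeks, and 34 machines for 1.2 PB, which is in line with the "approximately 35" quoted above. Different overhead or replication settings will shift the result.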
Typical Configurations For Worker Nodes
01 Install Cloudera Manager Server
02 Specify the version of Cloudera Data Platform
03 Create a CDP cluster
CDP SOFTWARE DISTRIBUTION
• Cloudera Manager will install the selected Hadoop services on cluster nodes for you
• Installation is via packages or parcels
• Packages
─ RPM packages (Red Hat, CentOS), Ubuntu packages, etc.
─ Automatically create needed users/groups, init scripts
• Parcels (recommended)
─ Cloudera Manager-specific. All the benefits of packages, plus:
─ Allows easy upgrading with minimal down-time
─ Allows easy rolling upgrades (Cloudera Enterprise)
─ Installation of CDH without sudo – installation handled by the Cloudera Manager Agent
DISTRIBUTE THE DAEMONS
• The Cloudera Manager CDP installation wizard suggests which Hadoop daemons to install on which cluster nodes
─ Based on available resources
─ It is easy to override suggestions
• Not all daemons run on each machine
─ NameNode, ResourceManager, JobHistoryServer ("master" daemons): one per cluster, unless running in an HA configuration
─ Secondary NameNode: one per cluster in a non-HA environment
─ DataNodes, NodeManagers: on each data node in the cluster
─ Exception: for small clusters (fewer than 10 to 20 nodes), it is acceptable for more than one of the master daemons to run on the same physical node
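The placement rules above can be expressed as a small check over a host-to-roles mapping. This is only an illustrative sketch, not how the Cloudera Manager wizard works internally: the host names, the `check_layout` function and the rule that a worker host should run both DataNode and NodeManager are assumptions made for the example.

```python
from collections import Counter

MASTER_ROLES = {"NameNode", "ResourceManager", "JobHistoryServer"}
WORKER_ROLES = {"DataNode", "NodeManager"}

def check_layout(layout: dict[str, set[str]], ha: bool = False) -> list[str]:
    """Return a list of problems found in a host -> roles mapping."""
    problems = []
    counts = Counter(role for roles in layout.values() for role in roles)
    # Master daemons: exactly one per cluster unless HA is enabled.
    for role in sorted(MASTER_ROLES):
        if counts[role] == 0:
            problems.append(f"missing {role}")
        elif counts[role] > 1 and not ha:
            problems.append(f"{role} runs on {counts[role]} hosts without HA")
    # Secondary NameNode: one per cluster in a non-HA environment.
    if not ha and counts["SecondaryNameNode"] != 1:
        problems.append("non-HA cluster needs exactly one SecondaryNameNode")
    # Assumption: a host running one worker daemon should run both.
    for host, roles in sorted(layout.items()):
        present = roles & WORKER_ROLES
        if present and present != WORKER_ROLES:
            problems.append(f"{host} has only {sorted(present)}")
    return problems

# Hypothetical four-node layout.
layout = {
    "master1": {"NameNode", "ResourceManager"},
    "master2": {"SecondaryNameNode", "JobHistoryServer"},
    "worker1": {"DataNode", "NodeManager"},
    "worker2": {"DataNode"},
}
print(check_layout(layout))
```

For this layout the check reports only `worker2`, which runs a DataNode without a NodeManager.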
Managing Cloudera Manager
Accessing the Cloudera Manager Admin
Console
• Internal Authentication
• External Authentication
MANAGER USER ACCOUNTS
Default User Roles
• Auditor
• Read-Only
• Operator
• Cluster Administrator
• BDR Administrator
• Navigator Administrator
• User Administrator
• Key Administrator
• Full Administrator
ADD USER
Stopping a Cluster
On the Home > Status tab, click the options menu to the right of the cluster name and
select Stop.
Click Stop in the confirmation screen. The Command Details
window shows the progress of stopping services.
Restarting a Cluster
On the Home > Status tab, click the options menu to the right of the cluster name and
select Restart.
Click Restart in the next screen to confirm. If you have
enabled high availability for HDFS, you can choose Rolling Restart
instead to minimize cluster downtime. The Command Details
window shows the progress of stopping and then starting the services.
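The Stop and Restart actions can also be triggered without the Admin Console, through the Cloudera Manager REST API. The sketch below only builds the request and does not send it. The host name, port and the API version string `v41` are assumptions; the version your server supports can be read from its `/api/version` endpoint, and authentication is left out.

```python
from urllib.parse import quote
from urllib.request import Request

CLUSTER_COMMANDS = {"start", "stop", "restart"}

def cluster_command_request(base_url: str, api_version: str,
                            cluster: str, command: str) -> Request:
    """Build (but do not send) a POST request for a cluster-level command."""
    if command not in CLUSTER_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    url = (f"{base_url.rstrip('/')}/api/{api_version}"
           f"/clusters/{quote(cluster, safe='')}/commands/{command}")
    return Request(url, data=b"", method="POST",
                   headers={"Content-Type": "application/json"})

# Hypothetical server and cluster name; "v41" is an assumed API version.
req = cluster_command_request("https://cm.example.com:7183", "v41", "Cluster 1", "stop")
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen` plus credentials) returns a command object whose progress corresponds to what the Command Details window shows.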
Renaming a Cluster
On the Home > Status tab, click the options menu to the right of the cluster name and
select Rename Cluster.
Type the new cluster name and click Rename Cluster.
Viewing Host Status
To display summary information about all the hosts managed by
Cloudera Manager, click Hosts > All Hosts in the left menu.
The All Hosts page displays a list of all the hosts managed by
Cloudera Manager.
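The same host summary is available as JSON from the Cloudera Manager REST API host listing. The sketch below groups hosts by health from such a response. The sample payload is invented for illustration, and the field names `hostname` and `healthSummary` are assumed to match the host objects your API version returns, so missing fields fall back to placeholder values.

```python
import json
from collections import defaultdict

def hosts_by_health(payload: str) -> dict[str, list[str]]:
    """Group hostnames by their reported health summary."""
    groups = defaultdict(list)
    for host in json.loads(payload).get("items", []):
        health = host.get("healthSummary", "UNKNOWN")
        groups[health].append(host.get("hostname", "?"))
    return {health: sorted(names) for health, names in groups.items()}

# Invented sample response, shaped like a list of host objects.
sample = json.dumps({"items": [
    {"hostname": "worker1.example.com", "healthSummary": "GOOD"},
    {"hostname": "worker2.example.com", "healthSummary": "BAD"},
    {"hostname": "master1.example.com", "healthSummary": "GOOD"},
]})
print(hosts_by_health(sample))
```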
Viewing Host Role Assignments
In the left menu, click Hosts > Roles.
Click a cluster name or All Clusters.
Host Templates
Creating a Host Template
Click Hosts > Host Templates.
From the Host Templates page, click Create.
The Create New Host Template pop-up window appears.
Type a name for the template.
For each role, select the appropriate role group. There may be
multiple role groups for a given role type; select the
one with the configuration that meets your needs.
Click Create to create the host template.
Editing a Host Template
Click Hosts > Host Templates.
Pull down the Actions menu for the template you want to modify,
and click Edit.
The Edit Host Template window appears. This page is identical to
the Create New Host Template page. You can modify the template
name or any of the role group selections.
Click OK when you have finished.
Adding a Host to a Cluster
Click the Hosts tab.
Click the Add Hosts button.
Select Add hosts to cluster.
If the cluster uses Kerberos authentication, ensure that the Kerberos packages are installed on the new
hosts.
If necessary, use the package commands provided on the Add Hosts screen to install these packages.
Select the cluster where you want to add the host from the drop-down list.
Click Continue.
Add a new host:
On the Specify Hosts page, enter a host name or pattern (click "using patterns" for more information) to
search for new hosts to add to the cluster.
A list of matching hosts displays.
Select the hosts that you want to add.
Click Continue.
Select the Repository Location where Cloudera Manager can find the software to install on the new hosts.
Select Public Cloudera Repository, or select Custom Repository and enter the URL of a custom repository available
on your local network.
Click Continue.
Follow the instructions in the wizard to install the Oracle JDK.
Enter Login Credentials:
Select root for the root account, or select Another user and enter the username for an account that has
password-less sudo privileges.
Select an authentication method:
If you choose password authentication, enter and confirm the password.
If you choose public-key authentication, provide a passphrase and path to the required key files.
You can modify the default SSH port if necessary.
Specify the maximum number of host installations to run at once. The default and recommended
value is 10. You can adjust this based on your network capacity.
Click Continue.
The Install Agents page displays and Cloudera Manager installs the Agent software on the new hosts.
When the agent installation finishes, click Continue.
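The "maximum number of host installations to run at once" setting is a concurrency cap. The sketch below illustrates that idea only; it is not Cloudera Manager's installer. `install_on_hosts` and `FakeInstaller` are hypothetical names, and the fake installer simply sleeps and records how many calls overlapped.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def install_on_hosts(hosts, install_one, max_parallel=10):
    """Run install_one(host) for every host, at most max_parallel at a time."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(install_one, hosts))
    return dict(zip(hosts, results))

class FakeInstaller:
    """Stand-in for a real installer; records the peak number of concurrent calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, host):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # simulate installation work
        with self._lock:
            self._active -= 1
        return "installed"

installer = FakeInstaller()
hosts = [f"worker{i}.example.com" for i in range(1, 26)]
results = install_on_hosts(hosts, installer)
print(len(results), "hosts done; peak concurrency", installer.peak)
```

With 25 hosts and the default cap of 10, the peak concurrency never exceeds 10. Lowering `max_parallel` trades total installation time for less load on the network and the repository.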
Deleting a Host from a Cluster
In the Cloudera Manager Admin Console, click the Hosts tab.
Select the hosts to delete.
Select Actions for Selected > Remove From Cluster. The Remove Hosts From Cluster dialog box
displays.
Leave the selections to decommission roles and skip removing the Cloudera Management Service
roles. Click Confirm to proceed with removing the selected hosts.
Stopping All the Roles on a Host
In the left menu, click Clusters > Hosts or Hosts > All Hosts.
Select one or more hosts on which to stop all roles.
Select Actions for Selected > Stop Roles on Hosts.
Starting All the Roles on a Host
Click the Hosts tab.
Select one or more hosts on which to start all roles.
Select Actions for Selected > Start Roles on Hosts.
Starting, Stopping, and Restarting Role Instances
Go to the service that contains the role instances to start, stop, or restart.
Click the Instances tab.
Check the checkboxes next to the role instances to start, stop, or restart (such as a DataNode instance).
Select Actions for Selected > Start, Stop, or Restart, and then click Start, Stop, or Restart again to start the
process. When you see a Finished status, the process has finished.
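Starting, stopping or restarting several role instances at once maps to a bulk role command in the Cloudera Manager REST API, where the selected role names are sent as a JSON list. The sketch below builds such a request without sending it. The server address, API version `v41`, cluster name and role names are all hypothetical, and authentication is omitted.

```python
import json
from urllib.parse import quote
from urllib.request import Request

def role_command_request(base_url, api_version, cluster, service, command, role_names):
    """Build (but do not send) a bulk start/stop/restart request for role instances."""
    if command not in {"start", "stop", "restart"}:
        raise ValueError(f"unsupported command: {command}")
    url = (f"{base_url.rstrip('/')}/api/{api_version}"
           f"/clusters/{quote(cluster, safe='')}"
           f"/services/{quote(service, safe='')}/roleCommands/{command}")
    # The request body lists the role instances the command applies to.
    body = json.dumps({"items": list(role_names)}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})

# Hypothetical names throughout; real role names come from the Instances tab or the API.
role_req = role_command_request(
    "https://cm.example.com:7183", "v41", "Cluster 1", "hdfs",
    "restart", ["hdfs-DATANODE-1", "hdfs-DATANODE-2"],
)
print(role_req.full_url)
print(role_req.data.decode("utf-8"))
```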
Adding a Role Instance
Go to the service for which you want to add a role instance. For example, to add an Atlas Server role instance, go
to the Atlas service.
Click the Instances tab.
Click the Add Role Instances button.
Customize the assignment of role instances to hosts. The wizard evaluates the hardware configurations of the
hosts to determine the best hosts for each role. The wizard assigns all worker roles to the same set of hosts to
which the Atlas Server role is assigned. You can reassign role instances.
Click a field below a role to display a dialog box containing a list of hosts. If you click a field containing multiple
hosts, you can also select All Hosts to assign the role to all hosts, or Custom to display the hosts dialog box.
The following shortcuts for specifying hostname patterns are supported:
• Range of hostnames (without the domain portion)
• IP addresses
• Rack name
Click the View By Host button for an overview of the role assignment by hostname ranges.
Click Continue.
In the Review Changes page, review the configuration changes to be applied.
Confirm the settings entered for file system paths. The file paths required vary based on the services to be
installed.
Click Continue.
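The hostname range shortcut mentioned in this procedure can be illustrated with a small expander. The exact pattern grammar Cloudera Manager accepts is not given here, so this sketch assumes a bracketed numeric range such as `worker[1-3]`, with zero padding preserved when the start value has a leading zero. `expand_hosts` is a hypothetical helper.

```python
import re

_RANGE = re.compile(r"\[(\d+)-(\d+)\]")

def expand_hosts(pattern: str) -> list[str]:
    """Expand every [start-end] numeric range in pattern into concrete names."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    # Keep zero padding (e.g. 08, 09, 10) when the start value is padded.
    width = len(start) if start.startswith("0") else 0
    expanded = []
    for n in range(int(start), int(end) + 1):
        candidate = pattern[:match.start()] + str(n).zfill(width) + pattern[match.end():]
        expanded.extend(expand_hosts(candidate))  # handle any further ranges
    return expanded

print(expand_hosts("worker[1-3]"))
print(expand_hosts("worker[08-10]"))
print(expand_hosts("10.1.1.[1-2]"))
```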
Deleting a Role Instance
Click the service instance that contains the role instance you want to delete. For example, if you want to delete a
DataNode role instance, click an HDFS service instance.
Click the Instances tab.
Check the checkboxes next to the role instances you want to delete.
If the role instance is running, select Actions for Selected > Stop and click Stop to confirm the action.
Select Actions for Selected > Delete. Click Delete to confirm the deletion.
Managing Cloudera Runtime Services
1. On the Home > Status tab, click the options menu to the right of the cluster name and select Add a Service
2. Select a service and click Continue
3. Select the services on which the new service should depend
4. Customize the assignment of role instances to hosts
5. Review and modify the configuration settings
6. Click Continue, then click Finish
7. Verify that the new service started properly
Starting and Stopping a Cloudera Runtime Service on All Hosts
Start
• In the left menu, click Clusters and select a service.
• Click the options menu to the right of the service name and select Start.
• Click Start in the next screen to confirm.
• When you see a Finished status, the service has started.
Stop
• In the left menu, click Clusters and select a service.
• Click the options menu to the right of the service name and select Stop.
• Click Stop in the next screen to confirm.
• When you see a Finished status, the service has stopped.
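When a start or stop is scripted instead of clicked, "wait for the Finished status" becomes a polling loop on the command's state. The sketch below assumes a caller-supplied `fetch_command` function (hypothetical) that returns the command as a dictionary with an `active` flag, as a wrapper around the Cloudera Manager REST API command lookup might. The fake states at the end stand in for real responses.

```python
import time

def wait_for_command(fetch_command, command_id, poll_seconds=2.0,
                     timeout_seconds=600.0, sleep=time.sleep):
    """Poll until the command is no longer active, then return its final state."""
    waited = 0.0
    while True:
        state = fetch_command(command_id)
        if not state.get("active", False):
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"command {command_id} still active after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds

# Fake responses: active twice, then finished successfully.
states = iter([
    {"id": 7, "active": True},
    {"id": 7, "active": True},
    {"id": 7, "active": False, "success": True},
])
final = wait_for_command(lambda _id: next(states), 7, sleep=lambda s: None)
print(final)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A finished command should still be checked for success before the script continues.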
Thank You