IBM Cloud Pak For Data Express Parts On SNO

Deploying IBM Cloud Pak for Data Express on Single Node OpenShift Cluster

© Copyright International Business Machines Corporation 2023


Overview
Red Hat OpenShift Container Platform offers a single platform to build, deploy, and manage
applications consistently across hybrid cloud and multi-cloud infrastructure. Previously, the smallest
supported cluster size (including control plane and worker nodes) was a three-node cluster. As of
OpenShift 4.9, a full OpenShift deployment can run on a single node, which offers both control plane and
worker node capabilities in a single server. This smaller OpenShift footprint benefits users who have
adopted Kubernetes at their central management sites and want independent Kubernetes clusters at
edge sites.

IBM Cloud Pak for Data (CPD) is a fully integrated data and AI platform that is built on Red Hat
OpenShift Container Platform and can run on your private, on-premises cluster or on an OpenShift
deployment in the cloud. CPD simplifies data access, automates data discovery and curation, and
safeguards sensitive information by automating policy enforcement for all users in your organization.
It helps you connect to your data, govern it, find it, and use it for analysis.

Value Proposition & Benefits


• Ideal solution for developers who want to test Cloud Pak for Data Express in a small-scale
environment before deploying it on a larger cluster.
• Provides the smallest overhead for installing CPD on OpenShift in terms of physical hardware
(control plane and worker node capabilities in a single server).
• Suitable for PoC and demo setups of CPD with a smaller set of deployed assemblies that are
expected to be short-lived, so resilience is not a major concern.
• CPD services that do not depend on other prerequisites may benefit from a stand-alone
installation on a resource-constrained SNO cluster.
• SNO provides many of the same features as a full-fledged OpenShift cluster, such as the web-based
console that lets users deploy, monitor, and otherwise manage their applications.

Note: The major tradeoff of an installation on a single node is the lack of high availability. In
environments that require high availability, configure the architecture so that, if the hardware
fails, the workloads move to other sites or nodes while the impacted node is recovered.

Minimum Resource Requirements


The minimum resource requirements for running SNO on a production-grade server, sufficient to run
OpenShift Container Platform services and a production workload, are those published on the official
Red Hat site.

Prerequisites for installation
This edition has been verified with the following software versions:
• Red Hat OpenShift: 4.10
• Cloud Pak for Data Express and add-ons: 4.6.4
• Platform: AWS
Other prerequisites:
• Temporary AWS IAM administrator credentials (access key and secret access key)
• A public domain configured in Route 53
• Available service quota in the target region
• The OpenShift installer file for the specific OCP version
• A Red Hat account pull secret
Create a Public Hosted Zone
To install OpenShift Container Platform, you must register a domain name. You can use the Route 53
service in AWS or any other domain registrar. When you register a domain with Route 53, AWS
automatically creates a hosted zone for you.
Download OpenShift Installer

Create a Red Hat account. Open the hybrid console (https://console.redhat.com/openshift/), select
the Cloud option on the cluster tab, and create a cluster by selecting the AWS option. Select Linux
in the dropdown, then right-click the "Download installer" button and copy the link. Download the
installer with the wget command and extract the downloaded archive.

Installation
There are several ways to install a single-node OpenShift cluster. Here we describe one simple
approach that we have been using to deploy an SNO cluster from scratch on AWS. The OpenShift
installer provisions the following resources:

• Virtual private cloud (VPC) that spans three Availability Zones (one private and one public subnet
in each Availability Zone).
• Single EC2 compute/worker node instance.
• Internet gateway to provide internet access to each subnet.
• Load balancer for access to the OpenShift API.

Again, it is important to note that running OpenShift with only a single node is not recommended for
production environments due to the lack of high availability and scalability.
Note: Red Hat does not claim support for single-node OpenShift clusters on cloud providers.
Deploy The OpenShift Cluster
Our environment is now ready for OpenShift cluster installation on AWS Cloud Infrastructure.

• Generate the install-config file by running the installation program's create install-config
command.

1. For <installation_directory>, specify the name of the directory that stores the files that the
installation program creates. After you run the command, it prompts you for the following inputs:
• SSH public key
• Cloud provider: select the cloud provider from the listed items.
• Region: select the region where you want to deploy the cluster.
• Base domain: select from the dropdown.
• Cluster name: provide a unique name.
• Red Hat pull secret

• Edit the install-config.yaml file. Review the following settings:

1. The cluster domain name.
2. Set the compute replicas to 0. This makes the control plane node schedulable. Set the
controlPlane replicas to 1. In conjunction with the previous compute setting, this setting ensures
that the cluster runs on a single node.
3. Specify the instance type for the EC2 instance. This parameter sets the vCPU and memory of the
instance.
4. The AWS region where the cluster will be deployed.
5. The cluster network plugin to install.
6. Copy the pull secret from the Red Hat OpenShift Cluster Manager and add the contents to this
configuration setting.
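
The six settings above can be sketched as a small Python structure. This is only an illustration, not the authors' actual file: the field names follow the install-config.yaml layout used by the OpenShift installer, while every value (domain, cluster name, region, instance type, network plugin, and the secret placeholders) is a hypothetical stand-in. The result is printed as JSON for inspection only.

```python
import json


def sno_install_config(base_domain, cluster_name, region, instance_type,
                       network_type, pull_secret, ssh_key):
    """Build an install-config structure with the single-node settings."""
    return {
        "apiVersion": "v1",
        "baseDomain": base_domain,                          # 1. cluster domain name
        "metadata": {"name": cluster_name},
        "compute": [{"name": "worker", "replicas": 0}],     # 2. no separate workers
        "controlPlane": {
            "name": "master",
            "replicas": 1,                                  # 2. one control plane node
            "platform": {"aws": {"type": instance_type}},   # 3. EC2 instance type
        },
        "platform": {"aws": {"region": region}},            # 4. AWS region
        "networking": {"networkType": network_type},        # 5. network plugin
        "pullSecret": pull_secret,                          # 6. Red Hat pull secret
        "sshKey": ssh_key,
    }


def is_single_node(config):
    """True when there are no compute replicas and exactly one control plane node."""
    workers = sum(pool["replicas"] for pool in config["compute"])
    return workers == 0 and config["controlPlane"]["replicas"] == 1


# All values below are placeholders chosen for illustration.
config = sno_install_config(
    base_domain="example.com",
    cluster_name="sno-demo",
    region="us-east-1",
    instance_type="m5.8xlarge",
    network_type="OVNKubernetes",
    pull_secret="<pull-secret-json>",
    ssh_key="<ssh-public-key>",
)
print(json.dumps(config, indent=2))
print("single node:", is_single_node(config))
```

A check such as is_single_node is a cheap guard to run before starting the installer, because a wrong replica count only becomes visible after resources have been provisioned.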

• Run the installation program. Execute its create cluster command to create the cluster.

1. For <installation_directory>, specify the location of your customized ./install-config.yaml file.
2. To view different installation details, specify warn, debug, or error instead of info as the log
level.
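
As a rough sketch, the two installer invocations used in this section (generating the install-config file and creating the cluster) can be assembled as follows. The openshift-install subcommands and the --dir and --log-level options are the installer's own; the helper function, the installer path, and the directory name sno-install are assumptions made for this example.

```python
import shlex

# Log levels named in the text: info is the default, the others are alternatives.
LOG_LEVELS = ("info", "warn", "debug", "error")


def installer_command(action, target, installation_directory,
                      log_level="info", installer="./openshift-install"):
    """Return the argument list for one openshift-install invocation."""
    if log_level not in LOG_LEVELS:
        raise ValueError(f"log level must be one of {LOG_LEVELS}")
    return [installer, action, target,
            "--dir", installation_directory,
            f"--log-level={log_level}"]


# "sno-install" is a hypothetical installation directory.
generate = installer_command("create", "install-config", "sno-install")
deploy = installer_command("create", "cluster", "sno-install", log_level="debug")

print(shlex.join(generate))
print(shlex.join(deploy))
```

Each list can be passed to subprocess.run to execute it. Building the argument list first makes it easy to review the exact command before the installer starts provisioning AWS resources.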

During installation, the installation program creates a temporary bootstrap node, which the installer
automatically tears down when installation is done, leaving you with a single-node OpenShift
installation. The process might take around 20 minutes to get the cluster up and running.
Do not delete the installation program or the files that the installation program creates. Both are
required to delete the cluster.
When the cluster deployment completes, directions for accessing your cluster, including a link to its
web console and credentials for the kubeadmin user, are displayed in your terminal.

Configuring Persistent Storage


To configure persistent storage for the cluster, we use Amazon Elastic File System (EFS). The steps
to set up EFS are:
• Create an EFS file system in the same VPC as the EC2 instance and note the file system DNS name.
• Create the storage class by applying the YAML files that:
  • Configure authorization for EFS volumes
  • Create the EFS provisioner
  • Create the EFS StorageClass
To verify that the storage has been configured properly, run the oc get sc command. In the output,
look for the following storage classes:
File storage: efs-nfs-client
Block storage: gp2-csi or gp3-csi
Run oc get pods -n default to check for the nfs-client provisioner pod.
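
The storage class check can also be scripted. The sketch below assumes the names are taken from oc get sc -o name, which prints one resource per line in the form storageclass.storage.k8s.io/<name>. The sample text is hand-written for illustration and is not output captured from a real cluster.

```python
FILE_STORAGE_CLASS = "efs-nfs-client"
BLOCK_STORAGE_CLASSES = {"gp2-csi", "gp3-csi"}


def storage_class_names(oc_output):
    """Extract bare storage class names from `oc get sc -o name` output."""
    names = set()
    for line in oc_output.splitlines():
        line = line.strip()
        if line:
            # Keep only the part after the resource-type prefix.
            names.add(line.rsplit("/", 1)[-1])
    return names


def storage_ready(names):
    """True when the file class and at least one block class are present."""
    return FILE_STORAGE_CLASS in names and bool(BLOCK_STORAGE_CLASSES & names)


# Hand-written sample, not real cluster output.
sample = """\
storageclass.storage.k8s.io/efs-nfs-client
storageclass.storage.k8s.io/gp2-csi
"""

names = storage_class_names(sample)
print(sorted(names), "ready:", storage_ready(names))
```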

IBM Cloud Pak for Data Express Parts


IBM Cloud Pak for Data Express is a set of three pre-built, pre-sized offerings designed to address
problems in cataloging, analyzing and integrating data. Express offerings give you a choice of three
popular data fabric starting points: IBM Data Governance Express for a data catalog, IBM ELT Pushdown
Express for data pipelines or IBM Data Science and MLOps Express for analytics and modeling. Each
provides pre-sized, pre-selected services designed to address a current data fabric need. All three
solutions are built on the IBM Cloud Pak for Data framework and the Red Hat OpenShift Container
Platform.
1. Data Governance Express
Data Governance Express augments its catalog by discovering and classifying data assets with
automation and enforcing your organization’s security controls to help protect sensitive data from
prying eyes.
Refer to the documentation for each service for installation instructions:
• Analytics Engine: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=spark-installing
• WKC core: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=catalog-installing

2. ELT Pushdown Express


ELT (Extract, Load, Transform) first loads raw data into a data warehouse and then transforms it
into a finished state that is ready for reporting and analytics.
Refer to the documentation for each service for installation instructions:
• DataStage: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=datastage-installing
• Watson Pipelines: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=pipelines-installing

3. Data Science and MLOps Express

Data science covers simple exploration and visualization through to modeling and AI training. MLOps
(Machine Learning Ops) covers capabilities to help an organization use models in production.
Mandatory services: Watson Studio (WSL), Watson Machine Learning (WML), Watson OpenScale (WOS)

• Watson Studio: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=ws-installing
• Watson Machine Learning: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=learning-installing
• Watson OpenScale: https://www.ibm.com/docs/en/SSQNUZ_4.6.x/svc-openscale/openscale-svc-install.html

Optional services: Analytics Engine, SPSS Modeler, Decision Optimization, Watson Pipelines, RStudio

• Analytics Engine: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=spark-installing
• SPSS Modeler: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=modeler-installing
• Decision Optimization: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=optimization-installing
• Watson Pipelines: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=pipelines-installing
• RStudio: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=runtimes-installing

Note: Depending on the number of services to be installed, you may need to increase the pod limit
(the default maximum is 250 pods per node). Refer to:
https://docs.openshift.com/container-platform/4.12/nodes/nodes/nodes-nodes-managing-max-pods.html
Changing this setting can cause the node to reboot, so it is better to do it before starting the CPD
installation.
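
One way to raise the limit is a KubeletConfig custom resource, as described in the OpenShift documentation linked above. The sketch below only builds such a manifest and prints it as JSON. It assumes that the single node belongs to the master machine config pool and that the pool carries the label shown in pool_label. The resource name and the value 500 are hypothetical, so verify all of them against your cluster before applying anything.

```python
import json

DEFAULT_MAX_PODS = 250  # default per-node limit stated in the note above


def max_pods_kubelet_config(
        max_pods,
        name="set-max-pods",
        pool_label="pools.operator.machineconfiguration.openshift.io/master"):
    """Build a KubeletConfig manifest that raises the per-node pod limit."""
    if max_pods <= DEFAULT_MAX_PODS:
        raise ValueError("choose a value above the default of 250")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "KubeletConfig",
        "metadata": {"name": name},
        "spec": {
            # Assumption: on SNO the node is in the master pool, selected by this label.
            "machineConfigPoolSelector": {"matchLabels": {pool_label: ""}},
            "kubeletConfig": {"maxPods": max_pods},
        },
    }


manifest = max_pods_kubelet_config(500)  # 500 is an illustrative value
print(json.dumps(manifest, indent=2))
```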

vCPU and Memory requests on installation
CPD version: 4.6.4
OCP version: 4.10
Offering                    Services                                                          CPU requests (vCPU)   Memory requests (GB)
IBM Data Governance         WKC core, Analytics Engine                                        24                    69
ELT Pushdown Express        DataStage, Watson Pipelines                                       27                    75
Data science and MLOps #1   Watson Studio, WML, Watson OpenScale, Auto AI, Jupyter notebook   31                    90
Data science and MLOps #2   Watson Studio, WML, SPSS Modeler, Auto AI, Jupyter notebook       23                    82
Data science and MLOps #3   Watson Studio, WML, Watson Pipelines, Auto AI, Jupyter notebook   21                    65
Data science and MLOps #4   Watson Studio, WML, Decision Optimization                         20                    64
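
The request figures above can be used for a quick sizing check before choosing an instance type. The sketch below is a rough aid only. The node capacity (32 vCPU, 128 GB) and the headroom reserved for OpenShift itself (8 vCPU, 16 GB) are hypothetical inputs, and it assumes that the table covers only the CPD services and not the platform.

```python
# (vCPU requests, memory requests in GB) per offering, taken from the table above.
REQUESTS = {
    "IBM Data Governance": (24, 69),
    "ELT Pushdown Express": (27, 75),
    "Data science and MLOps #1": (31, 90),
    "Data science and MLOps #2": (23, 82),
    "Data science and MLOps #3": (21, 65),
    "Data science and MLOps #4": (20, 64),
}


def fits(offering, node_vcpu, node_memory_gb,
         reserved_vcpu=0, reserved_memory_gb=0):
    """True when the offering's requests plus the reserved headroom fit on the node."""
    cpu, memory = REQUESTS[offering]
    return (cpu + reserved_vcpu <= node_vcpu
            and memory + reserved_memory_gb <= node_memory_gb)


# Hypothetical node size and platform headroom.
candidates = [name for name in REQUESTS
              if fits(name, 32, 128, reserved_vcpu=8, reserved_memory_gb=16)]
print(candidates)
```

Requests are a lower bound for scheduling, not a measure of peak usage, so a result of True only means that the pods can be placed on the node.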

Uninstalling The Cluster
Prerequisites:
• You have a copy of the installation program that you used to deploy the cluster.
• You have the files that the installation program generated when you created your cluster.

Run the installation program's destroy cluster command to remove the cluster.

1. For <installation_directory>, specify the path to the directory where you stored the installation
files.
2. To view different details, specify warn, debug, or error instead of info as the log level.
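
The second prerequisite can be checked before the cluster is removed. The sketch below assumes that the generated files include metadata.json, which the installer uses to identify the cluster, and builds the destroy cluster invocation only when that file is present. The helper name and the directory sno-install are hypothetical.

```python
from pathlib import Path


def destroy_command(installation_directory, log_level="info",
                    installer="./openshift-install"):
    """Return the destroy invocation after checking for the generated files."""
    directory = Path(installation_directory)
    if not (directory / "metadata.json").is_file():
        raise FileNotFoundError(
            f"{directory} has no metadata.json; the installer cannot "
            "identify the cluster to delete")
    return [installer, "destroy", "cluster",
            "--dir", str(directory),
            f"--log-level={log_level}"]


try:
    # "sno-install" is a hypothetical installation directory.
    print(" ".join(destroy_command("sno-install")))
except FileNotFoundError as error:
    print("cannot uninstall:", error)
```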

Conclusion
In conclusion, SNO is a cost-effective option for developers and system administrators who want to
deploy Cloud Pak for Data Express in a small-scale environment. It provides a way to use various CPD
services with minimum resource overhead for smaller, short-term projects that can tolerate the lack
of high availability. Overall, SNO is a viable option for anyone who wants to test applications in a
smaller environment or manage a small number of applications without the need for a full-fledged
cluster.

References:
• Single Node OpenShift Red Hat documentation:
https://docs.openshift.com/container-platform/4.12/installing/installing_sno/install-sno-installing-sno.html
• Installing a cluster on AWS with customizations:
https://docs.openshift.com/container-platform/4.10/installing/installing_aws/installing-aws-customizations.html
• Configuring persistent storage:
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.3/html/storage/configuring-persistent-storage

Thanks to our colleagues on the Cloud Pak for Data Platform Engineering team for their support.

Authors
This paper was produced by the IBM Cloud Pak for Data Multi-Cloud team, led by Anuj Sharma, with
contributions from the following people.

Vamshidhar Cholleti is a Senior Software Developer at IBM-ISL (India Software Labs). He has been
working with the Cloud Pak for Data team since December 2020. He has worked on various CPD
deployments on cloud and on PoCs, and has experience in multiple cloud environments, including AWS,
Azure, and IBM Cloud.

Aparna Sreekumar is a Software Developer on the Cloud Pak for Data Multi-Cloud team, India Software
Labs (ISL), and has been working with IBM since August 2022.

Gayathri A D is a Software Developer on the Cloud Pak for Data Multi-Cloud team, India Software
Labs (ISL), and has been working with IBM since August 2022.

Notices

This information was developed for products and services offered in the U.S.A. IBM may not offer the
products, services, or features discussed in this document in other countries. Consult your local IBM
representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product, program, or service
that does not infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not give you any license to these patents. You can
send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM
Intellectual Property Department in your country or send inquiries, in writing, to:
Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan, Ltd.
19-21, Nihonbashi-Hakozakicho, Chuo-ku
Tokyo 103-8510, Japan

The following paragraph does not apply to the United Kingdom or any other country where such
provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS"
WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.
Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore
this statement might not apply to you. This information could include technical inaccuracies or
typographical errors. Changes are periodically made to the information herein; these changes will be
incorporated in new editions of the publication. IBM may make improvements and/or changes in the
product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not
in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not
part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without
incurring any obligation to you.
Licensees of this program who wish to have information about it for the purpose of enabling: (i) the
exchange of information between independently created programs and other programs (including this
one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
2Z4A/101
11400 Burnet Road
Austin, TX 78758 U.S.A.

Such information may be available, subject to appropriate terms and conditions, including in some
cases payment of a fee.
The licensed program described in this document and all licensed material available for it are provided
by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement
or any equivalent agreement between us.
Any performance data contained herein was determined in a controlled environment. Therefore, the
results obtained in other operating environments may vary significantly. Some measurements may
have been made on development-level systems and there is no guarantee that these measurements
will be the same on generally available systems. Furthermore, some measurements may have been
estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable
data for their specific environment.
Information concerning non-IBM products was obtained from the suppliers of those products, their
published announcements or other publicly available sources. IBM has not tested those products and
cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM
products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of
those products. All statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice and represent goals and objectives only. All IBM prices shown are IBM's
suggested retail prices, are current and are subject to change without notice. Dealer prices may vary.
This information is for planning purposes only. The information herein is subject to change before the
products described become available. This information contains examples of data and reports used in
daily business operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are fictitious and any
similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and distribute these
sample programs in any form without payment to IBM, for the purposes of developing, using,
marketing or distributing application programs conforming to the application programming interface
for the operating platform for which the sample programs are written. These examples have not been
thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability,
serviceability, or function of these programs. You may copy, modify, and distribute these sample
programs in any form without payment to IBM for the purposes of developing, using, marketing, or
distributing application programs conforming to IBM's application programming interfaces. Each copy
or any portion of these sample programs or any derivative work, must include a copyright notice as
follows:
© (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. ©
Copyright IBM Corp. _enter the year or years_. All rights reserved.
If you are viewing this information in softcopy form, the photographs and color illustrations might not
be displayed.

Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at
"Copyright and trademark information" at ibm.com/legal/copytrade.shtml.
Adobe, Acrobat, PostScript and all Adobe-based trademarks are either registered trademarks or
trademarks of Adobe Systems Incorporated in the United States, other countries, or both. IT
Infrastructure Library is a registered trademark of the Central Computer and Telecommunications
Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel
Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and
Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United
States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries,
or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft
Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a
registered community trademark of the Office of Government Commerce, and is registered in the U.S.
Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States
and other countries. Java and all Java-based trademarks and logos are trademarks or registered
trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer
Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp.
and Quantum in the U.S. and other countries.

Statement of Good Security Practices


IT system security involves protecting systems and information through prevention, detection and
response to improper access from within and outside your enterprise. Improper access can result in
information being altered, destroyed, misappropriated or misused or can result in damage to or misuse
of your systems, including for use in attacks on others. No IT system or product should be considered
completely secure and no single product, service or security measure can be completely effective in
preventing improper use or access. IBM systems, products and services are designed to be part of a
comprehensive security approach, which will necessarily involve additional operational procedures,
and may require other systems, products or services to be most effective. IBM DOES NOT WARRANT
THAT ANY SYSTEMS, PRODUCTS OR SERVICES ARE IMMUNE FROM, OR WILL MAKE YOUR
ENTERPRISE IMMUNE FROM, THE MALICIOUS OR ILLEGAL CONDUCT OF ANY PARTY.

© International Business Machines Corporation 2020


International Business Machines Corporation
New Orchard Road Armonk, NY 10504
Produced in the United States 06-20
All Rights Reserved
References in this publication to IBM products and services do not imply that IBM intends to make
them available in all countries in which IBM operates.
