DataOps Automation With Control-M Python Client

Uploaded by

shaikhilyas.2007

White Paper

How to Orchestrate Data Workflows with the Control-M Python Client

Table of Contents
Executive Summary
The Challenge: Going From Data Wrangling to Insights
Understanding the Python Client
How Does It Work?
Summary

Executive Summary
Data science and engineering teams use many tools today and spend a lot of time wrangling data and managing data pipelines. In fact, data scientists spend more of their time on data preparation (22%) than on any other task, plus another 17% on data cleansing, according to the Anaconda 2021 State of Data Science report.¹ After these front-end activities are completed and data teams finally get to develop, deployment takes up another 11% of their time. That adds up to data scientists and engineers spending half their time on data and development pipeline activities. Imagine what these teams could do if they could focus on delivering data-driven insights one hundred percent of their time.

To make the processes more convenient (and hopefully save time), organizations give these teams wide latitude to choose tools they are familiar with. And because more data sources and related tools are available than ever before, there is more diversity (and more silos) across the enterprise development environment. Unfortunately, that means over-burdened operations staff need to learn each tool, build documentation and standards, and complete other activities for each.

Now enterprises that use Control-M or BMC Helix Control-M for application and data workflow orchestration can use Python as the bridge to faster workflow promotion, smoother execution in production, and scalable data development. BMC’s new Control-M Python client enables data engineers and developers to easily build automated execution into their workflows as they develop them with Python and their other preferred development tools and environments.

Python has clearly emerged as the favorite development language: according to Anaconda’s 2021 State of Data Science report, 34% of respondents said they always used Python in 2021.² Plus, 63% say they always or frequently use Python, compared with 27% for R (which ranked second).

Did You Know?
Managing dependencies and environments is the top challenge for 23 percent of respondents to the Anaconda 2021 State of Data Science Report. Among developer respondents, 30% say it is their top challenge.

¹ Anaconda “2021 State of Data Science” accessible at https://www.anaconda.com/state-of-data-science-2021.
² Anaconda “2021 State of Data Science” accessible at https://www.anaconda.com/state-of-data-science-2021.

The Challenge: Going From Data Wrangling to Insights

While the diversity of development and data pipeline tools may be good for data scientists and developers, it presents integration and orchestration challenges for the operations team. The automation and management features available for some data workflows can’t be applied to all. That hinders orchestration and visibility, and ultimately limits deployment and production scalability. Without automated orchestration, IT Ops needs to find workarounds to view, schedule, and monitor application and data workflows, manage the dependencies, arrange necessary file transfers and ETL operations, issue alerts, and more. That makes it difficult to manage the workflows and their dependencies effectively and to get a complete end-to-end view of workflow status.

The sooner that orchestration is addressed in the development pipeline, the sooner workflows can run in production.

As Gartner® says: “Data orchestration systems allow composing these complex workflows, scheduling, and executing them reliably at scale in production. A large part of a data engineering team’s focus should be on orchestration and automation, a prerequisite to building, running and deploying data pipelines at scale.”³

About Control-M

Control-M is not as well known among data science professionals and software developers as it is in the IT operations world, where it was named by Enterprise Management Associates as the overall leader in workflow automation for the sixth consecutive time in 2021 (see the details here). Control-M simplifies application and data workflow orchestration, making it easy to define, schedule, manage and monitor workflows, ensuring visibility and reliability, and improving SLAs. It can be deployed on premises or as SaaS, through BMC Helix Control-M.

Control-M integrates, automates, and orchestrates workflows on-premises, and in public and private clouds, so jobs and business services are delivered on time, every time. With a single unified view, users can orchestrate all their workflows, including file transfers, applications, data sources and infrastructure, with a rich library of plug-ins. Easily provisioned in any cloud, Control-M leverages the ephemeral capabilities of cloud compute services.

Visit our website to learn more about Control-M and BMC Helix Control-M, including their integrations, and how they support the Jobs-as-Code approach to streamline development and operations.

³ Gartner “Gartner®, Data Engineering Essentials, Patterns and Best Practices”, May 2021. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

But the data engineering team does not want to focus on orchestration and automation; it wants to focus
on development. The operations team typically takes the lead on orchestration and automation, using
a standard enterprise platform. Control-M and Helix Control-M have already been helping organizations
innovate faster by automating and orchestrating enterprise workflows. This functionality extends across the
complete data pipeline, including orchestrating across hybrid and multi-cloud operations. The Python client
makes this and other functionality accessible to data engineers and developers using the language and tools
they are familiar with so they can build automated execution into their workflows as they develop them.

Understanding the Python Client


The new Python client allows data engineers and data scientists to leverage Python programming to seamlessly interact with Control-M. They can use Python to build, test, and promote data workflows through Control-M Automation API, which is a set of programmatic interfaces that let developers and DataOps engineers use Control-M in a self-service manner within the agile application release process. By connecting both the data and operations teams, organizations can ensure visibility, improve service level agreements (SLAs), and deliver data-driven outcomes faster, at scale, across hybrid and multi-cloud environments.

The Python client is open-source and available under a BSD license on GitHub. Anyone can write and test their workflows using the BMC Helix Control-M sandbox or the Control-M Workbench, a self-service, standalone development sandbox that includes an identical, 100% compatible Control-M API that can be installed as a personal copy on premises, in a container, or in a cloud instance.

Control-M provides the ability to embed application and data workflow orchestration (via Python or JSON) in your DevOps tool chain. The Python client provides access to Control-M’s production-grade orchestration, helping to eliminate silos in your operational environment, using a programming language that is familiar and intuitive for developers and engineers. Some of the key Control-M features include:

• End-to-end visibility across the entire technology landscape, including multi-cloud and on-premises environments

• Support for data-focused cloud services and traditional technologies and applications

• Predictive SLA management

• Self-service access to Control-M for engineers, developers, operations, and business users

• Granular security and comprehensive governance

The Python client provides access to key workflow orchestration capabilities in Control-M. Some of these features include:

• Build execution rules into workflows by taking a Jobs-as-Code approach with REST APIs and JSON to accelerate application build, test, and validation times. This cuts costs and improves quality by finding defects and bugs earlier in the software development lifecycle.

• Simplify and scale data pipelines by ingesting and processing data from platforms like Hadoop, Spark, Amazon EMR, Snowflake, and Amazon Redshift, and get a 360-degree view of data pipelines at every stage, from ingestion to processing to analytics. Control-M can automate and orchestrate data flows from all sources, including cloud, on-premises, data lakes, etc.

• Intelligently manage internal and external file transfers from a central interface.

• Visualize and test workflows prior to deployment. The Control-M Workbench or Helix Control-M sandbox instance provides all the functionality needed for experimentation and testing.

• Automate workflow movement across development, test, and production environments.

Control-M has integrations with PaaS services on AWS, Azure, and GCP; SAP (including HANA); Oracle E-Business Suite; Informatica; leading databases; IBM InfoSphere DataStage and Cognos Business Intelligence; and much more. BMC has committed to increasing the release velocity for cloud services and other integrations. The Python client adds support for all job types, including:

• AWS Glue

• Azure Data Factory

• Databricks for AWS and Azure

• Google Cloud Dataflow

• Google Cloud Functions

The Python client also bridges developers’ favored environments and tools to the enterprise application and data workflow orchestration platform.
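
To make the Jobs-as-Code idea and the movement of workflows across development, test, and production more concrete, here is a minimal sketch that uses only the Python standard library. It does not call the Control-M Python client or the Automation API. The key names (Type, Job:Command, Flow, Sequence, Host, RunAs) are modeled on the JSON style of Jobs-as-Code definitions and should be treated as assumptions to check against the Automation API documentation. The environment names, hosts, users, and commands are all hypothetical.

```python
import json

# Hypothetical per-environment settings; real values depend on your own setup.
ENVIRONMENTS = {
    "dev": {"host": "dev-agent", "run_as": "dev_user"},
    "test": {"host": "test-agent", "run_as": "qa_user"},
    "prod": {"host": "prod-agent", "run_as": "svc_data"},
}


def folder_definition(env_name):
    """Build one folder definition, varying only the environment settings."""
    env = ENVIRONMENTS[env_name]
    common = {"Host": env["host"], "RunAs": env["run_as"]}
    return {
        f"SalesPipeline_{env_name}": {
            "Type": "Folder",
            "Ingest": {
                "Type": "Job:Command",
                "Command": "python ingest.py",
                **common,
            },
            "Transform": {
                "Type": "Job:Command",
                "Command": "python transform.py",
                **common,
            },
            # Assumed flow entry: Transform runs after Ingest.
            "Flow": {"Type": "Flow", "Sequence": ["Ingest", "Transform"]},
        }
    }


def job_names(definition):
    """List the jobs in a definition, skipping the folder type and flow entry."""
    (folder,) = definition.values()
    return [
        key
        for key, value in folder.items()
        if isinstance(value, dict) and value.get("Type", "").startswith("Job:")
    ]


if __name__ == "__main__":
    # The same code produces the definition for any environment.
    print(json.dumps(folder_definition("dev"), indent=2))
```

Because the workflow is ordinary code, the same definition can be kept in version control and rendered for each environment, which is the property the Jobs-as-Code approach relies on.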

“Control-M and Helix Control-M automate every aspect of the big data
pipeline from a single point of control, from ingestion and data processing
to presenting it to an analytics layer, removing reliance on multiple point
solutions in various stages. They offer deep integration with the Hadoop
ecosystem, including support for HDFS, Spark, MapReduce, DistCp, Pig,
Hive, Sqoop, Tajo, Oozie, etc.”

Enterprise Management Associates | 2021 Workload Automation EMA Radar Report

How Does It Work?
Here is an overview of how to set up and use the Python client in Control-M and Helix Control-M.

1. Install the library one time, using pip

2. Import the functions you plan to use

3. Specify folder attributes

4. Log in to Control-M (or Helix Control-M)

5. Create folder and jobs

6. Define dependencies

7. Display dependencies in tabular or graphical form to validate the flow

8. Submit to Control-M (or Helix Control-M)

You can see a quick demo here (free registration required).
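
The modeling steps in this overview (folder attributes, jobs, dependencies, and a tabular view to validate the flow) can be sketched without any Control-M installation. The example below is a library-free illustration and does not use the Python client's own classes or functions. Installation, login, and submission are left out because they need a real environment. Every folder, job, and command name is hypothetical, and the standard-library graphlib module requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

# Step 3: folder attributes. All names and values here are hypothetical.
folder = {"name": "DemoFolder", "run_as": "demo_user", "host": "demo_host"}

# Step 5: jobs in the folder, as job name -> command.
jobs = {
    "ingest": "python ingest.py",
    "clean": "python clean.py",
    "train": "python train.py",
    "report": "python report.py",
}

# Step 6: dependencies, as job name -> jobs that must finish first.
depends_on = {
    "clean": {"ingest"},
    "train": {"clean"},
    "report": {"clean", "train"},
}


def execution_order(jobs, depends_on):
    """Return a valid run order, or raise ValueError for a broken flow."""
    referenced = set(depends_on)
    for predecessors in depends_on.values():
        referenced |= predecessors
    unknown = referenced - set(jobs)
    if unknown:
        raise ValueError(f"undefined jobs: {sorted(unknown)}")
    graph = {name: depends_on.get(name, set()) for name in jobs}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


def dependency_table(folder, jobs, depends_on):
    """Step 7: render the flow as a plain-text table for review."""
    rows = [("JOB", "WAITS FOR")]
    for name in execution_order(jobs, depends_on):
        waits_for = ", ".join(sorted(depends_on.get(name, ()))) or "-"
        rows.append((name, waits_for))
    width = max(len(job) for job, _ in rows)
    lines = [f"Folder: {folder['name']}"]
    lines += [f"{job:<{width}}  {waits_for}" for job, waits_for in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(dependency_table(folder, jobs, depends_on))
```

Validating the flow before submission catches undefined jobs and circular dependencies while the workflow is still in development, which is the purpose of the display step.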

Summary
The Python client for Control-M and BMC Helix Control-M was created to make life easier for data science, software development, and IT operations staff. It brings automation to an area that needs it. As the Anaconda 2021 State of Data Science Report notes:

“Despite themes in the media suggesting automation will ‘take over,’ it’s actually welcomed by data practitioners and isn’t seen as a competitor…. Automation is seen as a complement to work. Data scientists aren’t worried because they recognize how many aspects of their job still require expert human judgment that technology can’t replicate.”

Control-M provides automation and orchestration so application and data workflows can be developed and deployed at scale. The Python client removes a key source of friction between the development and operations environments at organizations that are already using Control-M or Helix Control-M. The client helps organizations get more out of those platforms, their data science staff and investments, and their IT operations professionals. The Python client helps extend DataOps principles and Control-M functionality to data operations, giving data science specialists more time to spend on development, and helping operations teams deliver those innovations to the enterprise faster.

You can freely access Control-M Python client documentation here, see our blog for more
perspective, and visit the Control-M home page for lots of additional resources.

About BMC
BMC works with 86% of the Forbes Global 50 and customers and partners around the world to create their future. With our history of innovation,
industry-leading automation, operations, and service management solutions, combined with unmatched flexibility, we help organizations free up time
and space to become an Autonomous Digital Enterprise that conquers the opportunities ahead.

BMC—Run and Reinvent www.bmc.com

BMC, the BMC logo, and BMC’s other product names are the exclusive properties of BMC Software, Inc. or its affiliates, are registered or pending
registration with the U.S. Patent and Trademark Office, and may be registered or pending registration in other countries. All other trademarks
or registered trademarks are the property of their respective owners. ©Copyright 2022 BMC Software, Inc.
