DataOps Automation With Control-M Python Client
How to Orchestrate Data Workflows with the Control-M Python Client
Table of Contents
Executive Summary
Summary
Executive Summary
Data science and engineering teams use many tools today and spend a lot of time wrangling data and managing data pipelines. In fact, data scientists spend more of their time on data preparation (22%) than any other task, plus another 17% on data cleansing, according to the Anaconda 2021 State of Data Science report.¹ After these front-end activities are completed and data teams finally get to develop, deployment takes up another 11% of their time. That adds up to data scientists and engineers spending half their time on data and development pipeline activities. Imagine what these teams could do if they could focus on delivering data-driven insights one hundred percent of their time.

To make the processes more convenient (and hopefully save time), organizations give these teams wide latitude to choose tools they are familiar with. And because more data sources and related tools are available than ever before, there is more diversity (and more silos) across the enterprise development environment. Unfortunately, that means over-burdened operations staff need to learn each tool, build documentation and standards, and complete other activities for each.

The Control-M Python client enables data engineers and developers to easily build automated execution into their workflows as they develop them with Python and their other preferred development tools and environments.

Python has clearly emerged as the favorite development language: according to Anaconda’s 2021 State of Data Science report, 34% of respondents said they always used Python in 2021.² Plus, 63% say they always or frequently use Python, which compares to 27% for R (which ranked second).

Did You Know?
Managing dependencies and environments is the top challenge for 23 percent of respondents to the Anaconda 2021 State of Data Science Report. Among developer respondents, 30% say it is their top challenge.
Gartner “Gartner®, Data Engineering Essentials, Patterns and Best Practices”, May 2021. GARTNER is a registered trademark and service mark of
Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.
But the data engineering team does not want to focus on orchestration and automation; it wants to focus
on development. The operations team typically takes the lead on orchestration and automation, using
a standard enterprise platform. Control-M and Helix Control-M have already been helping organizations
innovate faster by automating and orchestrating enterprise workflows. This functionality extends across the
complete data pipeline, including orchestrating across hybrid and multi-cloud operations. The Python client
makes this and other functionality accessible to data engineers and developers using the language and tools
they are familiar with so they can build automated execution into their workflows as they develop them.
The Python client is open-source and available under a BSD license on GitHub. Anyone can write and test their workflows using the BMC Helix Control-M sandbox or the Control-M Workbench, a self-service, standalone development sandbox that includes an identical, 100% compatible Control-M API that can be installed as a personal copy on premises, in a container, or in a cloud instance.

• Support for data-focused cloud services and traditional technologies and applications
• Predictive SLA management
• Self-service access to Control-M for engineers, developers, operations, and business users
• Granular security and comprehensive governance
The Python client provides access to key workflow orchestration capabilities in Control-M. Some of these features include:

• Build execution rules into workflows by taking a Jobs-as-Code approach with REST APIs and JSON, which accelerates application build, test, and validation times. This cuts costs and improves quality by finding defects and bugs earlier in the software development lifecycle.
• Simplify and scale data pipelines by ingesting and processing data from platforms like Hadoop, Spark, Amazon EMR, Snowflake, and Amazon Redshift, and get a 360-degree view of data pipelines at every stage, from ingestion to processing to analytics. Control-M can automate and orchestrate data flows from all sources, including cloud, on-premises, and data lakes.
• Automate workflow movement across development, test, and production environments.

Control-M integrates with PaaS services on AWS, Azure, and GCP; SAP (including HANA); Oracle E-Business Suite; Informatica; leading databases; IBM InfoSphere DataStage and Cognos Business Intelligence; and much more. BMC has committed to increasing the release velocity for cloud services and other integrations. The Python client adds support for all job types, including:

• AWS Glue
• Azure Data Factory
• Databricks for AWS and Azure
• Google Cloud Dataflow
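To make the Jobs-as-Code idea mentioned above more concrete, here is a minimal sketch that builds a workflow definition as JSON using only the Python standard library. It does not call the Control-M Python client. The folder name, job names, commands, and user are hypothetical, and the field layout ("Type", "Command", "RunAs") is an assumption modeled on the Jobs-as-Code JSON style that should be checked against the Control-M documentation.

```python
import json


def command_job(command: str, run_as: str) -> dict:
    """Return a JSON-ready dict describing one command job (assumed layout)."""
    return {"Type": "Job:Command", "Command": command, "RunAs": run_as}


def folder(jobs: dict) -> dict:
    """Wrap a name -> job mapping in a folder definition (assumed layout)."""
    definition = {"Type": "Folder"}
    definition.update(jobs)
    return definition


# Hypothetical folder, job names, commands, and user, for illustration only.
workflow = {
    "DemoFolder": folder({
        "IngestData": command_job("python ingest.py", run_as="etl_user"),
        "TransformData": command_job("python transform.py", run_as="etl_user"),
    })
}

# Serialize the definition so it can be stored in version control
# alongside the application code it schedules.
workflow_json = json.dumps(workflow, indent=2)
print(workflow_json)
```

Because the definition is plain data, it can be reviewed, diffed, and tested like any other source file, which is the point of treating jobs as code.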
“Control-M and Helix Control-M automate every aspect of the big data
pipeline from a single point of control, from ingestion and data processing
to presenting it to an analytics layer, removing reliance on multiple point
solutions in various stages. They offer deep integration with the Hadoop
ecosystem, including support for HDFS, Spark, MapReduce, DistCp, Pig,
Hive, Sqoop, Tajo, Oozie, etc.”
How Does It Work?
Here is an overview of how to set up and use the Python client in Control-M and Helix Control-M.

1. Install the library one time, using pip
2. Import the functions you plan to use
4. Log in to Control-M (or Helix Control-M)
5. Create folder and jobs
6. Define dependencies
7. Display dependencies in tabular or graphical form to validate the flow

You can see a quick demo here (free registration required).
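The last three steps (create folder and jobs, define dependencies, display dependencies to validate the flow) can be illustrated with a small, self-contained sketch. It deliberately does not use the Control-M Python client's API and never connects to a server. The folder, job names, commands, and helper functions are all hypothetical, and the standard-library graphlib module (Python 3.9+) stands in for the validation step.

```python
from graphlib import CycleError, TopologicalSorter

# Step 5 (sketch): a folder holding named jobs; all names are hypothetical.
folder_name = "DemoFolder"
jobs = {
    "Ingest": "python ingest.py",
    "Transform": "python transform.py",
    "Publish": "python publish.py",
}

# Step 6 (sketch): each pair means "the second job waits for the first".
dependencies = [("Ingest", "Transform"), ("Transform", "Publish")]


def dependency_rows(jobs, dependencies):
    """Return one (job, waits_for) row per job for tabular display."""
    waits_for = {name: [] for name in jobs}
    for upstream, downstream in dependencies:
        waits_for[downstream].append(upstream)
    return [(name, ", ".join(waits_for[name]) or "-") for name in jobs]


def run_order(jobs, dependencies):
    """Return a valid execution order; raises CycleError if the flow loops."""
    graph = {name: set() for name in jobs}
    for upstream, downstream in dependencies:
        graph[downstream].add(upstream)
    return list(TopologicalSorter(graph).static_order())


def is_valid_flow(jobs, dependencies):
    """Return True when the dependencies contain no cycle."""
    try:
        run_order(jobs, dependencies)
        return True
    except CycleError:
        return False


# Step 7 (sketch): print the dependencies as a small table.
rows = dependency_rows(jobs, dependencies)
print(f"Folder: {folder_name}")
print(f"{'Job':<12}{'Waits for':<12}")
for job, upstream in rows:
    print(f"{job:<12}{upstream:<12}")

order = run_order(jobs, dependencies)
print("Run order:", " -> ".join(order))
```

Checking the flow for cycles before deployment catches a job that indirectly waits on itself, which is the kind of defect a tabular or graphical view is meant to expose.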
Summary
The Python client for Control-M and BMC Helix Control-M was created to make life easier for data science, software development, and IT operations staff. It brings automation to an area that needs it. As the Anaconda 2021 State of Data Science Report notes:

“Despite themes in the media suggesting automation will ‘take over,’ it’s actually welcomed by data practitioners and isn’t seen as a competitor…. Automation is seen as a complement to work. Data scientists aren’t worried because they recognize how many aspects of their job still require expert human judgment that technology can’t replicate.”

Control-M provides automation and orchestration so application and data workflows can be developed and deployed at scale. The Python client removes a key source of friction between the development and operations environments at organizations that are already using Control-M or Helix Control-M.

The client helps organizations get more out of those platforms, their data science staff and investments, and their IT operations professionals. The Python client helps extend DataOps principles and Control-M functionality to data operations, giving data science specialists more time to spend on development, and helping operations teams deliver those innovations to the enterprise faster.
You can freely access Control-M Python client documentation here, see our blog for more
perspective, and visit the Control-M home page for lots of additional resources.
About BMC
BMC works with 86% of the Forbes Global 50 and customers and partners around the world to create their future. With our history of innovation,
industry-leading automation, operations, and service management solutions, combined with unmatched flexibility, we help organizations free up time
and space to become an Autonomous Digital Enterprise that conquers the opportunities ahead.
BMC, the BMC logo, and BMC’s other product names are the exclusive properties of BMC Software, Inc. or its affiliates, are registered or pending
registration with the U.S. Patent and Trademark Office, and may be registered or pending registration in other countries. All other trademarks
or registered trademarks are the property of their respective owners. ©Copyright 2022 BMC Software, Inc.