AISTech 2021 Falkonry Technical Paper
Crick Waters1, Beverly Klemme1, Raj Talla1, Prerna Jain1, Nikunj Mehta1
1 Falkonry Inc.
Sunnyvale, CA 94087 USA
Phone: +1 408 461 9286
Emails: crick@falkonry.com, beverly.klemme@falkonry.com, raj.talla@falkonry.com,
prerna.jain@falkonry.com, nikunj.mehta@falkonry.com
ABSTRACT
Recently, Artificial Intelligence (AI) and Machine Learning (ML) techniques have been used to solve complex operations
problems. However, scaling ML/AI across a multitude of equipment types and use cases, a variety of signals, and over time
with changing operations remains a significant challenge. This paper discusses how Falkonry's Operational AI platform
learns, detects, and predicts conditions in continuous casting. This methodology is scalable across use cases and time without
the need for data scientists. Early detection of impending equipment failures allows operations to schedule necessary
maintenance interventions, thus avoiding loss of production due to unexpected downtime events.
Keywords: Operational AI, Machine Learning, Time Series Data, Predictive Operational Excellence, Predictive Maintenance,
Continuous Casting, Reliability, Condition-Based Maintenance
INTRODUCTION
The advent of IoT sensors, edge devices, and connectivity standards in factories has pushed data collected from industrial
automation projects into the cloud. This new data availability has spawned a revolution in both the underlying methods and
scale of data analytics – creating smart factories whose performance and behaviors are managed by digital interpretation of
data. Insights are no longer limited to human inspection and domain knowledge but are now acquired using digital
technologies and practices. Machine learning methods have emerged onto the industrial landscape and are rapidly evolving to
meet the demand for this next generation of digital insights.
Industries that have the maturity and foresight to ride the wave of new and smart factory innovation have tremendous
potential to create a competitive advantage by boosting productivity and increasing revenues. For example, cost savings can
come from predicting equipment failures, shifting the emphasis from preventive to predictive maintenance, freeing up
otherwise lost production time for incremental revenue generation.
Despite the evolution of sensors, devices, cloud computing, and innovations in machine learning, many factors challenge the
practical application of machine learning and artificial intelligence in operations at scale. Scaling across multiple equipment
types and use cases is one such challenge. Traditional approaches by data scientists and use-specific software development
simply cannot scale, even when “auto ML” [1] shortens the machine learning cycle for algorithm selection. Not only is the
custom-ML development cycle a substantial impediment to scale, so too is the effect of time on ML models as underlying
production equipment is upgraded, raw materials change, product mixes evolve, and new failure modalities emerge through
long-term use. Any or all of these changes render custom-crafted ML models moot, leaving the manufacturer without a critical
continuous analytic tool.
This paper presents Falkonry’s Operational AI as one solution to the challenges of scaling machine learning and AI in
manufacturing. This intelligence-first approach [2] [3] to creating machine learning applications is based on a consistent,
repeatable time series classification technique that is robust to incomplete and irregular data, encourages expert knowledge
capture, and leads to real-time actions that positively affect production. Underlying challenges in the real-time application of
Operational AI to production systems are discussed. This paper further presents a real-world case study and how Operational
AI is applied in steel manufacturing at production scale to predict equipment failure to reduce unexpected downtime, thus
Operational AI
Operational AI is a software platform that enhances operational excellence by avoiding operational events that disrupt
manufacturing and production operations. Increasing operational excellence requires solving a broad set of problems, such as
reducing downtime, reducing injuries, increasing throughput, enhancing capacity utilization, reducing cycle time, increasing
yield, reducing work-in-progress inventory, lowering mean cost per unit, reducing and eliminating defects, and lowering
changeover time. To support such operational excellence solutions, Operational AI needs to offer a broad set of process- and
asset-agnostic capabilities.
Operational AI functions by applying computational methods to operational data to discover and report behaviors that engage
operational experts and solicit their know-how through accessible interfaces. Operational AI is easy to use in that front-line
practitioners (such as manufacturing engineers, reliability engineers, process engineers, maintenance managers, and those
with similar experience) can use it themselves. Operational AI produces applications that predict undesirable system behavior
and can be evolved without assistance from data engineers or data scientists. Operational AI emphasizes solutions that enable
front-line subject matter experts to use the solution on their own. Results are achieved more rapidly, scale more quickly, and
do so at a lower cost than traditional AI allows because the coordination and direct costs of data science specialists are not
required. In this paper, the term “Operational AI” refers to Falkonry’s Operational AI platform [4].
Various manufacturing sectors such as energy, chemical, pharmaceutical, and metals need Operational AI today [5] [6] [7] [8].
AI/ML techniques prove to be promising for a variety of manufacturing applications across the value chain. These
applications accelerate decision-making, minimize unanticipated failures, and improve logistics and utilization of resources.
Figure 1 shows Falkonry’s Predictive Operational Excellence framework.
Operational AI
Operational AI performs data normalization, aggregation, and analysis. Operational AI transforms the fine-grained data from
operational technology (OT) into actionable insights through discovery and detection of both known and novel operating
conditions. (Some data used by Operational AI, such as raw material characteristics and the production plan, may not originate
in OT.) Operational AI
provides an explanation of detected conditions to assist with root cause analysis and an estimation of the remaining time to
events of interest (such as consumable replacement or time to failure). Through practitioner interaction with digital twins and
plant analytics, digital insights and recommendations are translated into decisions and converted to actions by sending
workflow instructions to operational management systems.
Operational Management
Operational Management connects insights and recommendations from Operational AI to business operations and
management processes, bringing intelligence to overall operations and facilitating business value objectives. Operational AI
processes fine-grained OT data and turns it into inputs and recommendations for Operational Management systems.
The flow of information from OT sources through Operational AI to Operational Management systems maximizes value-
creation across domains such as equipment maintenance, plant design, manufacturing management, energy management, and
quality control while focusing on factors like costs, schedules, risks, and business priorities.
CHALLENGES
Numerous challenges stand before the innovative steel manufacturer intent on increasing the productivity of existing capital
assets without new capital investment. Continuous casting of steel is a well-understood process. Once a caster is put into
operation, the manufacturer can do little to change the physics of the caster to change rated production throughput. Rated
throughput, however, is not continuously attainable in real-world applications.
Despite the maturity of continuous casting equipment and casting operations, valuable production time is lost to unscheduled
downtime events in real-world applications. An “ordinary” continuous caster producing 150 tonnes per hour can generate
$2,500,000 per day in production revenue. Conversely, a single day of lost production forfeits that same $2.5M.
Therefore, a manufacturer can release tremendous stored potential value by eliminating unscheduled production downtime.
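The figures above can be checked with a short back-of-the-envelope calculation. The sketch below assumes round-the-clock casting (24 hours per day), which the text does not state. The per-tonne and per-hour values are derived from the quoted figures rather than taken from the source, and the function name is illustrative.

```python
# Back-of-the-envelope check of the downtime figures quoted in the text.
TONNES_PER_HOUR = 150             # quoted in the text
REVENUE_PER_DAY_USD = 2_500_000   # quoted in the text
HOURS_PER_DAY = 24                # assumption: the caster runs around the clock

tonnes_per_day = TONNES_PER_HOUR * HOURS_PER_DAY
# Revenue per tonne implied by the two quoted figures (derived, not from the source).
implied_usd_per_tonne = REVENUE_PER_DAY_USD / tonnes_per_day
usd_per_hour = REVENUE_PER_DAY_USD / HOURS_PER_DAY


def lost_revenue_usd(downtime_hours: float) -> float:
    """Production revenue forgone during an unscheduled outage."""
    return downtime_hours * usd_per_hour


if __name__ == "__main__":
    print(f"tonnes per day:         {tonnes_per_day}")
    print(f"implied USD per tonne:  {implied_usd_per_tonne:.2f}")
    print(f"lost revenue, 8 h stop: {lost_revenue_usd(8):,.0f} USD")
```

Under these assumptions, every hour of avoided unscheduled downtime is worth roughly $104,000 in production revenue.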
Casting molten steel, not surprisingly, is hard on heavy equipment. Components wear under harsh conditions leading to
failures or adverse product quality. However, the opportunity exists to apply machine learning techniques to detect early
evidence of conditions leading to equipment failure. Early detection of such conditions is a warning to maintenance and
production scheduling managers that downtime needs to be scheduled for repair before failures occur. Therefore, the first
challenge is developing a machine learning application that can detect the early conditions that predict equipment failure
without being distracted by the other operating modes that inevitably arise in industrial operations.
The next challenge is deploying ML applications into a production environment. Sensor and machine parameters are often
available from SCADA systems, PLCs, or a data acquisition system such as IBA [10]. An Operational AI platform uses such
operational data to create Operational AI applications that predict conditions observed in that data. These source systems,
however, contain data that is not systematically normalized for machine learning. Operational AI platforms may use a small amount of recent
data or large amounts of historical data to create applications for specific failure or operational modes that indicate pre-failure
conditions. Operations practitioners need to be able to deploy those applications into continuous operation without costly
and time-consuming software development and evolve them quickly over long periods of usage. The Operational AI platform
must fit within the steel manufacturers’ overall data and operational management software architecture. It must receive raw
data while also providing insights to manufacturing management systems such as asset performance management (APM) or
computerized maintenance management systems (CMMS).
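To illustrate what "not systematically normalized" means in practice, the sketch below aligns two signals with different sampling rates onto a common time grid using a generic sample-and-hold rule. It is not the platform's internal method. The signal names, values, and the `max_age` staleness limit are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# One signal: (timestamp in seconds, value) pairs sorted by time.
Series = List[Tuple[float, float]]


def sample_and_hold(series: Series, t: float, max_age: float) -> Optional[float]:
    """Return the latest value at or before t, or None if it is older than max_age."""
    times = [ts for ts, _ in series]
    i = bisect_right(times, t) - 1
    if i < 0 or t - series[i][0] > max_age:
        return None  # no sample yet, or the last sample is too stale to trust
    return series[i][1]


def align(signals: Dict[str, Series], grid: List[float],
          max_age: float) -> List[Dict[str, Optional[float]]]:
    """Build one row per grid time with a (possibly missing) value per signal."""
    return [
        {name: sample_and_hold(series, t, max_age) for name, series in signals.items()}
        for t in grid
    ]


# Hypothetical signals: one sampled every second, one sampled irregularly.
signals = {
    "mold_level": [(0.0, 81.0), (1.0, 80.5), (2.0, 80.7), (3.0, 80.2)],
    "cooling_water_temp": [(0.0, 31.0), (2.5, 31.4)],
}
rows = align(signals, grid=[0.0, 1.0, 2.0, 3.0], max_age=1.5)

if __name__ == "__main__":
    for t, row in zip([0.0, 1.0, 2.0, 3.0], rows):
        print(t, row)
```

Missing values are kept as `None` rather than interpolated, so any downstream step must tolerate incomplete rows.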
Creating and deploying ML/AI applications for detecting and predicting equipment maintenance requirements is further
complicated by two main issues: 1) relatively infrequent actual problems and 2) changes to operations over time. Industrial
operations insist on maintaining low variance and long mean time between failures (MTBF). As a result, problem conditions
tend to be rare and unique, meaning that what one believes today to be a “complete” design will invariably miss a future,
previously unknown failure mode.
Over the ordinary course of plant operations, maintenance will be performed, and operating conditions will change.
METHODOLOGY
To address the aforementioned challenges, Falkonry’s Operational AI methodology includes two sections: a Machine
Learning Pipeline and a Service Infrastructure.
Condition Prediction
This stage aims to produce a condition value at every required time t based on the feature vector produced by the automated
feature extraction stage for that time t. The reported condition at time t may be one of the following: 1) a user-supplied label,
2) a system-generated label, or 3) unknown. To do so, it uses two additional pieces of information: labeled events and the
desired degree of generalization. Each labeled event included for condition prediction, also called a fact, carries a start and
end time as well as a condition label. Facts may be either ground truth or hypotheses to be tested. The desired degree of
generalization controls how tightly the supplied labeled events must match patterns arising at other times
and results in a choice between high reliability and high predictability. This approach is referred to as semi-supervised
learning and produces a condition value even when no example data is provided.
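A fact as described above can be represented by a small record. The following is a minimal sketch, not the platform's actual data model. The class name, field names, time unit, and the `label_at` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """A labeled event: a time interval plus a condition label (hypothetical schema)."""
    start: float                   # event start time, in seconds (assumed unit)
    end: float                     # event end time, in seconds (assumed unit)
    label: str                     # condition label supplied by the user
    is_ground_truth: bool = True   # False marks a hypothesis still to be tested


def label_at(t: float, facts: List[Fact]) -> Optional[str]:
    """Return the label of the first fact whose interval covers time t, else None."""
    for fact in facts:
        if fact.start <= t <= fact.end:
            return fact.label
    return None


facts = [
    Fact(start=100.0, end=200.0, label="normal"),
    Fact(start=500.0, end=560.0, label="pre-failure", is_ground_truth=False),
]
timestamps = [50.0, 150.0, 530.0]
labels = [label_at(t, facts) for t in timestamps]

if __name__ == "__main__":
    print(labels)
```

Times not covered by any fact stay unlabeled (`None`), which is the usual case in semi-supervised learning where only a small share of the data carries labels.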
During this stage, feature vectors are first clustered dynamically and a cluster identity is attached to every feature vector. This
first step does not depend on the time order of feature vectors and can be called time-free. In the second step, feature vectors
supplemented with cluster identity are combined with facts in a classifier to produce a percentage match for each feature
vector to the feature vector for each of the fact labels. The result is then resolved to a single condition label using the
generalization factor, which also isolates anomalies into the unknown condition. The highest match percentage is
used as the confidence in the label of the condition value.
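The two steps and the final resolution can be sketched in a few lines. This sketch only shows the shape of the computation and is not Falkonry's algorithm. The grid-based clustering, the exponential similarity, the cross-cluster penalty of 0.5, and the mapping from generalization to a minimum match are all assumptions chosen for brevity.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def cluster_id(v: Vector, cell: float = 1.0) -> Tuple[int, ...]:
    """Step 1 (time-free): the grid cell containing v stands in for its cluster."""
    return tuple(math.floor(x / cell) for x in v)


def match_percentages(v: Vector, examples: Dict[str, List[Vector]],
                      scale: float = 1.0) -> Dict[str, float]:
    """Step 2: percentage match of v against the labeled examples of each fact label."""
    cid = cluster_id(v)
    result = {}
    for label, vectors in examples.items():
        best = 0.0
        for e in vectors:
            sim = math.exp(-math.dist(v, e) / scale)
            if cluster_id(e) != cid:
                sim *= 0.5  # assumed penalty when the example lies in another cluster
            best = max(best, sim)
        result[label] = 100.0 * best
    return result


def resolve(v: Vector, examples: Dict[str, List[Vector]],
            generalization: float = 0.5) -> Tuple[str, float]:
    """Reduce the matches to one label; weak matches become 'unknown'."""
    matches = match_percentages(v, examples)
    label, confidence = max(matches.items(), key=lambda kv: kv[1])
    min_match = 100.0 * (1.0 - generalization)  # assumed: looser matching as g grows
    if confidence < min_match:
        return "unknown", confidence
    return label, confidence


# Feature vectors taken from inside fact intervals (hypothetical values).
examples = {
    "normal": [(0.2, 0.3), (0.4, 0.1)],
    "pre-failure": [(5.1, 5.2)],
}

if __name__ == "__main__":
    print(resolve((0.3, 0.2), examples))   # close to the "normal" examples
    print(resolve((2.5, 9.0), examples))   # far from every example
```

In this sketch a low generalization value demands a tight match, so more points fall into the unknown condition. A value of 1.0 accepts the best label however weak the match.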
The conditions predicted by Operational AI can be selectively reviewed and labeled with the help of experts from the
operations team and this stage repeated. In this way, experts’ domain knowledge is digitally recorded and is also available for
future applications.
Explanation
The previous stage outputs most of the condition assessment for the datastream. For every assessment, the datastream also
provides a measure of each signal's contribution to making that assessment’s condition label (called an "explanation score").
Explanation scores range from 1 (highest contribution) to 0 (no contribution) to -1 (contradicts the assessment but was
outweighed by other signals). Explanation scores are calculated per assessment using the condition label and the feature
vector from each signal (internal to the condition models). For each signal, the feature vector ("assessment point") is compared
against a sample of the feature vectors used during model learning ("sample points"). The algorithm draws a neighborhood
boundary around the assessment point to find the nearby sample points. Among these neighborhood sample points, the
ratio of points with the same condition as the assessment to points with a different condition is found. The same ratio is also
calculated across all the sample points (not just the neighborhood). These two ratios are used to calculate the explanation score.
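The text does not give the formula that combines the two ratios, so the sketch below uses an assumed one. It works with the fraction of same-condition points (rather than a same-to-different ratio, to avoid division by zero) and rescales the difference between the neighborhood fraction and the overall fraction into the range -1 to 1. The function name, the fixed `radius`, and the sample values are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def explanation_score(assessment_point: Point, assessed_label: str,
                      sample_points: List[Point], sample_labels: List[str],
                      radius: float) -> float:
    """Score one signal's contribution to an assessment (assumed formula)."""
    # Fraction of all sample points that share the assessed condition.
    p_all = sum(1 for l in sample_labels if l == assessed_label) / len(sample_labels)
    # Labels of the sample points inside the neighborhood of the assessment point.
    nbr_labels = [
        l for p, l in zip(sample_points, sample_labels)
        if math.dist(p, assessment_point) <= radius
    ]
    if not nbr_labels or p_all in (0.0, 1.0):
        return 0.0  # no local evidence, or no contrast between conditions
    p_nbr = sum(1 for l in nbr_labels if l == assessed_label) / len(nbr_labels)
    if p_nbr >= p_all:
        return (p_nbr - p_all) / (1.0 - p_all)  # 0 .. 1: supports the assessment
    return (p_nbr - p_all) / p_all              # -1 .. 0: contradicts it


# One signal whose values separate the two conditions (hypothetical data).
points = [(0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,)]
labels = ["normal"] * 3 + ["pre-failure"] * 3

supporting = explanation_score((5.05,), "pre-failure", points, labels, radius=0.5)
contradicting = explanation_score((0.15,), "pre-failure", points, labels, radius=0.5)

# One signal whose values carry no information about the condition.
mixed_points = [(1.0,), (1.1,), (1.2,), (1.3,)]
mixed_labels = ["normal", "pre-failure", "normal", "pre-failure"]
neutral = explanation_score((1.15,), "pre-failure", mixed_points, mixed_labels, radius=0.5)

if __name__ == "__main__":
    print(supporting, contradicting, neutral)
```

With this formula, a signal whose neighborhood matches the overall mix of conditions scores 0. A neighborhood made up entirely of the assessed condition scores 1, and one made up entirely of other conditions scores -1.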
SERVICE INFRASTRUCTURE
The Operational AI platform is designed for easy deployment into IaaS (Infrastructure as a Service) environments like AWS
or Azure or into privately managed compute and storage infrastructures running a container-compatible Linux operating
system. Figure 4 presents the overall architecture of the application deployment.
Figure 5 shows the flow of the applied Operational AI-based predictive maintenance solution:
ADVANTAGES
The advantages of this Operational AI approach applied to continuous casting were:
• A simple-to-use data integration architecture enabled connectivity and streaming of large data quantities from multiple
sources with different sampling rates.
• ML applications were created from multivariate data with various, irregular sampling rates, without data engineering
effort, for equipment with dynamic operating modes. The automated, consistent, and repeatable development
methodology enabled non-data scientist users to create, deploy, and update production-ready ML applications
rapidly.
• Operational AI incorporates both unsupervised and semi-supervised ML techniques enabling both discovery of
novel conditions and recognition of specific conditions. This method blends learning the patterns of known
operating conditions from previously captured operating data while also identifying unexpected and novel
conditions in real-time operational data. Therefore, this learning system provides benefits for a wide range of use
cases over time and without needing data scientists to develop or maintain these applications.
• ML applications were easily exported for deployment as stand-alone, fully functional, containerized runtime
Analyzers deployed on virtual machines or at the edge on gateway processors. The deployment architecture chosen
RESULTS
The application of Operational AI provided visibility into day-to-day operations, including timely alerts of asset condition to
the production operations team. It recovered what would have been production hours lost to unanticipated failures in vertical
casting and hot rolling operations. Conditions preceding failure were detected by Operational AI one to three weeks
in advance of actual failures. Figure 6 illustrates sections of the continuous caster where the application of Operational AI
yielded advance notification of failures.
CONCLUSION
In this work, we discussed the challenges of using AI/ML techniques to improve plant operations. We presented our
Operational AI vision and demonstrated a real-world case study in steel manufacturing. We summarized the benefits
of the steel manufacturing use cases across various equipment and detailed the application-building and deployment
approach.
The main conclusions from our work in the space of Operational AI are:
• An intelligence-first approach reduces time-to-first-value. Reducing time-to-first-value enables plant and
manufacturing executives who believe in the potential of Operational AI to prove their intuition quickly and with low
risk [12].
• The reduction in time-to-first-value removes many hurdles to scaling across multiple equipment, use cases, and
time. Plant managers can implement Operational AI without asking for significant corporate resources and plant
engineers can discover insights and improve production.
• An intelligence-first alternative to the more conventional data-first approach lowers AI implementation risk, which
can often stall value realization. Full deployment of a global data architecture can be deferred until optimization is
required to more efficiently capture production value otherwise lost to unscheduled downtime.
• Operational AI makes possible rapid testing and selecting, followed by fast solution deployment, from a wide range
of hypotheses. This agile approach of rapidly iterating over models and use cases enables shorter learning cycles and
quick identification of essential use cases. Arriving at solutions for use cases faster supports getting stakeholder
feedback earlier than conventional methods. This agile Operational AI approach leads to time savings and rapid proof
of value.
• Operational AI is capable of learning from available data and improving its predictions over time. It augments the
asset owners’ abilities by continuously learning and reporting system behavior [13]. It overcomes the time-consuming
challenges of gathering new ground truth facts, collecting new operational data sets, learning the new behaviors, and
validating the prediction results necessary for traditional ML methods. Manufacturing leaders benefit from
digitizing captured expert knowledge so crucial to the application and scaling of Operational AI. Operational AI is
cost-effective for discovery, deployment, and scaling; and facilitates ongoing continuous improvement[14].
In this novel way, we are rethinking Operational AI. Operational AI is an intelligence-first approach that reduces business risk
and lowers time-to-first-value. Operational AI digitally captures expert knowledge for long-term use and enables non-data
REFERENCES
1. "Automated machine learning," [Online]. Available: https://en.wikipedia.org/wiki/Automated_machine_learning.
[Accessed April 2021].
2. C. Lee, "The Intelligence-First Path to Predictive Operations," Falkonry Inc., 6 July 2020. [Online]. Available:
https://falkonry.com/blog/the-intelligence-first-path-to-predictive-operations/. [Accessed 22 April 2021].
3. C. Lee, "Database-First vs Intelligence-First: The Cart Before the Horse," Falkonry Inc., 10 June 2020. [Online].
Available: https://falkonry.com/blog/database-first-vs-intelligence-first-the-cart-before-the-horse/. [Accessed 22 April
2021].
4. N. Mehta, "Operational AI: the bridge between Operational Technology and Operational management," Falkonry Inc., 2
December 2020. [Online]. Available:
https://falkonry.com/blog/operational-ai-the-bridge-between-operational-technology-and-operational-management/. [Accessed 22 April 2021].
5. J. Yoon, D. He and B. Van Hecke, "A PHM approach to additive manufacturing equipment health monitoring, fault
diagnosis, and quality control," Nantes, 2014.
6. J. Moyne and J. Iskander, "Big data analytics for smart manufacturing: Case studies in semiconductor manufacturing,"
Processes, p. 39, 2017.
7. H. Ding, R. X. Gao, A. J. Isaksson, R. G. Landers, T. Parisini and Y. Yuan, "State of AI-based monitoring in smart
manufacturing and introduction to focused section," IEEE/ASME Transactions on Mechatronics, pp. 2143-2154, 2020.
8. K. Zope, K. Singh, S. H. Nistala, A. Basak, P. Rathore and V. Runkana, "Anomaly Detection and Diagnosis In
Manufacturing Systems: A Comparative Study Of Statistical, Machine Learning And Deep Learning Techniques," in
Annual Conference of the PHM Society, 2019.
9. "Operational technology," [Online]. Available: https://en.wikipedia.org/wiki/Operational_technology. [Accessed 22 April
2021].
10. "iba System," [Online]. Available: https://www.iba-ag.com/en/iba-system. [Accessed 22 April 2021].
11. N. Mehta, "Event Horizon Estimation - Time to Any Critical Event," Falkonry Inc., 17 November 2020. [Online].
Available: https://falkonry.com/blog/event-horizon-estimation-time-to-any-critical-event/. [Accessed 22 April 2021].
12. N. Mehta, "Rethinking Operational AI," Falkonry Inc., 12 May 2020. [Online]. Available:
https://falkonry.com/blog/rethinking-operational-ai. [Accessed 22 April 2021].
13. C. Lee, "How intelligence first leads to faster learning and cost effective scaling," Falkonry Inc., 13 April 2021. [Online].
Available: https://falkonry.com/blog/how-intelligence-first-leads-to-faster-learning-and-cost-effective-scaling/.
[Accessed 22 April 2021].
14. C. Lee, "Continuous Improvement of an Operational AI Deployment," Falkonry Inc., 18 June 2020. [Online]. Available:
https://falkonry.com/blog/continuous-improvement-of-an-operational-ai-deployment/. [Accessed 22 April 2021].