Historical Incidents and Incentives For Change
Curt Miller
SIS SILverstone LLC
General Manager / Principal Consultant
11999 Katy Freeway, Suite 600
Houston, TX 77079
camiller@sissilverstone.com
Ph: 832-439-3793
Justin Kassie
Tesoro
Lead Reliability Engineer
2350 E. 223rd Street, Carson CA 90745
justin.l.kassie@tsocorp.com
Ph: 310-233-6034
Dan Poston
LyondellBasell
Consulting Engineer, Global Projects & Engineering
202 N. Castlegory Rd, Houston, Texas 77049
Daniel.Poston@lyb.com
Ph: 832-679-8710
Also, Mr. Clark prepared a list titled Recent Rotating Equipment Failures at Five Refineries [R3], and it contains a similar history, as shown in Table 2 below.

Equipment                   Site   Event
Unit Charge Pump            A      Major Fire
Pump A                      A      Vapor Cloud Release
Pump B                      A      Vapor Cloud Release
Unit Screw Compressor       A      Major Fire
Pump                        A      Major Fire
Pump                        B      Major Fire
Coker Wet Gas Compressor    C      Vapor Cloud Release
XX Crude Pump               C      Major Fire
FCC Auxiliary Air Blower    D      Fire
FCC Unit Main Air Blower    E      Near Miss
Table 2: Recent Rotating Equip Failures Five Refineries Summary

Such events support a change to a more thorough approach to risk analysis, interlock design, and operational measures so that the risk of such incidents is reduced to as low as reasonably practicable.

Standards Timeline

New process plant designs and existing facilities have included risk assessments associated with their unique processes for many years, although it was not until the release of 29 CFR 1910.119, Process Safety Management (PSM) for Highly Hazardous Chemicals [R4], that it became a formal requirement in the U.S. (Note that equivalent requirements are usually, but not always, present in other parts of the world.)

After OSHA 1910.119, there was a succession of domestic and international attempts at standards for interlock design. They included:
a. AIChE CCPS, Guidelines for Safe Automation of Chemical Processes, 1993
b. ISA S84.01-1996, App. of Safety Inst. Sys. for the Process Ind., Feb. 15, 1996
c. ANSI approval of S84, 1997
d. IEC 61508, Functional Safety: Safety-Related Systems "General" released, 1998
e. OSHA recommends S84, Mar 23, 2000
f. IEC 61511, Functional Safety: Safety Inst. Systems for the Process Ind., 2002
g. ISA S84.00.01-2004, Sept 2, 2004 (ISA84)

The focus of this paper is on implementation of ISA84 [R5] to rotating equipment. Its full implementation involves a Safety Life Cycle (SLC), as most have experienced with engineered systems, and will be highlighted in the following section.

Specific to turbomachinery, the 5th edition of the API Machinery Protection Standard API670 provides detailed guidelines on the implementation of machinery protection systems (MPSs). It will be reviewed in more detail shortly.

Qualitative versus Quantitative Techniques for Risk Mitigation

Process Hazard Assessment (PHA) is a systematic way to identify all potential hazards for a facility so the risk team can determine how to manage each one. Generally speaking, HAZOPs are favored for their thoroughness with processes, since the whole plant is reviewed node-by-node, with a detailed set of guide words applied to each characteristic of the process. What-ifs, FMEAs, and Checklists are used for many rotating equipment configurations.

The primary objective of the PHA studies was to identify the causes of potential safety and environmental hazards, as well as major operability problems. Based on the evaluated consequences and safeguards identified, the multi-disciplined PHA team proposed recommendations to reduce the risk and enhance operability to tolerable levels in compliance with each company’s risk criteria.

Use of qualitative risk ranking tools is relatively simple, but it leads to inconsistencies between different PHA teams as well as the potential to under- or over-estimate the risk. For lower-level risks, this is not generally a significant concern; for higher risks, however, management needs to be able to make better-informed decisions on a more consistent basis. That requires a greater level of insight, which is provided by more quantitative analysis techniques that determine if there is a risk or Safety Integrity Level (SIL) gap. In layman's terms, SIL refers to “orders of risk reduction” as shown in the following table:

SIL   Risk Reduction
1     > 10 factor
2     > 100 factor
3     > 1,000 factor
Table 3: SIL & Risk Reduction

After such risk targets are discovered, other Safety Life Cycle (SLC) processes are followed, such as verification calculations of the formally termed Safety Instrumented Functions (SIFs). This and other such processes ensure that the SIFs are capable of achieving the necessary risk reduction.

PHA Example – Reciprocating Compressor

The following table provides a glimpse of the main components for a reciprocating compressor PHA.

Deviation        Cause                                     Consequence                                                  Safeguards         S   L   R   Recommendation
Too high level   Level dump fails closed                   Water entrainment leading to damage & loss of containment    High level alarm   2   4   3   1. Consider adding high level trip if required by LOPA
                 Maint. valve inadvertently left closed    Water entrainment leading to damage & loss of containment    High level alarm   2   5   4   (repeat 1. Add high level trip)
Table 4: PHA Example – Reciprocating Compressor

This PHA data will serve as input to the subsequent SIL analysis to further quantify that risk reduction was obtained.

FUNDAMENTALS ON SIL REQUIREMENTS

Once an interlock has received the “SIL branding”, the ISA84 standard requires that other steps be followed diligently to ensure that both random and systematic failure is not introduced into its
design and operability. The highlights of this process are captured in the remainder of this section.

Functional Safety Management (FSM)

The first ISA 84 objective is to specify the safety lifecycle (SLC) management and technical activities needed to implement the safety instrumented system. It should designate responsibilities for each SLC phase and the activities within that phase.

The basic FSM tasks include:
1. Defining a safety lifecycle process.
2. Developing a functional safety management procedure.
3. Developing a project execution safety plan.

Functional safety management (FSM) is specifically noted to act as an extension to existing monitored quality systems and processes. This quality-based philosophy of “plan, execute according to plan, verify, document, and improve based on the resulting experience” carries through the entire safety lifecycle.

• Quantification of the risk, accounting for initiating cause frequency and independent layers of protection (IPLs), where IPLs prevent the propagation and fulfillment of a hazardous event.
• Application of conditional modifiers (probabilities) that affect the likelihood of the hazardous event being enabled or mitigated and thus alter the hazardous event outcome.

The LOPA methodology helps us predict with greater certainty and consistency whether or not the risk complies with corporate criteria.

Data used in the LOPAs and their references are normally documented in each company’s guidance document.

LOPA Worksheet Example

Without going into the intricate details of the LOPA process, the following worksheet (Figure 1) shows how these key elements are utilized together.
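The LOPA elements named above (initiating cause frequency, independent layers of protection, and conditional modifiers) combine multiplicatively, and the remaining gap to the corporate target sets the SIL. The following is a minimal Python sketch of that arithmetic, not the worksheet used by the authors. The function names are hypothetical, the sample numbers are illustrative assumptions, and the RRF-to-SIL mapping is a conservative order-of-magnitude reading of Table 3 (a required factor of up to 10 maps to SIL1, up to 100 to SIL2, up to 1,000 to SIL3); each company's guidance document would define the actual rules.

```python
import math

def mitigated_frequency(initiating_freq_per_yr, ipl_pfds, conditional_modifiers=()):
    """Event frequency after crediting IPLs and conditional modifiers.

    ipl_pfds: probability of failure on demand of each independent layer.
    conditional_modifiers: probabilities (e.g. ignition, occupancy).
    """
    return initiating_freq_per_yr * math.prod(ipl_pfds) * math.prod(conditional_modifiers)

def required_rrf(mitigated_freq_per_yr, tolerable_freq_per_yr):
    """Risk reduction factor still needed from a SIF (<= 1 means no gap)."""
    return mitigated_freq_per_yr / tolerable_freq_per_yr

def sil_for_rrf(rrf, rel_tol=1e-9):
    """Assumed mapping of a required RRF onto SIL 1-3; 0 means no SIF needed."""
    if rrf <= 1 * (1 + rel_tol):
        return 0
    for sil, upper in ((1, 10), (2, 100), (3, 1000)):
        if rrf <= upper * (1 + rel_tol):  # tolerance absorbs float rounding
            return sil
    raise ValueError("Required risk reduction exceeds SIL3; redesign the process")

# Illustrative scenario (assumed values): initiating event once per 10 years,
# one non-SIS IPL with PFD 0.1, tolerable frequency once per 10,000 years.
freq = mitigated_frequency(0.1, [0.1])
gap = required_rrf(freq, 1e-4)
print(f"Mitigated frequency {freq:.3g}/yr, required RRF {gap:.0f}, SIL{sil_for_rrf(gap)}")
```

Keeping the three steps as separate functions mirrors the worksheet columns, so that an added IPL or conditional modifier changes only one input.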
systems lifecycle. The SRS addresses both functional and integrity specifications as stated below.

The functional part of the SRS describes what the safety instrumented function does when harm from a given hazard is imminent. Required details include process inputs and their trip set-points, safety system outputs and their actions, and the logical relationship between each of them. This is a similar requirement for any control loop within the basic process control system, but in the SRS case, improved safety, not production, is the goal. Some of the specified functional requirements that have been included in ISA84 are included in Table 6.

ISA84 SRS Functional Requirements
Defined safe state
SIS process measurements and their trip points
SIS process output actions
The functional relationship between inputs / outputs
Manual shutdown detail
Energize or de-energize to trip specification
Action(s) to be taken on loss of energy source(s) to SIS
Method to reset the SIS after a shutdown
Table 6: ISA84 SRS Functional Requirements

The integrity part of the SRS describes “how well” the safety instrumented function needs to work when harm from a given hazard is imminent. In this part of the SRS, it must specify such things as the required SIL, as well as necessary diagnostics, maintenance, and testing. Some of the specified integrity requirements that have been included in ISA84 are included in Table 7.

Verification calculations are performed after the other conceptual design steps have been completed at draft level. See Safety Instrumented Systems Verification - Practical Probabilistic Calculations [R6] for more information on this subject.

If calculations show that the draft design does not meet the SIL target, the choices are:
1. Shorten the testing interval, but not beyond the practical point for operations.
2. Select better technology/equipment.
3. Add redundancy or other IPLs.

The conceptual design iterations will continue until the SIL or risk reduction target is met with the overall most economical system.

SIL Verification Calculations Example

Although the SIL verification calculations can be completed by hand with the ISA84 simplified equations and Markov models, most functional safety professionals prefer to use off-the-shelf tools. Shown below is the output of such software.
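To make the three design choices listed above concrete, the following Python sketch uses the widely published simplified equations for average probability of failure on demand: PFDavg ≈ λDU × TI / 2 for a single device (1oo1) and (λDU × TI)² / 3 for a redundant pair (1oo2), ignoring common cause, diagnostics, and imperfect proof testing. It is a hand-calculation illustration only, not the output of any commercial tool. The failure rates and test intervals are hypothetical, and the function names are assumptions made for this example.

```python
HOURS_PER_YEAR = 8760

def pfd_avg_1oo1(lambda_du_per_hr, test_interval_hr):
    """Simplified PFDavg for a single device, dangerous undetected failures only."""
    return lambda_du_per_hr * test_interval_hr / 2.0

def pfd_avg_1oo2(lambda_du_per_hr, test_interval_hr):
    """Simplified PFDavg for two redundant devices, common cause ignored."""
    return (lambda_du_per_hr * test_interval_hr) ** 2 / 3.0

def sil_from_pfd(pfd_avg):
    """Map PFDavg to the SIL bands of Table 3 (RRF = 1 / PFDavg), capped at SIL3."""
    if pfd_avg >= 1e-1:
        return 0
    if pfd_avg >= 1e-2:
        return 1
    if pfd_avg >= 1e-3:
        return 2
    return 3

# Hypothetical draft design: 5e-6 dangerous undetected failures per hour,
# proof tested once a year.
draft = pfd_avg_1oo1(5e-6, HOURS_PER_YEAR)

options = {
    "draft design (1oo1, yearly test)": draft,
    "1. shorter test interval (quarterly)": pfd_avg_1oo1(5e-6, HOURS_PER_YEAR / 4),
    "2. better equipment (2e-6 per hour)": pfd_avg_1oo1(2e-6, HOURS_PER_YEAR),
    "3. added redundancy (1oo2)": pfd_avg_1oo2(5e-6, HOURS_PER_YEAR),
}
for name, pfd in options.items():
    print(f"{name}: PFDavg = {pfd:.2e}, SIL{sil_from_pfd(pfd)}")
```

With these assumed inputs the draft design only reaches SIL1, while each of the three options lifts it to SIL2 or better. Which option is the most economical depends on test access, equipment cost, and spurious trip exposure, none of which this sketch models.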
developed for relays and pneumatic instruments is not only less effective, but very costly compared to more appropriate methods.

The biggest issue with proof testing is that no methods have 100% coverage of dangerous failures. To account for this discrepancy, replacement or “rebuild to new” intervals (i.e. mission times) now must be specified for all equipment, and they must be within the useful life of the component.

The second issue discovered when reviewing site test practices is that most procedures included the full functional test, but testing of the diagnostic routines was not completed. Since the associated SIF verification included such diagnostics, such test practices had to be upgraded to account for detection of faults, degraded architecture, and presentation of associated alarms.

Example – Optimized Turbine Testing and Maintenance

As discussed above, no proof tests are 100% effective in detecting all covert faults. Turbomachinery specialists in one corporation inherently understood this limitation and have been rebuilding their critical trip and throttle valves in every turnaround for years. This level of maintenance, coupled with optimized partial stroke testing techniques, has helped each site meet their SIL2 safety and production goals simultaneously.

SPECIFIC APPLICATIONS OF SIL WITHIN API670

Although API Machinery Protection Standard API670 has been around since the 1980s, the November 2014 update adds over 150 pages of new content. The most notable changes are shown in the following table:

Clause/Annex   Specification
8              Electronic Overspeed - More detailed discussion
9              Surge detection (New)
10             ESD (New)
K              Surge Detection (New)
L              Safety Integrity Level (New)
M              Spurious Shutdowns (New)
N              Condition Monitoring (New)
O              Overspeed (New)
P              Recip Compressors (New)
Q              Wireless (New)
Table 8: New Clauses & Annexes in API670

Such additions support a cohesive strategy with the other process and machine functional safety standards.

Emergency Shutdown Device (ESD) and API670

The Emergency Shutdown Device (ESD) detailed in clause 10 is synonymous with the SIS detailed by ISA84. By having such a “single brain” for supporting all the critical safety functions, the first requirement of consolidating all trip demands and ensuring “proper timing and sequencing for a safe shutdown” is met.

There is latitude on whether all the shutdown logic is performed in the ESD. If it is not, and separate surge, monitoring, and overspeed systems are tied into the ESD, the overall system is considered to have “distributed architecture”. If such functions are included in the ESD, then it is termed “Integrated Architecture”.

Annex L – Safety Integrity Level

Annex L provides a 17-page introduction to the SIL concepts and correlates their application to turbomachinery standards. Although the risk graph methodology is not as prevalent in the USA as it is in Europe, its principles still apply to those that have standardized on LOPA as mentioned earlier.

Key takeaways from this SIL annex include:
• SIL compliance, although associated directly with the ESD logic solver, should be extended to the I/O devices
• Separation of control and safety is imperative
• SIL is determined by performance requirements set by each user, not by prescriptive methods

Such practices line up closely with those of the current functional safety standards like ISA84.

Annex M – Spurious Shutdowns

Since safe, fault tolerant methods with higher spurious trip rates may at times oppose process uptime (i.e. machine reliability), this annex “recommends some practice to reduce the risk of economic losses”.

Key takeaways from this spurious trip annex include:
• Utilization of fault tolerant designs for safety and reliability
• Applying preventative diagnostics where applicable

Regarding plant impact examples, the following table supports such measures.

Process Application          Spurious Trip Cost
Oil & Gas Platforms          Up to $2 million/day
Polystyrene                  20 days to recover at $20k/day = $400k
Refinery Coker Heater        $35k/day
Refinery Catalytic Cracker   $500k
Complete Refinery            $1 million/day
Ammonia & Urea Plants        $1 million/day
Power Generation             $100k/MW hour to $millions/site
Ethylene                     $1 million to include getting product to spec
Table 9: Spurious Trip Cost in Different Process Industries [R7]
COMMON SIL RATINGS FOR ROTATING EQUIPMENT

As shown in the preceding section, the process of SIL assignment takes into account the end user’s risk target, consequence severity, initiating cause frequency, the number of safeguards, and the application of conditional modifiers, so the final SIL requirement can vary widely. Utilizing the experience of the authors and a broad perspective of reviews, the following table offers a glimpse of what could be expected.

                                            SIL Target
Driver                     Application      High    Low     Norm
Steam turbine              Overspeed        SIL2    SIL1    SIL1
Centrifugal Compressor     Anti-Surge       SIL2    None    SIL1
                           High Level       SIL2    None    SIL1
Gas turbine                Light Off        SIL2    SIL1    SIL1
Turbine Generator          Overspeed        SIL2    SIL2    SIL2
Reciprocating Compressor   High Level       SIL2    None    SIL1
                           HP discharge     SIL2    None    SIL1
Table 10: Common SIL Targets for Rotating Equipment

Naturally, each company must review each application individually to avoid either over-specification or, more imperative for safety, under-specification of turbomachinery SIL.

ECONOMIC JUSTIFICATION OF SIL

Based on the previous publication on turbomachinery failures at refineries [R3] and a general review of their applications, the following table summarizes where the SIL principles could be applied.

Equipment            Event                 SIL Case?    Issue
Charge Pump          Major Fire            Yes          Failure of lube oil trip
Pump A               Vapor Cloud           Yes          Seal pressure trip
Pump B               Vapor Cloud Release   Yes          Seal pressure interlock
Unit Screw Comp      Major Fire            Yes          Failure of lube oil trip
Pump                 Major Fire            Mech         Vibration?
Pump                 Major Fire            Yes          Vibration trip
Coker Wet Gas Comp   Vapor Cloud Release   Yes          Vibration trip
Crude Pump           Major Fire            Mech issue   Vibration?
FCC Aux Air Blower   Fire                  Yes          Surge detection
FCC Main Air Blower  Near Miss             Yes          Integrity issue of interlock
Table 11: Recent Rotating Equip Failures Five Refineries Summary of SIL Issues

Of those, the majority may have been prevented by following better SIL functional safety management.

In addition to the incidents listed above, the same could be said for the 110 steam turbine overspeed incidents [R2]. The following table summarizes primary failure modes for the interlocks in those cases.

Interlock Failure Classification        Cases   %
Random Hardware Failure                 22      20.0%
Systematic Failure: Analysis            9       8.2%
Systematic Failure: Design              15      13.6%
Systematic Failure: Commissioning       2       1.8%
Systematic Failure: Operational         10      9.1%
Systematic Failure: Testing             25      22.7%
Failure cause not stated                27      24.5%
Cases total                             110     100%
Table 12: Summary of Steam Turbine Overspeed Failures

As the table shows, systematic failures account for over 50% of the overspeed cases (61 of 110). More detailed procedures, verification, and 3rd party assessments should minimize such accidents.

SIL Economics – Justification for Investment

The 36 serious accidents that resulted from turbine overspeed errors can be broken down into two general categories, namely:

Functional Specification

These are the actions needed to prevent the incident that are laid out by the safety controls engineer in the form of detailed requirements, such as inputs and outputs and their relationships, response time, trip points, etc. In an HSE study [R8], two examples (reactor and material handling) presented had an incorrect safe state action after the hazard was detected.

Integrity Specification

This involves the "failure free" operation of the safety system. The failures could be either random (component failures) or systematic (based on procedures). Two random failure examples include circuit board and motor contactor failures.

As an economics example based on a functional specification error, consider the following catastrophic event scenario. You have the potential for a large loss of $10 million based upon an initiating event that could occur every ten years. If your corporate tolerable frequency limit for an event of this magnitude is once in ten thousand years, then you are accepting the following risk targets:

Tolerable Cost/yr = $10 million / 10,000 years = $1,000/yr
Chance of Plant Accident in 50 years = 50 yr * (accident / 10,000 years) * 100 = 0.5%

To lower the frequency to this tolerable level, a combination of safeguards will need to be in place which prevents the initiating event from causing the accident at least 999 out of 1,000 times. Naturally, the more independent non-SIS safeguards that you have in place, the lower the safety system's SIL rating needs to be. This is shown below in the two left hand columns. The table also shows the escalating cost associated with making a specification error such that the SIS does not prevent the accident even when it works according to design.
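The risk targets in the economics example, and the loss-of-SIS exposure figures reported in Table 13, can be reproduced with a few lines of arithmetic. The Python sketch below is an illustration under stated assumptions rather than the authors' calculation: it takes the $10 million loss, the ten-year initiating event, and the 10,000-year tolerable frequency from the text, assumes a 50-year plant lifetime and a risk reduction factor of 10 per non-SIS IPL (as in the Table 13 footnote), and uses the same linear frequency-times-years approximation as the paper, capped at 100%. All function and constant names are hypothetical.

```python
LOSS_USD = 10_000_000        # consequence of the event, from the example
INITIATING_FREQ = 1 / 10     # initiating event once every ten years
TOLERABLE_FREQ = 1 / 10_000  # corporate tolerable frequency limit
LIFETIME_YR = 50             # assumed plant lifetime
IPL_RRF = 10                 # assumed risk reduction per non-SIS IPL

def annual_cost(freq_per_yr):
    """Annualized loss for an event occurring at the given frequency."""
    return LOSS_USD * freq_per_yr

def chance_in_lifetime(freq_per_yr, years=LIFETIME_YR):
    """Linear approximation of the lifetime accident chance, capped at 1.0."""
    return min(1.0, freq_per_yr * years)

def exposure_without_sis(non_sis_ipls):
    """Cost, lifetime chance, and cost over tolerable when the SIS gives no protection."""
    freq = INITIATING_FREQ / IPL_RRF ** non_sis_ipls
    cost = annual_cost(freq)
    return cost, chance_in_lifetime(freq), cost - annual_cost(TOLERABLE_FREQ)

tolerable_cost = annual_cost(TOLERABLE_FREQ)        # tolerable cost per year
required_rrf = INITIATING_FREQ / TOLERABLE_FREQ     # overall risk reduction needed

print(f"Tolerable cost: ${tolerable_cost:,.0f}/yr, required overall RRF: {required_rrf:,.0f}")
for ipls in (2, 1, 0):
    cost, chance, increase = exposure_without_sis(ipls)
    print(f"{ipls} non-SIS IPLs: ${cost:,.0f}/yr, {chance:.1%} lifetime chance, "
          f"${increase:,.0f}/yr over tolerable")
```

The linear approximation overstates the lifetime chance once the product of frequency and years approaches 1, which is why the zero-IPL case is capped and reported in the paper as "Near 100%".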
                             With Loss of SIS
Non-SIS IPLs*   SIS Rating   Cost             Chance of Accident in Lifetime   Increase in Cost Over Tolerable
2               SIL1         $10,000/yr       5%                               $9,000/yr
1               SIL2         $100,000/yr      50%                              $99,000/yr
0               SIL3         $1,000,000/yr    Near 100%                        $999,000/yr
* Non-SIS IPLs (i.e. Basic Process Control System, operator intervention, pressure relief valves, and deluge systems) are typical non-SIS safeguards for many companies. Each safeguard is assumed to have a Risk Reduction Factor (RRF) of 10.
Table 13: Exposure with Loss of SIS Protection

As the table dramatically shows, it is increasingly important to get the functional specification correct when working with higher SIL values. This is why the ISA 84-2004 safety lifecycle design stages and the methods to reduce systematic errors are based upon each safety function’s SIL value. The higher the SIL, the more strenuous your safety lifecycle reviews and cross checks must be to ensure that systematic errors are kept in check.

As we noted, the safety lifecycle deals with both random and systematic causes of accidents. The random part is addressed and managed with the part of SIL related to the average probability of failure on demand (PFDavg) and risk reduction factor. But since it represents only part of the safety system specification and only part of the sources for dangerous errors, more is needed. Other systematic problems, such as failing to consider alternate paths to the accident or not fully specifying all of the elements of the safety function, can kill people just as dead. Thus, the functional safety management parts of the safety lifecycle that address these systematic errors are vitally needed to provide the required risk reduction.

EXAMPLE: THE COMPLETE SIL PROCESS APPLIED TO A HIGH HORSEPOWER TURBINE

These unique SIL processes were recently utilized on a critical high horsepower turbine application and will be generally discussed so that the overall safety lifecycle can be understood.

1. Utilizing a HAZOP to Identify a High Severity Hazard

Per OSHA 1910.119 regulatory requirements for review of process hazards every 5 years, a HAZOP was conducted and the team concluded that a loss of load based on a coupling failure would result in turbine overspeed. Such an event was considered significant and had severe personnel injury and mechanical impacts.

2. Applying a LOPA Review to Quantify Risk

Since the user wanted to follow Recognized and Generally Accepted Good Engineering Practice (RAGAGEP) for risk mitigation, LOPA practices were adapted to further quantify risk exposure. The turbine overspeed scenario met the criteria for further detailed analysis through LOPA due to its severe safety consequences.

Based on the cause, consequence severities, and safeguards stated in the HAZOP, the following LOPA worksheet was completed by a competent team during a LOPA workshop.

Figure 3: LOPA Tool Output for High HP Turbine

After reviewing the current safeguards and determining that none besides the overspeed trip were effective to prevent the hazard, the LOPA output specified a SIL2 requirement for the trip system.

3. Defining SIL Requirements in the SRS

Since the overspeed interlock was now SIL2 classified, documentation was developed to specify its performance requirements. In all, there were twenty-seven requirements documented to meet the ISA84 functional safety standard.

The most stringent SRS requirement was the process safety time of only 50 milliseconds, due to such a quick load release on a turbine of such horsepower. Like all the SRS requirements, such an accelerated response would need to be validated during the pre-startup safety review and all proof tests in the future.

4. Performing SIL Verification to Prove that SIL2 was Attained

To prove that the overspeed trip met the SIL2 risk target, reliability calculations were performed as the next step in the functional safety lifecycle. Such calculations were based on the components selected, their voting architecture, diagnostics applied, and finally, testing and replacement intervals.

The specific components that made up the overspeed trip system included the magnetic pickup sensors, logic solver (i.e. Safety PLC or SIS), the trip & throttle shutoff valve, and any interface components in between. There initially was no concern in meeting the SIL2 target since the overspeed system applied SIL3 certified electronics and the final element was partial stroke tested.

Data for the certified devices was readily available in the vendor’s product safety manuals. It should be noted that the two (2) commonly accepted functional safety assessment agencies include
TUV and exida Certification and the standard adhered to is IEC61508 [R9].

Since the trip & throttle valve had not been certified and the manufacturer had no failure mode specific data, the SIS project engineer contracted a Failure Modes, Effects, and Diagnostics Analysis (FMEDA). This analysis was specific to the OEM’s valve assembly and therefore resulted in a precise, yet conservative data set to be used in the SIL verification calculations. Otherwise, conservative data based on generic components would have been used and the SIL2 risk target likely would not have been met.

Another SIL2 issue surfaced when an emergency trip device (ETD) was discovered in the turbine mechanical drawings. The ETD was critical to the overspeed trip since it acted as an interface component for dumping the hydraulic power fluid. Since it could not be tested by the partial stroke apparatus, it became a SIL2 limiter and the overspeed system became degraded to SIL1. Fortunately, the design team found an alternative solution to avoid adding an inline steam valve (~ $300k).

The team utilized a commercially available software platform to perform the calculations. SIL2 results were achieved by using partial stroke testing and accounting for specific overspeed failure modes where a significant leak was required to fit the scenario stated. The results are shown in Figure 4.

Figure 4: SIL Verification Tool Output for High HP Turbine

With the upgraded model, this overspeed trip met SIL2 and had a 7.15 year spurious trip rate.

5. Periodic Maintenance & Testing

Once operational, periodic procedures were to be developed and performed as per the SIL verification and manufacturer's requirements. Any component failures will be documented and also compared to data utilized in the original study to ensure that the risk target is continually achieved.

By applying and documenting each of these ISA84 safety lifecycle steps, the user felt assured that they had met current RAGAGEP and underwriter requirements.

CONCLUSION

SIL is here to stay; get on-board

Taken individually, each of the guidance measures presented in the earlier chapters should make “good engineering sense”. But depending upon where each company is in their functional safety lifecycle development, the sum of the measures may be overwhelming. The key takeaway is this: each progressive step forward makes our industry a safer one.

Applied properly, SIL knowledge will be an advantage

Although the task of ISA84 compliance can seem daunting, it is worth the effort. With a growing public risk aversion, the process industry cannot be satisfied with an “it’s never happened here before” safety culture.

Each progressive measure taken in ISA84 compliance is fully worth the investment. Although most responsible facilities want to get there immediately, a 6 to 10 year full implementation is expected. The key takeaway is this: each step forward makes our industry a safer one.

ABBREVIATIONS
API – American Petroleum Institute
ESD – Emergency Shutdown Device
ISA – International Society of Automation
FSM – Functional Safety Management
LOPA – Layer of Protection Analysis
SLC – Safety Lifecycle
SIF – Safety Instrumented Function
SIL – Safety Integrity Level
SRS – Safety Requirements Specification

REFERENCES
[R1] API Standard 670, Machinery Protection Systems, 5th edition, November 2014
[R2] Clark, Steve, Steam Turbine Overspeed Incidents (Listing), September 2009
[R3] Clark, Steve, CLARK Generic List of Recent Rotating Equipment Failures, Nov 2009
[R4] 29 CFR Part 1910.119, Process Safety Management of Highly Hazardous Chemicals, U.S. Federal Register, Feb. 24, 1992, http://www.osha.gov
[R5] ANSI/ISA SP84.00.01 – 2004 (IEC 61511 Mod.), Application of Safety Instrumented Systems for the Process Industries, NC: Raleigh, ISA, 2004.
[R6] Goble, W. M. and Cheddie, Harry, Safety Instrumented Systems Verification - Practical Probabilistic Calculations, NC: Research Triangle Park, ISA, 2005
[R7] Miller, Curtis, Win/Win: A Manager’s Guide to Functional Safety, 1st Edition, 2008
[R8] Out of Control: Why Control Systems go Wrong and How to Prevent Failure, U.K.: Sheffield, Health & Safety Executive, 1995
[R9] IEC 61508, Functional Safety of electrical / electronic / programmable electronic safety-related systems, Geneva: Switzerland, 2010.