Shell Moerdijk Cast Nancy Leveson
Moerdijk Accident
Nancy G. Leveson
Massachusetts Institute of Technology
This analysis was created for a benchmarking exercise led by the E.U. Major
Accident Hazards Bureau (MAHB), 2016–2017.
INTRODUCTION TO CAST
CAST (Causal Analysis based on Systems Theory) is an accident analysis technique using the STAMP
(Systems-Theoretic Accident Model and Processes) accident causality model. Traditionally, accidents
have been thought of as resulting from a chain of failure events, each event directly related to the event
that precedes it in the chain. For example, water gets into a tank causing corrosion which leads to
weakened metal. When the weakened metal is combined with a certain pressure in the tank, the tank
may explode leading to injuries and deaths. A comprehensive critique of event chain models is beyond
the scope of this document. A detailed discussion can be found in Leveson [2012]. The biggest problem
with the chain-of-events model is what it omits.
STAMP extends this model of accident causation: it includes the chain-of-events model as one
subcase but also covers the causes of accidents that do not fit within that model, particularly those that
occur in the complex sociotechnical systems common today. These causes (in addition to component
failure) include system design errors, unintended and unplanned interactions among system
components (none of which may have failed), flawed safety culture and human decision making,
inadequate controls and oversight, and flawed organizational design. In STAMP, accidents are treated as
complex processes rather than simply chains of failure events.
Most safety engineering techniques used today are based on reliability theory and focus on failures.
They treat software, human, and organizational behavior as exhibiting random failures and assume that
individual errors are independent, an assumption made to keep the mathematics feasible.
This approach is much too simplistic to account for the safety culture flaws and poor decision making
often involved in major losses. These losses are not the result of a simple summation of individual
component (human or technical) failures. For example, in the Deepwater Horizon (DWH) accident, the
blowout preventer (BOP) was recognized as critical and had redundant components to make it operate
very reliably. What was not accounted for was decision making based on profitability and other social
and time pressures that would result in inadequate maintenance of the BOP and decisions to put off
replacing the BOP batteries. Also not accounted for were the common-mode failures of the redundancy
in the BOP (i.e., design errors in the attempts to make the BOP very reliable), which resulted from
inadequate engineering and underestimation of the risk for such failures. And, of course, the BOP
inadequacies are only a tiny part of the problems that occurred in the DWH accident. The social and
managerial deficiencies in DWH eclipse the engineering and maintenance flaws.
In contrast, STAMP is based on systems theory and focuses on control. Informally, systems theory
has four basic concepts: hierarchy, emergence, communication, and control [Checkland 1981, Leveson
2012].
• Hierarchy: A general model of complex systems can be expressed in terms of a hierarchy of levels of
organization. An example of a hierarchical safety control structure, for a typical regulated industry in the
U.S., is shown in Figure 1. Notice that the operating process (the focus of most
hazard analysis) in the lower right of the figure makes up only a small part of the safety control
structure. There are two basic hierarchical control structures shown in Figure 1—one for system
development (on the left) and one for system operation (on the right)—with interactions between
them. Each level of the structure contains controllers with responsibility for control of the behavior of
the components at the level below as well as their interactions. Higher level controllers may provide
overall safety policy, standards, and procedures (downward arrows), and get feedback (upward arrows)
about their effect in various types of reports, including incident and accident reports. The feedback
provides the ability to learn and to improve the effectiveness of the safety controls.
There is usually interaction between the control structures. Manufacturers must communicate to
their customers the assumptions about the operational environment on which the original safety analysis
was based, e.g., maintenance quality and procedures, as well as information about safe operating
procedures. The operational environment, in turn, provides feedback to the manufacturer about the
performance of the system during operations. Each component in the hierarchical safety control
structure has responsibilities for enforcing safety constraints appropriate for that component, and
together these responsibilities should result in enforcement of the overall system safety constraints.
it interacts with other plant components in a particular way. Emergent properties associated
with the behavior of components at one level in a hierarchy are related to constraints upon the
degree of freedom of those components. Example constraints are that pressure in a chemical
reactor must never be allowed to rise above a particular level and that communities near plants
producing potentially toxic chemicals must have contingency plans in place to deal with an
accidental release. Controls need to be created to ensure that the safety constraints are
enforced.
• Control: Control involves the imposition of constraints upon the activity at a lower level of the
hierarchy, i.e., at the interfaces between levels. The imposition of safety constraints on the
behavior of the system components plays a fundamental role in a systems approach to safety. A
physical example of a typical control for a chemical plant is a pressure relief valve. An
organizational example is the safety engineering department creating standards for safe
operation. Finally, a social control example is a regulator providing oversight and certification of
plant activities and policies. Losses occur when the controls are not adequately designed or
enforced, resulting in violation of the safety constraints.
• Communication: Control implies the need for communication between levels of the hierarchy
and between the components at each level.
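The hierarchy, control, and communication concepts above can be sketched as a small data structure. This is only an illustration: the class name, the field names, and the three-level example are assumptions made here and are not taken from Figure 1. Each controller issues constraints downward and expects feedback upward, and the helper flags controllers that control a lower level but receive no feedback from it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Controller:
    """One component in a hypothetical hierarchical safety control structure."""
    name: str
    constraints_issued: List[str] = field(default_factory=list)  # downward arrows
    feedback_received: List[str] = field(default_factory=list)   # upward arrows
    children: List["Controller"] = field(default_factory=list)

    def add(self, child: "Controller") -> "Controller":
        # Attach a lower-level component and return it so calls can be chained.
        self.children.append(child)
        return child


def controllers_without_feedback(root: Controller) -> List[str]:
    """Return names of controllers that control lower levels but get no feedback."""
    missing = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children and not node.feedback_received:
            missing.append(node.name)
        stack.extend(node.children)
    return sorted(missing)


# Illustrative three-level structure (all names are assumptions).
regulator = Controller("Regulator", ["regulations"], ["accident reports"])
company = regulator.add(Controller("Company management", ["safety policy"], []))
company.add(Controller("Operating process"))

print(controllers_without_feedback(regulator))
```

A check like this only finds structurally missing feedback channels; it says nothing about feedback that exists but is incorrect or delayed.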
This type of model is similar to Rasmussen’s model of sociotechnical control [Rasmussen 1997]
except that he includes only the operational aspects of the system while treating engineering and
manufacturing activities only as inputs to the model, and he ties his model to an event chain. See the
AcciMap model of the Shell Moerdijk accident being produced independently for this benchmarking
effort, which is based on Rasmussen’s model.
Note that the use of the term “control” does not imply a rigid command and control structure.
Behavior is controlled not only by engineered systems and direct management intervention, but also
indirectly by policies, procedures, shared value systems, and other aspects of the organizational culture.
All behavior is influenced and at least partially “controlled” by the social and organizational context in
which the behavior occurs. Engineering (i.e., designing) this context can be an effective way to create
and change a safety culture, i.e., the subset of organizational culture that reflects the general attitudes
toward and approaches to safety of the participants in the organization or industry [Schein 1986]. Formal
modeling and analysis of accidents/incidents must include these social and organizational factors and
cannot be effective if it focuses only on the technical aspects of the system. As we have learned from
major accidents in the oil and gas industry and most other industries, managerial and organizational
factors are as important as technical factors in accident causation and prevention.
For space reasons, Figure 1 emphasizes the high-level components of a safety control structure and
not their detailed design. The detailed design of the operating process (lower right-hand box) can be
quite complex, such as the detailed design of the physical chemical plant itself and the operations in the
plant.
Figure 2 shows the basic form of the interactions between the levels of the control structure, where
the controller imposes control actions on the controlled process. The standard requirements for
effective management—assignment of responsibility, authority, and accountability—are part of the
control structure design and specification.
Figure 2. A Simple Feedback Control Loop showing the relationship to
standard management concepts of responsibility, authority, and accountability
The controller uses information about the current state of the controlled process, usually derived at
least partially from feedback. In STAMP, feedback information is incorporated into the controller’s
model of the controlled process, called the process model or, if the controller is a human, the
mental model. Accidents often result when the controller’s process model becomes
inconsistent with the actual state of the process and the controller provides unsafe control as a result.
For example, the controller thinks that catalyst has been added to the reactor when, in fact, it has not.
Other examples are that the manager of a plant undergoing restart after a maintenance stop believes
the operators have adequate training and expertise to perform the operation safely when they do not, or
an operator thinks that the pressure in a reactor is within a safe limit when it is in the danger zone.
The problems occur not just with inconsistency between the controller’s process model and the state
of the controlled process but also when different operators, all involved in the same general task—
particularly under safety-critical or emergency conditions—are operating with different mental models
of either (a) what the system is currently doing, or (b) what should be done to control it.
Process models are kept up to date through feedback or from information received externally. A
common factor in accidents is that appropriate feedback or other information about the controlled
process is incorrect, missing, or delayed.
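To make the process-model idea concrete, here is a minimal sketch. The class name, the pressure values, and the safe limit are all invented for illustration; nothing in it is taken from the accident report. It shows a controller whose process model goes stale when feedback is missing, so that a decision that is reasonable given its beliefs is unsafe given the actual state of the process.

```python
from typing import Optional


class ReactorController:
    """Hypothetical controller that acts on its process model, not on the plant."""

    def __init__(self, believed_pressure: float, safe_limit: float) -> None:
        self.believed_pressure = believed_pressure  # the process model
        self.safe_limit = safe_limit

    def receive_feedback(self, measured_pressure: Optional[float]) -> None:
        # Missing feedback (None) leaves the process model unchanged, i.e. stale.
        if measured_pressure is not None:
            self.believed_pressure = measured_pressure

    def decide(self) -> str:
        # The decision depends only on what the controller believes.
        if self.believed_pressure < self.safe_limit:
            return "continue heating"
        return "stop heating"


controller = ReactorController(believed_pressure=10.0, safe_limit=20.0)
actual_pressures = [12.0, 18.0, 25.0]  # made-up plant states
feedback = [12.0, None, None]          # feedback is lost after the first reading

for actual, reading in zip(actual_pressures, feedback):
    controller.receive_feedback(reading)
    action = controller.decide()
    unsafe = action == "continue heating" and actual >= controller.safe_limit
    print(actual, controller.believed_pressure, action, "UNSAFE" if unsafe else "ok")
```

In the last step the controller has not "failed" in any component sense: it follows its decision rule correctly, but on a model that no longer matches the process.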
There are four types of unsafe control actions:
• A provided control action leads to a hazard
• Not providing a necessary control action leads to a hazard
• A control action provided with wrong timing (early, late) or in the wrong order leads to a hazard
• A continuous control action provided for too long or too short a time leads to a hazard
These four types of unsafe control actions, along with the hierarchical safety control structure, can be
used after an accident to generate the causal scenarios that led to it or to identify future potential
accident scenarios so they can be eliminated or mitigated.
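As a rough illustration of how the four types can be used to prompt scenario generation, the sketch below enumerates them and produces one question per type for a given control action. The enum member names, the helper function, and the example control action are assumptions made for this sketch, not part of STAMP or CAST terminology.

```python
from enum import Enum
from typing import List


class UnsafeControlAction(Enum):
    PROVIDED = "a provided control action leads to a hazard"
    NOT_PROVIDED = "not providing a necessary control action leads to a hazard"
    WRONG_TIMING_OR_ORDER = "provided too early, too late, or in the wrong order"
    WRONG_DURATION = "a continuous action applied for too long or too short a time"


def prompts_for(control_action: str) -> List[str]:
    """Generate one analysis question per unsafe-control-action type."""
    return [
        f"{control_action}: {uca.name} -> could this lead to a hazard?"
        for uca in UnsafeControlAction
    ]


# "open relief valve" is a hypothetical control action used only as an example.
for question in prompts_for("open relief valve"):
    print(question)
```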
The use of the process model concept is a much better way to understand why humans or software
may have done the wrong thing and how to prevent such events in the future than simply saying the
human or software “failed,” which only attaches a pejorative word without providing any insight about
why the person or software did something dangerous.
CAST is a method for analyzing accidents with the STAMP causality model as its foundation. Hence,
it assumes accidents are caused by a lack of effective enforcement of safety constraints on the system
behavior to prevent hazardous states (conditions). CAST takes a “systems thinking” view of accidents
using the following assumptions:
• Accidents are complex and do not have single or even several “root causes.” A root cause is
often defined as an event in the event chain whose removal would prevent the final
undesirable consequence. In practice, the root cause is usually identified by going back in the
event chain until some event can be labeled as the root cause. Rasmussen suggests that a
practical explanation for why actions by operators actively involved in the dynamic flow of
events are so often identified as the cause of an accident is the difficulty in continuing the
backtracking “through” a human [Rasmussen 1990]. The concept of a root cause is seductive
because it gives us an illusion of control. It often leads to a sophisticated “whack-a-mole”
process that results in fixing symptoms but not the flaws that led to those symptoms. The result
may be an organization or industry that is in continual fire-fighting mode.
Major accident causes never consist of just a few limited factors: Almost always there is
unsafe behavior by the operator, flawed management decision making, flaws in the physical
design of the equipment as well as flaws in the engineering process used to design that
equipment, safety culture problems, regulatory deficiencies (if the industry is regulated), and
unsafe interactions among all these factors or components of the system. Some accident
analysis techniques focus on one aspect, such as human factors, system component failures or
failures of barriers. Instead, we need accident analysis techniques that allow us to consider all
the factors that may be involved in a loss, including social, managerial, organizational, human,
and technological and their interactions and relationships, including indirect relationships.
In STAMP, the root cause of all accidents is the same: The design and/or operation of the
safety control structure were inadequate to prevent the loss or near miss. The goal of accident
analysis, then, is to identify the flaws in the safety control structure that allowed the events to
occur and to learn how to strengthen the controls to prevent similar losses from occurring in
the future.
• Blame is the enemy of safety [Leveson 2012]. Blame is a matter for courts. Engineers and
managers need to understand why something occurred, not who to blame. While punishing the
person or people involved may be satisfying, it does not fix the reasons why they did what they
did and does not prevent similar events in the future. Too often, an accident investigation stops
after assigning blame and does not provide enough information to eliminate the basic social
and technical factors involved. In addition, blame can be counterproductive. For example, it can
lead to finger pointing and hiding important information during investigations.
• Human error is a symptom of a system that needs to be redesigned. Most accidents are blamed
on the operators. Human behavior, however, is always influenced by the context in which it
occurs. That context usually has physical, organizational, psychological, and social aspects that
affect the behavior. Trying to change behavior without changing the context in which it
occurs is usually doomed to failure. Identifying the contextual aspects of human behavior
involved in an accident is necessary to learn what to change to prevent such behavior in the
future.
• Hindsight bias hinders learning from accidents. For the most part, humans are trying to do the
right thing and do not purposely do something that will injure themselves or others. After an
accident, it is easy to see where people went wrong, what they should have done or not done,
to judge people for missing a piece of information that turned out to be critical, and to blame
them for not foreseeing or preventing the consequences [Dekker 2006]. Before the event, such
insight is difficult and, usually, impossible. The report on the Clapham Junction railway accident in
Britain concluded: “There is almost no human action or decision that cannot be made to look flawed
and less than sensible in the misleading light of hindsight” [Hidden 1990]. Saying that a person
did something wrong provides very little useful information about how to eliminate that
behavior. The common phrases in accident reports like “he could have,” “she should have,” or
“if he or she would have” all indicate instances of hindsight bias. To improve safety, it is
necessary to start with the premise that except for a few sociopathic individuals, nobody
purposely engages in behavior that they think will lead to an accident. For maximum learning
from the loss, we need to get rid of hindsight bias in our accident reports and go beyond listing
what people did wrong and ask why it made sense to the person to do what they did [Dekker
2006]. Factors that can influence behavior include conflicting goals (e.g., safety vs. efficiency),
unwritten rules or norms, lack of information observability (information may be available but
not observable for many reasons), productivity pressures, attentional demands, and
organizational context. CAST attempts to eliminate hindsight bias as much as possible from
accident analysis and identify why the unsafe (in retrospect) behavior occurred.
The basic process in a CAST analysis begins with creating the safety control structure as it existed at the
time of the loss. The analysis then proceeds as follows:
1. Starting at the bottom of this structure (the physical process involving the loss), identify the
failures and unsafe interactions involved in the loss events (e.g., explosion) as well as any
physical controls that were designed to prevent the specific loss events that occurred. Why
were they not effective?
2. Next, starting with the controller(s) immediately above the physical process and moving in
turn upward in the control structure, identify
a. The controller’s responsibilities related to preventing the loss
b. Their unsafe control actions or lack of actions
c. Why they behaved unsafely
i. Process model flaws
ii. Contextual factors
3. Identify other factors that affected the behavior and interactions among the safety control
structure components including
a. Industry and organizational safety culture
b. Safety information system
c. Communication and coordination among controllers
d. Dynamics and changes over time
4. Generate recommendations that will eliminate or reduce the unsafe behavior. These will often
involve missing feedback.
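The steps above can be sketched as a simple record per controller plus a bottom-up walk. This is a minimal sketch under stated assumptions: the class, its field names, and the placeholder controllers are hypothetical and do not describe the Shell Moerdijk structure. The helper flags unsafe control actions for which no explanation (step 2c) has yet been recorded, which is the kind of open question an investigation would need to pursue.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ControllerAnalysis:
    """Step 2 record for one controller (field names are illustrative)."""
    name: str
    level: int  # 1 = directly above the physical process
    responsibilities: List[str] = field(default_factory=list)        # step 2a
    unsafe_control_actions: List[str] = field(default_factory=list)  # step 2b
    process_model_flaws: List[str] = field(default_factory=list)     # step 2c-i
    contextual_factors: List[str] = field(default_factory=list)      # step 2c-ii


def open_questions(analyses: List[ControllerAnalysis]) -> List[str]:
    """Walk upward and flag unsafe control actions with no recorded explanation."""
    questions = []
    for item in sorted(analyses, key=lambda a: a.level):
        explained = item.process_model_flaws or item.contextual_factors
        if item.unsafe_control_actions and not explained:
            questions.append(f"Why did {item.name} behave unsafely?")
    return questions


# Placeholder data only; these are not findings about the accident.
analyses = [
    ControllerAnalysis("Controller B", 2,
                       unsafe_control_actions=["placeholder action"]),
    ControllerAnalysis("Controller A", 1,
                       unsafe_control_actions=["placeholder action"],
                       contextual_factors=["placeholder factor"]),
]
print(open_questions(analyses))
```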
The rest of this report provides an example of CAST applied to an explosion in a Shell chemical plant in
the Netherlands.
CAST ANALYSIS OF THE ACCIDENT
This CAST analysis example is based on the official accident report by the Dutch Safety Board
[2015]. Unfortunately, a lot of important information needed for the CAST analysis could not be
obtained after the completion of the investigation. Where this occurred, the questions that would have
been prompted if CAST had been used during the investigation are instead inserted. One of the
important tasks of an accident analysis method is to identify the relevant questions to be asked by the
investigators.
BACKGROUND (see footnote 1)
On 3 June 2014, an explosion and fire occurred at the Shell Moerdijk plant in the Netherlands. Shell
Moerdijk produces chemicals, such as ethylene and propylene, used to manufacture plastic products.
Heat is first used to convert gasoil, naphtha, and LPG into a wide variety of chemicals. These chemicals
are then used, among other things, as raw materials to produce other products at Shell Moerdijk,
including those produced by the styrene monomer and propylene oxide (MSPO) plant involved in the
accident.
Shell has two MSPO plants in Moerdijk: MSPO1 (commissioned in 1979) and MSPO2. The accident
took place in the MSPO2 plant, which was designed in 1996 by the predecessor of what is now called
Shell Projects and Technology (see footnote 2), the license-holder for the process. On the basis of a user agreement,
Shell Moerdijk is responsible for the operation of the MSPO2 plant.
The MSPO plants produce styrene monomer and propylene oxide using ethylbenzene as the raw
material. Styrene monomer is used for the production of polystyrene, a plastic that is used in a wide
range of products such as polystyrene foam. Propylene oxide is used for the production of propylene
glycol, which is used in food, cosmetics, medicines and other products.
Worldwide, Shell has three more plants in which styrene monomer and propylene oxide are
produced by means of a process that is virtually the same as at the MSPO2 plant. Two plants are located
in Singapore, at a site called Seraya, and one plant is in Nanhai, China.
In general terms, styrene monomer and propylene oxide are produced as follows (Figure 2; see footnote 3):
Ethylbenzene reacts with oxygen and is converted into ethylbenzene hydroperoxide. The
ethylbenzene hydroperoxide then reacts with propylene with the help of a catalyst (see footnote 4) and is converted
into propylene oxide, methylphenylcarbinol, and methylphenyl ketone. The methylphenyl ketone is a
by-product of this reaction. In the last step, the methylphenylcarbinol is converted into styrene
monomer. The by-product methylphenyl ketone is also converted into methylphenylcarbinol in a
separate process step with the help of a different catalyst. It was in this final step of the process that the
accident occurred.
The explosion was in the hydrogenation unit (Unit 4800) of the MSPO2 plant. In the reactors of Unit
4800, hydrogen is used along with a catalyst to convert methylphenyl ketone into methylphenylcarbinol.
This conversion, using hydrogen, is known as hydrogenation. The reaction with hydrogen in Unit 4800
releases heat, which is dissipated by allowing liquid ethylbenzene to flow along the catalyst in the
reactors. The process is called an “exothermic hydrogenation reaction.” It requires a pressure increase in
the reactor. Because hydrogen is very flammable when combined with the increased pressure, fire can
occur in the event of a leak. This hazard places important safety requirements on the design and
operation of the Unit.
Footnote 1: Most of this section is taken directly from the Dutch Safety Board’s Accident Investigation Report.
Footnote 2: Because I do not know the name of the predecessor organization, it will be referred to by the current name,
Shell Projects and Technology, in this analysis.
Footnote 3: I know very little about chemical engineering so I made up a notation for the figure that made sense to me.
There is probably a standard notation used by chemical engineers.
Footnote 4: A catalyst is a substance that influences the rate of a specific chemical reaction.
In general terms, Unit 4800 consists of two reactors, two separation vessels, a combined
installation with which a liquid can be heated or cooled, and an installation for condensing the gas flow.
The various parts of the Unit 4800 installation are interconnected by pipes and one central pump. See
Figure 3.
Figure 3. Unit 4800 during normal production [taken from DSB report]
Liquids and gases from the reactor are separated from each other in the separation vessels. The
gases from the first separation vessel go to reactor 2, and the gases from the second separation vessel
go to the flare (combustion). In order for the separation vessel to function properly, it is important to
achieve the correct ratio of gas and liquid. Various safety devices are used to achieve this goal.
The reactors contain a catalyst. The catalyst is used to accelerate the reaction between the
substances being used in the reactors. In Unit 4800, the catalyst is in the form of cylindrical catalyst
pellets. These are composed of different elements, including copper, chromium and barium. After a
number of years of production, the effectiveness of the catalyst declines and it has to be replaced. The catalyst
pellets are replaced during a brief maintenance stop. The replacement of the pellets was uneventful in
this case.
After the catalyst pellets have been replaced, Unit 4800 has to be restarted. This restart involves
several steps: (1) release the oxygen from the Unit and then test for leaks; (2) flush the Unit with
ethylbenzene to remove contamination; (3) fill the Unit with clean ethylbenzene and start circulating the
ethylbenzene (called the circulation phase); (4) heat up the Unit (the reheating phase); and (5) reduce
the catalyst using hydrogen (the reduction phase).
Circulating the ethylbenzene and heating the Unit (Steps 3 and 4) are necessary in order to wet the
catalyst pellets and to raise the Unit temperature to a level that facilitates the reduction of the catalyst.
The accident occurred during the reheating phase (Step 4).
Thoroughly wetting the catalyst pellets in a trickle-bed reactor (see footnote 5) is critical. Wetting involves fully
soaking the catalyst with ethylbenzene and keeping the pellets continuously wet. If there are localized
dry zones, the heat released from a reaction cannot dissipate. The result can be an undesirable rise in
the temperature of the reactors. To ensure the catalyst pellets are thoroughly wetted, enough
ethylbenzene and nitrogen must be allowed to flow through the reactors and the ethylbenzene must be
well distributed. These requirements are achieved by feeding ethylbenzene (liquid) and nitrogen (gas) in
the correct ratios through a distribution plate in the reactors, creating a “shower effect” that distributes
the liquid optimally across the catalyst pellets.
Catalyst reduction (the fifth step in restarting the reactor) can begin once the plant is at the correct
temperature and hot ethylbenzene has been circulated through it for at least 6 hours. Unit 4800 never
reached this step on the evening of the accident due to explosions and fire during the heating phase.
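The restart sequence and the stated preconditions for catalyst reduction can be sketched as follows. This is an illustrative sketch only: the enum and function names are invented here, and because the text does not give the target temperature, that condition is passed in as a boolean rather than computed. Only the five steps and the 6-hour minimum come from the description above.

```python
from enum import IntEnum


class RestartStep(IntEnum):
    PURGE_OXYGEN_AND_LEAK_TEST = 1
    FLUSH_WITH_ETHYLBENZENE = 2
    CIRCULATION = 3
    REHEATING = 4
    REDUCTION = 5


MIN_HOT_CIRCULATION_HOURS = 6  # "at least 6 hours" in the text


def may_start_reduction(reached: RestartStep, at_target_temperature: bool,
                        hot_circulation_hours: float) -> bool:
    """Check the stated preconditions for beginning catalyst reduction."""
    return (reached >= RestartStep.REHEATING
            and at_target_temperature
            and hot_circulation_hours >= MIN_HOT_CIRCULATION_HOURS)


# Hypothetical inputs: reheating reached, temperature reached, 5.5 h of hot circulation.
print(may_start_reduction(RestartStep.REHEATING, True, 5.5))
```

A check of this kind captures only the conditions named in the procedure; it does not represent the wetting and flow-ratio requirements described earlier.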
Footnote 5: Trickle-bed reactors (used in Unit 4800) have “open” columns filled with catalyst in which a gas and a liquid
flow together in the same direction under the influence of gravity.
Footnote 6: The term “incident” is defined so differently in different fields that it will be avoided here.
3. Means must be available, effective, and used to treat exposed individuals inside or outside the
plant.
After the hazards and safety constraints are identified, the safety control structure at the time of
the accident can be modeled, showing the controls in place to enforce the constraints. The goal of the
safety control structure is to enforce the identified safety constraints on system operation. A major goal
of the analysis is to identify why the safety control structure was not able to prevent the adverse events.
Then recommendations can be created to strengthen the current controls.
I do not know the details of the safety control structure beyond the information included in the
accident report. If this analysis were being done at the time of the accident, the safety control structure
could have easily been identified. In this case, I am limited to the information in the official accident
report. Some basic organizational structures in process plants will be assumed in this benchmarking
exercise.
Figure 4 shows the hierarchical safety control structure at a very high level of abstraction. There
were two major subsystems involved: (1) Shell Moerdijk (shown in the dotted box on the left) and (2) the
state and community emergency management system (the dotted box on the right). These two
subsystems have above them the Dutch regulatory authorities that control the safety of the operation of
Shell Moerdijk and other chemical plants in the Netherlands and the state and local emergency
management. Shell Global oversees Shell Moerdijk and other Shell subsidiaries.
[Figure 4: high-level safety control structure. Labels recoverable from the figure: EU; Shell Global; Netherlands Govt. Oversight Agencies.]
As stated earlier, the goal of a CAST analysis is not to place blame or to identify so-called root
causes but to understand why the accident occurred. The “root cause” of all accidents, using the STAMP
causality model, is that the safety control structure was not able to prevent the adverse events. After all,
that is the goal of the safety control structure (or, as it is sometimes called, the safety management
system). The goal of the analysis, then, is to understand the weaknesses in this structure so that it can
be strengthened.
The CAST analysis process examines each of the components of the safety control structure at the
time of the loss and determines how they may have contributed to the events. The process does not
stop after a “root cause” is identified but continues until all contributors are understood. Only then can
maximum learning occur and changes to strengthen the entire safety control structure be identified. In
the events at Shell Moerdijk, as in almost all major accidents, nearly every part of the safety control
structure contributed to the events and can be improved.
The safety control structure consists of the controls that have been implemented to prevent
hazards. To understand why the accident (the events) occurred using systems thinking and treating
safety as a control problem, it is necessary to determine why the controls created to prevent it were
unsuccessful and what changes are necessary to provide more effective control over safety.
Figure 5 shows a more detailed version of the safety control structure using the information in the
accident report, the author’s knowledge of the process industry in general, and the Shell public website.
It almost surely does not match the structure within Shell, but it is adequate for this benchmarking
exercise.
Each component in a safety control structure has particular responsibilities with respect to safety.
For the purpose of this CAST demonstration, responsibilities have been inferred that seemed reasonable
but may not match the actual Shell organization. If the CAST analysis had been done as part of the
investigation, the responsibilities and control structure could have been determined.
An accident analysis using CAST involves determining whether these responsibilities were carried
out and, if not, why not. If the safety control structure used for the analysis does not exactly match that
existing at the time of the accident, it will have little impact on the analysis as those responsibilities
should be assigned to someone. The goal is not to determine blame but to identify weaknesses in the
safety control structure and the changes that need to be made to prevent future losses.
[Labels recoverable from Figure 5: Shell Global; Executive Management; EU; Shell Projects and Technology; Safety Management; Netherlands Govt. Oversight Agencies; Process Control System; Chemical Process.]
Figure 5. The Assumed Detailed Safety Control Structure for Shell Moerdijk
EVENTS INVOLVED IN THE LOSS
Focusing on the events only does not provide the information necessary to identify why the events
occurred, which should be the goal of the accident analysis. Identifying the proximate events preceding
the loss is, however, a useful starting place for the analysis. The events can be used to identify questions
that need to be answered in the accident investigation and causal analysis. Table 1 shows the primary
proximate events leading to and following the explosion and some questions they raise that any
accident analysis should answer.
Event 2: One of the restart procedures is to warm up the reactors with ethylbenzene. During the warming (reheating) process, uncontrolled energy was released and unforeseen chemical reactions occurred between the warming up liquid (ethylbenzene) and the catalyst pellets that were used.
Questions raised: Why were the reactions unforeseen? Were they foreseeable? Were there precursors that might have been used to foresee the reactions? Did the operators detect these reactions before the explosion? If not, then why not? If they did, why did they not do anything about controlling them?

Event 3: The reactions caused gas formation and increased pressure in the reactors.

Event 4: An automatic protection system was triggered that was designed to prevent liquid from entering the exhaust gas system (flare). But preventing the liquids from entering the flare also prevented the gases in the system from being discharged, increasing pressure in the reactor.
Questions raised: Did the operators notice this? Was it detectable? Why did they not respond? This seems like a predictable design flaw. Was the unsafe interaction between the two requirements (preventing liquid from entering the flare and the need to discharge gases to the flare) identified in the design or hazard analysis efforts? If so, why was it not handled in the design or in operational procedures? If it was not identified, why not?

Event 5: Continued warming up of the reactors caused more chemical reactions to occur between the ethylbenzene and the catalyst pellets, causing more gas formation and increasing pressure in the reactor.
Questions raised: Why wasn’t the increasing pressure detected and handled? If there were alerts, why did they not result in effective action to handle the increasing pressure? If there were automatic overpressurization control devices (e.g., relief valves), why were they not effective? If there were not automatic devices, then why not? Was it infeasible to provide them?

Event 6: The pressure rose so fast that it could no longer be controlled by the pressure relief devices, and the reactor exploded due to high pressure and the separation vessel collapsed and exploded.
Questions raised: Was it not possible to provide more effective pressure relief? If it was possible, why was it not provided?

Event 7: The contents of the reactor and its associated separation vessel were released into the wider environment. Sections of the reactor were blasted across 250 meters while other debris was later found 800 meters away. The explosion could be heard 20 kilometers away.
Questions raised: Was there any way to contain the contents within some controlled area (barrier), at least the catalyst pellets?

Event 8: Two people working opposite Unit 4800 at the time of the explosion were hit by the pressure wave of the explosion and the hot and burning catalyst pellets that were flying around.
Questions raised: Why was the area not isolated during a potentially hazardous operation? Why was there no protection against catalyst pellets flying around?

Event: A large, raging local fire occurred, generating considerable amounts of smoke.

Event: Community firefighting, healthcare, crisis management, and crisis communications were initiated.
After the control structure has been constructed and the events leading to the accident identified,
CAST involves examining the role each controller played in the events, starting with the lowest physical
plant safety controls at the bottom of the safety control structure and working upward to the social and
political controls. At each step, the goal is to look at the higher levels to determine why the unsafe
control at the current level occurred. Only the controls related to the specific events are examined,
although general safety-related responsibilities are included here. Where the specific information
needed for a complete analysis of this particular accident could not be located, questions are inserted
(in italics) in the analysis results that would have been asked during a CAST-driven investigation and
included in the final report. CAST helps investigators to identify what information needs to be gathered
and the questions to ask those involved.
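The bottom-up order described above can be sketched as a walk over a simplified control structure. This is only an illustrative sketch: the `Controller` class, the `analysis_order` function, and the four-level chain are assumptions made for the example. The actual structure in Figure 5 has more levels and more than one controller per level.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """One level of a hypothetical, simplified safety control structure."""
    name: str
    controlled_by: "Controller | None" = None  # next level up, if any


def analysis_order(lowest: Controller) -> list:
    """Return controller names starting at the physical level and moving upward."""
    order = []
    current = lowest
    while current is not None:
        order.append(current.name)
        current = current.controlled_by
    return order


# Illustrative chain only (assumed for this example).
management = Controller("Plant management")
operators = Controller("Operators", controlled_by=management)
control_system = Controller("Process control system", controlled_by=operators)
plant = Controller("Physical plant equipment", controlled_by=control_system)

for level in analysis_order(plant):
    print(level)
```

At each name in the returned list, the analyst would ask why the unsafe control occurred and look to the next entry for part of the answer.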
THE ROLE OF THE PHYSICAL DESIGN OF THE PLANT (PLANT EQUIPMENT) IN THE LOSS
The analysis of the physical controls does not differ significantly from that done in most accident
analyses, except that more than just failures are considered.
Controls: The physical safety equipment (controls) in a chemical plant is usually designed as a series of
barriers to protect against runaway reactions; protect against inadvertent release of toxic chemicals or
an explosion (uncontrolled energy); convert any released chemicals into a non-hazardous or less
hazardous form; provide protection against human or environmental exposure after release; and treat
exposed individuals. The Shell Moerdijk plant had the standard types of safety equipment installed. Not
all of it worked as expected, however.
Requirements: Provide physical protection against hazards (protection for employees and others within
the vicinity)
1. Protect against runaway reactions
2. Protect against inadvertent release of toxic chemicals or explosion
3. Provide feedback about the state of the safety-critical equipment and conditions
4. Provide indicators (alarms) of the existence of hazardous conditions
5. Convert released chemicals into a non-hazardous or less hazardous form
6. Contain inadvertently released toxic chemicals
7. Provide physical protection against human or environmental exposure after release
Missing or inadequate plant physical controls that might have prevented the accident:
• There was an inadequate number of temperature sensors in the reactor to detect hot spots.
• The plant was not fitted with pressure relief valves that would have prevented a runaway. Those
that were installed were not designed for the rapid pressure increases that occurred.
Failures:
None of the physical controls failed except for the final collapse of the reactor and separation vessel
after pressure reached a critical level.
Unsafe Interactions: Accidents often result from interactions among the system components. In this
case, the following unsafe (and mostly unexpected) interactions occurred:
• The process to distribute the ethylbenzene over the catalyst pellets (wet them) resulted in dry
zones. There were two main reasons for these dry zones:
- The nitrogen flow was too low. To wet the catalyst properly, an adequate amount and ratio
of ethylbenzene and nitrogen must pass through the distribution plate. Because the flow of
nitrogen was too low, the distribution plate did not operate properly. Later, due to this
problem, along with other unintended interactions, the pressure increased eventually to
the point where it exceeded the flow of nitrogen to the reactor. The nitrogen flow came to
a standstill, resulting in a negative pressure differential.
- The flow of ethylbenzene was unstable and at times too low. In addition to a sufficiently
high nitrogen flow, a constant and sufficient flow of ethylbenzene is required in order to
properly wet the pellets. The two reactors of Unit 4800 have different diameters, which
means that reactor 1 requires an ethylbenzene flow of approximately 88 tons per hour
while reactor 2 needs approximately 22 tons per hour. A constant flow of this volume was
achieved in reactor 1. A constant flow of the correct volume was also initially achieved for
reactor 2. However, once ethylbenzene began being heated, the flow became unstable. In
the last hour before the explosion, this flow was virtually zero on two occasions. As a
result, the ethylbenzene was not evenly spread over the catalyst pellets, leading to the
catalyst pellets not being adequately wetted and dry zones developing in reactor 2.
• Energy released during the warming of the reactor led to unforeseen chemical reactions
between the warming up liquid (ethylbenzene) and the catalyst pellets in the dry zones. As
heating took place, the ethylbenzene began to react with one of the catalyst elements (barium
chromate), generating heat. The ethylbenzene dissipated this heat in the areas that were
sufficiently wetted. In the dry zones, however, this heat did not dissipate due to the lack of
ethylbenzene. The result was that in the dry zones, the catalyst pellets heated up considerably,
and there was localized development of very hot areas or “hotspots.” The hotspots were not
automatically detected due to the limited number of temperature sensors in the reactors.
• Due to the rising temperature, the reaction in the hotspots kept accelerating, thereby producing
even more heat. The localized temperature was now very high, which resulted in a chemical
reaction between the ethylbenzene and another catalyst element (copper oxide). This reaction
caused gases to be released. These follow-on reactions reinforced each other and could no
longer be stopped: a runaway had developed. The rapidly rising temperature led to localized
ethylbenzene evaporation.
• Gas formation increased the pressure in the reactor. At the same time, the maximum liquid level
in the second separation vessel was exceeded, causing the automatic protection system (used to
release excess pressure) to shut down automatically in order to prevent liquids from entering
the exhaust gas system (flare). As a result, the gases in the system could no longer be
discharged. This automatic protection device to prevent liquids from entering the flare operated
as designed, but had the unintended consequences of preventing the venting of the gas.
• The buildup of gas caused the pressure to increase. Eventually, the pressure reached the point
where the automatic pressure relief devices in place could not adequately release it. The
pressure relief devices on the separation vessels were not designed for such rapid pressure
increases, and eventually the collapse pressure of the reactors was reached. Reactor 2 collapsed
and exploded, followed 20 seconds later by the explosion of the first separation vessel.
• The contents of the reactor and the separation vessel spread beyond the boundary of Unit 4800. A
pressure wave and hot, burning catalyst pellets hit workers in the area causing injuries.
• There are three remote-controlled containment valves. The explosions made these valves
ineffective. The alternative was to use other limiting valves, but these valves cannot be remotely
operated (they must be operated manually). Due to the intensity of the fire that had broken out
and the risk of explosion, it was not possible for these to be operated immediately. An initial
attempt was made around 02:30.
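The interaction between the liquid-level protection and gas venting described in the list above can be illustrated with a very small boolean model. This is a sketch under stated assumptions, not the plant's actual interlock logic: the function names and the two-variable state space are hypothetical, and real hazard analysis would consider many more variables.

```python
def flare_path_open(level_high_high: bool) -> bool:
    """Assumed requirement A: close the path to the flare on a high-high liquid level."""
    return not level_high_high


def can_vent_gas(level_high_high: bool) -> bool:
    """Assumed requirement B: gas can be discharged only while the flare path is open."""
    return flare_path_open(level_high_high)


def unhandled_states() -> list:
    """List (level_high_high, gas_forming) states where gas forms but cannot be vented."""
    conflicts = []
    for level_high_high in (False, True):
        for gas_forming in (False, True):
            if gas_forming and not can_vent_gas(level_high_high):
                conflicts.append((level_high_high, gas_forming))
    return conflicts


print(unhandled_states())  # [(True, True)]
```

Even this toy enumeration surfaces the state in which each protective function works as designed and the combination is unsafe.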
Contextual Factors:
• The plant was out of operation for a short, scheduled maintenance to replace the catalyst-
causing granules.
• Unexpected reactions occurred due to vulnerabilities related to the design, including:
- potential for insufficient wetting
- use of ethylbenzene and an assumption that this substance is inert
Summary of the Role of the Physical Components in the Accident: None of the physical controls failed.
The final physical collapse of the reactor and separation vessel after pressure reached a critical level
resulted from unexpected and unhandled chemical and physical interactions. Many of these unsafe
interactions were a result of design flaws in the reactor or in the safety-related controls.
Recommendations: The physical design limitations and inadequate physical controls need to be fixed.
(The potential detailed fixes are not included here; they need to be determined by a qualified chemical
engineer.)
The above analysis is useful in terms of learning about flaws in the design of the physical equipment
and how to eliminate them to prevent the same occurrence in the future. It does not, however, fully
explain why the accident occurred and what needs to be changed (beyond this specific design) in the
design process, in the assumptions used in the design process, and in operations to prevent a wide
variety of accidents in the future and not just a repetition of these specific events. Many questions are
raised from this analysis, such as: Why did the design flaws get through the design and review process?
Were they there from the beginning or did they result from changes over time? Were there any
precursor events that might have been used to identify the design flaws before an accident occurred?
Why did the operators not notice the increasing pressure before the runaway occurred and prevent the
explosion? And so on.
To answer these questions, it is necessary to look at the higher-level components of the safety
control structure that were meant to prevent and control unsafe conditions in the physical plant.
Responsibilities:
• Assist operators in controlling the plant during normal production and off-nominal operations
(shut down, startup, maintenance, emergencies, etc.)
• Display relevant values, provide alerts, issue control actions on plant equipment
• Control temperature, pressure, level, and flow to ensure that the process remains within the
safe margins and does not end up in an alarm situation.
UCA: The Process Control System did not provide the assistance required by the operators to safely
control the start-up process including automatically controlling the heating rate and other important
variables.
Process Model Flaws: It appears that the process control system, for the most part, had the correct
information to assist the operators in controlling the start-up. There was some missing information
about temperature that was the result of inadequate numbers of temperature sensors.
the required temperature and the time required for heating are
checked by the process control system and the values are coordinated
together.
However, there were no special automated control circuits for the
heating phase after a catalyst has been replaced, which was the phase
in which the problems arose. Even for procedures that were known to
be difficult to manage (such as heating) or required “intense attention”
by the operators, no assistance was provided.
The accident report does not provide any reason why the process
control system was configured only for a normal production phase.
After a Unit stops, in the pit stop period, most of the controls are set on manual. This decision was
justified as giving the Panel Operator more flexibility. However, because the filling, circulating, and
heating phases during the preparatory phase for reducing are not included in the design, this flexibility
can be dangerous and lead to operator errors.
Questions Raised: Who made the decision? With what rationale? What analysis and review was done to
justify this decision?
The temperature in the reactors is measured using temperature elements that do not allow the
temperature throughout the volume of the reactor to be measured. As a result, measurements may be
delayed and/or areas in the reactor may be hotter/colder than temperatures registered by the
temperature element. The aim of circulating is therefore to ensure that the catalyst bed is wetted and
heated homogeneously. The different temperature controls were sometimes operated manually by the
Panel Operator and sometimes automatically by the system. This design also meant the Panel Operator
had to be extremely attentive.
Questions Raised: Because there are two possible controllers, is there potential for confusion over who
is actually in control at any particular time?
UCA: There was no automatic reset after two high-high level alarms so the gas discharge system
remained closed.
UCA: The process control system did not step in to stop the process when pressure and temperature
increased precipitously.
Why? (Factors Affecting the Unsafe Control) Questions Raised
A safety margin was built into the collapse pressure, which is at least 3
times higher than the design pressure. The vessel was actually able to
withstand an even higher pressure. None of this prevented the
pressure from the chemical reaction from exceeding the collapse
pressure of the two affected vessels. But the incorrectly designed safety
margin may have created complacency in the Process Control system
designers and reduced the need in their minds to provide ways to stop
rapidly increasing temperature and pressure. They appear to have
assumed that such a pressure/temperature increase was not possible.
The designers did not anticipate a scenario whereby a fast and high pressure build-up was possible. This
assumption affected the configuration of the instrumentation safety devices. For example, it was
estimated that any pressure/temperature build-up (a few bar and a maximum temperature of
approximately 74°C) due to a runaway would not actually be high enough to reach the pre-set pressure
on the pressure relief valve and to operate this valve.
The programmed Emergency Depressuring System (EDP) was therefore configured in such a way that
during an unwanted pressure build-up, the pressure in Unit 4800 would be relieved within a half hour.
During this pressure relief, the pressure in the Unit would drop to 50% of the design pressure. The Panel
Operator also had to activate this instrument-based pressure relief manually. On 3 June 2014, this
instrument-based pressure relief was not activated. According to the accident report, if the Panel
Operator had activated this valve, it would most likely not have made any difference, however, in terms
of the explosion.
Questions Raised: I am not completely sure why activating the valve would have made no difference,
but it is probably because the very high rate of the pressure build-up did not provide enough time
between exceeding the set point and the explosion for the EDP to work.
The installed independent pressure relief valves were specifically
intended to accommodate the pressure that could build up if the
hydrogen feed valve did not open. In that case, the pressure in the
hydrogen system could cause a pressure build-up in the Unit 4800. The
blow-off capacity of this pressure relief was insufficient to provide for
the scenario that occurred that night.
In a previous incident at another Shell plant (Nanhai) with the same
design, a runaway was observed that resulted in a temperature many
hundreds of degrees Celsius higher as well as a higher pressure than
was previously estimated. Nobody felt this was a reason to reconsider
the runaway scenario and the associated instrument-based safety
devices (see higher level system control and design components).
Without an emergency stop, there was no way to safely stop operation of the Unit 4800 quickly with a
single press of a button.
Questions Raised: I could not find a reason in the accident report for the omission of an emergency stop
button for Unit 4800. Was this complacency, cost, or an engineering reason? Emergency stop buttons
are standard for safety-critical systems.
The instrument-based safety devices were designed to respond to and prevent particular conditions, but
not the ones that occurred. After this accident, a safety device was added to protect Unit 4800 from an
excessively high temperature due to an unwanted chemical reaction with hydrogen.
Questions Raised: Why were these conditions omitted?
There is a containment system, which is described as “one or more appliances of which any components
remain permanently in open connection with each other and which is/are intended to contain one or
more substances.” The valves of the containment system, however, cannot be remotely operated and
must be operated manually. Due to the intensity of the fire that night and the risk of additional
explosions, it was not possible for these valves to be operated immediately and, in fact, they were not
operated until several hours later when the fire had been extinguished.
Questions Raised: The containment system design was not useful in this situation. Was it impossible to
design one that can be operated remotely?
Summary of the Role of the Process Control System in the Accident: The process control system was not
configured to provide the necessary help to the operators during a start-up or to allow them to easily
stop the process in an emergency. The reason for these design decisions rests primarily in incorrect
assumptions by the designers about the impossibility of the scenario that occurred. Even after previous
incidents at similar plants in which these assumptions were violated, the assumptions were not
questioned and revisited.
Recommendations: The operators’ knowledge and skill are most challenged during off-nominal phases,
and most accidents occur during such phases and after changes are made or occur. The process control
system should be redesigned to assist operators in all safety-critical, off-nominal operations (not just
this restart scenario). For manual operations, the goal should be to provide all necessary assistance to
the operators in decision making and taking action and to reduce attention and time pressures (see the
next Section).
Given all these things that in hindsight the operators did wrong, they appear to have major
responsibility for the loss. In fact, listing these actions is where many accident causal analyses and
accident reports stop and the operators are blamed for the accident. While it is possible that everyone
working the turnaround that day was negligent and irresponsible, it is more likely that they were trying
to do their best. Without understanding why they made bad decisions, i.e. why the decisions seemed
correct to them at the time, we cannot do much about preventing similar flawed decision making in the
future. Many of the answers lie in higher levels of the control structure, but some of the operators’
actions can be understood by looking at their process models and the context in which they were
making decisions.
To understand the operators’ actions, a time line of relevant actions is useful.
Process Model Flaws: Operator decisions are based on their mental model of the state of the controlled
system (in this case, the reactor) and its expected behavior. Mental models are created by past
experience in operating the process, by training, and by feedback about the current state of the
controlled process. At least a partial understanding of why the operators acted the way they did is
gained by looking at their mental models at the time of their unsafe control actions.
Contextual Factors: Human behavior is always affected by the context in which it occurs. To understand
why a person behaved the way they did, it is necessary to identify the contextual factors that affected
their behavior.
The following are contextual factors affecting all the unsafe control actions:
Why? (Context) Questions Raised
Safety during the heating and wetting of the reactors was dependent on the knowledge and skill of the
Panel Operator and the Production Team Leader on duty. Shell Moerdijk’s safety report (dated 2000)
stated that the starting and stopping of the plants had to be undertaken by experienced operators using
the work instructions that are present for this purpose.
The Operator and Production Team Leader performing this maintenance stop were experienced staff on
Unit 4800, and were educated and trained for working at the MSPO2 plant during regular production.
However, only once every three to four years is Unit 4800 started up after a catalyst change. This was
the first time that the Panel Operator and Production Team Leader had experienced a startup of Unit
4800 after a catalyst change. Therefore, in this incident, both the Panel Operator and Production Team
Leader involved were lacking the specific experience required to safely start up Unit 4800.
Questions Raised: Why were they assigned to this task? Given that the safety of the startup was known
to be dependent on the knowledge and experience of the operators (as stated in the 2000 Safety
Report), who made the decision to let operators inexperienced in a reactor start-up control the
start-up? More important than “who” is “why.” This question is not answered in the report. Were they
the only ones available or the most experienced available or was the person making the decision
unaware of the safety report requirements or unaware of the experience levels of the operators
assigned or …? The answers to these questions will help to formulate effective recommendations for
changes to prevent future accidents.
Without the appropriate knowledge and experience needed to adjust gas and liquid flows during
startup, support from the process control system was needed. However, the process control system was
configured for production, not start-up, and therefore did not provide the assistance they needed. The
accident report says “It was assumed that the Operators and the Production Team Leader could manage
and control the start-up manually based on their knowledge and experience.”
Questions Raised: Who made this assumption? On what was it based?
Section 1.7). The operators produced the work instructions.
Questions Raised: experience and knowledge to create the work instructions? Who reviews these work
instructions? Were they reviewed by anyone? Why are operators writing their own work instructions?
After a Unit stops, in the pit stop period, most of the controls are set on manual. This is justified as giving
the Panel Operator more flexibility. However, because the filling, circulating, and heating phases during
the preparatory phase for reducing are not included in the design, this flexibility can be dangerous and
lead to operator errors.
• In an automatic control circuit, the control system regulates and checks that the set value is
achieved and stabilized, without further interference from an operator. For example, at a set
heating rate, both the required temperature and the time required for heating are checked by
the process control system and the values are coordinated together. Without such an
automated control system, manual control by the Panel Operator and the Production Team
Leader was required during the wetting and heating of the reactors. The manual filling,
circulating, and heating phases require a great deal of focus, precision, and experience on the
part of the Panel Operator.
Questions Raised: The accident report does not provide any reason why the process control system was
configured only for a normal production phase. Why did the process control system designers make this
decision? Was the major concern and goal for the process control system to optimize productivity while
safety was not a high priority? Was there a lack of resources? Was there a human factors oriented
hazard analysis done that considered the risks involved in operators controlling a critical process
requiring focus and precision without automated assistance?
• The report says that “Reactions caused by warming up actions were out of view of panel
operator and the production team leader.” There is no further explanation.
Questions Raised: feedback? Or was it not on the main screen and had to be called up specially? If the
problem was not simply a matter of a lack of sensor feedback because there were not enough sensors,
then a human factors analysis of the interface with which the operators were interacting is required to
understand the impact of this contextual factor on the operators’ behavior. Was any human factors
analysis done on the control interface, either before or after the accident? Nothing is noted in the
accident report.
UCA: The operators did not stabilize or halt the process before the explosion when critical process
boundaries were exceeded.
incorrect. The accident report says that important information was lost
between the unit designers and the ultimate managers (and the
operators) of the unit. The operators (like everyone else) accordingly
treated the heating phase as a non-hazardous process step and,
therefore, did not identify any critical process conditions for the work
instructions.
The Panel Operator and Production Team Leader did not realize that
the situation was dangerous and therefore did not decide to intervene
in accordance with ESP policy. They did not have a comprehensive
view of all the signs they were getting. They interpreted the signs as
though they resulted from the setting and stabilization of the
circulation flow and normal system dynamics. They did not have a
comprehensive view of the consequences of their actions in relation to
the combination of high-pressure alarms, the liquid level alarm in the
separation vessels, low ethylbenzene flows, and a high pressure
differential.
7 Many companies have policies that require operators to intervene when something goes wrong but do not
specify the conditions under which to do this or do not provide the information necessary to make this
decision. Such policies are used to place blame on the operators after an accident but do little to improve safety.
The process control system did not have an emergency stop button. Operators were only able to
respond to specific conditions using the instrument-based safety devices.
Questions Raised: If the designers did not believe such a scenario was possible, why would the
operators? In fact, as will be seen, nobody at Shell thought it was possible.
While there is no emergency stop button for Unit 4800, the Operator does have the option to shut down
the Unit 4800 partially or completely via the ESD trip switch, which is independent from the automatic
trips (instrument-based safety devices). The automatic trip did not occur (see Section 1.5 for why) and
the manual ESD trip switches were not used that night. There is no explanation that I could find in the
accident report about why they were not used although one can guess.
The designers did not envision a scenario whereby a fast and high pressure build-up was possible. If the
designers did not believe such a scenario existed, why would the operators? In fact, as will be seen,
nobody at Shell thought such a scenario was possible.
UCA: The operators did not manually activate the instrument-based (automated) pressure relief valve.
UCA: Heating was started (around 21:00) while the situation was still unstable and after only 45
minutes of wetting. Proper wetting had probably not been achieved by that time.
Why? (Context) Questions Raised
The accident report says that to achieve proper wetting, the circulation
of ethylbenzene must continue for at least 6 hours before hydrogen can
be used. However, given the assumption that ethylbenzene does not
react with the catalyst, it is possible to start heating earlier. Everyone,
including the unit designers, believed the assumption that ethylbenzene
does not react with the catalyst.
The instruction about waiting 6 hours was not included in the work
instructions.
UCA: The operators manually added additional warmth to the ethylbenzene at a time when heat was
increasing precipitously.
when the warming-up procedure started. Temperatures were shown but not the rate of the increase in
the temperature.
Questions Raised: an operator to determine the actual rate of increase. There was a lot of instability in
the graphs, which the operators expected from previous start-ups.
Controlling steam supply to the heat exchanger requires a degree of attentiveness on the part of the
operators. It makes a difference whether the steam valve can be fully opened (low-pressure steam) or
whether it can only be opened partly, to create the same conditions (medium-pressure steam).
Furthermore, it is unclear what heat energy is supplied in the latter case. This is evident the second time
that the steam valve was opened further: at this point much more heat energy was supplied.
Questions Raised: Several things seem to require a lot of attentiveness on the part of the operators.
What kind of human factors job analysis was done on the total start-up requirements?
The temperature in the reactors is measured using temperature elements that do not allow the
temperature throughout the volume of the reactor to be measured. As a result, measurements may be
delayed and areas in the reactor may be hotter/colder than temperatures registered by the
temperature elements. The aim of circulating is therefore to ensure that the catalyst bed is wetted and
heated homogeneously. The different temperature controls were sometimes operated manually by the
Panel Operator and sometimes automatically by the system. This method also meant the Panel
Operator had to be extremely attentive.
Questions Raised: Was confusion created in the mind of the operator by having the automation operate
the temperature controls at the same time as the operator was doing this?
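The point that temperatures were displayed but not their rate of increase can be made concrete with a short sketch. It is hypothetical throughout: the function names, the sample readings, and the 30 °C per hour limit are assumptions for illustration and do not come from the accident report or from the plant's process control system.

```python
def heating_rates(samples):
    """Return the rate of change (degrees C per hour) between consecutive samples.

    `samples` is a list of (time_in_hours, temperature_in_celsius) pairs,
    ordered by strictly increasing time.
    """
    rates = []
    for (t0, temp0), (t1, temp1) in zip(samples, samples[1:]):
        rates.append((temp1 - temp0) / (t1 - t0))
    return rates


def exceeds_limit(samples, max_rate):
    """True if any interval heats faster than `max_rate` (an assumed limit)."""
    return any(rate > max_rate for rate in heating_rates(samples))


# Hypothetical readings: the absolute values look unremarkable,
# but the second interval heats three times faster than the first.
readings = [(0.0, 20.0), (0.5, 30.0), (1.0, 60.0)]
print(heating_rates(readings))        # [20.0, 60.0]
print(exceeds_limit(readings, 30.0))  # True
```

A display that shows only the raw temperatures leaves this subtraction and division to the operator, which is the kind of mental effort the questions above are probing.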
UCA: The operators did not notice and respond to hot spots. They also did not notice and respond to
the related negative pressure differential.8
8 Normally the difference in pressure between the top and the bottom of the catalyst bed is low (20-50
millibar). The accident report says that a significantly higher pressure difference (positive or negative) or a sudden
change in the pressure difference can be indicative of contamination or blockage or other malfunctions that can
have a negative impact on the effect of the catalyst.
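The two warning signs named in this footnote, a pressure difference of unusually large magnitude and a sudden change in it, could be checked along the following lines. This is a minimal sketch with assumed names and thresholds: the report gives only the normal range of 20-50 millibar, so `magnitude_limit_mbar`, `max_step_mbar`, and the readings are hypothetical.

```python
def differential_alerts(readings_mbar, magnitude_limit_mbar, max_step_mbar):
    """Flag readings whose magnitude or step change exceeds the assumed limits.

    Returns a list of (index, reason) tuples. A negative reading is treated the
    same as a positive one of equal size, since either sign can be abnormal.
    """
    alerts = []
    for index, value in enumerate(readings_mbar):
        if abs(value) > magnitude_limit_mbar:
            alerts.append((index, "magnitude"))
        if index > 0 and abs(value - readings_mbar[index - 1]) > max_step_mbar:
            alerts.append((index, "sudden change"))
    return alerts


# Hypothetical series: three readings in the normal range, then a reversal.
print(differential_alerts([30.0, 35.0, 40.0, -120.0], 100.0, 50.0))
# [(3, 'magnitude'), (3, 'sudden change')]
```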
resulted in a negative pressure difference that was not noticed by the Operator.
Questions Raised: that it is difficult to speculate about the answer to this question.
UCA: The operators did not properly adjust nitrogen flow. The lower (than required) nitrogen flow
was one of the causes of the accident.
Despite the previous experience showing that liquid levels
and liquid flows were difficult to stabilize manually, no
automated assistance was provided.
UCA: The Panel Operator did not reopen the connection to the gas discharge system after the liquid
level in the Reactor 2 separation vessel rose so high that the connection to the gas discharge system
was closed (by an automated safety device) to prevent liquids from entering the flare tower.9
Summary of the Role of the Operators in the Accident: The operators acted appropriately or at least
understandably given the context, the incorrect work instructions (which they followed), and their lack
of training and required skill and knowledge in performing the work. In addition, they were provided
with almost no assistance from the process control system, while many of the tasks they needed to do
required intense attention, precision, mental effort, deep understanding of process dynamics, and
frequent adjustments to a continually fluctuating process.
The designers of the plant did not recognize the risks (see the later sections of this analysis) so the
risks might not have been communicated thoroughly. Management seemed to rely on operators seeing
something strange and stopping the process, but did not provide the information and training to ensure
it was possible for operators to do this. Such a policy provides a convenient excuse to blame the
operators after an accident, but it does not result in providing adequate assistance to the operators to
carry out their responsibilities.
9
The gas discharge system is that part of the plant that discharges excess gases from the separation vessels
via a safety valve and burns them. The purpose of the automatic closure of the gas discharge system is to prevent
flammable liquids from being supplied to the flare (a hazard)
Recommendations: The operators must have the appropriate skills and expertise to perform their
assigned activities, and there must be someone overseeing operations assigned the responsibility for
enforcing this requirement. A human factors study during the job analysis is needed to ensure that the
operators are provided with information and a work situation that allows them to make appropriate
decisions under stressful conditions. Better automated assistance should be provided in all phases of
operation, training should be provided for activities that are known to be hazardous (like startup),
and work instructions as well as the process for producing them need to be improved.
Relevant Responsibilities
• Identify plant hazards and ensure that they are eliminated, mitigated, or controlled.
• Either provide work instructions for safety-critical activities or review the work instructions provided
by someone else for their safety implications.
• Ensure appropriately trained, skilled, and experienced people are assigned to high risk processes.
• Follow the Management of Change (MOC) procedures by doing a risk assessment for changes and
implement risk controls based on the results.
• Provide for emergency treatment to exposed or injured individuals and ensure required medical
equipment and personnel are available at all times. [The injured personnel were treated effectively on
the scene so this aspect is not considered further.]
• Perform audits of safety-critical activities or assist plant operations management in performing such
audits [It is not clear from the accident report who is responsible for audits but there do appear to
have been audits.]
UCA: Created work instructions that were unsafe or did not adequately review the work instructions
that were created and used. [This CAST analysis, in the absence of detailed information about the safety
management system at Shell Moerdijk and Shell Global, assumes that performing job analyses and
creating safe work instructions was the responsibility of Plant Operations Management (Section 1.7), but
any reasonable safety management system would assign responsibility to safety management for
reviewing these work instructions to ensure they adequately controlled safety.]
practice to omit required
information?
UCA: Did not identify and manage potential risks resulting from changes made to the plant, the
catalyst, the processes, and the procedures. Did not reassess risks when changes were made. [It is not
clear from the report whether some of these changes were made by Shell Projects and Technology or
Shell Moerdijk. I am assuming most were made by Shell Moerdijk.]
- Catalyst change: A new catalyst was selected for the reactor and tested between 1999 and 2000
[Was this done by Shell Moerdijk or by Shell Projects and Technology? This analysis is assuming
Shell Moerdijk. Otherwise, just move this unsafe control action upward in the control structure.]
Using a new catalyst led to a higher risk of a reaction occurring with ethylbenzene, but this higher
risk was not recognized.
assumed that the properties of the new catalyst were the same as
those of the previous catalyst. The report says that “The persons
performing this risk screening reached this conclusion [of low or no
risk] based on their knowledge and experience.” It is not clear what
this means. The company did not carry out any laboratory tests for
the new catalyst, and the methodology used in the risk screening
was not appropriate for testing complex substances, such as a
catalyst. The altered composition of the new catalyst was stated in
the safety information sheet provided with the product, but safety
engineering at Shell Moerdijk (and/or Shell Projects and
Engineering?) did not notice this change.
- Procedure changes (heating rate, nitrogen flow) were instituted without a risk assessment.
- MSPO2 plant and production changes were not systematically examined for their safety effects
and replacements were not systematically examined on the basis of a risk analysis in all cases.
UCA: Shell Moerdijk did not identify the risks involved in opting for a trickle-bed reactor and its
associated design choices. In particular, the risk of a reaction between ethylbenzene and the catalyst
was not identified, nor were other risks associated with Unit 4800.
Desk Safety Review (1997): For new designs, the Shell subsidiary selects the most appropriate risk
evaluation method, based on an “initial assessment” of Shell Projects and Technology. The relevant
division then selects the method. The division may choose a different method, provided it
substantiates its deviation from the Shell norm.
Among other things, this Desk Safety Review examined various failure scenarios for Unit 4800.
However, it only looked at failure scenarios for the production and reduction phases, not for the
heating phase (which was when the accident occurred).
Questions Raised: Shell Moerdijk licensed the trickle-bed reactor design from Shell Projects and
Technology. Were the risks identified there and communicated to Shell Moerdijk? What kind of initial
assessment was done by Shell Projects and Technology?
There were never any safety studies that specifically focused on the
circulation and heating of Unit 4800 in the MSPO2 plant because it
was considered to be low risk. Studies done in 1977 had shown that
the catalyst (as it existed then) was inert in the presence of
ethylbenzene. This assumption was never reassessed even though
the composition of the catalyst changed over time and incidents
occurred within Shell reactors that should have prompted a re-examination
of that assumption (see below). Accordingly,
ethylbenzene explosion was not included in the quantitative risk
analysis of the Desk Safety Review because they considered it
highly unlikely although they were aware that the impact would be
huge.
Integrated Safety Report (2000): An Integrated Safety Report was
required for companies with major risks by European Legislation
(Seveso II Directive) and its implementation in the Netherlands by
Brzo legislation (see Section 1.11.1). The Integrated Safety Report
describes both internal and external safety, covering environmental
requirements and the requirements of the fire brigade, in addition
to those related to working conditions.
The Integrated Safety Report only describes (in summary) “the biggest” risks in the form of event
scenarios. The safety report must include plant scenarios for each plant, such as the MSPO2 plant. In
order to prepare these plant scenarios, safety engineers at Shell Moerdijk used HEMP (Hazards and
Effects Management Process)10 for each plant and for each containment system (such as Unit 4800).
Unit 4800 was considered low risk. Other containment systems11 were higher risk and were therefore
included. There is no mention of an ethylbenzene-related explosion in this Safety Report. Dozens of
quantitative risk analyses were conducted for MSPO2, but Unit 4800 was not included.
Questions Raised: Why HEMP? The techniques involved (like Bow Tie) are 50 years old and the underlying
accident model that assumes accidents are caused by chains of failure events is not true for today’s
more complex systems and new technology.
The accident report says that because Unit 4800 was no longer included in the risk analyses from 2001
onward, safety management thought the Unit was relatively safe. This impression was not disputed
either internally or externally.
10
The Hazards and Effects Management Process (HEMP) is an analysis technique that reviews identified
hazards and uses a Risk Assessment Matrix to rank the risks based on consequence and likelihood. The hazards and
identified risk rankings of high, medium or low are documented in a Hazard Register. The hazards identified as
being high risk are modeled using the bow ties. Bow tie models combine a fault tree analysis with an event tree
analysis. While the name “HEMP” is relatively new, the techniques involved are at least 50 years old.
11
A containment system consists of one or more appliances in which the components are permanently in
open connection with each other and is intended to contain one or more substances which, in the event of an
(imminent) major accident, can be closed in a short period of time. Unit 4800 is a containment system; MSPO2 is a
plant that is constructed from a number of containment systems.
12
The law requires companies to prepare 10 scenarios per plant (such as MSPO2). For these scenarios, the
company must select the hazards with the greatest risks and the nature of the risks must be varied.
A Reactive Hazard Assessment13 was performed by safety
engineering at Shell Moerdijk from 2010-2011 that included Unit
4800. The assessment was primarily focused on assessing the
effects of substances on the environment and not on safety.
Although Unit 4800 was included, most of the attention was on
other processes in the MSPO2 plant that were considered higher
risk, and the process conditions in the reactor were not taken
into account.
13
A reactive hazard assessment is described in the accident report as an analysis approach derived from an
Environmental Protection Agency method. It is intended for identifying the effects of substances on the
environment.
risk of a runaway reaction and their exclusion of an ethylbenzene-related explosion in the hazard
analyses and risk assessments.
Questions Raised: [...] their mental model of current risk?
UCA: Inadequate learning from incidents: After accidents in similar plants around the world, relevant
signs and conditions involved in these events were not incorporated into new risk analyses or
procedures (including work instructions) for MSPO2.
rapidly and in excessive quantities during normal operation, triggering a reaction. This runaway was
investigated by Shell Projects and Technology and resulted in additional temperature safety devices
being installed. Shell continued to assert that a runaway could not take place in these reactors. The
fact that a runaway had occurred during the start-up did not prompt further analysis or a review of
these assumptions.
Questions Raised: [...] analysis or questioning of the assumptions? What is wrong with the risk
management process or the safety culture that even having something occur similarly in the past did
not get past their risk assessment blinders?
Recommendations:
While the problems specific to the explosions on 3 June 2014 should be fixed, there were many
weaknesses in the Shell Moerdijk safety management design and especially practices that were
identified in the official Dutch Safety Agency accident report and in the CAST analysis. These need to be
improved.
• Safety management at Shell Moerdijk needs to be made more effective. Safety engineering
needs to be more than just going through the motions and minimally complying with standards.
• All work instructions should be reviewed for safety by knowledgeable people using information
from the hazard analysis. [In this case, the HA was flawed too, but that is a different problem to
fix.]
• MOC procedures must be enforced and followed. When changes occur, assumptions of the past
need to be re-evaluated.
• Hazard analysis and risk assessment methods need to be improved.
• More inclusive leading indicators of risk need to be established.
• Procedures for incorporating and using lessons learned need to be established or improved.
OPERATIONS MANAGEMENT
According to legislation called the Major Accidents Decree, Shell Moerdijk is responsible for taking
all measures necessary to prevent major accidents.
UCA: Did not identify the flaws in the risk analyses performed or the procedures used for these risk
analyses. [Why not? Was this a one-time flaw or did it happen continually?]
with the minimum risk
assessments and hazard
analyses required by law?
Did they think they were
adequate?
The government regulators of Shell Moerdijk provided no indication
that there were flaws in their risk assessment procedures or their
safety management system (see Section 1.11.1)
The hazard analyses and risk assessments used were standard in
the petrochemical industry.
Like everyone else, they thought that the start-up phase was low
risk (process model flaw). The evidence from the two previous
incidents (one at Shell Moerdijk) did not shake this belief.
UCA: Did not enforce Shell Management of Change (MOC) procedures. Allowed work instructions to
change over time and omit important required information necessary to safely operate the plant during
a maintenance stop. Did not require new analyses by Safety Management when changes occurred that
affected the prior analyses.
UCA: Allowed work instructions to change over time and omit important required information needed
to safely operate the plant during a maintenance stop.
14
The Design Book, created by Shell Projects and Technology, contains detailed information about the design
and operation of the reactor.
work instructions. The operators during the start-up chose to heat at a rate of 50°C based on past
experience. The Design Book heating rate had been removed from the work instructions over time.
Nitrogen flow was ignored in the work instructions. A too low flow was one of the physical factors in
the accident. Again, the requirements regarding nitrogen flow were removed during the periodic update
of the procedures. The accident report states that the omission was done in an attempt to limit the
content of the job analysis to information that was believed to be essential and to focus attention
on what was thought to be the most important from a safety and operational point of view.
The accident report states that there was an assumption that, in principle, the Operators needed to
be able to adjust the nitrogen flow during the heating phases at their own discretion in order to be
able to adjust other processes. The nitrogen flow was not considered critical and was not included in
the work instructions.
Questions Raised: [...] or for others in the past when they were originally removed? Who controls the
updating process? There must have been someone beyond the operators who were actually doing the
updating. What engineers were involved in this decision making?
UCA: Made a decision to configure the process control system to control the plant during the normal
production phase but not during non-production and high-risk operations.
to provide? Was it a cost
reduction issue?
UCA: Allowed two employees from different contractors to work in the adjacent Unit during the start-
up.
UCA: Flawed assignment of operators to the turnaround (or at least the start-up).
UCA: Did not effectively incorporate lessons learned from similar incidents at other plants in changes
to avoid them at Shell Moerdijk.
UCA: Did not design an effective Safety Management System (SMS) for Shell Moerdijk. The safety
management system did not prevent unsafe situations from being overlooked or internal
procedures from not being followed properly.
Why? (Context) Questions Raised
No details about the design of the Shell Moerdijk SMS (which is required by the regulatory
authorities in the Netherlands) are provided in the accident report. The report only describes (very
briefly) the action management procedures, audits, and the measurements used (number of large leaks
and lost time due to injury).
Guidelines for plant safety management systems are imposed by Shell Corporate and by regulators.
Questions Raised: Do the flaws in the Shell Moerdijk SMS stem from flaws in the Shell corporate and
government regulatory guidelines that Shell Moerdijk follows?
The accident report states that Shell Moerdijk has a Business Management System in which safety
management is integrated. Without more details it is difficult to comment, but integrating safety and
management decision making and management is dangerous and has been a factor in major petrochemical
company accidents (such as Deepwater Horizon). Clearly someone needs to make decisions about tradeoffs
between safety and business decisions, but those decisions should be made by the responsible decision
makers with full information about all the factors that must be considered and not lost by
integrating risk information in a nontransparent way.
Questions Raised: How is the safety management system integrated into the business management system?
Are they separated enough that risk-related decision making is not impeded by the way the information
is created and displayed (e.g., by risk assessments that combine business and safety risks below the
appropriate decision-making level)?
UCA: Audits (internal Shell Moerdijk audits, those by the parent company, and external audits)
did not show any evidence of the shortcomings in the safety studies, the management of changes, the
lessons learned from incidents, and other important factors in the accident.
• In general, have an inaccurate view of the risk that existed in the plant.
• Feedback:
- The heating phase did not appear in the report provided by plant safety management.
- Plant safety management did not provide correct risk assessments to operations
management.
Recommendations:
• Establish and enforce proper MOC procedures. If changes occur, retest assumptions that could be
affected by those changes. This implies that these assumptions must be recorded, leading indicators
established for identifying when they may no longer be correct, and a process established for testing
and responding to changes that might affect these assumptions.
• A thorough review of the Shell Moerdijk SMS should be done with emphasis on why it was unable to
prevent this accident. Major factors in this accident are related to basic activities that should have
been controlled by the SMS.
• Update procedures to eliminate the causes of the accident such as lack of control and supervision of
the work instruction creation and oversight processes, inadequate hazard analysis and risk
assessment procedures, assignment of operators to perform the turnaround who did not have the
required skills and expertise, inadequate use of lessons learned from the past, and audit procedures
that did not identify the shortcomings before the accident.
• Improve the process control system to provide appropriate assistance to operators performing
functions that are outside of normal production.
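The first recommendation above implies that safety-related assumptions are recorded in a form that can be checked whenever a change is proposed. A minimal sketch of such a record follows, assuming a simple in-memory register. The class names, field names, and the dependencies attached to each assumption are hypothetical illustrations and do not describe any Shell procedure; only the two assumption statements are taken from the analysis above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assumption:
    """A recorded safety-related assumption and the plant aspects it depends on."""
    identifier: str
    statement: str
    depends_on: frozenset  # hypothetical: aspects whose change should trigger a retest


@dataclass
class AssumptionRegister:
    """In-memory register; a real MOC process would need persistence and ownership."""
    assumptions: list = field(default_factory=list)

    def record(self, assumption):
        self.assumptions.append(assumption)

    def to_retest(self, changed_aspects):
        """Return identifiers of assumptions affected by a proposed change."""
        changed = set(changed_aspects)
        return [a.identifier for a in self.assumptions if a.depends_on & changed]


if __name__ == "__main__":
    register = AssumptionRegister()
    # Statements come from the analysis; the dependency sets are illustrative guesses.
    register.record(Assumption(
        "A1",
        "The catalyst is inert in the presence of ethylbenzene",
        frozenset({"catalyst composition"}),
    ))
    register.record(Assumption(
        "A2",
        "The heating phase is low risk",
        frozenset({"catalyst composition", "heating rate", "nitrogen flow"}),
    ))
    print(register.to_retest(["catalyst composition"]))
    print(register.to_retest(["nitrogen flow"]))
```

The design choice here is that a change is described by the aspects it touches, so the lookup returns every recorded assumption that shares at least one aspect with the change; what retesting then involves is outside the scope of the sketch.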
SHELL CORPORATE
Three basic functions are examined here: engineering design (Shell Projects and Technology), corporate
safety management, and executive-level corporate management (including the Board of Directors). The
exact distribution of the safety responsibilities in the Shell Global management structure was not
included in the accident report, so they may be distributed throughout the Shell Global
management structure differently than assumed here. The bottom line is that they need to be
somewhere.
Safety-Related Responsibilities
• Create a safe design: Perform hazard analysis (or use the results of hazard analysis created by
another group) and eliminate or mitigate the hazards in the design.
• Provide design, hazard, and operating information to the plant operators to assist those who are
operating the plants in avoiding any hazardous scenarios that the designers were not able to
eliminate or adequately mitigate in the design itself.
• Learn from the operation of their designs and improve the designs based on this feedback.
UCA: Shell Projects and Technology did not provide design information in a form that could reliably be
translated into safe operating procedures.
UCA: Several MSPO2 design flaws contributed to the accident including a lack of pressure relief
devices that would have been capable of mitigating a runaway (they were not designed for the
pressure increases that occurred) and an inadequate number of temperature sensors in the reactor.
Nitrogen flow requirements provided to the licensees were incorrect. Flaws identified in the central
pump design after the Nanhai incident were never fixed.
Why? (Context) Questions Raised
The accident report states that the designers assumed it was impossible for a runaway to occur during
the normal production phase. On the basis of this assumption, Unit 4800 of the MSPO2 was not fitted
with pressure relief devices that would have been capable of mitigating a runaway. The pressure relief
valves that were provided on the separation vessels were not designed for the rapid pressure increases
that occurred.
Questions Raised: Was this simply a design miscalculation or was it a result of the mistaken
assumption about the reactivity of ethylbenzene and the catalyst or perhaps lack of emphasis on
start-up hazards?
15
Insufficient sensors above a specific point in the ISOM tower was a factor in the Texas City refinery
explosion. In that case, more sensors were possible but were omitted in the design. Apparently, the additional
sensors were not considered necessary, perhaps because of an assumption that the liquid would never rise above
the maximum level in the tower. Is there a lack of adequate worst-case analysis being used in the petrochemical
industry?
nitrogen flow is necessary (approximately 1700 kilograms per hour) to enable the distribution plate
to function properly. So the original calculation appears to be incorrect.
Questions Raised: [...] scientific knowledge that could not be corrected before the loss?
Because the central pump has a considerable pump capacity compared to the capacity of the separation
vessels, work has to be performed with the shut-offs and valves almost closed. The accident report
says that “Shell was aware of this but took no further action,” but does not qualify whether this
knowledge was at Shell Moerdijk or at Shell Projects and Technology.
Questions Raised: Who knew this? How do Shell Projects and Technology work together with Shell
Moerdijk engineering and project management to resolve identified weaknesses?
The accident report says that the design data and the system configuration of the process controls
do not provide adequate information for the filling and circulation phase. After the Nanhai incident
in 2010, one of the recommendations was to reassess the design of the single central pump. Based on
the safety studies (period 2010-2011), it does not appear that this was actually done. In any event,
there was still only a single central pump in the MSPO2 in 2014. A turnaround was carried out in 2011,
clearly revealing that circulating and heating were not simple matters. The nearly complete closure
of the valve under the separation vessel and the containment which had a negative impact on the
stability were points of attention. However, these points were neither examined in sufficient detail
nor were they included (by Shell Moerdijk) in the 2014 job analysis.
Questions Raised: Again, does Shell Projects and Technology or Shell Global Safety Management review
the installation job analyses?
Insufficient information is provided in the accident report about the reason for the design flaws
(e.g., incorrect assumptions about operating conditions or limited scientific knowledge at the time),
why they were not detected in the design review process, and why they were not fixed adequately after
the incidents at Shell Moerdijk in 1999 and Nanhai in 2011.
Questions Raised: What type of worst case and hazard analysis is used in design? What type of design
reviews are conducted? Why were the identified flaws not fixed after the incidents showed that the
design and underlying design assumptions were flawed?
UCA: Modifications were made to the production process, including switching to a different catalyst
without retesting the assumptions of the past.
composition of the catalyst. These modifications did not
always lead to a new risk analysis.
UCA: The work instructions for this maintenance stop were drawn up by Shell Moerdijk panel
operators, but the accident report says that this WOL was “approved by the staff of the Shell P&T
process owner.”
UCA: After investigating incidents, Shell P&T did not identify relevant signs regarding process
conditions and provide an adequate response to improve safety.
UCA: The risks involved in opting for a trickle-bed reactor and its associated design choices were not
recognized and managed properly.
Tests showed the performance of the catalyst in the liquid full reactor was the best. Therefore, this
type of reactor was chosen in 1976 for the MSPO1 plant at Shell Moerdijk as well as the first
trickle-bed reactor in Seraya. With the prospect of more SMPO plants, however, all using the
trickle-bed design, they needed an alternative catalyst supplier. During the test phase of the MSPO2
plant between 1999 and 2000, Shell compared three catalysts from three different manufacturers. During
these tests, the conditions during the start-up phase were not considered. The tests also deviated
greatly from the plant conditions. For example, during testing the catalysts were dry reduced using
hydrogen and nitrogen, and therefore they were not tested in the presence of ethylbenzene.
Furthermore, the tests focused mainly on assessing the normal production phase.
Questions Raised: Was there a technical reason for not using ethylbenzene in the tests of the
catalysts?
Around 1990, Shell decided to develop the second SMPO plant in Seraya
in Singapore. In the meantime, knowledge had evolved. Research
showed that the production process in the liquid full reactor was less
effective than had previously been expected. There were also new
developments surrounding the trickle-bed reactor.
• The performance of the catalyst had been substantially
improved.
• It was possible to carry out production at much lower pressure
and temperature, which improved safety.
The liquid full reactor had disadvantages with regard to conversion time
and the amount of methylphenyl ketone that had to be circulated over
the catalyst bed to get methylphenylcarbinol.
Shell opted for a trickle-bed reactor for the Seraya plant and shortly
thereafter, for the MSPO2 plant.
The new, inherent risk involved in using a trickle-bed reactor rather than
a full liquid reactor included: insufficient wetting, followed by the
development of hotspots, potentially resulting in a runaway. This risk
was in fact identified for the reduction and production phase, but not for
the heating phase. It was not recognized before the accident on June 3,
2014.
16
“Drop in” means that no changes to equipment or procedures are needed before using this catalyst.
Summary of the Role of Shell Projects and Technology:
The design data provided to the licensees was not usable by those creating work instructions at the
plants using the technology. The design had safety-critical flaws, such as an inadequate number of
temperature sensors and pressure relief valves that could not handle the pressure that occurred. These
flaws were not found in hazard analyses during the initial design phase and were not fixed after
receiving information about serious problems in operations at some Shell plants. Unsafe and incomplete
work instructions were approved by Shell Projects and Technology for the Unit 4800 turnaround at Shell
Moerdijk.
Without more information about the operations at Shell Corporate, it is difficult to determine exactly
why the unsafe control occurred. More questions than answers arise from the CAST analysis, such as:
Why were the design flaws introduced and how did they get through the design process? What type of
hazard analysis is performed by Shell Projects and Technology (or used if it is produced by another
group)? Why were identified design flaws not fixed after the incidents at Shell Moerdijk in 1999 and
Nanhai in 2011? What other types of feedback are provided about the safety of their designs during
operations in the Shell plants? What information about the safety aspects (hazards) of the plant design
is passed from Shell Projects and Technology to the licensees of their designs? What information is
included in the design book? Is the design data provided adequate for the licensees to create safe work
instructions if engineers are writing the work instructions instead of operators and did they not know
who was going to be performing this task? Why did they approve unsafe work instructions that did not
even follow the required Shell format? What information is provided in the Design Book about start-up
and the hazards of start-up? What types of hazard analysis are performed during the design process?
What is the process for ensuring safety when changes are made? How are safety-related assumptions
recorded and what triggers a re-analysis of these assumptions? What feedback do the designers get
about the operation of their designs?
Recommendations:
Fix the design features contributing to the accident. Determine how these flaws got through the design
process and improve the design and design review process. Fix the design book to be understandable by
those who are writing the work instructions and to be comprehensive in the information needed to
safely operate installations of the licensed technology. Fix the work instruction review process by Shell
Projects and Technology to ensure the instructions are complete and safe. Review and improve the
hazard analysis process used by Shell Projects and Technology.
Relevant Responsibilities
• Safety of plant design, including conduct of hazard analysis on designs licensed to subsidiaries.
• Oversight of operational safety at the various Shell plants and facilities.
• Management of change procedures related to safety: creating them, making sure they are
followed, and improving them using feedback from incidents.
• Communication among separate plants in different parts of the world about incidents, lessons
learned, etc.
• Creating and updating a Shell-wide Safety Information System and ensuring the information is
being communicated adequately both within Shell Corporate and globally and that it is complete
and usable.
UCA: Inadequate risk analysis and/or risk control at the corporate level.
UCA: The safety information system appears to be inadequate, but no information is provided in the
accident report about it. See Section 1.12.2.
UCA: Learning from incidents: After investigating incidents, Shell did not identify relevant signs
regarding process conditions and did not incorporate these into new risk analyses for MSPO2 or make
requisite changes to ensure the incidents did not happen again or lead to a major accident.
Shell safety information system. How are lessons learned at one Shell site communicated to other
sites?
Questions Raised: [...] investigation, a culture of denial, not part of Shell defined procedures and
responsibilities, etc.?
Lessons learned from the previous incidents at Shell Moerdijk were not incorporated into the design of
the related plants and procedures. Repeated statements were made that such events could not happen in
that reactor.
What is wrong with the risk assessment process that even having something occur similarly in the past
did not get past their risk assessment blinders?
UCA: Management of Change policies were not implemented adequately in this case.
UCA: Audits and Shell internal safety supervision procedures did not reveal the shortcomings related
to safety studies and the management of change and lessons learned procedures.
Summary of the Role of Corporate Safety Management: There appears to have been a flawed view of
the state of risk and the effectiveness of the safety management system in Shell plants. The flawed
process model is most likely related to inadequate feedback (including audits and leading indicators).
Again, many questions are raised from the CAST analysis that need to be answered to understand the
role of corporate level safety management in the accident and thereby to provide more effective safety
management in the future.
Recommendations: Improve Shell safety audits. Review all risk assessment and hazard analysis
processes and, in general, improve the approach to safety, in both safety analysis and safety
management. Shell is not alone among the large oil companies in needing to update their methods. The
petrochemical industry has too many accidents and incidents that are avoidable.
More specifically, the accident report says that Shell should “evaluate how risk analyses are
performed and make changes. This should include procedures and policies about re-evaluation of earlier
presumptions and assumptions. Conduct new risk analyses, put adequate measures in place and ensure
that the team that performs these analyses has sufficient critical ability. Pay particular attention to
assumptions based on risks that had previously been ruled out.”
Evaluate and improve the corporate safety management system. Improve procedures for learning
from process safety-related incidents. Create better feedback mechanisms (including audits and leading
indicators) and procedures for learning from incidents.
Unsafe control: Corporate management is responsible for ensuring that an effective safety management
system is created. Typical policies of an effective safety management system were violated at both Shell
Corporate and Shell Moerdijk. The group overseeing safety at the Shell corporate level was not effective.
There is nothing included in the accident report about the assigned safety-related responsibilities for
corporate management. The Baker Panel report on the Texas City explosion found that BP corporate
management did not have assigned responsibilities for safety, which instead was treated as a local
responsibility. This abdication of responsibility (a practice promoted by HRO, which BP follows) was
identified as a major contributor to the Texas City explosion [Baker 2007]. Is this a problem in general in
the petrochemical industry?
There is also nothing included about context in the accident report that might explain why standard
executive-level responsibilities for safety were not effectively carried out. There seems, however, to be a
safety culture problem at Shell. See Section 1.12.3 for an analysis of the safety culture and high-level
safety policy at Shell. What is the culture of the chemical industry in terms of corporate management
oversight of the safety of global installations?
The accident report notes that the Safety Management System was integrated with the Business
Management System at Shell Moerdijk. Was this also true at the corporate level? This is a very poor
practice (and was a factor in the Deepwater Horizon accident). Safety risk assessments need to be kept
separate from business risk assessments so that information is not hidden from high-level decision-
makers.
Recommendations: Review the SMS design and determine why it did not prevent obvious violations of
policy such as shortcomings in safety studies, management of change, and learning from accidents, and
failures to follow regulations (e.g., having experienced operators and following the format for work
instructions). Determine why audits were not effective in finding such obvious violations of procedures.
While it is possible that this was the first time such lapses have occurred, it is highly unlikely. Strengthen
audit procedures, including identifying better leading indicators of increasing risk than simply the
number of leaks and create other forms of feedback to identify when the safety management system is
drifting off course and risk is increasing. Establish better feedback channels to ensure that management
of change procedures and corporate safety policy are being followed.
CATALYST MANUFACTURER
Safety-Related Responsibilities
• Provide information to customers necessary to evaluate the use of their catalyst in the reactor
being designed and/or operated
• Alert customers when changes are made in the catalyst that could potentially affect the safety
of its use.
Unsafe Control Actions:
UCA: Changes to catalyst made by manufacturer were not reported to Shell Moerdijk although they
were specified in the Catalyst Safety Information Sheet.
The catalyst manufacturer did not report changes because they fell
within the scope of the specifications that had been agreed between
Shell and the manufacturer.
As noted in the Shell Moerdijk safety management analysis, the 2014
Shell Moerdijk risk screening for the new G22-2 catalyst in the MSPO2
plant did not involve any laboratory tests or other investigations. Shell
Moerdijk Safety Management did not notice the change in the Safety
Information Sheet and assumed the properties of the new catalyst
were the same as those of the previous catalyst.
UCA: Did not think catalyst change would affect safety for Shell
Without knowing the details of the Shell reactor design, the catalyst
manufacturer cannot determine the safety of the use of their product.
Summary of the Role of the Catalyst Manufacturer in the Accident: The changes made in the catalyst
were not pointed out to Shell, but they were included in a new safety information sheet. While the
catalyst manufacturer cannot determine what the impact of their changes is on a customer, there should be
some clear alert (other than simply changing information in a document) that changes have been made
and what they are so that the customers are aware of them.
Recommendations: Change contractual relationships between Shell and its suppliers to ensure that
potentially critical changes are communicated. Make changes within information sheets clear and
obvious.
UCA: Did not identify shortcomings at Shell. Assessed Shell as a well-functioning company in which
they had a great deal of confidence.
Brzo supervision was tightened after incidents at Chemie-
Pack (a chemical fire in Moerdijk on 5 January 2011) and
Odfjell (serious safety deficiencies contributed to a large
gasoline spill at a Rotterdam tank terminal). The changes in
oversight of chemical activities that resulted from these
events did not prevent the Shell Moerdijk explosion or the
unsafe control actions listed above.
The regulatory agencies had scarce resources and time for
oversight.
At least partly because of limited resources, the
government authorities do “system-related supervision,”
effectively a form of performance-based regulation where
responsibility is placed on the operator of high-risk
activities to identify their own shortcomings. Regulators
check both the design and operation of the safety
management system and perform annual inspections to
ensure they are operating as designed. Regulators only
ensure that companies have the right procedures in place
(on paper) and spot check that they are being used.
There is no statutory standard for determining whether
the supervision of a Brzo company is adequate.
Under Brzo, the regulators check whether the company
has a safety management system in place, whether the
systems and procedures incorporated in that system are
appropriate, and whether the company actually applies
these systems and procedures. They did not notice or did
not react to Shell not acting in accordance with its own
SMS. As just some examples, changes and upgrades to the
plant were not consistently subjected to risk analyses
(violating the Shell SMS requirements) but this deficiency
was not noted by the regulators nor required to be fixed.
Changes were not adequately evaluated for safety.
Requirements for expertise and training in performing
startups were not enforced. And so on.
Do the regulators also check whether the SMS is effective?
What feedback do they get about efficacy? Is there feedback
(required reporting) about incidents and inadequacies?
In terms of safety management system, Shell Moerdijk was
one of the highest scoring Brzo companies. The regulators
were unanimous in their positive appraisal. Shell Moerdijk
was known to take any identified shortcomings seriously
and to rectify them quickly and effectively. The Brzo
inspections (inspections of companies that are subject to
the Major Accidents (Risks) Decree) during the preceding
five years were conducted in this context.
The number of Brzo inspections is determined by (1)
company risks (nature and size of plants, volume of
hazardous substances, and activities of company) and (2)
quality of the safety management system, whereby less
supervision may be required if the management level is
high and more may be required if the management level is
low. Shell Moerdijk ranks highest in terms of risks of all the
72 companies subject to Brzo in the Province. It also
scored high on the judged quality of its safety
management system.
Shell Moerdijk had only one violation of Brzo between
2010 and 2014, a low number compared to other Brzo
companies. The company always initiated an improvement
action when a problem was identified. The shortcoming
was either immediately rectified or the regulators felt it
was clearly on its way to be rectified and was being
systematically monitored in the Shell Moerdijk action
management system.
The regulators considered Shell Moerdijk to be a company
with a good safety performance.
There were several shortcomings at Shell Moerdijk that
regulators did not label as violations. Not identifying
violations contributed to the positive impression of Shell
Moerdijk’s operational safety.
Less intensive supervision was required for companies
whose safety management systems were judged to
function well. But if there are fewer supervision days
allotted, external regulators have more difficulty forming
an opinion of the risks in the company with sufficient
depth.
Even under system-oriented supervision, the inspectors
could have observed that changes and upgrades to the
plants were not consistently subjected to risk analyses and
that the safety management system indeed did not
function well.
“Identification of hazards and assessment of risks” was
covered only once in a Brzo inspection in the last five
years, which was in 2011. The inspectors gave a score of
“moderate” and did not follow up to check whether Shell
Moerdijk had improved.
Within the framework of the Safety Report required by
law, Shell Moerdijk subjected all containment systems to a
risk assessment. In principle, this process should have led
to thousands of potential plant scenarios. Legislation
requires the company to prepare 10 scenarios per plant
(such as MSPO2). In so doing, the company must select the
hazards with the greatest risks and the nature of the risks
must be varied. [How do they select the 10 to analyze?]
Why only 10? How are the 10 to be analyzed selected? By
definition, accidents occur when the assumptions about
what is most risky are wrong, as in this case. Note that
there was evidence that the assumptions were wrong in the
form of accidents at plants with a similar design. Do the
regulatory authorities get information about accidents at
other similar Shell plants? Can examining only a few risks
that one thinks are the most important ensure anything
about safety and not simply lead to a perfunctory and
useless exercise performed on the risks that are already
well understood and handled and thus unlikely to lead to an
accident? Does the regulatory authority have any oversight
about which scenarios are selected?
Several shortcomings were identified that should have
been deemed violations, but were not: for example, the
plant scenarios were not up to date or were incomplete
and the catalyst was not being stored according to the
guidelines. The inspectors gave a score of “good” or
“reasonable” for the majority of their assessments.
Deficiencies in the scenarios were noted during an
inspection in 2011, but no new inspections were
performed to determine if the problems were corrected.
Under system-related supervision, it is assumed that the
inspectors are not able to identify “deep-seated
shortcomings at Shell Moerdijk if Shell itself has not
identified these.” Under these assumptions, the regulators
do not review things such as written procedures for
hazardous activities like start-up, the information passed
from the designers to the operators, change management
activities, whether learning is being incorporated from
similar incidents. And so on. Even under system-related
supervision, it was possible for inspectors to see that the
work instructions did not follow the format required by
Shell and to identify that Shell’s own safety rules were
being violated, such as assigning operators to a turnaround
that did not have the necessary knowledge and
experience.
Exactly what is examined in the safety management system
evaluation? Why were the safety management system
shortcomings, especially in the implementation of the
requirements, not detected by the regulators?
At Shell Moerdijk, the Safety Report is approximately 1,000
pages long and the safety management system has
approximately 350 procedures and guidelines.
Are each of these equally important? How were these
organized? Was it possible to identify and review the most
critical aspects of the report and the most critical
procedures and guidelines?
UCA: There was no in-depth investigation of the operation of the specific maintenance-stop-related
procedures although maintenance stops are high risk. No special attention was paid to hazardous
(critical) activities such as maintenance and startup.
Summary: The accident report implies that regulators gave Shell Moerdijk a pass on behavior that might
have been labeled violations. Plant scenario deficiencies should have been considered a violation but
were not. Scenarios were not up to date or were incomplete. Working under limited resources and time
is difficult under any supervision model but system-level supervision has major limitations in ensuring
public safety. The accident investigation showed many flaws in Shell Moerdijk operations safety
management as defined and as implemented. So what is wrong with the supervision model that the
regulators did not detect them?
Recommendations:
Better supervision of the highest risk activities is needed, including turnarounds. Regulators need to
oversee and ensure that strict procedures are being used for the most dangerous activities and that the
safety management system is operating effectively and following its own rules. Operating under limited
resources does not preclude doing something effective; it simply requires a more intelligent selection of
activities that are performed. There is a need for better evaluation procedures and oversight of safety
management system effectiveness. The regulators should rethink system-level supervision to ensure
that they are doing something that is effective in preventing accidents like the Shell Moerdijk explosions
and fire.
- Informing citizens about the results of the measurement of the substances released and
the ensuing recommendations.
UCA: A number of parties did not make consistent use of the LCMS (National Crisis Management
System) for sharing information: the municipalities did obtain information but did not always share
information via the LCMS.
• Clear agreements about the use of LCMS were lacking and the
officials were not sufficiently familiar with it.
UCA: Problems occurred in the assessment of hazardous substances and the subsequent
communication surrounding them.
UCA: Some teams also used WhatsApp on their cell phones and thus that information could not be
used in developing an inter-regional information overview.
Recommendations: Several deficiencies were uncovered in LCMS and NL-Alert during this incident.
While they did not lead to loss of life because of the nature of the accident in this case, they could under
other circumstances. What was learned from this case should be used to improve the system, including
examining why many people used WhatsApp instead and how the official system can incorporate those
features. Accidents create an opportunity to look closely at the actual operation of our systems in times
of stress and provide a learning opportunity that should not be wasted.
role in what is practical and effective. There are some general design principles, however, that are
necessary for any safety management system to be effective [Leveson, 2012], many of which do not
appear to be reflected in the Shell SMS. Shell would be well served by reviewing the design of its SMS to
determine whether these principles are implemented.
Even without details provided in the accident report about the Shell and Shell Moerdijk safety
control structure, clear deficiencies can be identified from the factors leading to the accident. The
accident report notes that unsafe situations were overlooked, internal procedures were not properly
followed, lessons were not learned from previous incidents, incorrect assumptions about basic chemical
reactions were not re-evaluated after evidence surfaced that they were incorrect, changes were not
managed and controlled, inadequate hazard analysis and risk assessment procedures were used,
recommendations from previous incidents and accidents were never implemented, and oversight of
critical activities was missing.
Finally, while there is no information beyond a cryptic comment in the accident report that the
Safety Management System is integrated with the Business Management System, in general this is a
poor practice and can undermine good decision making.
17 The highest ranking factor was top-level management concern about safety.
(and one of the most profitable) in that industry [Duhigg 2012]. The opposite can occur when leaders
are selected who do not value safety with respect to other organizational goals.
The accident report briefly mentions the Shell safety culture and does not do an in-depth
examination of it. But what is included in the accident report raises important questions and doubts
about the strength of this culture in Shell.
Shell was one of the founders of a safety culture program called Hearts and Minds. It was initially
created by Shell Exploration and Production in 2002. Figure 6 shows what they call the “culture ladder,”
which is described as the “maturity” level of the organization’s safety culture. Maturity level appears to
be borrowed from the current “process maturity” movement. But cultures are not processes; they are
value systems, and describing one value system or one set of behaviors as more “mature” than another
makes little sense. The employees of a company or
participants of an industry either value safety highly or not. Value systems cannot be compared with
others or ranked on a scale of maturity, but simply evaluated on whether they are successful in
achieving specific goals such as preventing accidents.
Limited information is provided in the accident report, but with respect to the stated goals in the
Shell Hearts and Minds culture ladder, Shell seems to have major weaknesses. The accident report
points to unsafe behavior that seems to imply safety was not a high priority: overlooking unsafe
behavior and warnings, not adhering to internal procedures, not making changes after previous
incidents, and not evaluating assumptions after changes occur.
The report notes that during the four years before the accident, only one safety culture measurement
effort was carried out (in 2011). And in that small effort, only 28 of the more than 800 employees
participated. This measurement was conducted during a middle-management meeting where all 28
individuals attending the meeting took part in the survey, which was then discussed. The report says
that “Shell Moerdijk has not performed any other safety culture measurements in order to assess the
effects of its safety culture efforts.” There was another employee satisfaction survey, containing eight
questions, that gave some evidence of culture-related elements, but it was not a safety culture survey
that would “provide deeper insight into the areas of values, attitude, and behavior as regards safety”
[Dutch Safety Board, 2015].
Figure 6. The Hearts and Minds Culture Ladder
In lieu of a direct examination of the Shell safety culture, indirect evidence must be used to identify
safety culture flaws. The company seems to be satisfied with their self-assessed current level on the
Hearts and Minds “culture ladder” of Calculative (which is described in the accident report as the
“required” level), but even that level does not seem to have been achieved (nor the so-called “lower”
levels).
The reactive level, defined as “We do a lot every time we have an accident,” leaves undefined what “a
lot” is. Reacting after an accident is not very useful, and what was done after the Nanhai and Shell
Moerdijk 1999 incidents was clearly inadequate in terms of preventing the same thing from happening
again.
The accident report says that “In Shell Moerdijk’s estimation, its safety culture is at the required level
(calculative).” This level is defined as “we have systems in place to manage all hazards,” but they clearly
did not achieve this level either as they did not have systems in place to manage the hazards involved in
starting up Unit 4800 on 3 June 2014. In addition, the fact that they are satisfied with not achieving the
top two steps in their own program, i.e., using leadership and values to continuously drive safety
improvement and making safety an integral part of the way business is done at Shell,18 is not
encouraging in terms of the state of Shell’s actual safety culture.
18 HRO is controversial with respect to whether it promotes safety or simply reliability (these are two
different and sometimes conflicting properties). For example, the Baker Panel Report on the Texas City explosion
[Baker 2007] found that the treatment of safety as a local responsibility, a practice promoted by HRO, contributed
to the losses. An evaluation of HRO can be found in [Leveson et al. 2009].
The accident report says that the company deduces the level of its own safety culture from the safety
performance indicators based on number of leaks and industrial accidents resulting in absenteeism.
Neither is a good indicator of process safety.
To evaluate the safety culture at Shell requires looking in depth at the company and employee
behavior, not simply giving surveys of what some people think. How they behave is a much better
indicator of their internal value system than what they answer on surveys. The accident report did not
cover the safety culture in depth, but what is included seems to point to what has been labeled as a
“compliance culture,” where the bulk of the effort is simply complying with standards and the requests
of regulators and not proactively taking steps to improve safety because it is a basic value in the
company. The accident report says that they always fixed immediately what was found to be lacking by
government inspectors. Did they aggressively search for problems internally without being prompted by
a regulatory agency?
What matters with respect to safety culture is not the words on paper but how the employees behave
and how they see their leaders behaving.
Recommendation: Shell Moerdijk and Shell Corporate should do a thorough study of their safety culture
and why it was not strong enough to prevent the events in 2014.
include the switch to a new catalyst without testing it or reconsidering that assumptions regarding it
may no longer be true and the removal of parts of the work instructions for Unit 4800 (again without
assessment) because they were not considered critical. For example, requirements regarding nitrogen
flow were removed during periodic updates of the work instructions in an attempt to limit their content
to information that was believed essential and to focus on what was thought to be the most important
from a safety and operational view. Other information was omitted from the work instructions because,
over time, understanding of the most appropriate procedures related to Unit 4800 changed.
Changes may also be unplanned and must therefore be detected. There needs to be a way to detect
unplanned changes that affect safety or prevent them from occurring. Detection may be accomplished
by using leading indicators and safety-focused audits. There may also be periodic planned re-evaluation
of assumptions underlying the original safety-related design features and management procedures. In
this accident, the leading indicators were inadequate and too narrow (number of leaks), audits did not
seem to be effective, and assumptions about the properties of ethylbenzene established in 1977 were
never revisited.
Changes may occur slowly over time, as occurred here with the work instructions for Unit 4800. As
the work instructions were amended before each turnaround, important information was omitted, in
some cases intentionally and in others unintentionally. Examples include the nitrogen flow requirements
mentioned above and the required heating rate for the reactor. Changes do not appear to have been
reviewed by experts, but if they were, then the review process was flawed.
Changes may be known and planned in one system component but appear as unplanned and
unknown changes to another component of the system. The change in composition of the catalyst was
known by the catalyst manufacturer but not by Shell Moerdijk. Clearly communication is an important
factor here.
Recommendation: The Management of Change procedures should be evaluated to determine why they
were not effective in this case and appropriate improvements implemented.
Figure 7 shows the safety control structure assumed in the CAST analysis, with flawed control and
feedback contributing to the accident shown with dotted lines. As can be seen, almost the entire
structure was involved.
[Figure 7. Safety control structure. Recoverable component labels: Shell Global; Executive Management;
EU; Shell Projects and Technology; Safety Management; Netherlands Govt. Oversight Agencies; Process
Control System; Chemical Process.]
As an overview of the CAST analysis results, the following table summarizes each component’s role in
the accident along with the recommendations generated for that component. The reasons for the
component’s role in the accident would probably be augmented if the unanswered questions noted in
the CAST analysis details had been included in the accident report.
Physical Component
Role: None of the physical controls failed. The final physical collapse of the reactor and separation
vessel after pressure reached a critical level resulted from unexpected and unhandled chemical and
physical interactions. Many of these unsafe interactions were a result of design flaws in the reactor or
in the safety-related controls.
Process Control System
Role: The process control system was not configured to provide the necessary help to the operators
during a start-up or to allow them to easily stop the process in an emergency. The reason for these
design decisions rests primarily in incorrect assumptions by the designers about the impossibility of
the scenario that occurred.
Even after previous incidents at similar plants in which these assumptions were
violated, the assumptions were not questioned and revisited.
Recommendations: The operators’ knowledge and skill are most challenged during off-nominal
phases, and most accidents occur during such phases and after changes are
made or occur. The process control system should be redesigned to assist operators
in all safety-critical, off-nominal operations (not just this restart scenario). For manual
operations, the goal should be to provide all necessary assistance to the operators in
decision making and taking action and to reduce attention and time pressures.
Operators
Role: The operators acted appropriately or at least understandably given the context,
the incorrect work instructions (which they followed), and their lack of training and
required skill and knowledge in performing the work. In addition, they were provided
with almost no assistance from the process control system, while many of the tasks
they needed to do required intense attention, precision, mental effort, deep
understanding of process dynamics, and frequent adjustments to a continually
fluctuating process. The risks were not communicated properly.
Management relied on the operators seeing something strange and stopping the
process, but did not provide the information and training to ensure it was possible for
operators to do this.
Recommendations: The operators must have the appropriate skills and expertise to
perform their assigned activities, and there must be someone assigned the
responsibility for enforcing this requirement. A human factors study during the job
analysis is needed to ensure that the operators are provided with information and a
work situation that allows them to make appropriate decisions under stressful
conditions. Better automated assistance should be provided in all phases of
operation, training should be provided for activities that are known to be hazardous
like startup, and work instructions as well as the process for producing them need to
be improved.
Plant Safety Management
Role: (1) The safety analysis methods used were either not appropriate, not applied,
or were applied incorrectly. However, the methods used complied with the Shell
requirements and with the minimum required by the Dutch regulators. Safety
management did not consider some relevant information nor investigate how
ethylbenzene reacting with the catalyst could cause an explosion. Safety
management at Shell Moerdijk, as is common in many places, seems to have been
largely ineffectual, with lots of activity, but much of it directed to minimal compliance
with government regulation. A partial explanation for their behavior is that everyone
believed that a reaction between ethylbenzene and the catalyst was impossible and
that the start-up process was low risk.
(2) Although Shell’s safety management system includes requirements for dealing
with changes, the MOC procedures were not followed or implemented effectively.
Risks resulting from changes made to the plant, the catalyst, the processes, and the
procedures were not identified and managed.
(3) Used the number of leaks as the primary leading indicator of process safety. This
practice is common in the petrochemical industry.
(4) Lessons from similar incidents at Nanhai and at Shell Moerdijk were not used to
reduce risk.
(5) Proper oversight of the generation of work instructions was not provided, which
allowed unsafe work instructions to be used by the operators.
Operations Management
Role:
(1) Operations management did not identify the flaws in the risk analyses performed
or the procedures used for these risk analyses. The risk analyses complied with the
minimal requirements of the Dutch regulatory authorities and apparently with the
Shell requirements.
(2) Changes over time were not subjected to assessment in accordance with the MOC
procedures.
(3) Work instructions were created by the operators without safety engineering
oversight. They did not comply with the required Shell format for such work
instructions and did not include important criteria for the job such as heating rate
and nitrogen flow.
(4) Made a decision to configure the process control system to control the plant
during the normal production phase but not during non-production and maintenance
phases. They did not think these activities were high risk and believed that manual
operation would suffice. The reasons for this decision are not in the accident report.
(5) Allowed two employees from different contractors to work in the adjacent unit
during the start-up, probably because they did not believe that phase was dangerous.
(6) Did not assign operators to the start-up who had the qualifications required in the
Safety Report. No reason is given in the accident report as to why this happened.
(7) Did not ensure that lessons learned from similar plants and at Shell Moerdijk in
1999 were incorporated in the design and operation of Unit 4800.
(8) Did not follow MOC procedures, or perhaps the procedures themselves were
inadequate. The report does not contain enough information to determine which.
(9) Conducted internal Shell Moerdijk audits that did not detect any of the clear
shortcomings in practices and procedures. Not enough information is provided to
determine why the audits were ineffective.
Shell Projects and Technology
Role: The design data provided to the licensees was not usable by those creating
work instructions at the plants using the technology. The design had safety-critical
design flaws that were not found in hazard analyses during the initial design phase
and were not fixed after receiving information about serious problems in operations
at some Shell plants. These design flaws include an inadequate number of
temperature sensors and the use of pressure relief valves that could not handle the
pressure that occurred. Unsafe and incomplete work instructions were approved by
Shell Projects and Technology for the Unit 4800 turnaround at Shell Moerdijk.
Without more information about the operations at Shell Corporate, it is difficult to
determine exactly why the unsafe control occurred. More questions than answers
arise from the CAST analysis here, such as: Why were the design flaws introduced and
how did they get through the design process? What type of hazard analysis is
performed by Shell Projects and Technology (or used if it is produced by another
group)? Why were identified design flaws not fixed after the incidents at Shell
Moerdijk in 1999 and Nanhai in 2011? What other types of feedback (beyond
incidents) are provided about the safety of their designs during operations in the Shell
plants? What information about the safety aspects (hazards) of the plant design is
passed from Shell Projects and Technology to the licensees of their designs? What
information is included in the design book? Is the design data provided adequate for
the licensees to create safe work instructions if engineers are writing the work
instructions instead of operators and did they not know who was going to be
performing this task? Why did they approve unsafe work instructions that did not
even follow the required Shell format? What information is provided in the Design
Book specifically about start-up and the hazards of start-up? What types of hazard
analysis are performed during the design process? What is the process for ensuring
safety when changes are made? How are safety-related assumptions recorded and
what are the triggers for a re-analysis of these assumptions? What feedback do the
designers get about the operation of their designs?
Corporate Safety Management
Role: There appears to have been a flawed view of the state of risk and the
effectiveness of the safety management system in Shell plants. The flawed process
model is most likely related to inadequate feedback (including audits and leading
indicators). Again, many questions are raised in the CAST analysis that need to be
answered to understand the role of corporate level safety management in the
accident and thereby to provide more effective safety management in the future.
Almost nothing about safety management at the Corporate level is included in the
accident report.
Recommendations: Improve Shell safety audits. Review all risk assessment and
hazard analysis processes and, in general, improve their approach to both safety
analysis and safety management. Shell is not alone among the large oil companies in
needing to update their methods. The petrochemical industry has too many
accidents and incidents that are avoidable.
More specifically, the accident report says that Shell should “evaluate how risk
analyses are performed and make changes. This should include procedures and
policies about re-evaluation of earlier presumptions and assumptions. Conduct new
risk analyses, put adequate measures in place and ensure that the team that
performs these analyses has sufficient critical ability. Pay particular attention to
assumptions based on risks that had previously been ruled out.”
Evaluate and improve the corporate safety management system. Improve
procedures for learning from process safety-related incidents. Create better feedback
mechanisms (including audits and leading indicators) and procedures for learning
from incidents.
Recommendations: Review the SMS design and determine why it did not prevent
obvious violations of policy such as shortcomings in safety studies, management of
change, and learning from accidents, and failures to follow regulations (e.g., those on
having experienced operators and following the format for work instructions). Determine why audits
were not effective in finding such obvious violations of procedures. While it is
possible that this was the first time such lapses have occurred, it is highly unlikely.
Strengthen audit procedures, including identifying better leading indicators of
increasing risk than simply the number of leaks and create other forms of feedback to
identify when the safety management system is drifting off course and risk is
increasing. Establish better feedback channels to ensure that management of change
procedures and corporate safety policy are being followed.
Catalyst Manufacturer
Role: The changes made in the catalyst were not pointed out to Shell, but they were
included in a new safety information sheet. While the catalyst manufacturer cannot
determine what the impact of their changes is on a customer, there should be some clear
alert (other than simply changing information in a document) that changes have been
made and what they are so that the customers are aware of them.
Recommendations: Change contractual relationships between Shell and its suppliers
to ensure that potentially critical changes are communicated. Make changes within
information sheets clear and obvious.
Dutch Regulators
Role: The accident report implies that regulators gave Shell Moerdijk a pass on
behavior that might have been labeled violations. Plant scenario deficiencies should
have been considered a violation but were not. Scenarios were not up to date or
were incomplete. Working under limited resources and time is difficult under any
supervision model but system-level supervision has major limitations in ensuring
public safety. The accident investigation showed many flaws in Shell Moerdijk
operations safety management as defined and as implemented. So what is wrong
with the supervision model that the regulators did not detect them?
Emergency Services
Role: Emergency services were mostly very effective in carrying out their
responsibilities, but some deficiencies, particularly in communication, were
uncovered in the accident response.
For the factors that spanned the entire control structure, the accident report on which the CAST
analysis is based included little information, but some weaknesses are implied by what is included
and some general recommendations can be derived.
Safety Management System
Evidence of an overall inadequate safety control system (safety management
system) in the report includes: unsafe situations were overlooked, internal
procedures were not properly followed, lessons were not learned from previous
incidents, incorrect assumptions about basic chemical reactions were not
re-evaluated after evidence surfaced that they were incorrect, changes were not
managed and controlled, inadequate hazard analysis and risk assessment
procedures were used, recommendations from previous incidents and accidents
were never implemented, and oversight of critical activities was missing. In
summary, the Safety Management System at Shell Moerdijk did not prevent unsafe
situations from being overlooked or internal procedures from not being followed.
There is no information in the accident report about who created the SMS or who
was responsible for ensuring that it was working properly.
Safety Information System
No information is provided about the safety information system but it appears that
people were making decisions without having appropriate information.
Recommendations: The safety information system is so critical to the achievement
of high safety that Shell and Shell Moerdijk should evaluate the existing system and
perhaps redesign it.
Safety Culture
The accident report did not cover the safety culture in depth, but what is included
seems to point to what has been labeled as a “compliance culture,” where the bulk
of the effort is simply complying with standards and the requests of regulators and
not proactively taking steps to improve safety because it is a basic value in the
company. The accident report points to unsafe behavior that seems to imply safety
was not a high priority such as overlooking unsafe behavior and warnings, not
adhering to internal procedures, not making changes after previous incidents, and
not evaluating assumptions after changes occur.
The Hearts and Minds Safety Culture Program used by Shell has serious
weaknesses. The “culture ladder” is vaguely defined (“we do a lot every time we
have accidents”). Strangely, the company seems to be satisfied with their
self-assessed current level in this program of Calculative (which is described in the
accident report as the “required” level), but even that level does not seem to have
been achieved (nor the so-called “lower” levels).
factors analysis of the information provided to the operators and the potential for
human errors created by the design of the process and particularly by the design of
the process control system.
Management of Change
Role: A large number of both planned and unplanned changes contributing to the
accident were not assessed for risk.
SUMMARY
CAST is based on a new accident causality model, STAMP, which in turn has a theoretical foundation
in systems theory. As such, accidents are treated as resulting from a lack of control over the components
and non-enforcement of safety constraints. This assumption is in contrast with the standard causality
model which treats accidents as a chain of failure events. STAMP is more general than the chain-of-
failure-events models and therefore encompasses more types of accident causes.
The use of CAST helps to guide an accident investigation, to generate questions to answer, and to
identify the deep-seated problems that need to be fixed to prevent a large number of accidents in the
future rather than just preventing very similar ones. The results should help companies get out of the
firefighting mode where seemingly different but actually related accidents keep occurring.
REFERENCES
1. Baker Panel. The Report of the BP U.S. Refineries Independent Safety Review Panel, 2007.
2. Peter Checkland. Systems Thinking, Systems Practice, John Wiley & Sons, 1981.
3. Sidney Dekker. The Field Guide to Understanding Human Error, Ashgate, 2006.
4. Charles Duhigg. The Power of Habit: Why We Do What We Do in Life and Business, Random House,
2012.
5. Dutch Safety Board, Explosions MSPO2 Shell Moerdijk, The Hague, 2015.
6. Anthony Hidden. Investigation into the Clapham Junction Railway Accident, Dept. of Transportation,
London, 1990.
7. Urban Kjellén. An Evaluation of Safety Information Systems at Six Medium-Sized and Large Firms,
Journal of Occupational Accidents, 3:273-288, 1982.
8. Nancy G. Leveson, Engineering a Safer World, MIT Press, 2012.
9. Nancy G. Leveson, Safeware, Addison-Wesley, 1995.
10. Nancy G. Leveson, Nicolas Dulac, Karen Marais, and John Carroll. Moving Beyond Normal Accidents
and High Reliability Organizations: A Systems Approach to Safety in Complex Systems,
Organization Studies, 30:227-249, February/March, 2009.
11. Jens Rasmussen. Human Error and the Problem of Causality in Analysis of Accidents, in Human
Factors in Hazardous Situations, eds. D.E. Broadbent, J. Reason, and A. Baddeley, 1-12, Clarendon
Press.
12. Jens Rasmussen, Risk Management in a Dynamic Society: A Modelling Problem. Safety Science,
27(2/3):183-213, 1997.
13. Edgar Schein. Organizational Culture and Leadership, Sage Publications, 1986.