Excerpts from Lessons From Longford by Andrew Hopkins
10 July, 2000
Page 1
gw_group/keith/word/safety/longford.doc
Chapter 1: Introduction
Things happened on that day that no one had seen at Longford before. A steel cylinder sprang a leak that let liquid hydrocarbon spill onto the ground. A dribble at first, but then, over the course of the morning it developed into a cascade ... Ice formed on pipework that normally was too hot to touch. Pumps, that never stopped, ceased flowing and refused to start. Storage tank liquid levels that were normally stable plummeted ... I was in Control Room One when the first explosion ripped apart a 14-tonne steel vessel, 25 metres from where I was standing. It sent shards of steel, dust, debris and liquid hydrocarbon into the atmosphere. These are the words of the operator whom Esso blamed for the accident at its gas plant at Longford, Victoria on 25 September 1998.
Conclusion
The catastrophic failure of the heat exchanger was triggered by operator error. In fact several staff at Longford participated in the faulty decision to re-warm the metal heat exchanger which had become brittle with cold. But in no sense can these men be blamed for their decision since not even their senior managers understood the danger inherent in the situation. The fact is that none of the men concerned had been properly trained about the dangers of cold metal embrittlement and the company had not developed procedures to deal with this danger. Operator error has proved to be an unsatisfactory explanation here, just as it has in so many other major accident investigations. Nor is it sufficient to explain what happened in terms of inadequate training. There are more deep-seated reasons for this training failure. The root causes or latent conditions will be explored in later chapters, generating a number of important lessons which are relevant to high hazard industries in general.
HAZOP
The standard hazard identification process in the petrochemical industry is the hazard and operability study or HAZOP. It involves systematically imagining everything that might go wrong in a processing plant and developing procedures or engineering solutions to avoid these potential problems. In the case of existing plant, retrospective HAZOPs were to be carried out as needed. Esso had carried out retrospective HAZOPs in 1994 and 1995 of all these facilities, except gas plant 1. One hint as to why Esso seemed so unconcerned about conducting the HAZOP on gas plant 1 was provided by one of its consultants. This man was asked whether an efficient and astute operator would want to review older plant from time to time to gauge the extent to which it departed from modern standards. His reply was that the general principle was - 'if it ain't broke, don't fix it'. He noted that this had applied to gas plant 1 in that it had operated for nearly 30 years without a problem. This view is completely at odds with the philosophy of safety management in relation to rare but catastrophic events. The fact that a major accident has not happened in the past provides no guarantee for the future. But illogical though this witness's position may be, it does provide some insight into the kind of thinking which may have led to the indefinite deferment of the HAZOP of gas plant 1. Did the lack of a HAZOP contribute to the accident? The Commission came to the view that the failure to conduct the HAZOP indeed contributed to the explosion.
... operator training on how to handle the failure of warm oil flow. This in turn was a consequence of the lack of appropriate procedures in which operators might have been trained. The absence of procedures stemmed from the failure of the company to carry out the relevant HAZOP. And the failure to conduct the HAZOP was attributable to concern about resources. Where the emphasis is placed depends on one's purposes, but in the context of this chapter, it is the failure to conduct a HAZOP which constitutes the root cause.
Conclusion
Managing major hazards requires that those hazards first be identified. It then requires that specific management plans be developed for each such hazard. It must also be understood that any significant change has the potential to introduce new hazards and the management of change must therefore include hazard identification processes. Furthermore, it makes good sense for the head office of a company to take direct responsibility for the prevention of rare but catastrophic events.
Alarm overload
The alarm problem was compounded enormously by the sheer number of alarms which operators were expected to deal with - at least three or four hundred a day. Given a situation of such extraordinary overload it was inevitable that operators would become de-sensitised and that alarms would not be properly attended to. Given the extraordinary situation of alarm overload, operators clearly needed to be highly selective in the alarms they attended to. There was no written guidance available from the company. So how did they decide? Here is what one operator said on the subject: 'We become used to those that are requiring action straight away rather than those that may not necessarily require immediate action.' And how did they learn which is which? 'By observation of how other operators treated that alarm, you pick up the correct measures to be taken in certain instances.' These are revealing statements. They suggest that operators have evolved amongst themselves a set of working rules to deal with the chaotic situation they faced. These rules enabled them to distinguish between alarms which needed attention and those which could be tolerated or ignored, and enabled them, moreover, to respond to important alarms in a way which would allow them to continue meeting production targets. There is no reason to think that these were the optimal rules. Indeed they proved in the end not to be, since failure of the operators to control the level of condensate allowed the accident sequence to develop. But, until 25 September 1998, the rules had worked. This culture was a natural and necessary adaptation to the otherwise impossible alarm overload situation which the operators faced.
... those formally laid down may well be inevitable. Reason speaks of 'necessary violations' where non-compliance is essential in order to get the job done.
... of leaks; certain kinds of alarms; particular temperature, pressure or other readings; certain maintenance problems; and machinery in a dangerous condition.
Routine reports - management should also consider whether anyone on site is required to fill out an end-of-shift report. If so, might these reports contain warning information, which should be entered into the reporting system?
Worker initiatives - workers on site should be encouraged to report not only matters which management has specified but also any other matters about which they are concerned. In some circumstances they will be reluctant to make reports for fear of reprisals. Management will need to find ways to overcome this fear. Management will need to carry out the suggested work, whether or not it seems necessary from an accident prevention point of view, so as to demonstrate good faith.
Feedback - it is not enough that people make reports or pass information up the line. The outcome must be fed back in the form of a written response to the person who made the initial report. This will improve the morale of reporters and they will be motivated to take the reporting process more seriously. Feedback is important, not only to ensure that reporters take the process seriously, but also to obligate those to whom they report to act on reports conscientiously.
Feed forward - to be truly effective the process must not terminate at this point. The next step is to require the person who initially raised the matter to indicate whether the action taken is satisfactory in his or her view. Where the initiator is not satisfied the matter should cycle through the system again until such time as the initiator is satisfied, or alternatively, some senior manager of the company is prepared to over-ride the concerns of the initiator, in writing.
Escalation - reporting systems must specify a time by which management must respond, and people making reports should be able to some extent to specify how urgent the matter is and therefore how quickly they require a response, e.g. within a day, within a week, within a month. The initial response may of course be to explain that more time is needed to deal with the matter. If the person to whom the report is made does not respond within the required time the system must escalate, that is, send the message further up the corporate hierarchy.
CEO response - whether this whole system works depends, ultimately, on whether the person at the top of the information chain, the CEO, is committed to making it work.
Auditing - such systems must be carefully audited, that is, tested to see if they are capturing the intended information. One such test is to track some of the information flows which have occurred to see whether bad news, or at least news of problems, is being entered into the system and responded to.
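The escalation and feed forward rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any system used by Esso or proposed by the Commission: the names Report, holder and is_closed, the four-step chain of roles, and the treatment of a month as 30 days are all invented for the example. The function holder is meant for reports that have not yet received a response.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical reporting chain, lowest to highest. The text only requires
# that unanswered reports go "further up the corporate hierarchy".
CHAIN = ["supervisor", "plant manager", "general manager", "CEO"]

# Response windows the reporter may choose from (the text's own examples).
URGENCY = {
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),  # assumption: a month is taken as 30 days
}


@dataclass
class Report:
    description: str
    urgency: str                      # key into URGENCY, set by the reporter
    raised_at: datetime
    responded_at: Optional[datetime] = None
    initiator_satisfied: bool = False
    overridden_in_writing: bool = False


def holder(report: Report, now: datetime) -> str:
    """Role that should currently hold an unanswered report.

    Each full response window that passes without a reply moves the
    report one step up CHAIN, stopping at the top.
    """
    windows_missed = max(0, (now - report.raised_at) // URGENCY[report.urgency])
    return CHAIN[min(windows_missed, len(CHAIN) - 1)]


def is_closed(report: Report) -> bool:
    """A report closes only after a response AND sign-off (feed forward):
    either the initiator is satisfied or a senior manager overrides in writing."""
    responded = report.responded_at is not None
    return responded and (report.initiator_satisfied or report.overridden_in_writing)


raised = datetime(2000, 1, 3, 8, 0)
example = Report("example report", "day", raised)
print(holder(example, raised + timedelta(hours=6)))          # supervisor
print(holder(example, raised + timedelta(days=1, hours=1)))  # plant manager
print(holder(example, raised + timedelta(days=10)))          # CEO
print(is_closed(example))                                    # False
```

The point of the sketch is that a report cannot be closed by silence or by a response alone; it stays open, and keeps climbing, until someone with authority has answered and the initiator's view has been recorded.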
... stations, it is this industry, at least in the United States, which has taken the lead in developing indicators of plant safety which have nothing to do with injury or fatality rates. The indicators include: number of unplanned reactor shutdowns (automatic, precautionary emergency shutdowns), number of times certain other safety systems have been automatically activated, number of significant events (carefully defined) and number of forced outages. There is wide agreement in the industry that these are valid indicators, in the sense that they really do measure how well safety is being managed. Certain features of these indicators are worthy of comment. First, they are negative indicators, in the sense that the fewer, the better. The point is that measures of failure are fine as long as the frequency of failures is sufficient to enable us to talk of rates. Second, these indicators are 'hard', in the sense that it is relatively clear what is being counted. A shutdown is a shutdown. This is not true of positive indicators such as number of audits. Audits are of varying quality, from external, high-powered investigations to the internal, tick-a-box exercises. Third, the indicators described above are industry-specific. Whereas LTI rates have the advantage that they can be used to compare safety in different industries, indicators such as the number of reactor shutdowns cannot be used in this way. But being industry-specific means that they are common to all nuclear power stations and can therefore be used to make comparisons between power stations. This is their particular strength. The model works in the nuclear industry because the industry body is powerful enough to mandate the collection of relevant data and to prevent under-reporting. Whether it can work in other hazardous industries probably depends on the strength of their industry associations and the depth of concern in the industry to avoid disaster.
... are to be changed, not the attitudes of senior management. Fourth, a presumption which underlies Esso's approach is that accidents are within the power of workers to prevent and that all that is required is that they develop the right mindset and exercise more care in the way they do their work.
It is clear therefore that Esso's safety culture approach, in principle, ignores the latent conditions which underlie every workplace accident and focuses instead on the workers' attitudes as the cause of the accident. But creating the right mindset is not a strategy which can be effective in dealing with hazards about which workers have no knowledge and which can only be identified and controlled by management. Many major hazards fall into this category. There is an interesting implication here. If culture, understood as mindset, is to be the key to preventing major accidents, it is management culture rather than the culture of the workforce in general which is most relevant. What is required is a management mindset that every major hazard will be identified and controlled and a management commitment to make available whatever resources are necessary to ensure that the workplace is safe. In short, if culture is the key to safety, then the root cause of the Longford accident was a deficiency in the safety culture of management.
Chapter 7: Auditing
In the widely available video of a lecture on the Piper Alpha disaster, Appleton makes the following comment - when we asked senior management why they didn't know about the many failings uncovered by the inquiry, one of them said "I knew everything was all right because I never got any reports of things being wrong". In my experience [Appleton said], ... there is always news on safety and some of it will be bad news. Continuous good news - you worry.
Esso auditing
Evidence was given at the Royal Commission that Esso's auditing process was defective in the very same way that auditing at Piper Alpha was. Just six months prior to the explosion, Esso's health and safety management system (called OIMS - Operational Integrity Management System) was audited by a team from Esso's corporate owner, Exxon. The auditing team was presumed to have an arm's length relationship with Esso and therefore to be in a position to provide an accurate evaluation of the system. Longford was one of about 11 sites in Victoria visited by the auditing team. Esso's managing director reported to the inquiry that the audit had shown that most of the eleven elements of the safety management system were functioning at level three or better, which meant that: the system is functioning; procedures for key tasks are documented; adjustments to system process steps have been made to ensure completeness and to ensure the system is functioning as intended; ongoing verification measures indicate that the system is working as intended; results and outputs are being measured; and priority system objectives are satisfied.
The managing director went on to tell the inquiry that since the assessment was conducted by personnel external to Esso, he felt confident that these results represented an independent and unbiased assessment of the state of Esso's OIMS systems. He also noted that an internal review in May 1998, four months before the explosion, highlighted a number of positive results, among them, six months without any recordable injuries ... high levels of near-miss reporting ... and major risk reduction projects. Taken at face value these statements indicate that the reports being received at the most senior level of the corporation contained consistently good news. This is precisely the situation which led Appleton to say 'continuous good news - you worry'. Esso's executive committee, including its directors, met periodically as a corporate health, safety and environment committee. The results of the external audit had been presented to this committee two months prior to the explosion. The meeting was expected to take two hours and the agenda shows that just thirty minutes were allocated for a presentation to this committee about the external audit. The presentation consisted of a slide show and commentary. It included an overview of positive findings followed by a list of remaining challenges. The minutes of this meeting record that the audit concluded that OIMS was extensively utilised and well understood within Esso and identified a number of Exxon best practices within Esso. Improvement opportunities focussed on enhancing system documentation and formalising systems for elements 1 and 7. Notice that the 'challenges' mentioned by the presenter have become 'improvement opportunities' in the minutes. Note also that the meeting minutes describe this half-hour period as a presentation of findings to the executive committee. There is no indication that executive committee members probed these findings in any detail, nor made any decisions or issued any directions as a result of what they were told. The committee is portrayed in the minutes as a fairly passive recipient of a summary report, not as a group of directors and managers actively controlling safety in their company.
Earlier chapters have already described some of the bad news which a good audit might have been expected to pick up. First, although accident investigators quickly highlighted the fact that a HAZOP had not been carried out on gas plant 1, the external audit failed to notice this. Second, it was no secret that operators had grown accustomed to managing the plant for long periods without responding to alarms triggered by abnormal circumstances. A thorough-going audit should have detected this. Third, a thorough audit should have picked up the fact that the near miss reporting system was not being used to report significant gas processing problems. The Exxon audit did not pick this up.
It was not just that the audit missed things it should have picked up; its principal conclusion was wrong. Remember that the central finding of the audit, as summed up in the executive committee minutes, was that OIMS was extensively utilised and well understood within Esso. The Commission found otherwise: OIMS, together with all the supporting manuals, comprised a complex management system. It was repetitive, circular and contained unnecessary cross-referencing. Much of its language was impenetrable. These characteristics made the system difficult to comprehend by management and by operations personnel. The Commission gained the distinct impression that there was a tendency for the administration of OIMS to take on a life of its own, divorced from operations in the field. Indeed it seemed that in some respects, concentration upon the development and maintenance of the system diverted attention from what was actually happening in the practical functioning of the plants at Longford.
... might go wrong here and what have you done to control these risks? A rigorous audit needs to examine the hazard identification strategy and make some effort to seek out hazards which may have been missed, so as to be able to make a judgment about how effectively hazard identification and control is being carried out. Identifying unrecognised hazards is clearly a dramatic way of demonstrating deficiencies in management's hazard identification system.
Self-regulation
In principle, self-regulation is quite distinct from deregulation. The latter involves retreat by government and an abandonment of the field to the market. Self-regulation differs from this in two fundamental respects. First, although it is up to the enterprise to work out how to achieve a safe workplace, governments provide a legislative framework to achieve this outcome and remain willing to take enforcement action as necessary. For this reason, some authors prefer to describe the process as co-regulation, involving both government and enterprise. Second, employees are an integral part of any enterprise and self-regulation therefore requires active employee participation. While self-regulation and deregulation are quite distinct in principle, it is clear that without active employee involvement and without a commitment by the State to ensuring safe outcomes, self-regulation runs the risk of degenerating into deregulation. Self-regulation is often assumed to be the optimal regulatory style for large employers who have their own health and safety expertise. Interestingly, the findings of the Royal Commission call into question whether the regime of self-regulation which has developed in Australia ...
Conclusion
The regime in question had evolved in recent years in a self-regulatory direction and it allowed Esso to operate the Longford facility in a manner which fell short of industry best practice. The Commission recommended that the existing regime be replaced with a safety case approach which would prescribe in detail how safety was to be managed at major hazard facilities. The central feature of the approach is that facility operators are required to demonstrate to the authorities that they are managing safety effectively.
Maintenance cutbacks foreshadow trouble. Auditing needs to be good enough to identify the bad news and to ensure that it gets to the top. Companies should apply the lessons of other disasters.
For governments seeking to encourage mindfulness: A safety case regime should apply to all major hazard facilities.
A corrective conclusion
To bring back the human dimension, I close with the thoughts of the operator whose words began this book. I am thankful that I escaped the fate of several others, thrown through the air like rag dolls. I'm glad ... because my bones weren't shattered, my skin scalded by freezing cold liquid and then flames so hot they cooked flesh to the bone ... Yeah, I'm lucky. Very, very lucky. My wife and children didn't have to endure the torture of eulogies, of burials, of unsaid goodbyes. I'm lucky because they didn't have to wonder if I was going to live through the night. They didn't have to see me comatose, only awake to a new world of pain and scarring, both physical and mental ... While I'm not facing a lifetime of corrective surgery to mitigate disfigurement, I can't work in a place where I once thought I would spend the next 27 years of my life. I cannot doff my hardhat to a company that blamed me for the deaths of two of my workmates, the burning of five others, the destruction of half a billion dollars of gas plant, and wish them well. I cannot respect a company that would gladly have me face the tearful, bewildered stare of a workmate's bereaved family, while the directors of that company seek refuge in the judicial cocoon of their legal advice.