ABS-132 Reliability Centered Maintenance
RELIABILITY-CENTERED MAINTENANCE
JULY 2004
American Bureau of Shipping Incorporated by Act of Legislature of the State of New York 1862
Copyright 2004 American Bureau of Shipping ABS Plaza 16855 Northchase Drive Houston, TX 77060 USA
Foreword
In recent years, there has been an increase in the use of proactive maintenance techniques by Owners for the repair and maintenance of machinery onboard vessels and offshore structures. The preventative maintenance programs developed by applying these techniques are being used by vessel crews and shore-based repair personnel. As a result of increased business competition, there have been numerous advances in condition-monitoring technology, trending and increasingly powerful planned-maintenance software. Since 1978, ABS has cooperated with Owners in developing and implementing preventative maintenance programs. In 1984, ABS issued its first Guide for Survey Based on Preventative Maintenance Techniques, with subsequent updates in 1985, 1987 and 1995, followed by inclusion in the Rule Requirements for Survey After Construction Part 7 in mid-2002. However, machinery systems have continued to become larger and more complex, requiring skilled operators with specialized knowledge of the machinery and systems onboard. The Guide for Survey Based on Reliability-centered Maintenance was issued in December 2003 to provide Owners, managers and operators of vessels and other marine structures with requirements for developing a maintenance program, using techniques applied to machinery systems in other industries, within a maintenance philosophy referred to as Reliability-centered Maintenance (RCM). With the application of RCM principles, maintenance is evaluated and applied in a rational manner that provides the most value to a vessel's Owner/manager/operator. Accordingly, improved equipment and system reliability onboard vessels and other marine structures can be expected from the application of this philosophy. The purpose of these Guidance Notes is to provide supplementary information for applying the requirements of the Guide for Survey Based on Reliability-centered Maintenance.
Information related to equipment failure, maintenance strategies, risk considerations, conducting and documenting an RCM analysis and sustaining an RCM program is provided in the main section of these Guidance Notes. An Appendix providing an overview of various condition monitoring techniques is included. A brief example RCM analysis for three propulsion engine components is provided to demonstrate the procedure. ABS welcomes comments and suggestions for improvement of this Guide. Comments or suggestions can be sent electronically to rdd@eagle.org.
GUIDANCE NOTES ON
RELIABILITY-CENTERED MAINTENANCE
CONTENTS
SECTION 1 General
1 Objective
2 Application
3 Defining Reliability-centered Maintenance
4 Definitions

SECTION 2 Equipment Failure
1 Equipment Failure
2 Equipment Failure Rate and Patterns
3 Failure Management Strategy
3.1 Proactive Maintenance Tasks
3.2 Run-to-failure
3.3 One-time Changes
3.4 Servicing and Routine Inspection
TABLE 1 Examples of Dominant Physical Failure Mechanisms for Hardware
TABLE 2 Six Classic Failure Rate Patterns
FIGURE 1 Normal, Exponential and Weibull Failure Distributions
FIGURE 2 Equipment Life Periods
SECTION 3
FIGURE 1
FIGURE 2
FIGURE 3
FIGURE 4

SECTION 4
FIGURE 1

SECTION 5
TABLE 1 Example of Failure-finding Task Interval Rules
TABLE 2 Example of Failure-finding Task Intervals Based on MTTF
FIGURE 1 Effect of a Failure-finding Task
SECTION 6 Consideration of Risks
1 Risks in General
2 Vessels and Their Risks
3 Risk Characterization
Example Consequence (Severity) Categories
The General Risk Model
Example Risk Model
Sample Risk Matrix
SECTION 7
TABLE 1 Example Operating Context of Propulsion Functional Group
TABLE 2 Example Operating Modes and Operating Context
TABLE 3 Example Function and Functional Failure List
TABLE 4 Example Bottom-up FMECA Worksheet
TABLE 5 Example Top-down FMECA Worksheet
TABLE 6 Failure Characteristic and Suggested Failure Management Tasks
TABLE 7 Summary of Maintenance Tasks
TABLE 8 Summary of Spares Holding Determination
FIGURE 1 Diagram for RCM Analysis
FIGURE 2 Example Partitioning of Functional Groups
FIGURE 3 Example System Block Diagram
FIGURE 4 Simplified Task Selection Flow Diagram
FIGURE 5 RCM Task Selection Flow Diagram
FIGURE 6 Spares Holding Decision Flow Diagram
FIGURE 6A Example of Use of Spares Holding Decision Flow Diagram
SECTION 8
3 Results of Sustaining Efforts
4 Assessment of RCM Program Effectiveness
FIGURE 1 Process to Address Failures and Unpredicted Events
SECTION 9
FIGURE 1
TABLE 1 Machinery and Utilities Operating Characteristics
TABLE 2 Propulsion Functional Group Operating Characteristics
TABLE 3 Diesel Engine System Operating Characteristics, Modes and Context
TABLE 4 Example Function and Functional Failure List
TABLE 5 Example Consequence/Severity Level Definition Format
TABLE 6 Probability of Failure (e.g., Frequency, Likelihood) Criteria Example Format
TABLE 7 Risk Matrix Example Format
TABLE 8 Example Bottom-up FMECA Worksheet
TABLE 9 Example Maintenance Task Selection Worksheet
TABLE 10 Summary of Maintenance Tasks
TABLE 11 Summary of Spares Holding Determination
TABLE 12 Breakdown of Maintenance Tasks
TABLE 13 Propulsion Category Risk Matrix
Loss of Containment Risk Matrix
Expected Event Frequencies for Propulsion
Expected Event Frequencies for Loss of Containment
FIGURE 1 Example Partitioning Diagram
FIGURE 2 Example System Block Diagram
SECTION 1
General
Objective
These Guidance Notes provide a summary of various maintenance techniques used in industry for machinery systems and describe how these techniques can be applied within a maintenance philosophy referred to as reliability-centered maintenance (RCM). With the application of RCM principles, maintenance will be evaluated and applied in a rational manner that provides the most value to a vessel's Owner/Operator. Accordingly, improved equipment and system reliability onboard vessels and other marine structures can be expected from the application of this philosophy.

An additional purpose of these Guidance Notes is to introduce RCM as a part of overall risk management. By understanding the risk of losses associated with equipment failures, a maintenance program can be optimized. This optimization is achieved by allocating maintenance resources to equipment according to its risk impact on the vessel. For example, RCM analysis can be employed to:
i) Identify functional failures with the highest risk, which will then become the focus for further analyses
ii) Identify equipment items and their failure modes that will cause high-risk functional failures
iii) Determine maintenance tasks and maintenance strategies that will reduce risk to acceptable levels
The principles summarized in these Guidance Notes are applied in the Guide for Survey Based on Reliability-centered Maintenance.
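As a minimal sketch of the risk-based prioritization described above, the Python snippet below ranks functional failures by risk, taking risk as frequency (events per year) multiplied by consequence per event, in line with the Risk definition in these Guidance Notes. The class and function names and all numbers are hypothetical illustrations, not values from any actual analysis.

```python
from dataclasses import dataclass


@dataclass
class FunctionalFailure:
    """A functional failure with illustrative (hypothetical) risk inputs."""
    description: str
    frequency_per_year: float     # expected events per year
    consequence_per_event: float  # e.g. dollars lost per event

    @property
    def risk(self) -> float:
        """Risk = frequency x consequence, here in dollars per year."""
        return self.frequency_per_year * self.consequence_per_event


def rank_by_risk(failures: list[FunctionalFailure]) -> list[FunctionalFailure]:
    """Order functional failures from highest to lowest risk."""
    return sorted(failures, key=lambda f: f.risk, reverse=True)


failures = [
    FunctionalFailure("Reduced propulsion power", 0.5, 20_000.0),
    FunctionalFailure("Lube oil leak", 2.0, 1_000.0),
    FunctionalFailure("Loss of propulsion", 0.1, 150_000.0),
]
for f in rank_by_risk(failures):
    print(f"{f.description}: {f.risk:,.0f} per year")
```

The highest-ranked entries would then be the focus of further analysis, as in item i) above.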
Application
These Guidance Notes provide supplementary information for the use of the Guide for Survey Based on Reliability-centered Maintenance and apply to any machinery system for which a preventative maintenance plan based on risk principles is desired. They are applicable to both vessels and offshore facilities.
Defining Reliability-centered Maintenance
The objective of RCM is to achieve reliability for all of the operating modes of a system. An RCM analysis, when properly conducted, should answer the following seven questions:
i) What are the system functions and associated performance standards?
ii) How can the system fail to fulfill these functions?
iii) What can cause a functional failure?
iv) What happens when a failure occurs?
v) What might the consequence be when the failure occurs?
vi) What can be done to detect and prevent the failure?
vii) What should be done if a maintenance task cannot be found?

Typically, the following tools and expertise are employed to perform RCM analyses:
i) Failure modes, effects and criticality analysis (FMECA). This analytical tool helps answer Questions 1 through 5.
ii) RCM decision flow diagram. This diagram helps answer Questions 6 and 7.
iii) Design, engineering and operational knowledge of the system
iv) Condition-monitoring techniques
v) Risk-based decision making (e.g., the frequency and the consequence of a failure in terms of its impact on safety, the environment and commercial operations)

The process is formalized by documenting and implementing the following:
i) The analyses and the decisions taken
ii) Progressive improvements based on operational and maintenance experience
iii) Clear audit trails of maintenance actions taken and improvements made

Once these are documented and implemented, the process becomes an effective system for ensuring the reliable and safe operation of an engineered system. Such a maintenance management system is called an RCM system.
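One way to keep an RCM analysis traceable is to record the answer to each of the seven questions against every failure mode. The sketch below assumes a hypothetical worksheet row (`RcmRecord`) in which fields 1 to 5 would be filled from the FMECA and fields 6 and 7 from the decision flow diagram. Treating Questions 6 and 7 as alternatives (a proactive task, or else a default action) is an assumption of this sketch, and the example entries are illustrative only.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RcmRecord:
    """Hypothetical worksheet row: each field answers one of the seven RCM questions."""
    function: Optional[str] = None            # 1. function and performance standard
    functional_failure: Optional[str] = None  # 2. how it can fail to fulfill the function
    failure_mode: Optional[str] = None        # 3. what can cause the functional failure
    failure_effect: Optional[str] = None      # 4. what happens when the failure occurs
    consequence: Optional[str] = None         # 5. what the consequence might be
    proactive_task: Optional[str] = None      # 6. task to detect and prevent the failure
    default_action: Optional[str] = None      # 7. action if no maintenance task is found

    def open_questions(self) -> list[int]:
        """Return the numbers of the questions that still lack an answer."""
        answers = [getattr(self, f.name) for f in fields(self)]
        # Questions 1-5 come from the FMECA and must all be answered.
        missing = [n for n, a in enumerate(answers[:5], start=1) if a is None]
        # Questions 6 and 7 come from the decision flow diagram and are treated
        # as alternatives: either a proactive task is found, or a default action
        # is recorded because no suitable task could be found.
        if answers[5] is None and answers[6] is None:
            missing.append(6)
        return missing


# Illustrative entries only.
record = RcmRecord(
    function="To provide propulsion thrust at up to rated power",
    functional_failure="Unable to provide any propulsion thrust",
    failure_mode="Fuel injection pump fails",
    failure_effect="Main engine stops",
    consequence="Loss of propulsion",
)
print(record.open_questions())
```

Here the FMECA fields are complete, so only the task-selection step (Question 6, or Question 7 if no task is found) remains open.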
Definitions
The following definitions are applied to the terms used in these Guidance Notes.

ABS Recognized Condition Monitoring Company: This term refers to those companies that ABS has identified as External Specialists. Please refer to Subsection 8/2.

Baseline data: Baseline data are condition-monitoring indications, usually vibration records on rotating equipment, established with the equipment item or component operating in good order, either when the unit first entered the Program or as the first condition-monitoring data collected following an overhaul or repair procedure that invalidated the previous baseline data. The baseline data are the initial condition-monitoring data to which subsequent periodical condition-monitoring data are compared.

Cause: See failure cause.

Component: The hierarchical level below equipment items. This is the lowest level at which the component:
i) can be identified for its contribution to the overall functions of the functional group;
ii) can be identified for its failure modes; and
iii) is the most convenient physical unit for which the preventative maintenance plan can be specified.
Condition monitoring: Condition monitoring consists of scheduled diagnostic technologies used to monitor machine condition in order to detect a potential failure. It is also referred to as an on-condition task or predictive maintenance.

Confidence: Confidence is the analyst's or team's certainty in the risk evaluation.

Consequence: The way in which the effects of a failure mode matter. Consequence can be expressed as the number of people affected, property damaged, amount of oil spilled, area affected, outage time, mission delay, dollars lost, etc. Regardless of the measure chosen, consequences are expressed per event.

Corrective Measures: Corrective measures are engineered or administrative procedures activated to reduce the likelihood of a failure mode and/or its end effect.

Criticality: Criticality is a measure of the risk associated with a failure mode and its effects. The risk can be measured qualitatively (e.g., high, medium, low) or quantitatively (e.g., $15,000 per year).

Current likelihood (frequency): The current likelihood (or frequency) of a failure mode occurring is based on no maintenance being performed or, in the case of existing preventative maintenance plans, on the failure frequency with the existing plan in place.

Current risk: The risk that results from the combination of the severity and the current likelihood (severity times likelihood).

Effects: See failure effects.

End Effects: See failure effects.

Environmental standards: Environmental standards are the international, national and local laws and regulations, or industry standards, with which the vessel must operate in conformance.

Equipment items: The hierarchical level below systems, comprising various groups of components.

Event: An event is an occurrence that has an associated outcome. There are typically a number of potential outcomes from any one initial event, ranging in severity from minor (trivial) to critical (catastrophic), depending upon other conditions and add-on events.
Evident failure mode: A failure mode whose effects become apparent to the operators under normal circumstances if the failure mode occurs on its own.

Failure cause: The failure cause is the basic equipment failure that results in the failure mode. For example, pump bearing seizure is one failure cause of the failure mode "pump fails off."

Failure characteristic: The failure characteristic is the failure pattern (e.g., wear-in, random, wear-out) exhibited by the failure mode.

Failure effects: Failure effects are the consequences that can result from a failure mode and its causes.
i) Local effect: The initial change in system operation that would occur if the postulated failure mode occurs.
ii) Next higher effect: The change in condition or operation of the next higher level of indenture caused by the postulated failure mode. This higher-level effect is typically related to the functional failure that could result.
iii) End effect: The overall effect on the vessel, typically related to the consequences of interest for the analysis (loss of propulsion, loss of maneuverability, etc.). For the purposes of this Guide, the term End Effects applies only to the total loss or degradation of the functions related to propulsion and directional control, including the following consequences occurring immediately or a short time after a failure mode: loss of containment, explosion/fire and/or safety. For offshore activities, these may be extended to include functions related to drilling operations, position mooring, hydrocarbon production and processing and/or import and export functions.
Failure-finding task: A failure-finding task is a scheduled task used to detect hidden failures when no condition-monitoring or planned-maintenance task is applicable. It is a scheduled function check to determine whether an item will perform its required function if called upon.

Failure management strategy: A failure management strategy is a proactive strategy to manage failures and their effects to an acceptable level of risk. It consists of proactive maintenance tasks and/or one-time changes.

Failure mode: The failure mode describes how equipment can fail and potentially result in a functional failure. A failure mode can be described in terms of an equipment failure cause (e.g., pump bearing seizes), but is typically described in terms of an observed effect of the equipment failure (e.g., pump fails off).

FMECA: The acronym for failure mode, effects and criticality analysis.

Frequency: The frequency of a potential undesirable event is expressed as events per unit time, usually per year. The frequency should be determined from historical data if a significant number of events have occurred in the past. Often, however, risk analyses focus on events with more severe consequences (and low frequencies) for which little historical data exist. In such cases, the event frequency is calculated using risk assessment models.

Function: A function is what the functional group, systems, equipment items and components are designed to do. Each function should be documented as a function statement that contains a verb describing the function, an object on which the function acts, and performance standard(s).
i) Primary function: A primary function is directly related to producing the primary output or product from a functional group/system/equipment item/component.
ii) Secondary function: A secondary function is not directly related to producing the primary output or product, but is nonetheless needed for the functional group/system/equipment item/component.
Functional failure: A functional failure is a description of how the equipment is unable to perform a specific function to a desired level of performance. Each functional failure should be documented in a functional failure statement that contains a verb, an object and the functional deviation.

Functional group: A hierarchical level addressing propulsion, maneuvering, electrical, vessel service, and navigation and communications functions.

Hazard: Hazards are conditions that may potentially lead to an undesirable event.

Hidden Failure Mode: A failure mode whose failure effects do not become apparent to the operators under normal circumstances if the failure mode occurs on its own.

Indications (Failure Detection): Indications are alarms or conditions that the operator would sense to detect the failure mode.

Level of indenture: A relative position within a hierarchy of functions in which each level is related to the functions in the level above. For the purposes of this Guide, the levels of indenture in descending order are: functional group, systems, subsystems, equipment items and components.

Likelihood: See frequency.

One-time change: A one-time change is any action taken to change the physical configuration of a component, an equipment item or a system (redesign or modification), to change the method used by an operator or maintenance personnel to perform an operation or maintenance task, to change the manner in which the machinery is operated, or to change the capability of an operator or maintenance personnel, such as by training.

Operating context: The operating context of a functional group is the set of circumstances under which the functional group is expected to operate. It must fully describe the physical environment in which the functional group is operated, precisely describe the manner in which it is operated, and state its specified performance capabilities.
ABS GUIDANCE NOTES ON RELIABILITY-CENTERED MAINTENANCE . 2004
Operating mode: An operating mode is the operational state that the vessel or marine structure is in, for example, cruising at sea, or entering or departing a port.

P-F interval: The P-F (potential failure to functional failure) interval is the time interval between the point at which the onset of failure can be detected and the point at which the functional failure occurs. A condition-monitoring task should be performed at an interval of less than half of the P-F interval.

Parallel redundancy: Parallel redundancy applies to systems/equipment items operating simultaneously. Each system has the capability to meet the total demand. In the event of a functional failure in one system/equipment item, the remaining system/equipment item will continue to operate, but at a higher capacity. For some arrangements, standby systems/equipment items may also be in reserve.

Performance and quality standards: Performance and quality standards are the requirements to which functional groups/systems/equipment items/components are to operate, such as minimum/maximum power or pressure, temperature range, fluid cleanliness, etc.

Planned maintenance: For the purposes of this Guide, planned maintenance is a scheduled maintenance task that entails discarding a component at or before a specified age limit, regardless of its condition at the time. It also refers to a scheduled maintenance task that restores the capability of an item at or before a specified age limit, regardless of its condition at the time, to a level that provides an acceptable probability of survival to the end of another specified interval. These maintenance tasks are also referred to as scheduled discard and scheduled restoration, respectively.

Preventative maintenance plan: The preventative maintenance plan consists of all the maintenance tasks identified as necessary to provide an acceptable probability of survival to the end of a specified interval for the machinery systems. In IACS UR Z20, this is referred to as a Planned Maintenance Scheme.
Proactive maintenance task: A proactive maintenance task is implemented to prevent failures before they occur, detect the onset of failures, or discover failures before they impact system performance.

Projected likelihood: The likelihood (or frequency) of a failure mode occurring based on a maintenance task being performed or a one-time change being implemented.

Projected risk: The risk that results from the combination of the consequence and the projected likelihood.

Random failure: Random failure is dominated by chance failures caused by sudden stresses, extreme conditions, random human errors, etc. (i.e., failure is not predictable by time).

Risk: Risk is composed of two elements, frequency and consequence. Risk is defined as the product of the frequency with which an event is anticipated to occur and the severity of the consequence of the event's outcome.

Risk Matrix: A risk matrix is a table indicating the risk for an associated frequency and consequence severity.

Run-to-failure: Run-to-failure is a failure management strategy that allows an equipment item/component to run until failure occurs, after which a repair is made.

Safeguards: See corrective measures.

Safety standards: Safety standards address the hazards that may be present in an operating context and specify the safeguards (corrective measures) that must be in place for the protection of the crew and vessel.
Servicing and Routine Inspection: These are simple tasks intended to (1) ensure that the failure rate and failure pattern remain as predicted by performing routine servicing (e.g., lubrication) and (2) spot accidental damage and/or problems resulting from ignorance or negligence. They provide the opportunity to ensure that the general standards of maintenance are satisfactory. These tasks are not based on any explicit potential failure condition. Servicing and routine inspection may also be applied to items that have relatively insignificant failure consequences yet should not be ignored (minor leaks, drips, etc.).

Severity: When used with the term consequence, severity indicates the magnitude of the consequence.

Special Continuous Survey of Machinery: The requirements for Special Continuous Survey of Machinery are listed in 7-2-1/7 Continuous Surveys (Vessels in Unrestricted Service) and 7-2-2/9 Continuous Surveys (Vessels in Great Lakes Service) of the Rules for Survey After Construction Part 7.

Special Periodical Survey of Machinery: The requirements for a conventional Special Periodical Survey of Machinery are listed in 7-2-2/7 Special Periodical Surveys (Vessels in Great Lakes Service); 7-2-3/5 Special Periodical Surveys (Vessels in Rivers and Intracoastal Waterway Service); 7-6-2/3 Special Periodical Surveys Machinery (3.1, All Vessels, 3.3, Tankers); 7-6-3/1 Special Periodical/Continuous Survey-Machinery-Year of Grace (Vessels in Great Lakes Service); 7-8-2 Shipboard Automatic and Remote-control Systems, Special Periodical Surveys; and 7-9 Survey Requirements for Additional Systems and Services (Cargo Refrigeration, Hull Condition Monitoring System, Quick Release System, Thrusters and Dynamic Positioning System, and Vapor Emission Control System) of the Rules for Survey After Construction Part 7. There are Special Periodical Survey requirements in other Rules and Guides for specific vessel types, services and marine structures not listed here.
Subsystems: An additional hierarchical level below system, comprising various groups of equipment items, used for modeling complex functional groups.

Systems: The hierarchical level below functional group, comprising various groups of equipment items.

Wear-in failure: Wear-in failure is dominated by weak members related to problems such as manufacturing defects and installation/maintenance/startup errors. It is also known as burn-in or infant mortality.

Wear-out failure: Wear-out failure is dominated by end-of-useful-life issues for equipment.
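Two of the definitions above reduce to simple checks. The sketch below assumes that a condition-monitoring task interval is acceptable when it is less than half of the P-F interval, and that periodic readings are compared with baseline data using ratio limits. The function names, the 2x/4x ratio limits and all example values are hypothetical; real alert and alarm limits are outside the scope of this sketch.

```python
def cm_interval_ok(task_interval: float, pf_interval: float) -> bool:
    """True if the task interval is positive and less than half the P-F interval."""
    return 0 < task_interval < pf_interval / 2


def worst_case_lead_time(task_interval: float, pf_interval: float) -> float:
    """Time left for corrective action if the onset of failure becomes
    detectable just after an inspection and is found at the next one."""
    return pf_interval - task_interval


def compare_to_baseline(reading: float, baseline: float,
                        alert_ratio: float = 2.0, alarm_ratio: float = 4.0) -> str:
    """Classify a periodic reading against its baseline.

    The default ratio limits are hypothetical placeholders, not ABS criteria.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = reading / baseline
    if ratio >= alarm_ratio:
        return "alarm"
    if ratio >= alert_ratio:
        return "alert"
    return "normal"


# Illustrative P-F interval of 8 weeks for a developing bearing defect.
pf_interval = 8.0
for interval in (2.0, 4.0, 6.0):
    print(interval, cm_interval_ok(interval, pf_interval),
          worst_case_lead_time(interval, pf_interval))

# Illustrative baseline: vibration velocity (mm/s RMS) recorded after an overhaul.
# A new baseline would be taken after any overhaul that invalidates this one.
baseline = 1.8
print([compare_to_baseline(r, baseline) for r in (1.9, 2.4, 3.9, 7.5)])
```

One way to see the reason for the half-interval rule: in the worst case the onset of failure becomes detectable just after an inspection and is found one task interval later, so an interval below half the P-F interval leaves more than half of the P-F interval for corrective action.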
SECTION 2
Equipment Failure
Equipment Failure
A combination of one or more equipment failures and/or human errors causes a loss of system function. The following factors usually influence equipment failure:
i) Design error
ii) Faulty material
iii) Improper fabrication and construction
iv) Improper operation
v) Inadequate maintenance
vi) Maintenance errors
Note that maintenance does not influence many of these factors. Maintenance is therefore only one of many approaches to improving equipment reliability and, hence, system reliability. RCM analyses focus on reducing failures resulting from inadequate maintenance. In addition, RCM aids in identifying premature equipment failures introduced by maintenance errors. In these cases, RCM analyses may recommend improvements to specific maintenance activities, such as improving maintenance procedures, improving worker performance, or adding quality assurance/quality control tasks to verify correct performance of critical maintenance tasks. While the objective of this document is to improve maintenance, RCM analyses may recommend design changes and/or operational improvements when equipment reliability cannot be ensured through maintenance.

To effectively improve equipment reliability through maintenance, design changes or operational improvements, one must understand potential equipment failure mechanisms, their causes and the associated system impacts. Equipment failure should be defined as a state or condition in which a component no longer satisfies some aspect of its design intent (e.g., a functional failure has occurred due to the equipment failure). RCM focuses on managing equipment failures that result in functional failures.

To develop an effective failure management strategy, the strategy must be based on an understanding of the failure mechanism. Equipment can exhibit several different failure modes (i.e., ways in which the equipment fails). Also, the failure mechanism may be different for different failure modes, and the failure mechanisms may vary during the life of the equipment. To help understand this relationship, Section 2, Table 1 examines typical hardware-related equipment failure mechanisms.
This information is helpful in determining an appropriate maintenance strategy. A Weibull plot can also be used to correlate the probability of failure with operating time. These data can be helpful in establishing intervals for certain types of maintenance tasks (e.g., rebuilding tasks).

Another common statistical measure associated with these distributions is the mean time to failure (MTTF). MTTF is the average life to failure for the equipment failure mode. For a symmetric distribution such as the normal distribution, the MTTF is also the point at which the areas under the failure distribution curve are equal above and below the point; this is not true of skewed distributions such as the exponential. Determining the MTTF will, therefore, depend on the type of failure distribution used to model the failure mode. Section 2, Figure 1 also identifies the MTTF for the normal and exponential failure distributions.

MTTF data are helpful in determining when to perform certain types of maintenance tasks. For example, if the appropriate maintenance strategy is to rebuild an equipment item, the MTTF data can be used to help set the rebuilding task interval. If the failure mode is represented by a normal distribution and the interval is set at the MTTF, then one can assume that there is a 50% chance of the item failing before it is rebuilt. If the interval is set at less than the MTTF, the probability of the item failing before being rebuilt is less than 50%; if the interval is more than the MTTF, the probability is more than 50%. How quickly this probability changes as the interval is moved before or after the MTTF depends on the standard deviation of the distribution.
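The arithmetic in the rebuilding-interval example can be sketched as follows, assuming a normally distributed failure time with a known MTTF and standard deviation (the values are illustrative). The probability of failing before the rebuild is the normal cumulative distribution evaluated at the interval, computed here with `math.erf`. The exponential case is included for contrast, to show that the 50% figure is specific to symmetric distributions.

```python
import math


def prob_fail_before_normal(interval: float, mttf: float, std_dev: float) -> float:
    """P(item fails before `interval`) if failure times are normally distributed."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    z = (interval - mttf) / std_dev
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def prob_fail_before_exponential(interval: float, mttf: float) -> float:
    """P(item fails before `interval`) for an exponential (constant-rate) failure time."""
    return 1.0 - math.exp(-interval / mttf)


mttf, sigma = 10_000.0, 1_500.0  # operating hours, illustrative values
for t in (7_000.0, 10_000.0, 13_000.0):
    print(t, round(prob_fail_before_normal(t, mttf, sigma), 3))
print(round(prob_fail_before_exponential(mttf, mttf), 3))
```

Rebuilding two standard deviations before the MTTF reduces the chance of an in-service failure to about 2.3%. For an exponential failure mode, about 63% of items have already failed by the MTTF, and because its conditional failure rate is constant, a time-based rebuild does not lower the probability of failure in the following interval.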
A more useful measurement, derived from the failure distribution, is the conditional failure rate, or lambda (λ). The conditional failure rate is the probability that a failure occurs during the next instant of time, given that the failure has not already occurred before that time. The conditional failure rate, therefore, provides additional information about the survival life and is used to illustrate failure patterns. Section 2, Table 2 shows six classic conditional failure rate patterns. The vertical axis represents the conditional failure rate as a function of time, λ(t), and the horizontal axis represents the operating time (t) or another variable (e.g., operating cycles). Understanding that equipment failure modes can exhibit different failure patterns has important implications when determining appropriate maintenance strategies. For example, rebuilding or replacing equipment items that do not have distinctive wear-out regions (e.g., patterns C through F) is of little benefit and may actually increase failures as a result of infant mortality and/or human errors during maintenance tasks. For most equipment failure modes, the specific failure patterns are not known and, fortunately, are not needed to make maintenance decisions. Nevertheless, certain failure characteristic information is needed to make maintenance decisions. These characteristics are:
i) Wear-in failure, dominated by weak members related to problems such as manufacturing defects and installation/maintenance/startup errors. Also known as burn-in or infant mortality failures.
ii) Random failure, dominated by chance failures caused by sudden stresses, extreme conditions, random human errors, etc. (i.e., failure is not predictable by time).
iii) Wear-out failure, dominated by end-of-useful-life issues for equipment.
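To illustrate how the three characteristics appear in a conditional failure rate, the sketch below assumes a two-parameter Weibull model (shape β, scale η), for which λ(t) = (β/η)(t/η)^(β−1). A shape parameter below 1 gives a falling rate (wear-in), exactly 1 gives a constant rate (random), and above 1 gives a rising rate (wear-out). The parameter values and function names are hypothetical; in practice the parameters would come from a fitted Weibull plot, which, as noted above, is often not available.

```python
import math


def weibull_conditional_failure_rate(t: float, beta: float, eta: float) -> float:
    """lambda(t) = (beta/eta) * (t/eta)**(beta - 1), defined here for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def failure_characteristic(beta: float) -> str:
    """Map the Weibull shape parameter to the failure characteristic it implies."""
    if math.isclose(beta, 1.0):
        return "random"  # constant conditional failure rate
    return "wear-in" if beta < 1.0 else "wear-out"


eta = 1_000.0  # hypothetical scale parameter, operating hours
for beta in (0.5, 1.0, 3.0):  # hypothetical shape parameters
    rates = [weibull_conditional_failure_rate(t, beta, eta) for t in (100.0, 500.0, 900.0)]
    print(failure_characteristic(beta), [f"{r:.2e}" for r in rates])
```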
[Section 2, Figure 1: Failure density f(t) versus time (t) for the normal, exponential and Weibull continuous distributions, with the MTTF marked]
[Section 2, Table 2: Conditional failure rate λ(t) versus operating time (t) for failure patterns A through F]
Pattern F (infant mortality): High infant mortality followed by a constant or slowly rising failure rate. Example: electronic components.
Reference 1: Reliability-centered Maintenance, F. Stanley Nowlan and Howard F. Heap, December 29, 1978, U.S. Department of Commerce, National Technical Information Service.
These failure characteristics are best illustrated by failure pattern A, shown in Section 2, Figure 2.
[Section 2, Figure 2: Failure pattern A, showing the burn-in and wear-out regions versus time]
By simply identifying which of the three equipment failure characteristics is representative of the equipment failure mode, one gains insight into the proper maintenance strategy. For example, if an equipment failure mode exhibits a wear-out characteristic, rebuilding or replacing the equipment item may be an appropriate strategy. However, if an equipment failure mode is characterized by wear-in failure, rebuilding or replacing the equipment item may not be advisable. Finally, a basic understanding of failure rate helps in determining whether maintenance or equipment redesign is necessary. For example, equipment failure modes that exhibit high failure rates (i.e., that fail frequently) are usually best addressed by redesign rather than by more frequent maintenance.
The purpose of the proactive maintenance tasks in the failure management strategy is to (1) prevent failures before they occur or (2) detect the onset of failures in sufficient time so that the failure can be managed before it occurs. Equipment redesigns, modifications and operational improvements (RCM refers to these as one-time changes) are attempts to improve equipment whose failure rates are too high or for which proactive maintenance is not effective/efficient.
The key issues in determining whether a specific failure management strategy is effective are the following:
i) Is the failure management strategy technically feasible?
ii) Is an acceptable level of risk achieved when the failure management strategy is implemented?
iii) Is the failure management strategy cost-effective?
Sections 7 and 8 describe the risk-based decision tools and the RCM analysis process, and provide a more detailed discussion on determining the effectiveness of a failure management strategy. In addition to proactive maintenance tasks and one-time changes, servicing tasks and routine inspections may be critical to the failure management strategy. These activities help ensure that the equipment failure rate and failure characteristics remain as anticipated. For example, the failure rate and failure pattern of a bearing change drastically if it is not properly lubricated. Proactive maintenance tasks, run-to-failure, one-time changes, and servicing and routine inspections are further described in the following paragraphs.
3.1
3.1.2 Condition-monitoring Tasks
A condition-monitoring task is a scheduled task used to detect the onset of a failure so that action can be taken to prevent the functional failure. A potential failure is an identifiable condition that indicates that a functional failure is either about to occur or in the process of occurring. Condition-monitoring tasks should be chosen only when a detectable potential failure condition will exist before failure. When choosing maintenance tasks, condition-monitoring tasks should be considered first, unless a detectable potential failure condition cannot be identified. Condition-monitoring tasks are also referred to as predictive maintenance. Section 4 provides additional details.
3.1.3 Combination of Tasks
Where neither a condition-monitoring task nor a planned-maintenance task on its own seems capable of reducing the risk of a functional failure of the equipment to an acceptable level, it may be necessary to select a combination of both types of maintenance task. Sections 7 and 8 provide further information on determining whether this failure management strategy achieves an acceptable level of risk.

3.1.4 Failure-finding Tasks
A failure-finding task is a scheduled task used to detect hidden failures when no condition-monitoring or planned-maintenance task is applicable. It is a scheduled function check to determine whether an item will perform its required function if called upon. Most of these items are standby or protective equipment. An example would be checking the safety valve on a boiler. Section 5 provides additional details.

3.2 Run-to-failure
Run-to-failure is a failure management strategy that allows an equipment item to run until failure occurs and then a repair is made. This maintenance strategy is acceptable only if the risk of a failure is acceptable without any proactive maintenance tasks. An example would be permitting a local pressure gauge on a cooling water line, also fitted with a remote-reading pressure gauge, to fail.
3.3 One-time Changes
One-time changes are used to reduce the failure rate or to manage failures for which appropriate proactive maintenance tasks have not been identified or cannot effectively and efficiently manage the risk. The basic purpose of a one-time change is to alter the failure rate or failure pattern through equipment redesigns or modifications and/or operational improvements.
One-time changes most effectively address equipment failure modes that result from the following:
i) Faulty design and/or material
ii) Improper fabrication and/or construction
iii) Misoperation
iv) Maintenance errors
These failure mechanisms often result in a wear-in failure characteristic and, thus, require a one-time change. When no maintenance strategy can be found that is both applicable and effective in detecting or preventing failure, a one-time change should be considered. For failure modes that have the highest risk, a one-time change is mandatory. The following briefly describes each type of one-time change:
Equipment redesign or modifications. Redesign or modifications entail physical changes to the equipment or system. An example would be adding drain valves to appropriate sections of a tanker's deck cargo piping to prevent freezing of, and damage to, the piping during vessel transits in freezing temperatures.
Operational improvements. Operational improvements may be modifications to the operation of the equipment and/or to the way in which maintenance is performed on the equipment. They usually entail changing the operating context, changing operating procedures, providing additional training to the operator or maintainer, or any combination thereof. For example, in the case of a main propulsion engine provided with a non-continuous rating nameplate, the engine could be operated at a lower output, closer to its continuous rating, so as to reduce downtime for maintenance. (However, this action may cause the vessel to be unable to meet its schedules.)
3.4
SECTION 3
Planned Maintenance
Introduction
Planned maintenance is a failure management strategy that restores the inherent reliability or performance of the equipment item. These tasks are best employed on equipment items suffering from age-related failure (i.e., a wear-out failure characteristic). The basic principle of planned maintenance is that the probability of failure is best managed by restoring or discarding the item at a specific time before failure is expected. Following this principle, planned-maintenance tasks are performed at set intervals, regardless of whether or not a failure is impending. Restoring the item, or discarding it and replacing it with a new item, prevents the failure.
Age-To-Failure Relationship
The age-to-failure relationship (or wear-out failure characteristic) is distinctive in failure patterns A and B, discussed in Subsection 2/2. Other equipment failure modes may exhibit a less distinctive wear-out characteristic, such as that in failure pattern C. Conceptually, though, performing planned maintenance restores equipment reliability for all of these failure patterns. For planned maintenance to be effective in managing the failure, the failure mode must exhibit a clear life, and most of the equipment items must survive to that life. Section 3, Figure 1 illustrates the wear-out period for failure pattern B.
[Section 3, Figure 1: Wear-out period for failure pattern B, plotted against operating age]
Planned maintenance provides an effective failure management strategy for wear-out failures because it reduces the conditional probability of failure to approximately its initial value (i.e., the failure rate at time zero). Section 3, Figure 2 illustrates the resetting of the failure rate curve that results from performing a planned-maintenance task.
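[Section 3, Figure 2: Resetting of the failure rate curve by a planned-maintenance task, plotted against operating age, with the MTTF marked]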
The assumption is that the restoration or discard task returns the equipment item to a nearly new condition. However, if the equipment failure mode exhibits both wear-in and wear-out failure patterns (e.g., failure pattern A), additional tasks or a one-time change may be required to manage the wear-in that is likely to occur after the planned-maintenance task (for example, when recommissioning a gas turbine or a diesel engine after an extensive repair or overhaul).
When determining whether the planned-maintenance task should be a restoration or a discard task, the following must be considered:
i) Does the task ensure the reliability and performance of the equipment? If the equipment is restored, it must be restored to a nearly new condition.
ii) Is the task cost-effective? The cost of restoring the equipment should be less than the cost of discarding it and replacing it with a new item.
Regulatory requirements (e.g., classification society Rules) should also be considered, especially if data are insufficient to determine a planned-maintenance interval. In addition, the potential consequence (i.e., the resulting effect) and the associated risk should be considered when determining a planned-maintenance interval. RCM employs two concepts when determining a planned-maintenance interval: the safe life limit and the economic life limit. These limits are illustrated in Section 3, Figures 3 and 4.
[Section 3, Figures 3 and 4: Safe life limit and economic life limit; conditional failure rate λ(t) versus operating age, with the MTTF marked]
If the failure mode could result in a severe safety or environmental effect or a highest-risk event, the planned-maintenance task interval is set using the safe life limit concept. That is, the task interval is set to ensure that there is little chance of failure occurring before the planned-maintenance task is performed, which usually means setting the interval well before the MTTF. The economic life limit is used for all other failure modes. In this model, the task interval is based on the economics of the task and the expected equipment life, and it may fall before, at or after the MTTF. Because few operations currently have enough data to determine optimal planned-maintenance task intervals, the initial task frequency is typically set at a conservative value (especially for highest-risk failure modes) and then optimized as the task is performed. However, performing planned maintenance too frequently can itself increase the failure rate, as a result of human errors during the task and/or wear-in failures.
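Both interval-setting concepts can be sketched under a hypothetical Weibull wear-out model (β = 3, η = 10,000 h). For the safe life limit, the sketch takes the interval as the age by which only a small, agreed probability of failure has accumulated (1% here, an assumed target). For the economic life limit, it swaps in the classical age-replacement cost model, which these Guidance Notes do not prescribe: the interval is chosen to minimize the expected cost per operating hour, given assumed relative costs of a planned task (1) and of an unplanned failure (10). All names and numbers are illustrative.

```python
import math


def weibull_cdf(t: float, beta: float, eta: float) -> float:
    """Probability of failure by age t under a Weibull model."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def safe_life_interval(p_max: float, beta: float, eta: float) -> float:
    """Age at which the cumulative probability of failure reaches p_max."""
    return eta * (-math.log(1.0 - p_max)) ** (1.0 / beta)


def cost_rate(T: float, beta: float, eta: float,
              c_planned: float, c_failure: float, steps: int = 2_000) -> float:
    """Expected cost per operating hour when replacing at age T or at failure, whichever comes first."""
    h = T / steps
    # Expected cycle length = integral of reliability R(t) = 1 - F(t) from 0 to T (trapezoidal rule)
    mean_cycle = sum(
        0.5 * h * ((1.0 - weibull_cdf(i * h, beta, eta)) + (1.0 - weibull_cdf((i + 1) * h, beta, eta)))
        for i in range(steps)
    )
    p_fail = weibull_cdf(T, beta, eta)
    expected_cost = c_planned * (1.0 - p_fail) + c_failure * p_fail
    return expected_cost / mean_cycle


beta, eta = 3.0, 10_000.0                      # hypothetical wear-out failure mode
mttf = eta * math.gamma(1.0 + 1.0 / beta)      # Weibull MTTF, about 8,930 h
safe_T = safe_life_interval(0.01, beta, eta)   # assumed 1% target for a highest-risk failure mode

candidates = list(range(1_000, 20_001, 500))   # candidate intervals, operating hours
# Hypothetical costs: an unplanned failure costs 10 times a planned task
economic_T = min(candidates, key=lambda T: cost_rate(T, beta, eta, 1.0, 10.0))

print(f"MTTF ~ {mttf:,.0f} h; safe-life interval ~ {safe_T:,.0f} h; economic interval ~ {economic_T:,} h")
```

The safe-life interval sits well below the MTTF, matching the "little chance of failure" intent; the economic interval would shift as the assumed cost ratio changes.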
SECTION 4
[Section 4 figure: P-F curve; equipment condition versus time, showing the P-F interval ending at functional failure (F)]
Establishing limits helps ensure that condition-monitoring tasks are effective in detecting and/or preventing the failure.
SECTION 5
Introduction
Failure-finding maintenance tasks are employed to discover equipment faults that are not detected during normal crew operations (i.e., hidden failures). Because these failures are hidden, if proper maintenance is not performed, a second failure must occur and a failure consequence be realized before the equipment fault is detected. For example, the failure of a standby electrical generator to start on loss of power may be discovered only when the primary generator fails and power is lost. Because these faults result in hidden failures, condition-monitoring or planned-maintenance tasks are typically not an effective failure management strategy. Failure-finding maintenance tasks usually involve a functional test of the equipment to ensure that it is available to perform its function(s) when demanded.
$$F_{MF} = F_{IE}\,(1 - a_{SYS})$$
where
$F_{MF}$ = frequency of occurrence of the multiple failure
$F_{IE}$ = frequency of occurrence of the initiating event making the hidden failure evident
$(1 - a_{SYS})$ = unavailability of the safety system or backup system
$a_{SYS}$ = availability of the safety system or backup system
This equation can be rearranged to solve for the unavailability of the safety system or backup system:
$$1 - a_{SYS} = \frac{F_{MF}}{F_{IE}}$$
Failure-finding tasks are effective in managing hidden failures because these tasks either (1) confirm that the equipment is functioning or (2) reveal that the equipment has failed and needs repair. Once the task is performed (and any repair made), the unavailability of the safety system or backup system is reset to zero (or nearly zero). Then, as time progresses, the unavailability increases until the item fails or is tested again. If an exponential failure distribution is assumed, the failure rate is constant, which means that the probability of failure increases linearly (or nearly so over most reasonable time periods) at a slope equal to the failure rate (i.e., the probability of failure is approximately the product of the failure rate and the elapsed time). Section 5, Figure 1 illustrates the effect of failure-finding tasks.
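The sawtooth behaviour described above can be sketched as follows, assuming exponential failures with a hypothetical MTTF of 50,000 h and a hypothetical six-month (4,380 h) test interval. Averaged over one interval, the unavailability comes out close to λT/2, the linear approximation that applies when λT is small. Function names are illustrative only.

```python
import math


def unavailability(t: float, failure_rate: float, test_interval: float) -> float:
    """Probability the standby item is failed at time t, given it was found working
    (or repaired) at the most recent failure-finding task; exponential failures assumed."""
    time_since_test = t % test_interval
    return 1.0 - math.exp(-failure_rate * time_since_test)


def average_unavailability(failure_rate: float, test_interval: float, steps: int = 10_000) -> float:
    """Time-averaged unavailability over one test interval (midpoint rule)."""
    h = test_interval / steps
    return sum(unavailability((i + 0.5) * h, failure_rate, test_interval) for i in range(steps)) / steps


failure_rate = 1.0 / 50_000.0  # hypothetical: MTTF of 50,000 h for the standby item
test_interval = 4_380.0        # hypothetical: failure-finding task every six months (h)

avg = average_unavailability(failure_rate, test_interval)
print(f"average unavailability {avg:.4f}, linear approximation lambda*T/2 = {failure_rate * test_interval / 2:.4f}")
```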
[Section 5, Figure 1: Effect of failure-finding tasks; a(t), from 0.0 to 1.0, versus time (t)]
i) There must be no applicable or cost-effective condition-monitoring or planned-maintenance task that can detect or prevent the failure.
ii) The task must be technically feasible to perform. It must be practical to perform at the required interval and must not disrupt an otherwise stable system.
iii) The task must reduce the probability of failure (and therefore the risk) to an acceptable level. It must be carried out at an interval at which the probability of multiple failures allows an acceptable risk level to be achieved. Agreed-upon risk acceptance criteria should be determined and recorded.
iv) The task must not increase the risk of a multiple failure (e.g., when testing a relief valve, an overpressure should not be created without the relief valve in service).
v) The task must ensure that protective systems are tested in their entirety rather than as the individual components that make up the system.
vi) The task must be cost-effective. The cost of undertaking the task over a period of time should be less than the total cost of the consequences of failure.
i) Mathematically, using reliability equations, or
ii) Using general guidelines developed to ensure acceptable risk.
Regardless of the technique used, the key is to ensure that the unavailability of a safety system or backup system is low enough that the frequency of occurrence of a multiple failure is sufficiently low to achieve an acceptable risk. For a given consequence resulting from a multiple failure, an acceptable frequency of occurrence for the multiple failure needs to be established. For example, an acceptable frequency of occurrence for a $1 million operational loss might be 0.01/yr, and an acceptable frequency of occurrence for a $100,000 operational loss could be 0.1/yr. In both cases, the risk is equivalent ($10,000/yr). These two techniques for setting failure-finding task intervals are briefly explained in the following paragraphs.
4.1
i) If the failure has a wear-in failure characteristic, then either a one-time change or a condition-monitoring task is usually employed to manage the failure.
ii) If the failure has a wear-out failure characteristic, then a condition-monitoring task or a planned-maintenance task should be applied to manage the failure.
To determine a failure-finding task interval, the equation for the frequency of occurrence of a multiple failure and the equation for the unavailability of the hidden failure are combined. Averaged over a test interval T, the unavailability of the safety system or backup system is approximately λT/2, where λ is the constant failure rate of the system with the hidden failure (λ = 1/MTTF for an exponential distribution). The equation for the frequency of occurrence of a multiple failure then becomes:
$$F_{MF} = F_{IE}\,(1 - a_{SYS}) \approx F_{IE}\,\frac{\lambda T}{2}$$
This approach relies on the following assumptions:
i) The distribution of the failures is exponential.
ii) The conditional failure rate times the test interval (λ × test interval) is less than 0.1.
iii) The time to conduct a failure-finding task is short compared to the length of time that the system is available.
iv) The time to conduct a repair of the system is short compared to the length of time that the system is available.
v) The multiple failure can only occur from the combination of the specified initiating event concurrent with the unavailability of the backup or safety system.
$$T = \frac{2\,F_{ACC}\,MTTF}{F_{IE}} \qquad (5)$$
where
$T$ = test interval
$F_{ACC}$ = acceptable frequency of occurrence of the multiple failure
$F_{IE}$ = frequency of occurrence of the initiating event making the hidden failure evident
$MTTF$ = mean time to failure for the system with the hidden failure
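Equation (5) can be evaluated directly. The sketch below reuses the risk-equivalence example given earlier in this Section ($10,000/yr tolerable risk for a $1 million loss, i.e., F_ACC = 0.01/yr) and assumes a hypothetical initiating-event frequency of 0.5/yr and a hypothetical standby-system MTTF of 10 years. It also rejects any result that violates the λ × test interval < 0.1 assumption. Function names are illustrative only.

```python
def acceptable_frequency(tolerable_risk_per_year: float, consequence: float) -> float:
    """F_ACC such that frequency x consequence equals the tolerable risk."""
    return tolerable_risk_per_year / consequence


def failure_finding_interval(f_acc: float, f_ie: float, mttf: float) -> float:
    """Equation (5): T = 2 * F_ACC * MTTF / F_IE, in consistent time units (years here)."""
    T = 2.0 * f_acc * mttf / f_ie
    # Equation (5) relies on lambda * T < 0.1, with lambda = 1 / MTTF
    if T / mttf >= 0.1:
        raise ValueError("lambda * T >= 0.1: the assumptions behind Equation (5) do not hold")
    return T


f_acc = acceptable_frequency(10_000.0, 1_000_000.0)  # $10,000/yr for a $1 million loss -> 0.01/yr
f_ie = 0.5    # hypothetical: initiating event (e.g., loss of the primary unit) once every 2 years
mttf = 10.0   # hypothetical MTTF of the standby/safety system, years

T = failure_finding_interval(f_acc, f_ie, mttf)
print(f"F_ACC = {f_acc:.3f}/yr, test interval T = {T:.2f} yr (~{T * 365:.0f} days)")
```

Substituting the result back into F_IE × λT/2 recovers F_ACC, which is a quick consistency check on any interval computed this way.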
4.2
Section 5, Tables 1 and 2 provide examples of the acceptable probability rules and failure-finding test interval.
When applying this guideline approach, the user must be aware of the assumptions used in developing the rules and task intervals, and ensure that the assumptions are valid.
SECTION 6
Consideration of Risks
Risks In General
Risk can be considered in two parts: how often a loss event occurs (frequency) and how severe its effects are (consequence). The frequency of a loss is usually expressed in loss events per year. It can be determined either from past data (if a large number of events have occurred) or by calculation using risk analysis tools (if few data records exist). Consequence can be expressed in terms of a combination of a loss event's impacts, such as the following example consequences:
i) Capital investment. Damage to and cost of repair of equipment
ii) Community. Effect on the public
iii) Directional control. Complete loss or reduction of maneuverability
iv) Explosion or fire. Damage to equipment and/or the vessel
v) Loss of containment. Amount of harmful substances released to the environment (the cleanup costs)
vi) Operations. Loss of hire, outage time of functions such as drilling, position mooring (station keeping), hydrocarbon production and processing, loading or unloading functions
vii) Propulsion. Complete loss or reduction of propulsive capability
viii) Safety. The number of people affected (injured or fatalities)
Having identified the risk of a loss event, ship designers, operators, insurers and regulators should deploy preventative or mitigative measures or both to the extent that the risk can be reduced to an acceptable level. Depending on the operating modes of the vessel or marine structure (ocean transit, cargo discharge, etc.), loss events associated with each operating mode may differ. Thus, identifying the operating mode is the essential first step in addressing risks for vessels. A general risk model to illustrate this concept is shown in Section 6, Figure 1. The operating modes represent the different operating contexts and environments for the vessel. The hazards can then be determined based on the operating modes. The initiating events are specific equipment failures, human errors or external events (e.g., lightning strike) that potentially result in an undesired event. Preventative measures are engineered safeguards (e.g., alarms) or management systems (e.g., personnel training) to prevent an initiating event from propagating into the undesired event. Undesired events are the immediate results of an initiating event and hazard on the vessel (e.g., collision, allision). The losses are the ultimate impact resulting from the undesired event and are usually measured in terms of the consequences listed in Subsection 6/1. The mitigating measures are the engineered safeguards and management systems that reduce or control the loss. Section 6, Figure 2 provides an example of the risk model, illustrating vessel operation in restricted waters.
[Section 6, Figure 1: General risk model, linking operating modes, corresponding hazards, initiating events, preventive measures, undesirable events, mitigating measures and losses]
Risk acceptance criteria are discussed in general terms in the following Subsection.
Risk Characterization
A sample risk matrix is shown in Section 6, Figure 3. A risk matrix is an efficient way to characterize the risk of loss events. It is simply a grid of cells corresponding to defined consequence (severity) and frequency categories, into which the loss events can be placed. The consequence and frequency categories are defined broadly enough to help one easily determine an appropriate risk cell for a loss event, but narrowly enough to provide varying degrees of resolution for decision making. In many cases, frequency and consequence levels on a risk matrix are graduated by an order of magnitude. Severity levels can be defined for several types of loss consequences from the list of examples in Subsection 6/1. Section 6, Table 1 lists five (5) consequences (directional control, propulsion, loss of containment, fire/explosion and safety) and defines four severity levels for each consequence. An appropriate severity level term for the consequence is to be chosen and defined prior to the risk analysis. For each severity level, several example descriptors are listed, and some descriptors are repeated between adjacent severity levels. Some studies use numerals (e.g., 1, 2, 3, 4) instead. The descriptor chosen to describe a particular severity level may vary from analysis to analysis. For example, one analysis may choose the descriptors "hazardous" and "critical" to describe the two highest severity levels, while another analysis may choose "critical" and "catastrophic". A listing of example frequency categories is shown in Section 6, Figure 3. Frequency categories are typically expressed in units of events per year. An example frequency category from Section 6, Figure 3 is "Occasional", corresponding to a range of 0.01 to 0.1 events per year (1 event every 100 years to 1 event every 10 years).
Sometimes, it is hard to obtain a perspective on the smaller frequency categories (e.g., "Remote": 0.001 to 0.01 events per year, or from 1 event every 1,000 years to 1 event every 100 years). These small frequencies can be better understood by looking at multiple vessels over longer periods of time. For example, a loss event that has occurred twice across a fleet of 100 vessels over the last 20 years corresponds to a frequency of 0.001 events per year. The risk-based decision-making aspect of a risk matrix is seen in the lines of constant risk. Each cell in the risk matrix corresponds to a defined risk level. Cells with similar risk levels are grouped together to form a line of constant risk, and all of the cells on a risk matrix should be categorized into relevant lines of constant risk. Risk-based action levels to address the loss event are based on the risk level represented by the line of constant risk. Usually, the actions necessary to address loss events at each risk level are predefined. In Section 6, Figure 3, there are three lines of constant risk, denoted by the different shades in the risk matrix ("High Risk", "Medium Risk" and "Low Risk"). Referring to Section 6, Figure 3, if the consequence of a loss event were estimated as "Hazardous" and the frequency estimated as "Probable", the loss event would fall within a line of constant risk categorized as "High Risk". Based on the action levels defined for this risk level, the loss event would be addressed by either a redesign or a one-time change. Once the risk of the loss event has been identified from the risk matrix, a risk reduction action consistent with the required action level should be selected. Typically, maintenance tasks such as condition monitoring and planned maintenance will only reduce the frequency of occurrence of a loss event, while equipment redesigns and other one-time changes may reduce both the frequency and the consequence.
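The fleet example above reduces to pooling events over vessel-years and then placing the result in a frequency category. The sketch below encodes the three lower-end categories listed for Section 6, Figure 3; treating a value of exactly 0.001 as "Remote" follows from "Improbable" being defined as fewer than 0.001 events per year, and the catch-all label for higher frequencies is a hypothetical placeholder because the ranges of the higher categories (e.g., "Probable") are not listed here.

```python
def fleet_frequency(events: int, vessels: int, years: float) -> float:
    """Loss events per vessel-year, pooled across a fleet."""
    return events / (vessels * years)


def frequency_category(freq_per_year: float) -> str:
    """Frequency categories (events per year) as listed for Section 6, Figure 3."""
    if freq_per_year < 0.001:
        return "Improbable"
    if freq_per_year < 0.01:
        return "Remote"
    if freq_per_year < 0.1:
        return "Occasional"
    # Hypothetical catch-all: higher categories exist, but their ranges are not listed here
    return "above Occasional"


freq = fleet_frequency(events=2, vessels=100, years=20)  # twice in 100 vessels x 20 years -> 0.001/yr
print(freq, frequency_category(freq))
```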
Essential to the risk-based task selection process is an assessment of the impact that the task has on the loss event. The ultimate objective should be to select an efficient, feasible task to reduce the risk level of the loss event to an acceptable level of risk (e.g., low risk or medium risk).
[Section 6, Figure 2: Example risk model for vessel operation in restricted waters]
Losses
• Safety: employee injuries
• Environmental: spill of hazardous materials
• Property damage: ship hull breached
• Economic: ship out of service for repairs
Preventive Measures
• Engine overhaul
• Redundant lube systems
• Engine inspections
• Redundant engines
Mitigative Measures
• Radar
• Radio communication
• Vessel subdivision
• Crankcase explosion relief valve
• Dragging anchor
[Section 6, Figure 3: Sample risk matrix]
Consequence (severity) categories: Critical, Hazardous, Major, Minor
Frequency categories:
1 Improbable: fewer than 0.001 events/year (less than 1 event every 1,000 years)
2 Remote: 0.001 to 0.01 events/year (1 event every 1,000 years to 1 event every 100 years)
3 Occasional: 0.01 to 0.1 events/year (1 event every 100 years to 1 event every 10 years)
Risk levels:
High Risk: Redesign or other one-time change required to reduce risk
Medium Risk: One or more maintenance tasks are acceptable to reduce risk (e.g., condition monitoring, preventive maintenance)
Low Risk: Run-to-failure (no maintenance) is acceptable
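A risk matrix lookup can be sketched as a small table. The action levels below are taken from the Section 6, Figure 3 legend, but the ordering of the severity categories, the rank given to "Probable" and the additive scoring with its thresholds are all hypothetical. They were chosen only so that the worked case in the text ("Hazardous", "Probable") lands in High Risk; a real analysis would encode the shading of Section 6, Figure 3 cell by cell.

```python
# Hypothetical ranks: severity ordering and the rank for "Probable" are assumptions
SEVERITY_RANK = {"Minor": 1, "Major": 2, "Hazardous": 3, "Critical": 4}
FREQUENCY_RANK = {"Improbable": 1, "Remote": 2, "Occasional": 3, "Probable": 4}

# Action levels for the three lines of constant risk, as listed in the Figure 3 legend
ACTION = {
    "High Risk": "Redesign or other one-time change required to reduce risk",
    "Medium Risk": "One or more maintenance tasks are acceptable to reduce risk",
    "Low Risk": "Run-to-failure (no maintenance) is acceptable",
}


def risk_level(severity: str, frequency: str) -> str:
    """Hypothetical cell-to-risk mapping: additive score with assumed thresholds."""
    score = SEVERITY_RANK[severity] + FREQUENCY_RANK[frequency]
    if score >= 7:
        return "High Risk"
    if score >= 5:
        return "Medium Risk"
    return "Low Risk"


level = risk_level("Hazardous", "Probable")
print(level, "->", ACTION[level])
```

An explicit cell-by-cell dictionary keyed on (severity, frequency) would be the more faithful design once the actual matrix shading is known; the additive score is used here only to keep the sketch short.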
Minor, Negligible
An occurrence adversely affecting the vessel's seaworthiness or fitness for service or route
Loss of vessel or results in total constructive loss
Serious/significant commitment of resources and personnel
Complete loss of containment. Full-scale response of extended duration to mitigate effects on environment.
Catastrophic, Critical
Note 1: Safety losses are not intended to be compared to other losses to determine monetary equivalency.
SECTION 7
Introduction
The following procedures provide guidance for conducting RCM analyses. RCM analyses are to be performed in a step-by-step fashion. The basic elements of an RCM analysis process are as follows:
i) Identify operating modes and corresponding operating contexts
ii) Define vessel systems
iii) Develop system block diagrams and identify functions
iv) Identify functional failures
v) Conduct a failure modes, effects and criticality analysis (FMECA)
vi) Select a failure management strategy
vii) Determine spare parts holdings
viii) Document the analysis
The procedures to perform the RCM analysis are shown in Section 7, Figure 1, along with cross-references to the corresponding Subsections/Paragraphs of this Section.
Defining Systems
In order to efficiently and thoroughly perform an RCM analysis, each system must be thoroughly defined. This activity involves (1) defining the operating characteristics for the ship as a whole and then for each system and (2) partitioning the vessel into functional groups, then into specific systems and then into equipment items. These distinctions are needed to clearly define the boundaries and operational intent of each system that is subject to RCM analysis.
2.1
[Section 7, Figure 1: RCM analysis process flowchart. Steps include partitioning systems (Paragraph 7/2.2) and developing system block diagrams and identifying functions and functional failures, with further cross-references to Subsections 7/3, 7/4, 7/5 and 7/6, Paragraph 7/5.3 and Section 8. Outputs: preventative maintenance plan, spare parts plan and sustainment process, ready for implementation of RCM onboard.]
2.1.1 Operating Mode
An operating mode of a vessel or marine structure is the operational state that the vessel is in. Each operating mode influences the manner in which the shipboard systems and machinery are to be operated. This in turn dictates the development of operating contexts for individual functional groups.
The following example operating modes are typical for vessels:
• Normal seagoing conditions at full speed
• Maximum permitted operating speed in congested waters
• Maneuvering alongside
• Cargo handling
The following example operating modes are typical for mobile offshore drilling units and offshore oil and gas production facilities:
• Drilling operations
• Position mooring or station keeping
• Relocation/Towing
• Hydrocarbon production and processing
• Import and export functions

2.1.2 Operating Context
The operating context of a functional group comprises the circumstances under which the system is expected to operate. It must fully describe:
i) The physical environment in which the functional group is operated
ii) A precise description of the manner in which the functional group is used
iii) The specified performance capabilities of the functional group, as well as the required performance of any additional functional groups within which the functional group is embedded
Some of the important factors that must be considered in the development of the operating context for a functional group are:
i) Serial redundancy. Applies to arrangements for which an identical standby system/equipment exists to support an operating functional group. In the event of failure of the operating system, the standby system is activated. The operating contexts for the running system/equipment and the standby system/equipment are different. For example, a functional failure in the operating system/equipment will likely be evident, while a functional failure in the standby system/equipment will likely be hidden.
ii) Parallel redundancy. Applies to systems/equipment operating simultaneously, each with the capability to meet the total demand. In the event of a functional failure in one system/equipment, the remaining systems/equipment will continue to operate, but at a higher capacity. In some arrangements, standby systems/equipment may also be in reserve.
iii) Performance and quality standards. Systems/equipment may be required to perform at a certain performance level or to provide a service with a certain quality level (e.g., compressed air supplied at a specified quantity and pressure, within certain temperature ranges and humidity limits).
iv) Environmental standards. As required by international, national and local laws and regulations, the functional group's impact or potential impact on the environment must be considered in the operating context (e.g., engine emission standards).
v) Safety standards. Hazards that might be present in an operating context and the safeguards that must be in place for protection of the crew should be specified.
vi) Shift arrangements. For oceangoing vessels, it is assumed that the propulsion machinery operates continuously, except when the vessel is docked. The ship's service electrical power system, on the other hand, operates continuously. System arrangements and maintenance strategies must be carefully developed so as to ensure system availability.
2.1.3 Developing Operating Contexts of Vessels
Operating contexts are to be developed to different degrees of detail at each level. At each level of functional breakdown, an operating context statement should be written for that level, amplifying the operating context written for the preceding level. At the lower levels of the functional breakdown, more detail is included in the operating context statement because, at these levels, the focus is on the systems and equipment that make up the functional group. Specific performance parameters are necessary to clearly define functions for the functional group and then to determine what constitutes a failure and what effects such failures will have upon specific equipment performance, overall system operation and, ultimately, the vessel's roles.
Vessel level. Operating contexts must first be developed for the vessel. They are normally generic to a vessel type. The vessel-level operating context should include a physical description of the vessel, the vessel type and the cargoes to be carried, the performance standard of the vessel (speed, maneuverability, fuel capacity and consumption, etc.) and the cargo handling capability. Statements are to be made on the primary roles (e.g., to carry cargo from point A to point B in a certain time, cargo preservation), the secondary roles (e.g., crew habitability), and the safety and environmental roles of the vessel.
Functional group level. The vessel-level operating context is then used to develop an operating context for each functional group level (e.g., machinery and utilities, then the propulsion functional group). The operating context at a given functional group level must include all of the operating characteristics defined for the next higher level. For example, the operating contexts for the propulsion, maneuvering, electrical, vessel service, and navigation and communication functional groups must include all of the operating characteristics included in the machinery and utilities functional group. In addition, an operating context must be developed for each vessel operating mode. As an example, the operating context for the propulsion functional group may be developed in a structured manner, as shown in Section 7, Table 1. Section 7, Table 2 shows an example operating context for the diesel engine within the propulsion functional group.
Manner of Use | Propels vessel from 2 to 10 knots, with reversing and stopping capabilities, and assists in mooring | Not used
Performance Capability | To output at 30 to 85 RPM; reversing at 63 RPM; controllable from bridge, centralized control station and locally | Not applicable
2.2 Partitioning Systems
Because a vessel is made up of many complex systems and subsystems, it is helpful to divide the vessel into functional groups and then into specific systems, subsystems, equipment items and, finally, components within each functional group.
2.2.1 Partitioning a Vessel into Functional Groups
Partitioning a vessel into functional groups is accomplished using a top-down approach. For most vessels, the top level includes the following functional groups:
In most cases, partitioning of these high-level functional groups is necessary to identify major systems for analysis. For example, machinery and utilities should be further divided into the following functional groups:
• Propulsion functional group
• Maneuvering functional group
• Electrical functional group
• Ship service functional group (e.g., bilge, ballast, firefighting, steam)
• Navigation and communication functional group
Each functional group should be partitioned using a top-down approach. This is done until a level is reached at which functions are identified with discrete physical units, such as a single system or equipment item. This is sometimes called the level of indenture. The level of indenture is of vital importance as it significantly affects the amount of time and effort required to complete a satisfactory analysis. An analysis carried out at too high a level can become too superficial, while one taken at too low a level can become too cumbersome. The level of indenture will vary depending on the complexity of a system. Highly complex systems will have a large number of failure modes and will tend to be analyzed at lower levels. The level of indenture should be such that the following can be identified for the functional group: i) ii) iii)
2.2.2 Partitioning a Functional Group into Equipment Items
Once the functional groups have been satisfactorily partitioned, each functional group is partitioned into specific equipment items. Again, one or two levels of indenture may be needed to satisfactorily divide a functional group into equipment items. The level of indenture chosen for equipment items should be such that the equipment:
i) Can be identified for its contribution to the overall functions of the functional group
ii) Can be identified for its failure modes
iii) Is the most convenient physical unit for which maintenance can be specified
Section 7, Figure 2 shows an example of partitioning of functional groups and their associated equipment items.
Note: RCM analyses tend to be performed at too low an indenture level because of the mistaken belief that a failure mode can only be identified at the level of the component. In fact, failure modes can be identified from higher levels too, except that identification at the higher levels will be less structured than that at the lower level.
[Figure (Section 7, Figure 2): example partitioning. Functional groups (e.g., Hull Discipline, Cargo Handling) > Systems (Diesel Engine, Reduction Gear, Propeller) > Subsystems (Basic Engine; Engine Support Systems: lube oil, water, fuel, hydraulic, air, exhaust, control and monitoring systems) > Equipment items (e.g., Cylinder Cover Assembly; Mechanical Control Gear, Chain Drive and Camshaft; Exhaust Valve)]
2.2.3 Selection of Functional Groups for Analysis
It may be necessary to identify an order of priority for the analysis of the functional groups so that resources may be targeted most productively. In general, one of the following methods is used to select groups for analysis:
i) Engineering judgment. This approach relies on the undocumented experience of subject matter experts to select the group. Typically, in selecting a group, a team will subjectively consider the following issues: the number of failures that have occurred, the amount of maintenance resources expended, the opportunity to improve performance and the potential to reduce costly downtime maintenance (e.g., dry-docking maintenance). Once the selection and priorities are determined, the team should document the rationale for its decision.
ii) Simple analytical approaches. A more analytical method for selecting functional groups is to use simple analysis tools, such as Pareto analysis and relative ranking. These tools provide the selection team with a structured methodology for ranking the different issues considered during the selection process. When using Pareto analysis, the team collects data related to each issue being considered. For example, if the number of failures is important, the team collects failure data for each group and then ranks each group based on its number of failures. When using relative ranking, the team develops a scoring system that is used to score each issue. The scores are then tabulated and evaluated to rank the groups.
iii) Risk assessment. The most comprehensive approach is to perform a risk assessment, or use an available risk assessment, to select and rank functional groups. Whether the risk assessment is a detailed quantitative analysis or a high-level profiling analysis (as used for enterprise risk management), the risk assessment data can be used to identify the groups that have unacceptable risk and those that have the highest risk. The unacceptable risk data can be used to determine whether further detailed analysis, such as RCM analysis, is warranted. Then, the group risk ranking can be used to prioritize the groups for analysis (e.g., groups with the highest risk are analyzed first).
In addition, the risk assessment should be reviewed to determine whether there are equipment failures that can be affected by improved maintenance and whether these failures are major contributors to the risk. For example, in reviewing the risk assessment for a group, one might discover that the major contributor to risk is operational error. For this group, an RCM analysis might not be the best analytical method to reduce the risk. However, the highest-risk groups in which equipment failures are a major risk contributor are good candidates for RCM analysis.
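Both simple analytical approaches (Pareto analysis and relative ranking) fit in a few lines of Python. This is a minimal sketch, not a prescribed method. The group names, failure counts, issue names, scores and weights are invented for illustration. The 80% cut-off used to pick the "vital few" groups is a common Pareto convention, not a value taken from these Guidance Notes.

```python
from typing import Dict, List, Tuple


def pareto_rank(counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Rank groups by descending count and attach each group's cumulative share."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no data recorded for any group")
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0
    for group, n in ranked:
        running += n
        rows.append((group, n, running / total))
    return rows


def vital_few(counts: Dict[str, int], cutoff: float = 0.8) -> List[str]:
    """Return the highest-ranked groups needed to reach the cumulative cut-off."""
    selected = []
    for group, _, cumulative in pareto_rank(counts):
        selected.append(group)
        if cumulative >= cutoff:
            break
    return selected


def relative_rank(scores: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Weighted sum of each group's issue scores, highest total first."""
    totals = {
        group: sum(weights[issue] * score for issue, score in issue_scores.items())
        for group, issue_scores in scores.items()
    }
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical failure counts per functional group.
failures = {"Propulsion": 42, "Electrical": 25, "Ship service": 18,
            "Maneuvering": 10, "Navigation and communication": 5}
print(vital_few(failures))

# Hypothetical 1-5 scores per issue and relative weights chosen by the team.
scores = {"Propulsion": {"failures": 5, "downtime": 4},
          "Electrical": {"failures": 3, "downtime": 3}}
weights = {"failures": 1.0, "downtime": 2.0}
print(relative_rank(scores, weights))
```

In practice, the weights in the relative ranking carry the team's judgment about which issues matter most. The team should document them along with the rationale for the final selection.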
Regardless of which approach is used to select groups, the following should be considered:
i) The expected cost savings over the predicted remaining life of the equipment should be balanced against the cost of the analysis.
ii) The human resources required to undertake each analysis must be identified and their availability ascertained.
Once the functions are defined, the functional failures (i.e., the different losses of function that can occur due to failures) are defined. Functional failures can reflect a total loss of function (e.g., provides no compressed air) or a partial loss of function (e.g., provides compressed air at reduced pressure and flow). The following paragraphs explain in more detail how to identify functions and functional failures.
3.1
For example, some of the secondary functions of a diesel engine are to keep engine emissions within an applicable standard, to maintain a vibration level that will not affect structural integrity, etc.
ABS GUIDANCE NOTES ON RELIABILITY-CENTERED MAINTENANCE . 2004
[Figure: functional block diagram of the diesel engine, showing its interfaces: governor and engine RPM; torque and vibration to the propulsion shafting; lube oil, cylinder lubricating oil, stuffing box drain oil, lube oil cleaning system, lube oil sump tank and sludge (to waste); freshwater and the central cooling water system; seawater; atmospheric air, scavenge air and exhaust gas systems including turbochargers, exhaust gases and noise; vapor and condensate; instrumentation, alarms and readouts]
The most important of the secondary functions relate to protection and protective devices. Protection and protective devices work in one of the following five ways:
i) To draw the operator's attention to abnormal conditions
ii) To shut down the equipment in the event of a failure
iii) To eliminate or relieve abnormal conditions that follow a failure and that might otherwise cause more serious damage
iv) To take over from a function that has failed
v) To prevent a dangerous situation from arising in the first place
When listing the functions of any system/equipment, the functions of ALL of the associated protective devices must be listed. These devices must receive special attention.
3.2
Each functional failure should be documented in a functional failure statement that contains a verb, an object and the functional deviation. An example function and functional failure list is shown in Section 7, Table 3.
Conducting an FMECA
Once potential functional failures have been identified, the next step in the RCM analysis is to conduct an FMECA. The purpose of this step is to establish the cause-and-effect relationship among potential equipment failures, functional failures and the end effects of the functional failures, and to evaluate the criticality of each postulated failure mode. This information is vital to determine the following:
• When a failure management strategy is needed
• What type of failure management strategy is best used to manage the failure mode (e.g., one-time change, planned maintenance or run-to-failure)
• The importance of the failure management strategy
4.1
4.1.1 Bottom-up FMECA Approach
The bottom-up approach is performed by explicitly analyzing each equipment item of interest. This approach focuses on determining what effects different equipment failure modes have on the operation of the system. The bottom-up approach determines whether the equipment failure mode results in a local effect that causes a functional failure that, in turn, causes an end effect of interest. The steps for performing a bottom-up FMECA are as follows:
• Select an equipment item for analysis
• Identify the potential failure modes for the equipment item
• Select a failure mode for evaluation
• Determine the failure characteristic (e.g., wear-in, random, wear-out) for the failure mode
• Determine the local, next higher level and end effects for the postulated failure mode
• If the end effect results in a consequence of interest, determine the causes of the failure mode
• Determine the criticality of the failure mode using the risk decision tool
• Repeat the steps as necessary until all equipment items and associated failure modes have been evaluated
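The bottom-up steps amount to two nested loops, one over equipment items and one over their failure modes. The Python sketch below is hypothetical throughout. The record fields, the failure-mode wording, the end effect and the one-cell lookup standing in for the risk decision tool are assumptions made for illustration, loosely based on the camshaft lube oil pump example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FailureMode:
    description: str
    characteristic: str                 # "wear-in", "random" or "wear-out"
    local_effect: str
    next_higher_effect: str
    end_effect: Optional[str]           # None means no end effect of interest
    causes: List[str] = field(default_factory=list)
    severity: str = ""
    likelihood: str = ""


@dataclass
class EquipmentItem:
    name: str
    failure_modes: List[FailureMode]


def bottom_up_fmeca(items: List[EquipmentItem],
                    risk_tool: Callable[[str, str], str]) -> List[dict]:
    """Evaluate every failure mode of every equipment item; return worksheet rows."""
    rows = []
    for item in items:                          # select an equipment item
        for mode in item.failure_modes:         # select each failure mode in turn
            row = {"item": item.name, "mode": mode.description,
                   "characteristic": mode.characteristic,
                   "local": mode.local_effect,
                   "next_higher": mode.next_higher_effect,
                   "end": mode.end_effect or "No effect of interest",
                   "causes": [], "risk": None}
            # Causes and criticality are developed only for consequences of interest.
            if mode.end_effect is not None:
                row["causes"] = list(mode.causes)
                row["risk"] = risk_tool(mode.severity, mode.likelihood)
            rows.append(row)
    return rows


def demo_risk_tool(severity: str, likelihood: str) -> str:
    """Placeholder for the risk decision tool: a single hypothetical matrix cell."""
    return {("Minor", "Remote"): "Low"}.get((severity, likelihood), "Not rated")


# Hypothetical entries loosely based on the camshaft lube oil pump example.
pump = EquipmentItem("Camshaft lube oil pump", [
    FailureMode("Loss of output (on-line pump)", "random",
                "No lube oil flow from the on-line pump",
                "Brief shutdown of the engine until the standby pump is started",
                "Brief loss of propulsion",
                causes=["Pump motor failure", "Pump seizure",
                        "Pump motor control failure", "Pump coupling failure"],
                severity="Minor", likelihood="Remote"),
    FailureMode("Starts prematurely/operates too long (standby pump)", "random",
                "Standby pump runs when not required", "None", None),
])

rows = bottom_up_fmeca([pump], demo_risk_tool)
for row in rows:
    print(row["mode"], "|", row["end"], "|", row["risk"])
```

A standard failure-mode list for common equipment items could populate `failure_modes` so that every RCM team starts from the same list.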
When performing a bottom-up FMECA, the failure causes are the basic equipment failures that result in the failure mode, and the next higher-level effect typically identifies the resulting functional failure. Section 7, Table 4 provides an example of a bottom-up FMECA. The bottom-up approach helps ensure that all equipment items are analyzed and all plausible equipment failure modes are considered. In addition, a standard list of failure modes can be developed for common equipment items, thus making the analysis somewhat easier to perform and helping to ensure consistency between RCM teams. Appendix 2 of the Guide for Survey Based on Reliability-Centered Maintenance provides a listing of suggested failure modes for marine machinery equipment and components.
4.1.2 Top-down FMECA Approach
The top-down approach is performed by analyzing each function and its associated functional failures. This approach focuses on determining what effects different functional failures have on the operation of the system and then what equipment failures (i.e., failure modes) can result in the functional failure. The top-down approach determines whether the functional failure results in an end effect of interest and then determines which equipment failures can cause the functional failure. The steps for performing a top-down FMECA are as follows:
• Select a function for analysis
• Select a functional failure for evaluation
• Determine the local and end effects for the postulated functional failure
• If the end effect results in a consequence of interest, determine the equipment failures that can result in the functional failure
• Determine the failure characteristic (e.g., wear-in, random, wear-out) for each failure mode
• Determine the criticality of the failure mode using the risk decision tool
• Repeat the steps until all functions and functional failures are evaluated
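In the top-down direction, the traversal starts from functions and functional failures and only then reaches failure modes. This Python sketch rests on stated assumptions. The function statement is a hypothetical one inferred from the camshaft lube oil functional failures. The data layout is illustrative. The criticality step is omitted to keep the traversal visible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FunctionalFailure:
    statement: str                      # verb + object + functional deviation
    end_effect: Optional[str]           # None means no end effect of interest
    failure_modes: Dict[str, str] = field(default_factory=dict)  # mode -> characteristic


@dataclass
class Function:
    statement: str
    functional_failures: List[FunctionalFailure]


def top_down_fmeca(functions: List[Function]) -> List[Tuple[str, str, str, str, str]]:
    """Return (function, functional failure, end effect, failure mode, characteristic) rows."""
    rows = []
    for fn in functions:                            # select a function
        for ff in fn.functional_failures:           # select a functional failure
            if ff.end_effect is None:
                continue                            # no consequence of interest
            for mode, characteristic in ff.failure_modes.items():
                rows.append((fn.statement, ff.statement, ff.end_effect,
                             mode, characteristic))
    return rows


effect = "Low-pressure alarm; standby pump must be started"
supply = Function(
    "To supply lubricant to the camshaft at not less than 10.3 m3/hr and 4 bar",
    [
        FunctionalFailure("Flows less than 10.3 m3/hr of lubricant to the camshaft",
                          effect, {"Worn pump gears": "wear-out"}),
        FunctionalFailure("Flows lubricant to the camshaft at a pressure less than 4 bar",
                          effect, {"Worn pump gears": "wear-out"}),
    ],
)

rows = top_down_fmeca([supply])
for row in rows:
    print(row)
```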
Example FMECA worksheet extract. Description: Camshaft Lube Oil Pump
15.1 Failure causes (failure characteristic): pump motor failure (random, wear-out); pump seizure (random, wear-out); pump motor control failure (random, wear-out); pump coupling failure (wear-out). Effect: brief shutdown of the engine until the standby lube oil pump is started. Propulsion; Minor; current likelihood: Remote; current risk: Low. Failure detection/corrective measures: upon low pressure, a sensor sends a signal to the automatic changeover controller, which starts the standby pump.
15.2 and 15.3 Failure modes: starts prematurely/operates too long (standby pump), with no effect of interest; operates at degraded head/flow performance (on-line pump) (evident), caused by worn pump gears (wear-out failure), resulting in insufficient pressure or flow of lubricant to the camshaft, a low-pressure alarm and the need to start the standby pump. Associated functional failures: flows less than 10.3 m3/hr of lubricant to the camshaft; flows lubricant to the camshaft at a pressure less than 4 bar. Failure detection/corrective measures: upon low pressure, a sensor sends a signal to the automatic changeover controller, which starts the standby pump.
Example FMECA worksheet extract (entries 25.1a to 25.1c)
25.1a Effects: high engine vibration, requiring a shutdown; rupture of fuel oil line, releasing fuel oil into the engine room; catastrophic release of cylinder pressure, causing shrapnel to be released in the engine room; partial loss of containment of cooling water. End effects: potential injury to personnel if hit by shrapnel; damage to cylinder cover and/or piston. Failure detection: engine noise, exhaust fume odor and engine vibration will alert the operator to the failure.
25.1b Failure mode: loosened piston rod studs at the crosshead (evident); wear-out. Effects: relative motion between two parts, fretting; studs eventually break if left undetected. End effects: engine damage due to a loose piston rod; vessel out of service for a time to make repairs.
25.1c Failure mode: restricted oil passageway in the piston rod (hidden); wear-out. Effect: overheating of the piston crown, potentially causing piston failure. End effects: damage to the piston; vessel out of service for a time to make repairs.
4.2
i) Failures that have previously occurred on similar equipment
ii) Any other failure modes that have not yet occurred but are considered probable, including those being suppressed by the current preventative maintenance program
iii) Failure modes that are possible but considered unlikely (included to show that they have been considered)
When performing a bottom-up FMECA, the following guide phrases may be used to help develop a list of failure modes to be considered:
i) Premature operation
ii) Failure to operate at a prescribed time
iii) Intermittent operation
iv) Failure to cease operation at a prescribed time
v) Loss of output or failure during operation
vi) Degraded output or operational capability
vii) Other unique failure conditions
Failure causes, such as normal wear and tear, corrosion, abrasion, erosion, fatigue, etc., should be recorded in sufficient detail to enable an appropriate failure management strategy to be identified. Failures caused by human error should be included if firm evidence exists to support such failures, or if operator error can induce significant consequences. It is important to ensure that the causes are sufficiently identified so that the subsequent maintenance recommendations address the cause of failure rather than its symptoms.
4.2.2 Identifying Failure Effects
Failure effects should be identified at three levels:
i) Local effects are effects local to the system/equipment being analyzed and should include the following:
• Failure detection methods (alarms, test indicators)
• Reduced level of performance
• Whether a standby system/equipment can provide the same function
ii) Next higher effects are effects on the larger system of which the system/equipment forms a part and should include the following:
• Potential physical damage to the system/equipment
• Potential secondary damage to either other equipment in the system or unrelated equipment in the vicinity
iii) End effects are effects on the vessel and should include the following:
• Threats to safety and the environment
• Operational effectiveness of the vessel
• Downtime needed to repair the damage
4.3
• Mitigation of the consequences of failure before implementing maintenance (e.g., bringing a standby system online, reconfiguring the system) and the estimated time for such action
• Defective item repair action (e.g., repair of primary and secondary damage as applicable, personnel needed, whether dry-docking or shore support is required, time to perform the repair)
• Spare part identification
4.3.1 Level of Indenture Considerations
For the chosen functional group, the FMECA should be conducted on systems or equipment at a convenient lower indenture level, typically the level at which maintenance is performed. However, experience has shown that if more than 20 to 30 failure modes can be identified at an indenture level, a lower indenture level should be chosen.
4.3.2 Maintenance Considerations
The FMECA shall be performed assuming zero-based maintenance (i.e., assuming no proactive maintenance tasks are being performed) where data on the probability of failure are not available. This is necessary to ensure that the need for a failure management strategy is determined. For application to existing maintenance programs, the probability of failure is to be based on the current maintenance program.
4.4
The following procedure should be used to determine the qualitative risk associated with a failure mode:
i) Severity classification. Identify the consequence of the end effect resulting from each failure mode and allocate a severity category by applying the example shown in Section 6, Table 1. For failure modes that do not directly result in an end effect (e.g., failure of a protective device), the criticality analysis should take account of multiple failures by assuming that the protected function fails while the protective device is in the failed state.
ii) Probability of occurrence. Derive the probability of occurrence of each failure mode identified in the FMECA. Determine the probability of failure in accordance with 7/4.3.2. See Section 6, Figure 3.
iii) Risk matrix. Obtain the risk level from Section 6, Figure 3 by plotting the severity classification and probability of occurrence on this matrix.
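The risk-matrix step is a table lookup. The Python sketch below shows the mechanics only. The category names, the score-by-position rule and the Low/Medium/High cut-offs are hypothetical placeholders. They are not the severity categories of Section 6, Table 1 or the matrix of Section 6, Figure 3, which should be substituted in any real use.

```python
# Hypothetical category scales, lowest first. These are NOT the Section 6 definitions.
SEVERITY = ["Minor", "Moderate", "Major", "Critical"]
LIKELIHOOD = ["Improbable", "Remote", "Occasional", "Probable", "Frequent"]


def risk_level(severity: str, likelihood: str) -> str:
    """Read a qualitative risk level off a severity-versus-likelihood matrix.

    Each cell is scored by the sum of the two category positions; the cut-offs
    are placeholders for the cells of an actual risk matrix. Unknown category
    names raise ValueError (from list.index).
    """
    score = SEVERITY.index(severity) + LIKELIHOOD.index(likelihood)
    if score <= 2:
        return "Low"
    if score <= 4:
        return "Medium"
    return "High"


# Print the whole matrix, most severe row first.
print(" " * 9 + " ".join(f"{lik:<10}" for lik in LIKELIHOOD))
for sev in reversed(SEVERITY):
    print(f"{sev:<9}" + " ".join(f"{risk_level(sev, lik):<10}" for lik in LIKELIHOOD))
```

An explicit dictionary keyed by (severity, likelihood) would reproduce an arbitrary published matrix more faithfully than the additive score used here.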
The criticality ranking (i.e., the risk) for each failure mode/end effect pair is then used in the RCM Task Selection Flow Diagram to determine the proper failure management strategy. Section 7, Tables 4 and 5 include examples of the criticality ranking.
[Figure: RCM Task Selection Flow Diagram. First decision on risk (highest risk: one-time change; lowest risk: run-to-failure); failure characteristic (wear-in: one-time change or redesign; wear-out: planned maintenance at the appropriate life limit); applicable and effective condition-monitoring, planned-maintenance or combination tasks (7/5.1.2 and 7/5.1.3), with combination tasks specified at the P-F interval and the life limit; hidden or evident loss of function and failure-finding tasks (7/5.1.4); one-time change where necessary to achieve a tolerable risk; reevaluation of the risk, with the selected tasks and one-time changes in place, against the risk acceptance criteria]
5.1
Highest risk. A failure mode with the highest risk typically cannot achieve an acceptable level of risk through maintenance alone. In general, achieving an acceptable level of risk requires a fundamental change in how the equipment is designed or operated. Therefore, a one-time change is required to reduce the risk. Once the one-time change is identified, the FMECA should be updated and any applicable failure modes reevaluated using the RCM Task Selection Flow Diagram.
Lowest risk. A failure mode with the lowest risk is a low-priority failure and, therefore, for most organizations, is acceptable without any failure management strategy (i.e., it may be run to failure).
Confidence in the risk characterization. High confidence indicates the team is relatively certain that the risk is properly characterized and, therefore, can be used in the RCM flow diagram without any further discussion. Low confidence indicates that the team is uncertain and that additional data (about the probability or consequence of the failure) are needed before the risk can be used in the decision-making process. To be conservative, the failure mode is then assumed to have a medium/moderate risk characterization and is evaluated through the entire RCM Task Selection Flow Diagram.
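The first selection decision can be written as a small triage function. This Python sketch assumes that the labels "Highest" and "Lowest" stand for the top and bottom categories of whatever risk matrix is in use. The enum names and messages are hypothetical.

```python
from enum import Enum


class FirstDecision(Enum):
    ONE_TIME_CHANGE = "one-time change required; update the FMECA and reevaluate"
    RUN_TO_FAILURE = "acceptable without a failure management strategy"
    FULL_FLOW = "evaluate through the entire task selection flow diagram"


def first_selection_decision(risk: str, high_confidence: bool) -> FirstDecision:
    """Triage a failure mode on its risk characterization and the team's confidence."""
    if not high_confidence:
        # Low confidence: conservatively treat the risk as medium/moderate.
        return FirstDecision.FULL_FLOW
    if risk == "Highest":
        return FirstDecision.ONE_TIME_CHANGE
    if risk == "Lowest":
        return FirstDecision.RUN_TO_FAILURE
    return FirstDecision.FULL_FLOW


print(first_selection_decision("Highest", high_confidence=True).value)
print(first_selection_decision("Lowest", high_confidence=False).value)
```

The confidence check runs first, so a low-confidence "Lowest" rating is never accepted as run-to-failure without going through the full flow diagram.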
5.1.2 Second Selection Decision
Condition-monitoring tasks are considered first because these tasks are typically the best choice technically and usually the most cost-effective. In determining whether the failure mode can be managed by a condition-monitoring task, the team must select a specific task and then determine an appropriate task interval. The following criteria apply to these decisions.
5.1.2(a) Maintenance task selection criteria. For a condition-monitoring task to be selected, it must first be applicable and effective. When determining applicability and effectiveness, the following should be considered:
i) The task must be practicable to implement (e.g., the required maintenance task interval and the accessibility for carrying out the task are operationally feasible)
ii) The task must have a high degree of success in detecting the failure mode
iii) The task must be cost-effective. The cost of undertaking the task over a period of time should be less than the total cost of the consequences of failure. The costs should include man-hours, spares, tools and facilities, and should be assessed on the basis of through-life costs.
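The cost-effectiveness criterion can be illustrated by comparing the through-life cost of a task with the through-life cost of the failures it is expected to avoid. This Python sketch rests on labeled simplifications: no discounting, a constant rate of failures avoided, and fractional task occurrences allowed. All figures in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskCost:
    """Cost of one performance of a task (all values hypothetical)."""
    man_hours: float
    labour_rate: float            # cost per man-hour
    spares: float
    tools_and_facilities: float

    def per_occurrence(self) -> float:
        return self.man_hours * self.labour_rate + self.spares + self.tools_and_facilities


def is_cost_effective(task: TaskCost, interval_years: float, life_years: float,
                      failures_avoided_per_year: float, cost_per_failure: float) -> bool:
    """True if the through-life task cost is below the through-life consequence cost.

    Assumptions: no discounting, a constant rate of failures avoided, and a
    fractional number of task occurrences is allowed.
    """
    if interval_years <= 0 or life_years <= 0:
        raise ValueError("interval and life must be positive")
    task_cost = (life_years / interval_years) * task.per_occurrence()
    consequence_cost = failures_avoided_per_year * life_years * cost_per_failure
    return task_cost < consequence_cost


task = TaskCost(man_hours=4, labour_rate=50.0, spares=100.0, tools_and_facilities=0.0)
print(is_cost_effective(task, interval_years=0.5, life_years=10,
                        failures_avoided_per_year=0.2, cost_per_failure=5000.0))
```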
Next, the team must evaluate the potential risk reduction resulting from implementing the condition-monitoring task. This is accomplished by determining the reduction in risk that is anticipated if the task is implemented. In general, proactive maintenance tasks reduce the probability of the failure mode occurring rather than the severity of the consequence. The reduced risk is then compared to the risk acceptance criteria to determine whether the task should be selected. If the risk reduction does not achieve an acceptable level of risk, the failure mode is further analyzed to determine whether other maintenance tasks or a one-time change is needed to manage the failure.
5.1.2(b) Maintenance task interval determination. Ideally, proactive maintenance task intervals are determined using actual failure data, but this is not realistic for most organizations. Therefore, the task frequency must be determined and documented from the following sources, listed in ascending order of priority:
• Generic P-F interval data
• Manufacturers' recommendations
• Current task intervals
• Team experience
For condition-monitoring tasks, the task interval must give enough warning of the failure to ensure action can be taken in time to avoid the consequences. The maintenance task interval must be set at less than half the anticipated P-F interval.
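The half-P-F rule can be checked mechanically. The Python sketch below adds one piece of standard RCM reasoning that is not stated in the text above. If the potential failure becomes detectable just after an inspection, it is found one task interval later, so the worst-case warning left before functional failure is the P-F interval minus the task interval. Function names are hypothetical, and any consistent time unit (e.g., months or running hours) may be used.

```python
def max_condition_monitoring_interval(pf_interval: float) -> float:
    """Exclusive upper bound on the task interval: half the anticipated P-F interval."""
    if pf_interval <= 0:
        raise ValueError("P-F interval must be positive")
    return pf_interval / 2.0


def interval_is_acceptable(task_interval: float, pf_interval: float) -> bool:
    """True if the task interval is positive and strictly less than half the P-F interval."""
    return 0 < task_interval < max_condition_monitoring_interval(pf_interval)


def minimum_warning(task_interval: float, pf_interval: float) -> float:
    """Worst-case time between detection and functional failure."""
    return pf_interval - task_interval


# Hypothetical P-F interval of 6 months.
print(interval_is_acceptable(2, 6), minimum_warning(2, 6))
print(interval_is_acceptable(3, 6))
```

The team would then confirm that the worst-case warning leaves enough time to plan and carry out the corrective action.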
5.1.3 Third Selection Decision
If condition monitoring does not provide an effective failure management strategy, the team must then use its knowledge of the failure characteristics to evaluate the need for other proactive maintenance tasks or a one-time change. If the failure mode exhibits a wear-in failure characteristic, the team considers a one-time change or redesign of the equipment item as a means to manage the failure. If the failure mode exhibits a wear-out failure characteristic, the team first considers planned maintenance to manage the failure. The team must select the task and the task interval.
5.1.3(a) Maintenance task selection criteria. As for a condition-monitoring task, a planned-maintenance task must first be applicable and effective to be selected. When determining applicability and effectiveness, the following should be considered:
i) The task must be practicable to implement (e.g., the required maintenance task interval and the accessibility for carrying out the task are operationally feasible)
ii) The task must have a high degree of success in preventing the failure mode
iii) The task must be cost-effective. The cost of undertaking the task over a period of time should be less than the total cost of the consequences of failure. The costs should include man-hours, spares, tools and facilities, and should be assessed on the basis of through-life costs.
Next, the team must evaluate the potential risk reduction resulting from implementing the planned-maintenance task, just as for the condition-monitoring task.
5.1.3(b) Maintenance task interval determination. Ideally, proactive maintenance task intervals are determined using actual failure data, but this is not realistic for most organizations. Therefore, the task frequency must be determined and documented from the following sources, listed in ascending order of priority:
• Generic failure data
• Manufacturers' recommendations or failure data
• Current task intervals
• Team experience
For planned maintenance to be effective, there must be a clear life for the item, and most items must survive to this life, after which the conditional probability of failure increases significantly. The life can be determined from information supplied by the equipment manufacturer, expert opinion, published reliability data, actuarial analysis, etc. If the risk reduction does not achieve an acceptable level of risk, the failure mode is further analyzed to determine whether a combination of planned-maintenance and condition-monitoring tasks can achieve an acceptable risk. If a combination appears to provide an appropriate failure management strategy, the combination is evaluated against the maintenance task selection criteria above.
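Actuarial analysis, one of the sources listed above, can show whether a clear life exists. This Python sketch assumes complete, uncensored failure ages for identical items. Real analyses must also account for items removed before failure, for example with Kaplan-Meier or Weibull methods. The ages are invented.

```python
from typing import List


def fraction_surviving(failure_ages: List[float], life: float) -> float:
    """Fraction of recorded items whose failure age exceeds the candidate life."""
    if not failure_ages:
        raise ValueError("no failure ages supplied")
    return sum(1 for age in failure_ages if age > life) / len(failure_ages)


def conditional_failure_probability(failure_ages: List[float],
                                    start: float, width: float) -> float:
    """Probability of failing in (start, start + width] given survival to start."""
    at_risk = [age for age in failure_ages if age > start]
    if not at_risk:
        raise ValueError("no items survive to the start of the band")
    failed = sum(1 for age in at_risk if age <= start + width)
    return failed / len(at_risk)


# Hypothetical running hours at failure for ten identical items (no censoring).
ages = [4000, 9000, 9500, 10000, 10500, 11000, 11500, 12000, 12500, 13000]
print(fraction_surviving(ages, 8000))
for start in range(0, 12000, 2000):
    print(start, round(conditional_failure_probability(ages, start, 2000), 2))
```

With these invented ages, most items survive to 8,000 hours, and the conditional probability of failure rises sharply in the bands after that. This pattern is consistent with a wear-out life near 8,000 hours.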
5.1.4 Fourth Selection Decision
Decide whether the failure mode is an evident or a hidden failure mode.
Evident failures. An evident failure is one that will eventually become evident to the operating crew under normal operating conditions (NOC) (i.e., the loss of function will be noticed at some future, indefinite time without any further incident or intervention).
Hidden failures. A hidden failure is one that will not become evident to the operating crew under NOC if the failure mode occurs on its own. Normally, hidden failures become apparent only after a second, related failure or event occurs. For example, the failure of a protective device that is not fail-safe is a typical hidden failure. Although a hidden failure has no immediate consequences, it increases the risk of a multiple failure. The ultimate safety, environmental or operational implications must therefore be fully considered and recorded as possible failure effects.
If the failure is hidden and there is no condition-monitoring, planned-maintenance or combination task that will provide an acceptable risk level, the team must determine whether a failure-finding task is needed to manage the failure.
5.1.4(a) Maintenance task selection criteria. As for condition-monitoring and planned-maintenance tasks, a failure-finding task must first be applicable and effective to be selected. When determining applicability and effectiveness, the following should be considered:
i) The task must be practicable to implement (e.g., the required maintenance task interval and the accessibility for carrying out the task are operationally feasible)
ii) The task must have a high degree of success in discovering the hidden failure mode
iii) The task must be cost-effective. The cost of undertaking the task over a period of time should be less than the total cost of the consequences of failure. The costs should include man-hours, spares, tools and facilities, and should be assessed on the basis of through-life costs.
Next, the team must evaluate the potential risk reduction resulting from implementing the failure-finding task, just as for the condition-monitoring and planned-maintenance tasks.
5.1.4(b) Maintenance task interval determination. Ideally, proactive maintenance task intervals are determined using actual failure data, but this is not realistic for most organizations. Therefore, the task frequency must be determined and documented from the following sources, listed in ascending order of priority:
• Generic failure data
• Manufacturers' recommendations or failure data
• Current task intervals
• Team experience
For failure-finding tasks, availability and reliability information, where possible, is to be used to set failure-finding intervals, T. For example, T can be determined by the following equation:

$$T = \frac{2\,F_{ACC}\,\mathrm{MTTF}}{F_{IE}} \qquad (1)$$

where
$T$ = test interval
$F_{ACC}$ = acceptable frequency of occurrence of the multiple failure
$F_{IE}$ = frequency of occurrence of the initiating event making the hidden failure evident
$\mathrm{MTTF}$ = mean time to failure for the system with the hidden failure
As a guide, the unavailability for safety and environmental functions should be not more than 0.05%; for operational functions, not more than 2.0%; and for non-operational functions, not more than 10%. Also, the team should consider regulatory requirements.
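Equation (1) and the unavailability guidance can be checked numerically. The ratio F_ACC/F_IE can be read as an acceptable unavailability U, i.e., the acceptable probability that the hidden function is already failed when the initiating event occurs. Assuming random failures (constant failure rate) and T much shorter than MTTF, the average unavailability of a function tested every T is roughly T/(2·MTTF), which rearranges to Equation (1). The function names and sample figures below are illustrative assumptions.

```python
# Guidance limits on unavailability from the text, as fractions (0.05 % = 0.0005).
UNAVAILABILITY_LIMITS = {
    "safety_environmental": 0.0005,
    "operational": 0.02,
    "non_operational": 0.10,
}


def failure_finding_interval(f_acc: float, f_ie: float, mttf: float) -> float:
    """Equation (1): T = 2 * F_ACC * MTTF / F_IE.

    f_acc and f_ie must be in the same frequency unit (e.g., per year);
    T is returned in the same time unit as mttf.
    """
    if min(f_acc, f_ie, mttf) <= 0:
        raise ValueError("F_ACC, F_IE and MTTF must all be positive")
    return 2.0 * f_acc * mttf / f_ie


def interval_for_category(category: str, mttf: float) -> float:
    """Same relation with F_ACC / F_IE replaced by the guidance unavailability U."""
    return 2.0 * UNAVAILABILITY_LIMITS[category] * mttf


# Illustrative: a hidden protective device with MTTF = 20,000 h.
print(failure_finding_interval(f_acc=1e-6, f_ie=1e-3, mttf=20_000))  # about 40 h
print(interval_for_category("operational", mttf=20_000))             # about 800 h
```

The same MTTF gives a much shorter interval for a safety or environmental function (about 20 h here) than for an operational one, which reflects the tighter 0.05% unavailability limit.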
5.1.5 One-time Changes
If the failure is evident, or if it is hidden and no failure-finding task will provide an acceptable risk level, the team must decide whether the risk can practically be reduced to the low risk level and then determine which of the tasks (or combination of tasks) provides the best failure management strategy. If the team determines that the risk can and should be lower than what can be achieved with maintenance, the team should consider one-time changes to manage the failure. To evaluate the effectiveness of one-time changes, the team should identify the potential changes and consider the following:
Does the one-time change reduce the risk to an acceptable level? If not, does it reduce the risk to a tolerable level with no further risk reduction reasonably possible?
Is the one-time change cost-effective? That is, is the cost reasonable for the resulting risk reduction?
Are any of the other maintenance tasks discussed more effective, or can they achieve more risk reduction than the one-time change?
5.1.6 Rounds and Routine Servicing
In addition, the team should examine rounds and routine inspection tasks. These important tasks help ensure that the failure rate curve for the failure mode (which is the basis for the proactive maintenance tasks and the risk characterization) is not altered (e.g., premature wear-out of a bearing because of lack of lubrication).
5.2
Tasks with safety/environmental consequences should only be adjusted to a shorter task interval, to ensure that safety and containment are not compromised.
Tasks with operational consequences may be adjusted to a longer or shorter task interval. However, when adjusting to a longer interval, the team should obtain the approval of the responsible person in the shipping company.
5.2.3 Overall Maintenance Schedule
Category B and C task intervals should then be organized to derive an overall maintenance schedule. This is done by adjusting the RCM task intervals (Category B and C tasks only), using the criteria specified in Category B, so that the tasks coincide with the vessel's port-call and dry-docking schedules. An example maintenance task summary with the necessary information is given in Section 7, Table 7.
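The interval adjustment rules above can be sketched as snapping each RCM interval onto an available schedule opportunity. In this sketch the list of opportunities (for example, running hours between planned port calls or dry-dockings) is a hypothetical input. Safety/environmental tasks are only ever moved to a shorter interval, while operational tasks move to the nearest opportunity and are flagged for approval when that opportunity is longer than the RCM interval.

```python
def align_interval(rcm_interval: float, opportunities: list[float],
                   consequence: str) -> tuple[float, bool]:
    """Snap an RCM task interval onto a schedule opportunity.

    consequence is "safety_environmental" or "operational".
    Returns (scheduled_interval, needs_approval).
    """
    candidates = sorted(opportunities)
    if consequence == "safety_environmental":
        shorter = [c for c in candidates if c <= rcm_interval]
        if not shorter:
            raise ValueError("no opportunity short enough; schedule the task separately")
        return shorter[-1], False  # never lengthen a safety/environmental interval
    if consequence == "operational":
        # Nearest opportunity; ties are resolved toward the shorter interval.
        nearest = min(candidates, key=lambda c: (abs(c - rcm_interval), c))
        return nearest, nearest > rcm_interval  # a longer interval needs approval
    raise ValueError(f"unknown consequence category: {consequence}")


windows = [2000.0, 4000.0, 8000.0]  # hypothetical running hours between maintenance windows
print(align_interval(5000.0, windows, "safety_environmental"))  # (4000.0, False)
print(align_interval(7000.0, windows, "operational"))           # (8000.0, True)
```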
5.3 Spares Holding
For the proposed maintenance schedules to be viable, it is essential that the spares that support the identified maintenance tasks are available at the appropriate time. The spares holding requirement is to be developed based on the following considerations:
The list of parts necessary to perform the tasks that correct each failure mode identified in the RCM analysis, along with the parts required for remedial work resulting from condition-monitoring, planned-maintenance, failure-finding, any other applicable and effective, and run-to-failure tasks
An evaluation of the effect on the functional group's or system's operational availability if an out-of-stock condition occurs
An assessment of those parts whose use can be preplanned
For those parts whose use cannot be preplanned, a determination of the quantity necessary to achieve the desired operational availability
Section 7, Figure 6 is to be applied to select the most appropriate spares holding to achieve the desired level of the End Effects. Section 7, Figure 6A has been provided to illustrate a spares holding determination example. An example spares holding determination summary with the necessary information is indicated in Section 7, Table 8.
5.3.1 Stock-out Effect on End Effects
Determine whether the stock-out and further failure will result in End Effects, such as degradation or loss of propulsion, fire, etc. When determining the effect, consider the direct and indirect effects of the stock-out under normal circumstances. These terms are defined as follows.
Direct effect. If the spare is not available and the associated maintenance tasks cannot be carried out, the corresponding failure mode will eventually lead to an End Effect(s) if failure occurs.
Indirect effect. If the spare is not available and the associated maintenance tasks cannot be carried out, the corresponding failure mode will not lead to an End Effect(s) unless a further failure occurs.
Normal circumstances. The item is operating within context and without a failure occurring.
For the case when:
The parts requirements can be anticipated before failure occurs, or there is sufficient warning time for the parts to be ordered.
The lead-time for parts orders is consistent over the life cycle of the equipment item or component.
Then order parts before demand occurs. If ordering parts before demand occurs is not acceptable, consideration is to be given to holding parts onboard or in storage depots, provided that:
The risk of a stock-out is reduced to an acceptable level.
The cost and storage basis for holding the parts is feasible.
If the stock-out will result in End Effect(s) (either direct or indirect), it is mandatory to review the RCM analysis with a view to revising the maintenance task. If the stock-out will only have a non-operational effect, it is desirable to review the RCM analysis with a view to revising the maintenance task.
When neither of the two above strategies is feasible, then the following is to be considered:
Table 7: Example maintenance task summary
Maintenance Category: Category A, B or C
Functional Group: Indicate group name, e.g., Propulsion
System: Indicate system name
Equipment Item: Indicate equipment item name
Component: Indicate component name

Item No. | Task Type | Risk: Current | Risk: Projected | Frequency | Procedure No. or Class Reference
1.3, 1.5 | CM | Medium | Low | 2,000 hr | MA 901-3.1
1.4 | CM | Medium | Medium | 8,000 hr | MA 901-2.2
1.6 | CM | Medium | Medium | 8,000 hr | MA 901-2.1
1.2 | CM | Medium | Medium | 4,000 hr | MA 911-2
1.1 | PM | Medium | Medium | 2,000 hr | MA 901-1
1.2 | CM | Medium | Medium | 2,000 hr | MA 911-2

Example task description: Visual inspection of the cooling water passages with a borescope.
[Figure 6: Spares holding decision diagram]
Step 1: (1) Can the parts requirements be anticipated (i.e., can the parts be obtained before failure is expected to occur)? (2) Does the strategy, ordering parts before demand occurs, provide an acceptable risk? Yes: order parts before demand. No: go to Step 2.
Step 2: (1) Is it feasible and cost-effective to hold the required parts and quantity in stores? (2) Does the strategy, hold parts, provide an acceptable risk? Yes: hold parts. No: neither strategy is acceptable.
Adapted from the diagram in Ministry of Defense, Requirements for the Application of Reliability-centered Maintenance to HM Ships, Submarines, Royal Fleet Auxiliaries, and Other Naval Auxiliary Vessels, Naval Engineering Standard NES 45, Issue 3, September 1999.
[Figure 6A: Spares holding determination example]
Step 1: (1) Can the parts requirements be anticipated (i.e., can the parts be obtained before failure is expected to occur)? Degradation of the pump bearings can be monitored with the condition-monitoring program presently implemented, and parts can be ordered and delivered within 7 days. YES. (2) Does the strategy, ordering parts before demand occurs, provide an acceptable risk? In the event of a standby pump failure, the vessel will be out of service for as long as 7 days. The risk is unacceptable. NO. Ordering parts before demand is therefore not selected.
Step 2: (1) Is it feasible and cost-effective to hold the required parts and quantity in stores? A bearing replacement kit costs $X, its size and weight are insignificant, and the vessel crew is qualified to repair the pump. YES. (2) Does the strategy, hold parts, provide an acceptable risk? In the event of a standby pump failure, repairs can be completed in 4 hours. YES.
Result: Hold a bearing replacement kit onboard.
Example Operating Context and Analysis. A Fuel Oil piping system is provided with two fuel oil supply pumps arranged in parallel redundancy. Each pump is sized so as to supply heavy fuel oil to the main propulsion engine and two of the three diesel generator engines operating at their maximum continuous rating. The pumps are operated as follows: the No. 1 pump is operated for one week at a time with the No. 2 pump on standby. After one week, the No. 1 pump is secured and put on standby and the No. 2 pump is operated for one week. Anticipated annual service hours for both pumps are the same.
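The two-step spares holding decision (Section 7, Figure 6) can be expressed as a short function and checked against the fuel oil supply pump example. The four yes/no answers are judgements made by the team (the function does not assess risk itself), and the function and argument names are illustrative assumptions.

```python
def spares_strategy(can_anticipate: bool, ordering_risk_acceptable: bool,
                    holding_feasible: bool, holding_risk_acceptable: bool) -> str:
    """Return the spares holding strategy following the two decision steps."""
    # Step 1: can parts be obtained before failure, at an acceptable risk?
    if can_anticipate and ordering_risk_acceptable:
        return "order parts before demand"
    # Step 2: is holding parts feasible, cost-effective and acceptable in risk?
    if holding_feasible and holding_risk_acceptable:
        return "hold parts"
    return "neither strategy acceptable"


# Fuel oil supply pump bearings: parts can be anticipated, but a 7-day delivery
# is an unacceptable risk; holding a kit onboard allows a 4-hour repair.
print(spares_strategy(True, False, True, True))  # hold parts
```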
Table 8: Example spares holding determination summary
Maintenance Category: Category A, B or C
Functional Group: Indicate group name, e.g., Propulsion
System: Indicate system name
Equipment Item: Indicate equipment item name
Component: Indicate component name
Columns: Item No., Task, Task Type, Spare Parts Identification, Risk due to stock-out, Procedure No. or Class Reference
Spare parts identified in the example: cooling water connection O-rings; cleaning solvent; valve seat O-rings; cylinder cover sealing ring
Example entry: Item 1.1, Removal and function testing of the cylinder puncture valve; risk due to stock-out: Medium; MA 901-1
Other procedures referenced: MA 901-3.1, MA 911-2
6.1
Description of the relevant operating modes for the vessel
Functional group breakdown and each group boundary
Functional group and equipment partitioning
Decision tools/criteria used to select functional groups for analysis
Analysis priority for the selected functional group and the basis for those decisions
Operating context for each selected functional group
6.1.2 Identifying Functions and Functional Failures
Functions can be documented as a functional block diagram or in tabular format. Each function statement must include a verb, an object and a performance standard. Functional failures are documented in a similar fashion and must be clearly associated with the relevant functions. The following must be documented:
Primary functions
Secondary functions, including all protective functions
Functional failures related to primary and secondary functions

6.1.3
A description of how the FMECA was conducted
A description of the risk-based decision tools used to assess criticality
The FMECA worksheets
A description of consequence categories
A description of the probability categories
The risk matrix with risk levels identified

The risk-based decision tools are typically documented in a tabular format that includes:
The equipment failure mode/cause
The functional failure
The end effect resulting from the functional failure
The criticality associated with the failure mode and the resulting functional failure

6.1.4 Selecting a Failure Management Strategy
Documentation for this step should include:
The RCM decision diagram
The task selection worksheets
When a one-time change is required or should be considered
Types and order of maintenance tasks to be considered
When run-to-failure is an acceptable failure management strategy

The task selection worksheets are typically documented in a tabular format that includes:
Relevant equipment failure mode/cause and criticality information from the FMECA
Decision point in the RCM decision diagram that is the basis for the proposed task or one-time change
Proposed tasks and their associated interval
Proposed one-time changes
Evaluation of the risk reduction anticipated from implementing the proposed task and/or change

A description of the RCM analysis process followed
The composition of the analysis team
Any analysis assumptions or exclusions
6.2
SECTION
Introduction
A maintenance program that is based on the RCM philosophy must be dynamic. This is especially true during the early stages of a new program, when it is based on limited information. The vessel operator must be prepared to collect, analyze, review and respond to in-service data throughout the operating life of the vessel in order to continually refine the maintenance program. The procedures and processes used to monitor, analyze, update and refine the maintenance program through RCM analysis will sustain the program; these procedures and processes are to be identified in the RCM program plan.
The basis for the decisions made during an RCM analysis is not static. As equipment and systems are modified and modernized, the maintenance program must be reviewed and refined continuously. An organized information system is necessary to capture data from the performance of the maintenance tasks (selected during previous RCM analyses) as well as data from other analyses, such as periodic root cause failure analyses. This information serves two purposes. First, it is used to determine what refinements and modifications need to be made to the initial maintenance program. Second, it is used to determine the need for other actions, such as product improvement or operational changes. Monitoring and adjusting existing maintenance tasks, developing emergent requirements and periodically assessing RCM-generated maintenance requirements meet these two purposes. Analysts use this new information to revise RCM analyses, which may in turn reflect a need for changes to the maintenance program.
Sustainment efforts should be organized so that the results can be used effectively to support RCM analysis updates. The following RCM sustainment processes can be applied, as appropriate.
2.1 Trend Analysis
A trend analysis provides an indication of systems or components that may be in the process of degrading. The measurement factors used for trending may be the same factors listed in Paragraph 8/2.6. Other trending factors that can be used include condition-monitoring parameters (temperatures, pressures, power, etc.) and the results of chronic root cause failure analyses. When performing trend analyses, it is the change in value, rather than the values themselves, that is important. Statistical measures, such as means and standard deviations, may be used to establish performance baselines and to compare current performance levels with established control limits. Performance parameters can then be monitored, and causes investigated for those parameters that exceed control limits. After the problem has been characterized, the related RCM analysis should be reviewed and updated as necessary. Other corrective actions should also be considered and implemented, if necessary, to reduce the causes of performance deviations. Specifically, trend analysis should be established for the following:
Repeat equipment failures
Comparison of machinery reliability before and after implementation of the RCM-derived maintenance tasks
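As an illustration of the statistical approach described above, the sketch below sets control limits at the baseline mean plus or minus k standard deviations and flags readings outside them. The choice of k = 3 (conventional Shewhart-style limits) and the example bearing temperatures are assumptions, not values from these Guidance Notes.

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], k: float = 3.0) -> tuple[float, float]:
    """Lower and upper control limits from a baseline of at least two readings."""
    m, s = mean(baseline), stdev(baseline)
    return m - k * s, m + k * s


def out_of_control(readings: list[float],
                   limits: tuple[float, float]) -> list[tuple[int, float]]:
    """Return (index, value) for each reading that falls outside the control limits."""
    low, high = limits
    return [(i, x) for i, x in enumerate(readings) if x < low or x > high]


# Hypothetical bearing temperatures (deg C): a baseline period, then recent readings.
limits = control_limits([70.0, 71.0, 69.0, 70.0, 70.0])
print(out_of_control([70.5, 73.0, 69.0], limits))  # [(1, 73.0)]
```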
2.2
2.3
2.4
2.5 Failures
A successful RCM program has a process to address failures (loss events) and other unpredicted events, and to determine the appropriate response or corrective action. An example of such a process is shown in Section 8, Figure 1.
A root cause analysis should be performed first to develop an understanding of the failure. By employing a structured process, the analysis can identify areas such as maintenance, operations, design, human factors, etc., that require further analysis. The key steps in a root cause failure analysis include:
Identifying the failure/loss event or potential failure/loss event
Classifying the event and convening a trained team suitable for addressing the issues posed by this event
Gathering data to understand how the event happened
Performing a root cause failure analysis to understand why it happened
Generating corrective actions to keep it (and similar events) from recurring
Verifying that corrective actions are implemented
Putting all of the data related to this event into an information system for trending purposes
The failure may be addressed by corrective actions for which an RCM analysis is not necessary. Examples of non-RCM corrective actions include technical publication changes and design changes. The root cause analysis may also reveal problems that need immediate attention; issuing inspection bulletins, applying temporary operational restrictions and implementing operating safety measures are examples of such interim actions.
The results of reviewing the RCM analysis should be considered in determining a response to the failure, so an RCM review must be part of the overall methodology. The RCM review, and update if required, will determine whether changes in maintenance requirements are necessary, and will indirectly help determine whether other corrective actions are necessary. Decisions not to update the RCM analysis should be documented for audit purposes. During the RCM review, the following questions should be addressed:
Is the failure mode already covered?
Are the failure consequences correct?
Are the reliability data accurate?
Is the existing task (or the requirement for no task) adequate?
Are the related costs accurate?
When new failure modes, or failure modes previously thought unlikely to occur, are determined to be significant, the RCM analysis is to be updated. The review may also show the existing analysis for a failure mode to be either correct or inadequate. Inadequate analyses can result from any number of causes, such as revised mission requirements or changes to operator or maintenance procedures. Information on failures and other unpredicted events is available from several sources, including the following examples:
Defect reports issued by maintenance engineering or the vessel's crew
Defects discovered during routine vessel repairs in a shipyard
Vendor and original equipment manufacturer reports related to inspections, rework or overhauls
Design changes, which may be in the form of a single item change or a major system modification
Results of tests (such as certification tests or tests performed during the course of a failure investigation or some other unrelated event) that may require RCM review and update
[Section 8, Figure 1: Failure response process. Flow: Failure; Interim action required? (Yes: interim action); RCM review; RCM update required? (Yes: RCM update); Document results.]
Guidelines for the Naval Aviation Reliability-centered Maintenance Process, Published by Direction of Commander, Naval Air Systems Command, NAVAIR 00-25-403, 01 February 2001.
2.6
The identification of the highest contributors entails detailed data analyses and communication with operators and maintainers. This analysis is limited to identifying only the worst-performing items, not those in the process of degradation. Some items, by their very nature and use, may appear at the top of the list. Further RCM analyses of these items may prove beneficial, and other analysis techniques, such as root cause analysis, may need to be employed to improve performance.
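A Pareto-style ranking is one simple way to identify the highest contributors. In the sketch below, downtime hours per event are used as the measurement factor; this choice, the event records and the item names are illustrative assumptions, and failure counts or repair costs could be ranked in the same way.

```python
from collections import defaultdict


def top_contributors(events: list[tuple[str, float]], n: int = 3) -> list[tuple[str, float]]:
    """Total a measure (e.g., downtime hours) per item and return the n largest."""
    totals: dict[str, float] = defaultdict(float)
    for item, value in events:
        totals[item] += value
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


events = [
    ("FO supply pump", 4.0),
    ("FO purifier", 10.0),
    ("FO supply pump", 8.0),
    ("Starting air compressor", 2.0),
]
print(top_contributors(events, n=2))  # [('FO supply pump', 12.0), ('FO purifier', 10.0)]
```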
2.7 Other Activities
Changes to the RCM analysis and/or preventative maintenance tasks may be required as a result of internal audits by the operator.
Other changes that may occur as a result of sustaining efforts include system or equipment redesign, or operational changes or restrictions.
The man-hours expended in performing scheduled and unscheduled maintenance may provide an indication of the maintenance program's effectiveness. Comparing man-hours expended before implementation of RCM-generated tasks with man-hours expended afterward may provide a useful measure, and a similar approach may be used to measure the effectiveness of the sustaining efforts. The effectiveness of RCM-generated tasks may also be measured by the availability of the equipment or system before and after implementation of the RCM program. Certain equipment operating without the benefit of an RCM program may require extensive unscheduled maintenance, which reduces availability; equipment that is subject to too much maintenance will also have reduced availability. Other relevant maintenance metrics that can be used to monitor the RCM program include:
Compliance with the RCM maintenance plan
Safety performance metrics (e.g., number of recordable incidents, incident rate)
Environmental performance metrics (e.g., permit exceedances, average emission rates)
Miles/ton of fuel
Asset downtime
Number of breakdowns
Port maintenance days
Comparison of actual maintenance costs to budgeted maintenance costs
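Several of these measures reduce to simple before-and-after comparisons. The sketch below computes availability as uptime divided by total time, the percentage change between two periods, and plan compliance as completed over scheduled tasks; these definitions and the sample figures are common conventions assumed here, not formulas given in these Guidance Notes.

```python
def availability(uptime_h: float, downtime_h: float) -> float:
    """Fraction of the period during which the equipment was available."""
    return uptime_h / (uptime_h + downtime_h)


def percent_change(before: float, after: float) -> float:
    """Relative change from the 'before' period to the 'after' period, in percent."""
    return 100.0 * (after - before) / before


def plan_compliance(completed: int, scheduled: int) -> float:
    """Share of scheduled RCM tasks actually completed in the period."""
    return completed / scheduled


# Hypothetical annual figures before and after implementing RCM-generated tasks.
print(percent_change(before=1200, after=900))          # -25.0 (maintenance man-hours)
print(availability(8400, 360), availability(8600, 160))
print(plan_compliance(completed=188, scheduled=200))   # 0.94
```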
SECTION
General
RCM analyses can be conducted for existing machinery systems. In this case, the vessel's Owners/Operators will have considerable operating and maintenance knowledge and experience with the equipment to be analyzed, and the current proactive/preventative maintenance plan is satisfactory, but possibly excessive. The RCM analysis results can be used to:
Verify the suitability of the existing preventative maintenance plan
Identify equipment failure modes not addressed by the system design or by the existing maintenance plan
Identify unnecessary maintenance activities
Numerous methods are presently available that seek to streamline RCM analyses by reducing the time and effort required to perform them. However, any analysis that does not address all failure modes of the system being analyzed may result in an inadequate preventative maintenance plan, with the risk that a preventable consequence may occur. Therefore, any RCM analysis performed should consider all failure modes.
System Templates
Many marine systems and equipment installed on various vessel types are similar in arrangement and purpose. As an aid to Owners/Operators, the Bureau has developed several templates for piping systems and equipment. These templates are partially completed Failure Modes and Effects Analyses. They provide the following information:
High-level system schematic
Detailed system schematic
Listing of system functions and suggested functional failures
Failure modes and effects analysis, including:
  System equipment item/component
  Suggested failure mode
  Possible causes
  Local effects
  Functional failure
  End effects
  Failure detection and corrective measures (indications and safeguards)
These templates will reduce the time necessary to perform a thorough analysis and provide the Bureau with consistent analyses from various Owners/Operators. However, individual vessel classes may have features not shown on these templates or failure modes unique to their equipment. It is the Owner/Operator's responsibility to verify the templates and revise them as necessary so that they represent the actual systems onboard. The templates have been developed for the operating mode "Normal Seagoing Conditions at Full Speed". Analyses for the remaining operating modes listed in 2/4.1 of the Guide for Survey Based on Reliability-centered Maintenance are to be completed by the Owner/Operator.
APPENDIX
Introduction
Equipment failures are often preceded by an advance warning period, and maintenance techniques used to detect this warning are known as condition-monitoring (CM) tasks. More specifically, CM tasks are maintenance techniques used to detect the onset of an equipment failure, so that the failure can be prevented or its consequences mitigated by providing the opportunity for preemptive action to be taken. This Appendix provides:
i) Descriptions of, and specific examples for, general CM categories
ii) A discussion of factors that should be considered when selecting a CM technique
v) Nondestructive Testing. Nondestructive testing involves performing tests (e.g., x-ray, ultrasonic) that are noninvasive to the test subject. Many of the tests can be performed while the equipment is online.
vi) Electrical Testing and Monitoring. Electrical condition-monitoring techniques (e.g., high potential testing, power signature analysis) involve measuring changes in system properties such as resistance, conductivity, dielectric strength and potential. Some of the problems that these techniques will help detect are electrical insulation deterioration, broken motor rotor bars and a shorted motor stator lamination.
vii) Observation and Surveillance. Observation and surveillance condition-monitoring techniques (e.g., visual, audio and touch inspections) are based on human sensory capabilities. They can serve as a supplement to other condition-monitoring techniques. These techniques will help detect problems such as loose/worn parts, leaking equipment, poor electrical/pipe connections, steam leaks, pressure relief valve leaks and surface roughness changes.
viii) Performance Monitoring. Monitoring equipment performance is a condition-monitoring technique that predicts problems by monitoring changes in variables such as pressure, temperature, flow rate, electrical power consumption and/or equipment capacity.
3.1
3.2 P-F Interval
The warning period during which CM tasks can be used to detect the onset of a failure is known as the P-F interval (i.e., the interval between the point at which the onset of failure becomes detectable and the point at which the condition deteriorates into a functional failure; see Appendix 1, Figure 1).
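[Appendix 1, Figure 1: P-F interval. Condition plotted against time, from point P, where the onset of failure becomes detectable, to point F, the functional failure.]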
Understanding the P-F interval plays an important role in selecting CM task frequencies. Too high a task frequency can waste resources. Too low a task frequency can leave no opportunity for corrective action, either because the onset of failure is discovered late in the P-F interval or because the task interval is longer than the P-F interval, so that the onset of failure is not detected at all. The consistency of the P-F interval also needs to be considered. If the interval varies within a consistent range, the shortest interval needs to be considered when assigning a CM task frequency. If the interval is wildly inconsistent, it might not be possible to establish a meaningful CM task interval; in that case, consider continuous monitoring, if practical. Even if equipment is determined to be operating within the P-F interval, the remaining interval might not be long enough to provide an opportunity to respond and to reduce or eliminate the consequences of the functional failure.
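A common RCM rule of thumb, assumed here rather than taken from these Guidance Notes, is to inspect at about half the shortest expected P-F interval, while also making sure that the time left after detection covers the time needed to respond. The sketch below applies both limits; the 0.5 fraction and the response-time input are assumptions.

```python
def cm_task_interval(pf_intervals_h: list[float], response_time_h: float = 0.0,
                     fraction: float = 0.5) -> float:
    """Suggest a CM task interval (hours) from observed P-F intervals (hours)."""
    shortest = min(pf_intervals_h)  # use the shortest interval, per the text
    # Worst case: the onset of failure is detected one full task interval after P,
    # so the task interval must leave at least response_time_h before F.
    latest_allowed = shortest - response_time_h
    if latest_allowed <= 0:
        raise ValueError("P-F interval too short to respond; consider continuous monitoring")
    return min(fraction * shortest, latest_allowed)


print(cm_task_interval([1200.0, 900.0, 1000.0]))          # 450.0
print(cm_task_interval([900.0], response_time_h=600.0))   # 300.0
```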
3.3 Measurement Precision/Sensitivity
The measurement precision and sensitivity of the CM technique being used need to be understood because they affect the reaction time available to reduce or eliminate the consequences of the functional failure. Take the example of using ultrasonic testing versus the human auditory sense as a CM approach. If both of these CM techniques were used in the same service, ultrasonic testing would provide more precision and sensitivity (e.g., it can detect less intense noises). Therefore, it would consistently provide more reaction time once the onset of failure is detected. Of course, other variables (e.g., economics, available resources) might drive the use of the human auditory sense.
3.4 Skills
CM techniques require varying skill levels, so this must be taken into consideration when selecting tasks. An investment might have to be made to train personnel, or outside sources might need to be contracted to perform the tasks.
3.5
3.6
4.1
4.1.2
Advantages of this technique include the following:
The test is simple and no special training is required to observe the results.
The paint retains the color of the highest temperature reached, providing a permanent record.
Once the paint color changes, it does not change back to the original color.
The effective life of an application of the paint is usually one (1) or two (2) years, or until the paint changes color.
Infrared Thermography
This noncontact technique uses infrared scanners to measure the temperature of heat-radiating surfaces within the scanner's line of sight. (Note: Infrared radiation is emitted by all objects above absolute zero [-273°C].) The scanner measures temperature variations on the surface of the object being monitored and converts the temperature data into video or audio signals that can be displayed or recorded in a wide variety of formats for future analysis. This form of condition monitoring produces color or gray-scale images that identify temperature differences in the surface being examined. The sensitivity of the technique is affected by the reflectivity of the object being observed. Scanners are available for a wide range of temperature sensitivities and resolutions. This technique can be used to scan elevated, large, distant or hot surfaces.
Advantages of this technique include the following:
i) Scanners can be portable and are generally considered easy to operate.
ii) It provides dramatic images of the object's temperature profile.
iii) It provides noncontact testing (e.g., it is safe for measuring energized systems and can measure an object without disturbing its temperature).
iv) The temperature of large surface areas can be observed quickly and continuously.
v) A wide variety of equipment options is available, including various lenses and zoom-view capabilities.
vi) Test data can be recorded, printed, logged or fed to other digital equipment.
Disadvantages of this technique include the following:
i) Equipment costs are considered moderate to expensive.
ii) Interpretation of the results requires training and experience.
iii) The scanners do not measure well through metal or glass housings or barriers.
4.2
Advantages of this technique include the following:
The analysis is effective when looking for beats, pulses, instabilities and a multitude of other conditions of interest.
The technique often provides more information than frequency analysis.
Disadvantages of this technique include the following:
The time waveforms can be complex and confusing.
Testing can consume a considerable amount of time.
Personnel need considerable practice and experience to interpret complex waveforms.
Spectrum Analysis
Spectrum analysis transforms time-domain data into the frequency domain using the fast Fourier transform (FFT) algorithm, performed either by the data collector itself or by a host computer. After the data are collected and transformed (i.e., organized by frequency), they are compared to baseline or expected values. Problems are identified by comparing a device's current spectra with its previous spectra to detect changes in amplitude at selected frequencies. This technique can be used to monitor shafts, gearboxes, belt drives, compressors, engines, roller bearings, journal bearings, electric motors, pumps and turbines.
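To illustrate the comparison step, the sketch below evaluates the discrete Fourier transform directly at a few selected frequencies and flags those whose amplitude has grown by a set ratio relative to the baseline. A direct single-bin DFT is used only to keep the example self-contained; in practice an FFT routine (for example, numpy.fft.rfft) computes all bins at once. The growth ratio of 2, the noise floor and the synthetic 50 Hz and 120 Hz signals are assumptions.

```python
import cmath
import math


def amplitude_at(signal: list[float], sample_rate_hz: float, freq_hz: float) -> float:
    """Single-sided amplitude of the DFT bin nearest freq_hz (direct O(N) sum)."""
    n = len(signal)
    k = round(freq_hz * n / sample_rate_hz)  # bin index closest to the requested frequency
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return 2.0 * abs(acc) / n  # valid for bins other than DC and Nyquist


def spectral_changes(baseline: list[float], current: list[float], sample_rate_hz: float,
                     freqs_hz: list[float], ratio: float = 2.0,
                     floor: float = 1e-9) -> dict[float, float]:
    """Return {freq: current/baseline amplitude} for frequencies that grew by >= ratio."""
    flagged = {}
    for f in freqs_hz:
        a0 = max(amplitude_at(baseline, sample_rate_hz, f), floor)  # avoid division by ~0
        a1 = amplitude_at(current, sample_rate_hz, f)
        if a1 / a0 >= ratio:
            flagged[f] = a1 / a0
    return flagged


# Synthetic 1 s records sampled at 1 kHz: a 50 Hz running component plus a
# 120 Hz component whose amplitude rises from 0.1 to 0.5.
fs, n = 1000, 1000
t = [i / fs for i in range(n)]
base = [math.sin(2 * math.pi * 50 * x) + 0.1 * math.sin(2 * math.pi * 120 * x) for x in t]
curr = [math.sin(2 * math.pi * 50 * x) + 0.5 * math.sin(2 * math.pi * 120 * x) for x in t]
print(spectral_changes(base, curr, fs, [50, 120]))  # flags 120 Hz only, ratio about 5
```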
Advantages of this technique include the following: The equipment is portable and easy to use. Software is available that makes the mathematical transformation of the data rapid and accurate. Small performance changes in the equipment being tested can be identified by these tests. Characteristic frequencies usually allow the user to isolate the problem to a component.
One disadvantage of this technique is that random noise and vibrations of nearby equipment can interfere with the tests.
4.2.3 Shock Pulse Analysis
Shock pulse analysis measures the impact of rollers with the raceway and produces a shock pulse reading that changes as the conditions within the bearing deteriorate. This technique uses a shock pulse analyzer that is set up specifically for the type and size of bearings being tested and is fed a signal from an accelerometer placed on a bearing housing. It can identify issues such as lubricant problems, problems with oil seals and packings and incorrect bearing installation and/or alignment. This technique can be used to monitor roller bearings, impact tools and internal combustion engine valves.
Advantages of this technique include the following:
Test equipment is portable and easy to operate.
Test results are essentially immediate.
The sensitivity of the test is generally considered better than that of conventional vibration analysis.
Disadvantages of this technique include the following:
The test is limited to roller-type bearings.
The test is highly dependent on accurate bearing size and speed information.
4.2.4 Ultrasonic Analysis
When used as a dynamic monitoring technique, ultrasonic analysis helps detect changes in sound patterns caused by problems such as wear, fatigue and deterioration in moving parts. Ultrasound (i.e., high-frequency sound waves from 20 kHz to 100 kHz, above the range of human hearing) is detected by an ultrasonic translator and converted to audible or visual output. [Note: See ultrasonic analysis as a nondestructive condition-monitoring technique (A1/4.5.3) for its other capabilities.] This technique can be used to monitor bearing fatigue or wear.
Advantages of this technique include the following:
i) Tests are quick and easy to do.
ii) The location of the noise source can be pinpointed accurately.
iii) Equipment is portable and monitoring can be done from a long range.
One disadvantage of this technique is that random noise and vibrations of nearby equipment can interfere with the tests.
4.3
Typical P-F interval: months
Skill level: trained semiskilled worker to take the sample and experienced technician to perform and interpret the analysis
Advantages of this technique include the following:
i) Ferrography is more sensitive than many other tests at identifying early signs of wear.
ii) The slide provides a permanent record and allows the measurement of particle size and shape.
Disadvantages of this technique include the following:
i) The test is time-consuming and requires expensive equipment.
ii) In-depth analysis requires an electron microscope.
iii) The primary target is limited to ferromagnetic particles.
4.3.2 Particle Counter
Particle counter testing monitors particles in both lubricating and hydraulic oils caused by problems such as corrosion, wear, fatigue and contaminants. Several types of particle counting tests are available; two in particular are light extinction and light scattering particle counters. In a light extinction particle counter test, an incandescent light shines on an object cell through which the oil sample flows under controlled flow and volume conditions. A detector (e.g., a photodiode) receives the light passing through the sample and, based on the amount of light blocked, indicates the number of particles in a predetermined size range. A direct reading of the ISO cleanliness value can be determined from this test.
In a light scattering particle counter test, a laser light shines on an object cell through which the oil sample flows under controlled flow and volume conditions. When opaque particles pass through the laser beam, the scattered light is measured by a photodiode and translated into a particle count. A direct reading of the ISO cleanliness value can be determined from this test. This technique can be used to analyze oil used in engines, compressors, transmissions, gearboxes and hydraulic systems.
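As a sketch of how a particle count becomes an ISO cleanliness value, the code below maps cumulative counts per millilitre at the ≥4, ≥6 and ≥14 µm(c) sizes to a three-part code. It assumes the ISO 4406:1999 scale-number limits as recalled here (verify them against the current edition of the standard); the counts and function names are hypothetical.

```python
from bisect import bisect_left

# Upper limits (particles per mL) for ISO 4406 scale numbers 0 to 28, as
# recalled from ISO 4406:1999; check against the current standard before use.
ISO4406_UPPER = [
    0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.3, 2.5, 5, 10, 20, 40, 80,
    160, 320, 640, 1300, 2500, 5000, 10000, 20000, 40000, 80000, 160000,
    320000, 640000, 1300000, 2500000,
]

def scale_number(count_per_ml):
    """Smallest scale number whose upper limit is at or above the count."""
    code = bisect_left(ISO4406_UPPER, count_per_ml)
    if code >= len(ISO4406_UPPER):
        raise ValueError("count exceeds scale number 28")
    return code

def iso_cleanliness_code(ge4, ge6, ge14):
    """Three-part code from cumulative counts per mL at >=4, >=6 and >=14 um(c)."""
    return "/".join(str(scale_number(c)) for c in (ge4, ge6, ge14))

# Hypothetical counts reported by a particle counter.
print(iso_cleanliness_code(2000, 500, 60))  # 18/16/13
```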
Advantages of this technique include the following:
i) Test results are quickly available.
ii) Tests are accurate and reproducible.
iii) Tests are more accurate than graded filtration.
Disadvantages of this technique include the following:
i) The tests are dependent on good fluid conditions and are hampered by air bubbles, water contamination and translucent particles.
ii) The tests provide no information on the chemical nature of the contamination.
4.3.3 Sediment Tests (ASTM D-1698)
Sediment testing provides information about sediment (e.g., inorganic sediment from contamination and organic sediment from oil deterioration or contamination) and soluble sludge from electrical insulating oil deterioration. It involves using a centrifuge to separate sediment from the oil; the sediment-free portion is then subjected to further steps (e.g., dilution, precipitation and filtration) to measure the soluble sludge. The total sediment is weighed and then baked to remove the organics, which provides an organic/inorganic composition. This technique can be used to analyze petroleum-based insulating oils in transformers, breakers and cables.
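Typical P-F interval: weeks
Skill level: electrician to take the sample and trained laboratory technician to perform and interpret the analysis
Advantages of this technique include the following:
i) The test is relatively quick and easy to complete.
ii) Samples can be taken online.
Disadvantages of this technique include the following:
i) Only low-viscosity oil can be sampled.
ii) Testing must be performed in a laboratory.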
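The organic/inorganic split described above is simple arithmetic. A minimal sketch, assuming the residue left after baking is all inorganic and the weight lost on baking is all organic; it simplifies the actual ASTM D-1698 procedure, and the weights and function name are hypothetical.

```python
def sediment_composition(sample_mass_g, total_sediment_g, residue_after_baking_g):
    """Split total sediment into organic and inorganic fractions.

    Baking burns off the organic material, so the residue is taken as the
    inorganic sediment and the weight lost on baking as the organic sediment.
    """
    if total_sediment_g <= 0 or not 0 <= residue_after_baking_g <= total_sediment_g:
        raise ValueError("need 0 <= residue <= total sediment, with total > 0")
    organic_g = total_sediment_g - residue_after_baking_g
    return {
        "total_pct_of_sample": 100 * total_sediment_g / sample_mass_g,
        "organic_pct_of_sediment": 100 * organic_g / total_sediment_g,
        "inorganic_pct_of_sediment": 100 * residue_after_baking_g / total_sediment_g,
    }

# Hypothetical weights: 100 g oil sample, 0.050 g sediment, 0.020 g left after baking.
print(sediment_composition(100.0, 0.050, 0.020))
```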
4.3.4 Atomic Emissions Spectroscopy
Atomic emissions spectroscopy identifies problems such as corrosion, wear metals, contaminants and additives in lubrication and hydraulic oil samples by measuring the characteristic radiation emitted when samples are subjected to high energy and temperature conditions. The test results are in parts per million (ppm) for a wide variety of elements of interest, including iron, aluminum, chromium, copper, lead, tin, nickel and silver, and components of oil additives such as boron, zinc, phosphorus and calcium. This technique can be used to analyze oil used in diesel and gasoline engines, compressors, transmissions, gearboxes and hydraulic systems.
ABS GUIDANCE NOTES ON RELIABILITY-CENTERED MAINTENANCE . 2004
Typical P-F interval: weeks to months
Skill level: trained semiskilled worker to take the sample and experienced technician to perform and interpret the analysis
Advantages of this technique include the following:
i) The tests are fairly low cost.
ii) The tests yield rapid and accurate results.
iii) The range of elements identified is large.
Disadvantages of this technique include the following:
i) The tests do not identify the wear process that contaminated the oil.
ii) Large particles in the sample may not be counted in the results.
4.3.5 Infrared Spectroscopy
Infrared spectroscopy involves placing an oil sample in a beam of infrared light and then measuring the absorbed light energy at various specific wavelengths to determine the level of an element in a sample without destroying the sample. Mathematical manipulation of the absorption data results in a fingerprint of the sample oil, which can be compared to prior samples or standards by intelligent software. The analysis can provide information about oil deterioration, oxidation, water contamination or oil additives. This technique can be applied to turbine generators, sulfur hexafluoride or nitrogen sealed systems, transformer oils and breakers.
Typical P-F interval: weeks to months
Skill level: trained semiskilled worker to take the sample and trained laboratory technician to perform and interpret the analysis
Advantages of this technique include the following:
i) Data can be used to determine ASTM parameters.
ii) The test is highly repeatable.
iii) Data can be used to generate a total acid number (TAN) and a total base number (TBN).
Disadvantages of this technique include the following:
i) Test equipment manufacturers are not consistent in the processing of data.
ii) Typically, the test is limited to about 1000 ppm water contamination.
4.3.6 Potentiometric Titration Total Acid Number (TAN)
Potentiometric titration TAN is used to determine the extent of breakdown in lubrication or hydraulic oil by determining the level of acidity in an oil sample. The test involves mixing the oil sample with solvents and water and then measuring the change in the electrical conductivity as the mixture is titrated with potassium hydroxide (KOH). The more KOH a sample uses, the higher the acid number and the greater the oil deterioration. This technique can be used to test oil used in diesel/gasoline engines, gas turbines, transmissions, gearboxes, compressors, hydraulic systems and transformers.
Typical P-F interval: weeks to months
Skill level: trained semiskilled worker to take the sample and trained laboratory technician to perform and interpret the analysis
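Advantages of this technique include the following:
i) The test can be performed on any color oil.
ii) The test is considered accurate within 15%.
Disadvantages of this technique include the following:
i) The test is limited to petroleum-based oils.
ii) Some of the chemicals used to complete the tests are hazardous.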
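The KOH consumption is normally converted to an acid number in mg KOH per gram of oil. A minimal sketch, assuming the common form of the calculation used by potentiometric TAN methods such as ASTM D664 (end-point titrant volume minus blank, times KOH molarity, times 56.1, divided by sample mass); the volumes, molarity, sample mass and function name are hypothetical.

```python
KOH_MOLAR_MASS = 56.1  # g/mol; mol/L x mL gives mmol, and mmol x 56.1 gives mg KOH

def total_acid_number(sample_ml, blank_ml, koh_molarity, sample_mass_g):
    """TAN in mg KOH per g of oil from a titration end point."""
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    return (sample_ml - blank_ml) * koh_molarity * KOH_MOLAR_MASS / sample_mass_g

# Hypothetical titration: 2.5 mL to end point, 0.1 mL blank, 0.1 M KOH, 20 g oil.
print(round(total_acid_number(2.5, 0.1, 0.1, 20.0), 3))  # 0.673
```

Trending this value against the new-oil TAN, rather than reading a single sample in isolation, is what shows progressive breakdown.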
4.3.7 Karl Fischer Titration Test (ASTM D-1744)
The Karl Fischer titration test measures moisture in a lubrication or hydraulic oil sample, an indicator of degraded oil condition, by measuring the electrical current flow between two electrodes immersed in the sample solution. Karl Fischer reagent is metered into the sample until all of the entrained water has reacted with the reagent. Results are reported in ppm of water. This technique can be used to analyze oil in enclosed oil systems such as engines, gearboxes, transmissions, compressors, hydraulic systems, turbines and transformers.
Advantages of this technique include the following:
i) The test is accurate for small quantities of water contamination.
ii) The test can be completed fairly quickly.
iii) Results are repeatable.
Disadvantages of this technique include the following:
i) Considerable skill is required to interpret the results.
ii) Automated equipment is relatively expensive and not portable.
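For the volumetric form of the test, where reagent is metered in, the water content follows from the reagent volume used and the reagent's water equivalence (titer). A minimal sketch, assuming the titer has been determined beforehand by standardization; all values and the function name are hypothetical.

```python
def water_ppm(titrant_ml, titer_mg_per_ml, sample_mass_g):
    """Water content in ppm by mass (mg water per kg oil) from a volumetric
    Karl Fischer titration with a known reagent titer (mg water per mL)."""
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    water_mg = titrant_ml * titer_mg_per_ml
    return water_mg * 1000.0 / sample_mass_g  # mg per g x 1000 = mg per kg = ppm

# Hypothetical titration: 0.5 mL of reagent with a titer of 1.0 mg/mL on 10 g of oil.
print(water_ppm(0.5, 1.0, 10.0))  # 50.0
```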
4.3.8 Kinematic Viscosity
The kinematic viscosity test provides an indication of oil deterioration over time or of contamination of the oil by fuel or other oils. The test measures the fluid's resistance to flow under known pressure and temperature conditions by forcing a sample to flow through a capillary viscometer. Based on the test results and the oil density, the dynamic viscosity of the oil sample can be calculated. This technique can be used to test oil used in diesel/gasoline engines, turbines, transmissions, gearboxes, compressors and hydraulic systems.
Typical P-F interval: weeks to months
Skill level: trained semiskilled worker to take the sample and trained laboratory technician to perform and interpret the analysis
Advantages of this technique include the following:
i) The test can be used for most lubricating oils, both transparent and opaque.
ii) Results are repeatable.
Disadvantages of this technique include the following:
i) The test is not done in the field.
ii) Flammable solvents are used.
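The two calculations named above can be sketched as follows, assuming a calibrated glass capillary viscometer (kinematic viscosity equals the viscometer constant times the flow time, as in ASTM D445) and a known oil density at the test temperature. The flow time, constant, density and function names are hypothetical.

```python
def kinematic_viscosity_cst(flow_time_s, viscometer_constant_cst_per_s):
    """Capillary viscometer: nu = C * t, in mm^2/s (cSt)."""
    return viscometer_constant_cst_per_s * flow_time_s

def dynamic_viscosity_cp(kinematic_cst, density_g_per_cm3):
    """eta = nu * rho; cSt multiplied by g/cm^3 gives cP (mPa.s)."""
    return kinematic_cst * density_g_per_cm3

nu = kinematic_viscosity_cst(680.0, 0.1)  # hypothetical: 680 s flow time, C = 0.1 cSt/s
eta = dynamic_viscosity_cp(nu, 0.88)      # hypothetical density at the test temperature
print(round(nu, 2), round(eta, 2))  # 68.0 59.84
```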
4.3.9 Dielectric Strength Tests
Dielectric strength tests are used to measure the insulating quality of electrical insulating oils. Quality deterioration is often caused by contamination or oil breakdown. The test is performed by subjecting the sample, at a given temperature, to electrical stress by applying a voltage across it. This technique can be used to test insulating oils in transformers, breakers and cables.
Typical P-F interval: months
Skill level: electrician to take the sample and trained laboratory technician to perform and interpret the analysis
Advantages of this technique include the following:
i) The test is rapid and relatively simple.
ii) The equipment does not need to be offline to perform the test.
Disadvantages of this technique include the following:
i) The sampling technique can affect the test results.
ii) The test must be completed in the lab.
iii) The materials and equipment used to complete the test are hazardous.
4.4
Advantages of this technique include the following:
i) Corrosion effects can be accurately predicted when the environment is consistent over the test period.
ii) Testing is relatively inexpensive and yields vivid examples of the corrosion to expect.
Disadvantages of this technique include the following:
i) Testing can take a long time to complete.
ii) Determining the corrosion rates can take several weeks or months of testing.
iii) The tests involve working directly with the potentially hazardous corrosive material streams.
4.4.2 Corrometer
Corrometer testing helps measure the corrosion rate of equipment by monitoring the change in the electrical resistance of a sample material. As the sample material's cross-section is reduced by corrosion, its electrical resistance increases. The measured resistance change corresponds to the total metal loss and can be converted to a corrosion rate. This technique can be used to perform tests at petroleum refineries, process plants, underground/undersea structures, water distribution systems, paper mills and electrical generating plants, and for cathodic protection monitoring, abrasive slurry transport and atmospheric corrosion monitoring.
Advantages of this technique include the following:
i) Portable equipment is available.
ii) Testing works in many environments.
iii) Testing can be made continuous with an online monitor.
iv) Results are easily converted to corrosion rates.
Disadvantages of this technique include the following:
i) Portable equipment does not provide permanent records.
ii) The test does not typically indicate changes in the corrosion rate.
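The conversion from resistance change to corrosion rate can be sketched for the simplest case: a flat strip element thinning uniformly, with resistivity, length and width unchanged, so that resistance is inversely proportional to thickness. This is an idealization (it ignores the temperature compensation that real probes provide), and the probe values and function names are hypothetical.

```python
def remaining_thickness_mm(initial_thickness_mm, initial_resistance, resistance):
    """For a flat strip R = rho * L / (w * t); with rho, L and w fixed, t = t0 * R0 / R."""
    return initial_thickness_mm * initial_resistance / resistance

def corrosion_rate_mm_per_year(initial_thickness_mm, initial_resistance, resistance, days):
    """Average metal loss rate over the exposure period, annualized."""
    loss_mm = initial_thickness_mm - remaining_thickness_mm(
        initial_thickness_mm, initial_resistance, resistance)
    return loss_mm * 365.0 / days

# Hypothetical probe: 0.25 mm element, resistance 10.0 -> 10.5 milliohm over 30 days.
print(round(corrosion_rate_mm_per_year(0.25, 10.0, 10.5, 30), 4))  # 0.1448
```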
4.4.3 Potential Monitoring
Potential monitoring helps identify the corrosion state (i.e., active or passive) of a material by monitoring localized corrosion and indicating when active corrosion is in progress. This test takes advantage of the fact that metals in an active corrosion state (i.e., higher corrosion rate) have a different electrical potential than when they are in a passive corrosion state (i.e., lower corrosion rate). A voltmeter is used to measure the potential of the sample area. This technique can be used to perform monitoring at chemical process plants, paper mills, pollution control plants, electrical utilities and desalination plants. The technique is best suited to stainless steel, nickel-based alloys and titanium.
Typical P-F interval: varies depending on material and rate of corrosion
Skill level: trained and experienced technician
Advantages of this technique include the following:
i) The test provides a rapid response to change.
ii) Localized corrosive effects are monitored.
Disadvantages of this technique include the following:
i) The test does not provide corrosion rates.
ii) Testing is influenced by changes in temperature and acidity.
4.5
Typical P-F interval: months
Skill level: trained and experienced technician to take the radiographs and trained and experienced technician or engineer to interpret the radiographs
Advantages of this technique include the following:
i) The technique examines the inside of test materials to locate hidden flaws (i.e., flaws in areas that cannot be seen externally).
ii) The technique provides a permanent record of the test.
Disadvantages of this technique include the following:
i) Sometimes several views are required to locate the flaw.
ii) The test is not very sensitive to crack-type flaws.
4.5.2 Liquid Dye Penetrant
The use of liquid dye penetrants can help detect surface discontinuities or cracks due to problems such as fatigue, wear, surface shrinkage and grinding. The technique involves applying liquid dye penetrant to a test surface and allowing sufficient time for the dye to penetrate the surface. Next, excess penetrant is removed from the surface, and the surface is treated with a developer that draws the penetrant back to the surface, revealing the location of imperfections. Liquid penetrants are categorized according to the type of dye (e.g., visible dye, fluorescent and dual sensitivity penetrants) and the type of processing (e.g., water washable, postemulsified or solvent removed) required to remove the dye from the surface. This technique can be used to analyze ferrous and nonferrous materials such as welds, machined surfaces, steel structures, shafts, boilers, plastic structures and compressor receivers.
Advantages of this technique include the following:
i) Visible dye penetrant kits are cheap (Note: fluorescent kits are more sensitive but more expensive).
ii) Surface problems on nonferrous materials can be detected.
Disadvantages of this technique include the following:
i) Testing will not work on highly porous materials.
ii) The technique is not conducive to online testing.
iii) Experienced personnel are required to evaluate the results.
iv) A dark work area is required for fluorescent dye testing.
4.5.3 Ultrasonic Analysis
Ultrasonic analysis helps detect changes in sound patterns caused by problems such as leaks, wear, fatigue or deterioration. Ultrasound (i.e., high-frequency sound waves from 20 kHz to 100 kHz, above the range of human hearing) is detected by an ultrasonic translator and converted to audible or visual output. [Note: See ultrasonic analysis as a dynamic condition-monitoring technique (A1/4.2.4) for its other capabilities.] This technique can be used to detect leaks in pressure/vacuum systems and underground pipes or tanks, and to detect static discharge.
Advantages of this technique include the following:
i) The tests are quick and easy to do.
ii) The location of the noise source can be pinpointed accurately.
iii) Equipment is portable and monitoring can be done from a long range.
Disadvantages of this technique include the following:
i) Some tests can only be done under vacuum.
ii) In general, test results do not indicate the size of a leak.
4.5.4 Ultrasonic Transmission
Ultrasonic testing using a transmission technique helps detect surface and subsurface discontinuities caused by problems such as fatigue, heat treatment, inclusions, lack of penetration and gas porosity in welds. It can also measure the thickness of test subjects. The test involves using one of the available transmission techniques to apply an ultrasound signal to a test object, then receiving the signal back and analyzing it for changes that might indicate discontinuities in the test object. Ultrasonic techniques include pulse echo, transmission, resonance and frequency modulation. This technique can be used to inspect ferrous and nonferrous welds, steel structures, boilers, tubes, plastic structures and vessels/tanks.
One advantage of this technique is that the tests are applicable to a majority of materials. One disadvantage of this technique is that the test results do not clearly distinguish between types of defects.
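For the pulse-echo technique, thickness follows from the sound velocity in the material and the round-trip time of the echo. A minimal sketch, assuming a typical longitudinal velocity for carbon steel of about 5,920 m/s; in practice the instrument is calibrated on a reference block of the actual material, and the reading and function name below are hypothetical.

```python
def pulse_echo_thickness_mm(round_trip_us, velocity_m_per_s=5920.0):
    """Thickness = velocity * round-trip time / 2 (the pulse crosses the wall twice).

    5,920 m/s is an assumed typical longitudinal velocity for carbon steel.
    """
    return velocity_m_per_s * round_trip_us * 1e-6 / 2 * 1000.0

# Hypothetical reading: echo returns 3.38 microseconds after the pulse.
print(round(pulse_echo_thickness_mm(3.38), 2))  # 10.0
```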
4.5.5 Magnetic Particle Inspection
Magnetic particle inspection helps detect the location of surface/near-surface cracks and discontinuities caused by problems such as fatigue, wear, inclusions, laminations, heat treatment, hydrogen embrittlement, seams and corrosion. The technique involves magnetizing the test piece and spraying it with a solution containing very fine iron particles. Discontinuities on the surface of the test piece will cause the iron particles to accumulate and form an indication of the flaw. The results are then interpreted. This technique can be used to analyze ferromagnetic metals such as vessels/tanks, welds, machined surfaces, shafts, steel structures and boilers.
Advantages of this technique include the following:
i) The test is reliable.
ii) The test is sensitive.
iii) The test is widely used.
Disadvantages of this technique include the following:
i) The test is limited to detecting surface imperfections.
ii) The test is time-consuming.
iii) The test is not applicable as an online test.
4.5.6 Eddy Current Testing
Eddy current testing helps detect surface and subsurface flaws caused by problems such as wear, fatigue and stress, and it helps detect dimensional changes that result from problems such as wear, strain and corrosion. It can also help determine material hardness. The technique involves applying high-frequency alternating current to conductive material test objects and inducing eddy currents around discontinuities. The electrical effects in the test part are amplified and shown on a cathode ray tube or a meter. This technique can be used to analyze boilers, heat exchangers, hydraulic tubes, hoist ropes, railroad lines and overhead conductors.
Advantages of this technique include the following:
i) The test can be performed on a wide variety of conductive materials.
ii) Permanent records can be made via data recorders.
One disadvantage of this technique is that nonferrous materials respond poorly to the test.
4.5.7 Acoustic Emission
Acoustic emission testing monitors plastic deformation and crack formation caused by problems such as fatigue, stress and wear. The technique involves subjecting the test object to loads and detecting the stress waves that result. The test results can be displayed on a cathode ray tube or an x-y recorder. This technique can be used to test structures, pressure vessels, pipelines and mining excavations.
Advantages of this technique include the following:
i) The test can be performed remotely from the flaws and can cover the entire structure.
ii) Active flaws can be detected.
iii) Relative loads used in testing can be used to estimate failure loads in some cases.
Disadvantages of this technique include the following:
i) The test object has to be loaded.
ii) Some electrical and mechanical noises can interfere with the results.
iii) Results analysis can be difficult.
4.5.8 Hydrostatic Testing
Hydrostatic testing helps detect breaches in a system's pressure boundaries caused by problems such as fatigue, stress and wear. The testing involves filling the system to be tested with water or the operating fluid, sealing the system and increasing the pressure to approximately 1.5 times the system's operating pressure. The pressure is then held for a defined period while inspections and monitoring are conducted for visible leaks, a system pressure drop and makeup water/operating fluid additions. The principle of hydrostatic testing can also be used with compressed gases. This technique can be used to test components (e.g., tanks, vessels, pipelines) and completely assembled systems that contain pressurized fluids or gases.
One advantage of this technique is that the results are easy to interpret. Disadvantages of this technique include the following:
i) The test has the potential to overpressurize and damage the system.
ii) The test will not identify defects that have not penetrated a pressure boundary.
iii) The test is not applicable as an online test.
4.5.9 Visual Inspection (Borescope)
Visual inspections with a borescope allow internal inspection of the surfaces of narrow tubes, bores, pipes and the chambers of engines, pumps, turbines, compressors, boilers, etc. The inspection helps locate and orient surface cracks, oxide films, weld defects, corrosion, wear and fatigue flaws. The borescope channels light from an external light source to illuminate parts not easily visible to the naked eye, and it also provides a means to photograph and/or magnify the illuminated surface of interest.
Advantages of this technique include the following:
i) The equipment provides excellent views.
ii) The parts being examined can be photographed and magnified.
Disadvantages of this technique include the following:
i) The inspection is limited to surface conditions.
ii) The lens systems are often inflexible.
iii) Technicians can suffer eye fatigue during prolonged inspections.
4.6
One advantage of this technique is that it is simple and well understood. One disadvantage of this technique is that online testing cannot be conducted.
4.6.2 High Potential Testing
High potential testing helps detect motor winding ground wall insulation deterioration. The test involves applying high direct current voltage to the stator windings in graduated steps to help determine the voltage at which nonlinearity in the test current or a drop in the insulation resistance occurs. If the insulation withstands a specified voltage, it is considered to be safe, and the motor can be returned to service. Also, trending the voltage at which the current becomes nonlinear or the resistance drops can be used to predict the remaining motor life. This technique can be applied to AC and DC motors.
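Finding the step at which the insulation resistance drops can be sketched by computing resistance (step voltage divided by leakage current) at each graduated step and comparing it with the previous step. The 20% drop criterion, the step voltages, the currents and the function name are hypothetical, not acceptance criteria.

```python
def first_resistance_drop(steps_kv, currents_ua, max_drop_fraction=0.2):
    """Return the first voltage step at which insulation resistance (V / I) falls
    by more than max_drop_fraction (hypothetical) relative to the previous step,
    or None if no such drop occurs."""
    previous = None
    for kv, ua in zip(steps_kv, currents_ua):
        resistance_mohm = kv * 1e3 / ua  # kV / uA = gigaohm; x 1e3 gives megohm
        if previous is not None and resistance_mohm < (1 - max_drop_fraction) * previous:
            return kv
        previous = resistance_mohm
    return None

# Hypothetical readings: current rises disproportionately at the 5 kV step.
print(first_resistance_drop([1, 2, 3, 4, 5], [0.5, 1.0, 1.5, 2.0, 4.0]))  # 5
```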
One advantage of this technique is that the test results usually correlate with surge comparison tests. Disadvantages of this technique include the following:
i) Motors must be offline for testing.
ii) The test voltage can be destructive to motor parts.
4.6.3 Surge Testing
Surge testing helps identify insulation faults in induction/synchronous motors, DC armatures, synchronous field poles and various coils or coil groups. The technique applies a high-frequency transient surge to two separate but equal parts of a winding, and the resulting reflected waveforms are compared on an oscilloscope. Normally, if no problems are detected at twice the operating voltage plus 1,000 volts, the winding is considered good.
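The test-voltage rule of thumb above, together with a simple comparison of the two reflected waveforms, can be sketched as follows. The mismatch metric, the sample waveforms and the function names are hypothetical illustrations; commercial surge testers use their own comparison methods.

```python
def surge_test_voltage(operating_voltage):
    """Rule of thumb from the text: twice the operating voltage plus 1,000 V."""
    return 2 * operating_voltage + 1000

def waveform_mismatch(wave_a, wave_b):
    """Hypothetical normalized difference between two reflected surge waveforms
    sampled at the same instants (0 means identical)."""
    diff = sum(abs(a - b) for a, b in zip(wave_a, wave_b))
    scale = sum(abs(a) for a in wave_a)
    return diff / scale if scale else 0.0

print(surge_test_voltage(460))  # 1920
print(round(waveform_mismatch([1.0, 0.5, -0.2, -0.4], [1.0, 0.45, -0.25, -0.4]), 3))  # 0.048
```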
One advantage of this technique is that the test is portable. Disadvantages of this technique include the following:
i) The test is complex and expensive.
ii) Careful repetition is required to determine the location or severity of a fault.
4.6.4 Power Signature Analysis
Power signature analysis can be used to detect motor problems such as broken rotor bars, broken/cracked end rings, flow or machine output restrictions and machinery misalignment. This online technique involves monitoring current flow in one of the power leads at the motor control center or starter. The electrical current variations identified in the test indicate changing machine operating conditions and can be trended over time. Also, line frequencies can be compared with motor frequencies to help detect various motor flaws. This technique can be used to analyze AC induction motors, synchronous motors, compressors, pumps and motor-operated valves.
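One widely used relation for comparing line and motor frequencies, not stated in the text above, is that broken rotor bars in an induction motor produce current sidebands at (1 - 2s) and (1 + 2s) times the line frequency, where s is the slip. A minimal sketch, assuming a 60 Hz, four-pole motor running at 1,764 rpm (hypothetical values and function names); the measured current spectrum would then be searched near these frequencies.

```python
def synchronous_speed_rpm(line_hz, poles):
    """Synchronous speed of an AC machine: 120 * f / p."""
    return 120.0 * line_hz / poles

def slip(line_hz, poles, rotor_rpm):
    """Fractional slip between synchronous and actual rotor speed."""
    ns = synchronous_speed_rpm(line_hz, poles)
    return (ns - rotor_rpm) / ns

def rotor_bar_sidebands_hz(line_hz, poles, rotor_rpm):
    """Sideband frequencies (1 - 2s) f and (1 + 2s) f around the line frequency."""
    s = slip(line_hz, poles, rotor_rpm)
    return (1 - 2 * s) * line_hz, (1 + 2 * s) * line_hz

low, high = rotor_bar_sidebands_hz(60.0, 4, 1764.0)
print(round(low, 2), round(high, 2))  # 57.6 62.4
```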
Typical P-F interval: weeks to months
Skill level: experienced electrician to connect the test equipment and experienced technician to perform the analysis and interpret the data
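Advantages of this technique include the following:
i) Testing is conducted online.
ii) Test readings can be taken remotely for large or high-speed machines.
Disadvantages of this technique include the following:
i) Equipment is expensive.
ii) Analysis results are complex and often subjective.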
4.6.5 Motor Circuit Analysis
Motor circuit analysis helps yield a complete picture of motor condition by performing a series of tests. The test applies voltage at the motor control center power bus to measure resistance to ground, circuit resistance, capacitance to ground, inductance, rotor influence, DC bar-to-bar and polarization index/dielectric absorption. The results can identify changes in conductor path resistance caused by loose or corroded connections and loss of copper (turns) in the stator; phase-to-phase inductance caused by magnetic interaction between the stator and rotor; stator inductance affected by rotor position, rotor porosity and eccentricity, stator turn, and coil and phase shorting; and winding cleanliness/resistance to ground. This technique can be used to analyze electric motors (e.g., DC, AC induction, synchronous and wound rotor).
Advantages of this technique include the following:
i) The test is low voltage and nondestructive.
ii) Tests can be performed at the motor control center, which does not require motor disassembly.
One disadvantage of this technique is that the test cannot be performed online.
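Two of the quantities named above are simple ratios: the polarization index (insulation resistance at 10 minutes divided by that at 1 minute) and the dielectric absorption ratio (60 seconds over 30 seconds). The sketch below adds a phase-to-phase resistance imbalance check to illustrate how loose or corroded connections can show up; the imbalance definition, the readings and the function names are assumptions.

```python
def polarization_index(r_1min_mohm, r_10min_mohm):
    """PI = insulation resistance at 10 minutes / resistance at 1 minute."""
    return r_10min_mohm / r_1min_mohm

def dielectric_absorption_ratio(r_30s_mohm, r_60s_mohm):
    """DAR = insulation resistance at 60 seconds / resistance at 30 seconds."""
    return r_60s_mohm / r_30s_mohm

def phase_imbalance_pct(phase_values):
    """Assumed definition: largest deviation from the mean, as a percentage of the mean."""
    mean = sum(phase_values) / len(phase_values)
    return 100.0 * max(abs(v - mean) for v in phase_values) / mean

print(polarization_index(500.0, 2000.0))                    # 4.0
print(round(phase_imbalance_pct([0.412, 0.415, 0.431]), 2))  # 2.78 (hypothetical ohm readings)
```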
4.6.6 Battery Impedance Testing
Battery impedance testing helps detect battery cell deterioration. The test involves injecting an AC signal between the battery posts and measuring the resulting voltage. The battery impedance is then calculated and compared to (1) the battery's last test and (2) the impedance of other batteries in the same bank. If either comparison differs by more than a set percentage, this could indicate a cell problem or capacity loss.
One advantage of this technique is that the test can be performed online. One disadvantage of this technique is that the tests are lengthy for large batteries.
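The two comparisons described above can be sketched as follows. The 20% threshold, the cell names, the impedance values and the function names are hypothetical; the acceptable percentage depends on the battery type and the manufacturer's guidance.

```python
def impedance_mohm(ac_voltage_mv, ac_current_a):
    """Impedance from the injected AC signal: Z = V / I (mV / A = milliohm)."""
    return ac_voltage_mv / ac_current_a

def suspect_cells(current_mohm, previous_mohm, threshold_pct=20.0):
    """Flag cells whose impedance differs by more than threshold_pct (hypothetical)
    from (1) the cell's last test or (2) the average of the bank."""
    bank_avg = sum(current_mohm.values()) / len(current_mohm)
    flagged = []
    for cell, z in current_mohm.items():
        vs_last = 100.0 * abs(z - previous_mohm[cell]) / previous_mohm[cell]
        vs_bank = 100.0 * abs(z - bank_avg) / bank_avg
        if vs_last > threshold_pct or vs_bank > threshold_pct:
            flagged.append(cell)
    return flagged

# Hypothetical four-cell bank, impedances in milliohms.
previous = {"C1": 0.50, "C2": 0.52, "C3": 0.51, "C4": 0.50}
current = {"C1": 0.51, "C2": 0.52, "C3": 0.68, "C4": 0.50}
print(suspect_cells(current, previous))  # ['C3']
```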
4.7
One advantage of this technique is that the versatility of human observation combined with experience can identify an extremely wide range of problem types. One disadvantage of this technique is that unless inspections are scheduled and recorded, observers can become so familiar with their surroundings that changes of interest go unnoticed.
4.7.2 Audio Inspections
Audio inspection practices are common CM techniques employed in industry. The monitoring of machinery and equipment by listening to it operate helps identify a broad range of potential problems, including worn high-friction bearings, steam leaks, pressure relief valve leaks or discharges, coupling leaks, excessive loading on pumps, poor mechanical equipment alignment, etc. Humans are particularly sensitive to new or changed sounds and are easily taught to report and investigate unusual sounds. This technique is often a supplement to visual inspections. Also, human sensory-based inspections can verify the results from other CM techniques.
One advantage of this technique is that the versatility of human hearing combined with experience can identify an extremely wide range of problem sounds. One disadvantage of this technique is that the inspections must be assigned so that the inspectors gain sufficient experience to be able to detect new or changed noises.
4.7.3 Touch Inspections
Using touch as an inspection technique can be extremely useful. Changes in heat, scaling and roughness can all be detected by touch. Human touch is extremely sensitive and able to differentiate surface finish differences not discernible by eye. This technique is often a supplement to visual inspections. Also, human sensory-based inspections can verify the results from other CM techniques.
One advantage of this technique is that the hands and fingers are extremely sensitive to surface finish and to heat. One disadvantage of this technique is that the inspectors can be burned by touching hot objects and can be injured or shocked by touching operating equipment.
4.8
targets for data collection include diesel and gasoline engines, pumps, motors, compressors, etc. Data are often already collected for other reasons, and test data can also be used to optimize performance. In addition, most computer control equipment (e.g., distributed control systems, programmable logic controllers) has data analysis and alarming features that can be used to trend equipment performance.
Typical P-F interval: varies widely
Skill level: trained semiskilled workers are normally required
One advantage of this technique is that the data are often already collected. One disadvantage of this technique is that baseline data may not exist, which necessitates longer time periods to develop trends.
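As a sketch of trending, a least-squares line can be fitted to logged readings and extended to estimate when an alarm limit will be reached. The readings, the 85°C limit and the function names are hypothetical, and a straight-line fit is only one of several possible trending methods.

```python
def linear_trend(times, values):
    """Least-squares slope and intercept of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t

def time_to_alarm(times, values, alarm_limit):
    """Time at which the fitted line reaches alarm_limit, or None if the trend
    is flat or the crossing is not after the last reading."""
    slope, intercept = linear_trend(times, values)
    if slope == 0:
        return None
    t_alarm = (alarm_limit - intercept) / slope
    return t_alarm if t_alarm > times[-1] else None

# Hypothetical bearing temperature readings (deg C) by operating day.
days = [0, 10, 20, 30, 40]
temps = [70.0, 71.0, 72.0, 73.0, 74.0]
print(time_to_alarm(days, temps, 85.0))  # 150.0
```

Where no baseline exists, the fit simply has fewer points to work with early on, which is why longer periods are needed before such projections become meaningful.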
5.1
5.2
ii) iii)
iv) P-F Interval. This column lists the typical P-F interval for the CM technique and failure condition.
v) Skill. This column lists the skill level required for the CM technique.
5.3
Sources
The information provided in this document and in the CM matrices was based on a review of several CM publications. The primary sources of this information were as follows:
Moubray, J., Reliability-centered Maintenance, Butterworth-Heinemann Ltd, Oxford, England, 1991.
Marshall Institute, Preventive/Predictive Maintenance, Section 8: Predictive Maintenance, 1998.
National Aeronautics and Space Administration, Reliability Centered Maintenance Guide for Facilities and Collateral Equipment, Chapter 4: Predictive Testing & Inspection (PT&I) Technologies, February 2000.
APPENDIX
This Appendix provides an example RCM analysis for selected portions of a propulsion low-speed diesel engine. The purpose of this Appendix is to illustrate the RCM analysis process outlined in Section 7. Please note that this Appendix does not include RCM analysis data for the entire engine, nor does it contain all of the information that should be provided in a complete RCM analysis report. Specifically, this Appendix includes excerpts of the RCM analysis sections for the basic engine, the governor system and the camshaft lubrication system.
1.1
1.2
Appendix 2
Operate continuously 24 hours per day for up to 22 days at 85% maximum continuous rating (MCR), 280 days per year
Required MCR power to be developed by the Machinery System for propulsion: 16,860 kW
Reversible engine
Propel ship at 20 knots up to Sea State 3 conditions
Fuel: Intermediate fuel oils IFO 380 and IFO 180
Compliance with ABS Rules, SOLAS, MARPOL, etc.
Other common characteristics associated with machinery and utilities not relevant to propulsion are also to be listed above if an RCM analysis for the entire Discipline is to be performed.
Operate continuously 24 hours per day for up to 22 days at 85% MCR, 280 days per year
Single engine installation
Controllable from the bridge, centralized control station in machinery space and locally
Required MCR power to be developed by the Propulsion Functional Group for propulsion: 16,860 kW at 91 RPM
Fuel: Intermediate fuel oils IFO 380 and IFO 180
Reversible engine
Propel ship at 20 knots up to Sea State 3 conditions
Maintain propulsion at the following vessel angles of inclination: 15° static and 22.5° dynamic, athwartship; 5° static and 7.5° dynamic, fore-and-aft
Compliance with ABS Rules, SOLAS, MARPOL, etc.
Other common characteristics associated with the line and propeller shafting, shaft support bearings and propeller not relevant to the diesel engine are also to be listed above if an RCM analysis for the entire Functional Group is to be performed.
Performance Capability
Manner of Use
Propels vessel from 2 to 10 knots, with reversing and stopping capabilities for up to 72 hours maximum.
Propels vessel from 2 to 10 knots, with reversing and stopping capabilities, and assists in mooring for up to 4 hours maximum.
Not used
Performance Capability
To output up to MCR of 16,860 kW at 91 RPM; reversible to maximum speed of 63 RPM
Controllable from bridge, centralized control station in machinery space and locally
Maintain propulsion at the following vessel angles of inclination: 15° static and 22.5° dynamic, athwartship; 5° static and 7.5° dynamic, fore-and-aft
Fuel: Intermediate fuel oils IFO 380 and IFO 180
Specific fuel consumption: 171 g/kW-hr maximum at 85% MCR
Fuel oil lower calorific value: 42,707 kJ/kg or 10,200 kcal/kg
Lube oil consumption, system oil: 9 kg/cylinder per 24 hours
Cylinder oil: 1.1-1.6 g/kWh
Gases:
Exhaust gas flow: 136,200 kg/h; exhaust gas temperature: 250°C
Air consumption: 37.0 kg/s
Crankcase vapors: X kg/h
Controls: signal, alarm and readout details listed here along with parameters
Compliance with ABS Rules, SOLAS, MARPOL, etc.
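As a quick check of what the listed performance figures imply, the sketch below computes the power at 85% MCR, the daily and 22-day fuel consumption, and the implied thermal efficiency. It assumes continuous operation at exactly 85% MCR at the listed maximum specific fuel consumption for the full 22 days, so the fuel figures are an upper-bound estimate rather than values from the analysis.

```python
MCR_KW = 16_860
LOAD_FRACTION = 0.85
SFOC_G_PER_KWH = 171.0      # listed maximum at 85% MCR
LCV_KJ_PER_KG = 42_707.0    # listed fuel oil lower calorific value
VOYAGE_DAYS = 22            # listed maximum continuous operating period

power_kw = MCR_KW * LOAD_FRACTION
fuel_t_per_day = power_kw * SFOC_G_PER_KWH * 24 / 1e6  # grams to tonnes
voyage_fuel_t = fuel_t_per_day * VOYAGE_DAYS
# 1 kWh = 3,600 kJ, so efficiency = 3600 / (SFOC in kg/kWh * LCV in kJ/kg).
thermal_efficiency = 3600.0 / (SFOC_G_PER_KWH / 1000.0 * LCV_KJ_PER_KG)

print(round(power_kw), round(fuel_t_per_day, 1), round(voyage_fuel_t), round(thermal_efficiency, 3))
# 14331 58.8 1294 0.493
```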
Not Applicable
Hull Discipline
Cargo Handling
Functional Groups
Systems
Diesel Engine
Reduction Gear
Propeller
Subsystems
Basic Engine
Engine Support Systems: Lube Oil, Water, Fuel, Hydraulic, Air, Exhaust, Control Systems, Monitoring...
Cylinder Cover Assembly Equipment Items Mechanical Control Gear, Chain Drive and Camshaft
Exhaust Valve
1.3
Figure: Functional block diagram of the low-speed diesel engine. Labels shown include: Governor; torque; engine rpm; torque & vibration to propulsion shafting; vapor; lube oil; contaminated lube oil; Cleaning System; stuffing box drain oil; lube oil to lube oil sump tank; sludge (to waste); cool lube oil; lube oil & heat; cylinder lubricating oil; scavenge air; atmospheric air; condensate; Scavenge Air & Exhaust Gas Systems, including Turbochargers; exhaust gases & noise; freshwater; freshwater & heat; cool freshwater; Central Cooling Water System; seawater; seawater & heat; Instrumentation & Alarms; alarms; readouts.
The functional block diagram is used to identify the functions the engine needs to operate properly at sea. The outputs from each functional block represent the functions that must be provided, and they are used to develop specific function statements. Each function statement must include a verb representing the functionality required, a noun on which the functionality is performed and, when possible, performance parameters. Functional failures are then identified for each function statement. The functional failures include both total and partial loss of each function. Partial losses are determined by postulating deviations from each performance parameter in the function statement. Appendix 2, Table 4 provides an example of the functions and functional failures for the low-speed diesel engine.
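The procedure described above (a total loss, then a low and a high deviation for each performance parameter) can be sketched in Python. This is a minimal illustration, not the method prescribed by the Guide. The `Parameter` and `FunctionStatement` names and the `{dev}` wording template are hypothetical. The example reproduces functional failures 1.1 to 1.5 of Appendix 2, Table 4. The low and high thresholds are separate fields because they need not equal the nominal value. For example, function 4 uses 800 and 1,200 mbar around a nominal 1,000 mbar.

```python
from dataclasses import dataclass, field


@dataclass
class Parameter:
    """A performance parameter of a function statement (hypothetical layout).

    `template` contains "{dev}", which is replaced by the deviation wording.
    The low and high thresholds may differ from the nominal value.
    """
    template: str
    low: str
    high: str


@dataclass
class FunctionStatement:
    item_no: int
    total_loss: str                                              # wording for total loss of the function
    parameters: list[Parameter] = field(default_factory=list)    # in statement order
    other_failures: list[str] = field(default_factory=list)      # failures not tied to a parameter


def functional_failures(fn: FunctionStatement) -> list[tuple[str, str]]:
    """Return (item no., statement) pairs: total loss first, then a low and a
    high deviation for each performance parameter, then any other failures."""
    statements = [fn.total_loss]
    for p in fn.parameters:
        statements.append(p.template.format(dev=f"less than {p.low}"))
        statements.append(p.template.format(dev=f"more than {p.high}"))
    statements.extend(fn.other_failures)
    # Number each failure as <function item>.<sequence>, e.g. 1.1, 1.2, ...
    return [(f"{fn.item_no}.{i}", s) for i, s in enumerate(statements, start=1)]


# Function 1 of Appendix 2, Table 4
power = FunctionStatement(
    item_no=1,
    total_loss="No transmission of power to the propulsion shafting",
    parameters=[
        Parameter("Transmits {dev} of power to the propulsion shafting", "16,860 kW", "16,860 kW"),
        Parameter("Operates at {dev}", "91 rpm", "91 rpm"),
    ],
)

for item, text in functional_failures(power):
    print(item, text)
```

Running the sketch prints items 1.1 to 1.5 with the same wording as Table 4. Non-parametric failures, such as 4.8 "Fails to condition combustion air (dry air)", go in `other_failures`.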
Function Item No. 1 (Primary): Transmit 16,860 kW of power at 91 rpm to the propulsion shafting
  1.1 No transmission of power to the propulsion shafting
  1.2 Transmits less than 16,860 kW of power to the propulsion shafting
  1.3 Transmits more than 16,860 kW of power to the propulsion shafting
  1.4 Operates at less than 91 rpm (Reduce rpm)
  1.5 Operates at more than 91 rpm
Function Item No. 2 (Primary): Transmit 200 N-m of torque at 91 rpm to the control system
  2.1 No transmission of torque to the control system
  2.2 Transmits less than 200 N-m of torque to the control system
  2.3 Transmits more than 200 N-m of torque to the control system
  2.4 Operates at less than 91 rpm
  2.5 Operates at more than 91 rpm
Function Item No. 3 (Primary): Meter fuel to engine at 171 g/kW-hr, at 4 bar pressure, and 150°C temperature (max.)
  3.1 No metering of fuel to the engine
  3.2 Meters less than 171 g/kW-hr of fuel to the engine
  3.3 Meters more than 171 g/kW-hr of fuel to the engine
  3.4 Meters fuel to the engine at a pressure less than 4 bar
  3.5 Meters fuel to the engine at a pressure more than 4 bar
  3.6 Meters fuel to the engine at a temperature less than 50°C
  3.7 Meters fuel to the engine at a temperature more than 150°C
Function Item No. 4 (Primary): Flow 43.2 kg/s of combustion air to the engine at 1,000 mbar (inlet), 25°C temperature, and Z condition
  4.1 No flow of combustion air
  4.2 Flows less than 43.2 kg/s of combustion air
  4.3 Flows more than 43.2 kg/s of combustion air
  4.4 Flows inlet combustion air at a pressure less than 800 mbar
  4.5 Flows inlet combustion air at a pressure more than 1,200 mbar
  4.6 Flows combustion air at a temperature less than 15°C
  4.7 Flows combustion air at a temperature more than 30°C
  4.8 Fails to condition combustion air (dry air)
  26.7
  26.8
Function Item No. 27 (Secondary Protection): Flow lube oil at 400 m3/hr from the main engine to the lube oil sump tank
  27.1
  27.2
  27.3
  27.4
Function Item No. 28 (Secondary Protection)
  28.1 No flow of lubricant to the camshaft
  28.2 Flows less than 10.3 m3/hr of lubricant to the camshaft
  28.3 Flows more than 10.3 m3/hr of lubricant to the camshaft
  28.4 Flows lubricant to the camshaft at a pressure less than 3.5 bar
  28.5 Flows lubricant to the camshaft at a pressure more than 4.5 bar
  28.6 Flows lubricant to the camshaft at a temperature less than 40°C
  28.7 Flows lubricant to the camshaft at a temperature more than 50°C
  28.8 Fails to adequately clean camshaft lubricant
Function Item No. 29 (Secondary Protection)
  Fails to engage barring interlock when demanded
  Engages barring interlock prematurely
  Fails to disengage barring interlock
Function Item No. 30 (Secondary Protection)
  No starting air when demanded
  Flows less than 0.75 m3 of starting air when demanded
  Flows more than 0.75 m3 of starting air when demanded
  Flows starting air at a pressure less than 30 bar when demanded
  Flows starting air at a pressure more than 30 bar when demanded
  Stops flowing starting air when demanded
  Flows starting air prematurely
Function Item No. 31 (Secondary Protection)
  31.1 Inability to bar the engine
  31.2 Incorrect barring position
1.4
Appendix 2, Table 8 provides an example FMECA worksheet. This table contains a header that identifies the equipment item or component being evaluated, followed by 12 major columns: Item, Failure Mode, Causes, Failure Characteristic, Local Effects, Functional Failures, End Effects, Matrix, Severity, Current Likelihood, Current Risk and Failure Detection/Corrective Measures. Item links the equipment item or component with the failure mode under investigation. In this example, the cylinder liner is Equipment Item No. 3, and eight failure modes have been identified, numbered 3.1 through 3.8. Matrix identifies the applicable consequence/severity level definition table, e.g., Propulsion, Loss of Containment, Safety or Explosion/Fire. The definitions for the remaining column headings are provided in Subsection 1/4.
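The worksheet columns described above can be held in a simple record. The sketch below uses a hypothetical Python layout (`FmecaRow`) for a subset of the columns. It derives Current Risk from Severity and Current Likelihood by table lookup. The lookup contains only the combinations that appear in the example worksheets of this appendix, for instance Severity Level 3 with Occasional likelihood giving High. It is not the complete risk matrix of the Guide.

```python
from dataclasses import dataclass

# Current Risk as it appears in the example worksheets for each
# (severity level, current likelihood) pair. Only pairs that occur in this
# appendix are listed; this is not the full risk matrix.
OBSERVED_RISK = {
    (1, "Remote"): "Low",
    (3, "Improbable"): "Low",
    (3, "Remote"): "Medium",
    (3, "Occasional"): "High",
    (4, "Improbable"): "Medium",
    (4, "Remote"): "High",
}


@dataclass
class FmecaRow:
    """A subset of the FMECA worksheet columns (hypothetical record layout)."""
    item: str                          # links the row to the equipment item, e.g. "3.1"
    failure_mode: str
    failure_characteristic: list[str]  # e.g. ["Wear-in", "Random"]
    functional_failures: list[str]     # functional failure Item Nos. from Table 4
    matrix: str                        # severity definition table, e.g. "Propulsion"
    severity_level: int                # 1 (minor) to 4 (catastrophic)
    current_likelihood: str            # e.g. "Improbable", "Remote", "Occasional"

    @property
    def current_risk(self) -> str:
        key = (self.severity_level, self.current_likelihood)
        if key not in OBSERVED_RISK:
            raise KeyError(f"no risk ranking recorded for {key}")
        return OBSERVED_RISK[key]


liner_leak = FmecaRow(
    item="3.1",
    failure_mode="Leak in the cylinder liner between cooling water jacket and cylinder",
    failure_characteristic=["Wear-in", "Random"],
    functional_failures=["1.2", "13.1"],
    matrix="Propulsion",
    severity_level=3,
    current_likelihood="Occasional",
)
print(liner_leak.item, liner_leak.current_risk)  # prints: 3.1 High
```

Raising an error for an unlisted pair is deliberate: the sketch should not guess a ranking that the worksheet does not show.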
Propulsion, Directional Control, Drilling, Position Mooring (Station Keeping), Hydrocarbon Production and Processing, Import and Export Functions
Safety
Severity Level 1 (Minor, Negligible): Minor impact on personnel/No impact on public
Severity Level 2 (Major, Marginal, Moderate): Professional medical treatment for personnel/No impact on public
Severity Level 3 (Critical, Hazardous, Significant): Serious injury to personnel/Limited impact on public
Severity Level 4 (Catastrophic, Critical): Fatalities to personnel/Serious impact on public

Explosion/Fire
Severity Level 1 (Minor, Negligible): No damage to affected equipment or compartment, no significant operational delays
Severity Level 2 (Major, Marginal, Moderate): Affected equipment is damaged, operational delays
Severity Level 3 (Critical, Hazardous, Significant): An occurrence adversely affecting the vessel's seaworthiness or fitness for service or route
Severity Level 4 (Catastrophic, Critical): Loss of vessel or results in total constructive loss
No.: 3
Description: Cylinder liner, including cylinder lubrication passageways and cooling jacket
Item
Failure Mode
Causes
3.1
Wear-in
Random
Leak in the cylinder liner between cooling water jacket and cylinder (evident) Exhaust gases enter jacket cooling water system Cooling water leaking into cylinder, resulting in abnormal increase in freshwater makeup rate Transmits less than 16,860 kW of power at 91 rpm to the propulsion shafting
Jacket water expansion tank level will rise, setting off the high level alarm (engine operating); cooling water pressure will fluctuate (standby and operation); water will exit the indicator cocks during blow-down prior to starting the engine (standby)
3.2
No credible cause
3.3
Random
Wear-out
Severity Level 3
Occasional
High
Deformed/damaged cylinder liner (e.g., badly scored/scuffed) (evident) Excessive consumption of cylinder lube oil during combustion Partial loss of containment of engine vapors, combustion gases and pressure
Increased cylinder exhaust temperature Fire in scavenge air space Exhaust gas blow-by into scavenge air space
Degraded lube oil (e.g., improper lube oil viscosity or total base number)
No.: 3
Description: Cylinder liner, including cylinder lubrication passageways and cooling jacket
Item
Failure Mode
Causes
3.4
Normal use
Wear-in
Worn cylinder liner (evident) Excessive consumption of lube oil during combustion Partial loss of containment of engine vapors, combustion gases and pressure
Random
Wear-out
Increased exhaust temperature level for affected cylinder Exhaust gas blow-by into scavenge air space
Degraded lube oil (e.g., improper lube oil viscosity or total base number)
Under-cooling of scavenge air allowing condensation on cylinder liner and causing poor cylinder lube oil film
3.5
Random
Wear-out
No.: 3
Description: Cylinder liner, including cylinder lubrication passageways and cooling jacket
Item
Failure Mode
Causes
3.6
Random
Fouled scavenge air port (evident) Flows less than 43.2 kg/s of combustion air Propulsion Severity Level 3 Remote Medium
Wear-out
Insufficient air fed to the engine, resulting in inefficient combustion and excessive smoking of the engine
3.7
Wear-in
Random
External leak of the cooling water jacket (evident) Partial loss of containment of freshwater
Wear-out
Removes and discharges less than 2,850 kW of heat from the engine
Seal rings at bottom of cylinder liner leak, allowing cooling water into scavenge air space
Release of cooling water to the atmosphere/scavenge air, resulting in excessive consumption of freshwater and potentially insufficient cooling water being delivered to the engine's cylinder
3.8
Random
Wear-out
Remote
Medium
Cylinder cooling water temperature will increase, potentially alerting the operator to the failure
No.: 12
Item
Failure Mode
Causes
Failure Characteristic Controls engine speed at more than 91 rpm Complete loss of propulsion Propulsion Severity Level 4 Improbable Medium
12.1
Random
Loose/broken wire connection No transmission of power to the propulsion shafting No transmission of torque to the control system Controls engine speed at more than 91 rpm No transmission of power to the propulsion shafting No transmission of torque to the control system Complete loss of propulsion Propulsion Severity Level 4
Governor increases engine speed, resulting in overspeed protective device tripping and the engine stopping
12.2
Random
No.: 12
Item
Failure Mode
Causes
Failure Characteristic Controls engine speed at less than 91 rpm Transmits less than 16,860 kW of power to the propulsion shafting No transmission of power to the propulsion shafting Transmits less than 200 N-m of torque to the control system No transmission of torque to the control system Controls engine speed at less than 91 rpm Transmits less than 16,860 kW of power to the propulsion shafting Controls engine speed at more than 91 rpm Transmits more than 16,860 kW of power to the propulsion shafting Propulsion Severity Level 3 Improbable Low Governor slows the engine down, resulting in reduced vessel speed or the engine shutting down Function is reduced, resulting in operational delays Propulsion Severity Level 3 Improbable Low
12.3
Random
12.4
Random
No.: 12
Item
Failure Mode
Causes
Failure Characteristic Erratic engine speed, resulting in erratic vessel speed Controls engine speed at less than 91 rpm Propulsion Severity Level 3 Improbable Low
12.5
Random
Loose wire connection Transmits less than 16,860 kW of power to the propulsion shafting Controls engine speed at more than 91 rpm Transmits more than 16,860 kW of power to the propulsion shafting
No.: 13
Item
Failure Mode
Causes
Failure Characteristic Governor reduces engine speed to zero (e.g., engine stops) No transmission of power to the propulsion shafting No transmission of torque to the control system Governor reduces engine speed Controls engine speed at less than 91 rpm Propulsion Severity Level 3 Remote Medium Complete loss of propulsion Propulsion Severity Level 4 Remote High
13.1
Random
13.2
Random
Loose wire connection Transmits less than 16,860 kW of power to the propulsion shafting Reduce rpm Controls engine speed at more than 91 rpm No transmission of power to the propulsion shafting No transmission of torque to the control system Complete loss of propulsion Propulsion Severity Level 4
13.3
Random
Improbable
Medium
No.: 13
Item
Failure Mode
Causes
Failure Characteristic Erratic engine speed, resulting in erratic vessel speed Controls engine speed at less than 91 rpm Propulsion Severity Level 3 Improbable Low
13.4
Random
Governor electronic failure Transmits less than 16,860 kW of power to the propulsion shafting Controls engine speed at more than 91 rpm Transmits more than 16,860 kW of power to the propulsion shafting
No.: 14
Item
Failure Mode
Causes
14.1
Random
Wear-out
Actuator fails to respond on demand (evident) Transmits more than 16,860 kW of power to the propulsion shafting Controls engine speed at less than 91 rpm Controls engine speed at more than 91 rpm
No.: 14
Item
Failure Mode
Causes
Failure Characteristic Controls engine speed at less than 91 rpm Transmits less than 16,860 kW of power to the propulsion shafting Controls engine speed at more than 91 rpm Transmits more than 16,860 kW of power to the propulsion shafting Erratic control of engine rpm Controls engine speed at less than 91 rpm Transmits less than 16,860 kW of power to the propulsion shafting Controls engine speed at more than 91 rpm Transmits more than 16,860 kW of power to the propulsion shafting Propulsion Severity Level 3 Remote Medium Propulsion Severity Level 3 Improbable Low
14.2
Random
Wear-out
14.3
Random
Wear-out
No.: 14
Item
Failure Mode
Causes
Failure Characteristic Controls engine speed at less than 91 rpm Transmits less than 16,860 kW of power to the propulsion shafting Controls engine speed at more than 91 rpm Transmits more than 16,860 kW of power to the propulsion shafting Controls engine speed at less than 91 rpm Transmits less than 16,860 kW of power to the propulsion shafting Controls engine speed at more than 91 rpm Transmits more than 16,860 kW of power to the propulsion shafting Propulsion Severity Level 3 Remote Medium Propulsion Severity Level 3 Remote Medium
14.4
Random
Wear-out
14.5
Wear-in
Random
Wear-out
No.: 15
Item
Failure Mode
Causes
Failure Characteristic Partial loss of containment of lube oil Total loss of containment of lube oil No flow of lubricant to the camshaft None No effect of interest Severity Level 1 Remote Low
15.1
Wear-in
Random
Loss of containment
Wear-out
Release of lube oil in machinery space. If leak is large, standby pump will start.
Standby lube oil pump will start and resume function Oil spill tray around pump will collect and drain spilled lube oil
15.2
Random
Wear-out
Pump seizure
15.3
Random
Wear-out
15.4
Random
Wear-out
Suction blockage
1.5
No.: 3
Description: Cylinder liner, including cylinder lubrication passageways and cooling jacket
Item
Failure Mode
Failure Char.
H/E(1)
3.1
Wear-in
Random
Leak in the cylinder liner between cooling water jacket and cylinder Remote
Cooling water leaking into cylinder, resulting in abnormal increase in freshwater makeup rate 1.2, 13.1 Propulsion SL-3 Occasional High
Medium
In operating instructions
3.3
Random
Wear-out
Notes:
1 H/E abbreviations: E is evident; H is hidden.
2 Risk Characterization abbreviations: S is severity; SL is severity level; CL is current likelihood; CR is current risk.
3 Task Selection abbreviations: PL is projected likelihood; PR is projected risk.
4 Functional failure Item Nos. are listed in Appendix 2, Table 4.
5 Severity Levels are listed in Appendix 2, Table 5.
No.: 3
Description: Cylinder liner, including cylinder lubrication passageways and cooling jacket
Item
Failure Mode
Failure Char.
H/E
Local
3.4
Wear-in
Random
Wear-out
Excessive consumption of lube oil during combustion 1.2, 13.1 Propulsion SL-3 Occasional High Preventative Maintenance plan for lube oil service system Remote
3.5
Random
Wear-out
Medium
No.: 3
Description: Cylinder liner, including cylinder lubrication passageways and cooling jacket
Item
Failure Mode
Failure Char.
H/E
3.6
Random
Wear-out
Insufficient air fed to the engine, resulting in inefficient combustion and excessive smoking of the engine
3.7
Wear-in
9.2, 16.1
Remote
Medium
Random
Wear-out
Release of cooling water to the atmosphere/scavenge air, resulting in excessive consumption of freshwater and potentially insufficient cooling water being delivered to the engine's cylinder
3.8
Random
Wear-out
Overheating of cylinder, potentially resulting in the cylinder liner cracking and/or scoring of the liner
No.: 12
Item
Failure Mode
Failure Char.
H/E
Local
12.1
Random
Governor increases engine speed, resulting in overspeed protective device tripping and the engine stopping
12.2
Random
Medium
Governor increases engine speed, resulting in overspeed protective device tripping and the engine stopping
Medium
No.: 12
Item
Failure Mode
Failure Char.
H/E
12.3
Random
Governor slows the engine down, resulting in reduced vessel speed or the engine shutting down 1.2, 1.3, 21.2, 21.3 Function is reduced or increased, resulting in operational delays Propulsion SL-3 Improbable Low Functional check of speed setting system, engine with bridge control system-4000 hr Improbable
12.4
Random
12.5
Erratic signal
Random
No.: 13
Item
Failure Mode
Failure Char.
H/E
Local
13.1
Random
13.2
Random
1.2, 21.2
Function is reduced, resulting in operational delays Complete loss of propulsion Propulsion SL-4 Improbable Medium Functional check of overspeed device-8000 hr Functional check of speed setting system, engine with bridge control system-4000 hr
Propulsion SL-3
Remote
Medium
Functional check of speed setting system, engine with bridge control system-4000 hr
Improbable
Medium
13.3
Random
1.2, 2.1, 21.3
Improbable
Medium
Governor increases engine speed, resulting in the overspeed protective device tripping and the engine stopping
Improbable
Medium
13.4
Erratic signal
Random
Functional check of speed setting system, engine with bridge control system-4000 hr
Improbable
Low
No.: 14
Item
Failure Mode
Failure Char.
H/E
Local
14.1
Random
Wear-out
Function is reduced or increased, resulting in operational delays Functional check of speed setting system, engine with bridge control system-4000 hr Improbable Low Inspect and lubricate linkages-4000 hr Functional check of speed setting system, engine with bridge control system-4000 hr Remote Medium Inspect and lubricate linkages-4000 hr Change governor oil-4000 hr
Propulsion SL-3
14.2
Random
Wear-out
14.3
Random
Wear-out
No.: 14
Item
Failure Mode
Failure Char.
H/E
Local
14.4
Random
Wear-out
Failure to adjust fuel rack, resulting in improper engine speed 1.2, 1.3, 21.2, 21.3 Inspect and lubricate linkages-4000 hr Change governor oil-4000 hr Remote Medium Improbable Improbable Low Low
14.5
Fractured linkage
Wear-in
Random
Wear-out
No.: 15
Item
Failure Mode
Failure Char.
H/E
15.1
Wear-in
Random
Release of lube oil in machinery space. If leak is large, standby pump will start.
15.2
Random
Wear-out
15.4
Random
Wear-out
Insufficient pressure or flow of lubricant to the camshaft, resulting in a low pressure alarm and standby pump started
1.6
Category A Propulsion Diesel Engine Basic Engine Cylinder liner, including cylinder lubrication passageways and cooling jacket Risk Item No. Current High Medium Perform before engine startup 2000 hr List the Task Procedure No. or Operating Instruction No. here Inspection is to detect corrosion, erosion, cracking and plugging Develop detailed procedures for this task May be required sooner based on results of inspection of cylinder liner Use results for water treatment as necessary 8000 hr 8000 hr To restore honing pattern to cylinder walls and therefore ability to hold lube oil Projected 3.1 Frequency Procedure No. or Class Reference
Task
Task Type(2)
Comments
Turn engine at least one revolution prior to starting and check whether the indicator valves on the cylinders leak fluid 3.3, 3.4, High Medium
AAET
Visual inspection of the cylinder liner with a borescope via the scavenge port 3.5 3.6 Medium Low 4000 hr High Medium 8000 hrs
CM
ABS GUIDANCE NOTES ON RELIABILITY-CENTERED MAINTENANCE . 2004 3.7, 3.8 3.7, 3.8 3.4 Medium Low Medium Medium Medium Medium 1000 hr
PM
PM
CM
PM
PM
(1) Maintenance Category
Category A: Can be undertaken at sea by the vessel's personnel
Category B: Must be undertaken alongside by equipment vendors or with use of dockside facilities
Category C: Must be undertaken in a dry dock facility
(2) Task Type
CM: Condition monitoring
PM: Planned maintenance
FF: Failure finding
AAET: Any applicable and effective task
OTC: One-time change
Category B Propulsion Diesel Engine Basic Engine Cylinder liner, including cylinder lubrication passageways and cooling jacket Risk Item No. Unmitigated High Medium Perform before installation Perform at time of manufacture ABS Rules 4-2-1/Table 1 ABS Rules 4-2-1/Table 2 Mitigated 3.1 Frequency Procedure No. or Class Reference
Task
Task Type(2)
Hydrostatic pressure test the cylinder liner before each installation 3.1 High Medium
FF
OTC
Maintenance Category(1): Functional Group: System: Equipment Item: Component: Risk Item No. Unmitigated Low Low Mitigated 15.2, 15.4 Frequency 1000 hrs
Category A Propulsion Diesel Engine Engine Support Lube Oil Camshaft Lube Oil Pump (standby) Procedure No. or Class Reference
Task
Task Type(2)
Comments The duty pump's operating context is to run until a failure occurs, after which the standby pump is started.
FF
Maintenance Category(1): Functional Group: System: Equipment Item: Component: Risk Item No. Unmitigated Medium Medium 8000 hr List the Task Procedure No. or Operating Instruction No. here Mitigated 12.1, 12.2, 13.3 12.1, 12.2, 12.3, 12.4, 13.1, 13.2, 13.3, 13.4, 14.1, 14.2 14.1, 14.2, 14.3, 14.4, 14.5 14.1, 14.3, 14.4, 14.5 Low Low 4000 hr Medium Low 4000 hr High Medium 4000 hr Frequency Procedure No. or Class Reference
Task
Task Type(2)
Comments
FF
Functional check of speed setting system, engine with bridge control system
FF
Lube oil analysis was considered
PM
PM
1.7
Maintenance Category(1): Functional Group: System: Equipment Item: Component: Risk due to stock-out Procedure No. or Class Reference List the Task Procedure No. or Operating Instruction No. here
Category A Propulsion Diesel Engine Basic Engine Cylinder liner, including cylinder lubrication passageways and cooling jacket
Item No.
Spare Parts Identification List Spare parts identification data here Parts and equipment necessary for borescope inspection Parts and equipment necessary for tasks
Visual inspection of the cylinder liner with a borescope via the scavenge port 3.5 3.6 3.7, 3.8 3.7, 3.8 3.4 Yes Medium Yes Medium Yes Medium Yes Medium Yes Medium
CM
PM
Parts and equipment necessary for cleaning Parts and equipment necessary for analysis Parts and equipment necessary for cleaning Parts and equipment necessary for tasks
PM
CM
PM
PM
Maintenance Category(1): Functional Group: System: Equipment Item: Component: Risk due to stock-out Procedure No. or Class Reference
Item No.
Spare Parts Identification Parts and equipment necessary for tasks Parts and equipment necessary for tasks
FF
Functional check of speed setting system, engine with bridge control system
FF
12.1, 12.2, 12.3, 12.4, 13.1, 13.2, 13.3, 13.4, 14.1, 14.2 Yes Low
PM
PM
2.1
2.2
To determine the frequency reduction for Severity Level 4 for Propulsion in Appendix 2, Table 15, refer to the Severity Level 4 row in Appendix 2, Table 13. To calculate the Current Events/yr upper bound for Severity Level 4 in Appendix 2, Table 15, note that there is one Current Risk in the Remote column and three Current Risks in the Improbable column of Appendix 2, Table 13. The frequency range for Remote is 0.001 to 0.01 events/yr, and for Improbable it is <0.001 events/yr, so 0.001 is used as the Improbable upper bound and 0 as its lower bound. The Current Events/yr upper bound is 1 × 0.01 + 3 × 0.001 = 0.013, and the Current Events/yr lower bound is 1 × 0.001 + 3 × 0.000 = 0.001. The Projected Events/yr is calculated in the same way. The frequency reduction is determined by subtracting the Projected Events/yr from the Current Events/yr, separately for the upper bound and for the lower bound. For Severity Level 4, the proposed maintenance tasks are projected to reduce the frequency of a Severity Level 4 event by 0.001 to 0.009 events/yr. If an economic value is assigned to the Severity Level, an annual economic risk reduction can be estimated by multiplying that value by the frequency reduction.
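The bound arithmetic above can be sketched in Python. The frequency ranges and the current tally (one Remote, three Improbable) come from the text. The lower bound for Improbable is taken as 0. Appendix 2, Table 13 is not reproduced here, so the projected tally of four Improbable risks is an assumption chosen for illustration. It is consistent with the 0.001 to 0.009 events/yr reduction quoted above. The function names and the `value_per_event` input are hypothetical.

```python
# Frequency ranges (events/yr) for each likelihood column, as given in the text.
# Improbable is "<0.001 events/yr", so 0 is used as its lower bound.
FREQUENCY_RANGE = {
    "Remote": (0.001, 0.01),
    "Improbable": (0.0, 0.001),
}


def events_per_year(tally):
    """Return (lower, upper) events/yr for a {likelihood column: number of risks} tally."""
    lower = sum(n * FREQUENCY_RANGE[col][0] for col, n in tally.items())
    upper = sum(n * FREQUENCY_RANGE[col][1] for col, n in tally.items())
    return lower, upper


def frequency_reduction(current, projected):
    """Subtract projected from current events/yr, bound by bound."""
    cur_lo, cur_hi = events_per_year(current)
    proj_lo, proj_hi = events_per_year(projected)
    return cur_lo - proj_lo, cur_hi - proj_hi


def annual_risk_reduction(reduction, value_per_event):
    """Scale a (lower, upper) frequency reduction by an assigned value per event."""
    return tuple(r * value_per_event for r in reduction)


current_sl4 = {"Remote": 1, "Improbable": 3}  # from the text
projected_sl4 = {"Improbable": 4}             # assumption for illustration

cur_lo, cur_hi = events_per_year(current_sl4)
print(f"Current: {cur_lo:.3f} to {cur_hi:.3f} events/yr")  # 0.001 to 0.013
lo, hi = frequency_reduction(current_sl4, projected_sl4)
print(f"Reduction: {lo:.3f} to {hi:.3f} events/yr")       # 0.001 to 0.009
```

Each bound is computed separately, so the result stays a range. This matches the way the text reports the reduction as "0.001 to 0.009 events/yr".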
Projected Events/yr
Projected Events/yr