Reliability and Safety


It would appear obvious that reliable operation of our production equipment and the safety of our plant personnel go hand in hand. When the plant is operating efficiently, we assume that the risk of hazard exposure should be reduced.

This two-part article addresses these assumptions and the important role reliability can play in safety. In Part 1 (which appeared in the May/June issue), I discussed the link between the safety risks of our personnel and risks associated with equipment reliability. In Part 2, I will explore how we can leverage reliability improvements to reduce overall risk, and perhaps provide some new considerations for how to apply risk reduction techniques from the world of personnel safety to enhance equipment reliability and vice versa.


Not all injuries are created equal. Some incidents result in injuries that can be addressed with first aid, while others are potentially much more severe, even possibly fatal. Logically, we want to focus on reducing the severity and the probability of occurrence of the most serious injuries.

Initially, the understanding in the scientific community was that injury risk could be represented by a triangle, with the most severe injuries and fatalities at the peak and the least severe, such as first-aid cases, at the base. The popular notion was that if you reduced the frequency of the exposures that led to first-aid injuries and near misses (the base of the triangle), you would see a corresponding reduction in the frequency of more severe injuries farther up the triangle.

Over about 50 years, industry as a whole made significant progress in reducing overall injury exposure, yet the frequency of fatalities was actually increasing. Further research revealed that a very specific set of exposures leads to severe injuries and fatalities, and that only by focusing specifically on those exposures can you make progress on reducing serious injury or fatality (SIF) incidents (see Fig. 4).

Fig. 4

The actual list of SIF exposures may be slightly different for each type of process facility, but some of the most commonly recurring SIF exposures are:

  • Working in confined spaces
  • Working at heights (greater than 4 ft)
  • Pedestrian-vehicle interaction
  • Exposure to potential energy — electrical, mechanical, thermal, kinetic, etc.
  • Slips, trips, and falls — particularly backward, or onto an impaling object
  • Dropped/falling objects
  • Machine guards — either missing or defeated
  • Lifting/moving heavy loads

Another variation of the Failure Developing Period curve is called the Design — Installation — Potential Failure — Failure (DIPF) curve, which correlates a component’s resistance to failure and its operating life (see Fig. 5). The figure lists some of the activities or occurrences usually associated with each phase of the component’s operating evolution from initial design to functional failure.

Fig. 5

The types of exposures that can potentially lead to serious injury or fatality are always present throughout the DIPF work cycle, though the likelihoods will be significantly different depending upon whether the work is conducted in the proactive range or the reactive range. For example:

  1. When proactive measures are being taken, particularly those listed in the “Precision” or “Predictive” sections, the equipment is typically in operation, so the more concerning exposures tend to be related to chemical or thermal exposure and rotating or energized equipment.
  2. When reactive work is being executed, possibly after complete breakdown, the equipment is most likely shut down, so the exposure profile shifts from potentially energized equipment to entering confined spaces, falls from heights, or dropped objects.

Note: I am not implying that there is no risk of exposure to energized equipment when breakdown repair occurs, nor am I implying that you will only be observing operating equipment from the ground level. The point is to remain aware of the likelihood of a particular exposure based upon the type of maintenance work being conducted.
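One way to picture this shift in exposure profile is as a simple lookup that a planner might use to prompt a pre-job review. The following is a minimal sketch in Python, assuming only the two work modes and the example exposures named above. The names `LIKELY_EXPOSURES` and `exposures_to_review` are hypothetical, and the lists are prompts for discussion, not a complete hazard assessment.

```python
# Hypothetical lookup: work mode -> exposures the article names as more likely.
# Illustrative prompts only, not a complete hazard assessment.
LIKELY_EXPOSURES = {
    "proactive": [  # equipment is typically in operation
        "chemical exposure",
        "thermal exposure",
        "rotating equipment",
        "energized equipment",
    ],
    "reactive": [  # equipment is most likely shut down
        "confined space entry",
        "falls from heights",
        "dropped objects",
    ],
}


def exposures_to_review(work_mode: str) -> list[str]:
    """Return exposures to review, most likely ones first.

    Exposures from the other work mode are appended rather than dropped,
    because no exposure is ever fully absent; only its likelihood changes.
    """
    if work_mode not in LIKELY_EXPOSURES:
        raise ValueError(f"unknown work mode: {work_mode!r}")
    primary = LIKELY_EXPOSURES[work_mode]
    secondary = [
        exposure
        for mode, items in LIKELY_EXPOSURES.items()
        if mode != work_mode
        for exposure in items
    ]
    return primary + secondary


for item in exposures_to_review("reactive"):
    print(item)
```

Keeping the less likely exposures at the end of the list, instead of filtering them out, reflects the caution in the note above: the ordering changes with the type of work, but nothing drops off the list.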


Another obvious overlap between safety and reliability revolves around the desire to prevent incidents from occurring through effective planning and scheduling. Detailed job plans and safety instructions are designed to reduce risk. Unfortunately, there is so much risk present in our work environments that predicting exactly where the next incident will occur is almost impossible. But we can analyze our risk exposure profile and focus on the most likely sources of risk. Where are we most prone to have failure with SIF consequences?
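As a rough illustration of analyzing a risk exposure profile, the sketch below ranks planned tasks in Python. It assumes a simple likelihood-times-severity score on hypothetical 1 to 5 scales and a flag for SIF potential; the `Task` class, the `rank_tasks` function, the scales, and all of the example data are made up for illustration, and real risk matrices differ from site to site.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    likelihood: int      # hypothetical scale: 1 (rare) to 5 (frequent)
    severity: int        # hypothetical scale: 1 (first aid) to 5 (fatality)
    sif_potential: bool  # True if the task involves a known SIF exposure

    @property
    def risk_score(self) -> int:
        # Simple likelihood x severity product (an assumed scoring scheme).
        return self.likelihood * self.severity


def rank_tasks(tasks: list[Task]) -> list[Task]:
    """Order tasks so SIF-potential work comes first, then by risk score."""
    return sorted(tasks, key=lambda t: (not t.sif_potential, -t.risk_score))


# Made-up example data, for illustration only.
tasks = [
    Task("Replace pump seal", likelihood=3, severity=2, sif_potential=False),
    Task("First break on pressurized line", likelihood=2, severity=5, sif_potential=True),
    Task("Tank entry for inspection", likelihood=1, severity=5, sif_potential=True),
    Task("Route-based lubrication", likelihood=4, severity=1, sif_potential=False),
]

for task in rank_tasks(tasks):
    print(f"{task.risk_score:>2}  SIF={task.sif_potential}  {task.name}")
```

Sorting on SIF potential before the numeric score is a deliberate choice in this sketch: a frequent but minor task can outscore a rare but potentially fatal one, and ranking by score alone would push the SIF exposure down the list.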

For activities that are most critical to the safety of our personnel or the effective operation of our equipment, there are always specific steps that, if not done properly, result in extremely negative consequences. These are indicated by the red circle in Fig. 6.

Fig. 6

Think about the first break on a pressurized process line or the first contact on a job requiring high voltage energy isolation. Now think about jumping out of an airplane. Feels pretty similar, right? Unless extreme care is taken during these steps, SIF injury is a likely result.

Similar critical steps during a job plan can be identified on the reliability front. Figure 7 illustrates a definition of preventive maintenance. The preventive maintenance tasks that IDCON has deemed to be part of Essential Care of equipment — such as precision alignment of the shafts being coupled, balancing, and proper lubrication — are critically important to the reliability of our operating equipment.

Fig. 7

Objective methods of condition monitoring provide the opportunity to extend the life of your equipment, reduce the risk of a breakdown incident, and as a result, reduce maintenance cost. Focusing on precision when conducting these critical steps provides significant benefits to the life of your equipment and your bottom line.

For instance, the study highlighted in Fig. 8 shows the positive impact a vibration monitoring program can have on the life of your rotating equipment and your profitability. Appropriate focus on critical steps in the essential care process, such as precision alignment and balancing, reduces the likelihood of vibration and provides tremendous potential payback.

Fig. 8


OSHA long ago recognized the overlap of safety and reliability. After several very serious incidents in the 1980s involving exposures to highly hazardous chemicals in process industries that resulted in catastrophic loss of human life, OSHA issued 29 CFR 1910.119, Process Safety Management (PSM) of Highly Hazardous Chemicals, in 1992. The purpose of this regulation was to provide requirements for preventing or minimizing the consequences of catastrophic releases of toxic, reactive, flammable, or explosive chemicals that could result in toxic, fire, or explosion hazards.

The regulation described 14 elements of focus for industries handling these types of chemicals (see Fig. 9). The last two PSM elements in this list, incident investigations and mechanical integrity, are important to remember as essential elements of our focus on reliability.

Incidents, whether they result in equipment damage/breakdown or personnel injury, are very undesirable; as such, we do everything possible to prevent them from occurring. However, once they have occurred — regardless of severity of injury or equipment damage — incidents are extremely valuable as learning opportunities and should never be wasted. The most effective organizations can recognize the exposures present before incidents occur and conduct a pre-accident investigation to address gaps in their protections without the cost of equipment damage or personnel injury.

In his book “Pre-Accident Investigations,” Todd Conklin outlines a six-part process for predicting safety exposure:

  1. Look for high consequence activities.
  2. Look for small signals that can indicate system weaknesses or problems within the normal work process.
  3. Look for error provoking systems, steps, and processes.
  4. Look for error-likely conditions.
  5. Listen to your workers.
  6. Ask yourself what keeps you up at night.

On the reliability side, we use condition monitoring to identify these “signals” before the incident occurs. Once the incident occurs, a thorough investigation should be conducted that identifies not just the circumstances of the incident, but also the context within which the incident occurred. An appropriate Root Cause Failure Analysis should be conducted to address not just the obvious symptoms of the problem, but all underlying causal factors and the true root cause(s) of the incident.

Corrective (to address the immediate consequences of the incident) and preventive (to reduce or eliminate the likelihood of recurrence) actions should be identified, planned, scheduled, and executed. Best Practice: An often-neglected step in the process is to circle back to analyze the effectiveness of the corrective and preventive actions taken. Have they been fully executed as planned and have they achieved the desired result of reducing or eliminating the risk factors identified and targeted by the investigation? That’s why we at IDCON refer to this process as Root Cause Problem Elimination.
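The circle-back step can be sketched as a small tracking structure. This is a minimal Python illustration, assuming each action records whether it was executed as planned and whether a later review judged it effective; the `Action` class, the `needs_follow_up` function, and the example actions are hypothetical and are not part of any IDCON tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    description: str
    kind: str                         # "corrective" or "preventive"
    executed: bool = False            # completed as planned?
    effective: Optional[bool] = None  # None until effectiveness is reviewed


def needs_follow_up(actions: list[Action]) -> list[Action]:
    """Return actions that are unfinished, unreviewed, or judged ineffective."""
    return [a for a in actions if not a.executed or a.effective is not True]


# Made-up example actions from a hypothetical investigation.
actions = [
    Action("Realign pump and motor shafts", "corrective", executed=True, effective=True),
    Action("Add alignment check to job plan", "preventive", executed=True),
    Action("Train crew on precision alignment", "preventive"),
]

for action in needs_follow_up(actions):
    print(f"[{action.kind}] {action.description}")
```

Treating an unreviewed action the same as an ineffective one keeps it on the open list until someone has confirmed the result, which is the step the text describes as often neglected.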

Fig. 9

The other element of PSM that significantly overlaps with reliability is item number 14 on Fig. 9: Mechanical Integrity (MI). This part of the OSHA regulation describes the types of equipment covered:

  • Pressure vessels and storage tanks
  • Piping systems
  • Relief and vent systems and devices
  • Emergency shutdown systems
  • Controls (sensors, alarms, interlocks)
  • Pumps and associated equipment

The MI section of PSM also covers specific requirements for written maintenance procedures, training, inspection, and testing; repairs to correct equipment deficiencies; quality assurance processes for new equipment construction/installation; inspections; and materials and spare parts. Underlying all requirements of MI is a concept called RAGAGEP, which stands for “Recognized and Generally Accepted Good Engineering Practices.” In simpler terms, the company must be able to demonstrate why they have decided that what they’re doing is the right thing to do. The basis may be an established code or standard, published technical report, recommended practice, or a similar document. Primary sources would include:

  • Published and widely adopted codes such as those published by the National Fire Protection Association (NFPA).
  • Published consensus documents such as those from the American Society of Mechanical Engineers (ASME) or the American National Standards Institute (ANSI).
  • Published non-consensus documents such as those from the Chlorine Institute or the Center for Chemical Process Safety (CCPS).
  • Certain internal documentation.

Companies can develop their own standards as long as they don’t contradict other sources of RAGAGEP and provide additional layers of protection.


In conclusion, personnel safety and production process reliability are two facets of the same risk reduction process. Good planning and scheduling processes reduce risk and increase the efficiency of corrective and preventive actions, which impact both personnel and equipment. Understand that people inherently create variability in processes. Building systems and processes that acknowledge that human error is inevitable reduces the severity of events when they do occur. Identify your SIF exposures and when they are most prevalent in your work cycle and address them accordingly. Identify and pay particular attention to critical steps in your processes.

Try to address exposures before incidents occur. When an incident does occur, whether it results in injury or equipment failure, take advantage of the opportunity to learn from it through effective incident investigation and Root Cause Analysis techniques. Then make sure to address all exposures identified as a result. Sound mechanical integrity processes, while focused primarily on increasing equipment/production reliability, provide significant benefit for personnel safety as well.

The most successful enterprises approach safety and reliability with the same level of discipline. The result: these companies leverage the risk reduction efforts in both areas to multiply the benefits to employee morale and their profitability.


Serious Injury and Fatality (SIF) prevention: BST Dekra

Tor Idhammar is president, IDCON, Inc. and section editor, Reliability & Maintenance, for Paper360° magazine.