How Reliability and Maintenance Improves Safety

The expected outcome is that everyone should arrive home every day without any injuries or fear of the work environment. In my opinion, that isn’t too much to ask. How do I, as a maintenance guy, stay safe when working in a stressful industrial environment?

My own injuries came from playing hockey or other sports, mostly knee and back issues. How was I able to avoid injuries at work? Was it because I work with safety in mind, or was it just luck? Thinking back on my own experience and on what I consider high-risk behavior or situations, here are a few that I experienced or heard about.

Utility pole fall: I was around 11 years old when my grandfather showed me a picture of a utility pole. This was from the time when they were setting up electrical power in the countryside—utility poles were raised to connect farms and rural locations. Both my grandfather and great-grandfather worked on wiring the poles.

One Saturday evening, they got ready to go home and enjoy their rest the next day. The last utility pole to be wired was on a large farm nearby. Because they were close, they decided to put up temporary wires before they left so the farmer could enjoy his new lighting for the first time over the weekend.

My great-grandfather suffered a fatal fall while working with the live wires that evening. His son, my grandfather, was 13 years old at the time, the oldest in a family of eight kids. This story has never left my mind, and I still have the picture of the utility pole in my home office.

Lifeboat drill: When I was 19 and working as the 3rd engineer on a car carrier, we had to wait one day in the South China Sea to receive a load of cars from Japan, destined for Europe. We used this time to do a lifeboat drill, as required by regulations. These were the old-style lifeboats that were pulled to the side of the ship with davits and then winched down to the water (think of the movie Titanic).

Two of us jumped into a lifeboat and hung onto a rope attached to a wire between the davits. That rope was meant to be a safety line in case something happened. As the winch slowly lowered the lifeboat into the water, the aft winch wire snapped.

Suddenly, the lifeboat was hanging vertically by one end, with the two of us each holding onto a safety rope with our bare hands, around 50 ft above the ocean. Fortunately, the 2nd officer pulled the ropes in toward the ship so we could climb on board. Once back in safety, we were able to recover the lifeboat and replace the wires.

Dropped bearing in reactor tank: Years later, I was working as a maintenance manager in a chemical plant. The time had come for our annual shutdown. We had historically inspected and replaced bottom steady bearings on all our reactor tanks ourselves, but this year we hired a contractor to do the work on the agitator bearings.

The process was to follow the tank entry procedure, with one person going down into the tank and another lowering the tools and parts down with a rope. The tank watch contractor started to lower the bottom steady bearing but had not tied it on correctly, even after being trained on the procedure. As a result, the bearing fell hard onto the contractor mechanic’s head. The mechanic spent a few days in the hospital but, luckily, made a full recovery.

The procedure was then changed so that tools and parts are lowered before anyone enters the tank, and a hook is used to secure the rope.

Silo explosion: I eventually became operations manager for the solid materials unit in the same plant. During that time, truck drivers loaded our silos on a weekly basis. One day, a truck driver was unloading a binder used in our solid products, and I was in the control room on the second floor looking out the window right next to the top of the silos outside. As I stood watching, the silo exploded, temporarily blinding me.

Regaining my composure, I saw that the top of the silo had blown off, as it was designed to do. Inside the control room, I could see that the wall had flexed a foot, as indicated by the compacted hanging ceiling. The silo design and maintenance were in good order, so we started to inspect the truck.

The truck blower had failed, causing the bearing to create sparks that got into the silo even with all safety systems in place and working correctly. Nobody was injured, but after that, no truck driver could use their own equipment to unload (because we did not maintain the truck company equipment). From that point on, the trucks had to use our blowers to unload.


Each of these incidents contained an element of high-risk environment and/or high-risk behavior. So how does this point back to reliability and maintenance? Maintenance is about identifying risks of failure and then preventing the failure, or identifying a failure early and correcting it before it impairs the function or creates an unsafe condition.

It is about doing the right thing over and over; that will provide sustainable results. This can also be compared to safety culture. In safety you identify risk; develop procedures; train people to follow the procedures; coach individual engagement; instill personal responsibility; and provide oversight to follow the procedures, evaluate the process, and continuously improve.

We can relate the incidents to these categories regarding safety:

  • Not doing a risk analysis.
  • High-risk behavior (includes not following procedure).
  • Equipment: Lack of preventive maintenance (PM), which prevents failures to extend equipment life and decrease the risk of unsafe conditions.
  • Equipment: Lack of condition monitoring, which finds failures early so they can be corrected in a timely manner.
  • Inadequate procedures, increasing the safety risk.
  • Not providing safety training.

High-risk behavior related to managing reliability and maintenance (R&M):

  • Not using best-practice design for equipment.
  • Not using best-practice installation methods (project management).
  • Not doing equipment criticality analysis.
  • Not developing and documenting preventive and condition monitoring procedures (can include on-line monitoring/machine learning tools).
  • Inadequate work management process for executing corrective maintenance.
  • Inadequate process for skills management.

I believe that identifying risky situations and/or behavior is how I avoided injury, even though I have been part of many high-risk situations. I learned to evaluate every situation in terms of managing or minimizing risk. I also feel that an excellent PM program is a major factor in improving and maintaining safety in any plant or mill. So, how can you build a PM program from scratch or improve your existing one?


PHASE I: Set-up
Step 1: Commitment
  1. Understand: The first step is to understand the current state of the PM process. Assess the quality, execution, and management of all current PMs.
  2. Scope out: Next, determine the scope of the PM project.
  3. Develop a rough plan: Scope out the resources, cost, goals, expected results, a business case, and pilot area; then select a team.
  4. Get site commitment: Present the business case and get a commitment for the resources from managers and key decision makers. The management team needs to understand what is expected, the cost to support the project, and the results.
  5. Middle management onboarding: When you receive the project go-ahead, the extended management team (including supervisor level) needs to be informed and have input on the implementation plan. It is important to discuss the short- and long-term benefits of implementing best practices of the PM process.
  6. Establish priorities: Select pilot area(s) to start the implementation. The purpose is to spend some time learning how to develop the PM system and make improvements before starting in other areas.
  7. Identify resources: Depending on project scope, the next step is to select the in-house and outside resources necessary to complete the project on time. In-house resources should consist of a senior management sponsor, a project manager, and team members serving as Subject Matter Experts (SMEs). The SMEs consist of mechanical, electrical, and instrumentation tradespeople, maintenance supervisors, process engineers, and reliability engineers.
  8. Develop communication plan: Make a communication plan to describe the future state and business benefits of an improved PM system that will increase reliability. Communicate goals, approach, resources, and costs (Basic Change Management).
Step 2: Roadmap
  1. Training in Current Best Practices (CBP): Provide CBP training on what good PM looks like for the organization, how it will be implemented, and which methodologies are necessary for the team to put together a comprehensive action plan.
  2. Develop a detailed implementation plan of how to document preventive maintenance.
  3. Design workflows: There must be a clear PM documentation method. The workflows should include the thought process used to identify tasks, PM interval, when to operate to breakdown instead of doing PM, which tools are needed, who should do the PM, etc. Basic workflows include tasks, outcomes, important decisions, and a RACI chart.
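
One way to make the RACI chart in a workflow checkable is to store it as data. The sketch below assumes a common RACI convention (exactly one Accountable role and at least one Responsible role per task). The task names, role names, and the function `raci_problems` are hypothetical and only illustrate the idea; they are not a prescribed workflow.

```python
def raci_problems(chart):
    """Return {task: [issues]} for tasks that break the assumed RACI rules."""
    problems = {}
    for task, roles in chart.items():
        codes = list(roles.values())
        issues = []
        # Assumed convention: one and only one Accountable role per task.
        if codes.count("A") != 1:
            issues.append("needs exactly one Accountable")
        # Assumed convention: somebody must actually do the work.
        if "R" not in codes:
            issues.append("needs at least one Responsible")
        if issues:
            problems[task] = issues
    return problems


# Hypothetical workflow tasks and roles, for illustration only.
example_chart = {
    "Identify PM tasks": {
        "reliability engineer": "A",
        "tradesperson": "R",
        "supervisor": "C",
    },
    "Set PM interval": {
        "reliability engineer": "R",
        "supervisor": "C",
    },
}

print(raci_problems(example_chart))
```

Running a check like this whenever a workflow changes makes it harder for a task to end up with no clear owner.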
Step 3: Document

Start documenting with the PM pilot area selected in Step 1 above; then execute and repair equipment findings in the first batch of improved PMs. The average operation will have around 3,000 to 10,000 possible failure modes that will impact operations, product quality, customer service, safety, and the environment. The point of the PM strategy is to reduce risks that impact the business. Here are some typical steps for documenting the PM strategy using Condition Monitoring Standards (CMS):

  • Select equipment, describe the operating context, and determine criticality.
  • Review the equipment documentation; divide into components/subcomponents.
  • Select essential care, fixed time maintenance, condition monitoring tasks, or operate-to-breakdown. Find the CMS for the component and select the task. For specialized equipment, develop a new CMS. When cost-effective, choose condition-based tools and machine learning.
  • Determine under what condition the essential care and condition monitoring (ECCM) task can be completed (on the run or stopped), and select the frequency of inspection or measurement.
  • Decide who is doing the task.
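
The selection steps above can be sketched as a small decision rule. This is a minimal illustration and not a standard method: the 1 to 5 scoring scales, the `low_threshold` cutoff, the field names, and the order of the checks are all assumptions made for the example, and essential care tasks (such as lubrication and cleaning) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    consequence: int   # assumed scale: 1 (minor) to 5 (severe safety/business impact)
    likelihood: int    # assumed scale: 1 (rare) to 5 (frequent)
    detectable: bool   # failure gives a measurable warning before it occurs
    age_related: bool  # failure probability rises with run time

    @property
    def criticality(self) -> int:
        # Simple assumed risk score: consequence times likelihood.
        return self.consequence * self.likelihood


def select_strategy(c: Component, low_threshold: int = 4) -> str:
    """Pick a maintenance approach using hypothetical rules."""
    if c.criticality <= low_threshold:
        return "operate-to-breakdown"
    if c.detectable:
        return "condition monitoring"
    if c.age_related:
        return "fixed time maintenance"
    # No rule fits: leave the decision to the subject matter experts.
    return "review with SMEs"


# Illustrative values only, not data from a real plant.
bearing = Component("bottom steady bearing", consequence=5, likelihood=3,
                    detectable=True, age_related=False)
print(bearing.criticality, select_strategy(bearing))
```

In practice the scoring and thresholds would come from the site's own criticality analysis, and the result is a starting point for the team's review rather than a final answer.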
Step 4: Execute

After the master data is completed, assign team members and craftspeople to review the documented PM tasks. Walk down the inspection routes and PM work orders in the pilot area.

Step 5: Repair

After the pilot area’s routes and PM instructions have been checked, initiate work requests in the computerized maintenance management system (CMMS). Communicate all failures found during initial inspections, with additional communication when they have been corrected and repaired.

Step 6: Evaluate and Adjust

It is common that PMs are skipped due to poor work management, inadequate planning and scheduling processes, unsolved root causes, and/or lost spare parts.
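
To see how often PMs are skipped and why, a simple tally of PM work orders can help. The sketch below assumes each work order is exported as a record with a `status` and, when skipped, a `skip_reason`; these field names, the function `pm_compliance`, and the sample records are hypothetical and do not come from any particular CMMS.

```python
from collections import Counter


def pm_compliance(work_orders):
    """Return (completion rate, Counter of skip reasons) for scheduled PMs."""
    scheduled = len(work_orders)
    completed = sum(1 for wo in work_orders if wo["status"] == "completed")
    rate = completed / scheduled if scheduled else 0.0
    reasons = Counter(
        wo["skip_reason"] for wo in work_orders if wo["status"] == "skipped"
    )
    return rate, reasons


# Made-up records for illustration only.
sample = [
    {"status": "completed"},
    {"status": "completed"},
    {"status": "skipped", "skip_reason": "poor scheduling"},
    {"status": "skipped", "skip_reason": "missing spare part"},
]

rate, reasons = pm_compliance(sample)
print(f"PM completion rate: {rate:.0%}")
print(reasons.most_common())
```

Tracking the most frequent skip reasons over time shows which supporting process, such as planning and scheduling or spare parts management, needs adjusting first.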

PHASE III: Deployment
Step 7: Implementation

Once the pilot is complete, select the next area. Some problems may take a while to repair because they need engineering, a shutdown, or a long lead time on a spare part.

PHASE IV: Continuous Improvement
Step 8: Systematize

At this stage, make sure the preventive maintenance process is established and that execution is rooted in the daily work process. Make sure to maintain the PM database and update it as needed. To fully systematize the process, you’ll need to design workflows, establish an auditing process, coach management and craftspeople in their roles, and commit to continuous improvement.

Improving your PM process WILL have a positive impact on safety and plant/mill performance.

Owe Forsberg is vice president at IDCON Inc., a consulting firm offering common sense consulting and training for industry reliability and maintenance. He has more than 30 years of experience in the industry coaching clients to increase output and decrease cost of manufacturing and operation.