Invaluable free resource documents regarding best practices in process safety management (PSM), process safety engineering (PSE), process hazard analysis (PHA), hazard and operability study (HAZOP), layer of protection analysis (LOPA), incident investigation (II), root cause analysis (RCA), safety instrumented system (SIS), safety integrity level (SIL), and safety instrumented function (SIF). With compliments from the process safety management experts at Process Improvement Institute!

Hello to registrants for the 2021 AIChE Spring Meeting and 17th Global Congress on Process Safety! To make it easier for you to find the presentation documents and videos that we’re showing at this year’s conference, all documents applicable to this week’s presentations are broken out separately below from the rest of our Free Resources documents. (And the software glitch that was affecting the rest of our legacy documents has been fixed, so all of those documents are now available again for your use as well!)

(Simply select the category line of interest to see the document titles relative to that discipline along with a short description, and then select a document title to view the complete document.)

Documents/Videos Presented at the 2021 AIChE Virtual Spring Meeting and 17th Global Congress on Process Safety

Process Hazard Analysis (PHA), including HAZOP

Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (NEW – 2021)

Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (NEW – 2021; PDF; 834 KB) Hazard evaluations, also called process hazard analyses (PHAs), have been performed formally in gradually improving fashion for more than five decades. Methods such as HAZOP and What-If analysis have been developed and honed during this time. Some weaknesses identified 30 years ago still exist in the majority of PHAs performed around the world. Critically, most PHAs do not thoroughly analyze the errors that can occur during startup, shutdown, and other non-routine (non-normal) modes of operation; sadly, the commonly used approaches for PHA of continuous mode of operation find only about 5 – 10% of the accident scenarios that may occur during startup, shutdown, and online maintenance. This is true even though about 80% of major accidents occur during non-routine operations. Instead of focusing on the most hazardous modes of operation, most PHAs focus on normal operations (e.g., HAZOP of equipment nodes). In a majority (perhaps more than 80%) of both older operations and new plants/projects, the non-routine modes of operation are not analyzed at all. This means that perhaps 70% of the accident scenarios during non-routine operations are being missed by those PHAs. If the hazard evaluation does not find the scenarios that can likely occur during these non-routine operations, the organization will not know what safeguards are needed against these scenarios.

One reason that many companies do not perform PHA of Procedures is that they believe the time required for such analysis will be excessive. This paper shows clearly the best practices for (1) screening and ranking which procedures/tasks are most critical for analysis, (2) deciding on the method to optimize the investment of time, and (3) streamlining the documentation of the results of the PHA of procedures. Following the steps outlined in this paper reduces the time needed for PHA of procedures by about 40% without noticeably affecting the number of accident scenarios found during PHA of procedures. The result should be more sites completing PHA of procedures.

POSTER – Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (NEW – 2021; PDF; 870 KB) Poster presentation of the above paper.

VIDEO – Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (NEW – 2021; YouTube/MP4; 142 MB) Video presentation of the above paper.

Layer of Protection Analysis (LOPA)

More Issues With LOPA – From the Originators (NEW – 2021)

More Issues with LOPA – From the Originators (NEW – 2021; PDF; 855 KB) Layer of protection analysis (LOPA) has now been around for more than 25 years (and in general use for 20 years), with the initial textbook being officially published in 2001. More recently, two companion books have been published on the topics of Enabling Events & Conditional Modifiers and on Initiating Events and Independent Protection Layers (IPLs). Many papers have been published in the past 20 years on LOPA.

This paper shares observations and lessons learned from two originators of LOPA and provides further guidance on how to, and how not to, use LOPA. The paper provides specific examples of best practices, some of which are not covered well enough in, or are omitted from, the textbooks on the topic. This paper is an update of two earlier papers (2010, 2015) by the originators of LOPA.

More Issues with LOPA – From the Originators (NEW – 2021; Vimeo/MP4; 1.38 GB) Layer of protection analysis (LOPA) has now been around for more than 25 years (and in general use for 20 years), with the initial textbook being officially published in 2001. More recently, two companion books have been published on the topics of Enabling Events & Conditional Modifiers and on Initiating Events and Independent Protection Layers (IPLs). Many papers have been published in the past 20 years on LOPA.

The paper referenced in this video shares observations and lessons learned from two originators of LOPA and provides further guidance on how to, and how not to, use LOPA. The referenced paper provides specific examples of best practices, some of which are not covered well enough in, or are omitted from, the textbooks on the topic. The referenced paper is an update of two earlier papers (2010, 2015) by the originators of LOPA.

Safety Instrumented System (SIS), including Safety Instrumented Function (SIF) and Safety Integrity Level (SIL)

Is Your SIF Trying to Do Too Much? (NEW – 2021)

Is Your SIF Trying to Do Too Much? (NEW – 2021; PDF; 332 KB) The number of inputs and outputs to a safety instrumented function (as well as how they vote) affects the probability of failure on demand (PFD) and the SIL (safety integrity level), assuming the test interval remains the same. The larger the number of inputs and outputs, the higher the PFD and potentially the lower the SIL.

What if the SIF includes all the sensors and final elements that are involved in any trip of a large processing unit like a heater, a reformer, or a distillation column? What if the SIF includes the actions to shut down upstream units that feed the unit being tripped? What if the SIF includes the actions to shut down downstream units that are fed by the unit being tripped? When SIL verification is done for an SIF as described here, the result may be that the target PFD and SIL cannot be achieved. The temptation may be to add redundancy in sensors and final elements or to reduce the proof test interval in an attempt to reduce the calculated PFD. Frustration may abound as capital and operating costs rise steeply.

This paper shows how to use the principles of LOPA (layer of protection analysis) and the information in the PHA (process hazard analysis) to split up the massive SIF into smaller SIFs that are more manageable. Each smaller SIF need include only the sensors, logic solver(s), and final elements that detect and prevent a specific scenario (one cause leading to one consequence). The approach makes sure all the smaller SIFs can protect against all the scenarios that the massive SIF was intended to prevent. Trips of upstream and downstream units are considered as orderly shutdown actions. If needed, trips of upstream and downstream units are analyzed as small SIFs as well.

With reasonable size SIFs, there is an opportunity to design the SIF with a reasonable number of sensors and final elements, and a reasonably long proof test interval.
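As a rough illustration of why a SIF with many inputs and outputs struggles to meet its target, the sketch below uses the common simplified approximation PFDavg ≈ λDU × TI / 2 for a single device and adds the PFDs of every device that must work for the SIF to succeed. The failure rates, device counts, logic solver PFD, and function names are hypothetical values chosen for illustration; they are not taken from the paper, and a real SIL verification would also account for voting, common cause failures, and diagnostics.

```python
def pfd_1oo1(lambda_du_per_year: float, test_interval_years: float) -> float:
    """Simplified average PFD of one device: lambda_DU * TI / 2."""
    return lambda_du_per_year * test_interval_years / 2.0


def sif_pfd(device_pfds) -> float:
    """Rare-event approximation: if every device must work, PFDs add."""
    return sum(device_pfds)


def sil_band(pfd: float) -> int:
    """Low-demand SIL band (IEC 61508/61511); 0 means SIL 1 is not met."""
    if pfd >= 1e-1:
        return 0
    if pfd >= 1e-2:
        return 1
    if pfd >= 1e-3:
        return 2
    if pfd >= 1e-4:
        return 3
    return 4


# Hypothetical device data (assumptions, not from the paper).
TEST_INTERVAL_YEARS = 1.0
sensor_pfd = pfd_1oo1(0.002, TEST_INTERVAL_YEARS)  # dangerous undetected rate per year
valve_pfd = pfd_1oo1(0.01, TEST_INTERVAL_YEARS)
logic_solver_pfd = 1e-4

# Small SIF: one sensor, one logic solver, one final element (one scenario).
small_pfd = sif_pfd([sensor_pfd, logic_solver_pfd, valve_pfd])

# Massive SIF: four sensors and eight final elements that must all act.
massive_pfd = sif_pfd([sensor_pfd] * 4 + [logic_solver_pfd] + [valve_pfd] * 8)

print(f"small SIF:   PFD = {small_pfd:.4f}, SIL band {sil_band(small_pfd)}")
print(f"massive SIF: PFD = {massive_pfd:.4f}, SIL band {sil_band(massive_pfd)}")
```

With these made-up numbers, the small SIF falls in the SIL 2 band while the massive SIF falls only in the SIL 1 band at the same proof test interval, which is the effect the abstract describes.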

Is Your SIF Trying to Do Too Much? (NEW – 2021; YouTube/MP4; 439.2 MB) Video presentation of the above paper.

What is the Real Risk Reduction for 3 Sensors Using the Mid-Value for Control and 2oo3 Voting for Safety? (NEW – 2021)

POSTER – What is the Real Risk Reduction for 3 Sensors Using the Mid-Value for Control and 2oo3 Voting for Safety? (NEW – 2021; PDF; 1.76 MB) What happens to the risk when two good ideas are combined? To reduce spurious trips of SIFs, many plants moved from 1oo1 or 1oo2 voting on the sensors to 2oo3 voting, which is a good idea. To improve stability for critical process control loops, many plants went from one or two sensors to three sensors using the mid-value for control (also called median-select), which is also a good idea. Without really analyzing it, some facilities combined the two ideas, using the mid-value of three sensors for a control loop and then using the same three sensors voting 2oo3 for an SIF. The intent of the SIF was to protect against consequences that could be caused by a failure of the control loop. This arrangement violates the fundamental premise of LOPA (layer of protection analysis) and ANSI/ISA 84.00.01 (IEC 61511): an independent protection layer shall be independent of the causes of the consequence that the layer protects against. The new configuration must be analyzed by Fault Tree Analysis (FTA), supplemented by Markov analysis. The FTA considers a failure of each of the three sensors and determines which of the remaining devices in the SIF can detect and prevent the consequence. The PFD (probability of failure on demand) is calculated and compared with the PFD of a fully independent 2oo3 sensor SIF. The paper suggests guidance for appropriate use of the combined configuration and suggests how to approximate the risk reduction.
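The dependence described above can be seen in a few lines of logic. The sketch below is only an illustration of median-select control and 2oo3 trip voting on shared sensors; it is not the fault tree or Markov analysis from the paper, and the readings, trip point, and function names are hypothetical.

```python
def mid_value(readings):
    """Median-select: the control loop uses the middle of three readings."""
    return sorted(readings)[1]


def trips_2oo3(readings, trip_point):
    """2oo3 voting: the SIF trips when at least two sensors reach the trip point."""
    return sum(r >= trip_point for r in readings) >= 2


# Hypothetical high-level scenario: the true level is above the trip point.
TRUE_LEVEL = 95.0
TRIP_POINT = 90.0
STUCK_LOW = 20.0

# One sensor stuck low: control and the SIF both still see the true level.
one_failed = [STUCK_LOW, TRUE_LEVEL, TRUE_LEVEL]

# Two sensors stuck low (for example, a common cause failure): the control
# loop believes the level is low and keeps filling, and the SIF that shares
# the same sensors cannot reach two votes to trip.
two_failed = [STUCK_LOW, STUCK_LOW, TRUE_LEVEL]

for label, readings in (("one failed", one_failed), ("two failed", two_failed)):
    print(label, "control sees", mid_value(readings),
          "SIF trips:", trips_2oo3(readings, TRIP_POINT))
```

In the second case the same failure both causes the demand (the control loop acts on a false low value) and disables the protection, so the SIF is not independent of the initiating cause.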

VIDEO – What is the Real Risk Reduction for 3 Sensors Using the Mid-Value for Control and 2oo3 Voting for Safety? (NEW – 2021; MP4; 65 MB) Video presentation of the above paper.

Human Factors

Human Factors Elements Missing from Most Process Safety Management Systems (NEW – 2021)

Human Factors Elements Missing from Process Safety Management (PSM) Systems (NEW – 2021; PDF; 1 MB) Process safety is about controlling risk of failures and errors; controlling risk is primarily about reducing the risk of human error. Risk-Based Process Safety (RBPS) and alternative standards for process safety (such as US OSHA’s standard for Process Safety Management [PSM] or ACC’s Process Safety Code™ [PSC]) have many elements, and each of these in turn helps to reduce the chance of human error or else helps to limit the impact of human error. But each process safety standard has some weakness in the control of human error. This paper presents an overview of human factor fundamentals, discusses why many PSM systems are weak on human factors, and outlines a comprehensive process safety element on Human Factors. It describes what belongs in each category within the Human Factors element and explains the intent, content, and benefit of each category. The paper also presents examples of Human Factors deficiencies and selected examples of industry practices for human factors control. This paper builds on earlier papers, starting from 2010, on the same topic.

Human Factors Implementation – for Plant Workers (NEW – 2021)

Human Factors Implementation – for Plant Workers (NEW – 2021; PDF; 1.2 MB) Process safety is about controlling risk of failures and errors; controlling risk is primarily about reducing the risk of human error. Often it is believed that human errors committed by plant workers are the cause of most process safety accidents. However, the entire organization contributes to these human errors. Therefore, actions are needed by the organization to reduce plant worker human errors by improving the Human Factors that contribute to them, and to build an organizational culture that seeks to learn from plant worker human errors rather than to assign blame. This paper introduces the multiple types and categories of human error and the Human Factors that influence the rate at which human errors are made. It establishes the need for creating management systems for these Human Factors and shows how to implement them in a manner that reduces plant worker human errors. Finally, it describes a Safety-Principled Organizational Culture that allows an organization to create the proper environment to enable these critical improvements to reduce plant worker human error. This paper builds on earlier papers, starting from 2010, on the same topic. The data presented is from basic research by the authors on the root causes of more than 3000 accidents and near misses, and is also based on the review of hundreds of accidents analyzed by others and summary data from many companies. This Video Presentation and the related slides are even more keenly focused on the selected human factors for which the front line workers should take the lead so that the base human error rate at a site is as low as possible. Case studies and examples are used to illustrate key points.

VIDEO – Human Factors Implementation – for Plant Workers (NEW – 2021; Vimeo/MP4; 1.28 GB) Video presentation of the above paper.

Controlling Human Performance Between Both Unplanned and Planned Tasks within Abnormal Operation Mode (NEW – 2021)

Controlling Human Performance Between Both Unplanned and Planned Tasks within Abnormal Operation Mode (NEW – 2021; PDF; 689 KB) There is confusion in terminology used in the chemical-related industry for the class of procedures commonly referred to as Abnormal Mode of Operation and Abnormal situation management. This paper provides a clear definition of each mode of operation and gives examples of how the human performance is controlled for each.

1. Normal – either a continuous mode or a normal batch mode of operation.

2. Planned Non-routine (Non-Normal) – Startup, shutdown, and online maintenance are the main non-routine or non-normal modes of planned operation. But planned Temporary procedures, with time limits, are also part of these.

3. Abnormal – These include activities covered by generalized procedures or guides (planned in a general sense) and activities that don’t have written, step-by-step procedures (but noting that many companies have guides for most of these abnormal situations). A further breakdown is possible:

a. Response to upsets using a Troubleshooting Guide (TSG) for handling deviations from the operating window (these are normally triggered on an alarm); these may ultimately lead to a shutdown, safe park, or emergency shutdown (all of which are also proceduralized), if the deviation cannot be corrected in time.

b. Response to failures using a Temporary procedure – such as how to run in bypass mode if the flow controller fails and you want to keep running in manual mode.

c. Response to Unanticipated Events – The FIRST occurrence of this is handled by what is generally referred to as Emergency-MOC, which is really saying we will make a change immediately, and do the risk review of the change later, and then learn from this one case and proceduralize into a Temp Procedure or TSG for the next time it comes up.

4. Emergency Operations – a diminished or reduced operating plan; normally a Temporary procedure (similar to 3.b. above)

5. Emergency Shutdown – a TSG that fails to resolve the issue in time will go to this; or for some events such as a sudden loss of containment, we go straight to these.

6. Emergency Response – normally in conjunction with 4 or 5, but focused on the protection of people, assets, and the environment, given that a release or other imminent harm is in play.

This paper will provide a framework to ensure no mode of operation is overlooked and it will help sites understand what is needed to control risk during each mode of operation. Most of the time in the paper and presentation will be focused on classification 3, Abnormal Mode of operation.

VIDEO – Controlling Human Error during Unplanned and Planned Abnormal Situations (NEW – 2021; Video/MP4; 153.2 MB) Video presentation of the above paper.

Legacy Free Resources Documents

Process Safety Management and Process Safety Engineering

Real Process Safety Culture (PDF; 815 KB) Many implementers see Process Safety Culture (PSC) as an intangible attribute of a company or site. Some workers see PSC as “code words” for management not wanting to take responsibility for process safety management. Others see PSC as something that can be affected directly by the actions of management or by an active program targeted directly at the site culture. This paper shows what affects the true “culture” at a site and it shows that tangible, real activities within a site are what make safety culture a reality. The paper also reviews the approaches to direct and indirect measurement of process safety culture, and the value of these.

POSTER - Four Common Gaps in Process Safety - Worldwide (PDF; 706 KB) Infographic explaining the four major gaps that are preventing most companies worldwide from achieving excellent process safety performance.

Understanding the Interrelationships Between the PSM Elements For Effective Implementation (PDF; 492 KB) Every element of PSM has a role in controlling risk. These elements are interrelated: each depends on the implementation of one or more other elements for its own effective implementation. Some elements have a greater influence than others. Understanding these relationships is critical for building and sustaining effective PSM programs. This paper shows, through figures and discussion, these interrelationships, the key attributes of each element that influence other elements, and how poor implementation of each element adversely affects the implementation of the other elements.

Four Major Gaps that Are Preventing Most Companies Worldwide From Achieving Excellent Process Safety Performance (PDF; 949 KB) Process safety requires implementing many management systems, specific engineered features, and operating and maintenance practices effectively. Most companies believe they have done just that, and yet major accidents continue to occur. Why is that? What is missing? This paper looks at the statistics of major accidents, combined with results from audits and assessment from more than 50 chemical, petrochemical, oil/gas, and related processing companies world-wide. The paper illustrates the four major gaps that are common to the companies/sites that keep having major accidents, compared to those companies/sites that do not have such accidents.

Common Hurdles, Benefits, and Costs for Fully Implementing Process Safety Worldwide–Especially in Countries without PSM Regulations (PDF; 935 KB) Process safety is implemented around the world, and most of the sites doing so do not have government regulations for compliance to push them along. The hurdles for effective (full) implementation appear to be roughly common from country to country, and site to site. This paper summarizes the lessons learned from multiple companies/sites around the world. Specifically, the paper compares hurdles to effective implementation and how companies crossed these hurdles. We also update earlier papers on the costs and benefits of effective implementation of process safety. Each of these implementations is an example of process safety implementation at a non-covered process. In many of the cases mentioned, the facilities implementing process safety outside of countries with process safety regulations do so better than those in regulated countries, and they extend process safety to all processes (including processes such as steel making).

Keys to Avoid Making A Dog’s Breakfast Out of Your MOC System (PDF; 697 KB) This paper discusses how to ensure the Management of Change system addresses a change’s effect on implementation of other process safety management systems and discusses common MOC workflow weaknesses that can undermine implementing an efficient, compliant and effective MOC system. This will be illustrated based on the recent eMOC system rollout at Irving Oil, New Brunswick, Canada.

Process Safety Culture – Making This Real (PDF; 482 KB) Process Safety Culture (PSC) has received considerable attention recently. "Culture" is a very complex concept and can be very difficult to measure, influence, and manage. However, it is possible to identify, measure, analyze, and improve certain activities and characteristics that are recognized as key components of a positive PSC. This paper shows what Contra Costa County (one regulator) is doing to encourage establishment and measurement of process safety culture. It also shows that tangible, real activities within a site are what make safety culture a reality.

Process Safety Competency (PDF; 1 MB) Successful Process Safety requires the utilization, involvement, and full support of nearly ALL staff at a site. Success also demands that a substantial portion of staff be competent and capable of contributing to process safety programs. This paper describes the basics of building competencies in each aspect of process safety, including those tasks that require expert levels of competencies. It also describes different companies' safety competency progression plans and the typical requirements to reach each new level.

Human Factors Missing from PSM (PDF; 816 KB) Management systems for optimizing Human Factors to control human error rates must be developed by an organization involved in implementing Process Safety Management (PSM). This paper presents an overview of Human Factor fundamentals, discusses why many PSM systems are weak on human factors, and outlines a comprehensive process safety element on Human Factors. It describes what belongs in each category within the Human Factors element and explains the intent, content, and benefit of each category.

The Cost & Benefits of Process Safety Management (PDF; 400 KB) Since 1986, state and federal regulators have been mandating implementation of Process Safety Management (PSM) programs at workplaces that handle hazardous chemicals, including explosives, toxics, and flammables. This paper presents the actual costs that some companies have expended and provides estimates of future costs to comply with either self-imposed standards or government regulations related to PSM. Also discussed are the types of benefits and, where possible, the actual benefits that have been achieved by implementing PSM programs.

Risk Tutorial – Playing the Killer Slot Machine (PDF; 133 KB) This paper explores how the acceptability of risk changes under a variety of circumstances. It also explores how these same principles apply to hazard analysis teams that are judging the acceptability of engineered and administrative controls, and whether or not to generate recommendations.

Process Hazard Analysis (PHA), including HAZOP

Business Case for PHA of Procedures (to Find the Accident Scenarios that are Otherwise Missed) NEW - 2023 (PDF; 840 KB) Hazard evaluations, also called process hazard analyses (PHAs), have been performed formally in gradually improving fashion for more than five decades. Methods such as HAZOP and What-If analysis have been developed and honed during this time. Some weaknesses identified 30 years ago still exist in the majority of PHAs performed around the world. Critically, most PHAs do not thoroughly analyze the errors that can occur during startup, shutdown, and other non-routine (non-normal) modes of operation; sadly, the commonly used approaches for PHA of continuous mode of operation find only about 5 - 10% of the accident scenarios that may occur during startup, shutdown, and online maintenance. This is true even though about 80% of major accidents occur during non-routine operations. Instead of focusing on the most hazardous modes of operation, most PHAs focus on normal operations (e.g., HAZOP of equipment nodes). In a majority (perhaps more than 80%) of both older operations and new plants/projects, the non-routine modes of operation are not analyzed at all. This means that perhaps 70% of the accident scenarios during non-routine operations are being missed by those PHAs. If the hazard evaluation does not find the scenarios that can likely occur during these non-routine operations, the organization will not know what safeguards are needed against these scenarios.

This presentation focuses on the business case for doing PHA of Procedures, based on hundreds of PHAs. Data from PHAs/HAZOPs show that 50 to 85% of the risk reduction opportunities are found during PHA of procedures, resulting in $100,000,000 USD or more in risk reduction savings per week of PHA/HAZOP of procedures. The return on investment for PHA of procedures (the savings in risk avoidance from doing PHA of procedures) is more than 1000 times the cost of the PHA of procedures.

POSTER - Business Case for PHA of Procedures (PDF; 566 KB) Poster presentation of the above paper.

Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance

Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (PDF; 834 KB) Hazard evaluations, also called process hazard analyses (PHAs), have been performed formally in gradually improving fashion for more than five decades. Methods such as HAZOP and What-If analysis have been developed and honed during this time. Some weaknesses identified 30 years ago still exist in the majority of PHAs performed around the world. Critically, most PHAs do not thoroughly analyze the errors that can occur during startup, shutdown, and other non-routine (non-normal) modes of operation; sadly, the commonly used approaches for PHA of continuous mode of operation find only about 5 - 10% of the accident scenarios that may occur during startup, shutdown, and online maintenance. This is true even though about 80% of major accidents occur during non-routine operations. Instead of focusing on the most hazardous modes of operation, most PHAs focus on normal operations (e.g., HAZOP of equipment nodes). In a majority (perhaps more than 80%) of both older operations and new plants/projects, the non-routine modes of operation are not analyzed at all. This means that perhaps 70% of the accident scenarios during non-routine operations are being missed by those PHAs. If the hazard evaluation does not find the scenarios that can likely occur during these non-routine operations, the organization will not know what safeguards are needed against these scenarios.

POSTER - Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (PDF; 870 KB) Poster presentation of the above paper.

VIDEO - Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (YouTube/MP4; 142 MB) Video presentation of the above paper.

Lessons Learned from Scenarios Found during PHA of Startup, Shutdown, and Online Maintenance

Lessons Learned from Scenarios Found during PHA of Startup, Shutdown, and Online Maintenance (PDF; 0.3 MB) Hazards may go unrecognized or underappreciated due to a variety of influences. Ignorance of the hazard is a convenient excuse after an incident. Many incidents are touted as black swan events when they are quite predictable. This paper shows examples from many PHAs of scenarios found from startup, shutdown, and online maintenance modes of operation for which there were no or not enough IPLs. These include (1) introducing cryogenic liquids in columns or vessels before ensuring proper pressurization with gas, (2) valves being left in the wrong position despite many double checks, (3) valves opened or closed in the wrong sequence, despite checklists and significant emphasis, (4) purges being left off despite frequent checks by multiple shifts, and other similar incidents. These types of scenarios have led to many well-known and extremely costly industrial incidents around the globe and may be lurking now in plants operated by the reader, unprotected against and ready to wreak havoc.

The paper describes how these scenarios are found and how they can be prevented using a basis of human reliability and risk assessment of abnormal modes of operation (PHA of Procedures).

Lessons Learned from Scenarios Found during PHA of Startup, Shutdown, and Online Maintenance (PPTX; 2.9 MB) Accompanying presentation slides for Lessons Learned from Scenarios Found during PHA of Startup, Shutdown, and Online Maintenance document.

Lessons Learned from Scenarios Found during PHA of Startup, Shutdown, and Online Maintenance (MP4; 141.7 MB) Accompanying presentation video for Lessons Learned from Scenarios Found during PHA of Startup, Shutdown, and Online Maintenance document.

The Uses and Users of PHA/HAZOP Results (PDF; 1.3 MB) Process Hazard Analyses (PHAs) performed using methods such as HAZOP and What-if, augmented by checklists, have become well established as a core for understanding risk in a hazardous chemical process and other processes. Some see the PHA results as an end in themselves. But the real benefit of performing a PHA lies in its usefulness within all aspects of controlling risk day-to-day. This paper explains the different uses of the PHA results and who uses the PHA results. It charts the path for extracts of the PHA results (including the formal PHA report) to the rest of process safety implementation and process safety control. The many uses, intended from the start of hazard evaluations in the 1960s or discovered years later, will surprise many in the industry. Knowing the uses will help you implement process safety thoroughly and more efficiently, and this knowledge of the uses will change the amount of effort you put into the documentation of the PHA results.

Building Competency in Internal PHA/HAZOP Leaders - Lessons Learned in 40 Years of Doing So (PDF; 1.4 MB) Process safety is a deep topic and requires the involvement of nearly ALL staff at a site. But how do you make sure your staff are up to the task? And how do you judge the competency of subcontractors or third party experts? This paper describes the basics of building competencies in one of the process safety activities that requires expert levels of competency: PHA/HAZOP LEADERSHIP. The paper shows how many companies, beginning with Olin Chemicals and others in the 1970s through hundreds of companies today, have planned for the progression to full competency of PHA/HAZOP Leaders and Scribes.

Best Practices for PHA Revalidations (PDF; 745 KB) Process hazard analyses (PHAs) must be updated and revalidated every 5 years or sooner. PHA/HAZOP Revalidation is entirely different than a baseline or original PHA. The original textbook from CCPS on Revalidating PHAs was issued in 2001. This paper describes the current best practice approaches to revalidation and explains when to use each approach. The approaches are described in detailed flowcharts. Checklists for decision making and quality control are provided, along with examples of completed documentation. This paper is based on completion of more than 300 Revalidations.

The Art of PHA Scribing: The Invisible Role (PDF; 1.2 MB) There is not much information on PHA Scribing because the definition seems to be self-explanatory and the skills required seem obvious: organization and good typing skills. But how much of a difference can a bad/decent/excellent scribe make to the PHA overall? Is “As long as we don’t have to wait too much for him/her to record” enough? Should the scribe just “listen and record”? Is there anything the Scribe can do to improve the quality of the meetings? This paper is backed by more than a million hours of PHA scribing and tries to answer all these questions by going beyond the trivial set of skills needed. It describes key side-tasks the scribe can do to optimize the PHA team efforts, the skills required to do them, and the interaction with the PHA Leader.

Recipe for a Complete Process Hazard Analysis–Especially Addressing the Key Demands from US CSB (PDF; 1.4 MB) Surprisingly, more than 80% of PHAs performed today do not comply with the current interpretations by US OSHA, much less the industry best practices. Most PHAs address less than 10% of the hazards during startup, shutdown, and online maintenance, and less than about 30% address damage mechanisms such as corrosion, erosion, external impacts, external stresses, vibration, etc. How to address these hazards has been part of the CCPS Guidelines for Hazard Evaluation Procedures since 1991, and the US CSB and US OSHA have noted how these weaknesses have led to many accidents. The citations and comments from US regulators and the CSB are detailed to provide some of the business case for performing more thorough PHAs on these key issues. This paper and presentation also illustrate step-by-step how to address all hazards of the process during ALL modes of operation, during a PHA.

Identify SIF and Specify Necessary SIL, and other IPLs, as part of PHA/HAZOP (PDF; 1.5 MB) Identifying Safety Instrumented Functions (SIFs) and other Independent Protection Layers (IPLs) is important for any organization. These can be identified in a simplified risk assessment such as a Layer of Protection Analysis (LOPA). But, these also can be identified with relative ease in a purely qualitative setting of a Process Hazard Analysis (PHA) using hazard and operability analysis (HAZOP) or other PHA methods. This paper shows how to apply the qualitative definition of IPLs within the setting of a process hazard analysis (PHA) to get most of the gain from LOPA without doing a LOPA (i.e., without using numerical values).

Necessity of PHA of Non-Normal Modes of Operation (PDF; 1.2 MB) Most Process Hazard Analyses (PHAs) do not thoroughly analyze the errors that can occur during startup, shutdown, and other non-routine (non-normal) modes of operations, despite the fact that about 70% of major accidents occur during non-routine operations. This paper explains the business case for doing PHAs of procedure steps for non-routine modes of operation, while also describing the growing regulatory pressure from US OSHA and others. The reader will be able to use the results of this paper to estimate the number of accident scenarios they may be missing and to estimate the time it would take to complete an efficient and thorough PHA of the non-routine modes of operation.

Implementation of Process Hazard Analysis at SSTPC (PDF; 691 KB) This paper provides insights into the challenges faced while implementing process hazard analysis at SINOPEC – SABIC TIANJIN PETROCHEMICAL COMPANY (SS-TPC) and shares lessons learned on how to get best practices implemented in such joint ventures and across very diverse cultures.

PHA of Non-Continuous Operating Modes (PDF; 844 KB) Most Process Hazard Analyses (PHAs) do not thoroughly analyze the errors that can occur during startup, shutdown, and other non-routine (non-normal) modes of operations, despite the fact that about 70% of major accidents occur during non-routine operations. This paper shows practical ways to efficiently and thoroughly analyze the step-by-step procedures that are used to control non-routine operating modes, as well as those for batch and between batch operations. The reader will be able to use the results of this paper to estimate the number of accident scenarios they may be missing and to estimate the time it would take to complete an efficient and thorough PHA of the non-routine modes of operation.

Optimizing PHAs/HAZOPs while Maximizing Brainstorming (PDF; 575 KB) Process Hazard Analysis (PHA) optimization is executing the PHA with practices that are thorough and efficient. Success is dependent on: 1) strong PHA Team Leadership, 2) complete and thorough PHA management practices, and 3) the strength of other process safety management (PSM) practices. This paper presents PHA team leadership techniques and rules, discusses the content of PHA management practice, policy, and procedures, and explains the relationship of PHAs with other PSM elements which, if weak, can reduce the quality of the PHA and increase the PHA meeting or documentation time. This paper shares secrets that will speed up your hazard evaluations without sacrificing thoroughness or brainstorming.

Controlling Risk During Major Capital Projects (PDF; 703 KB) This paper describes the best practices for scheduling and performing Process Hazard Analyses (PHAs) during various key phases of major projects. The paper outlines the scope and content of each project phase hazard review and what the outcomes should be. The reader will first be given the basics, and then provided best practices and examples from various companies. An outline is provided of how information related to process safety should be developed during, and then delivered from, a major project.

Addressing Human Factors During PHAs (PDF; 129 KB) Recent accidents and new regulations underscore the need for companies to identify potential human errors and to reduce the frequency and consequences of such errors as part of an overall Process Safety Management (PSM) program. This paper describes an approach for integrating human factors considerations into Process Hazard Analyses (PHAs) of process designs, operating procedures, and management systems. Critical issues related to human factors can be identified and addressed in different phases of a hazard evaluation. Case studies illustrating the effectiveness of this strategy are provided.

Selection of Hazard Evaluation Techniques (PDF; 265 KB) A successful hazard evaluation can be defined as one in which (1) the need for risk information has been met, (2) the results are of high quality and are easy for decision makers to use, and (3) the study has been performed with the minimum resources needed to get the job done. Obviously, the technique selected has a great bearing on each hazard evaluation’s success. A variety of flexible hazard evaluation techniques is presented in this paper, and each of them has been applied in the chemical process industry and is appropriate for use in a wide variety of situations.

Incident Investigation & Root Cause Analysis, including for Near Misses (Close Calls)

Gains From Getting Near Misses Reported (NEW - 2023; PDF; 783 KB) Data indicates that there are probably about 100 Near Misses for every accident. Understandably, learning from near misses is much, much cheaper than learning from accidents. Yet many companies get less than one near miss reported for each accident. This paper describes in detail barriers to getting near misses reported and solutions for each of these barriers. It also shares how companies have increased the reporting ratio (number of near misses reported to accidents reported) to as high as 105:1 (whereas typical reporting ratios are 0-20:1).

Process Safety Competency (PDF; 1 MB) Successful Process Safety requires the utilization, involvement, and full support of nearly ALL staff at a site. Success also demands that a substantial portion of staff be competent and capable of contributing to process safety programs, including Incident Investigation and Root Cause Analysis. This paper describes the basics of building competencies in each aspect of process safety, including those tasks that require expert levels of competencies. It also describes different companies' safety competency progression plans and the typical requirements to reach each new level.

Proven Approach to Investigating Near Misses (PDF; 939 KB) Near miss reporting is one tool that process industries use to improve process safety performance. However, getting near misses reported is a major hurdle for most companies because workers and management fear the investigation system (and they themselves) may become overloaded. This paper explains approaches to manage efficiency and effectiveness in near miss investigations that have been successful in handling near miss reporting within both large and small companies. In addition, these approaches help to ensure high value from the investment of reporting and analysis. (This paper builds on the updated paper presented in 2012 at GCPS on “Gains from Getting Near Misses Reported.”)

Exxon’s Worldwide Incident Investigation Training (PDF; 172 KB) Exxon Company, International (ECI) identified the need to have a common methodology and structured tools for incident investigations (including root cause analysis) across all of its affiliates. Exxon Production Research (EPR), on behalf of ECI, conducted a survey of various available incident investigation techniques and training programs. The techniques chosen were causal factors charting and the Root Cause Map™ (similar to the current version of the Root Cause Chart™, which is from PII) from JBF Associates (JBFA). A two-day Exxon training program was developed that addressed the entire process of incident investigation. This paper discusses the background for developing the training, the content of the training, and the results of the training.

Layer of Protection Analysis (LOPA)

POSTER - Human IPLs and Non-Human IPLs to Prevent Human Error (NEW - 2023; PDF; 231 KB)

POSTER - Using Time-At-Risk to Set Maximum Time that an IPL Can Be Bypassed (PDF; 2.1 MB)

Two Full Capacity Generators - Why is the Calculated Emergency Power System PFD so High? (PDF; 0.5 MB) The probability of failure on demand (PFD) for an emergency generator system is important when electrical power is needed to protect humans from harm or to prevent equipment damage. Emergency lighting is needed for emergency responders during a power outage. Pumps, compressors, and blowers need to operate during the power outage to safely shut down the facility. To evaluate the PFD for an emergency power system, we must consider more than just two generators. We must also consider all the components for the generator fuel, the generator controls, the transfer switches, and the circuit breakers in the feeders to the emergency load. In addition, it is important to consider the capability of the weekly, monthly, semiannual, annual, and four-year inspections and proof tests to detect all the failure modes that can prevent the generator system from operating correctly. There may be many common cause events that can prevent both generators from starting or running. For example, the fuel storage system, the generator control system (including over-voltage and overload protection), and the downstream electrical system (including the transfer switches and circuit breakers) may have single points of failure affecting the power supply from both generators. In addition, human actions during maintenance and testing introduce points of failure, such as leaving the transfer switches in test mode instead of automatic. While the emergency power system may be designed and operated according to NFPA 110, it is critical to evaluate and eliminate single points of failure. The paper will suggest opportunities to provide redundancy, to manage human error, and to improve inspections and proof testing to detect more failure modes.

Two Full Capacity Generators - Why is the Calculated Emergency Power System PFD so High? (PPTX; 1.9 MB) Accompanying presentation slides for Two Full Capacity Generators - Why is the Calculated Emergency Power System PFD so High? document.

Two Full Capacity Generators - Why is the Calculated Emergency Power System PFD so High? (MP4; 260 MB) Accompanying presentation video for Two Full Capacity Generators - Why is the Calculated Emergency Power System PFD so High? document.
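The point made in the generator paper's description, that two full-capacity generators do not give the PFD one might expect, can be illustrated with a rough calculation. The sketch below is not taken from the paper: it assumes a simple beta-factor model for common cause failures and a rare-event approximation in which shared single points of failure simply add. Every number and component name in it is a made-up placeholder.

```python
# Minimal sketch (assumptions throughout): why a redundant generator pair can
# still show a high system PFD once common cause and shared components count.

def redundant_pair_pfd(pfd_single, beta):
    """PFD of a 1oo2 pair under a simple beta-factor model.

    A fraction `beta` of each unit's failures is assumed to be common cause
    (both units fail together); the rest is treated as independent.
    """
    independent = ((1.0 - beta) * pfd_single) ** 2
    common_cause = beta * pfd_single
    return independent + common_cause


def system_pfd(pair_pfd, single_points):
    """Rare-event approximation: shared series elements add to the pair PFD."""
    return pair_pfd + sum(single_points.values())


# Illustrative values only; none of these come from the paper.
GENERATOR_PFD = 0.05
BETA = 0.10
SINGLE_POINTS = {
    "shared fuel storage": 0.005,
    "transfer switch left in test mode": 0.01,
    "feeder circuit breaker": 0.002,
}

naive = GENERATOR_PFD ** 2  # what "two independent generators" would suggest
pair = redundant_pair_pfd(GENERATOR_PFD, BETA)
total = system_pfd(pair, SINGLE_POINTS)

print(f"naive two-generator PFD : {naive:.4f}")
print(f"pair with common cause  : {pair:.4f}")
print(f"system with shared items: {total:.4f}")
```

With these placeholder inputs the system PFD comes out roughly ten times the naive figure, and most of it is contributed by the shared items rather than the generators themselves.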

LOPA in Action: Making Sure Initiating Events (IEs) and Independent Protection Layers (IPLs) Are Included in Integrated Management Systems (PDF; 0.7 MB) Every organization spends a lot of effort conducting Process Hazard Analyses (PHA), Hazard and Operability Studies (HAZOPs), Layer of Protection Analysis (LOPA), and perhaps other risk assessments. However, the usefulness of these studies is significantly reduced if the likelihood of the causes (initiating events, IEs) and the probability of failure on demand (PFD) of the safeguards (particularly the independent protection layers, IPLs) are not maintained. Yet, many companies (1) do not have a defined program for ensuring that each IE and each IPL is included in the mechanical integrity (MI) and/or reliability program and (2) do not effectively audit to ensure the IEs and IPLs are included and that the needed Inspection, Testing and Preventative Maintenance (ITPM) is performed, documented and managed. This paper provides guidance and examples of how to do both.

LOPA in Action: Making Sure Initiating Events (IEs) and Independent Protection Layers (IPLs) Are Included in Integrated Management Systems (PPTX; 4.9 MB) Accompanying presentation slides for LOPA in Action: Making Sure Initiating Events (IEs) and Independent Protection Layers (IPLs) Are Included in Integrated Management Systems document.

LOPA in Action: Making Sure Initiating Events (IEs) and Independent Protection Layers (IPLs) Are Included in Integrated Management Systems (MP4; 28.7 MB) Accompanying presentation video for LOPA in Action: Making Sure Initiating Events (IEs) and Independent Protection Layers (IPLs) Are Included in Integrated Management Systems document.

Understanding IPL Boundaries (PDF; 1.2 MB) Layer of protection analysis (LOPA) is a simplified risk assessment tool that has been in use for almost three decades. The technique has improved the focus on independent protection layers (IPLs) that can prevent the progression of an initiating cause to an undesired consequence (a scenario). An IPL must be capable of preventing the scenario from reaching the consequence. To execute the simplified LOPA approach, the IPL must be independent of the initiating cause and other IPLs. The paper provides examples and illustrations for several types of IPLs: safety instrumented functions, dikes, relief device with fire-resistant insulation and cladding on the vessel, operator response to alarm, and deflagration arrester. The paper includes diagrams to illustrate the concepts.

LOPA: Performed When and By Whom (PDF; 467 KB) Layer of protection analysis (LOPA) was introduced in the mid-1990s by Art Dowell at Rohm and Haas Chemical Company (became Dow Chemical, now DowDuPont, Inc.) and by William Bridges at ARCO Chemical (now LyondellBasell) and JBF Associates. The first book was published in 2001 by CCPS. Since then, the method has swiftly grown in popularity for use in making risk judgments and in deciding on the SIL rating for an SIF. But many users of LOPA do not know when to use LOPA, and so they overuse this tool; and they do not know who should be doing LOPA, so they often use a team similar to or the same as a PHA/HAZOP team. This paper explains what the originators of LOPA intended and why, and also brings the industry up-to-date on the lessons learned from different approaches to using LOPA, related to when to do LOPA and who should do LOPA.

More Issues with LOPA – From the Originators (PDF; 826 KB) Layer of protection analysis (LOPA) has now been around for more than 20 years (and in general use for 15 years), with the initial textbook being officially published in 2001. This paper shares observations and lessons learned from two originators of LOPA and provides further guidance on how to and how not to use LOPA. The paper provides specific examples of best practices, some of which are not covered well enough in or are omitted from the textbooks on the topic.

Lesson from Applying LOPA throughout the Process LifeCycle (PDF; 840 KB) Layer of protection analysis (LOPA) has been implemented throughout major capital projects, on existing facility PHAs, and in PHA revalidations and management of change risk reviews. This paper discusses lessons learned for implementing LOPA in each phase of a process lifecycle and outlines some of the ways to optimize the use of LOPA. The paper describes how implementation of standards for IPLs and initiating event maintenance is necessary in each company. The paper also covers consolidation of SIL evaluation into the related PHA and LOPA at each life cycle phase. Special emphasis is given to optimizing the application of LOPA and SIL evaluation through the various phases of a major capital project.

Impact of Human Error on LOPA (PDF; 1.9 MB) Identifying and sustaining independent protection layers (IPLs) is the heart of LOPA. And all initiating events (IEs) and independent protection layers (IPLs) are inherently tied to Human Error. This paper explains the relationship between human factors and the resultant IE frequency and Probability of Failure on Demand (PFD), and provides an overview of how to validate these risk reduction values at a site. The paper also covers the more involved topic of dependent human errors in IPLs, such as high integrity SIS and other high reliability IPLs such as relief systems. Actual examples are provided to illustrate key learnings.

LOPA and Human Reliability – Human Errors and Human IPLs (Updated) (PDF; 943 KB) Estimating the likelihood of human error and measuring the human error rate at a site are troublesome tasks within the framework of a Layer of Protection Analysis (LOPA). For this reason, some companies do not give credit for a human Independent Protection Layer (IPL). This paper (based on a similar paper from 2010) discusses the data needed for adequately counting the human in a LOPA (and other risk assessments), and includes discussion of the theory of human factors. Actual plant data and tests are included in the paper to provide the reader with some examples of how a simple data collection and validation method can be set up within their companies. This paper also provides an overview of an alternative method for estimating the Probability of Failure on Demand (PFD) of a Human IPL, based on plant and scenario specific factors (such as stress factors, complexity, and communication factors).

LOPA_and_Human_Factors.pdf (PDF; 711 KB) Estimating the likelihood of human error and measuring the human error rate at a site are troublesome tasks within the framework of a Layer of Protection Analysis (LOPA). For this reason, some companies do not give credit for a human Independent Protection Layer (IPL). This paper discusses the data needed for adequately counting the human in a LOPA (and other risk assessments), and includes discussion of the theory of human factors. Actual plant data and tests are included in the paper to provide the reader with some examples of how a simple data collection and validation method can be set up within their companies.

Issues with LOPA – Perspectives from one of the Originators of LOPA (PDF; 246 KB) This paper focuses on problems observed with LOPA during the first 8 years of broad use. These problems include using LOPA without following the rules of LOPA; overuse of LOPA; overwork of LOPA when it is used; using LOPA in PHA team settings; and improper match of an IPL to a consequence (due to a weak definition of the consequence being avoided). This paper also summarizes the many benefits LOPA has produced for the industry.

LOPA Articles (PDF; 690 KB) The first article "Layer of Protection Analysis: A New PHA Tool After HAZOP, Before Fault Tree Analysis" introduces LOPA as a new Process Hazard Analysis (PHA) tool. LOPA uses the data developed in the HAZard and OPerability analysis (HAZOP) along with suggested screening values to account for the risk reduction of each safeguard. The mitigated risk for an impact event can then be compared with the corporation's criteria for unacceptable risk to determine whether additional safeguards or independent protection layers need to be added. The paper provides examples to illustrate the LOPA process. The second article "Risk Acceptance Criteria and Risk Judgment Tools (now called Layer of Protection Analysis [LOPA]) Applied Worldwide within a Chemical Company" describes the process one chemical company used to provide a standard for evaluating risk of potential accident scenarios. This paper presents the evolution of the risk tolerance and risk judgment approach used by the company. Although other companies may follow a different path to achieve the same goals, there are valuable lessons to be learned from this company's particular experiences.
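The LOPA arithmetic summarized in the first article's description (an initiating event frequency reduced by the screening value of each protection layer, then compared with a risk criterion) can be sketched in a few lines. The scenario, the screening values, and the tolerable frequency below are illustrative assumptions only and are not taken from either article.

```python
# Minimal sketch of a single-scenario LOPA calculation (assumed values).
from math import prod


def mitigated_frequency(initiating_event_per_year, ipl_pfds):
    """Scenario frequency after crediting each independent protection layer."""
    return initiating_event_per_year * prod(ipl_pfds)


# Hypothetical scenario: control loop failure leading to vessel overpressure.
IE_FREQUENCY = 0.1        # initiating events per year (assumed screening value)
IPL_PFDS = [0.1, 0.01]    # operator response to alarm, relief device (assumed)
TOLERABLE = 1e-5          # company risk criterion per year (assumed)

frequency = mitigated_frequency(IE_FREQUENCY, IPL_PFDS)
extra_risk_reduction = frequency / TOLERABLE

print(f"mitigated frequency: {frequency:.1e} per year")
if frequency > TOLERABLE:
    print(f"additional risk reduction needed: about {extra_risk_reduction:.0f}x")
else:
    print("scenario meets the criterion with the credited IPLs")
```

In this placeholder case the credited layers leave the scenario about a factor of ten above the assumed criterion, which is the kind of gap that prompts adding another IPL or an SIF.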

Safety Instrumented System (SIS), including Safety Instrument Function (SIF) and Safety Integrity Level (SIL)

Is Your SIF Trying to Do Too Much? (PDF; 332 KB) The number of inputs and outputs to a safety instrumented function (as well as how they vote) affects the probability of failure on demand (PFD) and the SIL (safety integrity level), assuming the test interval remains the same. The larger the number of inputs and outputs, the higher the PFD and potentially the lower the SIL.

With reasonable size SIFs, there is an opportunity to design the SIF with a reasonable number of sensors and final elements, and a reasonably long proof test interval.

Is Your SIF Trying to Do Too Much? (YouTube/MP4; 439.2 MB) Video presentation of the above paper.
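The effect described for this paper, that adding inputs and outputs to one SIF raises its PFD and can lower the achieved SIL, can be shown with a rough sum. The sketch below is not the paper's method: it assumes every sensor and every final element must work (no redundant voting), uses the rare-event approximation so that PFDs add, and uses invented device PFDs. The SIL bands are the usual low-demand-mode PFD ranges.

```python
# Minimal sketch (assumed values): SIF PFD versus number of devices.

def sil_from_pfd(pfd):
    """Map an average PFD to a SIL using low-demand-mode bands.

    Returns 0 when the PFD is 0.1 or worse; anything below 1e-4 is
    reported as 4 in this simplified helper.
    """
    for sil, upper_bound in ((4, 1e-4), (3, 1e-3), (2, 1e-2), (1, 1e-1)):
        if pfd < upper_bound:
            return sil
    return 0


def sif_pfd(n_sensors, n_final_elements, pfd_sensor, pfd_logic, pfd_final):
    """All devices are assumed to be required, so their PFDs simply add."""
    return n_sensors * pfd_sensor + pfd_logic + n_final_elements * pfd_final


# Illustrative per-device PFDs for a fixed proof test interval (assumptions).
PFD_SENSOR, PFD_LOGIC, PFD_FINAL = 2e-3, 1e-4, 3e-3

small = sif_pfd(1, 1, PFD_SENSOR, PFD_LOGIC, PFD_FINAL)
large = sif_pfd(2, 4, PFD_SENSOR, PFD_LOGIC, PFD_FINAL)

print(f"1 sensor, 1 final element : PFD {small:.2e}, SIL {sil_from_pfd(small)}")
print(f"2 sensors, 4 final elements: PFD {large:.2e}, SIL {sil_from_pfd(large)}")
```

With these placeholder numbers the small SIF lands in the SIL 2 band while the larger one drops to SIL 1 at the same test interval.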

POSTER - What is the Real Risk Reduction for 3 Sensors Using the Mid-Value for Control and 2oo3 Voting for Safety? (PDF; 1.76 MB) What happens to the risk when two good ideas are combined? To reduce spurious trips of SIFs, many plants moved from 1oo1 or 1oo2 voting on the sensors to 2oo3 voting -- a good idea. To improve stability for critical process control loops, many plants went from one or two sensors to three sensors using the mid-value for control (also called median-select) -- also a good idea. Without really analyzing it, some facilities combined the two ideas, using the mid-value of three sensors for a control loop and then using the same three sensors voting 2oo3 for an SIF. The intent of the SIF was to protect against consequences that could be caused by a failure of the control loop. This arrangement violates the fundamental premise of LOPA (layer of protection analysis) and ANSI/ISA 84.00.01 (IEC 61511): an independent protection layer shall be independent of the causes of the consequence that the layer protects against. The new configuration must be analyzed by Fault Tree Analysis (FTA), supplemented by Markov analysis. The FTA considers a failure of each of the three sensors and determines which of the remaining devices in the SIF can detect and prevent the consequence. The PFD (probability of failure on demand) is calculated and compared with the PFD of the totally independent 2oo3 sensor SIF. The paper suggests guidance for appropriate use of the combined configuration and suggests how to approximate the risk reduction.

VIDEO - What is the Real Risk Reduction for 3 Sensors Using the Mid-Value for Control and 2oo3 Voting for Safety? (MP4; 65 MB) Video presentation of the above poster.
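The independence problem with shared sensors can be seen in a toy example. The sketch below is only an illustration and is not the fault tree analysis the poster describes: it assumes a high trip point of 90, three sensors that feed both a median-select controller and a 2oo3 trip, and a failure mode in which sensors stick at a low reading.

```python
# Minimal sketch (assumed values): the same three sensors drive both the
# control loop (mid-value) and the SIF (2oo3 high trip).
from statistics import median

HIGH_TRIP = 90.0  # assumed trip point


def control_measurement(readings):
    """Mid-value (median-select) of three sensors, used by the control loop."""
    return median(readings)


def sif_trips(readings, trip_point=HIGH_TRIP):
    """2oo3 voting: trip when at least two sensors are at or above the trip point."""
    votes = sum(reading >= trip_point for reading in readings)
    return votes >= 2


# The true process value is 95 (above the trip point) in every case.
cases = {
    "all healthy": [95.0, 95.0, 95.0],
    "one stuck low": [50.0, 95.0, 95.0],
    "two stuck low": [50.0, 50.0, 95.0],
}

for name, readings in cases.items():
    print(f"{name:14s} control sees {control_measurement(readings):5.1f}, "
          f"SIF trips: {sif_trips(readings)}")
```

In the last case the controller sees a low value and would keep driving the process upward, while the 2oo3 vote never reaches two sensors above the trip point: the failure that creates the demand also disables the protection.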

POSTER - A Proven Streamlined Approach to SIL Assessment Requirements (PDF; 523 KB) Many companies put FAR too much redundant effort into determining what SIL (safety integrity level) is needed and then verifying the SIF (safety instrumented function) design will give the SIL targeted. This paper shows how to apply the qualitative definition of independent protection layers (IPLs) within the setting of a process hazard analysis (PHA) to get most of the gain from Layer of Protection Analysis (LOPA) without doing a LOPA (without using numerical values). We show how we use a PHA team to identify when a SIF is needed and to select the proper target SIL. This portion of the SIL evaluation and the identification and labeling of the IPLs during the PHA/HAZOP does not take any longer than a normal PHA/HAZOP, once the right habits are established. Note that this approach eliminates the need for a separate SIL Evaluation Study to identify the SIFs and select the target SIL. Then, this paper describes how to perform the SIL Verification and Safety Requirements Specification (SRS) remotely, again without the need for a redundant team meeting. This approach has been used at many sites and for thousands of SIFs.

SIL-3, SIL-2, and Unicorns (There Is a High Probability Your SIL 2 and SIL 3 SIFs Have No Better Performance Than SIL 1) (PDF; 3.1 MB) This paper shows that specific human error during testing, calibration maintenance, and restoration of a SIF is a significant contribution to the true PFD of the SIF for SIL 2 and dominates SIL 3 designs. Unless the human errors are accounted for and then compensated for, it is more likely to find a Unicorn than to actually get two or three orders of risk reduction from SIL 2 and SIL 3 SIFs.

A Streamlined Approach for Full Compliance with SIF Implementation Standards (PDF; 1.2 MB) Many companies put FAR too much redundant effort into determining what SIL (safety integrity level) is needed and then verifying the SIF (safety instrumented function) design will give the SIL targeted. This paper shows how to apply the qualitative definition of independent protection layers (IPLs) within the setting of a process hazard analysis (PHA) to get most of the gain from Layer of Protection Analysis (LOPA) without doing a LOPA (without using numerical values). We show how we use a PHA team to identify when a SIF is needed and to select the proper target SIL. This portion of the SIL evaluation and the identification and labeling of the IPLs during the PHA/HAZOP does not take any longer than a normal PHA/HAZOP, once the right habits are established. Note that this approach eliminates the need for a separate SIL Evaluation Study to identify the SIFs and select the target SIL. Then, this paper describes how to perform the SIL Verification and Safety Requirements Specification (SRS) remotely, again without the need for a redundant team meeting. This approach has been used at many sites and for thousands of SIFs.

Accounting for Human Error Probability in SIL Verification Calculations (PDF; 465 KB) This paper shows that human error during testing, maintenance, and restoration of a Safety Instrumented Function (SIF) can potentially dominate its Probability of Failure on Demand (PFD) value, calling into question whether the required risk reduction is indeed being met. Example methods for estimating the contribution of human error probability for SIL Verification calculations are provided, as well as some proven approaches for controlling human factors that affect the base error rate (for a given mode of operation). It also discusses ways to prevent or else detect and recover from errors made in redundant channels (such as used in 1oo2, 1oo3, or 2oo3 voting).
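A rough calculation shows why human error can dominate the PFD of a high-integrity SIF. The sketch below is not one of the paper's methods: it assumes the probability of leaving the SIF disabled after a proof test simply adds to the hardware PFD, that an unrecovered error persists until the next test, and that all numbers are placeholders.

```python
# Minimal sketch (assumed values): hardware PFD plus a human error term.

def total_pfd(pfd_hardware, p_human_error, p_detect_and_recover=0.0):
    """Add the chance that the SIF is left disabled after testing.

    `p_human_error` is the assumed probability per proof test of leaving the
    SIF bypassed or miscalibrated; `p_detect_and_recover` is the assumed
    chance that an independent check catches and corrects the error.
    """
    residual_human = p_human_error * (1.0 - p_detect_and_recover)
    return pfd_hardware + residual_human


HARDWARE_PFD = 5e-4   # a design calculated to be in the SIL 3 band (assumed)
HUMAN_ERROR = 1e-2    # assumed probability of leaving the SIF disabled

unchecked = total_pfd(HARDWARE_PFD, HUMAN_ERROR)
checked = total_pfd(HARDWARE_PFD, HUMAN_ERROR, p_detect_and_recover=0.9)

for label, pfd in (("hardware only", HARDWARE_PFD),
                   ("with human error", unchecked),
                   ("with independent check", checked)):
    print(f"{label:24s} PFD {pfd:.2e}, risk reduction factor {1.0 / pfd:,.0f}")
```

With these placeholder inputs the calculated risk reduction factor falls from 2,000 to under 100 once the human error term is included, and recovers only partly when an independent check is credited.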

LOPA and Human Factors 1 (PDF; 280 KB) Estimating the likelihood of human error and measuring the human error rate at a site are troublesome tasks within the framework of a Layer of Protection Analysis (LOPA). For this reason, some companies do not give credit for a human Independent Protection Layer (IPL). This paper discusses the data needed for adequately counting the human in a LOPA (and other risk assessments), and includes discussion of the theory of human factors. Actual plant data and tests are included in the paper to provide the reader with some examples of how a simple data collection and validation method can be set up within their companies.

Operating Procedures and Documentation

Best Practices for Writing Operating Procedures and Trouble-Shooting Guides (PDF; 1.3 MB) There is no complete, best practice guideline or textbook for writing operating procedures and trouble-shooting guides. This paper presents the proven, best approach for developing accurate operating procedures and for ensuring the page formatting and step writing are optimized to reduce human error rates. This approach and the 32 rules established in this paper are based on the foundations set by Swain and others (in 1970) for control of human error rates, but also draw on experience from more than 100 sites where this approach has been successfully followed. The approach and rules for developing operational troubleshooting guides (procedures for responding to process deviations such as those needed for Human IPL) are again the best approaches found and have been applied successfully since the early 1990s. Several case studies are provided that show the gains from following this approach. The guidelines in this paper build upon ones presented in 1999 at CCPS and 2016 at GCPS.

Writing Effective Operating Procedures (PDF; 3 MB) Part I of this article provides a summary of generally accepted procedure-writing guidelines, based on decades of experience in writing operating and maintenance procedures, and many years of human factors analysis. It also includes steps that a company/writer can take to safeguard against written procedures not being followed. Part II offers strategies for developing an operating manual that will comply with regulatory requirements (particularly OSHA's PSM requirements) for processes containing highly hazardous chemicals. This part also tells how to comply with other regulatory requirements, including developing procedures for all phases of operations, addressing safety and health considerations, and describing safety systems and their functions.

Human Factors

Use of Human Reliability Analysis to Supplement LOPA for Scenarios Dominated by Human Error (NEW - 2023; PDF; 2.1 MB) There are many scenarios and situations for which Layer of Protection Analysis (LOPA) may not be a suitable methodology or may be difficult to use. One such case is a scenario that is dominated by human error and yet does not appear to allow for one or more IPLs. This paper illustrates Human Reliability Analysis (HRA), including the Human Reliability Event Tree (HRET), which can be an alternative, and it illustrates how to augment LOPA and SIL Verification calculations with human error probability estimates. This paper will help the implementor understand how to get the most from LOPA in high human error scenarios and when it is appropriate to consider an alternative approach.

Technique to Perform Petrochemical Complex-Wide Inadvertent Chemicals Mixing and Reactivity Study

Technique to Perform Petrochemical Complex-Wide Inadvertent Chemicals Mixing and Reactivity Study (PDF; 443 KB) Operating chemical plants require the delivery of chemicals from outside sources. According to the US National Association of Chemical Distributors, 40 million tons of chemicals were delivered to customers in 2016, with a delivery made every 8.4 seconds. These chemicals may be solids, liquids, or gases. Chemicals transported into a petrochemical plant may be used as raw materials, catalysts, or water treatment or process treatment chemicals, among other uses. These chemicals may be hazardous by nature and may become even more hazardous upon unintentional mixing with each other or with the process.

Among all the chemical transportation happening in a petrochemical plant, liquid chemicals for water or process treatment are of most interest due to the frequency of makeup, batch processing, the involvement of human action, and the hazardous nature of the chemicals. Chemicals transported via pipeline pose a lesser risk of inadvertent mixing, and this risk is studied in detail in a normal HAZOP (for example, as misdirected flow). Solid chemicals pose a lesser risk because they are expected to be less reactive upon mixing and are usually made up, loaded, and unloaded less frequently.

In a typical Olefins complex, as many as 30 chemicals may have credible potential for inadvertent mixing and hazardous reactivity. These chemicals include anti-fouling chemicals, dispersants, acids, amines, and proprietary chemicals. Assessing the credibility of inadvertent mixing helps to shortlist the chemicals that pose the greatest risk.

Typically, the hazards of inadvertent mixing are studied within the boundaries of individual plants, while ignoring the credible scenarios of cross mixing from Plant A to Plant B within the same Petrochemical Complex.

This paper explains a proven technique to perform a complex-wide study of the chemicals mixing credibility and hazardous reactivity that reveals hidden risks, which may not otherwise be discovered through the typical process hazard analysis techniques like HAZOP of individual plants.

POSTER - Technique to Perform Petrochemical Complex-Wide Inadvertent Chemicals Mixing and Reactivity Study (PDF; 227 KB) Poster presentation of the above paper.
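
As a rough illustration of the complex-wide screening idea described above (not the paper's actual method), the sketch below enumerates every pair of chemicals across the whole complex, keeps the pairs flagged as hazardous in a reactivity table, and marks whether each pair crosses a plant boundary. The chemical names, plant assignments, and reactive-pair table are hypothetical placeholders; a real study would rely on a reactivity chart and a credibility assessment of each mixing scenario.

```python
from itertools import combinations

# Hypothetical inventory: chemical -> plant where it is unloaded (illustrative only).
INVENTORY = {
    "sulfuric acid": "Plant A",
    "caustic soda": "Plant A",
    "amine": "Plant B",
    "dispersant": "Plant B",
}

# Hypothetical reactivity table: unordered pairs assumed hazardous when mixed.
REACTIVE_PAIRS = {
    frozenset({"sulfuric acid", "caustic soda"}),
    frozenset({"sulfuric acid", "amine"}),
}


def screen_mixing_pairs(inventory, reactive_pairs):
    """Return hazardous pairs, each tagged as within-plant or cross-plant."""
    findings = []
    # Pairs are drawn from the whole complex, not from one plant at a time.
    for chem_1, chem_2 in combinations(sorted(inventory), 2):
        if frozenset({chem_1, chem_2}) not in reactive_pairs:
            continue
        cross_plant = inventory[chem_1] != inventory[chem_2]
        findings.append((chem_1, chem_2, cross_plant))
    return findings


def cross_plant_only(findings):
    """Keep the pairs that a plant-by-plant study would not consider."""
    return [(a, b) for a, b, cross in findings if cross]


FINDINGS = screen_mixing_pairs(INVENTORY, REACTIVE_PAIRS)
```

Calling cross_plant_only(FINDINGS) isolates the pairs that span two plants, which is the category the abstract says is typically ignored when each plant is studied within its own boundaries.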

Business Case for PHA of Procedures (to Find the Accident Scenarios that are Otherwise Missed)

Business Case for PHA of Procedures (to Find the Accident Scenarios that are Otherwise Missed) (PDF; 762 KB) Hazard evaluations, also called process hazard analyses (PHAs), have been performed formally in gradually improving fashion for more than five decades. Methods such as HAZOP and What-If analysis have been developed and honed during this time. Some weaknesses identified 30 years ago still exist in the majority of PHAs performed around the world. Critically, most PHAs do not thoroughly analyze the errors that can occur during startup, shutdown, and other non-routine (non-normal) modes of operation; sadly, the commonly used approaches for PHA of continuous mode of operation only find about 5 - 10% of the accident scenarios that may occur during startup, shutdown, and online maintenance. This is true even though about 80% of major accidents occur during non-routine operations. Instead of focusing on the most hazardous modes of operation, most PHAs focus on normal operations (e.g., HAZOP of equipment nodes). In a majority (perhaps more than 80%) of both older operations and new plants/projects, the non-routine modes of operation are not analyzed at all. This means that perhaps 70% of the accident scenarios during non-routine operations are being missed by those PHAs. If the hazard evaluation does not find the scenarios that can likely occur during these non-routine operations, the organization will not know what safeguards are needed against these scenarios.

This presentation focuses on the business case for doing PHA of Procedures, based on hundreds of PHAs. Data from PHAs/HAZOPs show that 50 to 85% of the risk reduction opportunities are found during PHA of procedures, resulting in $100,000,000 USD or more in risk reduction savings per week of PHA/HAZOP of procedures. The return on investment for PHA of procedures (the savings in risk avoidance from doing PHA of procedures) is more than 1000 times the cost of the PHA of procedures.

POSTER - Business Case for PHA of Procedures (PDF; 566 KB) Poster presentation of the above paper.

Human Factors Elements Missing from Most Process Safety Management Systems

Human Factors Elements Missing from Process Safety Management (PSM) Systems (PDF; 1 MB) Process safety is about controlling risk of failures and errors; controlling risk is primarily about reducing the risk of human error. Risk-Based Process Safety (RBPS) and alternative standards for process safety (such as US OSHA’s standard for Process Safety Management [PSM] or ACC’s Process Safety Code™ [PSC]) have many elements, and each of these in turn helps to reduce the chance of human error or else helps to limit the impact of human error. But each process safety standard has some weakness in the control of human error. This paper presents an overview of human factor fundamentals, discusses why many PSM systems are weak on human factors, and outlines a comprehensive process safety element on Human Factors. It describes what belongs in each category within the Human Factors element and explains the intent, content, and benefit of each category. The paper also presents examples of Human Factors deficiencies and selected examples of industry practices for human factors control. This paper builds on earlier papers, starting from 2010, on the same topic.

Human Factors Implementation – for Plant Workers

Human Factors Implementation – for Plant Workers (PDF; 1.2 MB) Process safety is about controlling risk of failures and errors; controlling risk is primarily about reducing the risk of human error. Often it is believed that human errors committed by plant workers are the cause of most process safety accidents. However, the entire organization contributes to these human errors. Therefore, the organization needs to act to reduce plant worker human errors by improving the Human Factors that contribute to them, and to build an organizational culture that seeks to learn from plant worker human errors rather than to assign blame. This paper introduces the multiple types and categories of human error and the Human Factors that influence the rate at which human errors are made. It establishes the need for creating management systems for these Human Factors and explains how to implement them in a manner that reduces plant worker human errors. Finally, it describes a Safety-Principled Organizational Culture that allows an organization to create the proper environment to enable these critical improvements to reduce plant worker human error. This paper builds on earlier papers, starting from 2010, on the same topic. The data presented are drawn from basic research by the authors on the root causes of more than 3000 accidents and near misses, from the review of hundreds of accidents analyzed by others, and from summary data from many companies. This Video Presentation and the related slides are even more keenly focused on the selected human factors for which the front line workers should take the lead so that the base human error rate at a site is as low as possible. Case studies and examples are used to illustrate key points.

VIDEO - Human Factors Implementation – for Plant Workers (YouTube/MP4; 1.28 GB) Video presentation of the above paper.

Controlling Human Performance Between Both Unplanned and Planned Tasks within Abnormal Operation Mode

Controlling Human Performance Between Both Unplanned and Planned Tasks within Abnormal Operation Mode (PDF; 689 KB) There is confusion in terminology used in the chemical-related industry for the class of procedures commonly referred to as Abnormal Mode of Operation and Abnormal situation management. This paper provides a clear definition of each mode of operation and gives examples of how the human performance is controlled for each.

1. Normal – either a continuous mode or a normal batch mode of operation.

b. Response to failures using a Temporary procedure - such as how to run in bypass mode if the flow controller fails and you want to keep running in manual mode.

c. Response to Unanticipated Events - The FIRST occurrence is handled by what is generally referred to as Emergency-MOC, which means making the change immediately and doing the risk review of the change afterward, then learning from this one case and proceduralizing it into a Temporary procedure or TSG for the next time it comes up.

4. Emergency Operations - a diminished or reduced operating plan; normally a Temporary procedure (similar to 3.b. above)

5. Emergency Shutdown - a TSG that fails to resolve the issue in time will lead to this; for some events, such as a sudden loss of containment, we go straight to it.

6. Emergency Response - normally in conjunction with 4 or 5, but focused on the protection of people, assets, and the environment, given that a release or other imminent harm is in play.

VIDEO - Controlling Human Error during Unplanned and Planned Abnormal Situations (YouTube/MP4; 153.2 MB) Video presentation of the above paper.

Why Are the New People Making the Same Mistakes That the Old-Timers Made 25 Years Ago? Managing Organizational Change (PDF; 199 KB) History repeats itself. The same process safety incidents occur over and over. The same occupational safety incidents occur repeatedly. Why? “Organizations have no memory” – Trevor Kletz. As operating, maintenance, and engineering personnel are promoted, retire, take vacation, or are absent, or as job duties are reallocated, critical safety information and expertise is lost. The paper gives examples of memory loss and its consequences. The right critical knowledge and skills must be available on site, all the time, when they are needed. The paper describes a straightforward Management System to maintain the needed competency as the organization changes. The system is self-documenting and can easily be kept up-to-date. Examples are given from years of experience.

Proven Approaches to Ensuring Operators Can Respond to Critical Process Deviations in Time (Human Response IPL) (PDF; 1.7 MB) Humans can be the cause of an accident scenario (the Initiating Event [IE]) or humans can serve or participate as an independent protection layer (IPL). Validating Human IPLs has been a showstopper for many companies considering the use of human response as an IPL. Human IPLs include preventative steps that may stop a scenario from progressing once it is initiated, but more typically the human IPLs are responses to alerts or alarms or troubling readings and sample results. This paper first describes the fundamentals of clear alarms, practical actions, and having enough time to perform the action, all without being in harm’s way at the end of the action.

Human Factors and their Optimization (PDF; 797 KB) Weak control of Human Factors leads directly to error. Not only do humans cause accidents (unintentionally) by making errors directly related to the process itself, but they also cause errors by creating deficiencies in the design and implementation of management systems. Human error is also the cause of failure of each layer of protection. This paper discusses each of the 10 primary human factors and describes what we know about their relative importance in accident causation. It also details proven ways to optimize these human factors so that the base human error rate at a site is as low as possible.

Addressing Human Factors During PHAs (PDF; 125 KB) Recent accidents and new regulations underscore the need for companies to identify potential human errors and to reduce the frequency and consequences of such errors as part of an overall Process Safety Management (PSM) program. This paper describes an approach for integrating human factors considerations into Process Hazard Analyses (PHAs) of process designs, operating procedures, and management systems. Critical issues related to human factors can be identified and addressed in different phases of a hazard evaluation. Case studies illustrating the effectiveness of this strategy are provided.