
Invaluable free resource documents on best practices in process safety management (PSM), process safety engineering (PSE), process hazard analysis (PHA), hazard and operability study (HAZOP), layer of protection analysis (LOPA), incident investigation (II), root cause analysis (RCA), safety instrumented system (SIS), safety integrity level (SIL), and safety instrumented function (SIF). With compliments from the process safety management experts at Process Improvement Institute!

Hello to registrants for the 2021 AIChE Spring Meeting and 17th Global Congress on Process Safety! To make it easier for you to find the presentation documents and videos that we’re showing at this year’s conference, all documents applicable to this week’s presentations are listed below, separately from the rest of our Free Resources documents. (And the software glitch that was affecting the rest of our legacy documents has been fixed, so all of those documents are now available again for your use as well!)

(Simply select the category line of interest to see the document titles relative to that discipline along with a short description, and then select a document title to view the complete document.)

Documents/Videos Presented at the 2021 AIChE Virtual Spring Meeting and 17th Global Congress on Process Safety

Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (NEW – 2021)

Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (NEW – 2021; PDF; 834 KB) Hazard evaluations, also called process hazard analyses (PHAs), have been performed formally, in gradually improving fashion, for more than five decades. Methods such as HAZOP and What-If analysis have been developed and honed during this time. Some weaknesses identified 30 years ago still exist in the majority of PHAs performed around the world. Critically, most PHAs do not thoroughly analyze the errors that can occur during startup, shutdown, and other non-routine (non-normal) modes of operation; sadly, the commonly used approaches for PHA of the continuous mode of operation find only about 5–10% of the accident scenarios that may occur during startup, shutdown, and online maintenance. This is true even though about 80% of major accidents occur during non-routine operations. Instead of focusing on the most hazardous modes of operation, most PHAs focus on normal operations (e.g., HAZOP of equipment nodes). In a majority (perhaps more than 80%) of both older operations and new plants/projects, the non-routine modes of operation are not analyzed at all. This means that perhaps 70% of the accident scenarios during non-routine operations are being missed by those PHAs. If the hazard evaluation does not find the scenarios that can likely occur during these non-routine operations, the organization will not know what safeguards are needed against these scenarios.

One reason that many companies do not perform PHA of procedures is that they believe the time required for such analysis will be excessive. This paper shows clearly the best practices for (1) screening and ranking which procedures/tasks are most critical for analysis, (2) deciding on the method to optimize the investment of time, and (3) streamlining the documentation of the results of the PHA of procedures. Following the steps outlined in this paper reduces the time needed for PHA of procedures by about 40% without noticeably affecting the number of accident scenarios found during PHA of procedures. The result should be more sites completing PHA of procedures.

POSTER – Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (NEW – 2021; PDF; 870 KB) Poster presentation of the above paper.

VIDEO – Further Lessons Learned on How to Efficiently Perform the Necessary PHA of Startup, Shutdown, and Online Maintenance (NEW – 2021; PDF; 142 MB) Video presentation of the above paper.

More Issues With LOPA – From the Originators (NEW – 2021)

More Issues with LOPA – From the Originators (NEW – 2021; PDF; 855 KB) Layer of protection analysis (LOPA) has now been around for more than 25 years (and in general use for 20 years), with the initial textbook being officially published in 2001. More recently, two companion books have been published on the topics of Enabling Events & Conditional Modifiers and on Initiating Events and Independent Protection Layers (IPLs). Many papers have been published in the past 20 years on LOPA.

This paper shares observations and lessons learned from two originators of LOPA and provides further guidance on how to use LOPA and how not to use it. The paper provides specific examples of best practices, some of which are either not covered well enough in the textbooks on the topic or are omitted from them. This paper is an update of two earlier papers (2010, 2015) by the originators of LOPA.

More Issues with LOPA – From the Originators (NEW – 2021; Vimeo/MP4; 1.38 GB) Layer of protection analysis (LOPA) has now been around for more than 25 years (and in general use for 20 years), with the initial textbook being officially published in 2001. More recently, two companion books have been published on the topics of Enabling Events & Conditional Modifiers and on Initiating Events and Independent Protection Layers (IPLs). Many papers have been published in the past 20 years on LOPA.

The paper referenced in this video shares observations and lessons learned from two originators of LOPA and provides further guidance on how to use LOPA and how not to use it. The referenced paper provides specific examples of best practices, some of which are either not covered well enough in the textbooks on the topic or are omitted from them. The referenced paper is an update of two earlier papers (2010, 2015) by the originators of LOPA.

Is Your SIF Trying to Do Too Much? (NEW – 2021)

Is Your SIF Trying to Do Too Much? (NEW – 2021; PDF; 332 KB) The number of inputs and outputs to a safety instrumented function (as well as how they vote) affects the probability of failure on demand (PFD) and the SIL (safety integrity level), assuming the test interval remains the same. The larger the number of inputs and outputs, the higher the PFD and potentially the lower the SIL.
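As a rough illustration of why PFD grows with device count, the sketch below uses the common simplified approximation PFDavg ≈ λ_DU × TI / 2 for a single device and adds the contributions of devices that must all work for the SIF to succeed. The failure rate, test interval, and function names are hypothetical values chosen for illustration, not data from the paper, and the sketch ignores voting, common cause failures, and diagnostics.

```python
HOURS_PER_YEAR = 8760


def pfd_1oo1(lambda_du_per_hr, test_interval_hr):
    """Simplified average PFD of one device: lambda_DU * TI / 2."""
    return lambda_du_per_hr * test_interval_hr / 2


def sif_pfd(n_devices, lambda_du_per_hr, test_interval_hr):
    """Approximate PFD when every device must work (contributions add)."""
    return n_devices * pfd_1oo1(lambda_du_per_hr, test_interval_hr)


def sil_band(pfd):
    """Map a low-demand PFDavg to a SIL band; 0 means SIL 1 is not met.

    Anything below 1e-4 is reported as 4, including values below 1e-5.
    """
    for sil, upper_limit in ((4, 1e-4), (3, 1e-3), (2, 1e-2), (1, 1e-1)):
        if pfd < upper_limit:
            return sil
    return 0


if __name__ == "__main__":
    # Assumed values: 1e-6 dangerous undetected failures per hour,
    # proof test once per year, identical devices.
    for n in (2, 3, 10, 25):
        pfd = sif_pfd(n, 1e-6, HOURS_PER_YEAR)
        print(f"{n:2d} devices: PFD = {pfd:.2e}, SIL band = {sil_band(pfd)}")
```

With these assumed numbers, the achievable SIL band drops as devices are added while the test interval stays fixed, which is the effect the paragraph above describes.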

What if the SIF includes all the sensors and final elements that are involved in any trip of a large processing unit like a heater, a reformer, or a distillation column? What if the SIF includes the actions to shut down upstream units that feed the unit being tripped? What if the SIF includes the actions to shut down downstream units that are fed by the unit being tripped? When SIL verification is done for an SIF as described here, the result may be that the target PFD and SIL cannot be achieved. The temptation may be to add redundancy in sensors and final elements or to reduce the proof test interval in an attempt to reduce the calculated PFD. Frustration may abound as capital and operating costs rise steeply.

This paper shows how to use the principles of LOPA (layer of protection analysis) and the information in the PHA (process hazard analysis) to split up the massive SIF into smaller SIFs that are more manageable. Each smaller SIF need include only the sensors, logic solver(s), and final elements that detect and prevent a specific scenario (one cause leading to one consequence). The approach makes sure the smaller SIFs together protect against all the scenarios that the massive SIF was intended to prevent. Trips of upstream and downstream units are considered as orderly shutdown actions. If needed, trips of upstream and downstream units are analyzed as small SIFs as well.

With reasonably sized SIFs, there is an opportunity to design each SIF with a reasonable number of sensors and final elements, and a reasonably long proof test interval.
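The splitting idea can be sketched as follows, under loud assumptions: the scenarios, tag names, failure rate, and test interval are all hypothetical, every device (including the logic solver) is given the same failure rate for simplicity, and each SIF is treated as failing if any one of its devices fails. This is an illustration of the bookkeeping only, not the method from the paper.

```python
def pfd_all_required(n_devices, lambda_du_per_hr=1e-6, test_interval_hr=8760):
    """Simplified PFD when all n devices must work: n * lambda_DU * TI / 2."""
    return n_devices * lambda_du_per_hr * test_interval_hr / 2


# Hypothetical PHA/LOPA scenarios (one cause leading to one consequence),
# each mapped to the devices that detect and prevent that scenario.
SCENARIOS = {
    "loss of fuel gas pressure -> flameout": {"PT-101", "logic", "XV-101"},
    "low process flow -> tube overheating": {"FT-201", "logic", "XV-101", "XV-102"},
    "high stack temperature -> firebox damage": {"TT-301", "logic", "XV-101"},
}

# The "massive" SIF lumps every device from every scenario together.
massive_devices = set().union(*SCENARIOS.values())
massive_pfd = pfd_all_required(len(massive_devices))

# One smaller SIF per scenario, containing only the devices it needs.
small_pfds = {name: pfd_all_required(len(devs)) for name, devs in SCENARIOS.items()}

# Coverage check: the small SIFs together protect every scenario and use
# no device that the massive SIF did not already include.
assert set(small_pfds) == set(SCENARIOS)
assert set().union(*SCENARIOS.values()) == massive_devices

if __name__ == "__main__":
    print(f"massive SIF ({len(massive_devices)} devices): PFD = {massive_pfd:.2e}")
    for name, pfd in small_pfds.items():
        print(f"{name}: PFD = {pfd:.2e}")
```

Under these assumptions each scenario-specific SIF has a lower calculated PFD than the lumped SIF at the same proof test interval, which leaves room for a longer interval or fewer redundant devices.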

Is Your SIF Trying to Do Too Much? (NEW – 2021; YouTube/MP4; 439.2 MB) Video presentation of the above paper.

What is the Real Risk Reduction for 3 Sensors Using the Mid-Value for Control and 2oo3 Voting for Safety? (NEW – 2021)

POSTER – What is the Real Risk Reduction for 3 Sensors Using the Mid-Value for Control and 2oo3 Voting for Safety? (NEW – 2021; PDF; 1.76 MB) What happens to the risk when two good ideas are combined? To reduce spurious trips of SIFs, many plants moved from 1oo1 or 1oo2 voting on the sensors to 2oo3 voting, a good idea. To improve stability for critical process control loops, many plants went from one or two sensors to three sensors using the mid-value for control (also called median-select), also a good idea. Without really analyzing it, some facilities combined the two ideas, using the mid-value of three sensors for a control loop and then using the same three sensors, voting 2oo3, for an SIF. The intent of the SIF was to protect against consequences that could be caused by a failure of the control loop. This arrangement violates the fundamental premise of LOPA (layer of protection analysis) and ANSI/ISA 84.00.01 (IEC 61511): an independent protection layer shall be independent of the causes of the consequence that the layer protects against. The new configuration must be analyzed by Fault Tree Analysis (FTA), supplemented by Markov analysis. The FTA considers a failure of each of the three sensors and determines which of the remaining devices in the SIF can detect and prevent the consequence. The PFD (probability of failure on demand) is calculated and compared with the PFD of a fully independent 2oo3 sensor SIF. The paper suggests guidance for appropriate use of the combined configuration and suggests how to approximate the risk reduction.
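To make the independence concern concrete, here is a minimal sketch of the two functions sharing the same three readings: mid-value (median) selection for control and 2oo3 voting on a high trip. The trip point, readings, and function names are hypothetical, and this toy example is not the fault tree analysis the paper performs; it only shows how the same sensor failures can both cause the upset and disable the trip.

```python
from statistics import median


def mid_value(readings):
    """Median-select of three transmitter readings, used by the control loop."""
    return median(readings)


def vote_2oo3(readings, trip_point):
    """High trip: True when at least two of the three readings reach the trip point."""
    return sum(r >= trip_point for r in readings) >= 2


TRIP_POINT = 90.0  # assumed high trip setting, in arbitrary units

# The true process value is 95 (above the trip point) in every case below.
cases = {
    "all healthy": [95.0, 95.0, 95.0],
    "one sensor stuck low": [50.0, 95.0, 95.0],
    "two sensors stuck low": [50.0, 50.0, 95.0],
}

if __name__ == "__main__":
    for label, readings in cases.items():
        print(
            f"{label}: control sees {mid_value(readings)}, "
            f"SIF trips: {vote_2oo3(readings, TRIP_POINT)}"
        )
```

With two sensors stuck low, the control loop sees a falsely low mid-value (and could drive the process higher) while the 2oo3 vote cannot reach two votes, so the cause and the protection fail together.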

VIDEO – What is the Real Risk Reduction for 3 Sensors Using the Mid-Value for Control and 2oo3 Voting for Safety? (NEW – 2021; MP4; 65 MB) Video presentation of the above paper.

Human Factors Elements Missing from Most Process Safety Management Systems (NEW – 2021)

Human Factors Elements Missing from Process Safety Management (PSM) Systems (NEW – 2021; PDF; 1 MB) Process safety is about controlling risk of failures and errors; controlling risk is primarily about reducing the risk of human error. Risk-Based Process Safety (RBPS) and alternative standards for process safety (such as US OSHA’s standard for Process Safety Management [PSM] or ACC’s Process Safety Code™ [PSC]) have many elements, and each of these in turn helps to reduce the chance of human error or else helps to limit the impact of human error. But each process safety standard has some weakness in the control of human error. This paper presents an overview of human factor fundamentals, discusses why many PSM systems are weak on human factors, and outlines a comprehensive process safety element on Human Factors. It describes what belongs in each category within the Human Factors element and explains the intent, content, and benefit of each category. The paper also presents examples of Human Factors deficiencies and selected examples of industry practices for human factors control. This paper builds on earlier papers, starting from 2010, on the same topic.

Human Factors Implementation – for Plant Workers (NEW – 2021)

Human Factors Implementation – for Plant Workers (NEW – 2021; PDF; 1.2 MB) Process safety is about controlling risk of failures and errors; controlling risk is primarily about reducing the risk of human error. Often it is believed that human errors committed by plant workers are the cause of most process safety accidents. However, the entire organization contributes to these human errors. Therefore, the organization needs to act to reduce plant worker human errors by improving the Human Factors that contribute to them, and to build an organizational culture that seeks to learn from plant worker human errors rather than to assign blame. This paper introduces the multiple types and categories of human error and the Human Factors that influence the rate at which human errors are made. It establishes the need for creating management systems for these Human Factors and explains how to implement them in a manner that reduces plant worker human errors. Finally, it describes a Safety-Principled Organizational Culture that allows an organization to create the proper environment to enable these critical improvements. This paper builds on earlier papers, starting from 2010, on the same topic. The data presented is from basic research by the authors on the root causes of more than 3000 accidents and near misses, and is also based on the review of hundreds of accidents analyzed by others and summary data from many companies. This Video Presentation and the related slides are even more keenly focused on the selected human factors for which the front line workers should take the lead so that the base human error rate at a site is as low as possible. Case studies and examples are used to illustrate key points.

VIDEO – Human Factors Implementation – for Plant Workers (NEW – 2021; Vimeo/MP4; 1.28 GB) Video presentation of the above paper.

Controlling Human Performance Between Both Unplanned and Planned Tasks within Abnormal Operation Mode (NEW – 2021)

Controlling Human Performance Between Both Unplanned and Planned Tasks within Abnormal Operation Mode (NEW – 2021; PDF; 689 KB) There is confusion in the terminology used in the chemical-related industry for the class of procedures commonly referred to as Abnormal Mode of Operation and Abnormal Situation Management. This paper provides a clear definition of each mode of operation and gives examples of how human performance is controlled for each.

1. Normal – either a continuous mode or a normal batch mode of operation.

2. Planned Non-routine (Non-Normal) – Startup, shutdown, and online maintenance are the main non-routine or non-normal modes of planned operation. But planned Temporary procedures, with time limits, are also part of these.

3. Abnormal – These include the activities covered by generalized procedures or guides (planned in a general sense) and the activities that don’t have written, step-by-step procedures (noting that many companies have guides for most of these abnormal situations). A further breakdown is possible:

a. Response to upsets using a Troubleshooting Guide (TSG) for handling deviations from the operating window (these are normally triggered by an alarm); these may ultimately lead to a shutdown, safe park, or emergency shutdown (all of which are also proceduralized) if the deviation cannot be corrected in time.

b. Response to failures using a Temporary procedure – such as how to run in bypass mode if the flow controller fails and you want to keep running in manual mode.

c. Response to Unanticipated Events – The FIRST occurrence of this is handled by what is generally referred to as Emergency-MOC, which is really saying we will make a change immediately, and do the risk review of the change later, and then learn from this one case and proceduralize into a Temp Procedure or TSG for the next time it comes up.

4. Emergency Operations – a diminished or reduced operating plan; normally a Temporary procedure (similar to 3.b. above).

5. Emergency Shutdown – a TSG that fails to resolve the issue in time will go to this; or for some events such as a sudden loss of containment, we go straight to these.

6. Emergency Response – normally in conjunction with 4 or 5, but focused on the protection of people, assets, and the environment, given that a release or other imminent harm is in play.

This paper will provide a framework to ensure no mode of operation is overlooked and it will help sites understand what is needed to control risk during each mode of operation. Most of the time in the paper and presentation will be focused on classification 3, Abnormal Mode of operation.
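One way a site might use the six-mode classification above as a checklist is sketched here. The enum names follow the list in the abstract; the `modes_without_coverage` helper, the document names, and the idea of a per-site document register are hypothetical and only illustrate how to confirm that no mode of operation has been overlooked.

```python
from enum import Enum


class Mode(Enum):
    """The six modes of operation listed in the abstract above."""
    NORMAL = 1
    PLANNED_NON_ROUTINE = 2
    ABNORMAL = 3
    EMERGENCY_OPERATIONS = 4
    EMERGENCY_SHUTDOWN = 5
    EMERGENCY_RESPONSE = 6


# Sub-cases of the Abnormal mode and the control named for each in the text.
ABNORMAL_CONTROLS = {
    "response to upsets": "Troubleshooting Guide (TSG)",
    "response to failures": "Temporary procedure",
    "response to unanticipated events": "Emergency-MOC, then a Temp Procedure or TSG",
}


def modes_without_coverage(site_documents):
    """Return the modes for which the site has listed no controlling document."""
    return [mode for mode in Mode if not site_documents.get(mode)]


if __name__ == "__main__":
    # Hypothetical document register for one site.
    register = {
        Mode.NORMAL: ["SOP-001 continuous operation"],
        Mode.PLANNED_NON_ROUTINE: ["SOP-010 startup", "SOP-011 shutdown"],
        Mode.EMERGENCY_SHUTDOWN: ["ESD-001"],
    }
    for mode in modes_without_coverage(register):
        print(f"No controlling document listed for: {mode.name}")
```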

VIDEO – Controlling Human Error during Unplanned and Planned Abnormal Situations (NEW – 2021; Video/MP4; 153.2 MB) Video presentation of the above paper.

Legacy Free Resources Documents