We do NOT use a risk matrix of any kind in PHAs (HAZOPs) if we get our way. On 20% of the PHA/HAZOPs we do, we have to use a matrix, and we manage such analyses as well as anyone can (we have led 10,000 PHAs/HAZOPs of entire units). But yes, about half of the time the team will make the matrix location match their qualitative judgment. By the way, their qualitative judgment is usually more accurate (we feel) than using numbers or matrices. Note that ALL numbers that folks use in matrices come from some other source, and if you track those sources down, most are either entirely expert opinion on the "average" value to use or are "adjusted by expert opinion". I know this because I'm on the committees that come up with the numbers/factors that others use.
If you get your team well calibrated (this usually happens on the first day of meetings for a group of folks) and the team leader is good (not sure what % of the time this is true; we have seen many poor leaders who were nonetheless leading PHA/HAZOP teams), then the team's qualitative judgment is better than having the team qualitatively judge which numbers provided by others to use for their site. Also note that site-specific data is what we prefer, and normally the only decent source of such data is embedded in the expert opinion.
Finally, note that the main reason I was on the first LOPA method development was to give us a good way and rationale for deleting the use of risk matrices from qualitative analysis. The reason Martin Gollin was involved was to help Arco Chemical have a good way to defend against arbitrary overspecification of SIFs (and avoid specification of high SILs). The reason Art Dowell did a parallel development at Rohm & Haas (now Dow) was to have an approved methodology "after" (not within) HAZOP but before QRA (with which he could resolve 90+% of recommendations that called for further analysis).
The key thing to note is that everyone wants to do an accurate "assessment of risk"... but there is really no such thing, since the data has error factors of an order of magnitude on either side of the average. But as technical types, we have trouble admitting we can't really do something accurately. So another goal is to be consistent. Can LOPA (or any consistent factoring) help with this? Sure. But you can also get your HAZOP teams just as consistent within a company WITHOUT LOPA.
Note that I am NOT pushing use of LOPA, yet I am one of the co-originators, have perhaps trained more LOPA folks than others have, and I help write the books. There is a reason I do not overly push it. But I do think LOPA is better to use than QRA in 95% of the cases where folks want to use QRA... that is because the error factors do not support the use of QRA, in my opinion... but now I just said "your baby is ugly" to a lot of risk analysts (mainly consultants) around the world. One way to see my last point is to go back to the basic rules for establishing significant digits... once you look at the error factor for the data we use, you will see that the only significant digit is the exponent. So there is no such thing as 2.3 x 10^-3 in risk assessment. There is only 10^-2, +/- 1 on the exponent.
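The significant-digits point above can be put into a few lines of Python. This is only a sketch: the function name and the default error factor of 10 (one order of magnitude either side, as described earlier) are illustrative assumptions, not anything from the LOPA books.

```python
import math

def order_of_magnitude(value, error_factor=10.0):
    """Reduce a frequency or probability to its exponent plus an uncertainty band.

    Returns (nearest exponent, low exponent, high exponent).
    error_factor is an assumed multiplicative uncertainty on either side.
    """
    if value <= 0 or error_factor < 1:
        raise ValueError("value must be > 0 and error_factor must be >= 1")
    exponent = round(math.log10(value))            # the only digit worth keeping
    spread = math.ceil(math.log10(error_factor))   # band width in whole decades
    return exponent, exponent - spread, exponent + spread

# A value quoted as 2.3e-3 carries no more information than this band of exponents.
nearest, low, high = order_of_magnitude(2.3e-3)
print(f"10^{nearest} (somewhere between 10^{low} and 10^{high})")
```

Reporting the band alongside the exponent is one way to show the uncertainty that usually goes unstated.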
Ask yourself why no one shows the error range in the data. The uncertainty? I wish they would, because otherwise it is misleading. I wish we had done so in the first LOPA book (instead of only stating this issue in general).
a5903a71-2271-4a91-b204-47b8eb0e59b2|0|.0
The question often raised is “Should LOPA be combined with the HAZOP meeting?” To set the background, I'm one of the three originators of LOPA in the mid-1990s, co-authored the first textbook on LOPA (CCPS/AIChE, 2001), and am primary author of the second textbook, issuing in a few months: Guidelines for Independent Protection Layers and Initiating Events (CCPS/AIChE, 2012). We have led thousands of PHAs (HAZOPs), both with and without LOPA integrated. On the ones with LOPA included, we have done everything from ALL scenarios requiring LOPA (100% LOPA) down to 10% of scenarios requiring LOPA during the meetings. Based on these trials, it is MUCH better to do LOPA after the HAZOP is completed and to use LOPA only for those scenarios that the HAZOP team found too confusing to make a qualitative risk decision on in the HAZOP setting.
There are many reasons, based on data, for this decision, which include:
- Brainstorming in the HAZOP is maximized because you keep analytical discussions out of the HAZOP meetings. The HAZOP (or any qualitative analysis approach using brainstorming methods) is your only chance to find scenarios (LOPA does not help with that chore), so you should do nothing to limit brainstorming.
- LOPA can easily be done by one analyst (there is no need for a team, and a team is not even recommended; read the LOPA book and the many papers since then for a description of same). So why waste everyone else's time on LOPA when one person can do it very well (with a little help from one or more folks on some LOPAs)?
- Only about 3-5% of scenarios are complex enough to need LOPA; why do LOPA on more than is needed?
- Doing LOPA in the meetings will frustrate many team members as they will see it as a waste of time; this will further hurt brainstorming.
We realize many are pushing LOPA + HAZOP combos as something "new and improved"... it is not... in fact, LOPA was invented to remove semi-quantitative analysis from HAZOP meetings. So combining LOPA + HAZOP is taking a step 20 years back in time and ignores one of the main reasons for the development of the rules for LOPA.
To find out more, visit www.piii.com and download the papers there on LOPA. Or schedule to attend our course on LOPA:
http://www.process-improvement-institute.com/LOPA.html
fdd32019-7e3f-4e8f-9a74-ee01d03c82cf|0|.0
The real issue is: will your SIL 2 SIF lower the risk of the final consequence by a factor of 100, and similarly, will your SIL 3 SIF lower the risk by a factor of 1000? If not, and if a SIL 2 or 3 SIF was required for your scenario to reach tolerable risk, then you have not accomplished your duty to lower the risk to tolerable levels. One way to see the problem is to consider the system boundary illustration below:

Currently, more than 95% of the SIL Verifications we have reviewed, and nearly ALL of the internal company standards for SIL Verification calculations, miss the systemic error and the specific human error, and especially miss the huge contribution of human errors during maintenance and during process startups after outages. Because of these omissions, the End Users (owners) have bought and installed supposedly high integrity systems that will in practice perform no better than a BPCS loop or SIL 1 SIF. Some companies now realize this, but the standards and technical reports (guidance) from the international committees for SIS have not yet been amended to account for such human errors. Of the systemic and specific human errors, the major ones that degrade the SIL are the time-zero probability of leaving an entire SIF in bypass (intentionally or unintentionally) and the probability of leaving a root valve on a sensor/transmitter in the wrong position. Given normal baseline human error rates, such probabilities are greater than 0.01, so the PFD of the entire SIF is greater than 0.01, and so a SIL 2 (let alone a SIL 3) cannot be achieved in actual use of the SIF.
On the other hand, if the SIL Verification protocol required specific (discrete) consideration of systemic error, and especially of human error probability for interventions, then it is likely that some of the errors could be made detectable and therefore minimized. But note that in many applications it has not been possible to achieve a SIL 3 (with a PFD <= 0.001) when there is a system bypass (soft or hard bypass) available to the end users.
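To make the arithmetic concrete, here is a rough Python sketch of what happens to a "SIL 3 on paper" SIF once a bypass-left-open probability is added to the hardware PFD. The function names are illustrative, the simple addition is a rare-event approximation, and the exact handling of band boundaries is an assumption; the example numbers (hardware PFD of 0.0005, human error probability of 0.02) are placeholders, not site data.

```python
def sil_band(pfd):
    """Map a low-demand PFD to a SIL band (0 means no better than a BPCS loop)."""
    if not 0 < pfd <= 1:
        raise ValueError("pfd must be in (0, 1]")
    if pfd < 1e-4:
        return 4
    if pfd < 1e-3:
        return 3
    if pfd < 1e-2:
        return 2
    if pfd < 1e-1:
        return 1
    return 0

def pfd_with_human_error(pfd_hardware, p_left_in_bypass):
    """Add the probability of the SIF being left in bypass to the hardware PFD."""
    return min(1.0, pfd_hardware + p_left_in_bypass)

hardware_only = 5e-4                                  # hypothetical verified hardware PFD
in_practice = pfd_with_human_error(hardware_only, 0.02)

print("On paper:    SIL", sil_band(hardware_only), "RRF", round(1 / hardware_only))
print("In practice: SIL", sil_band(in_practice), "RRF", round(1 / in_practice))
```

With these placeholder numbers, the risk reduction factor drops from about 2000 to about 49, which is SIL 1 territory.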
You can download a free paper on this issue, with a couple of worked examples, from:
http://www.process-improvement-institute.com/_downloads/Accounting_for_Human_Systematic_Error_During_SIL_Verification_website.pdf
In addition, the very new book from CCPS/AIChE, Guidelines for Initiating Events and Independent Protection Layers, 2012 (at the publishers now), notes the same issue with high integrity safety systems (such as SIL 2 and higher, and such as relief systems) and demotes the PFD available from such systems unless the systemic error has been accounted for and addressed.
To learn more, see the courses and consulting services from PII. www.piii.com
ec94c3aa-9321-4bbc-a721-d9ec68481624|0|.0
An update of the definitive paper on Near Miss Reporting can be downloaded for FREE:
http://www.process-improvement-institute.com/_downloads/Gains_from_Getting_Near_Misses_Reported_website.pdf
This was just updated last month and contains the latest data. It is an update of the papers produced in 1997 and 2008. There are many other great papers there as well. Visit: www.piii.com for a complete listing of our free papers, and to keep track of upcoming public courses visit:
http://www.process-improvement-institute.com/calendar.html

5a359aa7-f6a1-4929-98ce-1505a30a3a11|0|.0
Toyota estimates that their workers make 20,000 errors on average for each major loss event. Many chemical and manufacturing companies estimate the same order of magnitude: about 10,000 human errors and failures per loss event (minor thru major). The data we have seen also indicates there are perhaps 1,000,000 errors (in operation, maintenance, procurement, etc.) per major (catastrophic) event. Airlines estimate about the same ratios; they record roughly 1,000,000 errors between crashes. They also have the lowest operator (pilot) error rate of any industry: based on cockpit data, the flight crew makes 1 mistake per 200 steps. This is half the lowest rate we have recorded in the process industry and is 3 times lower than the average error rate in the process industry. They likely achieve this because they control human factors (performance shaping factors) better than any other industry. None of these organizations believe in zero error as a goal, of course. They do believe in achieving low error rates and in having multiple backups (independent protection layers) for when a failure or error occurs.
c0e59ed6-bfd0-4d5c-903e-05f0281a7815|0|.0
A human independent protection layer (IPL) can be a valid and valuable IPL. A human can be combined with an SIS-based alarm, but the human would then be part of the Logic Solver (decides what to do) and part of the final control element. The criteria for counting human IPLs will be more completely defined in the upcoming textbook, Guidelines for Independent Protection Layers and Initiating Events, CCPS/AIChE, 2012 (pending early 2012). The human IPL needs to be specified and validated similarly to other IPLs; part of the validation is testing of the hardware functioning up to the annunciation, and part is testing (drills) of the human, who must (1) complete the action within the maximum allowable response time (MART) and (2) be out of harm’s way by then as well, since a human IPL is not valid if it places the responder (trouble-shooter) in harm’s way.
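The two drill criteria above can be written as a simple check. This is only an illustrative sketch: the function name, the minutes unit, and the rule that every recorded drill must beat the MART are assumptions made for the example, not criteria taken from the textbook.

```python
def human_ipl_passes_drills(drill_times_min, mart_min, responder_stays_safe):
    """Check drill results against the two criteria for a human IPL.

    drill_times_min: response times recorded in drills (minutes, assumed unit)
    mart_min: maximum allowable response time (MART) for the scenario
    responder_stays_safe: True if the action keeps the responder out of harm's way
    """
    if mart_min <= 0:
        raise ValueError("MART must be positive")
    if not drill_times_min:
        return False  # no drills recorded means the IPL has not been validated
    within_mart = all(t <= mart_min for t in drill_times_min)
    return within_mart and responder_stays_safe

# Hypothetical drill record: three drills against a 10-minute MART.
print(human_ipl_passes_drills([6, 8, 9], mart_min=10, responder_stays_safe=True))
```

Validation of the hardware up to the annunciation is a separate test and is not covered by this check.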
cb50cb46-cb42-43ca-b8de-516155c34628|0|.0
IEC 61511 mentions that systemic errors should be controlled and also estimated in the SIL Verification. But ISA folks say they found it too complicated to include a calculation method for systemic error in the TR84.00.02 guidance, which is why we almost never see it considered in actual practice. There is systemic error for process-related failures (such as pluggage by dust, scale, etc., of level taps, instrument ports, etc.). There are also systemic errors for the human intervention to test the SIF or to return it to automatic service after a bypass. This is especially true for continuous processes where bypasses are needed and where root valves for instruments are needed. We are about to publish a paper on a simple method for estimating systemic human errors; it is just a simplified estimation of systemic errors for each software/security bypass and each bypass valve or root valve. The human error probability is of course an "OR gate" across each opportunity for leaving a bypass or valve in the wrong position. The error rate is in turn estimated from standard HRA approaches, but simplified a bit with respect to estimating the "error recovery rate." These estimates are adjusted for the rigor of human factor control at the site and are SITE specific.
There are 10 human factor categories, which need to be estimated for the site to get an adjustment of the baseline human error rate that makes it fit a specific site. The best (lowest) error rate we have seen in practice at a chemical plant, refinery, or gas plant is about 1 error per 100 steps of an instruction. But this is heavily influenced by the human factors of fitness for duty (such as fatigue), miscommunication, quality/accuracy of procedures, use of a checklist at each step in the field, etc. We have not (yet) found any facility that had actual data on human error rates, though there are folks moving that way. So, if the actual human error probability is 0.02 for a step and there is a step to open a bypass valve after a test of a final isolation valve (or a step to re-open a root valve of a level sensor), then that factor would be added to the PFD calc for that portion of the SIF, and so the PFD in the SIL verification would be > 0.02 for that portion.
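As a rough illustration of the "OR gate" across opportunities, the Python sketch below combines several per-step human error probabilities into the probability that at least one bypass or root valve is left in the wrong position. The function name is illustrative, the 0.02 values are the example figure used above rather than site data, and treating the errors as independent is a simplifying assumption that will tend to understate the problem when one person makes the same kind of mistake repeatedly.

```python
import math

def or_gate(probabilities):
    """Probability that at least one of several independent errors occurs."""
    probs = list(probabilities)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    # 1 minus the chance that every step is done correctly
    return 1.0 - math.prod(1.0 - p for p in probs)

# Two opportunities on one SIF: re-open a bypass valve, re-open a root valve.
p_any_left_wrong = or_gate([0.02, 0.02])
print(round(p_any_left_wrong, 4))
```

For small probabilities the result is close to the simple sum of the individual values, which is why adding the human error term directly to the PFD calc is a reasonable shortcut.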
Then the next issue is the common cause error for the human error. If a person leaves one bypass open (or one root valve closed), it is likely that the same person will make similar mistakes on the same day. Compound this with the fact that such maintenance/inspections are not staggered (most sites use the same instrument tech on the same day to test/check many SIFs) and you can start to see the problem. By the end, it is likely you are adding 0.01 or 0.02 to the entire hardware SIL calc; obviously, in such cases, a SIL 2 or SIL 3 is not possible. But in batch processes the issue is not the same, and in a continuous process there may be ways for instruments, transmitters, and limit switches to detect some of these errors. These have to be accounted for as well. But you cannot ignore the composite human errors for testing/validating SIFs; they are the dominating portion of the PFD for SIL 2 and SIL 3 in most cases we have checked or verified.
The new book from CCPS/AIChE, Guidelines for Independent Protection Layers and Initiating Events, will say the same thing as above; i.e., SIL is not equivalent to the risk reduction factor (inverse of PFD) unless the human error probability is considered in the SIL verification calc. It is the same with other IPLs as well. If you do not account for the human errors in interventions with relief valves, then the PFD is not equal to the pop test data. Bottom line: you can’t hope to be accurate here (since there is not enough plant-specific data), but you also do not want to omit the greatest of the factors.
Come to one of our training courses or download our papers to learn more. www.piii.com
b31d8039-d64f-438b-b36d-81bd9eb99aa8|0|.0
A team is neither necessary nor required (this is true for 95% of the LOPAs we have done since 1996). If the PHA/HAZOP was documented well, a single analyst can complete the LOPA in about 20-60 minutes, with a phone call or two to a team member for some of the LOPAs. The originators (or 2/3 of us) do not recommend a team. The single analyst is described in the LOPA book (CCPS, 2001), with a team being mentioned as required for some confusing scenarios. It is best if the LOPA analyst was the PHA/HAZOP team leader. The analyst should have an excellent background in human factors (the real key to risk analysis, since systemic human errors dominate the risk factors) and he/she should be trained in actual LOPA (per the textbook rules, and later per the rules of the IPL/IE book coming out this spring), not some third-party made-up version of LOPA. So: 20+ years of experience in plant operations and engineering, with an engineering degree (Chem Eng preferred), and with proven analytical skills... essentially the same skills/experience we want in PHA/HAZOP leaders.
cf4867fd-af8c-4e4e-9d3c-91f6b5302498|0|.0
Before we start, note that the priority for Process Safety issues is much more skewed toward the engineered features in process safety. For Occupational Safety, the priority should be:
- Optimize human factors to be as good as possible, to consistently achieve the lowest average base human error rate of 1/100 (ish)... this residual error rate is nearly impossible to break below for an annual average (though data from black boxes on planes indicate 1/200 is being achieved in very stringent systems). Of course, for some "reflex" actions in work that is repeated many times a day, the error rate can be driven lower (to about 1/1000) for those tasks. Note that improving management systems will not reduce the residual error rate, but multiple layers of protection or redesign can reduce the probability of the accident/loss.
- Use a peer-to-peer observation approach to help reduce residual errors that may be habits (but many of the residual errors will be random in nature as well, and not be habits).
- Establish a good near miss reporting system and follow-thru system to address errors, failures, and faults (including broken pavement) before a loss occurs.
- Establish and maintain management systems to maintain the error rates and to proactively look for system weaknesses (for instance, a good reliability system will inspect concrete and structural systems proactively, and will inspect and test all protection systems proactively).
- Make sure all investigations get to the underlying reason why the error or fault occurred, and recognize when the practical limits of error reduction are reached (so that it is apparent when other levels of protection or re-design become necessary).
- Make sure the residual risk is low enough (using risk tolerance criteria agreed to by stakeholders); if it is not low enough, add independent protection layers or re-design to make the system inherently safer (less failure prone or less error prone).
- Manage the risk of changes.
These have been the basics of risk management for decades... they work. But one thing to note is that there are limits to the control of human error; there is a lower bound on the error rate that seems pretty hard to break below, and improving management systems runs out of steam there. Other measures (re-design, or perhaps adding new independent protection layers) are needed to lower the risk further.
0a504a0b-8f7b-406f-a4d7-c48354dbfb2e|0|.0
It is important to know what certification means. Mostly, certification is just a way for the certifier to make money off of the unsuspecting purchaser. Certification in this or that is big business. But in most cases, certification of proficiency is not easily possible. This is why the folks in the Chemical Process Safety field have not yet authorized or approved any “international certification” for PHA/HAZOP leadership. With that said, many companies request that their staff be “independently” certified. So, what do we do? We have the best reputation in the industry, since we have done more training (7,000+ graduates in PHA/HAZOP) and done more PHAs/HAZOPs (10,000+ unit-wide analyses). So, at PII we offer the following:
- Graduates of the 4- or 5-day course are awarded a “Certificate of Completion” of the training. After 4 or 5 days, the instructor will have a decent feeling (judgment) on whether someone can progress to “proficient” within a reasonable period of time. But after leaving class, a lot of practice is needed. The same is true of college degrees. When you finish, the university certifies that you have completed the curriculum to be, say, a “Chemical Engineer”. But what does that mean? It means the graduate is capable of learning how to do useful things, such as designing a part of a process or learning how to manage a project. But upon graduation, the individual is not a “proficient” engineer. Proficiency can only be determined by watching someone while they do the assignments and then coaching them through the weaknesses observed; only an expert can serve in the coaching/watching role. Class time includes a lot of practice time (70%), but that is not enough time to make a judgment on proficiency for something as large as a PHA/HAZOP.
- Of the 7,000+ folks we have trained, we have certified about 150 as proficient. How did we do that? We watched them as they prepared for an actual PHA/HAZOP, then watched and coached them while they led it, then watched and coached them during documentation. This normally takes 1 or 2 additional weeks of observation. If you do not have someone within your organization who is qualified to do this coaching, then it is best to bring someone in to coach you through a major PHA/HAZOP; this normally takes 1-2 weeks of consulting effort.
ca06ba0e-6a0a-440a-b75f-1bdaddfae175|0|.0