Before we start, note that the priorities for Process Safety are skewed much more toward engineered features. For Occupational Safety, the priority should be:
- Optimize human factors as far as practical to consistently achieve the lowest average base human error rate, roughly 1/100. This residual error rate is nearly impossible to break below as an annual average (though data from black boxes on planes indicate that 1/200 is achievable in very stringent systems). Of course, for some “reflex” actions in work that is repeated many times a day, the error rate can be driven lower (to about 1/1000) for those tasks. Note that improving management systems will not reduce the residual error rate, but multiple layers of protection or redesign can reduce the probability of the accident/loss.
- Use a peer-to-peer observation approach to help reduce residual errors that may be habits (though many residual errors are random in nature rather than habits).
- Establish a good near-miss reporting system and follow-through system to address errors, failures, and faults (including broken pavement) before a loss occurs.
- Establish and maintain management systems to sustain these error rates and to proactively look for system weaknesses (for instance, a good reliability system will proactively inspect concrete and structural systems, and will proactively inspect and test all protection systems).
- Make sure all investigations get to the underlying reason why the error or fault occurred, and recognize when the practical limits of error reduction have been reached (so that it is apparent when other layers of protection or re-design become necessary).
- Make sure the residual risk is low enough (using risk tolerance criteria agreed to by stakeholders); if it is not low enough, add independent protection layers or re-design to make the system inherently safer (less failure prone or less error prone).
- Manage the risk of changes.
These have been the basics of risk management for decades… they work. But one thing to note is that there are limits to the control of human error: there is a floor on the error rate that seems pretty hard to break below, and improving management systems runs out of steam there. Other measures (re-design, or perhaps adding new independent protection layers) are needed to lower the risk further.
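
To make the arithmetic behind that last point concrete, here is a minimal sketch of how independent protection layers multiply down the frequency of a loss once the human error rate has hit its floor. Only the 1/100 base error rate comes from the text above. The task count, the per-layer failure probabilities, the tolerance criterion, and the function name `residual_frequency` are all hypothetical values chosen for illustration, and the calculation assumes the layers fail independently of each other and of the initial error.

```python
from math import prod

def residual_frequency(tasks_per_year, human_error_rate, layer_failure_probs=()):
    """Expected loss events per year, assuming every layer fails independently.

    Each entry in layer_failure_probs is the chance that one protection
    layer fails to stop the error when it is needed.
    """
    return tasks_per_year * human_error_rate * prod(layer_failure_probs)

# Base residual human error rate cited in the text.
BASE_ERROR_RATE = 1 / 100

# Hypothetical inputs, for illustration only.
TASKS_PER_YEAR = 50        # assumed: how often the task is performed
TOLERABLE_PER_YEAR = 1e-3  # assumed: criterion agreed to by stakeholders

scenarios = {
    "no extra layers": [],
    "one layer": [0.1],          # assumed failure probability
    "two layers": [0.1, 0.01],   # assumed failure probabilities
}

results = {
    name: residual_frequency(TASKS_PER_YEAR, BASE_ERROR_RATE, layers)
    for name, layers in scenarios.items()
}

for name, freq in results.items():
    verdict = "meets" if freq <= TOLERABLE_PER_YEAR else "exceeds"
    print(f"{name}: {freq:.4g} events/year ({verdict} the assumed criterion)")
```

With these assumed numbers, no amount of management-system improvement changes the first line of output, because the 1/100 term is fixed. Only adding layers (or redesigning so the task is performed less often or not at all) moves the result below the criterion.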