What Happens When Your Resiliency Plan Fails?
We expect power to be available at the flip of a switch. Unfortunately, bad weather, animals and equipment failure across the electric grid can threaten energy access. These are very real risks to our energy systems and businesses.
If you are a provider of critical services, you can't simply halt business or production during an outage, whether it is caused internally or externally. You may already have a fleet of generators installed for emergency power, but even backup generators fail if they are not monitored and properly maintained.
Are there flaws to your resiliency plan? Answer the following questions to help build a stronger plan for your business.
2. What is the financial impact of infrastructure failure or extended loss of power?
3. Do you have a strategic risk mitigation plan?
4. Has key infrastructure maintenance been delayed due to enterprise competition for capital?
5. Do you have aging infrastructure, and have you analyzed its remaining useful life?
Common energy resilience systems include:
- Uninterruptable power supplies (UPS)
- Backup generators
- Solar and/or wind with battery storage
- Distributed energy systems
While these resilience systems are necessary, manufacturers are also losing the technical staff needed to operate and maintain them. Energy resilience planning that overlooks the availability of technical and operations staff may fall short of expectations.
Outages are, by definition, unplanned downtime
Weather happens and trees fall at the most inconvenient times. The chart below shows that major power outages are caused most often by weather and tree-related events.
Many businesses invest in resilience systems and have business continuity plans in place to help ensure ongoing and uninterrupted operations. These businesses recognize the cost of downtime and have made the necessary investments to curtail losses. When these systems work as designed, meaning everything starts up and runs exactly when needed, your continuity plans are validated.
Worst-Case Scenario – Your Resilience System Fails to Start or Stops Running
- Why would this scenario happen?
- Does it ever happen?
- How can you guard against this scenario?
All good questions. It usually begins with continuity planning defects.
What’s Missing in Many Continuity Plans?
*Source: Business Continuity Institute
Lack of Testing and Maintenance
We reviewed a document from the Office of the Assistant Secretary of Defense entitled “Energy Resilience: Operations, Maintenance & Testing (OM&T) Strategy and Implementation Guidance,” dated March 2017. In it, the Department of Defense (DOD) explains energy resilience and related OM&T.
The document states that energy resilience can be achieved in a number of ways, including redundant power supplies (generators); integrated or distributed fossil, alternative or renewable energy technologies; microgrid applications and storage; diversified or alternate fuel supplies; upgrading, replacing, operating, maintaining or testing current energy generation systems, infrastructure and equipment; and mission alternatives such as reconstitution or mission-to-mission redundancy.
As with any working Continuity Plan, success depends on establishing strategies, processes and procedures that clarify objectives, implement the resilience plan, develop operations, and define metrics to measure performance.
The plan should aim for a cost-effective approach to improving resilience system performance and reliability. Testing, both during implementation and throughout the life of the resilience system, is conducted to confirm that the system will operate as needed when outages occur. Integrated system tests, which are often overlooked, help uncover many issues that might otherwise surface during an outage.
As part of the Continuity Plan, maintenance intervals for the resilience system and its associated fuel depend on the mission criticality of the operating environment.
A successful resilience program, as detailed in the Continuity Plan, includes both routine and full-scale maintenance. Resilience systems that are not tested and maintained may not start and run properly during outages. Will your investment in resilience pay off during an outage?
Research On Failures During Outages
Unfortunately, there is far less data available on failures to start and run than on outage costs in general. We've heard stories of older systems failing to start and run to load up to 38% of the time. Industry norms suggest that systems fail to start and run to load roughly 12%-15% of the time, on average, when outages occur. We believe that with an ongoing Continuity Plan in place, these systems should fail to start and run to load only 1%-5% of the time when an outage occurs. Achieving these metrics takes dedication to testing and maintenance.
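To put these percentages in perspective, the short Python sketch below estimates the chance of at least one failure to start across several outages. It assumes, as a simplification, that each outage is an independent start attempt with the same failure probability. The rates come from the figures above; the count of five outages per year is a hypothetical value chosen only for illustration.

```python
def prob_at_least_one_failure(p_fail: float, outages: int) -> float:
    """Probability that at least one of `outages` independent start attempts fails.

    Assumes every outage is an independent trial with the same per-outage
    failure-to-start probability `p_fail` (a simplifying assumption).
    """
    return 1.0 - (1.0 - p_fail) ** outages


# Per-outage failure-to-start rates cited in the text above.
scenarios = {
    "older systems (up to 38%)": 0.38,
    "industry norm, upper (15%)": 0.15,
    "industry norm, lower (12%)": 0.12,
    "with Continuity Plan, upper (5%)": 0.05,
    "with Continuity Plan, lower (1%)": 0.01,
}

OUTAGES_PER_YEAR = 5  # hypothetical value for illustration only

for label, p in scenarios.items():
    risk = prob_at_least_one_failure(p, OUTAGES_PER_YEAR)
    print(f"{label}: {risk:.1%} chance of at least one failure in {OUTAGES_PER_YEAR} outages")
```

Under these assumptions, a 15% per-outage failure rate gives roughly a 56% chance of at least one failure over five outages, while a 5% rate brings that down to about 23%.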
Outages Caused by Generator Failure
Here are some stories about worst-case scenarios.
- Several generators at a data center failed to start during a grid outage. Many of the web's most popular destinations were offline for several hours. The data center uses a flywheel UPS system – rather than batteries – to provide “ride-through” electricity to keep servers online until the diesel generators can automatically start up and begin powering the facilities. The generators had to be started manually, a step that was delayed until staff could respond. Several major customers were affected. (Reference)
- During a severe storm, power from the utility was knocked out. A client hosted at one of the data centers indicated that a UPS system initially functioned but failed during the switch from battery to generator power. Even though the company had invested in a resilience solution, recovering from the outage took two to three hours. (Reference)
- A very large data center suffered an outage that took more than two hours to restore. In this case, a module on one of the UPS systems failed. The data center had to replace power supplies in servers, replace firewalls, reconfigure switches and manually log on to servers to get them to boot properly. (Reference)
- A backup generator failure left Pomona students in the dark. Power was lost in dorms, dining halls, academic and administrative buildings for hours. The failure had a cascading effect that required the shutdown of the feeder that provided power to part of the campus. (Reference)
Ways to Reduce Your Exposure to Energy Resilience System Failure
As a practical matter, not every system failure can be prevented, but focusing on testing and maintenance, as well as measuring performance, is critical. Investing in and regularly updating your Continuity Plan, backed by ongoing testing and maintenance, can help ensure your systems are ready to start when they are needed.
Tracking and maintaining system availability metrics will reduce your exposure to system failure.
Here are some common failure points to include in your checklist:
- Controls “not in auto”
- Ran out of fuel
- Battery failure
- High fuel alarm
- Oil, fuel or coolant leaks
- Breaker trip
- Low coolant levels and temperature alarms
- Air in the fuel system
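The checklist above pairs naturally with the availability metrics mentioned earlier. The Python sketch below shows one possible way to log start-and-run-to-load tests, count which checklist items each test surfaced, and compute a fail-to-start rate that could be compared against the 1%-5% target described earlier. The `TestRun` record format and the function names are hypothetical, chosen for illustration; they do not describe any particular monitoring product.

```python
from dataclasses import dataclass, field

# Failure points taken from the checklist above.
CHECKLIST = (
    "controls not in auto",
    "ran out of fuel",
    "battery failure",
    "high fuel alarm",
    "oil, fuel or coolant leaks",
    "breaker trip",
    "low coolant levels and temperature alarms",
    "air in the fuel system",
)


@dataclass
class TestRun:
    """One start-and-run-to-load test (hypothetical record format)."""
    started_and_ran_to_load: bool
    findings: set[str] = field(default_factory=set)  # checklist items observed


def fail_to_start_rate(runs: list[TestRun]) -> float:
    """Fraction of test runs in which the system failed to start and run to load."""
    if not runs:
        raise ValueError("no test runs recorded")
    failures = sum(1 for r in runs if not r.started_and_ran_to_load)
    return failures / len(runs)


def open_findings(runs: list[TestRun]) -> dict[str, int]:
    """Count how often each checklist item was found, omitting items never seen."""
    counts = {item: 0 for item in CHECKLIST}
    for r in runs:
        for item in r.findings:
            if item not in counts:
                raise ValueError(f"unknown checklist item: {item}")
            counts[item] += 1
    return {item: n for item, n in counts.items() if n > 0}


# Example: four hypothetical monthly tests.
runs = [
    TestRun(True),
    TestRun(False, {"battery failure"}),
    TestRun(True, {"low coolant levels and temperature alarms"}),
    TestRun(True),
]
print(f"Fail-to-start rate: {fail_to_start_rate(runs):.0%}")
print(open_findings(runs))
```

Note that a run can start and reach load yet still surface a finding, such as a low coolant alarm, which is why the sketch tracks findings separately from the pass/fail result.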
Consider Outsourcing Energy Resilience System Testing and Maintenance
- Experience and risk management support: We provide technical and equipment expertise to help your business perform better. Outsourcing certain functions to a team with the right skill sets and a clear understanding of the responsibilities helps mitigate potential risks and problems.
- Proven technology and dedicated resources: The continuous operation of your business requires clean, reliable and constant power. Power interruptions can result in lost sales, products and revenue. We dedicate resources to maintaining your equipment and provide warranties for the term of the agreement.