Resilience
The Essentials Newsletter, Thirty-second Edition
The last couple of newsletters have focused on hurricane response, given the two major hurricanes that have made landfall in the U.S. over the last month and a half. It’s also been exactly 12 years since Superstorm Sandy hit New York and the mid-Atlantic. The good news is that we are just a month away from the end of hurricane season in the U.S., which officially runs from June 1 to November 30. Let’s all hope that we have seen the last of the major storms this year.
As I think about this hurricane season and evaluate our response and recovery, I’m reminded yet again of the importance of resilience in our critical infrastructure (CI) sectors – especially in the “lifeline” sectors such as water, electric, telecommunications, and transportation. I’m having these deep thoughts sitting in my car typing in the wee hours of the morning on Halloween (my younger daughter is a competitive swimmer, so practice is insanely early). As an aside, it’s bittersweet that my two girls are now old enough to do their own things on Halloween. On the one hand, not as much rushing around on Halloween afternoon, but on the other hand, well, it’s the end of an era in our family – I’m sure many of you can relate.
But back to the topic at hand. Resilience has been much discussed in the electric sector, especially over the last 15 years or so, as we have seen the cybersecurity threat grow, driven by a combination of increased deployment of digital, communications-enabled devices on our grids and increased targeting of these devices/systems by our adversaries. This proliferation, along with ongoing physical security concerns and natural disaster response needs, has focused the electric sector and other CI sectors not only on how to prevent operational consequences from such threats, but also on how to improve recovery when they do result in operational consequences. Such ability to “bounce back” is what we mean by resilience. How quickly can we assess the situation, apply needed remedies, and restore service/operations?
This sounds simple but, of course, is anything but. Even the nomenclature is confusing, and definitions abound – for example, is it “resilience” or “resiliency?” While both are widely used, the grammatically correct term is “resilience.” According to writingexplained.org:
Resilience is a perfectly legitimate noun, but you might be more likely to see its associated adjective resilient in most contexts…Resiliency is a variant of the same noun. Most spell checkers won’t flag it as an error, but usage authorities consider it a needless variant.
Despite the fuzzy grammar (I will try not to judge – taking a deep, cleansing breath), I think the reason resiliency often gets used by industry representatives is its interrelationship with the term “reliability,” which is a term of art describing how dependable (or “reliable”) a CI system is. Reliability at the electric distribution utility level is measured by indices such as SAIDI and SAIFI. According to Power Magazine (powermag.org):
SAIDI (System Average Interruption Duration Index) and SAIFI (System Average Interruption Frequency Index) are widely used reliability indices that measure the performance of power distribution systems. SAIDI represents the total duration of interruptions for an average customer over a given time period, typically a year. It is calculated by taking the sum of all customer interruption durations and dividing it by the total number of customers served. The unit of measure for SAIDI is minutes of interruption per customer. A lower SAIDI value indicates better reliability, as it means customers experienced shorter total outage durations on average. SAIFI, meanwhile, denotes the average number of sustained interruptions experienced by a customer over a given time period, also typically a year. It is calculated by taking the total number of customer interruptions and dividing by the total number of customers served. The unit of measure for SAIFI is power interruptions per customer. A lower SAIFI value indicates better reliability, as it means customers experienced fewer interruptions on average.
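To make the arithmetic in that definition concrete, here is a minimal Python sketch. It assumes a hypothetical record of sustained interruptions in which each event lists how many customers lost service and for how long. The `Interruption` class, the function names, and the sample numbers are illustrative only. They are not drawn from any utility’s data or from the quoted article. The sketch also derives CAIDI and ASAI, which the next paragraph mentions, using their standard relationships: CAIDI = SAIDI ÷ SAIFI, and ASAI = 1 − (customer-minutes interrupted ÷ customer-minutes demanded). Real-world reporting adds refinements that are not modeled here, such as excluding momentary interruptions and major event days.

```python
from dataclasses import dataclass
from typing import Sequence

# Minutes in a non-leap year, used as the default reporting period for ASAI.
MINUTES_PER_YEAR = 365 * 24 * 60


@dataclass
class Interruption:
    """One sustained interruption event (hypothetical record layout)."""
    customers_affected: int   # customers who lost service in this event
    duration_minutes: float   # how long those customers were out of service


def customer_minutes(events: Sequence[Interruption]) -> float:
    """Sum of customer interruption durations (customers x minutes)."""
    return sum(e.customers_affected * e.duration_minutes for e in events)


def customer_interruptions(events: Sequence[Interruption]) -> int:
    """Total number of customer interruptions across all events."""
    return sum(e.customers_affected for e in events)


def saidi(events: Sequence[Interruption], customers_served: int) -> float:
    """Minutes of interruption per customer served over the period."""
    return customer_minutes(events) / customers_served


def saifi(events: Sequence[Interruption], customers_served: int) -> float:
    """Sustained interruptions per customer served over the period."""
    return customer_interruptions(events) / customers_served


def caidi(events: Sequence[Interruption]) -> float:
    """Average restoration time per interrupted customer (= SAIDI / SAIFI)."""
    interruptions = customer_interruptions(events)
    if interruptions == 0:
        raise ValueError("CAIDI is undefined when no customers were interrupted")
    return customer_minutes(events) / interruptions


def asai(events: Sequence[Interruption], customers_served: int,
         period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Fraction of demanded customer-minutes during which service was available."""
    demanded = customers_served * period_minutes
    return 1 - customer_minutes(events) / demanded


if __name__ == "__main__":
    # Hypothetical utility serving 10,000 customers with three outages in a year.
    events = [
        Interruption(customers_affected=2000, duration_minutes=90),
        Interruption(customers_affected=500, duration_minutes=240),
        Interruption(customers_affected=1500, duration_minutes=30),
    ]
    print(saidi(events, 10_000))  # 34.5 minutes per customer
    print(saifi(events, 10_000))  # 0.4 interruptions per customer
    print(caidi(events))          # 86.25 minutes per interrupted customer
    print(asai(events, 10_000))
```

Computing CAIDI directly as customer-minutes divided by customer interruptions is algebraically the same as SAIDI ÷ SAIFI. It avoids dividing two rounded values. In this made-up example, the average customer saw about 35 minutes of outage time. The customers who actually lost power waited about 86 minutes on average. That gap is why utilities track more than one index.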
Other reliability metrics exist, such as CAIDI (customer average interruption duration index), CAIFI (customer average interruption frequency index) and ASAI (average service availability index), but the point is that measuring reliability is obviously important for ongoing system evaluation and improvement. Benchmarking reliability against other utilities in similar circumstances can also help utilities better understand how to improve over time. However, reliability metrics do not consider the “why” of outages. For example, generally speaking, utilities in Florida are going to have more outages due to hurricanes while utilities in Wisconsin are going to have more outages due to snowstorms. Understanding the particular weather hazards, physical security concerns (how exposed are key parts of their system, etc.), and cybersecurity risks (digital assets and their exposure to cyberattack) allows utilities to best evaluate the risks on their particular systems. Evaluating, fully understanding, and managing these risks over time enables utilities and other CI providers to improve their resilience.
For readers of this newsletter, my next comment will be no surprise – CI sectors must also understand the risks faced by other, closely aligned, CI sectors so that they can plan for the possibility that those sectors may also be impacted on a “bad” day. For example, many CI sectors provision their own telecommunications networks (known as “private” networks) to be able to communicate internally when the larger carrier networks are down. This has been the case for many decades, not just since the inception of digitization. Another example is that electric utilities work with companies that specialize in logistics and supply chain to ensure the correct equipment and provisions are available when responding to large natural disasters, such as hurricanes. Representatives of these companies are embedded with their utility clients in the preparation for, and response to, such events.
In addition to deep situational awareness related to their own systems and operations, as well as an understanding of the key overlaps and potential pain-points with other CI sectors, another major factor in improving resilience is situational awareness about the broader threat landscape. Whether via increasingly accurate assessments by weather satellites/analysts or via briefings by federal agencies on the cybersecurity threat landscape, these types of analyses and relationships provide context to CI sectors and help them plan for various eventualities. I have seen an improvement in the relationships between CI sectors and the federal government over multiple administrations, regardless of party. The gaps that still need to be filled, in my opinion, relate to better cross-sector CI coordination and to better cross-agency coordination within the federal government. To put it another way – “vertical” communication has improved, but we need to add more “horizontal” communication and coordination. We also need to engage our state and local government partners more meaningfully in these conversations.
I am a “glass-half-full” type of person and am, therefore, heartened to have seen significant improvement in these areas over time. However, the recent situation in western North Carolina (NC) after Hurricane Helene is a stark reminder that additional improvements are needed. What struck me most in that situation was the lack of access to essentials – when roads have been out in other areas, we have seen water (boats/amphibious vehicles) and air (helicopters) transportation deployed within hours of the event, but it appears to have taken days for those types of resources to be deployed in western NC. So, that was lacking, at least initially. I’ve already spoken about the telecommunications problems. For water, I have now heard that the pipes themselves were damaged in some cases, so back-up power would not have helped in those situations. However, again, if more transportation options had been deployed sooner, potable water and food could have reached people faster, and better overall situational awareness could have given responders and loved ones outside the region more to go on. Electric utilities and others also deploy unmanned aerial vehicles (drones) to assess damage to their own systems. I would be curious whether there was further coordination to use those drones in other places, if needed. Most likely, the answer will be yes, but how timely were those conversations, and could they have been better pre-planned?
All of these scenarios will be dissected in what the CI industries call a “hot wash.” Lessons learned will be applied to future events to improve our resilience and, most importantly, help to minimize or even eliminate any loss of life.
Whether you say resilience or resiliency (tomato/tomahto), we must continue to improve our ability to bounce back – the threats will continue. On that sobering note, I’m going to apply a bit of sugar therapy with my favorite Halloween candy…