One of the most common public key infrastructure (PKI) challenges organizations face is dealing with outages caused by certificates that expire unexpectedly. Unfortunately, rather than getting this challenge under control, many are experiencing more frequent and severe outages.
Why are outages increasing? And what does that mean for the future of enterprise PKI?
Ponemon Institute’s second-annual State of Machine Identity Management Report, which includes survey responses from over 1,200 global IT and security leaders across 12 industries, digs into these questions and more to take a closer look at identity management challenges and opportunities currently facing organizations.
Among the many trends uncovered in the report, one of the most notable is a recent spike in certificate-related outages. In this blog, we’ll unpack why that is and how organizations can combat outages with better control and automation. First, let’s start with the root cause.
92% of organizations experienced at least one outage in the past 24 months
There’s no doubt about it: The frequency and severity of certificate-related outages are increasing, with nearly all respondents (92%) reporting at least one outage due to an expired or misconfigured certificate in the past 24 months.
This increase is likely driven by the September 2020 change that cut the maximum lifespan of publicly trusted SSL/TLS certificates roughly in half, from 825 days to 398 days. Although we’ve been talking about this change for years, many organizations are only now realizing its full effect, as security teams are forced to renew certificates after one year rather than two.
Organizations do recognize this situation: 65% admit that shorter SSL/TLS lifespans are increasing the workload on their teams and the risk of outages, a 10% increase from 2021. Even so, the new challenges this change brings don’t appear to be dissipating any time soon. Nearly two-thirds of respondents rank outages due to expired certificates as likely or very likely to continue occurring over the next two years.
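To put the move from two-year to one-year renewals in numbers, here is a minimal sketch of the arithmetic. The fleet size of 1,000 certificates is a hypothetical figure chosen for illustration, not a number from the report, and the function name is our own.

```python
def renewals_per_year(cert_count: int, lifetime_days: int) -> float:
    """Average number of renewals per year for a fleet of certificates.

    Assumes every certificate is renewed exactly once per lifetime.
    """
    return cert_count * 365 / lifetime_days


FLEET_SIZE = 1000  # hypothetical number of certificates, for illustration only

two_year = renewals_per_year(FLEET_SIZE, lifetime_days=730)
one_year = renewals_per_year(FLEET_SIZE, lifetime_days=365)

print(f"Two-year certificates: {two_year:.0f} renewals per year")
print(f"One-year certificates: {one_year:.0f} renewals per year")
```

Under these assumptions, halving the lifespan doubles the renewal workload, and every one of those renewals is a chance for an untracked certificate to lapse.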
68% say it takes 3-4 or more hours to recover from an outage
Organizations also face challenges in effectively recovering from outages when they do occur. In fact, the average time to recover is 3.3 hours, with 68% of organizations reporting a recovery time of 3-4 hours or more.
What’s driving this long recovery time? A lack of visibility and centralized management. Specifically, 55% of respondents say they don’t know exactly how many keys and certificates their organization actually has. This situation makes responding to an outage difficult.
That’s because resolving an outage isn’t as simple as “replacing a certificate.” It requires finding all the locations where the expired certificate lives, re-issuing a certificate, provisioning that new certificate to all relevant systems and then restarting the affected services. Doing all of those steps efficiently also requires automation, which many teams currently lack.
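The recovery steps described above can be sketched as a simple planning function. This is an illustration only: the `Installation` record, the `plan_recovery` function, and the host names and serial numbers are all hypothetical, and a real environment would call out to a certificate authority and deployment tooling rather than return a list of tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Installation:
    """One place where a certificate is installed (hypothetical inventory record)."""
    host: str
    service: str
    cert_serial: str


def plan_recovery(expired_serial, new_serial, inventory):
    """Return the ordered recovery steps for one expired certificate."""
    # Step 1: find every location where the expired certificate lives.
    affected = [item for item in inventory if item.cert_serial == expired_serial]

    # Step 2: re-issue once, then provision and restart at each location.
    steps = [("reissue", new_serial)]
    for item in affected:
        steps.append(("provision", item.host, new_serial))
        steps.append(("restart", item.host, item.service))
    return steps


# Hypothetical inventory: two services share the expired certificate.
inventory = [
    Installation("store-01", "nginx", "0A1B"),
    Installation("auth-01", "haproxy", "0A1B"),
    Installation("wiki-01", "nginx", "7F3C"),
]

for step in plan_recovery("0A1B", "9D2E", inventory):
    print(step)
```

The first step only works if the inventory is complete. Without that visibility, the list of affected systems has to be rebuilt by hand in the middle of an outage.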
Consider the case of Epic Games, which experienced an outage that lasted more than five hours in April 2021. A wildcard TLS certificate used across many of the company’s internal-facing services expired, and because it was untracked, the team was not aware of the upcoming expiration. This led to a series of events that brought down the Epic Games online store and required over 25 critical IT staff to resolve.
The fact that Epic Games didn’t know this expiration was coming, combined with a lack of visibility into everywhere the certificate lived and a lack of automation to re-issue a new certificate, led to the extended disruption and significant effort to recover.
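An untracked expiration like this is the kind of problem a basic expiry check over a certificate inventory is meant to catch. The sketch below assumes a hypothetical in-memory inventory; the `TrackedCert` record, the certificate names, the dates, and the 30-day warning window are illustrative choices, not details from the Epic Games incident or the report.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class TrackedCert:
    """A certificate the team knows about (hypothetical inventory record)."""
    common_name: str
    not_after: date  # last day the certificate is valid


def expiring_within(inventory, today, warn_days=30):
    """Return certificates already expired or expiring within warn_days, soonest first."""
    cutoff = today + timedelta(days=warn_days)
    due = [cert for cert in inventory if cert.not_after <= cutoff]
    return sorted(due, key=lambda cert: cert.not_after)


# Hypothetical inventory and date, for illustration only.
inventory = [
    TrackedCert("*.example.internal", date(2024, 4, 6)),
    TrackedCert("api.example.com", date(2024, 9, 1)),
    TrackedCert("old.example.com", date(2024, 3, 1)),
]

for cert in expiring_within(inventory, today=date(2024, 3, 15)):
    print(f"{cert.common_name} expires {cert.not_after.isoformat()}")
```

A check like this only helps for certificates that are in the inventory in the first place, which is why visibility comes before automation.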
Nearly two-thirds are prioritizing visibility and automation in response
The good news is organizations now recognize the path forward to resolving these challenges.
While 42% of respondents still use spreadsheets for certificate tracking, many now realize the need for a dedicated solution that offers far more capabilities than a simple spreadsheet. We’ve already seen progress in this area, with 44% of respondents using a dedicated certificate lifecycle management solution in 2022, up from 36% in 2021.
We can expect even more movement away from spreadsheets and homegrown tools to dedicated, best-in-class solutions. In fact, respondents rated visibility into all certificates (60%) and lifecycle automation (57%) as the top two most important features to prioritize for PKI and certificate lifecycle management in 2022.
These priorities are promising. If organizations can deliver on these goals, they’ll be well on their way toward reducing the frequency and severity of outages despite the need for more frequent certificate renewals. It will be particularly interesting to see how this relationship ends up playing out over the course of the coming year.
Find more insights
What else did the Ponemon report uncover about machine identities in the enterprise? We’ve reviewed some of the most important findings on our blog, including a look at the machine identity attack surface and the impact of cloud and zero-trust on identity access management.
Ready for more? For a closer look at these and other trends shaping the role of machine identities in today’s organizations, download the full report.