Holidays are supposed to be predictable: more travel, heavier network traffic, and higher public expectations for services. Instead, the 2024 holiday season and the first half of 2025 reminded operators and buyers of critical infrastructure that low-probability events and human factors collide at the worst possible moments. Below I recap the major holiday-related outages, what the root causes tell us, and practical, field-ready actions teams can implement before the next surge.

The notable incidents

EstLink 2, Christmas Day 2024: On December 25, 2024 the EstLink 2 subsea power cable between Finland and Estonia failed, removing roughly 650 MW of cross-border capacity and triggering immediate investigations and regional alarm. Authorities found evidence consistent with a large vessel dragging anchor across the seabed near the cable, and Finnish police detained crew members from a Russia-linked tanker while probes continued. The outage underscored how physical maritime events can cascade into power and communications risk during holidays when staffing and reaction windows are constrained.

Puerto Rico, New Year’s Eve 2024: An island-wide blackout left most of Puerto Rico without power as people prepared for New Year’s celebrations. The outage was attributed to a failure on an underground transmission line and exposed a fragile grid with limited redundancy and aging assets. Restoration took up to 48 hours in some areas, highlighting how holiday demand plus fragile infrastructure creates humanitarian and operational strain.

Airline systems, Christmas Eve 2024: American Airlines temporarily grounded domestic flights on December 24, 2024 due to a technical systems issue that affected boarding and departure processes during the busiest travel window. Even short-lived IT system failures can produce outsized operational impacts when they hit peak travel days and when manual fallback procedures are not fully exercised.

Subsea cable faults around year end: Multiple submarine cable faults and cuts in late December 2024 and early 2025 disrupted international traffic across regions from East Africa to Southeast Asia and Europe. The Asia-Africa-Europe 1 (AAE-1) system experienced breaks that degraded capacity on important routes, and those failures interacted with other regional incidents to produce degraded routing and latency spikes for ISPs and services that depend on predictable international links. These failures show how physical disruptions far from data centers still translate into user-facing outages.

Common themes

1) Single points of failure at system and geographic scale: Many outages were not simple component failures. They exposed dependencies that operators had normalized. A single subsea cut or a failure at a single cable landing can remove large swaths of capacity with no quick alternative.

2) Physical-world risks remain primary threat vectors: Ships dragging anchors, terrestrial excavation, and aging transmission equipment matter as much as software bugs. In multiple incidents investigators looked at anchor damage, suspicious vessel activity, and maintenance errors. Physical protection and maritime-domain awareness are operational priorities.

3) Peak demand amplifies small faults: Holidays concentrate user demand and shrink maintenance windows. An IT outage or a transmission fault that might be manageable on a quiet day becomes a cascading emergency during travel peaks or public holidays.

4) Communication and trust suffer when status pages fail: In Puerto Rico the blackout coincided with the restoration-tracking site itself going down. When customers cannot see authoritative status, speculation and reputational damage worsen. Visible, authenticated outage channels matter.

Practical steps for defenders and operators (a checklist you can implement this quarter)

Pre-holiday readiness

  • Freeze risky network changes. Block noncritical routing, maintenance, and config pushes for a defined holiday window. If a change is unavoidable, require peer approvals and an on-call escalation path; a minimal enforcement gate is sketched after this list. This reduces the chance of human error during critical windows.

  • Run a holiday tabletop and war room drill. Simulate a regional power loss combined with comms degradation, and practice escalation with the same people who will be on call over the holiday. Test manual fallbacks for ticketing, boarding, or dispatch systems.

  • Identify and harden geographic single points. Map subsea landings, interconnects, and long-haul fiber routes used by your services. For any path that has only one route to a major market, arrange alternate peering or capacity agreements in advance. Use active probing to confirm the alternative path is truly ready; a simple probe is sketched below.
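
A change freeze is easiest to keep when it is enforced mechanically rather than by memory. Here is a minimal sketch of a CI gate for the freeze bullet above, assuming a hypothetical FREEZE_WINDOWS list your team maintains and a FREEZE_OVERRIDE_APPROVERS environment variable; neither is an existing tool's interface.

```python
#!/usr/bin/env python3
"""CI gate: fail the pipeline during a declared holiday change freeze."""
import os
import sys
from datetime import datetime, timezone

# Hypothetical freeze windows (UTC); maintain alongside the on-call calendar.
FREEZE_WINDOWS = [
    (datetime(2025, 12, 23, tzinfo=timezone.utc),
     datetime(2026, 1, 2, tzinfo=timezone.utc)),
]

def in_freeze(now: datetime) -> bool:
    return any(start <= now < end for start, end in FREEZE_WINDOWS)

def main() -> int:
    now = datetime.now(timezone.utc)
    if not in_freeze(now):
        return 0  # outside the freeze: allow the change
    # Unavoidable changes need peer approval recorded in the job environment.
    approvers = os.environ.get("FREEZE_OVERRIDE_APPROVERS", "")
    if len([a for a in approvers.split(",") if a.strip()]) >= 2:
        print(f"freeze override approved by: {approvers}")
        return 0
    print("change blocked: holiday freeze in effect, no override recorded",
          file=sys.stderr)
    return 1

if __name__ == "__main__":
    sys.exit(main())
```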
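And for the "truly ready" test of alternate paths: one low-cost pattern is a timed TCP connect against a known endpoint reachable only via each path. The host/port pairs below use documentation addresses as placeholders for test targets you would stand up on each transit; this is a sketch of the pattern, not a full measurement system.

```python
import socket
import time

# Placeholder probe targets: one reachable endpoint per physical path.
PATHS = {
    "primary-transit": ("192.0.2.10", 443),
    "backup-transit": ("198.51.100.10", 443),
}

def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float | None:
    """Return TCP connect latency in milliseconds, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000
    except OSError:
        return None

for name, (host, port) in PATHS.items():
    latency = tcp_connect_ms(host, port)
    status = f"{latency:.1f} ms" if latency is not None else "UNREACHABLE"
    print(f"{name}: {status}")
```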

Redundancy and fallback

  • Push caching to the edge. For content and authentication flows that can tolerate graceful degradation, cache tokens and static content closer to users so that short upstream network blips do not render services unusable; a stale-serving cache sketch follows this list.

  • Multi-home for routing diversity. Where possible, multi-home at the IX level and work with upstream providers to confirm that BGP failover behaves as expected under load. Include BGP path preparation and emergency route announcements in runbooks, and verify prefix visibility from the outside (see the route-visibility check after this list).

  • Plan for out-of-band comms. Validate SATCOM links, satellite VoIP, or emergency cellular gateways, and train teams to bring them online. Confirm that critical staff carry pre-authorized SIMs or satellite handsets and that credential resets can be performed without the primary network.
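
The edge-caching bullet above describes the serve-stale pattern: prefer a fresh fetch, but fall back to a recent cached copy when the upstream is unreachable. A minimal sketch, assuming a caller-supplied fetch callable; the TTL values are illustrative, not a specific product's defaults.

```python
import time
from typing import Any, Callable

class StaleServingCache:
    """Serve fresh entries when possible, stale entries when upstream fails."""

    def __init__(self, fetch: Callable[[str], Any],
                 fresh_ttl: float = 60.0, stale_ttl: float = 3600.0):
        self._fetch = fetch
        self._fresh_ttl = fresh_ttl    # serve without refetching
        self._stale_ttl = stale_ttl    # serve only if upstream is down
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = time.monotonic()
        cached = self._store.get(key)
        if cached and now - cached[0] < self._fresh_ttl:
            return cached[1]  # fresh hit: no upstream dependency at all
        try:
            value = self._fetch(key)  # upstream call, e.g. origin or IdP
            self._store[key] = (now, value)
            return value
        except Exception:
            # Upstream blip: fall back to a stale copy rather than fail hard.
            if cached and now - cached[0] < self._stale_ttl:
                return cached[1]
            raise
```

The design choice worth noting is the two TTLs: the short one bounds staleness in normal operation, while the long one bounds how degraded you are willing to run during an upstream outage.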
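For the multi-homing bullet, one cheap outside-in check is how widely your prefix is seen after a test withdrawal or failover. A sketch against RIPEstat's public routing-status endpoint; the prefix is a placeholder and the response keys reflect the documentation at the time of writing, so verify against https://stat.ripe.net/docs/ before depending on them.

```python
import json
import urllib.request

PREFIX = "203.0.113.0/24"  # placeholder: one of your announced prefixes
URL = f"https://stat.ripe.net/data/routing-status/data.json?resource={PREFIX}"

# Pull current visibility of the prefix as seen by RIPE RIS collectors.
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)["data"]

vis = data.get("visibility", {}).get("v4", {})
seeing = vis.get("ris_peers_seeing")
total = vis.get("total_ris_peers")
print(f"{PREFIX}: seen by {seeing}/{total} RIS peers")
```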

Detection and monitoring

  • Monitor infrastructure health with cross-domain signals. Correlate energy-grid telemetry, vessel AIS data, and subsea cable partner notices into a simple dashboard so you can detect physical-world events that predict comms instability (a minimal correlation sketch follows this list). This is a force multiplier for security teams monitoring supply-chain effects.

  • Run synthetic transactions and layered probes. Synthetic probes from multiple geographic vantage points reveal whether a problem is localized or systemic. Alert on divergence between vantage points, not absolute failure counts, to catch partial degradation early; a divergence check is sketched below.

Customer-facing operations

  • Publish the playbook publicly. Customers understand that failures happen; what they do not accept is opacity. Maintain an authenticated status channel, clear SLAs for holiday windows, and a compensation policy to reduce friction after an outage. The Puerto Rico and airline incidents show that reputational costs can be as large as the technical ones.

  • Pre-position emergency staffing and legal levers. For infrastructure that depends on maritime safety and port operations, pre-authorize emergency escort arrangements with port authorities, and have legal teams on standby for rapid vessel detention or evidence preservation if sabotage is suspected. Incidents in the Baltic Sea show that rapid law-enforcement coordination matters.

Post-incident actions and signal hardening

  • Conduct blameless postmortems and publish sanitized timelines. Include service impact metrics, mitigation steps, and code or configuration changes. Use the report to drive budget requests for redundancy where needed.

  • Fund diversity projects. Subsea cable outages are slow to fix because cable ships are scarce. Where you rely on a single corridor, invest in regional alternatives, terrestrial microwave hops, or contracts with satellite providers for burst capacity during repairs.

  • Improve maritime situational awareness. Subscribe to AIS anomaly feeds and partner with coastal authorities to get earlier alerts of suspicious anchor drags or unplanned vessel behavior near cable corridors. Use those feeds to trigger protective reroutes for sensitive traffic; a corridor geofence check is sketched below.
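
The geofence trigger in the last bullet is simple geometry: flag any vessel that is both close to a cable corridor and moving slowly, a combination consistent with anchor drag. A sketch under stated assumptions: the corridor waypoints, alert radius, and speed threshold below are placeholders, and real values would come from your cable partner and whichever AIS feed you subscribe to.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Illustrative corridor waypoints and thresholds (placeholders).
CORRIDOR = [(59.6, 24.6), (59.8, 25.0)]
ALERT_KM = 5.0
SLOW_KNOTS = 3.0  # slow transit near a cable can indicate anchor drag

def suspicious(lat: float, lon: float, speed_knots: float) -> bool:
    near = any(haversine_km(lat, lon, c_lat, c_lon) < ALERT_KM
               for c_lat, c_lon in CORRIDOR)
    return near and speed_knots < SLOW_KNOTS

print(suspicious(59.62, 24.63, 1.2))  # True: slow vessel inside the corridor
```

In production the alert would not reroute traffic by itself; it would open an incident and pre-stage the protective reroute for a human to approve.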

Concluding blueprint

Holidays are a stress test that the adversarial and accidental worlds both exploit. The incidents from Christmas 2024 through the New Year window showed a mix of physical and software causes, and a common denominator: dependencies we accepted as stable suddenly were not. A practical resilience program treats these dependencies as hostile territory. Map them, exercise fallback plans, harden comms, and buy diversity before a holiday forces you into a crisis.

If you want a short starter: run a single holiday blackout drill this quarter where the primary power feed and the primary international uplink are declared down for four hours. Use that exercise to validate your emergency comms, multi-homing, and customer status processes. You will find the weak links fast, and you will reduce both downtime and reputational loss when the next holiday arrives.

If you want to go further, the same checklist converts cleanly into a tabletop runbook or a one-page executive briefing; adapt the templates to your environment and threat model.