Current:

1/6/22 5:23PM CST: Update: The issue has been fully identified and isolated to our WAN routers. The devices are configured redundantly but failed to respond to a route change. We're bringing the devices back up as we speak and will have another update soon.
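For illustration, here is a minimal sketch (not our actual tooling) of the kind of dual-path reachability probe that catches this failure mode: ping a target via each redundant WAN router, so a path that failed to follow a route change shows up as loss on one side only. All names and addresses below are placeholders.

```python
#!/usr/bin/env python3
"""Hypothetical dual-path reachability probe (placeholder addresses)."""
import subprocess

# Placeholder addresses for the two redundant WAN routers
# (RFC 5737 documentation prefix, not real devices).
WAN_PATHS = {
    "wan-router-a": "192.0.2.1",
    "wan-router-b": "192.0.2.2",
}

def loss_fraction(host: str, count: int = 5) -> float:
    """Ping `host` and return the fraction of probes lost."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", "1", host],
        capture_output=True, text=True,
    )
    received = result.stdout.count("bytes from")  # one line per reply
    return 1.0 - received / count

if __name__ == "__main__":
    for name, addr in WAN_PATHS.items():
        loss = loss_fraction(addr)
        status = "OK" if loss == 0 else f"{loss:.0%} loss"
        print(f"{name} ({addr}): {status}")
```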

 

Completed:

Network Outage

1/4/22 7:10PM CST: We are currently looking into an issue with packet loss affecting some services within the network.

1/4/22 7:50PM CST: The root of the issue has been found, and we are working with network engineers to fully restore service.

1/4/22 8:19PM CST: The issue is still being worked on.

1/4/22 8:45PM CST: Most services have been restored; we are still working to restore service to several VPS nodes.

1/4/22 9:07PM CST: We are still working to restore all networking; we are currently seeing issues as traffic bounces between different network carriers.

1/4/22 9:45PM CST: We've identified remaining routing issues within the US, but most other geographic locations are fine; we are still working hard to fully resolve this incident.

1/4/22 10:22PM CST: Some equipment needs to be restarted to apply changes. We do not yet have an ETA for resolution but are still working as quickly as possible.

1/4/22 10:32PM CST: We are waiting for some devices to come back online from the reboots so the necessary changes can be applied to get us back on track.

1/4/22 11:10PM CST: Several network team members are on site working on the network gear, doing everything possible to bring us back online.

1/5/22 12:28AM CST: Our networking team is still all hands on deck working on the issue.

1/5/22 1:25AM CST: The network has been restored; we are checking all aspects to ensure everything has come back up properly.

1/5/22 1:38AM CST: It seems we are still experiencing routing issues within the US; this is the #1 priority right now. We will also need to reload core configs on some networking gear, as it is currently running on the rescue configs used to bring it back online.

1/5/22 1:38AM CST: Routing has been resolved and configs have been restored; we are still restoring service to some VPS nodes, but it shouldn't take much longer.

1/5/22 3:07AM CST: Service has been fully restored as of ~2 hours ago, including the services affected by the initially isolated routing incident earlier in the evening. We'll be sending out a full report as soon as possible.

 

Network Maintenance: 1-3AM CST Monday January 3rd

Some of the last steps are being taken tonight to complete our ongoing network upgrade.

We expect the interruption to be less than 5 minutes, with a maximum of 15 minutes, and it will occur sometime within this window.

Tasks being performed:

  1. 3rd and 4th cables to the access layer added into aggregate bundles and enabled on the access switch (see the verification sketch after this list).
  2. Introducing all newly configured network devices into the existing network.
  3. Upstream configuration updated to use logical interfaces on the aggregate, placed back into the production traffic path, then monitored for 1 hour.
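For reference, a minimal sketch of how the new aggregate members in task 1 could be verified after cabling. This assumes Junos-style gear reachable over SSH via the netmiko library; the hostname, credentials, and interface names below are placeholders, not our production config.

```python
"""Hypothetical check that the 3rd and 4th links joined the aggregate."""
from netmiko import ConnectHandler

switch = ConnectHandler(
    device_type="juniper_junos",      # assumption: Junos access switch
    host="access-switch.example",     # placeholder hostname
    username="netops",                # placeholder credentials
    password="********",
)

# Hypothetical member interfaces for the new 3rd and 4th cables.
expected = ["xe-0/0/2", "xe-0/0/3"]
output = switch.send_command("show lacp interfaces ae0")

for member in expected:
    state = "present in LACP output" if member in output else "MISSING"
    print(f"{member}: {state}")

switch.disconnect()
```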

1:00AM – Maintenance has begun
1:40AM – Introducing all newly configured network devices into the existing network
1:53AM – All access has been restored; we are monitoring results now.
2:00AM – We are still seeing some packet loss and are applying changes as necessary; updates to follow.
2:10AM – All network traffic appears stable with no packet loss currently; we are continuing to monitor and will update soon.
2:26AM – We are seeing some packet loss again and are continuing to investigate.
2:37AM – Things are stabilizing once again after some customers were still being impacted; we are still monitoring.
3:11AM – Everything remains stable; we are working toward completing the maintenance and will have more updates soon.
3:16AM – One of the network devices required a config change, which is causing some packet loss again; we are monitoring.
3:25AM – We are seeing full network stability again and are continuing to monitor while finalizing the maintenance.
3:40AM – No further issues are being seen and no further work is being done; the network upgrade is complete at this point, with some minor maintenance windows to follow for further optimizations and cleanup.

 

 

Network Maintenance: 1-3AM CST Sunday January 2nd

Some of the last steps are being taken tonight to complete our ongoing network upgrade.

No impact is expected, but there may be up to ~5 minutes of interruption while DDoS protection is reintroduced.

Tasks being performed:

  1. 3rd and 4th cables to access layer added into aggregate bundles and enabled on the access switch
  2. bswan02 upstream configuration updated to use logical interfaces on aggregate and placed back into production traffic path then monitored for 1 hour
  3. bsagg01, bswan01 isolated from production traffic path
  4. bsagg01 updated to 20.3R3S2.2 (see the version-check sketch after this list)
  5. bsagg01, bswan01 reintroduced to production traffic path
  6. GHS Monitoring updated
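For reference, a minimal sketch of a post-upgrade check for task 4. It assumes SSH access to the device via the netmiko library; the hostname and credentials are placeholders, and only the target release string comes from the task list above.

```python
"""Hypothetical post-upgrade version check for bsagg01."""
from netmiko import ConnectHandler

agg = ConnectHandler(
    device_type="juniper_junos",   # assumption: Junos aggregation device
    host="bsagg01.example",        # placeholder FQDN
    username="netops",             # placeholder credentials
    password="********",
)

version = agg.send_command("show version | match Junos:")
print(version)

# The task list targets 20.3R3S2.2; flag anything else for review.
if "20.3R3" not in version:
    print("WARNING: bsagg01 does not report the expected 20.3R3 release")

agg.disconnect()
```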

1:30AM – Maintenance has begun
3:40AM – Maintenance still ongoing.
4:10AM – We may see some packet loss for a few minutes while finalizing some changes to the network.
4:53AM – Maintenance has been completed

 

 

Network Maintenance: 1-3AM CST Saturday January 1st

Some of the last steps are being taken tonight to complete our ongoing network upgrade.

No impact is expected, but there may be up to ~5 minutes of interruption while DDoS protection is reintroduced.

1:00AM – Maintenance has started
3:05AM – Maintenance is ongoing but should be finishing shortly.
3:50AM – Maintenance has been completed

 

 

Network Maintenance: 1-3AM CST Thursday December 30th

This is the 3rd and final maintenance that will occur in order to complete our network upgrade.

No impact is expected, but there may be up to ~5 minutes of interruption while DDoS protection is reintroduced.

Tasks being completed:

  1. bswan01 placed back into production traffic path
  2. Routed interface config on bsdis0# migrated to bswan01
  3. Subnets moved one at a time for the first 5 as a test, then in batches of 10 (see the verification sketch after this list)
  4. bswan02 logically removed from production traffic path and monitored for 1 hour
  5. bswan02 de-racked and re-racked in rack 246
  6. Cabling from bsagg01/bsagg02 to bswan02:
       • Pull out all the XFP modules before proceeding to avoid confusion; may need to collect optics from storage
       • Grab the other breakout cable from the shipment
       • Insert 4 x 10GBASE-SR XFP modules into onboard ports 0-3; if coming up short, add as many as possible
       • Port 96 of agg01 to bswan02 onboard ports – all 4 tails
  7. WAN connections from bswan02 re-cabled to bsagg02:
       • Use ports 82 onwards
       • Note which is which
  8. bswan01/bswan02 interconnection:
       • bswan01 port 1/0/1 to bswan02 port 1/0/1
       • bswan01 port 1/1/0 to bswan02 port 1/1/0
       • bswan01 port 1/1/1 to bswan02 port 1/1/1
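For reference, a minimal sketch of the batched verification pattern from step 3: move the first five subnets one at a time as a canary, then proceed in batches of ten, checking gateway reachability after each move. The gateway list below is illustrative (RFC 5737 documentation prefix); in practice it would come from IPAM.

```python
"""Hypothetical batched gateway-move verification (step 3 above)."""
import subprocess

gateways = [f"203.0.113.{i}" for i in range(1, 36)]  # placeholder gateways

def reachable(ip: str) -> bool:
    """True if `ip` answers at least one of two pings."""
    return subprocess.run(
        ["ping", "-c", "2", "-W", "1", ip],
        capture_output=True,
    ).returncode == 0

# First five moved one at a time as a canary, then batches of ten.
batches = [[g] for g in gateways[:5]]
batches += [gateways[i:i + 10] for i in range(5, len(gateways), 10)]

for n, batch in enumerate(batches, 1):
    # (The gateway move itself happens on the network gear, not here.)
    failed = [g for g in batch if not reachable(g)]
    if failed:
        print(f"batch {n}: STOP, unreachable after move: {failed}")
        break
    print(f"batch {n}: {len(batch)} gateway(s) verified")
```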

1:15AM – Maintenance has started

1:30AM – Routing interface config on distribution layer migrated to new network device
Subnet gateways moved one at a time to the new devices
Old device logically removed from production traffic path and monitored for 1 hour

4:04AM – Maintenance has been completed.

 

 

Network Maintenance: 1-3AM CST Tuesday December 28th

This is the 2nd and final maintenance that will occur in order to complete our network upgrade.

No impact is expected, but there may be up to ~5 minutes of interruption while DDoS protection is reintroduced.

12:50AM – We are preparing to start on the 2nd maintenance window.

1:40AM – Maintenance has been completed; we will be adding one additional maintenance window, as not all tasks could be carried out.

 

 

Network Maintenance: 1-3AM CST Monday December 27th

This is the 1st of two maintenance windows that will occur in order to complete our network upgrade.

No impact is expected, but there may be up to ~5 minutes of interruption while DDoS protection is reintroduced.

  • 1:00AM – The maintenance has been started.
  • 1:17AM – Moving traffic to our redundant device in preparation for introducing a new device.
  • 1:37AM – Shutting off downstream connections between old device and core.
  • 2:30AM – Rack termination points are being moved one by one to the new devices.
  • 5:28AM – Maintenance has been completed.