Current:
1/6/22 5:23PM CST: Update: The issue has been fully identified and isolated to our WAN routers. The devices are configured redundantly but failed to respond to a route change. We're bringing the devices back up now and will have another update soon.
Completed:
Network Outage
1/4/22 7:10PM CST: We are currently looking into an issue with packet loss affecting some services within the network.
1/4/22 7:50PM CST: The root of the issue has been found, and we are working with network engineers to fully restore service.
1/4/22 8:19PM CST: The issue is still being worked on.
1/4/22 8:45PM CST: Most services have been restored; we are still working to restore service to several VPS nodes.
1/4/22 9:07PM CST: We are still working to restore all networking; we are currently seeing issues bouncing between different network carriers.
1/4/22 9:45PM CST: We've identified remaining routing issues within the US, but most other geographic locations are fine. We're still working hard to fully resolve this incident.
1/4/22 10:22PM CST: Some equipment needs to be restarted to apply changes. We do not have an ETA for resolution yet but are still working as quickly as possible.
1/4/22 10:32PM CST: We are waiting for some devices to come back online after the reboots so the necessary changes can be applied to get us back on track.
1/4/22 11:10PM CST: Several network team members are on site working on the network gear, doing everything possible to bring us back online.
1/5/22 12:28AM CST: Our networking team is still all hands on deck working on the issue.
1/5/22 1:25AM CST: The network has been restored; we are checking all aspects to ensure everything has come up properly.
1/5/22 1:38AM CST: It seems we are still experiencing routing issues within the US; this is the #1 priority right now. We will also need to reload core configs on some networking gear, as it's currently running on the rescue configs used to bring it back online.
1/5/22 1:38AM CST: Routing has been resolved and configs have been restored. We are still restoring service to some VPS nodes, but it shouldn't be much longer.
1/5/22 3:07AM CST: Service has been fully restored as of ~2 hours ago, including those affected by the initially isolated routing incident earlier in the evening. We’ll be sending out a full report as soon as possible.
Network Maintenance: 1-3AM CST Monday January 3rd
Some of the last steps are being taken tonight to complete our ongoing network upgrade.
We expect the interruption to be less than 5 minutes with a maximum of 15 minutes and will occur sometime within this window.
Tasks being performed:
- 3rd and 4th cables to the access layer added into aggregate bundles and enabled on the access switch.
- All newly configured network devices introduced into the existing network.
- Upstream configuration updated to use logical interfaces on the aggregate, placed back into the production traffic path, then monitored for 1 hour.
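The aggregate-bundle and logical-interface steps above can be sketched as device configuration. The log doesn't name the vendor, but the 20.3R3S2.2 version string elsewhere on this page suggests Junos; treat that, along with the interface names, unit/VLAN numbers, and address below, as illustrative assumptions rather than our actual config:

```
# Hypothetical Junos-style sketch: adding 3rd/4th member links to an
# existing aggregate (ae0) and carrying the upstream on a logical unit.
set interfaces xe-0/0/2 ether-options 802.3ad ae0   # 3rd cable to access layer
set interfaces xe-0/0/3 ether-options 802.3ad ae0   # 4th cable to access layer
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 vlan-tagging
set interfaces ae0 unit 100 vlan-id 100 family inet address 192.0.2.1/31  # upstream logical interface
commit confirmed 10   # auto-rollback in 10 minutes if the change breaks reachability
```

`commit confirmed` mirrors the monitored-rollout step in the task list: the change is applied, watched, and only made permanent once traffic looks healthy.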
1:00AM – Maintenance has begun
1:40AM – Introducing all newly configured network devices into existing network
1:53AM – All access has been restored; we are monitoring results now.
2:00AM – We are still seeing some packet loss and are applying changes as necessary; updates to follow.
2:10AM – All network traffic appears to be stable with no packet loss currently; we are continuing to monitor and will update soon.
2:26AM – Seeing some packet loss again; we are continuing to look into it.
2:37AM – Seeing things stabilize again after some customers were still being impacted; we are still monitoring.
3:11AM – Everything is still stable; we are working towards completing the maintenance and will have more updates soon.
3:16AM – One of the network devices required a config change, which is causing some packet loss again; we are monitoring.
3:25AM – Seeing full network stability again; continuing to monitor and finalize the maintenance.
3:40AM – No further issues remain and no further work is being done; the network upgrade is complete, with some minor maintenance windows to follow for further optimizations and cleanup.
Network Maintenance: 1-3AM CST Sunday January 2nd
Some of the last steps are being taken tonight to complete our ongoing network upgrade.
No impact is expected, aside from up to ~5 minutes of interruption while DDoS protection is reintroduced.
Tasks being performed:
- 3rd and 4th cables to access layer added into aggregate bundles and enabled on the access switch
- bswan02 upstream configuration updated to use logical interfaces on aggregate and placed back into production traffic path then monitored for 1 hour
- bsagg01, bswan01 isolated from production traffic path
- bsagg01 updated to 20.3R3S2.2
- bsagg01, bswan01 reintroduced to production traffic path
- GHS Monitoring updated
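Isolating bsagg01/bswan01 from the production traffic path before the 20.3R3S2.2 upgrade is typically done by draining traffic rather than pulling cables. A hedged Junos-style sketch of that technique, assuming OSPF is the IGP (the log doesn't say) and using hypothetical group/policy names:

```
# Drain transit traffic from the device before upgrading (illustrative).
set protocols ospf overload                      # advertise max metric; traffic routes around us
set policy-options policy-statement DRAIN then reject
set protocols bgp group UPSTREAMS export DRAIN   # UPSTREAMS/DRAIN are hypothetical names
commit comment "isolate for 20.3R3S2.2 upgrade"
# After the upgrade, delete the overload and export changes to reintroduce the device.
```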
1:30AM – Maintenance has begun
3:40AM – Maintenance still ongoing.
4:10AM – We may see some packet loss for a few minutes while finalizing some changes to the network.
4:53AM – Maintenance has been completed
Network Maintenance: 1-3AM CST Saturday January 1st
Some of the last steps are being taken tonight to complete our ongoing network upgrade.
No impact is expected, aside from up to ~5 minutes of interruption while DDoS protection is reintroduced.
1:00AM – Maintenance has started
3:05AM – Maintenance is ongoing but should be finishing shortly.
3:50AM – Maintenance has been completed
Network Maintenance: 1-3AM CST Thursday December 30th
This is the 3rd and final maintenance that will occur in order to complete our network upgrade.
No impact is expected, aside from up to ~5 minutes of interruption while DDoS protection is reintroduced.
Tasks being completed:
- bswan01 placed back into production traffic path
- Routed interface config on bsdis0# migrated to bswan01
- Subnets moved one at a time for the first 5 as a test, then in batches of 10
- bswan02 logically removed from production traffic path and monitored for 1 hour
- bswan02 gets de-racked and re-racked in rack 246
- Cabling from bsagg01/bsagg02 to bswan02:
- Pull out all the XFP modules before proceeding to avoid confusion; optics may need to be collected from storage
- Need to grab the other breakout cable from the shipment
- Insert 4 x 10GBASE-SR XFP modules into onboard ports 0-3; if short on modules, add as many as possible
- Port 96 of agg01 to bswan02 onboard ports – all 4 tails
- WAN connections from bswan02 re-cabled to bsagg02:
- Use ports 82 onwards
- Note which is which
- bswan01/bswan02 interconnection:
- bswan01 port 1/0/1 to bswan02 port 1/0/1
- bswan01 port 1/1/0 to bswan02 port 1/1/0
- bswan01 port 1/1/1 to bswan02 port 1/1/1
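The subnet-migration cadence in the task list above (the first 5 subnets moved one at a time as a canary, then the rest in batches of 10) can be sketched as a small helper. This illustrates the batching logic only; it is not our actual migration tooling:

```python
def migration_batches(subnets, canary=5, batch_size=10):
    """Yield subnets in migration order: the first `canary` subnets one at a
    time (so each move can be verified individually), then the remainder in
    groups of `batch_size`."""
    for subnet in subnets[:canary]:
        yield [subnet]
    remaining = subnets[canary:]
    for i in range(0, len(remaining), batch_size):
        yield remaining[i : i + batch_size]

# Example: 18 subnets -> 5 single moves, then batches of 10 and 3.
batches = list(migration_batches([f"10.0.{n}.0/24" for n in range(18)]))
```

Moving a small canary set first limits the blast radius of a bad gateway move before committing to larger batches.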
1:15AM – Maintenance has started
1:30AM – Routed interface config on the distribution layer migrated to the new network device.
Subnet gateways moved one at a time to the new devices.
Old device logically removed from the production traffic path and monitored for 1 hour.
4:04AM – Maintenance has been completed.
Network Maintenance: 1-3AM CST Tuesday December 28th
This is the 2nd and final maintenance that will occur in order to complete our network upgrade.
No impact is expected, aside from up to ~5 minutes of interruption while DDoS protection is reintroduced.
12:50AM – We are preparing to start the 2nd maintenance window.
1:40AM – Maintenance has been completed. We will be adding one additional maintenance window, as not all tasks were able to be carried out.
Network Maintenance: 1-3AM CST Monday December 27th
This is the 1st of two maintenances that will occur in order to complete our network upgrade.
No impact is expected, aside from up to ~5 minutes of interruption while DDoS protection is reintroduced.
- 1:00AM – The maintenance has been started.
- 1:17AM – Moving traffic to our redundant device in preparation for introducing a new device.
- 1:37AM – Shutting off downstream connections between old device and core.
- 2:30AM – Rack termination points are being moved one by one to the new devices.
- 5:28AM – Maintenance has been completed.