Last night (2019-09-19) we experienced several short periods of complete packet loss on a number of inbound transit routes between 22:25 and 23:50 UTC, resulting in some disruption to network reachability. Because only a limited number of routes were affected, our monitoring systems failed to detect the incident as quickly as we'd expect. As a result, our engineering team were unable to take any mitigating action at the time.
We've now identified that the incident was caused by disruption in one of our upstream transit networks and we're currently working with the provider to establish the root cause. We'll provide more information here in due course.
We have not experienced any further disruption since 2019-09-19 23:50 UTC and do not anticipate any further impact on network performance. If you have any concerns or are still experiencing any difficulty as a result of the incident, please contact support by email at support@brightbox.com.
We take the performance and reliability of our infrastructure very seriously and are conducting an immediate review of our network monitoring systems to improve the coverage of our reachability monitoring so we can avoid similar failures in future.
UPDATE: The provider was performing scheduled network maintenance which was not expected to affect our connectivity, so we were not given prior notice. An issue during the maintenance caused some services to flap on a number of their routers in the London metro area, which led to the disruption we experienced. We've asked the provider to review their maintenance notification procedure and to inform us of any "at risk" works in future.