Reachability was restored immediately via local backup transit. However, a hard termination of BGP sessions like this always involves some convergence time, which is why it took 1 to 5 minutes for the situation to stabilize for all servers in MainCubes. IPv6 connectivity was affected for a few minutes longer.
Maintenance work
To solve the underlying problem permanently, we will carry out maintenance work on the connection between our locations on 20.06.2024 between 1:00 and 5:00 a.m. The work is planned so that no outages occur.
However, critical interventions always carry a residual risk, which is why we are announcing the work as a precaution and carrying it out during the period with the least traffic. Latency may also increase slightly, as some upstreams and, in particular, peerings will be temporarily unavailable.
Technical background
Our Interxion POP is connected to MainCubes redundantly (i.e. over non-crossing paths) via its own fiber optic connections, which terminate on two redundant Juniper Networks switches on the MainCubes side. In theory, this means that there is no single point of failure, especially because we interconnect the switches using MC-LAG, which allows each device to be managed independently. In such a critical area of our network, we have deliberately avoided solutions in which several switches form a virtual unit that is managed like a single switch (virtual chassis), as firmware bugs can break a virtual chassis apart in such a way that it can no longer be managed until the devices are physically power-cycled. Our devices remained manageable, which is why we were able to restart them normally and restore full redundancy within a very short time.
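The "no single point of failure in theory" reasoning can be illustrated with a small sketch. It is not a model of our actual network: the node names and links below are hypothetical, and the POP and the MainCubes side are each simplified to a single node. The check only asks whether the two locations stay connected when any one intermediate device fails on its own.

```python
from collections import deque

# Hypothetical, simplified topology; all node names are illustrative assumptions.
LINKS = [
    ("interxion-pop", "switch-a"),    # fiber path 1
    ("interxion-pop", "switch-b"),    # fiber path 2
    ("switch-a", "switch-b"),         # link between the two MC-LAG peers
    ("switch-a", "maincubes-core"),
    ("switch-b", "maincubes-core"),
]

def reachable(links, src, dst, failed=frozenset()):
    """Breadth-first search from src to dst, ignoring all failed nodes."""
    if src in failed or dst in failed:
        return False
    adjacency = {}
    for a, b in links:
        if a in failed or b in failed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def single_points_of_failure(links, src, dst):
    """Intermediate nodes whose individual failure disconnects src from dst."""
    nodes = {n for link in links for n in link} - {src, dst}
    return sorted(n for n in nodes if not reachable(links, src, dst, {n}))
```

In this sketch, single_points_of_failure(LINKS, "interxion-pop", "maincubes-core") returns an empty list, while reachable(...) with both switches marked as failed returns False. A check like this only covers independent single failures; a fault that affects both devices at the same time is outside what it models.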
Nevertheless, practice has shown that our concept did not work as perfectly as we would have liked, and we apologize for that.
Our goal for the connection between our locations is that it should never fail completely. Over the last ten days, we have therefore explored many options and finally decided to remove the aggregation switches that caused the problem entirely and instead implement a direct (and of course also redundant) router connection without intermediate aggregation switches. Since we received the last components required for the upcoming migration of the fiber optic connections today, we can now announce the work.
Feel free to follow us on Instagram if you want to get a glimpse of the work.