Legacy Outage

Two days ago, on May 5, 2019, we saw a peculiar BGP outage affecting autonomous systems in the customer cone of one very specific AS: AS721.

Right at the beginning, we need to outline a couple of details for our readers:

  1. All Autonomous System Numbers under 1000 are called “lower ASNs,” as they belong to the first autonomous systems on the Internet, registered by IANA in the early days (the late 1980s) of the global network. Today they mostly represent government departments and organizations that were somehow involved in Internet research and creation from the 1970s to the 1990s.
  2. Our readers should remember that the Internet became public only after the United States Department of Defense, which funded the initial ARPANET, handed it over to the Defense Communications Agency and, later, in 1981, connected it to CSNET using TCP (RFC 675)/IP (RFC 791) over X.25. A few years later, in 1986, NSF replaced CSNET with NSFNET, which grew so fast that it made the decommissioning of ARPANET possible by 1990.
  3. IANA was established in 1988, and supposedly at that time the existing ASNs were registered by the RIRs. The organization that funded the initial research and creation of the ARPANET later transferred it to another department because of its operational size and growth, but only after splitting it into four different networks (Wikipedia mentions MILNET, NIPRNET, SIPRNET and JWICS), of which the military-only NIPRNET did not have controlled security gateways to the public Internet.

After IANA (the Internet Assigned Numbers Authority, now one of the ICANN functions) was established, it started allocating ASNs to the organizations that had been part of this network's creation from the beginning. It is interesting that the first ASNs were registered after the fact by various registries around the globe, which suggests that government departments of different countries took part in creating the Internet in the 1980s. We know that a lot of new ideas came from CERN, which began installing TCP/IP between 1984 and 1988 and was interconnected with the rest of the networks in 1989.

So what happened on May 5?

As we may suppose, the networks that existed within the ARPANET and alongside it have not ceased to exist. By the time modern IP addressing took hold and the first IP prefixes were assigned, network resources were already in place within those networks.

Once ICANN, IANA and the RIRs were established, it was necessary to “register” all those addresses and prefixes and correlate them with the corresponding Autonomous System, a term introduced in the EGP draft from 1982. So it is no surprise that the United States Department of Defense, which, once again, funded the initial ARPANET research, got a lot of “lower ASNs” for its needs. Nowadays, 70 ASNs belong to different DoD departments, including USAF, ISC, NAVY NNIC, and DNIC. What unites them and makes the whole situation so unique?

The answer is that they all have a single upstream to the world: AS721.

Why is that peculiar? Let us quote the 2018 National Internet Segment Reliability Report:

Strictly speaking, when the BGP and the world of interdomain routing were in the design stage, the creators assumed that every non-transit AS would have at least two upstream providers to guarantee fault tolerance in case one goes down. However, the reality is different: more than 45% of ISP’s have only one connection to an upstream provider.

An opportunity to see one individual ISP overwhelmed with traffic is on the table most of the time. For us, it is quite surprising that such a serious state organization as the Department of Defense has not updated its idea of how a network should interconnect since the late 1980s. Everything that connects to the outside world through AS721 relies on it as the only connectivity medium, which might fail, and the Sunday events show that such a feature can be exploited.

Such a network, serving internal purposes and not trying to earn money from transit, should have many more controllable upstreams to be reliable and failure-tolerant. Having only one critical external gateway could sound like something easy to control and therefore secure, though ultimately it casts doubt on the ability of such a network, and therefore of the organization it belongs to, to sustain the needed level of connectivity.
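To make the single-upstream argument concrete, here is a minimal Python sketch. It is not how Radar computes anything. It assumes a simplified model with only provider-to-customer links (no peering and no routing policy), and every AS number except 721 and 209 is a made-up value from the documentation range 64496-64511. The function names and the LINKS list are hypothetical.

```python
from collections import defaultdict

# Hypothetical (provider, customer) links. Only 209 -> 721 comes from the
# article; the other AS numbers are illustrative documentation ASNs.
LINKS = [
    (209, 721),       # CenturyLink is the only transit provider of AS721
    (721, 64496),
    (721, 64497),
    (64497, 64498),
    (209, 64499),
    (64500, 64499),   # AS64499 is multihomed: two upstreams
]


def single_homed(links):
    """Return the sorted ASNs that have exactly one upstream provider."""
    providers = defaultdict(set)
    for provider, customer in links:
        providers[customer].add(provider)
    return sorted(asn for asn, ups in providers.items() if len(ups) == 1)


def customer_cone(links, root):
    """Return every AS reachable from root by following provider -> customer links."""
    customers = defaultdict(set)
    for provider, customer in links:
        customers[provider].add(customer)
    cone, stack = set(), [root]
    while stack:
        for customer in customers.get(stack.pop(), ()):
            if customer not in cone:
                cone.add(customer)
                stack.append(customer)
    return cone


def cut_off_by(links, failed):
    """Return the ASes left without any upstream path once `failed` disappears.

    An AS counts as connected if it can still be reached, walking down
    provider -> customer links, from an AS that has no provider itself.
    """
    all_asns = {asn for link in links for asn in link}
    has_provider = {customer for _, customer in links}
    tops = all_asns - has_provider - {failed}
    remaining = [link for link in links if failed not in link]
    reachable = set(tops)
    for top in tops:
        reachable |= customer_cone(remaining, top)
    return all_asns - {failed} - reachable


print(single_homed(LINKS))               # [721, 64496, 64497, 64498]
print(sorted(customer_cone(LINKS, 721))) # [64496, 64497, 64498]
print(sorted(cut_off_by(LINKS, 721)))    # [64496, 64497, 64498]

# Giving AS64497 a second, independent upstream keeps it and its customer online.
diversified = LINKS + [(64500, 64497)]
print(sorted(cut_off_by(diversified, 721)))  # [64496]
```

In this toy topology, the whole customer cone of AS721 goes dark when AS721 fails, while one extra upstream link is enough to keep part of the cone reachable.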

AS721, as seen on the Radar graph, connects to the Internet through only one transit provider: CenturyLink. Again, we have to quote the 2018 Reliability Report:

However, the big news involving Cogent comes from the United States. For two years — 2016 and 2017 — we identified Cogent’s AS 174 as the crucial one for that market. This is no longer the case — in 2018, the CenturyLink AS 209 replaced Cogent, and the change sent the United States up the list by three places, to 7th.

However, even with a reliable ISP, a single external connection is a pain point for any real-world Internet infrastructure. In an emergency, technical failure, mistake or catastrophe, such a single link is expected to fail, or at least to suffer degradation. That is why Qrator Labs and the Radar project have to remind everyone, again, of a simple one-word verb for 2019: diversify.