Is BGP Safe Yet?

April 20, 2020

Comments & Education on Cloudflare's New BGP Tool

Over the weekend, Cloudflare released a tool called isBGPSafeYet.com to "track deployments and filtering of invalid routes by the major networks". The release has drawn a lot of attention (both positive and critical), more than one might expect for such a technical topic.

If you haven't seen or heard of the website, this is what it looks like as of April 20, 2020:

Is BGP Safe Yet?

Looks kind of alarming, doesn't it? In case you're not familiar with some of the terminology being thrown around, let's review these concepts. If you're already familiar with BGP & RPKI technologies, feel free to skip ahead.

BGP

BGP is the dynamic routing protocol that quite literally runs the internet. ISPs, cloud providers, and larger enterprises use BGP to advertise their networks to one another, which is how anything on the internet knows how to get to anything else. Take the below example:

$ traceroute -a -n www.google.com
 1  [AS22773] 172.19.10.126
 2  [AS22773] 10.96.48.1
 3  [AS22773] 100.125.108.70
 4  [AS22773] 100.99.28.32
 5  [AS22773] 68.1.4.252
 6  [AS22773] 68.105.30.142
 7  [AS15169] 108.170.247.129
 8  [AS15169] 108.170.247.193
 9  [AS15169] 216.239.41.45
10 [AS15169] 108.170.234.41

In this traceroute, hop 7 shows where BGP is happening: the path leaves AS22773 and enters AS15169. AS22773, or Cox Communications (my home's ISP), directly interconnects with AS15169, or Google. My home network can reach Google through this interconnection, which is commonly referred to as a peering arrangement.
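To make the AS boundary explicit, here is a minimal Python sketch that scans traceroute output in the format shown above and reports each hop where the AS number changes. The function name `as_boundaries` and the regular expression are illustrative assumptions, and only hops 5 through 8 are included to keep it short.

```python
import re

# Hops 5-8 from the traceroute above (abridged).
TRACEROUTE = """\
 5  [AS22773] 68.1.4.252
 6  [AS22773] 68.105.30.142
 7  [AS15169] 108.170.247.129
 8  [AS15169] 108.170.247.193
"""

# Matches lines such as " 7  [AS15169] 108.170.247.129".
HOP_RE = re.compile(r"^\s*(\d+)\s+\[AS(\d+)\]\s+(\S+)")


def as_boundaries(output):
    """Return (hop, previous_asn, new_asn) for each hop where the AS changes."""
    boundaries = []
    previous_asn = None
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip anything that is not a hop line
        hop, asn = int(match.group(1)), int(match.group(2))
        if previous_asn is not None and asn != previous_asn:
            boundaries.append((hop, previous_asn, asn))
        previous_asn = asn
    return boundaries


print(as_boundaries(TRACEROUTE))  # [(7, 22773, 15169)]
```

The single boundary it finds, at hop 7, is the Cox-to-Google interconnection described above.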

As you can probably imagine, there's a lot more to BGP than this, very little of which is in-scope for this article. There are lots of great resources out there to learn more about BGP and internet peering; here are a few of my favorites.

RPKI

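RPKI, or Resource Public Key Infrastructure, is a security framework standardized in a series of RFCs published in February 2012, including RFC 6480 (the overall architecture) and RFC 6487 (the resource certificate profile).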

To understand RPKI, one must first understand the problem it's trying to solve: BGP Hijacking. Cloudflare's controversial isBGPSafeYet.com website actually does a fantastic job of illustrating what BGP Hijacking is:

BGP Hijacking

In a nutshell, an attacker can advertise networks they don't own to ISPs and cause traffic to web resources within those networks to route through the attacker's network. This can lead to the web resources appearing completely down, or worse, the traffic can be intercepted and sent on to its original destination with the end-user being none the wiser.

Does This Really Happen?

Yes, this really happens! Here are just a few examples:

This Cloudflare blog article also recaps some of the more infamous hijacks:

1997 - AS7007 mistakenly (re)announces 72,000+ routes (becomes the poster-child for route filtering).

2008 - ISP in Pakistan accidentally announces IP routes for YouTube by blackholing the video service internally to their network.

2017 - Russian ISP leaks 36 prefixes for payments services owned by Mastercard, Visa, and major banks.

2018 - BGP hijack of Amazon DNS to steal cryptocurrency.

By default, BGP allows this to happen; it's up to network operators and providers to prevent it.

RPKI is a certificate system that lets an organization cryptographically sign statements about which networks it holds and which AS is allowed to originate them. These signed statements are called ROAs (Route Origin Authorizations).

There are two primary requirements to ensure RPKI is effective:

1. Sign ROAs

A BGP-operating organization must sign and issue ROAs for their networks, which are validated and maintained by Regional Internet Registries. While this is a critical first step, it doesn't solve any problems by itself.

2. Reject Invalid Routes

For this organization's ROAs to do anyone any good, internet providers upstream from the organization have to reject any route advertisement for a network covered by those ROAs that originates from someone else.

For example, if Google says 108.170.192.0/18 should only originate from Google (AS15169) via their ROAs, and 108.170.192.0/18 is suddenly advertised from an attacker (whether maliciously or by mistake), internet providers should reject the route so that the traffic is never sent to the attacker:

BGP Hijacking Not Working
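The decision a provider makes here can be sketched in a few lines of Python. This is a simplified illustration of route origin validation, not any vendor's implementation: the `ROA` class and `validate_origin` function are hypothetical names, the maximum length of 18 is an assumption made for the example, and AS64496 is a documentation ASN standing in for the attacker.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    prefix: ipaddress.IPv4Network  # network the holder has signed
    max_length: int                # longest prefix length allowed to be announced
    asn: int                       # AS authorized to originate the network


def validate_origin(route_prefix, origin_asn, roas):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        if route.subnet_of(roa.prefix):
            covered = True  # at least one ROA covers this route
            if origin_asn == roa.asn and route.prefixlen <= roa.max_length:
                return "valid"
    # Covered by a ROA but no match: invalid. No covering ROA at all: not-found.
    return "invalid" if covered else "not-found"


# Assumed ROA for the example above: Google (AS15169) signs 108.170.192.0/18.
ROAS = [ROA(ipaddress.ip_network("108.170.192.0/18"), 18, 15169)]

print(validate_origin("108.170.192.0/18", 15169, ROAS))  # valid
print(validate_origin("108.170.192.0/18", 64496, ROAS))  # invalid
print(validate_origin("192.0.2.0/24", 64496, ROAS))      # not-found
```

A provider that drops "invalid" routes would discard the second announcement, while "not-found" routes (networks with no ROA at all) are typically still accepted.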

You might be wondering why a provider would accept a route to a network they don't own in the first place. This is an important question! Network providers should never accept route advertisements from customers or peers unless the organization is authorized to advertise that prefix. Unfortunately, it's impossible to enforce such a statement on a global scale.

RPKI's goal is to provide a mechanism through which participating networks can prevent these hijacks from happening, or at least reduce their impact.

Back to isBGPSafeYet.com

If you've read the above background information, or are already familiar with these technologies, you can see that RPKI adoption is an important step toward improving the security of the internet. The discussion has been ongoing in the community for nearly a decade. Many content providers (like Cloudflare) have championed the implementation of RPKI, and some major internet carriers have recently taken a significant step by rejecting RPKI-invalid route advertisements before they ever enter their networks (Telia, NTT).

To date, Cloudflare has been doing fantastic work related to routing security in the internet operator community; they're probably the largest vocal advocate of RPKI adoption in the world. However, the release of isBGPSafeYet.com has been very controversial in the community, and with good reason.

Public shaming is rarely a successful or healthy catalyst for change, and Cloudflare's new tool enables potentially under-educated end-users to quickly (2 clicks) call out their service providers for not "implement[ing] BGP safely", which isn't necessarily accurate.

A couple of highly-respected network engineers (@Benjojo12, @JobSnijders) tested Cloudflare's own claim to "implement BGP safely", and found some interesting results. They intentionally advertised a network onto the internet, 194.32.71.0/24, from an invalid ASN, which should be considered an RPKI-invalid network. Cloudflare's own RPKI Validation Tool confirms this:

Cloudflare confirms 194.32.71.0/24 is invalid

If Cloudflare truly rejects all invalid RPKI networks as they claim to, traffic originating from this network (from the "attacking" source) should be unreachable for Cloudflare. In other words, you shouldn't be able to reach Cloudflare from that network. And yet:

Cloudflare can still reach 194.32.71.0/24

This screenshot shows that traffic originating from the "attacker" can reach Cloudflare's network, and that Cloudflare's network has reachability back to the "attacker" network. Another engineer set up a site using Cloudflare Workers that also shows Cloudflare able to reach their own invalidity test site.

To sum up our take on isBGPSafeYet.com and Cloudflare:

We love Cloudflare. This site is served over Cloudflare's CDN, we exclusively use their DNS services, and we recommend them to just about all our customers.

However, we believe isBGPSafeYet.com is a well-intentioned, poorly-implemented scare tactic that will only serve to raise RPKI awareness in a negative light, at a very sensitive time in the global internet community due to the COVID-19 pandemic. And, if Cloudflare is going to be the RPKI standard-bearer for everyone to follow, shouldn't they implement RPKI correctly themselves?

Cloudflare's method of testing a network's "safe" implementation of BGP is reachability to their network 103.21.244.0/24, which is intentionally advertised as RPKI-invalid. However, reachability to this network isn't, by itself, proof of an unsafe BGP implementation. Not all network providers accept a full routing table from their upstream carriers. Indeed, many providers only accept routes local to a particular region, with a default route to cover the rest. In these cases, the provider never receives the invalid route, so it has nothing to reject, and its reachability to an RPKI-Invalid network is decided entirely by its upstream. This further illustrates that the attitude of Cloudflare's isBGPSafeYet.com tool is not the right approach to fostering the adoption of RPKI.
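The default-route case can be illustrated with a small longest-prefix-match sketch in Python. The routing table, next-hop labels, and the `lookup` function are hypothetical; 198.51.100.0/24 is a documentation prefix standing in for a regional route.

```python
import ipaddress


def lookup(table, destination):
    """Longest-prefix match: next hop of the most specific matching route."""
    address = ipaddress.ip_address(destination)
    matches = [
        (network, next_hop)
        for network, next_hop in table.items()
        if address in network
    ]
    if not matches:
        return None
    return max(matches, key=lambda entry: entry[0].prefixlen)[1]


# Hypothetical regional provider: one regional route plus a default route.
# Note that 103.21.244.0/24 is NOT in this table at all.
partial_table = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-carrier",
    ipaddress.ip_network("198.51.100.0/24"): "regional-peer",
}

print(lookup(partial_table, "198.51.100.5"))  # regional-peer
print(lookup(partial_table, "103.21.244.1"))  # upstream-carrier
```

Traffic to Cloudflare's test network simply follows the default route. Whether it arrives depends on whether the upstream carrier filters RPKI-invalid routes, not on anything the regional provider does.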

What is Stellar doing about RPKI?

At Stellar, we place extremely high importance on routing security. You can read more about our routing policies here. In summary:

Customer & Peer Routing

We ensure invalid routes never enter our network from customers and peers by only accepting routes that are either a) listed in an Internet Routing Registry, b) manually validated by our engineering team, or both.

We also use BGP policies to ensure only routes we or our customers are authorized to advertise ever leave our network.
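As a rough illustration of the inbound rule above, and not our production configuration, the following Python sketch accepts a route only if its prefix and origin AS appear in a registry-derived list or in a manually approved list. All names, prefixes, and AS numbers are hypothetical and drawn from the documentation ranges.

```python
import ipaddress


def accept_route(prefix, origin_asn, irr_objects, manual_approvals):
    """Accept a route only if it is registered or manually approved."""
    route = (ipaddress.ip_network(prefix), origin_asn)
    return route in irr_objects or route in manual_approvals


# Hypothetical route objects built from an Internet Routing Registry.
irr_objects = {(ipaddress.ip_network("192.0.2.0/24"), 64496)}
# Hypothetical routes validated by hand by an engineering team.
manual_approvals = {(ipaddress.ip_network("203.0.113.0/24"), 64497)}

print(accept_route("192.0.2.0/24", 64496, irr_objects, manual_approvals))     # True
print(accept_route("203.0.113.0/24", 64497, irr_objects, manual_approvals))   # True
print(accept_route("198.51.100.0/24", 64496, irr_objects, manual_approvals))  # False
```

In practice this logic lives in router prefix lists and BGP policy rather than in application code; the sketch only shows the accept-or-reject decision.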

Route Origin Authorization

As a cloud provider, our primary objective in the RPKI ecosystem is to prevent the hijacking of our own networks by a third party. This is accomplished by signing and issuing ROAs, which tell upstream RPKI participants which AS is authorized to originate the networks they cover. At Stellar, we issue ROAs for our address space:

AS14525 Signed ROAs

Rejecting RPKI-Invalid Routes

> show bgp ipv4 unicast 103.21.244.0/24 bestpath
BGP routing table entry for 103.21.244.0/24, version 24720714
BGP Bestpath: deterministic-med
Paths: (6 available, best #3, table default)
  Advertised to update-groups:
     14
  Refresh Epoch 1
  174 13335, (aggregated by 13335 162.158.140.1)
    38.104.117.177 from 38.104.117.177 (154.26.7.144)
      Origin IGP, metric 2020, localpref 150, weight 200, valid, external, best
      Community: 174:21001 174:22013 14525:0 14525:20 14525:1021 14525:2840 14525:3001 14525:4001 14525:9001
      rx pathid: 0, tx pathid: 0x0

The above is the BGP routing table entry for Cloudflare's intentionally invalid RPKI test network. The fact that it exists in our routing table means that we do not yet reject RPKI-Invalid routes from our upstream providers.

While it's been on our roadmap for some time, we've placed a higher priority on other objectives, such as our customer & peer routing policies, global network redundancy, and making our network faster and more reliable. Additionally, there are very few options available for the server implementation of RPKI - all of them open source. While we fully support and contribute to the open source community, the use of open source software in production networks requires a great deal of testing, validation, and architecture effort to ensure what we implement meets our Service Level Agreement across all our data centers.

We do plan to implement an RPKI solution to reject invalid routes, but it is still in progress at this time.