Can NFO use "anycast" to help with attacks or performance?

Post by **Edge100x** » Thu May 09, 2019 8:54 pm

This is an interesting question that we've gotten a couple of times recently, as apparently another company is marketing a service around this term in relation to game servers. The other company is not actually using the term correctly, and the marketing isn't quite right about how it works or how it can help with game server hosting -- there's a bit of snake oil involved.

Real anycast hosting is when a particular IP address (and its containing prefix) is assigned to multiple separate machines and advertised from multiple locations. By duplicating the service at every Point of Presence (PoP), a host can reduce the time it takes for a client to get a response from a copy of the service.

This is most commonly applied to webservers and DNS servers. In both cases, the appeal is that content can be quickly served directly from the network edge, while only more complicated requests have to be served from a central source (in which case, the machine proxies those requests).

An example of modern anycast usage is how CloudFlare operates. CloudFlare runs webservers in many different cities, all answering on the same IP address; when a client tries to connect to that IP address, their ISP will generally send the traffic to the copy that is closest in routing terms (usually the physically closest one as well), and a cached copy of the webpage is downloaded right from there.

Anycast doesn't apply to game servers that are hosted by the public. This is because these game servers are completely centralized. In order for the players to interact with each other, there is a single game server executable running that tracks all in-game activity. All incoming internet traffic has to go to that one server. Unlike with CloudFlare, it is not possible to run additional copies of a game server on the same IP address and different machines. If multiple PoPs are used, they have to forward incoming traffic to that game server, wherever it is.

When multiple locations are used to advertise the same IP address space and tunneling is used between the locations to unify them and forward the traffic to the real server address, that isn't an example of anycast. That can just be called a "network" or "backbone". Since these terms originally referred to physical networks (with dedicated cables or fiber between them), a more precise term might be "virtualized network" or "tunneled-over-transit multi-PoP network" (TMN).

We use a TMN internally for some of our own services, and previously used one to unify our Dallas and Denver locations while Denver was being decommissioned. But we don't wish to use it for game server hosting, because this type of network has serious drawbacks that make it a suboptimal choice for that purpose. These include:

Higher latency

Because of how routing works on the internet, traffic often takes a scenic path when several connected PoPs all advertise the same IP address and forward traffic between themselves.

For instance, consider a network with two connected locations (with a tunnel between the two): Seattle and Los Angeles, with the game server itself in Los Angeles. If a client in Portland wants to connect, his ISP will see the advertisement from Seattle as being closer and send his traffic up there, after which it has to be forwarded internally by the TMN down to Los Angeles. The extra distance adds latency compared with the normal scenario of having the server advertised only from Los Angeles, where the ISP would send the traffic directly (giving perhaps 40ms through Seattle instead of 25ms direct).
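The Portland example can be put into rough numbers. The sketch below is only an illustration: it uses approximate city-center coordinates, assumes straight-line fiber at about 200 km per millisecond, and assumes the return path mirrors the outbound one. It counts propagation delay only, so it gives a lower bound that real routes (like the 40ms and 25ms figures above) will exceed.

```python
from math import radians, sin, cos, asin, sqrt

# Approximate city-center coordinates (latitude, longitude) in degrees.
CITIES = {
    "Portland": (45.52, -122.68),
    "Seattle": (47.61, -122.33),
    "Los Angeles": (34.05, -118.24),
}

EARTH_RADIUS_KM = 6371.0
# Assumption: light in fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def distance_km(a, b):
    """Great-circle (haversine) distance between two named cities."""
    lat1, lon1 = map(radians, CITIES[a])
    lat2, lon2 = map(radians, CITIES[b])
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def round_trip_ms(path):
    """Propagation-only round trip along a list of cities, there and back."""
    one_way_km = sum(distance_km(a, b) for a, b in zip(path, path[1:]))
    return 2 * one_way_km / FIBER_KM_PER_MS


# Server advertised only from Los Angeles: traffic goes straight there.
direct_ms = round_trip_ms(["Portland", "Los Angeles"])
# Server advertised from both PoPs: traffic enters at Seattle, then is tunneled south.
via_seattle_ms = round_trip_ms(["Portland", "Seattle", "Los Angeles"])

print(f"direct: {direct_ms:.1f} ms, via Seattle: {via_seattle_ms:.1f} ms, "
      f"penalty: {via_seattle_ms - direct_ms:.1f} ms")
```

The absolute values matter less than the difference: the detour through Seattle can only add distance, never remove it.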

Large transit providers -- so-called "tier 1" networks that hosts and ISPs buy from or peer with -- have lots of extra PoPs to help mitigate this problem, and they have special relationships with ISPs and other networks that allow them to communicate and consider the internal distance to a service in addition to the external distance.

INAP, of course, has based its reputation around buying from multiple transit providers, including some large ISPs (like Comcast, CenturyLink, and Verizon) directly, so that traffic to a particular IP address often doesn't even have to leave the ISP's own network on its way to the isolated PoP that it belongs to. INAP also runs its PoPs independently, to avoid the scenic routing problem.

The bottom line is that a TMN adds some amount of latency for clients in-game (even though the fake ping outside of the game looks lower). For some clients, it may be very little, while for others, it can be significant. The level of latency involved may not matter much for a webserver, but we have always taken the view that every millisecond counts for game server hosting.

Reduced DDoS resistance

Most of the largest DDoS attacks now come from botnets. These botnets are typically made up of one of two types of bots:

a) Compromised devices that are primarily hosted overseas
b) Spun-up or compromised instances at cloud hosts

In either case, the traffic from the botnet will not come in over all upstream providers evenly, and it will not come in to a particular city from all other cities evenly. In fact, quite the opposite! For instance, traffic from Amazon will all come in over a single upstream provider, and frequently from a single one of its server farms; traffic from a large botnet on a single Chinese host will all come into the US through Seattle, LA, and/or the Bay Area.

This presents a serious problem for a TMN. Because botnet traffic so often comes into a small subset of PoPs, the choice has to be made to either have all PoPs be quite large and independently DDoS-resistant, or to allow various upstream links to be saturated during an attack. When an upstream link is saturated, packet loss will occur and clients will suffer in-game (in the form of skipping and possibly even disconnecting).
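To make the saturation argument concrete, here is a small sketch. Every number in it is hypothetical (the PoP names, link sizes, attack size, and traffic split are all made up for illustration), and it ignores legitimate traffic entirely. It only shows that an attack well below the network's total capacity can still saturate individual PoPs when it arrives unevenly.

```python
# All figures below are hypothetical, chosen only to illustrate the argument.
POP_CAPACITY_GBPS = {
    "Seattle": 20,
    "Los Angeles": 20,
    "Dallas": 20,
    "Chicago": 20,
    "New York": 20,
}
ATTACK_GBPS = 60  # smaller than the 100 Gbps of total capacity

# Skewed arrival: most of the botnet traffic lands on the west coast PoPs.
SKEWED_SHARE = {
    "Seattle": 0.50,
    "Los Angeles": 0.40,
    "Dallas": 0.05,
    "Chicago": 0.03,
    "New York": 0.02,
}
# Idealized arrival: the attack is spread evenly across every PoP.
EVEN_SHARE = {pop: 1 / len(POP_CAPACITY_GBPS) for pop in POP_CAPACITY_GBPS}


def saturated_pops(capacity, attack_gbps, share):
    """Return the PoPs whose share of the attack exceeds their upstream capacity."""
    return sorted(
        pop for pop, cap in capacity.items()
        if attack_gbps * share.get(pop, 0.0) > cap
    )


print("even split:  ", saturated_pops(POP_CAPACITY_GBPS, ATTACK_GBPS, EVEN_SHARE))
print("skewed split:", saturated_pops(POP_CAPACITY_GBPS, ATTACK_GBPS, SKEWED_SHARE))
```

With these made-up numbers the even split saturates nothing, while the skewed split overwhelms the two PoPs where the attack actually arrives.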

Worse, a customer at one PoP who is frequently being targeted with attacks can potentially cause problems at multiple other PoPs, if different botnets are involved.

This leaves little benefit to running a TMN, from a mitigation standpoint, over independent locations.

Increased complexity

Setting up, maintaining, and troubleshooting the necessary routers and the open-internet tunnels between locations takes a non-trivial amount of time. Take, for instance, an upstream that goes down at one location, briefly preventing a tunnel from working to another location -- causing an unexpected impact for clients that is difficult to troubleshoot. Or, imagine an attack coming into one PoP not being automatically mitigated and flooding out the tunneled link to another PoP.

An ASN and independent IP space are not required for a TMN to work using transit providers, but an ASN is necessary for establishing peering relationships, and independent IP space helps when it comes to using multiple upstreams at different PoPs. Obtaining these can be a challenge, though certainly not an insurmountable one. Both add expense.

One thing that the other GSP uses their TMN for, right now, is to fake server responses to queries made through the in-game/Steam browser. Because it advertises all IP addresses from all PoPs, a host running a TMN can choose to proxy requests made to its game servers (such as A2S_INFO requests for Source-based games) -- masquerading as the game server and issuing responses from every router, at every PoP, making clients think that the servers are always hosted nearby and have low latencies. This could be seen as a valuable benefit, or it could be seen as unethical cheating, depending on your point of view. Regardless, if a player actually connects to the server, he or she will be quickly disappointed by its lower performance, so this is a questionable tactic from a community-building perspective.
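For anyone curious what such a query looks like on the wire, below is a minimal sketch of timing an A2S_INFO request. The request bytes follow Valve's documented server query format; the function name and timeout are my own choices, and it assumes the browser's latency figure is derived from this kind of round trip, as described above. The timer stops at the first reply from whatever machine answers on that IP address, which is exactly why a router answering at the nearest PoP makes a distant server look close. Note that some servers reply with a challenge first rather than the full info, which still counts as a reply for timing purposes here.

```python
import socket
import time

# A2S_INFO request: 4-byte 0xFF header, 'T', then a null-terminated payload string.
A2S_INFO_REQUEST = b"\xFF\xFF\xFF\xFF" + b"T" + b"Source Engine Query\x00"


def query_round_trip_ms(host, port, timeout=2.0):
    """Send one A2S_INFO request over UDP and time the first reply.

    Raises a timeout error if nothing answers within `timeout` seconds.
    The measurement says nothing about where the real game server is;
    it only reflects the distance to whichever machine responded.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(A2S_INFO_REQUEST, (host, port))
        sock.recvfrom(4096)  # first reply: info or challenge, either is fine for timing
        return (time.perf_counter() - start) * 1000.0
```

Calling something like `query_round_trip_ms("203.0.113.10", 27015)` (a placeholder address and port, not a real server) and comparing the result against the latency seen after actually joining would show the gap between the advertised ping and the in-game one.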