An analysis of large Internet outages affecting Iranian networks in early 2020

Ramakrishna Padmanabhan (CAIDA, UC San Diego), Anant Shah (Verizon Media Platform), Nima Fatemi (Kandoo), Alberto Dainotti (CAIDA, UC San Diego)

Overview

Internet connectivity in Iran has been known to suffer from disruptions, especially during times of crises and political upheavals. In this post, we use several complementary data sources to examine Internet connectivity in Iranian networks in a two-month period (February 17 to April 17 2020) covering events such as the legislative election (held on Feb 21 2020) and the early spread of COVID-19 cases in Iran. We analyze four Internet connectivity outages affecting Iranian networks during this time.

Our findings show:

a) Widespread Internet connectivity outages, affecting multiple networks simultaneously, continued to occur in Iran.

b) Isolated outages (those not widespread across ISPs) can significantly impact individual major ISPs without having large effects on other ISPs.

c) The complementary views offered by multiple data sources increase outage detection accuracy and also allow us to uncover additional nuances.

Background and motivation

Iran’s Internet connectivity has experienced several large-scale outages in the recent past. The most notorious of these occurred in November 2019, when the Iranian government mandated a week-long Internet connectivity shutdown in response to widespread protests over fuel prices. Several reports (Oracle, Netblocks), including our collaborative post with OONI, showed the unprecedented scale and complexity of this event. During this event, cellular ISPs such as IranCell (AS44244) and MCCI (AS197207) were first affected on November 16th. A few hours later, most of the other large ISPs also experienced outages, including both state-owned ISPs such as Iran Telecom Co (AS58224) as well as non-state-owned ISPs such as Shatel (AS31549). Recovery from the outage occurred a week later, from November 23rd onwards.

In the time since, there have been additional reports of Internet outages in Iran but there has been uncertainty regarding the extent to which various networks were affected. A potential Internet connectivity shutdown event in December 2019 was covered by media outlets (BBC, U.S. News) but the scale of the outage was unclear. Newsweek reported upon several “mild outages” in the wake of the downing of Ukraine International Airlines Flight 752 in early January 2020 but the evidence for these outages was sometimes anecdotal. There were also reports of an outage due to a cyber-attack on February 8 2020 but which networks were affected and to what extent is unclear.

The ambiguity in assessing the extent of outages arises from the inherent challenges in detecting outages of various kinds: depending upon the nature of the outage and the networks affected, it may be apparent in some outage monitoring tools but not others. For example, even a widespread power outage that affects primarily end-users may not be visible in Internet routing data, since Internet routers are often in well-provisioned data-centers with backup power. However, such an outage may be visible in a data source containing measurements from users’ machines (such as IODA’s darknet data source).

Given the increasing susceptibility of Iranian ISPs to Internet outages on the one hand and the challenges in accurately detecting outages on the other, we studied Iranian Internet connectivity from February 17 to April 17 2020 using diverse measurement “lenses” obtained from a variety of data sources. This period includes the quadrennial Iranian legislative election and also the early spread of COVID-19 cases in Iran. In this initial report, we cover four large outages that we observed during this time.

Methodology

We used a variety of data sources to investigate Internet connectivity in Iranian networks. We first describe IODA’s data sources. Next, we describe two novel and experimental data sources: (a) ZeusPing, a prototype fine-grained active probing system under development at CAIDA and (b) a dataset from a large Content Delivery Network (CDN). These diverse data sources helped us detect outages more accurately and also discover additional nuances about these outages.

Existing IODA data sources

The Internet Outage Detection and Analysis (IODA) project of the Center for Applied Internet Data Analysis (CAIDA) at University of California San Diego measures Internet connectivity outages worldwide in near real-time.

In order to track and confirm Internet disruptions with greater confidence, IODA uses three complementary measurement and inference methods based on Internet routing (BGP) announcements, active probing, and Internet Background Radiation (IBR) traffic. The routing announcements from BGP allow us to track reachability according to the Internet global routing system (the so-called Internet control plane). IODA uses routing data extracted from RouteViews and RIPE RIS to obtain the BGP “signal”.

IODA’s existing active probing approach uses the Trinocular methodology developed at USC’s ISI to detect outages. Our implementation of Trinocular pings a few addresses at random from /24 blocks that are likely to respond to pings. We send pings to each block in 10-minute rounds. Using Bayesian inference, the system reasons about responses from blocks and detects outages when a /24 block’s responsiveness is lower than expected.

The Darknet data source represents Internet Background Radiation (IBR) traffic that is often from actual user machines. IBR traffic is generated by millions of machines worldwide and is often a result of these machines being infected by malware or misconfigured in some other way. These methods result in connectivity “liveness” signals, whose status (for each country) is always publicly visible in the IODA dashboard.
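To make the Bayesian reasoning behind the active-probing signal more concrete, here is a minimal sketch of a belief update for a single /24 block. It is not IODA's or Trinocular's actual implementation: the function names, the prior, and the `p_reply_when_down` parameter are assumptions chosen for illustration, and `availability` stands in for the expected fraction of probes that are answered while the block is up.

```python
def update_belief_up(belief_up, responded, availability, p_reply_when_down=0.01):
    """Return the posterior probability that a block is up after one probe.

    belief_up: prior probability that the block is up.
    responded: True if the probed address answered the ping.
    availability: assumed fraction of probes answered while the block is up.
    p_reply_when_down: assumed (small) chance of a reply during an outage.
    """
    if responded:
        like_up, like_down = availability, p_reply_when_down
    else:
        like_up, like_down = 1.0 - availability, 1.0 - p_reply_when_down
    # Bayes' rule over the two hypotheses "up" and "down".
    numerator = like_up * belief_up
    return numerator / (numerator + like_down * (1.0 - belief_up))


def probe_block(responses, availability, prior_up=0.9):
    """Fold a sequence of probe outcomes into a final belief that the block is up."""
    belief = prior_up
    for responded in responses:
        belief = update_belief_up(belief, responded, availability)
    return belief
```

With these assumed parameters, a run of unanswered probes to a block that normally answers most probes quickly pushes the belief towards "down", while a single reply pushes it strongly towards "up". This asymmetry is why a block is only flagged when its responsiveness is clearly lower than expected.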

ZeusPing

To augment IODA’s existing Trinocular-based active-probing scheme, we launched ZeusPing, a novel fine-grained active-probing-based system under development at CAIDA. IODA’s existing Trinocular-based system detects outages at the /24 granularity and may not identify an outage if even a single address in a /24 block responds to probing. Thus, it potentially neglects outages affecting /24 blocks only partially, including larger outages affecting multiple /24 blocks. The ZeusPing system probes much more broadly than Trinocular and is therefore capable of detecting a superset of Trinocular outages, including those that affect many /24 blocks only partially. Thus, ZeusPing complements IODA’s existing data sources and enables fine-grained analysis of outages, allowing detailed characterization of the IP addresses affected by outages, their geographic scope (which regions are affected), and the outages’ durations.

Since mid-February 2020, ZeusPing has been sending pings from 4 globally distributed vantage points provided by the Kandoo team to 50% of the IP addresses geolocating to Iran (around 6M addresses) every 10 minutes. We call these 10-minute periods “rounds” of measurement; every measured address receives a ping from 4 different vantage points in each round.
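As a rough illustration of how per-round results from several vantage points could be combined, the sketch below counts an address as responsive in a round if it answered at least one vantage point. The record layout `(round_id, vantage_point, address, responded)` and the function name are hypothetical, not ZeusPing's actual data format; the addresses in the usage example come from a documentation-only range.

```python
from collections import defaultdict


def responsive_per_round(probe_results):
    """Count addresses that answered at least one vantage point in each round.

    probe_results: iterable of (round_id, vantage_point, address, responded).
    Returns a dict mapping round_id to the number of responsive addresses.
    """
    responsive = defaultdict(set)
    for round_id, _vantage_point, address, responded in probe_results:
        responsive[round_id]  # touch the key so rounds with no replies still appear
        if responded:
            responsive[round_id].add(address)
    return {round_id: len(addrs) for round_id, addrs in sorted(responsive.items())}


# Hypothetical records: two vantage points, two rounds.
example = [
    (0, "vp1", "192.0.2.1", True),
    (0, "vp2", "192.0.2.1", False),
    (0, "vp1", "192.0.2.2", False),
    (0, "vp2", "192.0.2.2", True),
    (1, "vp1", "192.0.2.1", False),
    (1, "vp2", "192.0.2.1", False),
]
counts = responsive_per_round(example)
```

Using a set per round means an address that answers several vantage points is still counted once, and a loss on the path from a single vantage point does not by itself make the address look unresponsive.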

Figure 1 shows the number of addresses that responded to pings from the distributed vantage points in each 10-minute round for various address aggregates. The first high-level observation from this figure is that these signals contain diurnal patterns: more addresses tend to be ping-responsive during the day and fewer during the night. We can also observe some unusual peaks and drops: the drop on Mar 3 corresponds to an outage that we analyze in detail further below, whereas the peak on Mar 25 likely corresponds to network reconfiguration events.

Figure 1: Visualizing ZeusPing's inferences about Internet connectivity for various IODA address-aggregates in Iran: each curve shows the number of addresses that were responding to pings sent from any of the globally distributed vantage points in a 10-minute measurement round for a group of addresses. For example, the orange curve displays the signal when considering all addresses in Iran: we see that roughly 650K - 750K of the addresses in Iran were responding to pings in most 10-minute rounds. The graph also shows the curves for two of the largest ASes in Iran (AS58224 (Iran Telecom Co) and AS31549 (Shatel)) and also two large Iranian provinces (Tehran and East Azerbaijan).

We find evidence of potential outages by analyzing responses (or the lack thereof) to these pings per round. By finding rounds where there is a significant drop (and subsequent increase) in ping-responsive addresses, we are able to determine with fine granularity when outages begin and end.
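One simple way to look for such drops is to compare each round's count against a baseline built from recent rounds. The sketch below is an assumption-laden illustration rather than the method used in this post: the `window` and `threshold` values are arbitrary, and a median of recent non-flagged rounds is just one possible baseline.

```python
from statistics import median


def find_drops(counts, window=6, threshold=0.8):
    """Return (start, end) index pairs of consecutive rounds flagged as a drop.

    A round is flagged when its count falls below `threshold` times the median
    of the last `window` rounds that were not themselves flagged.
    """
    drops = []
    history = []  # counts from rounds considered normal
    start = None
    for i, count in enumerate(counts):
        if len(history) >= window:
            baseline = median(history[-window:])
            if count < threshold * baseline:
                if start is None:
                    start = i
                continue  # keep depressed rounds out of the baseline
            if start is not None:
                drops.append((start, i - 1))
                start = None
        history.append(count)
    if start is not None:
        drops.append((start, len(counts) - 1))
    return drops
```

Because the signals contain diurnal patterns, a real detector would need a baseline that accounts for time of day; the fixed threshold here is only meant to show how the start and end rounds of a drop can be recovered.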

CDN dataset

Among IODA’s data sources, the Darknet data source offers the best view into outages affecting end-users. However, it uses the count of IP addresses that are sending traffic from a network to measure “liveness”. This address-based view can distort the liveness signal in networks that use Carrier Grade NAT (CGN) since multiple users may be using the same IP address in such networks. Cellular networks often use CGN technology; consequently, the Darknet data source can sometimes lack visibility into cellular address space.

We collaborated with a large commercial CDN vendor to obtain a complementary “liveness” signal that may be able to capture end-user outages from cellular and non-cellular networks. We term this dataset the “CDN dataset”. This dataset consists of the total number of requests per minute from Iranian ASNs that were sent to the global CDN platform. The time series of the number of requests for each ASN was scaled by a value unique to that ASN, thus only preserving the “trends”, i.e., fluctuations seen during the analysis period. Consequently, signals across ASNs cannot be compared for volume since they are scaled differently. However, for the purpose of determining if an ASN continues to have Internet connectivity, we are interested in the trend of its signal and not its volume. Thus, in the results that we present below, we further normalized these scaled values to fall between 0 and 1, to enable easy trend-comparison with signals from the other data sources.

We expect that during an outage, users that lose connectivity will not be able to reach the CDN for fetching content. This should result in a drop in the number of requests seen from the network/ASN to the CDN. Thus, a drop in the CDN’s request-count signal not only serves as ground truth for validating outages seen in other active or passive outage detection systems but also provides visibility into ASNs where existing tools might lack visibility (such as wireless providers). Additionally, like the darknet data, the CDN data is available at fine time-granularity: once every minute for each ASN.
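The post does not say which normalization was applied to bring the scaled values into the 0 to 1 range. Min-max normalization is one common choice, sketched below as an assumption; the function name is ours.

```python
def min_max_normalize(series):
    """Rescale a series so its minimum maps to 0.0 and its maximum to 1.0."""
    low, high = min(series), max(series)
    if high == low:
        # A flat series carries no trend; map it to zeros to avoid dividing by zero.
        return [0.0 for _ in series]
    return [(value - low) / (high - low) for value in series]


# The same trend under two different (hypothetical) per-ASN scale factors.
asn_a = min_max_normalize([1, 2, 3])
asn_b = min_max_normalize([10, 20, 30])
```

Both series normalize to the same values, which illustrates why an unknown per-ASN scale factor does not affect trend comparison: only the shape of the signal survives.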

Towards a more accurate, nuanced view

By using these complementary data sources, we obtain a more complete view of outages. This complementarity is a result of (a) the different network phenomena that each data source measures and (b) the different time-granularities of measurements.

The data sources we used present lenses into different aspects of Internet connectivity. While the BGP data source measures Internet routing traffic and can therefore yield highly accurate outage inferences when there is a measured drop, outages do not always affect Internet routers (and consequently routing traffic) and can therefore be invisible in the BGP signal. Active probing has the potential to yield fine-grained outage data in networks that respond to active probes, but several networks block probing traffic. While the Darknet and CDN data represent liveness traffic collected from user machines, they can sometimes be erratic, leading to difficulties in accurately interpreting their signals for outage detection.

The different time granularities of these data sources also result in more effective outage detection. The Darknet and CDN data sources have 1-minute time granularities, the BGP data source has a 5-minute granularity, and the active probing data sources (both IODA’s Trinocular-based system and ZeusPing) have 10-minute time granularities. Consequently, the active probing data sources may not be able to detect sub-10-minute outages, but the Darknet and CDN Requests data may be able to detect such short-duration outages as well.
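The effect of measurement granularity can be illustrated with a toy example. The sketch below is a deliberate simplification: it treats a coarse data source as taking one instantaneous sample per interval, whereas real probing spreads its measurements across each round. The signal values and the 4-minute outage are made up.

```python
def downsample(per_minute, step):
    """Keep one sample every `step` minutes, mimicking a coarser data source."""
    return per_minute[::step]


# A hypothetical 30-minute liveness signal with a 4-minute outage (minutes 12-15).
per_minute = [100] * 30
per_minute[12:16] = [0] * 4

coarse = downsample(per_minute, 10)  # samples at minutes 0, 10 and 20
```

Here the 1-minute series clearly contains the outage, while every 10-minute sample still reads 100, so the coarse view misses the event entirely.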

Detected outages

Here, we report upon four large Internet outages in Iran during this period. We present a summary of our findings about each outage and then follow-up with detailed visualizations and analyses.

Several overlapping network outages on Mar 3 2020

All three IODA signals for Iran experienced a significant drop. https://ioda.caida.org/ioda/dashboard#view=inspect&entity=country/IR&lastView=overview&from=1583179200&until=1583208000 [publicly accessible]

Summary

Shatel (AS31549)

Pars Online (AS16322)

Iran Telecom Co (AS58224)

ITC (AS12880)

Iran Cell (AS44244)

MCCI (AS197207)

Several overlapping network outages on Mar 11 2020

Summary

Shatel (AS31549)

Iran Telecom Co (AS58224)

Iran Cell (AS44244)

Internet outage on Apr 3 2020 for ITC (AS12880)

Summary

Internet outage on Feb 27 2020 for Shatel (AS31549)

Summary

Conclusion

In this post, we used diverse data sources to study Internet connectivity in Iran between February 17 and April 17 2020. We presented analyses about four significant outages from this period that were visible in at least one of the data sources and highlighted our findings about the outages (how many networks were affected, duration etc.) and about their visibility in different data sources. We found that the lenses offered by these data sources allowed us to detect outages more accurately and also discover additional nuances about these outages, thereby reinforcing the need for multiple data sources to study Internet connectivity.

While this post analyzed some of the largest outages, other outages occurred during this period as well. With the exception of the CDN data source, the data collected from the other sources are publicly available. The data from IODA’s data sources for all these outages can be accessed through the IODA platform. Data collected from the prototype ZeusPing data source (which is currently under development) in this blogpost will eventually be released publicly; for now, it is available upon request.

Acknowledgments

We are deeply grateful to the Open Technology Fund for supporting this research. We would also like to thank David Belson for his helpful feedback.