Internet AS-level Topology Archive


The Growth of Internet AS-level Topology

Introduction

This site serves as an archive of historical Internet AS-level topology data for academic research and provides the following features:

In addition to the raw topology data, this site also provides essential semantic information about the topology, namely AS relationships and prefix origins, powered by Cyclops.

Methodology

The historical AS-level topology data is derived from BGP data collected by Route Views, RIPE RIS, PCH, and Internet2. Here is the list of BGP data collectors. The BGP dataset comprises one RIB file per collector per day plus all available update files. A daily (monthly) snapshot consists of the AS-to-AS links that appear within that day (month), as determined by the timestamp in the file name of the raw BGP data. The IPv4 and IPv6 topologies are extracted separately: a link belongs to the IPv4 (IPv6) topology if the AS path in which it appears was announced for an IPv4 (IPv6) prefix. When extracting links from the raw data, we discard AS-SETs, private ASNs, and paths containing loops. For details, see the comments in our Perl script, which reads "show ip bgp" output directly, reads MRT-format data through a modified version of bgpdump, and outputs AS links.
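The link-extraction rules above can be sketched in Python as follows. This is not the archive's Perl script but a minimal illustration under assumptions of our own: AS paths are space-separated asplain ASNs; AS-SETs appear as `{...}` tokens and split the path so that no link is inferred across them; "private ASNs" means the RFC 6996 private-use ranges, and only the links touching them are dropped; AS-path prepending is collapsed; and any remaining repeated ASN marks a loop path, which is discarded entirely. The names `links_from_path`, `address_family`, and `is_private` are hypothetical.

```python
import ipaddress
import re
from typing import List, Set, Tuple

# Assumption: "private ASNs" = RFC 6996 private-use ranges (16-bit and 32-bit).
PRIVATE_RANGES = [(64512, 65534), (4200000000, 4294967294)]

def is_private(asn: int) -> bool:
    return any(lo <= asn <= hi for lo, hi in PRIVATE_RANGES)

def address_family(prefix: str) -> int:
    """Return 4 or 6: which topology (IPv4 or IPv6) a path for this prefix feeds."""
    return ipaddress.ip_network(prefix, strict=False).version

def links_from_path(path: str) -> Set[Tuple[int, int]]:
    """Extract undirected links (ASN1 < ASN2) from one space-separated AS path."""
    segments: List[List[int]] = []
    # Assumption: AS-SET tokens like "{7018,3356}" are dropped and break adjacency.
    for chunk in re.split(r"\{[^}]*\}", path):
        hops: List[int] = []
        for tok in chunk.split():
            asn = int(tok)
            # Collapse prepending (consecutive repeats of the same ASN).
            if not hops or hops[-1] != asn:
                hops.append(asn)
        segments.append(hops)
    # Any ASN that still occurs twice indicates a loop: discard the whole path.
    seen = [asn for hops in segments for asn in hops]
    if len(seen) != len(set(seen)):
        return set()
    links: Set[Tuple[int, int]] = set()
    for hops in segments:
        for a, b in zip(hops, hops[1:]):
            # Assumption: only links touching a private ASN are dropped.
            if is_private(a) or is_private(b):
                continue
            links.add((min(a, b), max(a, b)))
    return links
```

For example, `links_from_path("3356 3356 174 64512 2914")` keeps only the link `(174, 3356)`, because both links touching AS64512 are dropped. Whether the real pipeline drops only such links or the whole path is not stated here, so treat that choice as illustrative.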

AS relationship data and IPv4 prefix origin data are dumped directly from the Cyclops database once a month. The method used to infer AS relationships is described in our paper. Note that the date in the file name of these two data types ONLY indicates when the dump was performed; it is NOT associated with the time when the links or prefixes were observed. The links in the AS relationship data should be used ONLY as an index into the links in the topology data.
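One way to honor the "index only" rule is to let a topology snapshot define the set of links and use the relationship dump purely to label them. The minimal Python sketch below assumes, hypothetically, that each relationship line is tab-separated `ASN1 ASN2 REL`, with REL treated as an opaque label; the helper names `load_relationships` and `annotate` are also hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Pair = Tuple[int, int]

def load_relationships(lines: Iterable[str]) -> Dict[Pair, str]:
    """Read hypothetical tab-separated 'ASN1 ASN2 REL' lines into a directed map."""
    rels: Dict[Pair, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        a, b, rel = line.split("\t")[:3]
        rels[(int(a), int(b))] = rel
    return rels

def annotate(
    topology: Iterable[Pair], rels: Dict[Pair, str]
) -> Dict[Pair, Tuple[Optional[str], Optional[str]]]:
    """Label each topology link (a, b) with the relationship entries for both directions."""
    labeled: Dict[Pair, Tuple[Optional[str], Optional[str]]] = {}
    for a, b in topology:
        # The topology snapshot, not the relationship dump, decides which links exist.
        labeled[(a, b)] = (rels.get((a, b)), rels.get((b, a)))
    return labeled
```

Because the result is keyed by topology links, a link that appears only in the relationship dump is ignored, and a topology link with no relationship entry maps to `(None, None)`.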

Data Format

Topology data is represented as an undirected graph of AS-to-AS links in plain text. Each line is a link ASN1 ASN2, with the convention that ASN1 < ASN2 numerically; \t is the field separator and \n the line separator. ASNs are in asplain format. In a monthly snapshot, each line has a third field, FREQ, giving the frequency of that link, i.e., the number of days within that month on which it was observed.
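A reader for this format might look like the Python sketch below, which streams links from a snapshot (the data files are distributed gzip-compressed) and computes node degrees. It relies only on what the format description states: tab-separated fields, ASN1 < ASN2, and an optional third FREQ field in monthly snapshots. The helper names `parse_links`, `read_snapshot`, and `degrees` are hypothetical, not part of the archive.

```python
import gzip
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# (ASN1, ASN2, FREQ); FREQ is None for daily snapshots, which have only two fields.
Link = Tuple[int, int, Optional[int]]

def parse_links(lines: Iterable[str]) -> Iterator[Link]:
    """Parse tab-separated topology lines of the form ASN1 ASN2 [FREQ]."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        a, b = int(fields[0]), int(fields[1])
        freq = int(fields[2]) if len(fields) > 2 else None
        yield a, b, freq

def read_snapshot(path: str) -> Iterator[Link]:
    """Stream links from a gzip-compressed snapshot file in text mode."""
    with gzip.open(path, "rt") as f:
        yield from parse_links(f)

def degrees(links: Iterable[Link], min_days: int = 1) -> Counter:
    """Node degrees, optionally keeping only monthly links seen on >= min_days days."""
    deg: Counter = Counter()
    for a, b, freq in links:
        if freq is not None and freq < min_days:
            continue
        # Each undirected link is stored once, so credit both endpoints.
        deg[a] += 1
        deg[b] += 1
    return deg
```

The `min_days` filter uses FREQ to discard links seen only briefly within a month; the threshold is left to the analyst.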

AS relationship data is represented as a bidirected graph, in which each AS pair appears twice, once in each direction, with its bilateral relationship. In the IPv4 prefix origin data, each line denotes a prefix in CIDR notation together with its origin AS.
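For the prefix origin data, finding the origin AS of an individual IPv4 address can be sketched with Python's standard `ipaddress` module. The line layout (a CIDR prefix and an ASN separated by whitespace) is an assumption, as are the helper names `load_origins` and `origin_of`. If a prefix is listed with more than one origin, this sketch simply returns whichever matching entry was read first.

```python
import ipaddress
from typing import Iterable, List, Optional, Tuple

Entry = Tuple[ipaddress.IPv4Network, int]

def load_origins(lines: Iterable[str]) -> List[Entry]:
    """Read assumed 'PREFIX ASN' lines, most specific prefixes first."""
    entries: List[Entry] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        prefix, asn = line.split()[:2]
        entries.append((ipaddress.IPv4Network(prefix, strict=False), int(asn)))
    # Stable sort by descending length: the first match is the longest prefix.
    entries.sort(key=lambda e: e[0].prefixlen, reverse=True)
    return entries

def origin_of(addr: str, entries: List[Entry]) -> Optional[int]:
    """Longest-prefix match of an IPv4 address; None if no prefix covers it."""
    ip = ipaddress.IPv4Address(addr)
    for net, asn in entries:
        if ip in net:
            return asn
    return None
```

A linear scan keeps the sketch short; a full routing table would call for a radix trie instead.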

The data files are compressed with gzip, and the paths to them have the following format:

Caveats

Before drawing any conclusions from this dataset, please be aware that it definitely suffers from the following issues:

Download

Please be considerate and do not use more than one parallel connection to download files; otherwise, you will be blocked automatically!

Old Project Site

Contact

Yu Zhang : yuzhang at hit dot edu dot cn