Metalinks

Metalinks are mirror lists that contain HTTP, FTP and P2P resources. Read more about Metalinks here. Or read the intro from the Internet Draft:

Metalink is an XML-based document format that describes a file or list of files to be downloaded from a server. Metalinks can list a number of files, each with an extensible set of attached metadata. Each listed file can have a description, checksum, and a list of URIs that it is available from.
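To make that structure concrete, here is a minimal sketch of reading such a document with Python's standard library. It assumes the element names and namespace of the Metalink format as later published in RFC 5854 (`file`, `description`, `hash`, `url`); the sample document, the mirror host names, and the `parse_metalink` helper are hypothetical and only serve as illustration.

```python
import xml.etree.ElementTree as ET

# Namespace assumed from RFC 5854 (Metalink version 4).
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

# Hypothetical sample document; host names and hash value are placeholders.
SAMPLE = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.iso">
    <description>A hypothetical example file</description>
    <hash type="sha-256">e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855</hash>
    <url location="us" priority="2">ftp://mirror2.example.com/example.iso</url>
    <url location="de" priority="1">http://mirror1.example.org/example.iso</url>
  </file>
</metalink>
"""


def parse_metalink(xml_text):
    """Return one dict per listed file: name, description, hashes, URLs."""
    root = ET.fromstring(xml_text)
    files = []
    for f in root.findall("ml:file", NS):
        # A lower priority number means "try this mirror earlier".
        urls = sorted(
            (int(u.get("priority", "999999")), u.text.strip())
            for u in f.findall("ml:url", NS)
        )
        files.append({
            "name": f.get("name"),
            "description": f.findtext("ml:description", default="", namespaces=NS),
            "hashes": {h.get("type"): h.text.strip() for h in f.findall("ml:hash", NS)},
            "urls": [url for _, url in urls],
        })
    return files


if __name__ == "__main__":
    for entry in parse_metalink(SAMPLE):
        print(entry["name"], entry["urls"])
```

The URLs are returned sorted by their `priority` attribute, so a client can simply walk the list from the front.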

Often, identical copies of a file are accessible in multiple locations on the Internet over a variety of protocols (FTP, HTTP, and Peer-to-Peer). In some cases, users are shown a list of these multiple download locations (mirror servers) and must manually select a single one on the basis of geographical location, priority, or bandwidth. This is done to distribute the load across multiple servers, and to give human users the opportunity to choose a download location that they expect to work best for them. At times, individual servers can be slow, outdated, or unreachable, but this cannot be determined until the download has been initiated. This can lead to the user canceling the download and needing to restart it. During downloads, errors in transmission can corrupt the file. There are no easy ways to repair these files. For large downloads this can be especially troublesome. Any of the number of problems that can occur during a download lead to frustration on the part of users, and bandwidth wasted with retransmission.

Knowledge about availability of a download on mirror servers can be acquired and maintained by the operators of the origin server, or by a third party. This knowledge, together with checksums, digital signatures, and more can be stored in a machine-readable Metalink file. The Metalink file can transfer this knowledge to the user agent, which can peruse it in automatic ways or present the information to a human user. This guidance provided, user agents can fall back to alternate mirrors if the current one has an issue. Thereby, clients are enabled to work their way to a successful download even under adverse circumstances. All this can be done transparently to the human user and the download is much more reliable and efficient. In contrast, a traditional HTTP redirect to one mirror conveys only comparatively minimal information - a referral to a single server, and there is no provision in the HTTP protocol to handle failures.

Other features that some clients provide include multi-source downloads, where chunks of a file are downloaded from multiple mirrors (and optionally, Peer-to-Peer) simultaneously, which frequently results in a faster download. Metalinks can leverage HTTP, FTP and Peer-to-Peer protocols together, because regardless over which protocol the Metalink was obtained, it can make a resource accessible through other protocols. If the Metalink was obtained from a trusted source, included verification metadata can solve trust issues when downloading files from replica servers operated by third parties. Metalinks also provide structured information about downloads that can be indexed by search engines.
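The fallback and verification behaviour described in the intro can be sketched as follows. This is only an illustration under stated assumptions, not how any particular Metalink client works: the `download_with_fallback` function, the `fake_fetch` stand-in, and the mirror URLs are all hypothetical, and the transfer function is passed in as a parameter so that the example runs without network access.

```python
import hashlib


def download_with_fallback(urls, expected_sha256, fetch):
    """Try mirrors in order; return (url, data) for the first verified copy.

    `fetch` is any callable that takes a URL and returns bytes, raising
    OSError on failure. A real client could wrap urllib.request.urlopen,
    whose URLError is a subclass of OSError.
    """
    errors = []
    for url in urls:
        try:
            data = fetch(url)
        except OSError as exc:
            errors.append((url, f"unreachable: {exc}"))
            continue  # fall back to the next mirror
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            errors.append((url, "checksum mismatch"))
            continue  # corrupt or outdated copy: try the next mirror
        return url, data
    raise RuntimeError(f"all mirrors failed: {errors}")


def fake_fetch(url):
    """Hypothetical stand-in for a network transfer."""
    if "broken" in url:
        raise OSError("connection refused")
    if "stale" in url:
        return b"outdated content"
    return b"good content"


# The checksum that a Metalink would carry for the file.
GOOD_SHA256 = hashlib.sha256(b"good content").hexdigest()

MIRRORS = [
    "http://broken.example.org/file",
    "http://stale.example.org/file",
    "http://good.example.org/file",
]

if __name__ == "__main__":
    url, data = download_with_fallback(MIRRORS, GOOD_SHA256, fake_fetch)
    print("verified copy obtained from", url)
```

Because the checksum comes from the Metalink rather than from the mirror, an unreachable server and a server with an outdated copy are handled the same way: the client moves on to the next URL without involving the user.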

I would like to underline the significance of this technology. It has the potential to improve the usability of important aspects of the Internet at large, particularly for users in countries with poor Internet connectivity.

It can be nearly impossible for a user in, for example, an African country to install an OpenOffice.org package, which is a download of over 100 MB, by conventional means.

Metalinks are a mature technology that solves this. Adoption of this technology is still lacking, though. With the standardization, and the publishing of the standard as an RFC, this is hopefully going to change. In particular, native support by web browsers is missing. This is a chicken-and-egg problem: unless browsers have native support for this technology, many content providers won't offer Metalinks. But as long as Metalinks are not offered widely, there is little incentive for browser developers to implement native support in their browsers. Thus, a critical mass needs to be reached.

The MirrorBrain project contributes to this technology by

  • implementing the first real-time Metalink generator,
  • actively working on the Internet Draft of the Metalink standard,
  • providing reference implementations for Metalink generation,
  • deploying Metalinks at large scale,
  • collecting data and insights about usage (and deployment) patterns,
  • and implementing transparent negotiation of Metalinks, which is a breakthrough for Metalinks in becoming a seamless user experience.

The intensely busy download infrastructure of openSUSE.org, a major Free Software distributor, has served as a role model and testbed for these efforts. Operating with worldwide coverage, and serving far-away users, openSUSE.org is facing huge challenges, and Metalinks solve them nicely.