Implementation
Apache Module
The core of the redirector is mod_mirrorbrain, a module for the Apache HTTP server, written in C, and designed for high performance and scalability, with security in mind.
The previous name of mod_mirrorbrain was mod_zrkadlo, pronounced mod zurrcat low. Zrkadlo is Slovak for mirror, a word that I learnt when I was travelling in Slovakia in 2007. I later renamed the module because most people were not able to spell or even remember the name correctly, and I want to make things easy. Sorry, Fridrich :-)
The module issues a single, lightweight database query per client request, using database connection pools provided by the Apache DBD framework. Reusing pooled connections avoids the cost of opening a new database connection for every request.
To cope with file trees that are huge and change frequently, the redirector doesn't simply choose one mirror per client, but decides at the level of individual files, because mirrors are known to be incomplete, especially if content changes often. To achieve this, the redirector is backed by an SQL database which knows the exact contents of each mirror. The database is periodically updated by scanning all mirrors with a scanner program. In addition, there is a probing program which regularly checks each mirror for responsiveness, and which can disable or pause redirection to a mirror should it fail. When a mirror goes offline, it is watched and automatically re-enabled when it becomes available again.
This page is a little outdated, but shows pseudocode which gives an outline of how the redirection module works.
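As a rough illustration of the file-level lookup described above, here is a minimal sketch in Python. It is not the module's actual code (which is written in C), and it uses an in-memory SQLite database only so that it runs on its own. The table names `mirror` and `mirror_file`, the function names, and the example URLs are all assumptions made for this example and do not reflect the real schema.

```python
import sqlite3

# Hypothetical schema for illustration only; not MirrorBrain's real layout.
SCHEMA = """
CREATE TABLE mirror (id INTEGER PRIMARY KEY, base_url TEXT, enabled INTEGER);
CREATE TABLE mirror_file (mirror_id INTEGER, path TEXT);
"""


def build_demo_db():
    """Create an in-memory database with three made-up mirrors."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO mirror (id, base_url, enabled) VALUES (?, ?, ?)",
        [
            (1, "http://mirror1.example.org/pub", 1),
            (2, "http://mirror2.example.org/pub", 0),  # disabled, e.g. by the probe
            (3, "http://mirror3.example.org/pub", 1),
        ],
    )
    conn.executemany(
        "INSERT INTO mirror_file (mirror_id, path) VALUES (?, ?)",
        [(1, "iso/a.iso"), (1, "iso/b.iso"),
         (2, "iso/a.iso"), (2, "iso/c.iso"),
         (3, "iso/c.iso")],
    )
    return conn


def find_mirrors(conn, path):
    """Return base URLs of enabled mirrors known to hold `path`, using one query."""
    rows = conn.execute(
        "SELECT m.base_url FROM mirror m "
        "JOIN mirror_file f ON f.mirror_id = m.id "
        "WHERE f.path = ? AND m.enabled = 1 "
        "ORDER BY m.id",
        (path,),
    ).fetchall()
    return [row[0] for row in rows]


def redirect_url(conn, path):
    """Build a redirect target from the first candidate mirror, or return None."""
    candidates = find_mirrors(conn, path)
    if not candidates:
        return None  # no enabled mirror is known to have this file
    return candidates[0].rstrip("/") + "/" + path.lstrip("/")
```

The point of the sketch is that the decision is made per requested file: a mirror that is disabled, or that simply does not have the file, never appears among the candidates. Taking the first candidate is a placeholder; how the real module ranks mirrors is not shown here.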
Mirror Database
The mirror database is a PostgreSQL database. Its main purpose is to store data about
- mirrors (their location, base URL, ...)
- files that were seen while scanning the mirrors
The database uses a very compact format, designed to store millions of files while keeping access scalable. See this news post for more thoughts on this.
While the framework is based on PostgreSQL, it is modular, and the core, mod_mirrorbrain, could be used with any database that Apache's DBD API can connect to.
Mirror Scanner
The mirror scanner is a program which crawls the mirrors via rsync, FTP, or HTTP. It updates the database with the file list found on the mirror machines, and checks whether the mirrors support the correct delivery of large files.
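The database update step of a scan can be pictured roughly as follows. This is only a sketch under simplifying assumptions: the "database" is a plain dictionary mapping a mirror id to the set of paths seen on it, and the function name `apply_scan` is made up for this example. Crawling the mirror itself and the large-file check are left out.

```python
def apply_scan(files_by_mirror, mirror_id, listing):
    """Replace what is known about one mirror with a fresh file listing.

    Returns two sorted lists: paths newly seen on the mirror, and paths
    that were known before but are no longer present.
    """
    old = files_by_mirror.get(mirror_id, set())
    new = set(listing)
    files_by_mirror[mirror_id] = new
    return sorted(new - old), sorted(old - new)
```

For example, if mirror 1 was known to hold `a.iso` and `b.iso`, and a new scan finds `b.iso` and `c.iso`, the call reports `c.iso` as added and `a.iso` as removed.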
Mirror Probe
The mirror probe is run at short intervals and checks whether each mirror is alive. If a mirror doesn't reply, redirection to it is disabled until it comes back.
Administration Framework
There is a handy command-line tool for maintaining mirrors and their status. It allows
- creating a new mirror in the database
- adding and editing comments about mirrors (so one can keep notes)
- triggering scans
- running functional tests of mirrors
- listing mirrors per country or per region, listing disabled mirrors, ...
- listing, adding, and deleting files in the database
- creating mirror lists for web pages
- exporting data for backup, reporting, or migration
- etc.
The command-line tool is written in Python in a modular way and comes with a Python library that can be used from other scripts. The Python module is planned to be the basis for a future web frontend.
Of course, any script or web application can also connect directly to the database.