Post-2.7 Work On The Toolchain.

Mon, 09 Mar 2009

Since the new database format arrived with MirrorBrain 2.7, there have been a number of interesting improvements in the toolchain.

A new tarball has been spun and can be downloaded from the download section.

In the scanner, deletion of files from the mirror database is now implemented for subdirectory scans. Previously this required a full scan, because the database was too bloated to select the affected files efficiently; the new database schema makes it possible. Best of all, this opens the door to better scanning, which is now much more directory-based and can do cleanups whenever needed. That in turn allows a tighter integration of mirror syncing with the database update. A (push) rsync can not only trigger a scan right after syncing a directory, but could also enter the files directly into the database, and delete the ones that are obsolete.
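The cleanup step of a subdirectory scan can be pictured as a set difference restricted to the scanned directory. The Python sketch below is only an illustration of that idea, not the scanner's actual code; the function name and the representation of paths as plain strings are assumptions made for this example.

```python
def obsolete_files(known_paths, scanned_paths, subdir):
    """Return database paths under `subdir` that the scan no longer found.

    known_paths   -- paths the database currently lists for one mirror
    scanned_paths -- paths actually seen while scanning `subdir`
    subdir        -- the directory that was scanned (hypothetical layout:
                     paths are relative, with '/' as separator)
    """
    sub = subdir.strip("/")
    # An empty subdir means "the whole tree", so every path is in scope.
    prefix = sub + "/" if sub else ""
    scanned = set(scanned_paths)
    # Only files inside the scanned directory may be deleted; files elsewhere
    # were not looked at, so nothing can be concluded about them.
    return sorted(
        path for path in known_paths
        if path.startswith(prefix) and path not in scanned
    )


if __name__ == "__main__":
    in_database = ["a/x.rpm", "a/y.rpm", "b/z.rpm"]
    found_by_scan = ["a/x.rpm"]
    print(obsolete_files(in_database, found_by_scan, "a"))
```

Restricting the comparison to the scanned prefix is what makes a partial scan safe: entries outside that directory are left alone.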

A bug in the scanner which prevented the correct usage of inclusion/exclusion of top-level directories in relation to subdirectory scans has been fixed.

The mirror choice can now be influenced with a query parameter, as=1234, appended to the URL. The number specifies the autonomous system number which the server will base its mirror selection on, instead of the AS of the client IP. Another possible parameter is country=XY, where XY is a two-letter country code. As an example, you could look at the following URLs:

The first URL gives a result depending on your location. The other two generate a list for AS 680, or for the United Kingdom, respectively. This shows some of the criteria for mirror selection that MirrorBrain uses. (In reality, it uses more criteria for mirror selection; whatever is available.)
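As a rough illustration of the query parameters described above, the following Python sketch appends as= or country= to a download URL. The host name, the path and the helper name are hypothetical placeholders; only the two parameter names are taken from the text.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_selection_hint(url, asn=None, country=None):
    """Append as= and/or country= to `url`, keeping any existing query."""
    params = []
    if asn is not None:
        # int() rejects anything that is not a plain AS number.
        params.append(("as", str(int(asn))))
    if country is not None:
        if len(country) != 2 or not country.isalpha():
            raise ValueError("country must be a two-letter code")
        params.append(("country", country))
    parts = urlsplit(url)
    extra = urlencode(params)
    # Preserve whatever query string was already present on the URL.
    query = "&".join(piece for piece in (parts.query, extra) if piece)
    return urlunsplit(parts._replace(query=query))


if __name__ == "__main__":
    base = "http://mirrors.example.org/pub/some/file.iso"  # hypothetical URL
    print(with_selection_hint(base, asn=680))
    print(with_selection_hint(base, country="de"))
```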

Just as the mirrorlist is more or less for human admins to see what's going on, the as= and country= parameters are not meant for machine clients to technically influence the mirror selection. For that, it would be more appropriate to override the IP address "detection" in the first place. The IP address, as looked up by mod_geoip and mod_asn, could be passed via an X-Forwarded-For header, for instance. This would allow frontend servers to influence the mirror selection appropriately. mod_geoip already supports this. For mod_asn I plan to add this in the future. mod_mirrorbrain just lets mod_asn and mod_geoip do that work and uses what it finds in Apache's subprocess environment.
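A frontend could pass the client address along roughly as in the following Python sketch. The URL and the helper name are hypothetical, and whether the backend actually honors the header depends on the Apache modules and their configuration (as noted above, mod_geoip supports it, while mod_asn does not yet).

```python
import ipaddress
import urllib.request


def forwarded_request(url, client_ip):
    """Build a request that carries the original client IP in X-Forwarded-For."""
    # Validate first so that garbage is never forwarded as an address.
    address = ipaddress.ip_address(client_ip)
    return urllib.request.Request(
        url, headers={"X-Forwarded-For": str(address)}
    )


if __name__ == "__main__":
    # 192.0.2.10 is an address from a documentation-only range.
    request = forwarded_request(
        "http://mirrors.example.org/pub/some/file.iso", "192.0.2.10"
    )
    # The request is only constructed here; sending it would be
    # urllib.request.urlopen(request).
    print(request.header_items())
```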

The "mb list" tool has new options to customize what's being displayed when mirrors are listed, namely:

--country --region --prefix --as --prio

The "mb file ls" tool can now probe files that were looked up in the mirror database. So, contrary to "mb probefile", which probes for a given file on all mirrors, "mb file ls --probe" looks up which mirrors are known to have a certain file, or a certain list of files matching a pattern. The --probe switch causes it to probe the file on each mirror, and the --md5 switch to display the md5 hash of the returned content. This can be used to check functionality of the mirrors. Example:

 % mb file ls '*i586/ConsoleKit-0.2.10-63.8.i586.rpm' --probe --md5
as vn  100 ok       ok                         200 140b82137811ee451929f9977266ab73
eu de   50 ok       ok            200 140b82137811ee451929f9977266ab73
eu hu  100 ok       ok                          200 140b82137811ee451929f9977266ab73
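What such a probe does can be approximated in a few lines of Python: fetch the file from each mirror, hash the returned content, and compare the hashes. This is a standalone sketch and not the code behind "mb file ls"; the function names are invented for this example, and the mirror names in the demo are placeholders.

```python
import hashlib
import urllib.request
from collections import Counter


def md5_hex(content):
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(content).hexdigest()


def probe(url, timeout=10.0):
    """Fetch `url` and return (HTTP status, MD5 of the body).

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so callers should be prepared to catch that.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, md5_hex(response.read())


def odd_ones_out(digests):
    """Given {mirror: md5}, return mirrors whose hash differs from the majority."""
    if not digests:
        return []
    majority, _count = Counter(digests.values()).most_common(1)[0]
    return sorted(m for m, digest in digests.items() if digest != majority)


if __name__ == "__main__":
    # Offline demo with made-up content instead of real downloads.
    good, bad = md5_hex(b"package data"), md5_hex(b"truncated")
    print(odd_ones_out({"mirror-a": good, "mirror-b": good, "mirror-c": bad}))
```

A mirror that returns status 200 but a different hash than the others is a likely candidate for a stale or broken file.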

"mb new" fills in some data automatically now (AS number and prefix).
