News Archive
2.18.0 release
Here's 2.18.0, nearly two years after the last release. Well, with MirrorBrain running solidly here and there, how much could there be to do? Yes, one must prevent "bitrotting" and make sure that building on today's platforms works, and there were also quite a few accumulated bug reports and even a few patches. So here we go! First and foremost, an annoying bug was fixed that hit new installs (an error message about a missing database column). Plus numerous other small bug fixes. And most pleasingly, the HTML output of the .mirrorlist pages has been modernized. (You might therefore want to adjust your CSS styling.)
Update: We issued another point release already, 2.18.1, because the geoip-lite-update script had a little bug now.
Many people have contributed these changes; see the detailed list of changes!
Note: a new version of mod_asn was also issued recently.
Packaged binaries are built and ready for upgrading. You will find them on the download page, as usual.
Please consider a donation.
Thank you for your support!
MirrorBrain used by VLC
I nearly missed this one! For quite a while now, VLC downloads have been handled via MirrorBrain. That's great news! The VLC folks describe here how they used Sourceforge for a certain time, and then rethought their mirror infrastructure to use MirrorBrain.
MirrorBrain lives - new user sightings
I am very happy to update the users page with links to further people using MirrorBrain. The list is now decorated with Antergos, Kiwix, the Qt Project and GNOME!
Wow.
2.17.0 release supporting IPv6, Instance Digests and Web Linking
The 2.17.0 release brings exciting new features – and some small bug fixes.
New is that IPv6 geolocation for IPv6 clients is now enabled. (This requires recent versions of GeoIP (1.4.8) and mod_geoip (1.2.7).)
Furthermore, MirrorBrain responses now include HTTP headers containing cryptohashes, according to RFC 3230/RFC 5843 HTTP Instance Digests. In addition, RFC 6249 (Metalink/HTTP: Mirrors and Hashes) is supported (based on RFC 5988 Web Linking).
Here's a screenshot which says it all :-)
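As a rough illustration of what a client could do with such headers, the following Python sketch parses a Digest header (RFC 3230/RFC 5843) and the rel=duplicate entries of a Link header (RFC 6249). The header values and host names are made up for the example, and the parsing is deliberately simplistic (it does not handle commas inside quoted parameters); it is not taken from MirrorBrain's code.

```python
import base64
import hashlib
import re

def parse_digest(value):
    """Parse 'SHA-256=...,MD5=...' into {algorithm: base64 digest}."""
    digests = {}
    for item in value.split(","):
        # Split on the first '=' only; base64 padding may contain more.
        algo, _, b64 = item.strip().partition("=")
        if algo:
            digests[algo.upper()] = b64
    return digests

def matches_sha256(body, digest_header):
    """Check a downloaded body against the SHA-256 entry of a Digest header."""
    expected = parse_digest(digest_header).get("SHA-256")
    actual = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    return expected == actual

def duplicate_links(link_header):
    """Return the rel=duplicate URLs of a Link header, best priority first."""
    found = []
    for url, params in re.findall(r"<([^>]*)>([^,]*)", link_header):
        attrs = {}
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key:
                attrs[key.lower()] = val.strip('"')
        if attrs.get("rel") == "duplicate":
            # In RFC 6249, a lower "pri" value means a higher priority.
            found.append((int(attrs.get("pri", 999999)), url))
    return [url for _, url in sorted(found)]

# Hypothetical example values, not real server output.
body = b"hello mirrors\n"
digest = "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
link = ('<http://mirror-a.example/f.iso>; rel=duplicate; pri=2; geo=de, '
        '<http://mirror-b.example/f.iso>; rel=duplicate; pri=1; geo=gb, '
        '<http://example.org/f.iso.meta4>; rel=describedby')

print(matches_sha256(body, digest))
print(duplicate_links(link))
```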
mb vacuum has a new option: -q|--quiet. mb edit now respects the VISUAL environment variable. Internally, a database versioning and migration scheme was implemented.
Bug fixes are: In mb makehashes a problem was fixed with filenames containing certain characters. mirrorprobe now handles incomplete responses better.
As new platforms, Ubuntu 11.10 and Debian 6.0 packages are now built and tested!
Please refer to the 2.17.0 release notes for more details.
Packaged binaries are built and ready for upgrading. You will find them on the download page, as usual.
Please consider a donation.
Thank you for your support!
KDE community uses MirrorBrain
Tom Albers, sysadmin for the KDE Community, announced the use of MirrorBrain for their extensive mirror network. Further details. Welcome on board!
2.16.1 with bug fixes
The 2.16.1 release fixes a few bugs, and includes just two little new features.
Please refer to the 2.16.1 release notes for details.
Packaged binaries are built and ready for upgrading. You will find them on the download page, as usual.
Please consider a donation.
Thank you for your support!
2.16.0 with bug fixes and URL signing
The 2.16.0 release sums up small fixes that piled up slowly, over a good year.
Also, URL signing (introduced in 2.14.0) is now considered stable. See Configuring URL signatures for more information on this interesting way to restrict downloads with temporary URLs.
The following bugs were fixed:
- The server could crash if there was configuration for fallback mirrors in place and acquisition of a database connection failed.
- mb scan: If some directories on a mirror return 404, the scanner crashed. Patched by Thorsten Behrens.
- logging messages in the scanner's large file check have been silenced and their formatting improved.
- A compatibility issue with newer Python versions has been fixed, found by Christian Lohmaier. Also, non-availability of some feature in old (ancient) Python versions is handled better.
Please refer to the 2.16.0 release notes for more details.
Packaged binaries are built and ready for upgrading. You will find them on the download page, as usual.
Please consider a donation to support MirrorBrain and keep development going!
Thanks everyone for their support!
2.15.0 with support for Yum mirror lists
Here comes MirrorBrain 2.15.0! Enjoy
- support for generating Yum-style mirror lists,
- scanning Nginx directory listings,
- bugfixes and improvements in the scanner, and
- extensive directions on tuning PostgreSQL for MirrorBrain.
Please refer to the 2.15.0 release notes for more details!
Packaged binaries are built and ready for upgrading. You will find them on the download page, as usual.
Please consider a donation to support MirrorBrain development!
Thanks everyone for their support!
2.14.0 takes geographical distances into account
MirrorBrain 2.14.0 is out. It comes with several new features and bug fixes. A short summary follows below — please refer to the 2.14.0 release notes for more details!
- MirrorBrain can now use geographical distance as additional criterion in mirror selection.
- Per-file mirror lists visualize the closest mirrors via Google Maps. Example.
- Running behind a load balancer or other reverse proxy was tested and a bug fixed in this regard.
- MirrorBrain can now be used in conjunction with a multitude of access control mechanisms available in Apache. A bug has been fixed which prevented this.
- Experimental support for restricted downloads has been implemented, by redirecting to temporary URLs whose validity can be verified by the mirrors.
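To give an idea of what "geographical distance as additional criterion" can mean in practice, here is a minimal Python sketch that orders mirrors by great-circle distance from a client. The haversine formula is a common choice for this; whether MirrorBrain computes distances exactly this way is an assumption, and the mirror names and coordinates are invented.

```python
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    radius = 6371.0  # mean Earth radius in kilometres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

def sort_by_distance(client, mirrors):
    """Order (name, lat, lon) tuples by their distance from the client."""
    return sorted(mirrors,
                  key=lambda m: distance_km(client[0], client[1], m[1], m[2]))

# Hypothetical mirrors and a client located in Paris.
mirrors = [("tokyo", 35.68, 139.69),
           ("berlin", 52.52, 13.40),
           ("new-york", 40.71, -74.01)]
client = (48.86, 2.35)

print([name for name, _, _ in sort_by_distance(client, mirrors)])
```

In a real selection, distance would be only one criterion among several (country, region, mirror priority and so on).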
Packaged binaries are built and ready for upgrading. You will find them on the download page, as usual.
Please consider a donation to support MirrorBrain development!
Thanks everyone for their support!
2.13.4 with usability improvements
MirrorBrain 2.13.4 improves usability of the mirror scanner, by adding a terse report format (which makes it easy to spot problems), and a totally quiet mode where only errors will be output. Surely something that everybody has been waiting for (and myself not the least).
This release also improves usability in some other corners, and adds important documentation. Noteworthy are the added instructions on setting up automatic GeoIP database updates. See the 2.13.4 release notes for details.
Packaged binaries are built and ready for upgrading. You will find them on the download page, as usual.
Thanks everyone for their support!
MirrorBrain serves The Document Foundation
The Document Foundation was launched on 28th September 2010 and is proud to be the home of LibreOffice, the next evolution of the world's leading free office suite. Its mirror network was built on MirrorBrain from the start, and in a very short time. Thanks to the awesome support of the involved mirrors, a mere 24 hours were enough to get 33 mirrors up and running. Viva la LibreOffice!
2.13.3 with bug fixes in Metalink generator and Torrents
MirrorBrain 2.13.3 fixes two important bugs regarding Metalink generation. In addition, it includes several compatibility fixes in the Torrent generator.
Please find the details in the 2.13.3 release notes.
If you use Torrents, please read the upgrade notes as well:
Packaged binaries are built and ready for upgrading. You will find them on the download page, as usual.
Thanks everyone for their support!
2.13.2 with "stylish" new features
MirrorBrain 2.13.2 adds worthwhile new features to the mirror list generator that you will enjoy:
- The content of the mirror lists (details pages) is now wrapped into an XHTML/HTML DIV container to allow for individual styling. In addition, an arbitrary XHTML/HTML header and footer can be specified, to be placed around the page body.
- Due to popular demand, the way hashes are sent can now be influenced. A client can request the pure hash, without filename, via a query parameter in the URL. Likewise, admins can configure this site-wide with a new Apache configuration directive.
- mirmon integration was updated for the current mirmon release.
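Since a hash can now arrive either with or without a filename, a client that verifies downloads should accept both forms. The following Python sketch assumes that the response body is either the bare hex digest or a line of the form "digest, whitespace, filename" (as produced by tools like sha1sum); the exact response format is an assumption here, so check it against your server.

```python
import hashlib

def parse_hash_response(text):
    """Return the hex digest from 'DIGEST' or 'DIGEST  FILENAME'."""
    fields = text.split()
    if not fields:
        raise ValueError("empty hash response")
    return fields[0].lower()

def data_matches(data, response_text, algorithm="sha1"):
    """Compare locally computed digest with the one sent by the server."""
    local = hashlib.new(algorithm, data).hexdigest()
    return local == parse_hash_response(response_text)

# Hypothetical responses for the same (tiny) file content.
data = b"abc"
pure = hashlib.sha1(data).hexdigest() + "\n"
with_name = hashlib.sha1(data).hexdigest() + "  example.iso\n"

print(data_matches(data, pure), data_matches(data, with_name))
```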
Please read the full 2.13.2 release notes, which go into more detail.
Most packages are built and ready for upgrading. You will find them on the download page, as usual.
Thanks for your support. Have fun!
2.13.1 bug fix and improvements
This quick release fixes a little regression that was introduced with the last release, and adds two very convenient improvements that were kindly contributed by Phillip Smith.
See the 2.13.1 release notes for details about them.
Packages are built and ready for upgrading. You will find them (linked) on the download page, as usual.
For the first time, there are also packages for Arch Linux. Thanks to Phillip again.
ArchServer switches to MirrorBrain
The ArchServer folks yesterday announced that they are using MirrorBrain. Welcome on board!
They are one of the first groups to generate torrents with MirrorBrain (new feature that came with version 2.13.0).
2.13.0 release with IETF Metalink and Torrent generator
I'm very happy to announce MirrorBrain version 2.13.0. An extraordinary amount of work went into this release, and I hope you will enjoy the new features and fixed bugs.
Highlights:
- IETF Metalink support (now that RFC5854 is done)
- Hash server for MD5, SHA1, SHA256, BitTorrent infohash, PGP. E.g., just append ".sha1" to any URL, and get the hash!
- Torrent generator embedded right in MirrorBrain. Generate torrents that actually work, with intelligent web seeding (only the closest mirrors)!
- Experimental support for zsync and for Magnet links
- Ubuntu 10.04 support
- numerous bug fixes, usability enhancements, and added docs (still a lot left to do in docs department...)
To see the new features in action, you can go to http://download.services.openoffice.org/files/stable/3.2.1/ and click on one of the Details links. Note the available metadata.
See the full 2.13.0 release notes for details (long!).
Packages are all built and ready for upgrading. Get them from your package repo, or from the download page.
OpenOffice.org switches to MirrorBrain
I am glad to announce that OpenOffice.org just completed switching their download system to MirrorBrain.
The project is releasing OpenOffice.org 3.2 today. While MirrorBrain has already delivered 3.1.1 recently, 3.2 is the first release that is fully handled through MirrorBrain.
Downloads of OOo have long suffered from a lack of mirror selection and from stability issues. The new setup should greatly help users in obtaining OOo, and facilitate the spread of this important piece of free software.
A lot of work went into this migration, and I want to thank everybody involved! A really good team!
2.12.0 release (bugfixes + geographical coordinates)
MirrorBrain 2.12.0 has been released.
This release contains several important bug fixes, a new feature, and documentation fixes.
Below is a (very) condensed list of the changes. See the 2.12.0 release notes for details.
- store geographical coordinates in the mirror database records
- fix wrong hash filenames constructed by mod_mirrorbrain on Debian (issue 35)
- fix wrong URL type within Metalinks for FTP URLs (issue 23)
- fix wrong URLs printed by 'mb file ls -u' (issue 36)
- removed code for backwards compatibility for obsolete hash cache filename schemes (pre-2.10)
- try harder to catch rsync timeouts (also during connect time) (issue 12)
- fix FTP auth scanning
- fix mb db shell
- documentation enhancements
Thanks for using MirrorBrain!
2.11.3 release with bugfixes
MirrorBrain 2.11.3 has been released with
- bugfixes and small new features in the toolchain
- small documentation fixes.
See the 2.11.3 release notes for details.
2.11.2 FTP scanning and database maintenance improved
MirrorBrain 2.11.2 has been released.
It improves mirror scanning via FTP, by fixing an issue with whitespace in filenames, and another issue that could cause FTP mirrors to be ignored when scanning only a subdirectory.
In addition, there is some new, convenient functionality for database maintenance:
- mb db sizes --- shows sizes of each database table
- mb db shell --- conveniently open a shell for the database
- mb db vacuum --- cleans up unreferenced files
In the mirrorprobe, the default timeout has been lowered from 60s to 20s.
Details are in the 2.11.2 release notes and upgrade notes.
2.11.1 release (regression on Debian/Ubuntu)
MirrorBrain 2.11.1 has been released. It fixes a regression that has been introduced with 2.11.0 and that affected installations on Debian and Ubuntu.
Details are in the 2.11.1 release notes.
2.11.0 release: Fallback mirror support
Just 5 days after the last release, MirrorBrain 2.11.0 has been released. In addition to lots of bug fixes and minor corrections, there is a new feature.
It’s now possible to configure "fallback mirrors", via Apache config using the MirrorBrainFallback directive, for mirrors being used when no reachable mirror is found in the database. Thus, these mirrors get all those requests that MirrorBrain would normally deliver itself (which is the normal last-resort behaviour). This allows running a MirrorBrain instance with a pseudo file tree (cf. the recently added null-rsync script). In planning is a "degraded mode" that keeps MirrorBrain running in a database outage, for which the new feature is one of the foundations. This new feature is still in its infancy, but ready to be tested. It may be subject to refinement, based on future discussion.
Other enhancements and bug fixes:
mod_mirrorbrain:
- Compile fix for old APR (1.2)
- Obsolete MirrorBrainHandleDirectoryIndexLocally removed
- Default of MirrorBrainHandleHEADRequestLocally changed to off
mb:
- Parse errors in the configuration file are now caught and reported nicely.
- Passwords now can contain special characters.
mb scan:
- A warning that appeared since the last release has been removed. It was caused by the removal of obsolete code, and purely cosmetic.
null-rsync:
- --exclude commandline option has been implemented, to be passed through to rsync.
- --quiet and --verbose options implemented.
Details are in the 2.11.0 release notes.
2.10.3 (null-rsync, usability bugfixes)
MirrorBrain 2.10.3 has been released. This is a minor bugfix and feature update, and nevertheless the changes are not insignificant.
First, there is a new program called null-rsync. It creates a pseudo mirror of a remote file tree, without occupying significant disk space. Use case: running MirrorBrain instances without hosting the file tree locally; and also experimentation and development.
Then, this release fixes usability issues in the mb tool that could occur when creating new mirrors and running into DNS intricacies. The change is that the admin is now given a link to in-depth background information. Which is hopefully helpful.
Finally, some small sorting issues in the generation of mirror lists have been fixed.
Details are in the 2.10.3 release notes.
2.10.2 release with Ubuntu support
MirrorBrain 2.10.2 has been released, and has now been packaged and tested on Ubuntu 9.04. Credits to David Farning and his team, who made this possible!
This release also fixes a bug in the mirror scanner, which could lead, under certain conditions, to accidental removal of files from the database when doing a subdirectory scan.
For details, please read the 2.10.2 release notes.
2.10.1 release with improved hash cache
(In fact, this release followed 2.10.0 by only a few days, and thus has been available since 9th of September. Due to lack of time it wasn't formally announced earlier. Apologies.)
2.10.1 revised the metalink hash cache again, after it was found that some filesystems do not guarantee stable inode numbers. To avoid expensive regeneration of hashes, previously existing hash files are automatically migrated. As a new feature, the metalink-hasher can now easily be run in parallel on large file trees, since it uses per-directory locking to make sure that two jobs won't work on the same files.
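The idea of per-directory locking can be sketched as follows in Python. This is only an illustration of the principle, not the metalink-hasher's actual implementation: the lock file name is hypothetical, and the lock is taken by creating a file exclusively, so that a second job finding the file simply skips that directory.

```python
import os
import tempfile

LOCK_NAME = ".hasher.lock"  # hypothetical lock file name

def try_lock(directory):
    """Try to claim a directory; return True if we got the lock."""
    path = os.path.join(directory, LOCK_NAME)
    try:
        # O_EXCL makes creation fail if another job already holds the lock.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode("ascii"))
    os.close(fd)
    return True

def unlock(directory):
    """Release the lock taken by try_lock()."""
    os.remove(os.path.join(directory, LOCK_NAME))

def process_tree(top, work):
    """Run work(dirpath, filenames) on every directory that is not locked."""
    done = []
    for dirpath, _dirnames, filenames in os.walk(top):
        if not try_lock(dirpath):
            continue  # another job is busy with this directory
        try:
            work(dirpath, [f for f in filenames if f != LOCK_NAME])
            done.append(dirpath)
        finally:
            unlock(dirpath)
    return done

# Small demonstration on a temporary directory.
demo = tempfile.mkdtemp()
print(process_tree(demo, lambda dirpath, filenames: None) == [demo])
```

A stale lock left behind by a crashed job would need extra handling (for instance, checking the recorded process ID), which is omitted here.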
For details, please read the 2.10.1 release notes.
2.10.0 release (Metalink hash cache, mirrorprobe)
2.9.2 release (mirmon support, documentation, bugfixes)
MirrorBrain 2.9.2 has been released! Changes:
- Large documentation updates.
- Support for easy mirmon integration. Deploy mirmon without maintaining an extra mirror list for it.
- mod_autoindex_mb now works when Apache's directory index is configured with the HTMLTable option switched on.
- Again, large documentation updates.
See the 2.9.2 release notes for more verbose details.
2.9.1 bugfix release
Closely following 2.9.0, there is 2.9.1 out now. This fixes two (old) bugs that became apparent just now.
One concerns new installations: If the supplementary tool geoiplookup_continent wasn't installed yet, it was impossible to create a new mirror, because the mb new tool relied on its existence. Now, a meaningful error message should point in the right direction.
Regarding the other issue, it is not likely that anyone (but me) ran into it. It turns out that database connection strings used in the Apache configuration need to be unique per vhost. This release adds debugging output that may be helpful to track this down.
See the Release Notes for more verbose details.
2.9.0 release with major updates
MirrorBrain version 2.9.0 has been released.
An important change is that a restriction in the mb tool which made it require mod_asn to be installed alongside MirrorBrain has been removed. Thus, MirrorBrain can now be installed without installing mod_asn.
The tools have been much revisited. The metalink-hasher received major work. File probing has been parallelized, and enhanced with many features.
Perhaps the most significant advance is the new docs subdirectory in the code tree. Any changes there are automatically reflected online at http://mirrorbrain.org/docs/. The current content there still needs to be looked at with one eye slightly squinted, but now everything's up and running to really document things.
See the Release Notes for the complete list of changes!
RFC: Concept for Collecting Download Statistics
I wrote up what I believe could be a good plan for collecting download statistics. I believe it would satisfy the needs of many projects. And not only MirrorBrain users — as projected, it could be used independently, or with other redirectors.
http://mirrorbrain.org/download-statistics
Looking forward to your comments!
MirrorBrain VirtualBox appliance ready for download
To make it easy to try out MirrorBrain and play with it, there's now a VirtualBox Appliance ready to download: it contains an openSUSE 11.1 system with installed MirrorBrain 2.8.1 setup. (regularly updated - see download page )
The image is about 500 MB in size and can be downloaded from http://mirrorbrain.org/eval/openSUSE_11.1/ or rsync'ed from rsync://mirrorbrain.org/mirrorbrain-eval/ . There's a README file which contains further notes useful to set up the image.
Subversion repository moved
The Subversion repository was moved to a new server today. Sorry about any inconvenience caused, but the new server has a number of significant advantages:
- faster :-)
- it is super-easy to give contributors access
- we have complete control over commit messages
- complete control over custom commit hooks, which we'll need for triggered publishing of generated docs
- clean structure
- no certificate warnings anymore
Authentication is done securely via Digest Auth, and since all content being stored in the repository is public anyway, there is no need for SSL.
I made extra-sure that the revision numbers are exactly the same as before. The move happened after r7693.
Old URL: https://forgesvn1.novell.com/svn/opensuse/trunk/tools/download-redirector-v2
New URL: http://svn.mirrorbrain.org/svn/mirrorbrain/trunk/
ViewVC: http://svn.mirrorbrain.org/viewvc/mirrorbrain/
svn switch --relocate doesn't work in this case, unfortunately, because both the server URL and the path inside the repository have changed. The following worked for me on Linux and OSX, but your mileage may vary. It is recommended that you just get a new working copy. If you want to try it, do so on a backup of your working copy. Don't update your working copy from the old location first:
cd your_working_copy_backup
for i in $(find . -type f -name entries); do
    sed -i.bak \
        's,^https://forgesvn1.novell.com/svn/opensuse$,http://svn.mirrorbrain.org/svn/mirrorbrain,' $i
done
for i in $(find . -type f -name entries); do
    sed -i.bak2 \
        's,^https://forgesvn1.novell.com/svn/opensuse/trunk/tools/download-redirector-v2,http://svn.mirrorbrain.org/svn/mirrorbrain/trunk,' $i
done
Happy hacking!
New MirrorBrain web site launched
The MirrorBrain web site was completely rewritten and launched today. On the surface, it looks very similar, but behind everything is new and shiny. I switched from the Zope application server (which I have been very happy with) to the Django web framework (which I'm even happier with).
The new framework will allow much easier integration of documentation subsite, and of a mod_asn web site.
2.8.1 Release: Very minor bugfixes
MirrorBrain 2.8.1 has been released, which adds a few very minor bugfixes on top of 2.8:
- Python 2.6 compatibility fixes in the "mb" tool. Patches kindly submitted by Lars Vogdt.
- when exporting metadata for import into a version control system for archival, handle additions and deletions of mirrors
- the INSTALL file was updated to point to the new RPM packages in the openSUSE build service
Please refer to the NEWS file for details on the changes.
The 2.8.1 tarball is up in the download section.
SourceForge.net uses parts of MirrorBrain
SourceForge.net has worked on their mirror redirector to improve mirror selection, and announced the launch of their new redirector yesterday. Their new mirror selection uses parts of MirrorBrain. This is great, and there could be room for more collaboration in the future!
USENIX Magazine research recognizes openSUSE infrastructure security
The security of the way openSUSE delivers its content has been recognized by a paper in ;login, the USENIX association's magazine. According to the article, openSUSE is the only community Linux distro that's on par with enterprise Linux distributions in protecting against recently discovered package management vulnerabilities.
This is a combined result of the design of metadata, of client features and the setup chosen by openSUSE - and MirrorBrain. MirrorBrain plays a central role because it provides cryptographic signatures and allows fine-grained configuration to make sure that certain key files are always delivered directly.
The goal of this is that users can download software and deploy updates safely even though they're obtaining them through a decentralized system of community maintained mirrors.
2.8 Release: New Scanning Features, Change Notifications
MirrorBrain 2.8 has been released, which
- improves the scanner
- adds features for database inspection
- adds a database export function that can be used to send out notifications about changes
In addition, this release incorporates other changes described recently.
The 2.8 tarball is up in the download section.
Details:
The mirror scanner program underwent a cleanup and now offers a better way to include or exclude files on mirrors. Old, hardcoded excludes have been removed from the program, and excludes have been made configurable where one would expect it: in /etc/mirrorbrain.conf. There are two ways to configure excludes:
scan_exclude = REGEXP [...]
scan_exclude_rsync = RSYNC_PATTERN [...]
The former directive takes regular expressions and is effective for FTP and HTTP scans, while the latter takes rsync patterns, which are passed directly to the remote rsync daemon. Therefore, rsync patterns are used in that case. (This constitutes a duplication for the admin, and it would be nice if it would be possible to automatically convert rsync patterns into regexps and vice versa, to be able to specify the excludes only once.)
A mirrorbrain.conf directive with similar effect is scan_top_include. It lists directories at the top level of the tree that are scanned; all others are ignored:
scan_top_include = DIR [...]
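How such excludes and top-level includes act on a file listing can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, paths are assumed to be relative to the top of the tree, and whether the scanner matches regular expressions anywhere in the path (as done here) or only at its beginning is an assumption.

```python
import re

def filter_paths(paths, scan_exclude=(), scan_top_include=()):
    """Keep paths inside the included top-level dirs that match no exclude."""
    excludes = [re.compile(pattern) for pattern in scan_exclude]
    kept = []
    for path in paths:
        top = path.split("/", 1)[0]
        # An empty scan_top_include means: scan all top-level directories.
        if scan_top_include and top not in scan_top_include:
            continue
        if any(rx.search(path) for rx in excludes):
            continue
        kept.append(path)
    return kept

# Hypothetical listing and settings.
listing = [
    "distribution/11.1/repo/oss/a.rpm",
    "distribution/11.1/.timestamp",
    "distribution/11.1/repo/oss/.~tmp~/b.rpm",
    "people/foo/x.txt",
]
print(filter_paths(listing,
                   scan_exclude=[r"/\.timestamp$", r"/\.~tmp~/"],
                   scan_top_include=["distribution"]))
```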
With the new configurability, and much better excludes, the size of the openSUSE database could be decreased by 20%. For many mirrors, scan time is considerably shorter with good exclusions. (The reason is that some mirrors have foreign stuff in-tree, or keep old files.)
A bug was fixed where the scanner could abort when encountering filenames in (valid or invalid) UTF-8 encoding.
There is mb dirs, a new subcommand for showing directories that the database contains, useful to tune scan exclude patterns. See output of "mb help dirs".
There is mb export --format=vcs, which implements a new output format named "vcs". It is suitable to commit changes to a subversion repository and get change notifications from it. The command generates a file tree which can be imported/committed into a version control system (VCS). This mechanism can be used to periodically dump the database into a working copy of such a repository and commit the changes, making use of the standard commit mail mechanism of the VCS to send change notifications.
See NEWS for a complete list of changes.
Post-2.7 Work On The Toolchain.
Since the new database format came about with MirrorBrain 2.7, there have been a number of interesting improvements in the toolchain.
A new tarball has been spun and can be downloaded from the download section.
In the scanner, deletion of files for subdirectory scans from the mirror database is now implemented. This required a full scan before, because the database was too bloated to efficiently select the affected files. So this became possible with the new database schema. Very cool is that this opens the door for better scanning, which works much more directory-based now and can do cleanups whenever needed. This again allows for a tighter integration of mirror syncing with the database update. A (push) rsync can not only trigger a scan right after syncing a directory, but it could also enter the files directly into the database -- and delete the ones that are obsolete.
A bug in the scanner which prevented the correct usage of inclusion/exclusion of top-level directories in relation to subdirectory scans has been fixed.
The mirror choice can now be influenced with a query parameter, as=1234, appended to the URL. The number specifies the autonomous system number which the server will base its mirror selection on, instead of the AS of the client IP. Another possible parameter is country=XY, where XY is a two-letter country code. As an example, you could look at the following URLs:
- http://openoffice.mirrorbrain.org/stable/3.0.1/OOo_3.0.1_src_core.tar.bz2?mirrorlist
- http://openoffice.mirrorbrain.org/stable/3.0.1/OOo_3.0.1_src_core.tar.bz2?mirrorlist&as=680
- http://openoffice.mirrorbrain.org/stable/3.0.1/OOo_3.0.1_src_core.tar.bz2?mirrorlist&country=gb
The first URL gives a result depending on your location. The other two generate a list for AS 680, or for the United Kingdom, respectively. This shows some of the criteria for mirror selection that MirrorBrain uses. (In reality, it uses more criteria for mirror selection; whatever is available.)
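For experimenting with these parameters, URLs like the ones above can be put together with a small Python helper. The function name is made up for this sketch; the parameter names (mirrorlist, as, country) are the ones described in the text.

```python
from urllib.parse import urlencode

BASE = "http://openoffice.mirrorbrain.org/stable/3.0.1/OOo_3.0.1_src_core.tar.bz2"

def mirrorlist_url(base, asn=None, country=None):
    """Build a ?mirrorlist URL, optionally forcing an AS number or a country."""
    params = {}
    if asn is not None:
        params["as"] = asn
    if country is not None:
        params["country"] = country
    url = base + "?mirrorlist"
    if params:
        url += "&" + urlencode(params)
    return url

print(mirrorlist_url(BASE))
print(mirrorlist_url(BASE, asn=680))
print(mirrorlist_url(BASE, country="gb"))
```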
Just as the mirrorlist is more or less for human admins to see what's going on, the as= and country= parameters are not meant for machine clients to technically influence the mirror selection. For that, it would be more appropriate to override the IP address "detection" in the first place. The IP address, as looked up by mod_geoip and mod_asn, could be passed via an X-Forwarded-For header, for instance. This would allow frontend servers to influence the mirror selection appropriately. mod_geoip already supports this. For mod_asn I plan to add this in the future. mod_mirrorbrain just lets mod_asn and mod_geoip do that work and uses what it finds in Apache's subprocess environment.
The "mb list" tool has new options to customize what's being displayed when mirrors are listed, namely:
--country --region --prefix --as --prio
The "mb file ls" tool can now probe files that were looked up in the mirror database. So, contrary to "mb probefile", which probes for a given file on all mirrors, "mb file ls --probe" looks up which mirrors are known to have a certain file, or a certain list of files matching a pattern. The --probe switch causes it to probe the file on each mirror, and the --md5 switch to display the md5 hash of the returned content. This can be used to check functionality of the mirrors. Example:
% mb file ls '*i586/ConsoleKit-0.2.10-63.8.i586.rpm' --probe --md5
as vn 100 ok ok fpt.net 200 140b82137811ee451929f9977266ab73
eu de  50 ok ok widehat.opensuse.org 200 140b82137811ee451929f9977266ab73
eu hu 100 ok ok fsn.hu 200 140b82137811ee451929f9977266ab73
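Output like this lends itself to a quick automated check that all mirrors return identical content. The Python sketch below groups mirrors by the md5 hash they returned. It assumes eight whitespace-separated columns per line, with the last three being the mirror identifier, the HTTP status and the md5 hash; that reading of the columns is a guess based on the example above, not a documented format.

```python
from collections import defaultdict

# The three result lines from the example above.
SAMPLE = """\
as vn 100 ok ok fpt.net 200 140b82137811ee451929f9977266ab73
eu de  50 ok ok widehat.opensuse.org 200 140b82137811ee451929f9977266ab73
eu hu 100 ok ok fsn.hu 200 140b82137811ee451929f9977266ab73
"""

def group_by_md5(output):
    """Map each md5 hash to the list of mirrors that returned it."""
    groups = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # not a result line in the assumed format
        mirror, _status, md5 = fields[5], fields[6], fields[7]
        groups[md5].append(mirror)
    return dict(groups)

groups = group_by_md5(SAMPLE)
# More than one key would mean that some mirror serves different content.
print(len(groups) == 1, groups)
```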
"mb new" fills in some data automatically now (AS number and prefix).
2.7 Release: Smaller and Faster Database
[Quite a long text, which aims to explain the recent under-the-hood changes.]
MirrorBrain 2.7 has been released, with the main change being a huge improvement in the database structure.
I have been using a "classic" relational database schema for years now, and wasn't very happy with the relational table alone being 2-3G in size with indexes, for the huge openSUSE file tree. For a small database that doesn't matter at all, but that file tree happens to be large enough (and growing) that 2.000.000 files and 200 mirrors result in a sufficiently large number of rows in the relational table that the size is unavoidable. After optimizing out everything which wasn't needed, I still found 48 bytes used per row (two references to primary keys, and one timestamp column that was used to determine whether a file has been seen before or during the last scan).
I had an idea about a completely different organization of these data which doesn't waste 48 bytes per file per mirror where, in theory, one bit in a bit field would suffice. I found something that comes close in PostgreSQL in the form of the array datatype. The "list of mirrors per file" is now an array of two-byte integers which lives in a single column directly next to the path name. That way, only a single index remains.
All in all, the openSUSE database is now 5 times faster and 1/3 the size, which is exactly what I wanted.
The data is also more logically structured, looking up mirrors for a file doesn't require table joins anymore (which already were damn fast...), and the single index is a fast b-tree which is perfect for all needs. In particular, it is now easy to do efficient substring matches on the beginning of path names, which would have required a join over huge tables in the past. (The smaller size helps a lot as well, of course.)
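The difference between the two layouts can be illustrated with a toy model in Python, where a dictionary stands in for the table and array("h") for the array of two-byte integers. The paths and mirror names are invented, the 48 bytes per row are the figure quoted above, and storage for path names and indexes is ignored, so the numbers only show the order of magnitude.

```python
from array import array

# Mirror IDs and names (hypothetical).
mirrors = {1: "mirror-a.example", 2: "mirror-b.example", 3: "mirror-c.example"}

# One entry per file: the path, and next to it the IDs of the mirrors
# that have the file, stored as two-byte integers.
files = {
    "distribution/11.1/iso/dvd.iso": array("h", [1, 2, 3]),
    "distribution/11.1/repo/oss/a.rpm": array("h", [1, 3]),
    "update/11.1/rpm/b.rpm": array("h", [2]),
}

def mirrors_for(path):
    """Look up the mirrors of a file: a single lookup, no join needed."""
    return [mirrors[i] for i in files.get(path, ())]

def paths_under(prefix):
    """Match on the beginning of path names. A b-tree index would do this
    as a range scan; the linear scan here is just for illustration."""
    return sorted(p for p in files if p.startswith(prefix))

def bytes_relational():
    """Old layout: one 48-byte row per (file, mirror) pair."""
    return 48 * sum(len(ids) for ids in files.values())

def bytes_arrays():
    """New layout: one two-byte integer per (file, mirror) pair."""
    return sum(len(ids) * ids.itemsize for ids in files.values())

print(mirrors_for("update/11.1/rpm/b.rpm"))
print(paths_under("distribution/"))
print(bytes_relational(), bytes_arrays())
```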
This opens the door for fixing a previous shortcoming in the scanner: it was not possible to efficiently delete files from a subdirectory only, which have disappeared between two scans. That's now straightforward to implement. It also opens the door for a tight integration of mirror syncing with database updating, which would work its way through a large tree on a directory basis.
The scanner doesn't need a timestamp anymore. It now creates a temporary table with the list of files at the beginning, scans, and in the end it just deletes all files that are still in the temp table.
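The principle can be sketched with Python sets, where one set plays the role of the temporary table. The sketch assumes that every file seen during the scan is removed from the temporary list, so that whatever is left at the end has disappeared from the mirror; the function and variable names are invented for the example.

```python
def scan_mirror(known_files, listing):
    """Return (files now on the mirror, files to delete from the database).

    known_files: paths the database currently has for this mirror
    listing:     paths found on the mirror during this scan
    """
    leftover = set(known_files)  # the "temporary table"
    current = set(known_files)
    for path in listing:
        leftover.discard(path)   # seen again, so not obsolete
        current.add(path)        # new files are simply added
    # Whatever is still in the temporary table was not seen: delete it.
    current -= leftover
    return current, leftover

current, deleted = scan_mirror({"a.rpm", "b.rpm", "c.rpm"},
                               ["a.rpm", "c.rpm", "d.rpm"])
print(sorted(current), sorted(deleted))
```

For a subdirectory scan, the temporary list would be filled only with the known files below that directory, which is what the substring match on path names makes cheap.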
With this change, MySQL is no longer supported, at least not by the framework as a whole. The core, mod_mirrorbrain, will still work: it doesn't care about the database, it just runs a database query, and the query can be anything. The rest of the framework has now become quite adjusted to the PostgreSQL database schema.
Of course, if there's interest, MySQL support in the toolchain could be maintained as well. For now, nobody uses it.
2.6 Release: Network Topological Mirror Selection
MirrorBrain 2.6 has been released, with a major new feature. Through the Apache module mod_asn, it uses BGP routing data to introduce two additional mirror selection criteria: network prefix and autonomous system number (AS). This network-topological knowledge supplements the country-based mirror selection (which relies on the GeoIP database). They work on a pretty much lower level and don't replace the latter. The country lookup is still needed for many requests, because there are many more ASs than mirrors — but for a subpopulation of users the change has a significant impact.
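How the new criteria stack on top of the country-based selection can be sketched as a simple ranking in Python. This is a simplified model under stated assumptions: the client's prefix, AS number and country are taken as already looked up, the mirror records and names are invented (using documentation prefixes and AS numbers), and real-world details such as mirror priorities, regions and load distribution are left out.

```python
def rank(client, mirror):
    """Lower rank is better: same prefix, then same AS, then same country."""
    if client["prefix"] == mirror["prefix"]:
        return 0
    if client["asn"] == mirror["asn"]:
        return 1
    if client["country"] == mirror["country"]:
        return 2
    return 3

def order_mirrors(client, mirrors):
    """Sort mirrors by rank; mirrors of equal rank keep their order."""
    return sorted(mirrors, key=lambda m: rank(client, m))

# Hypothetical mirrors.
MIRRORS = [
    {"name": "far-mirror", "prefix": "203.0.113.0/24", "asn": 64502, "country": "us"},
    {"name": "national-mirror", "prefix": "198.51.100.0/24", "asn": 64501, "country": "de"},
    {"name": "as-mirror", "prefix": "192.0.2.128/25", "asn": 64500, "country": "de"},
    {"name": "campus-mirror", "prefix": "192.0.2.0/25", "asn": 64500, "country": "de"},
]

# A client whose address falls into the campus mirror's prefix.
client = {"prefix": "192.0.2.0/25", "asn": 64500, "country": "de"}
print([m["name"] for m in order_mirrors(client, MIRRORS)])
```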
I owe a big "thank you" to Björn Metzdorf who approached me with this idea, nearly a year ago. Also, Christian Deckelmann, Simon Leinen and Marko Jung have provided very fruitful discussion, insight and support.
The change has a number of important implications:
It increases the likelihood of selecting the fastest mirror for a client. (See below.)
Traffic from clients of, for instance, a large university network can be sent to their local mirror automatically, with full-featured fallback to external mirrors if the internal one doesn't have what's requested yet. Such a local mirror is highly likely to be the fastest one. This has the potential to save large amounts of needless traffic between organizations.
Due to the further narrowing on subnet prefix, this also works for huge "hypertrophic" autonomous systems like the German AS680, which contains the majority of the universities.
This can be interesting for corporations / organizations which desire to run a mirror and have only their clients sent to it. The point is: the new criteria can effectively be used not only for mirror selection, but also to limit mirror selection to a certain client population, based on network topology. The option to set up a "private" mirror can spare the organization external traffic.
And this should be helpful for regions with thin or costly Internet bandwidth, enabling them to establish new mirrors. They can receive normal redirects from MirrorBrain, but have the requests restricted to those from clients in the vicinity of the mirror (same network). Thus, traffic to clients would primarily be local traffic, and the need for outgoing bandwidth would be small compared to what a "traditional" public mirror would have to expect.
This might hopefully lower the bar to find mirrors in many countries. Please spread the word!
The change is up and running on download.opensuse.org and also on the other MirrorBrain instances.
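How the new criteria can combine with the country lookup might be sketched like this. It is a simplified illustration, not mod_mirrorbrain's actual logic: the Mirror record, the function name, and the strict order (same prefix, then same AS, then same country, then everything) are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Mirror:
    name: str
    country: str
    asn: int
    prefix: str

def select_mirrors(mirrors, client_country, client_asn, client_prefix):
    """Return the narrowest non-empty group of mirrors for a client."""
    tiers = (
        lambda m: m.prefix == client_prefix,    # same network prefix
        lambda m: m.asn == client_asn,          # same autonomous system
        lambda m: m.country == client_country,  # same country (GeoIP)
    )
    for matches in tiers:
        group = [m for m in mirrors if matches(m)]
        if group:
            return group
    # Nothing matched: fall back to all mirrors.
    return list(mirrors)
```

A client inside a university network would thus get the local mirror, while a client from an AS without any mirror still falls through to the country-based selection.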
mod_asn: A new Apache module to look up routing data
mod_asn was created to implement a finer-grained mirror selection than the country- and region-based selection that a GeoIP database lookup provides.
mod_asn is an Apache module that uses BGP routing data to look up the autonomous system (AS) and the network prefix (subnet) that contain a given (client's) IP address.
It is written with scalability in mind. To do lookups at high speed, it stores network prefixes in the PostgreSQL ip4r datatype, which is indexable with a Patricia Trie algorithm. This algorithm can search through the ~250,000 existing prefixes in a breeze.
It comes with a script to create such a database (and keep it up to date) with snapshots from global routing data - from a router's "view of the world", so to speak.
Apache-internally, the module sets the looked up data as env table variables, for perusal by other Apache modules. In addition, it can send it as response headers to the client.
MirrorBrain actually uses this already. Announcement to follow. :-)
The source code is available under the terms of the Apache License, Version 2.0.
It is available here (requires an openSUSE buildservice account) or here (in source RPM form). You can browse (or check out) the source code from the svn repository (viewvc link).
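To illustrate the kind of lookup mod_asn performs, here is a small sketch using Python's standard ipaddress module. It is not how the module works internally: it does a plain linear scan instead of an indexed database query, the routing table is made up from documentation address ranges and AS numbers, and it assumes that the most specific (longest) matching prefix is the one wanted.

```python
import ipaddress

# Hypothetical routing snapshot: (network prefix, AS number).
ROUTES = [
    ("192.0.2.0/24", 64496),
    ("198.51.100.0/24", 64497),
    ("198.51.100.128/25", 64498),
]

def lookup(client_ip, routes=ROUTES):
    """Return (prefix, asn) for the most specific prefix containing
    client_ip, or None if no prefix matches."""
    addr = ipaddress.ip_address(client_ip)
    best = None
    for prefix, asn in routes:
        net = ipaddress.ip_network(prefix)
        # Keep the match with the longest prefix length seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    if best is None:
        return None
    return str(best[0]), best[1]
```

A linear scan like this is fine for three prefixes; for hundreds of thousands of them, an index on the prefix column is what keeps the lookup fast.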
2.5 Release: PostgreSQL Support
Version 2.5 was released: it adds support for using the PostgreSQL database as backend, alternatively to MySQL.
MySQL is still fully supported. However, PostgreSQL is recommended now, particularly for large installations with dozens of mirrors and more. PostgreSQL support is an important step to a next-generation mirror selection regime that is in the works. Migration is pretty easy; the mb tool can export data in a format that PostgreSQL can understand.
The 2.5 release also sees major improvements in the mirror scanner. Its output and error reporting are now much more readable. This makes it easier to spot problems encountered on mirrors. Database operations done by the scanner are more efficient in this release.
The installation instructions have undergone a rework to be more complete, and reflect the recent changes.
New Mailing Lists
Three mailing lists have been created: one for announcements, one for users and developers, and one for notifications of source code changes.
See the Communication page for details!
Simplified Installation
Version 2.4 was released. It is mainly a maintenance release, simplifying the installation and rounding up some things.
Memcache support is now completely optional (and no longer suggested), which makes it significantly easier to build and deploy MirrorBrain. One thing less to worry about!
The Apache module, mod_zrkadlo, has been renamed to mod_mirrorbrain, because the previous name was hard to memorize and spell.
Feature updates include:
- handle the pseudo-country called "A2", which is returned by GeoIP lookup for satellite links.
- automatic reenabling of dead mirrors, when they come back.
- the mirrorlist generator is complete now.
- geoip-lite-update tool updated for the URL it uses. It also downloads the city edition of the free GeoIP database now.
- rely on mod_geoip now, which means that the GeoIP city database can now be used.
- new geoiplookup_city tool added, which shows details about IP addresses from the city database.
- the clientip=x.x.x.x query parameter is no longer supported; instead it's possible to use country=xy. This change became necessary due to usage of mod_geoip.
Post-2.2 Work On The Toolchain.
First, mirrorlist generation is a current target. Programmatic generation of mirrorlists like http://mirrors.opensuse.org/list/all.html is now easily possible, and it is possible to filter for things and show only mirrors that mirror a certain part of the tree: http://mirrors.opensuse.org/list/bs.html
Second, the toolchain got the following new features and improvements:
- the mirrorprobe now does GET requests instead of HEAD requests. This is safer. A mirror with crashed filesystem might still be able to answer a HEAD correctly.
- mb, the mirrorbrain tool, has a powerful "probefile" command now that can check for existence of a file on all mirrors, probing every known URL - HTTP, FTP and rsync ones. This is especially useful for checking whether the permission setup for staged content is correct on all mirrors.
Third, the database got new fields named public_notes, operator_name, operator_url, to store additional data about mirrors. Plus, it got two new tables:
- one with ISO 3166 country codes and country names
- one for the regions (continents).
A tarball has been spun and can be downloaded from the download section.
2.2 Release. Principal Change: Database space savings.
MirrorBrain 2.2 was released!
The principal change of this release is massive space savings in the mirror database. An unused database column was eliminated, which had been intended to serve as a special index when the database was designed. It didn't bring any benefits, but increased the database size by as much as 30-40%.
2.1 Release. Main Feature: Slow Mirror Protection
MirrorBrain 2.1 was released!
There is a new feature: It's now possible to configure specific mirrors to get only requests for files smaller than a certain size.
Mirrors with limited bandwidth can easily become very slow, and result in a bad user experience, when large files are downloaded. So there is often a need to disable redirection to such mirrors altogether. However, those mirrors could still be useful to handle smallish requests. So the idea is that you just don't send them requests for the very large files. At the same time, this takes load off them and should increase their performance for those smaller requests.
With the latest change in MirrorBrain, you can configure a maximum filesize for specific mirrors in the database.
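The effect of such a limit can be sketched in a few lines. The field name max_size, the use of None for "no limit", and the treatment of a file exactly at the limit as still allowed are assumptions made for this example, not the actual database schema.

```python
def eligible_mirrors(mirrors, file_size):
    """Keep only mirrors whose configured size limit allows this file.

    Each mirror is a dict with a hypothetical "max_size" key (bytes);
    None means that no limit is configured for the mirror.
    """
    return [
        m for m in mirrors
        if m["max_size"] is None or file_size <= m["max_size"]
    ]
```

A slow mirror configured with a limit would then still receive requests for small files such as package metadata, but not for large ISO images.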
Another change in this release is a significant simplification of the Apache configuration.
2.0 Released: Fallback Mirror Selection Improved
MirrorBrain 2.0 was released!
The fallback mirror selection has been considerably improved. Fallback mirrors are now defined in the SQL database instead of the Apache configuration. This approach is a lot more flexible, and allows assigning arbitrary mirrors to handle arbitrary countries. But most importantly, fallback mirrors are no longer considered unconditionally, but are chosen only when no local mirror could be found. (Note that the obsolete ZrkadloTreatCountryAs directive has been removed from the Apache config.)
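The conditional behaviour could look roughly like the following sketch. The data layout (a list of mirror dicts plus a mapping from country code to fallback mirror names) and the function name are invented for illustration and do not reflect the real database tables.

```python
def candidates(mirrors, fallbacks, client_country):
    """Return local mirrors if any exist, otherwise the fallback mirrors
    assigned to the client's country.

    mirrors:   list of dicts with "name" and "country" keys (hypothetical)
    fallbacks: dict mapping a country code to a list of mirror names
    """
    local = [m for m in mirrors if m["country"] == client_country]
    if local:
        # Fallback mirrors are not considered when a local mirror exists.
        return local
    names = fallbacks.get(client_country, [])
    return [m for m in mirrors if m["name"] in names]
```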
Torrent link embedding
Since today, generated metalinks can automatically include links to Bittorrent resources. When a file name ending in .torrent is found, a hyperlink to it is included into the metalink.
The webserver does all this fully automatically. However, the additional check doesn't need to be done by Apache for every request. With a new configuration directive, a file mask can be given which specifies the files for which this happens, e.g. *.iso. Thus, there is no tradeoff in scalability.
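The idea of gating the extra check with a file mask can be sketched as follows. This is an illustration only: the function names are made up, the set of existing files stands in for a real filesystem check, and it assumes the torrent sits next to the file under the same name with .torrent appended.

```python
from fnmatch import fnmatchcase
import posixpath

def needs_torrent_check(request_path, mask="*.iso"):
    """True if the requested file name matches the configured mask."""
    return fnmatchcase(posixpath.basename(request_path), mask)

def torrent_link(request_path, existing_files, mask="*.iso"):
    """Return the path of a matching .torrent file, or None.

    existing_files is a set of paths standing in for the filesystem.
    """
    if not needs_torrent_check(request_path, mask):
        # The file does not match the mask, so no extra lookup is done.
        return None
    candidate = request_path + ".torrent"
    return candidate if candidate in existing_files else None
```

Requests that don't match the mask return before any lookup happens, which is why the feature costs nothing for the bulk of requests.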
An example of the new metalinks is: http://download.opensuse.org/distribution/11.1/iso/openSUSE-11.1-DVD-i586.iso.metalink
So how can you use this new feature? The command line metalink client aria2 can automatically use P2P resources and HTTP resources from metalinks at the same time. There are other clients, check http://en.wikipedia.org/wiki/Metalink.
PGP signature embedding
Metalinks can now automatically include PGP signatures. When a file name ending in ".asc" is found, its content is embedded into the metalink.
The command line metalink client aria2 automatically downloads the PGP signature file, so it can be verified locally. Note that aria2 doesn't verify the signature itself.
This new feature is implemented carefully to have no impact on scalability and performance. Apache doesn't need to scan for further files or open them and read their content. The signature file's content is saved together with the piece-wise hashes - which are created offline with the metalink-hasher script.
MirrorBrain is the first metalink generator that automates this. Hopefully, this makes way to more usage of this very interesting feature of metalinks.
This feature is already used by openSUSE, who (since 11.1 Beta3) sign their ISO images individually. Here is an example: http://download.opensuse.org/distribution/11.1/iso/openSUSE-11.1-DVD-i586.iso.metalink
Thus, openSUSE is the second project, after curl, to include PGP signatures into metalinks.
Linux Magazin Article
Linux-magazin.de published a very nice (German) article on MirrorBrain, titled Mirrorbrain, der Redirector und Metalink-Generator des Opensuse-Projekts - Optimale Lastverteilung.
Multi-instance support
Big news: Now multiple instances of "Mirror Brains" can be run on a single machine. Apache can run instances separated from each other, each in a virtual host.
The main implication (which makes this very important for me) is that now I can work to set up an instance for openoffice.org and samba.org, two possible candidate users that I would like to convince.
openSUSE talk about the MirrorBrain
A few days ago, I gave a presentation about the current state at the openSUSE offices. See the presentations page. It is updated for the newest state of affairs, and gives details about the deployment at openSUSE and things that can be learnt from it. It is available as ogg video and PDF (slides). The video includes a live demo of two popular metalink clients.
Release of the mirrordoctor
Released the mirrordoctor, a commandline tool to maintain mirror entries in the database. Finally! You can watch a screencast to get an impression.
MirrorBrain 1.8 released
Released mod_zrkadlo 1.8:
- use mod_memcache for the configuration and initialization of the memcache client
- metalink-hasher script added, to prepare hashes for injection into metalink files
- rsyncusers analysis tool added. It takes an rsync log (which may be compressed) and prints a list of hosts that connected, sorted per rsync module.
- scanner bugfix regarding following redirects for large file checks
A testbed for download failover testing
- failover testbed for text mirrorlists implemented (for the libzypp failover proposal)
- a new tool named rsyncinfo, which can survey the sizes of rsync modules on remote rsync servers.
Metalink improvements
- metalinks: switch back to RFC 822 time format
- new ZrkadloMetalinkPublisher directive
- fix issue with <size> element in metalinks
More work on Metalink support.
- now there is a better (more natural) way to request a metalink: by appending .metalink to the filename.
- change metalink negotiation to look for application/metalink+xml in the Accept header (keeping Accept-Features for now, but it is going to be removed probably)
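The two ways of requesting a metalink described above can be sketched like this. It is a simplified illustration and not the module's code: the function name is made up, quality values (q=...) in the Accept header are ignored, and the older Accept-Features mechanism is left out.

```python
METALINK_TYPE = "application/metalink+xml"

def wants_metalink(path, accept_header=""):
    """Return (is_metalink_request, path_of_the_actual_file)."""
    # 1. The natural way: ".metalink" appended to the filename.
    suffix = ".metalink"
    if path.endswith(suffix):
        return True, path[: -len(suffix)]
    # 2. Content negotiation: the metalink media type in the Accept header.
    accepted = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    return METALINK_TYPE in accepted, path
```

For instance, a request for /a.iso.metalink and a request for /a.iso with "Accept: application/metalink+xml" would both be answered with the metalink for /a.iso.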