Issue150

Title: Possibility to auto-disable outdated mirrors
Priority: feature
Status: chatting
Superseder:
Nosy List: poeml, rhertzog
Assigned To: poeml
Keywords:

Created on 2014-02-17.15:14:50 by rhertzog, last changed by poeml.

Messages
msg547 Author: poeml Date: 2014-02-20.01:23:05
Very good idea. This would make MirrorBrain useful in more scenarios.
The current mirror checking is so minimal that it's amazing we got this
far with it. Historically, checking mirror freshness was neglected
because it isn't needed for file trees where files never change in place
but get new names instead (at least an incremented counter). Files that
change but keep the same name were therefore always a problem. At
openSUSE, requests for some of those files were never redirected to
mirrors for that reason. Of course, it may be complicated or impossible
for admins to get rid of such files.

Fedora solved the same issue by having their redirector reply with a
Metalink that uses a Metalink protocol extension to list several
variants of a file (which might be encountered on a mirror). The
redirector effectively tells the client: if the mirror has this file,
it's okay, and if it has a different file, that's also okay.

Scanning the mirrors more deeply, including mtime, file size, and hash
calculation, isn't really realistic in many cases, I think (it might be
in some, of course). A compromise could be to compare mtime and file
size, as rsync does (unless forced to look into files with -c).
But only rsync scanning would achieve this reliably. HTTP scanning is
more fragile, and FTP scanning isn't perfect either (character set
issues, time format not standardized).
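
A minimal sketch of the mtime-plus-size compromise, assuming the scanner
has already obtained the size and modification time that a mirror
reports for a file (for example from an rsync listing). The names
FileMeta, local_meta and quick_check and the one-second tolerance are
hypothetical; nothing here is existing MirrorBrain code.

```python
import os
from typing import NamedTuple


class FileMeta(NamedTuple):
    size: int     # file size in bytes
    mtime: float  # modification time, seconds since the epoch


def local_meta(path):
    """Read size and mtime of a file in the local (master) tree."""
    st = os.stat(path)
    return FileMeta(st.st_size, st.st_mtime)


def quick_check(master, mirror, tolerance=1.0):
    """Return True if sizes match and mtimes differ by at most `tolerance` seconds.

    File contents are never read, so a file that changed but kept its
    size and mtime goes undetected.
    """
    return (master.size == mirror.size
            and abs(master.mtime - mirror.mtime) <= tolerance)
```

The tolerance is there because the time formats seen over HTTP and FTP
may be coarser than what the local file system stores; whether one
second is the right value is an open question.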

This is just background. The idea of checking the sync status of the
mirrors would be a big step forward.

I agree with making the check adaptable, and creating a useful default
check. There's a script to create a small timestamp file, which could be
used to detect the "sync age". Another check could be for a certain
arbitrary file. It would be easy to say "mb, use only mirrors that have
file foo" or "mb, use only mirrors where timestamp is not older than 12
hours" or "mb, use only mirrors where the content of file bar is
identical to our local copy".
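
The "not older than 12 hours" rule could look roughly like this. It is
only a sketch: it assumes the timestamp file starts with the time of the
last sync in seconds since the epoch, which may not match what the
existing script writes, and the names sync_age and is_fresh are made up
for illustration.

```python
import time

MAX_AGE = 12 * 3600  # seconds; the 12 hours used as the example above


def sync_age(timestamp_text, now=None):
    """Seconds elapsed since the epoch time stored at the start of the file."""
    if now is None:
        now = time.time()
    tokens = timestamp_text.split()
    if not tokens:
        raise ValueError("empty timestamp file")
    # Assumption: the first whitespace-separated token is the epoch time.
    return now - int(tokens[0])


def is_fresh(timestamp_text, max_age=MAX_AGE, now=None):
    """True if the mirror's timestamp is at most `max_age` seconds old."""
    return sync_age(timestamp_text, now) <= max_age
```

The text of the timestamp file would be fetched from each mirror by
whatever means the check already uses; a mirror for which is_fresh()
returns False would be skipped.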

A mirmon-like status report could be generated at the same time.


I have wondered several times whether /etc/mirrorbrain.conf should
contain a setting for the DocRoot of Apache (which is the root of the
file tree). That would make it very handy to implement checks, create
timestamps and do further things from an 'mb' command with little effort
for the user. (The 'mb makehashes' call would also be less complicated
and less error-prone.) I think this setting is needed.

Further notes:

- I committed a small function in r8481 that finds a random file in a
  local file tree, which could be used for a fully automatic test (the
  admin doesn't even need to specify a file then). I wrote it recently,
  when I felt that mirror checking finally needed to be advanced...
- There's 'mb test', which doesn't do much yet, but could be the
  container for the new functionality. (I also need to check what kind
  of functionality is in mb/mb/testmirror.py, maybe there's something
  useful already.)
- Especially for a mirror that's newly added to the database, the first
  thing that one wants to know is if the mirror is working and if it was
  correctly configured (the mirror itself, but also its URLs in the mb
  database). It should be easy to run a test and see if everything is
  fine. Thinking of automatic plausibility tests...
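
For illustration, picking a random file from a local tree could be done
along these lines. This is not the function committed in r8481, just a
sketch of the idea; the name random_file is hypothetical.

```python
import os
import random


def random_file(top, rng=random):
    """Return the path, relative to `top`, of a randomly chosen file.

    Returns None if the tree contains no files.
    """
    chosen = None
    seen = 0
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            seen += 1
            # Reservoir sampling: keep the n-th file with probability 1/n,
            # so the whole file list never has to be held in memory.
            if rng.randrange(seen) == 0:
                chosen = os.path.relpath(os.path.join(dirpath, name), top)
    return chosen
```

The relative path returned here could then be appended to a mirror's
base URL to test whether the mirror serves the same file.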
msg543 Author: rhertzog Date: 2014-02-17.15:14:50
MirrorBrain regularly checks that mirrors are online and working, but it doesn't
detect mirrors that are stale and outdated. It would be really useful if we
could teach MirrorBrain how to detect outdated mirrors so that it could disable
them automatically.

The simple answer would be to have a parameter that points to a script
that tests the mirror and lets MirrorBrain know whether it's up-to-date (exit
code=0), outdated (exit code=1) or whether there was an error (any other exit
code). The information about the mirror to check would be provided either via
environment variables or via command-line parameters. That way we can implement
any policy... but it requires scripting skills.
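
To make the exit code convention concrete, here is a rough sketch of how the
caller side might run such a script. The environment variable names MIRROR_ID
and MIRROR_BASEURL, the function names and the 60-second timeout are all
assumptions made for the example; none of this exists in MirrorBrain.

```python
import os
import subprocess

UP_TO_DATE, OUTDATED, ERROR = "up-to-date", "outdated", "error"


def classify(returncode):
    """Map the check script's exit code to a mirror state."""
    if returncode == 0:
        return UP_TO_DATE
    if returncode == 1:
        return OUTDATED
    return ERROR  # any other exit code, including death by signal


def run_check(command, identifier, baseurl, timeout=60):
    """Run the admin-supplied check script and return the mirror state."""
    # Hypothetical variable names; the script reads them from its environment.
    env = dict(os.environ, MIRROR_ID=identifier, MIRROR_BASEURL=baseurl)
    try:
        proc = subprocess.run(command, env=env, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Script missing, not executable, or hanging: treat as an error,
        # not as an outdated mirror.
        return ERROR
    return classify(proc.returncode)
```

Keeping "error" separate from "outdated" means a broken check script would not
by itself cause mirrors to be disabled.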

Another approach could be to define a path on the mirror that must be in sync
with the master copy (same size and same SHA1 checksum) for the mirror to be
considered up-to-date. But since synchronization takes time, we must
be able to define some grace period before deciding to disable the mirror.
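
A sketch of that comparison, assuming the content of the reference file has
already been fetched from both the master and the mirror. The six-hour grace
period, the function names and the intermediate "syncing" state are
illustrative assumptions only.

```python
import hashlib


def fingerprint(data):
    """Return (size, SHA1 hex digest) of a bytes object."""
    return len(data), hashlib.sha1(data).hexdigest()


def mirror_state(master_fp, mirror_fp, master_changed_at, now, grace=6 * 3600):
    """Classify a mirror by comparing fingerprints of the reference file.

    A mismatch only counts as "outdated" once the master copy has been
    unchanged for longer than `grace` seconds; before that the mirror
    is assumed to be still synchronizing.
    """
    if master_fp == mirror_fp:
        return "up-to-date"
    if now - master_changed_at <= grace:
        return "syncing"
    return "outdated"
```

Here master_changed_at would be the mtime of the master copy, so the grace
period starts whenever the reference file last changed on the master.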

Or better, we could implement the first setting and provide a sample script that
implements the second solution while hooking into mirrorbrain.conf to get the
required parameters.
History
Date                 User      Action  Args
2014-02-20 01:23:05  poeml     set     messages: + msg547
2014-02-17 21:29:18  poeml     set     status: unread -> chatting
                                       assignedto: poeml
                                       nosy: + poeml
2014-02-17 15:14:50  rhertzog  create