Re: [mirrorbrain] Mirrorbrain handing of modified files.

From: Dr. Peter Pöml <peter_at_poeml.de>
Date: Mon, 6 Aug 2012 23:07:35 +0200

Hi Peter,

On 05.08.2012 at 15:42, peter green <plugwash_at_p10link.net> wrote:
> I am setting up mirrorbrain to manage mirrors for the apt repository for raspbian (a rebuild of debian for armv6 hardfloat).
> 
> In my setup the server running mirrorbrain is also the public master server (it in turn is updated from the private master server, but that is not relevant here). This means that any changes will (or at least should) happen on the master server first.
> 
> Package updates come in the form of additions and removals of files which mirrorbrain seems to have no problem with. However the package lists and related files do get modified in-place.
> 
> When a file is changed locally, I would like mirrorbrain to stop redirecting requests for it until the mirrors have been rescanned and confirmed as having the new version of the file. In this way I can ensure that users always get the latest version of the package lists regardless of whether the mirrors are up to date.
> 
> Currently mirrorbrain seems to have some knowledge of file dates (it shows the updated date on the mirror list page for the file), but it seems to carry on redirecting clients regardless of the changed file date of the local copy.
> 
> Currently my workaround is to exclude the dists dir from mirrorbrain; however, I'd rather not do this if I can help it.

Looking around in the tree, I see several files named "Packages" or "Release", which carry no version indication in their names, alongside package files whose names include version info (including build or release numbering). The latter are easy for MirrorBrain to handle, because such a file is assumed never to change: if a package is rebuilt, the release counter is incremented. The "version-less" files, however, are more difficult to handle, because MirrorBrain doesn't store modification times or file sizes for what it finds on mirrors. (It would have been possible to implement it that way, but it was decided against for performance reasons, which in fact might not affect some users.)

There are some ways to deal with it: 

If the files are small, simply don't redirect for them; just deliver them. You can use the MirrorBrainExcludeFileMask directive with a regexp, e.g. "\.(xml|asc)". This can save the client an additional roundtrip to the mirror. And for certain files, like crypto-hashes and signatures, it makes sense to deliver them directly as well, for security reasons.
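
To see which paths a mask like the one above would catch, here is a small Python sketch. It assumes the mask is applied as an unanchored regular expression search against the request path; that matching behaviour, the helper name is_excluded and the sample paths are assumptions made for illustration, not taken from the MirrorBrain documentation.

```python
import re

# The example mask from the paragraph above. Treating it as an unanchored
# search is an assumption about how the directive matches.
EXCLUDE_MASK = re.compile(r"\.(xml|asc)")


def is_excluded(path: str) -> bool:
    """Return True if the path would be delivered directly (no redirect)
    under the assumed unanchored-search semantics."""
    return EXCLUDE_MASK.search(path) is not None


# Hypothetical sample paths, for illustration only.
for path in [
    "dists/wheezy/Release.asc",
    "repodata/repomd.xml",
    "repodata/primary.xml.gz",
    "pool/main/h/hello/hello_2.8-2_armhf.deb",
]:
    action = "deliver directly" if is_excluded(path) else "redirect"
    print(f"{path}: {action}")
```

Under that assumption, an unanchored pattern also matches names such as "primary.xml.gz"; appending "$" to the pattern would restrict it to names that end in the listed extensions.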

For larger files, a workaround is required so that the load can be offloaded to mirrors by redirection. Say you have a file named "Packages" which is updated frequently. You could rename the file to make its name unique, e.g. "Packages-${sha1sum}" or "Packages-${timestamp}". A symlink named "Packages" would point to the current file. Whenever the file is updated, you update the symlink. This works because MirrorBrain resolves links before considering redirection. Therefore, MirrorBrain would send the client only to mirrors that have the current file. It wouldn't matter anymore if there are files named "Packages" on the mirrors; only the current uniquely named file would be downloaded.

http://mirrors.xbmc.org/nightlies/osx/ppc/ is an example of this technique.
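
The renaming step could be scripted roughly as follows. This is only a sketch of the idea, not something MirrorBrain ships: the function name publish, the temporary link name and the demo paths are made up for illustration, and it assumes a POSIX filesystem where renaming a symlink over an existing name is atomic.

```python
import hashlib
import os
import shutil
import tempfile


def publish(new_file: str, directory: str, name: str = "Packages") -> str:
    """Store new_file in directory as "<name>-<sha1>" and repoint the
    symlink "<name>" at it. Returns the unique file name."""
    with open(new_file, "rb") as fh:
        digest = hashlib.sha1(fh.read()).hexdigest()
    unique_name = f"{name}-{digest}"
    target = os.path.join(directory, unique_name)
    if not os.path.exists(target):
        shutil.copyfile(new_file, target)

    # Create the new symlink under a temporary name, then rename it over
    # the old one, so clients never see a missing "Packages".
    tmp_link = os.path.join(directory, f".{name}.tmp-{os.getpid()}")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(unique_name, tmp_link)  # relative link target
    os.replace(tmp_link, os.path.join(directory, name))
    return unique_name


# Small demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
source = os.path.join(demo_dir, "incoming")
with open(source, "w") as fh:
    fh.write("version 1\n")
first = publish(source, demo_dir)
with open(source, "w") as fh:
    fh.write("version 2\n")
second = publish(source, demo_dir)
print("symlink now points to:", os.readlink(os.path.join(demo_dir, "Packages")))
```

The older uniquely named file is deliberately left in place here, so that it can still be served while mirrors catch up; when to clean it up is a separate decision.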

When mirrors sync and get the new uniquely named files, and you scan them, redirection to them will commence. To scan subdirectories, the "mb scan -d <dir>" command can be very useful (saving a full scan).
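
If the subdirectory scan is to be triggered from a script (for example right after the symlink has been updated), a thin wrapper around the command might look like the sketch below. Only "mb scan -d <dir>" comes from this message; passing mirror identifiers as extra arguments, the helper names and the dry-run default are assumptions, so check "mb scan --help" on your installation before relying on it.

```python
import subprocess


def scan_command(directory, mirrors=()):
    """Build the argument list for scanning one subdirectory.
    Mirror identifiers as positional arguments are an assumption."""
    return ["mb", "scan", "-d", directory, *mirrors]


def scan_subdir(directory, mirrors=(), dry_run=True):
    """Print the command (dry run) or execute it and raise on failure."""
    cmd = scan_command(directory, mirrors)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


# Dry run with the directory mentioned in the original question.
scan_subdir("dists")
```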

If it is important to spread files quickly, to save load, mirrors either need to sync very frequently (and need to be scanned frequently), or the content needs to be pushed (which gives better control, and one knows exactly when to scan the mirrors). This little script
http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/tools/push2mirrors?revision=8255&view=markup
is used by some people to implement content pushing to mirrors in parallel. It requires write access on mirrors. Otherwise, triggered pull syncing is another possible way to implement this (as Debian does). 

Does this help?

Thanks,
Peter


Received on Mon Aug 06 2012 - 21:07:48 GMT
