Re: [mirrorbrain] Mirrorbrain handling of modified files.

From: Dr. Peter Pöml <peter_at_poeml.de> Date: Wed, 8 Aug 2012 01:11:21 +0200

On 08.08.2012 at 00:36, peter green <plugwash_at_p10link.net> wrote:

> Dr. Peter Pöml wrote:
>> MirrorBrain stores the full path and uses that for redirection, anchored to the top-level directory of your file tree.
> From where does mirrorbrain determine "the top-level directory of my file tree"? the <directory> block in which the mirrorbrain engine
> is enabled? the documentroot of the vhost? somewhere else?

From the <Directory> block in the Apache config that contains "MirrorBrainEngine On". So this is semi-automatic, yes.

>> You don't have to worry about confusion with /anothersubdir/file1, because only the full (relative) path is used, not some part or trailing filename.
>>  
> Presumably this also means that partial mirrors can't be used.

Oh yes, they can be used. However, this works only when they serve their partial content under a URL hierarchy that matches yours, at least below the top-level URL.

Example: you have
  http://downloads.raspberrypi.org/raspbian/raspbian/dists/wheezy/rpi/source/

and let's assume that a mirror wants to serve only this source directory.

It must *not* serve the directory as 
  http://mirror.example.com/coolstuff/source/

but as 
  http://mirror.example.com/raspbian/raspbian/dists/wheezy/rpi/source

as if it were a full mirror, even though the only directory with files would be the subdirectory "source".

Of course,
  http://mirror.example.com/pub/mirrors/newstuff/raspbian/raspbian/dists/wheezy/rpi/source
would also be okay.

You see?
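
The rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MirrorBrain's actual code: the function names, the constant ORIGIN_TOP and the idea of passing in a per-mirror base URL are made up for the example, and "file1" is just a placeholder file name.

```python
from urllib.parse import urlsplit

# Assumed top-level URL of the origin file tree, i.e. the URL that maps to
# the <Directory> block with "MirrorBrainEngine On".
ORIGIN_TOP = "http://downloads.raspberrypi.org/"


def relative_path(requested_url, origin_top=ORIGIN_TOP):
    """Return the path of requested_url relative to the top-level URL."""
    top_path = urlsplit(origin_top).path
    req_path = urlsplit(requested_url).path
    if not req_path.startswith(top_path):
        raise ValueError("request is outside the top-level directory")
    return req_path[len(top_path):]


def mirror_url(mirror_base, rel_path):
    """Append the full relative path to the mirror's base URL."""
    return mirror_base.rstrip("/") + "/" + rel_path.lstrip("/")


rel = relative_path(
    "http://downloads.raspberrypi.org/raspbian/raspbian/dists/wheezy/rpi/source/file1"
)
print(rel)
print(mirror_url("http://mirror.example.com/", rel))
print(mirror_url("http://mirror.example.com/pub/mirrors/newstuff/", rel))
```

Whatever base URL a mirror has, the complete relative path gets appended to it. A mirror that serves the files as /coolstuff/source/file1 therefore has no base URL that would make the result point at an existing file.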

By the way, 'mb scan' doesn't care what it sees. It would happily enter files like '/source/file1' into your database. However, mod_mirrorbrain would never look up mirrors for such a file, because that path doesn't exist in your file tree. It's not a valid URL, so to speak: Apache will not find it and will return a 404 without even trying to look up mirrors for it in the database.
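
A rough sketch of that order of checks, in Python. This is not how mod_mirrorbrain is implemented; the dictionary standing in for the database, the function name handle_request and the returned status tuples are all invented for illustration.

```python
import os
import tempfile


def handle_request(tree_root, rel_path, mirror_db):
    """Check the local file tree first; only then consult the mirror table."""
    rel_path = rel_path.lstrip("/")
    if not os.path.isfile(os.path.join(tree_root, rel_path)):
        # Unknown path in the local tree: 404, the database is never asked.
        return 404, None
    mirrors = mirror_db.get(rel_path, [])
    if not mirrors:
        # File exists locally but no mirror has it: serve it ourselves.
        return 200, None
    return 302, mirrors[0].rstrip("/") + "/" + rel_path


with tempfile.TemporaryDirectory() as root:
    good = "raspbian/raspbian/dists/wheezy/rpi/source/file1"
    os.makedirs(os.path.join(root, os.path.dirname(good)))
    with open(os.path.join(root, good), "w"):
        pass  # create an empty placeholder file

    # Hypothetical scan results; the second entry is the stray path that a
    # scanner might have picked up from a mirror with a non-matching layout.
    db = {
        good: ["http://mirror.example.com"],
        "source/file1": ["http://mirror.example.com/coolstuff"],
    }
    result_good = handle_request(root, good, db)
    result_stray = handle_request(root, "/source/file1", db)

print(result_good)
print(result_stray)
```

The stray database entry is harmless in this sketch: it simply never gets used, because the request fails the file tree check before any lookup happens.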

When partial mirrors stick to this rule (matching URL hierarchy), they are totally free in what they mirror. Even if they mirror only an arbitrary single file, all is fine.

>> Thus, you can have as many "Packages" files as you like ;)
>>  
> Which is good news though I don't think I can really mirrorbrain them because of the lack of data checking.

Of course, it would be wonderful if data checking on the mirrors simply happened. Intuitively, one just expects it. [*]  I seriously question the feasibility though, at least in a loose network of voluntarily operated mirrors. I tried hard for some years, but I ended up turning my attention to the client side, because it is the much less advanced side in this client-server game. MirrorBrain tries to provide everything the client needs for robust downloading, but many clients still lack the capabilities.

There is some interesting progress. curl now includes metalink support for the first time. And there is a GSoC student working on metalink support in wget!

Peter

[*]  It's not that easy! HTTP is not made for retrieving file lists, not at all. It only gets you so far. And there are at least 5 different web servers with different forms of directory listings, plus decoration. FTP is made for retrieving file lists, but one might notice that time values are not standardized, and the concept of timezones doesn't exist. Thus, modification times are not reliable. And, there are at least 37 different FTP server implementations. Only rsync would allow reliable scanning of modification times. But rsync is often not an option (many mirrors don't offer it). 

Oh, did I forget to mention character sets and encodings? ;-)
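
To illustrate the timezone point from the footnote, here is a small Python sketch. The timestamp and the two candidate timezones are made-up assumptions, and real FTP listings use other date formats; the only point is that a time without a zone can be read in more than one way.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical modification time as a listing might report it: no timezone.
listed = "2012-08-08 01:11"
naive = datetime.strptime(listed, "%Y-%m-%d %H:%M")

# Two equally plausible guesses about the clock of the mirror server.
as_utc = naive.replace(tzinfo=timezone.utc)
as_utc_plus_2 = naive.replace(tzinfo=timezone(timedelta(hours=2)))

# The same listing entry, read under the two guesses, is two hours apart.
skew = as_utc - as_utc_plus_2
print(skew)
```

A comparison of modification times between origin and mirror is only as good as the guess about the mirror's timezone.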