Re: [mirrorbrain] How to make Squid work with mirrorbrain

From: Peter Pöml <peter_at_poeml.de>
Date: Tue, 5 Jun 2012 21:16:31 +0200

Hi Per,

On 04.06.2012 at 14:23, Per Jessen wrote:

> Anthony Bryan wrote:
> 
>> On Sat, Jun 2, 2012 at 4:12 AM, Jack Bates <grx28t_at_nottheoilrig.com>
>> wrote:
>>> 
>>>> When you say "we're using Metalink as the mirror list", what do you
>>>> mean?  One annoying item in my setup is the parsing of the HTML
>>>> mirror page - you wouldn't happen to know of a way of retrieving the
>>>> mirror list in XML format?
>>> 
>>> 
>>> You can retrieve a Metalink/XML resource that includes information
>>> about where a file is mirrored, in XML format. I think the correct
>>> way to *discover* this resource is through a 'Link: <...>;
>>> rel=describedby; type="application/metalink4+xml"' header. Can anyone
>>> (Anthony?) confirm that this is the correct way?
>> 
>> yes, Jack.
>> 
>> and that is what I meant, Per, that you could examine the metalink to
>> construct a mirror list.
> 
> Hi Anthony
> 
> I've looked at the metalink xml file from e.g. 
> 
> http://download.opensuse.org/distribution/12.1/repo/oss/boot/x86_64/common.meta4
> 
> There's no problem working with that XML, but it looks like the
> mirror-list will vary slightly depending on something, maybe which file
> is being retrieved:
> 
> http://download.opensuse.org/distribution/12.1/repo/oss/boot/x86_64/common.meta4
> 
> Found 129 mirrors: 0 in the same network prefix, 0 in the same
> autonomous system, 1 handling this country, 59 in the same region, 44
> elsewhere
> 
> http://download.opensuse.org/distribution/12.1/repo/oss/boot/x86_64/yast2-trans-zh_CN.rpm.meta4
> 
> Found 127 mirrors: 0 in the same network prefix, 0 in the same
> autonomous system, 1 handling this country, 59 in the same region, 42
> elsewhere

These small differences could have several causes; one is that the file in question is not on all mirrors (intentionally, accidentally, because of a bug, or because of an incomplete mirror scan or mirror sync).
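
To see which mirrors account for such a difference, the two .meta4 documents can be compared directly. This is only a rough sketch: it assumes the documents use the Metalink 4 namespace from RFC 5854 and list mirrors as <url> children of <file>. The function names and the example.org hosts are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Metalink 4 XML namespace (RFC 5854).
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def mirror_hosts(meta4_xml):
    """Return the set of mirror hostnames listed in a .meta4 document."""
    root = ET.fromstring(meta4_xml)
    hosts = set()
    for url in root.findall("./ml:file/ml:url", NS):
        if url.text:
            host = urlparse(url.text.strip()).hostname
            if host:
                hosts.add(host)
    return hosts


def diff_mirrors(xml_a, xml_b):
    """Return (hosts only in a, hosts only in b), each as a sorted list."""
    a, b = mirror_hosts(xml_a), mirror_hosts(xml_b)
    return sorted(a - b), sorted(b - a)


def _sample(filename, hosts):
    """Build a tiny Metalink 4 document for the hypothetical hosts given."""
    urls = "".join(
        f"<url>http://{h}/pub/{filename}</url>" for h in hosts
    )
    return (
        '<metalink xmlns="urn:ietf:params:xml:ns:metalink">'
        f'<file name="{filename}">{urls}</file>'
        "</metalink>"
    )


# Hypothetical data: the second file is missing from one mirror.
SAMPLE_A = _sample("common", ["mirror-a.example.org",
                              "mirror-b.example.org",
                              "mirror-c.example.org"])
SAMPLE_B = _sample("some.rpm", ["mirror-a.example.org",
                                "mirror-b.example.org"])

only_a, only_b = diff_mirrors(SAMPLE_A, SAMPLE_B)
print("only in first list:", only_a)
print("only in second list:", only_b)
```

Comparing by hostname rather than by full URL is a deliberate simplification, since the path part differs per file anyway.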



> 
> http://download.opensuse.org/distribution.meta4
> 
> Found 35 mirrors: 0 in the same network prefix, 0 in the same autonomous
> system, 1 handling this country, 22 in the same region, 2 elsewhere

http://download.opensuse.org/distribution/ is the wrong kind of URL: it is a directory, not a file. If MirrorBrain shows mirrors for it, there could be several reasons. A file of that name could actually exist on mirrors (that does happen, but it is unlikely to be the case 35 times). In any case, MirrorBrain doesn't store (or intend to store) directories, only files, so I would assume it could just as well be a bug in the mirror scanner that lets directory names end up in the database. I'm surprised that this happens so often. (It should be quite harmless, though.)

Bottom line: don't try to check for the existence of directories. Only checking for a specific file is feasible.

That's by design, because directories can be incomplete: mirrors may carry only some of the files within a given directory. (In the openSUSE case, for instance, only the CDs and not the DVDs.)

> http://download.opensuse.org/distribution/12.1.meta4
> 
> Found 29 mirrors: 0 in the same network prefix, 0 in the same autonomous
> system, 1 handling this country, 18 in the same region, 1 elsewhere

Same here.

> I'm wondering if there is one single, easily identified file, that will
> give me the complete list?  I quite like the idea of using e.g.
> http://download.opensuse.org/distribution.meta4, but the list wasn't
> exactly complete.  (should have been complete I think?)

To generate the "grouped" mirror lists on http://mirrors.opensuse.org/, the code looks for defined "marker files": files whose presence indicates that the mirror probably also has the rest of that part of the tree.
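
The marker-file idea could be sketched roughly like this. It is not the code behind mirrors.opensuse.org; the function names, the labels and the lookup callable are all hypothetical, and the lookup is assumed to return the set of mirror hosts known to carry a given file.

```python
def group_mirrors(marker_files, hosts_for):
    """Map each tree label to the sorted mirrors that carry its marker file.

    marker_files: dict of label -> marker file path (hypothetical values)
    hosts_for:    callable taking a path and returning a set of hostnames
    """
    grouped = {}
    for label, path in marker_files.items():
        grouped[label] = sorted(hosts_for(path))
    return grouped


def all_mirrors(grouped):
    """Union of all mirrors that carry at least one marker file."""
    return sorted(set().union(*grouped.values()))


# Hypothetical data standing in for a real lookup.
_known = {
    "marker-one": {"mirror-a.example.org", "mirror-b.example.org"},
    "marker-two": {"mirror-b.example.org"},
}
grouped = group_mirrors(
    {"tree-one": "marker-one", "tree-two": "marker-two"},
    lambda path: _known[path],
)
print(grouped)
print(all_mirrors(grouped))
```

The union only covers mirrors that carry at least one of the chosen marker files, so the choice of markers decides how complete the list is.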

> I could no doubt write something to use the mirror-list from multiple
> files to cobble up a complete list, but that's even worse than my
> primitive parsing of the mirrorlists.py output. 

I just thought about this again, and I think it would be reasonable to produce a specific list for your kind of request. Just as mirrors, or rather their URLs, can be published on a web page as openSUSE does, the list of potential mirrors could be made public as well. But I tend to prefer a "safe" default that publishes this data only if the sysadmin chooses to, not by default. There could be cases where exposing the URLs of all mirrors is not wanted. What do you guys think?

> I would like to just add another output format to mirrorlists.py (in the
> mirrorbrain package), unfortunately (for my own purposes) it would take
> a while before it would make it to e.g. the opensuse download site. 
> Still, my script could test on the availability and default to parsing
> the xhtml output. Hmm, that could work. 

I see. Yes, that would be one possibility.
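
On the client side, that fallback could look something like the sketch below. Both fetchers are placeholders: I am assuming the one for the new format returns None when the server does not offer it yet, and that the other one wraps your existing xhtml parsing.

```python
def mirror_list(fetch_new_format, fetch_xhtml):
    """Prefer the new output format; fall back to parsing the xhtml output.

    Both arguments are hypothetical callables returning a list of mirror
    URLs. fetch_new_format is assumed to return None if the server does
    not provide the new format.
    """
    mirrors = fetch_new_format()
    if mirrors is not None:
        return mirrors, "new"
    return fetch_xhtml(), "xhtml"


# Example with stand-in fetchers: the new format is not available here.
mirrors, source = mirror_list(
    lambda: None,
    lambda: ["http://mirror-a.example.org/pub/"],
)
print(source, mirrors)
```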

Thanks,
Peter


Received on Tue Jun 05 2012 - 19:16:48 GMT
