Re: [mirrorbrain] mirror list in plain text?

From: Per Jessen <per_at_computer.org>
Date: Wed, 16 May 2012 18:55:17 +0200

Peter Pöml wrote:

> Hi Per,
> 
> On 16.05.2012 at 13:47, Per Jessen wrote:
>> is there a (standardized) way of retrieving the mirror list in plain
>> text?  I need this for setting up a URL rewriter for squid which will
>> help me cache segmented downloads.  For instance, I can retrieve the
>> HTML from http://mirrors.opensuse.org/list/all.html and parse that
>> HTML quite easily, but I would prefer just getting a plain text file
>> straight from mirrorbrain.
> 
> Do you mean all mirrors? If you have a certain file in mind, then
> appending .meta4 to the file's URL will give you parseable XML. Not
> plain text, though.

Hi Peter

Yep, I mean all mirrors. XML would be fine too, but here is an example - 

from http://mirrors.opensuse.org/list/all.html I currently
generate/extract a list like this:

http://repo.ugm.ac.id/opensuse/
http://dl2.foss-id.web.id/opensuse/
http://mirror.isoc.org.il/pub/opensuse/
http://ftp.jaist.ac.jp/pub/Linux/openSUSE/
http://ftp.kddilabs.jp/Linux/packages/opensuse/
http://ftp.novell.co.jp/pub/opensuse/
http://ftp.riken.jp/Linux/opensuse/
http://ftp.yz.yamagata-u.ac.jp/pub/linux/opensuse/
http://ftp.daum.net/opensuse/
http://ftp.kaist.ac.kr/pub/opensuse/
http://archive.mmu.edu.my/opensuse
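
A minimal sketch of how such an extraction could look. It assumes the
mirror base URLs appear as ordinary <a href="..."> links with an http
scheme and "opensuse" somewhere in the URL; the real layout of all.html
may differ, and the names MirrorLinkParser and extract_mirror_urls are
only illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class MirrorLinkParser(HTMLParser):
    """Collect href values of <a> tags that look like HTTP mirror base URLs."""

    def __init__(self, must_contain="opensuse"):
        super().__init__()
        self.must_contain = must_contain.lower()  # assumed filter, not a MirrorBrain rule
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href or urlparse(href).scheme != "http":
            return  # skip ftp://, rsync://, relative links and anchors
        if self.must_contain in href.lower() and href not in self.urls:
            self.urls.append(href)


def extract_mirror_urls(html_text):
    """Return the HTTP mirror URLs found in html_text, in document order."""
    parser = MirrorLinkParser()
    parser.feed(html_text)
    return parser.urls


# Made-up snippet for illustration only, not the real markup of all.html.
SAMPLE_HTML = """
<table>
<tr><td><a href="http://ftp.riken.jp/Linux/opensuse/">HTTP</a></td>
    <td><a href="ftp://ftp.riken.jp/Linux/opensuse/">FTP</a></td></tr>
<tr><td><a href="http://ftp.jaist.ac.jp/pub/Linux/openSUSE/">HTTP</a></td></tr>
<tr><td><a href="http://mirrorbrain.org/">generated by</a></td></tr>
</table>
"""

for url in extract_mirror_urls(SAMPLE_HTML):
    print(url)
```

Printing one URL per line gives the same kind of plain-text list as
above, which can then be written to a file for the rewriter to load.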

Getting the meta4 file could perhaps have worked, but I need to know the
mirrors before the URL is retrieved - otherwise I can't tell squid to
rewrite the URL when it is stored.
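
For completeness, reading the mirrors of a single file out of a .meta4
document could look roughly like the sketch below. It relies on the
Metalink 4 namespace and the <file>/<url> elements with an optional
priority attribute; the sample document and the function name are made
up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of Metalink 4 documents (RFC 5854).
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def mirror_urls_from_meta4(xml_text):
    """Return the <url> entries of a Metalink 4 document, best priority first."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("./ml:file/ml:url", NS):
        if not url.text:
            continue
        # A lower priority value means "preferred"; a missing value sorts last.
        priority = int(url.get("priority", "999999"))
        entries.append((priority, url.text.strip()))
    entries.sort(key=lambda entry: entry[0])
    return [text for _, text in entries]


# Hypothetical sample, not taken from a real server response.
SAMPLE_META4 = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
  <file name="example.iso">
    <url location="jp" priority="2">http://ftp.jaist.ac.jp/pub/Linux/openSUSE/example.iso</url>
    <url location="jp" priority="1">http://ftp.riken.jp/Linux/opensuse/example.iso</url>
  </file>
</metalink>"""

print(mirror_urls_from_meta4(SAMPLE_META4))
```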

> The list of *all* mirrors can't be requested directly. It would be
> easy to implement that, but there are some things to keep in mind:
> 
> Not all mirrors have all content, especially with openSUSE there is
> much variation between what the individual mirrors carry.

That is okay - I will use the list to rewrite all <mirror-urls>
to "download.opensuse.org".  If nothing is requested from a mirror,
there is no URL to rewrite/remap. 
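
The mapping itself can be sketched as a small function. It assumes that
the path below a mirror's base URL matches the path below
http://download.opensuse.org/, which is my working assumption and not
something MirrorBrain guarantees. The function name and the sample URLs
are illustrative, and the glue that feeds URLs from squid to the helper
is left out because it depends on the squid version.

```python
def to_canonical(url, mirror_bases, canonical="http://download.opensuse.org/"):
    """Map a URL under any known mirror base to the canonical host.

    URLs that do not start with a known mirror base are returned unchanged.
    """
    # Try longer bases first so a more specific base wins over a shorter one.
    for base in sorted(mirror_bases, key=len, reverse=True):
        # Some list entries lack the trailing slash; normalise before matching.
        prefix = base if base.endswith("/") else base + "/"
        if url.startswith(prefix):
            return canonical + url[len(prefix):]
    return url


MIRROR_BASES = [
    "http://ftp.riken.jp/Linux/opensuse/",
    "http://ftp.kaist.ac.kr/pub/opensuse/",
    "http://archive.mmu.edu.my/opensuse",  # no trailing slash in the list
]

print(to_canonical(
    "http://ftp.riken.jp/Linux/opensuse/distribution/12.1/repo/oss/content",
    MIRROR_BASES,
))
```

The comparison is a plain case-sensitive prefix match, so a request
that spells the host name differently would not be remapped.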

> Some mirrors might want to remain private - which is the case for some
> mirrors located in countries with poor international connectivity,
> where requests from outside the country need to be avoided. There is
> already a hack in the "mb mirrorlist" command (which also generates
> http://mirrors.opensuse.org/list/all.html) to exclude such mirrors
> from the listing. That might not be relevant in your case - I don't
> know if the URL rewriter could be deployed in a country with such a
> mirror.

I don't think it is important - what I do is "react" to requests that
have already been made, so if the current setup works wrt the above, I
don't see my rewriting (plus some other trickery) affecting anything. 

> The data you want to retrieve is the base URL of the mirrors, or
> anything else?

No, that's it, just the base. 

> With the latest MirrorBrain (newer than what is deployed on
> openSUSE.org), mirrors are also listed in HTTP headers on requesting a
> file (Link headers, RFC 6249). Maybe that would be convenient too. A
> HEAD request would be sufficient to get a list of mirrors. (That list
> is limited to 5 entries at the moment.)

Interesting possibility, although I can't quite tell if it would be
useful.
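
If I were to try it, picking the mirrors out of such headers might look
like the sketch below. It assumes the RFC 6249 form, where each mirror
is a Link entry with rel=duplicate and an optional pri parameter. The
header values are invented examples, not output from a real MirrorBrain
server, and the parser is deliberately simple (no quoted strings that
contain "<" or ";").

```python
import re

# One link entry: <target> followed by its parameters up to the next "<".
_LINK = re.compile(r"<([^>]+)>([^<]*)")


def duplicate_links(link_values):
    """Return targets of rel=duplicate links, lowest pri value first."""
    found = []
    for value in link_values:
        for target, raw_params in _LINK.findall(value):
            params = {}
            for part in raw_params.split(";"):
                # rstrip(",") drops the separator between comma-joined entries.
                key, sep, val = part.strip().rstrip(",").partition("=")
                if sep:
                    params[key.strip().lower()] = val.strip().strip('"')
            if "duplicate" in params.get("rel", "").split():
                # A missing pri sorts after any explicit priority.
                found.append((int(params.get("pri", "999999")), target))
    found.sort(key=lambda item: item[0])
    return [target for _, target in found]


# Invented header values for illustration.
SAMPLE_LINKS = [
    '<http://ftp.jaist.ac.jp/pub/Linux/openSUSE/x.iso>; rel=duplicate; pri=2; geo=jp',
    '<http://ftp.riken.jp/Linux/opensuse/x.iso>; rel=duplicate; pri=1; geo=jp',
    '<http://download.opensuse.org/x.iso.meta4>; rel=describedby; '
    'type="application/metalink4+xml"',
]

print(duplicate_links(SAMPLE_LINKS))
```

This still only yields the mirrors for one file, after a request has
been made for it.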

> BTW, I noticed a GSOC project that might share a similar goal with
> yours, but with another proxy:
> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/nottheoilrig/1

Also interesting, thanks.  I've got a working setup already, I'm just
dealing with the rough edges now :-)

-- 
Per Jessen, Zürich (7.6°C)
