Re: [mirrorbrain] How to make Squid work with mirrorbrain

From: Jack Bates <i2xxry_at_nottheoilrig.com>
Date: Thu, 14 Jun 2012 01:03:22 -0700
On 04/06/12 04:41 AM, Per Jessen wrote:
> Hi Jack
>
> not sure how valid these comments might be, as I have zero knowledge of
> ATS.
>
> Your solution appears to address what I think of as the "first half of
> the issue" - that a file may have several URLs (one per mirror).  Pick
> the wrong one, and a previously cached copy will not be found.

Yes, exactly. So far I have only tackled the first half of the issue you describe.

> If looking up the currently cached content is fast/efficient, rewriting
> the header accordingly sounds okay, but I can't help thinking that it
> would be easier to do what I do with Squid: rewrite the URLs when they
> are stored.
> If <primary> is the primary location, e.g.
> http://download.services.openoffice.org, and <mirror1-9> are mirrors,
> then files retrieved from <mirror1-9> are stored as if they were
> fetched from <primary>. On subsequent retrievals, you would have a
> direct cache hit with no need to look at the header.

Hmm, is there any way to automatically discover the list of mirrors? I 
know you automatically retrieve the list of mirrors from 
http://mirrors.opensuse.org/list/all.html, and that you are looking for 
something less messy than scraping this HTML. But I think the proxy 
administrator must manually configure where to find the list of mirrors 
for each different content distribution network (openSUSE, OpenOffice, 
etc.).
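
For illustration, here is a minimal sketch of such a manually configured 
helper, assuming Squid 2.7's storeurl_rewrite_program interface (the 
primary URL is your OpenOffice example, but the mirror list and paths 
are made up):

    #!/usr/bin/env python
    # Sketch of a Squid 2.7 storeurl_rewrite_program helper: objects
    # fetched from any known mirror are stored under the primary URL.
    # The MIRRORS list is hypothetical and has to be maintained by hand
    # for each CDN; this is the manual step discussed above.
    import sys

    PRIMARY = "http://download.services.openoffice.org"
    MIRRORS = [
        "http://mirror1.example.org/openoffice",
        "http://mirror2.example.org/pub/openoffice",
    ]

    for line in sys.stdin:
        fields = line.split()
        if not fields:
            continue
        url = fields[0]                # first field is the request URL
        store_url = url
        for mirror in MIRRORS:
            if url.startswith(mirror):
                # store the object as if fetched from the primary
                store_url = PRIMARY + url[len(mirror):]
                break
        sys.stdout.write(store_url + "\n")
        sys.stdout.flush()             # Squid expects one reply per line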

A strong motivation for using Metalink is that it requires no manual 
intervention by the proxy administrator: any content distribution 
network that supports Metalink should be discovered automatically.
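
For example, a MirrorBrain redirect already carries everything a proxy 
needs to discover the mirrors on its own (the hostnames and hash value 
below are made up):

    HTTP/1.1 302 Found
    Location: http://mirror1.example.org/distribution/openSUSE.iso
    Digest: SHA-256=<base64 hash of the file>
    Link: <http://mirror2.example.org/distribution/openSUSE.iso>; rel=duplicate; pri=2
    Link: <http://mirror3.example.org/pub/openSUSE.iso>; rel=duplicate; pri=3
    Link: <http://download.example.org/openSUSE.iso.meta4>; rel=describedby; type="application/metalink4+xml"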

>> We are also thinking of examining "Digest: ..." headers. If a response
>> has a "Location: ..." header that's not already cached and a "Digest:
>> ..." header, then the plugin would check the cache for a matching
>> digest. If found, then it would rewrite the "Location: ..." header
>> with the cached URL.
>
> I'm not really very familiar with Metalink; what is your thinking behind
> wanting to use the digest to identify a cached object?

My thinking is that looking up cached content by digest might produce 
some additional cache hits where scanning the list of "Link: <...>; 
rel=duplicate" headers would not, e.g. if the content was originally 
downloaded from a server outside the CDN, so its URL does not appear 
among the "Link: <...>; rel=duplicate" headers.

It might also be more efficient, because the digest needs to be looked 
up only once, versus scanning a possibly long list of "Link: <...>; 
rel=duplicate" URLs.

