Re: [mirrorbrain] How to make Squid work with mirrorbrain

From: Per Jessen <per_at_computer.org>
Date: Mon, 04 Jun 2012 13:41:46 +0200
Jack Bates wrote:

> Hello Per, this writeup is really well done, thank you for it!
> 
[snip]
> If a response has a "Location: ..." header and a "Link: <...>;
> rel=duplicate" header then the Traffic Server plugin will check if the
> URLs in these headers are already cached. If the "Location: ..." URL
> is not already cached but a "Link: <...>; rel=duplicate" URL is
> cached, then the plugin will rewrite the "Location: ..." header with
> the cached URL
> 
> This should redirect clients that are not Metalink aware to a mirror
> that is already cached. I would love any feedback on this approach

Hi Jack

Not sure how valid these comments might be; I have zero knowledge of
ATS (Apache Traffic Server).

Your solution appears to address what I think of as the "first half of
the issue" - that a file may have several URLs (one per mirror).  Pick
the wrong one, and a previously cached copy will not be found.
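For reference, the rewrite rule described in the quoted text above could be sketched roughly like this (an illustrative sketch only, not the actual ATS plugin code; the function and parameter names are made up for the example):

```python
# Sketch of the "prefer an already-cached duplicate" rewrite:
# given the Location URL and the Link: <...>; rel=duplicate URLs,
# pick a URL that is already in the cache, if any.
def rewrite_location(location, duplicates, is_cached):
    """location: URL from the Location header.
    duplicates: URLs from Link: <...>; rel=duplicate headers.
    is_cached: callable URL -> bool (a cache lookup)."""
    if is_cached(location):
        return location          # already cached: leave the redirect alone
    for url in duplicates:
        if is_cached(url):
            return url           # send the client to the cached mirror
    return location              # nothing cached: keep the original
```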

If looking up the currently cached content is fast/efficient, rewriting
the header accordingly sounds okay, but I can't help thinking it would
be easier to do what I do with Squid - rewrite the URLs when they are
stored.
If <primary> is the primary location, e.g.
http://download.services.openoffice.org, and <mirror1-9> are mirrors,
then files retrieved from <mirror1-9> are stored as if they were
fetched from <primary>.  On subsequent retrievals, you would have a
direct cache hit with no need to look at the header. 
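The normalization step could look something like the function below. It is only a sketch: the mirror prefixes are illustrative, not an actual mirror list, and a real Squid deployment would drive something like this from a rewrite helper rather than call it directly.

```python
# Map any known mirror URL onto the primary URL, so the cache
# stores (and later finds) exactly one entry per file.
PRIMARY = "http://download.services.openoffice.org"
MIRRORS = (                       # illustrative mirror prefixes
    "http://ftp.mirror1.example/pub/openoffice",
    "http://mirror2.example/openoffice",
)

def canonical_url(url):
    """Return the primary-location URL for a known mirror URL,
    or None if the URL does not belong to a known mirror."""
    for prefix in MIRRORS:
        if url.startswith(prefix):
            return PRIMARY + url[len(prefix):]
    return None                   # unknown host: store the URL as-is
```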

> We are also thinking of examining "Digest: ..." headers. If a response
> has a "Location: ..." header that's not already cached and a "Digest:
> ..." header, then the plugin would check the cache for a matching
> digest. If found then it would rewrite the "Location: ..." header with
> the cached URL

I'm not really very familiar with Metalink - what is your thinking
behind wanting to use the digest to identify a cached object?
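If I follow the idea, it would amount to keeping an index from digest values to cached URLs, roughly like this (a hypothetical sketch - the header parsing and index structure are my assumptions, not anything from the plugin):

```python
# Map the value of a "Digest: SHA-256=<base64>" response header
# to a URL already in the cache that has the same content.
digest_index = {}  # (algorithm, value) -> cached URL

def parse_digest(header):
    """Split e.g. 'SHA-256=abc=' into ('sha-256', 'abc=').
    Only the first '=' separates algorithm from value."""
    algo, _, value = header.partition("=")
    return algo.strip().lower(), value.strip()

def lookup_by_digest(header):
    """Return a cached URL with matching content, or None."""
    return digest_index.get(parse_digest(header))
```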

> This plugin is motivated by a similar problem to the one in your
> writeup. We run a caching proxy here at a rural village in Rwanda to
> improve our slow internet access. But many web sites don't predictably
> redirect users to the same download mirror, which defeats our cache

Yep. 

[snip]
> It would be neat if, after the cache is aware of requests for the same
> content from different mirrors, and after it is able to cache
> segmented downloads, it could be made aware of requests for the same
> segment from different mirrors. Then after one client assembled a
> complete download from segments from possibly many different mirrors,
> the cache would also contain this complete content, and could respond
> to requests from subsequent clients for any segment from any mirror

Exactly.  I've avoided teaching Squid how to assemble these segments by
instead having one client (my fetcher daemon) do a complete fetch from
one location. 



-- 
Per Jessen, Zürich (15.8°C)


_______________________________________________
mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/

Note: To remove yourself from this mailing list, send a mail with the content
 	unsubscribe
to the address mirrorbrain-request_at_mirrorbrain.org
Received on Mon Jun 04 2012 - 11:42:12 GMT

This archive was generated by hypermail 2.3.0 : Thu Jun 14 2012 - 17:47:03 GMT