On 04/06/12 04:41 AM, Per Jessen wrote:
> Hi Jack
>
> not sure how valid these comments might be, I have zero knowledge about
> ATS.
>
> Your solution appears to address what I think of as the "first half of
> the issue" - that a file may have several URLs (one per mirror). Pick
> the wrong one, and a previously cached copy will not be found.

Yes, exactly - I have only tackled the first half of the issue you
describe.

> If looking up the currently cached content is fast/efficient, rewriting
> the header accordingly sounds okay, but I can't help thinking that it
> would be easier to do what I do with Squid - rewrite the URLs when they
> are stored?
> If <primary> is the primary location, e.g.
> http://download.services.openoffice.org, and <mirror1-9> are mirrors,
> then files retrieved from <mirror1-9> are stored as if they were
> fetched from <primary>. On subsequent retrievals, you would have a
> direct cache hit with no need to look at the header.

Hmm, is there any way to automatically discover the list of mirrors? I
know you automatically retrieve the list of mirrors from
http://mirrors.opensuse.org/list/all.html, and you are looking for
something less messy than scraping this HTML. But I think the proxy
administrator must manually configure where to find the list of
mirrors, for each different content distribution network (openSUSE,
OpenOffice, etc.)

A strong motivation for using Metalink is that no manual intervention
is required by the proxy administrator. Any content distribution
network that supports Metalink should be automatically discovered.

>> We are also thinking of examining "Digest: ..." headers. If a response
>> has a "Location: ..." header that's not already cached and a "Digest:
>> ..." header, then the plugin would check the cache for a matching
>> digest. If found then it would rewrite the "Location: ..."
>> header with the cached URL.
>
> I'm not really very familiar with metalink, what is your thinking
> behind wanting to use the digest to identify a cached object?

My thinking is that looking up cached content by digest might result in
some additional cache hits where scanning the list of
"Link: <...>; rel=duplicate" headers did not, e.g. if the content was
downloaded from a server outside of the CDN, and therefore the URL is
not among the "Link: <...>; rel=duplicate" headers.

It might also be more efficient, because the digest should be looked up
only once, vs. scanning a possibly long list of
"Link: <...>; rel=duplicate" URLs.

_______________________________________________
mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/
Note: To remove yourself from this mailing list, send a mail with the
content "unsubscribe" to the address mirrorbrain-request_at_mirrorbrain.org

Received on Thu Jun 14 2012 - 17:39:32 GMT
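[Editor's illustration] The two ideas discussed in this thread - rewriting mirror URLs to the primary location at store time (the Squid approach), and doing a single digest lookup instead of scanning every "Link: <...>; rel=duplicate" header - can be sketched in a few lines. This is a toy in-memory model, not the actual ATS plugin or Squid rewriter; all names, URLs, and the MIRRORS/PRIMARY values are invented for illustration.

```python
import base64
import hashlib

# Invented example values: a known mirror set and the primary location.
MIRRORS = {"http://mirror1.example", "http://mirror2.example"}
PRIMARY = "http://download.services.openoffice.org"

def rewrite_to_primary(url):
    # Squid-style normalisation at store time: a file fetched from any
    # known mirror is cached as if it had been fetched from the primary,
    # so a later request for the primary URL is a direct cache hit.
    for mirror in MIRRORS:
        if url.startswith(mirror + "/"):
            return PRIMARY + url[len(mirror):]
    return url

class DigestAwareCache:
    """Toy cache indexed by both URL and content digest."""

    def __init__(self):
        self.by_url = {}     # URL -> body
        self.by_digest = {}  # base64 SHA-256 of body -> cached URL

    @staticmethod
    def sha256_digest(body):
        # RFC 3230-style digest value: base64 of the raw SHA-256 hash.
        return base64.b64encode(hashlib.sha256(body).digest()).decode()

    def store(self, url, body):
        self.by_url[url] = body
        self.by_digest[self.sha256_digest(body)] = url

    def rewrite_location(self, location, digest=None):
        # If the redirect target is already cached, leave it alone.
        if location in self.by_url:
            return location
        # Otherwise a single digest lookup may find the same content
        # cached under a different URL - one lookup instead of scanning
        # a possibly long list of rel=duplicate URLs.
        if digest is not None and digest in self.by_digest:
            return self.by_digest[digest]
        return location
```

For example, after storing a file fetched from one mirror, a redirect to a second mirror carrying the same "Digest:" value would be rewritten to the already-cached URL, and `rewrite_to_primary` would map any mirror URL onto the primary before storage.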
This archive was generated by hypermail 2.3.0 : Mon Jun 18 2012 - 21:47:02 GMT