I thought this list might be another good place to get feedback.

---------- Forwarded message ----------
From: Jack Bates <jack.bates_at_gmail.com>
Date: Fri, May 18, 2012 at 7:41 AM
Subject: Re: Apache Traffic Server caching proxy
To: Metalink Discussion <metalink-discussion_at_googlegroups.com>

Hi,

I started work on a plugin for Apache Traffic Server and I would love any feedback (and maybe implementation advice) from the Metalink community.

Traffic Server is a caching proxy, and the goal of this plugin is to help it work better with files distributed from multiple mirrors or content distribution networks. Currently, downloading a file that is already cached from a different mirror is a cache miss. A lot of download sites present users with a simple download button that doesn't always redirect them to the same mirror, which defeats the benefit of a caching proxy and frustrates users.

I would love to hear any of your thoughts on how caching proxies could work better with content distribution networks.

For this first attempt at the plugin, the approach taken is to use RFC 6249, Metalink/HTTP: Mirrors and Hashes. The plugin listens for responses that are an HTTP redirect and have "Link: <...>; rel=duplicate" headers, then scans the URLs for one that already exists in the cache. If one is found, it transforms the response, replacing the "Location: ..." header with the URL that already exists in the cache.

The code is up on GitHub [1] and works just enough that, given a response with a "Location: ..." header that's not cached and a "Link: <...>; rel=duplicate" header that is cached, it will rewrite the "Location: ..." header with the cached URL.

I would love any feedback on this approach.

We are also thinking of using RFC 3230, Instance Digests in HTTP. Given a response with a "Location: ..." header that's not cached and a "Digest: ..." header, the plugin would check if another URL with the same digest already exists in the cache and, if so, rewrite the "Location: ..." header with that URL.

Still more ideas include:

* Remember URLs for the same file so future requests for any of these URLs use the same cache key. A problem is how to prevent a malicious domain from distributing false information about URLs it doesn't control. This could be addressed with a whitelist of domains
* Making decisions about the best mirror to choose, e.g. one that is most cost efficient, faster, or more local
* Use content digest to detect or repair download errors

Finally, can anyone in the Metalink community recommend a reusable C/C++ solution for checking if a "Link: ..." header has a "rel=duplicate" parameter? For now I am parsing these headers from scratch with memchr(), but I expect that I am neglecting some accumulated wisdom on getting all the RFC rules right, and maybe on interoperating with nonconformant implementations. Please let me know if you know a better way.

Here is a similar message [2] on the Traffic Server developers list, with slightly more detail.

We run Traffic Server here at a rural village in Rwanda for faster, more reliable internet access. I am working on this as part of the Google Summer of Code.

[1] https://github.com/jablko/dedup
[2] http://mail-archives.apache.org/mod_mbox/trafficserver-dev/201205.mbox/%3C4FAE78FB.1070404%40nottheoilrig.com%3E

-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads
_______________________________________________
mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/

Received on Fri May 18 2012 - 21:37:22 GMT
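
On the question of checking a "Link: ..." header for a "rel=duplicate" parameter: below is a minimal sketch in plain C, not a recommendation of an existing library and not code from the plugin. The function names (link_find_rel, rel_list_contains, token_equals) are hypothetical, and the sketch assumes the header value has been copied into a NUL-terminated string. It follows a simplified reading of the Link header grammar: comma-separated link-values, a URI in angle brackets, ";name=value" parameters, case-insensitive parameter names and relation types, a rel value that may be a quoted, space-separated list, and only the first rel parameter of each link-value being considered. It does not validate URIs and is lenient about malformed input.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive comparison of a length-delimited token with a C string. */
static bool token_equals(const char *tok, size_t len, const char *word)
{
    size_t i;
    if (strlen(word) != len)
        return false;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char)tok[i]) != tolower((unsigned char)word[i]))
            return false;
    return true;
}

/* True if the whitespace-separated list rel[0..len) contains `want`. */
static bool rel_list_contains(const char *rel, size_t len, const char *want)
{
    size_t i = 0;
    while (i < len) {
        size_t start;
        while (i < len && (rel[i] == ' ' || rel[i] == '\t'))
            i++;
        start = i;
        while (i < len && rel[i] != ' ' && rel[i] != '\t')
            i++;
        if (i > start && token_equals(rel + start, i - start, want))
            return true;
    }
    return false;
}

/* Hypothetical helper: scan one Link header value and report the URI of the
 * first link-value whose rel parameter lists `want`. The URI is returned as
 * a pointer into `value` plus a length; nothing is copied. */
bool link_find_rel(const char *value, const char *want,
                   const char **uri, size_t *uri_len)
{
    const char *p = value;
    while (*p) {
        const char *u, *u_end;
        bool seen_rel = false, match = false;

        while (*p && *p != '<')          /* find the start of a link-value */
            p++;
        if (!*p)
            break;
        u = ++p;
        while (*p && *p != '>')          /* commas inside <...> are skipped */
            p++;
        if (!*p)
            break;                       /* unterminated URI: give up */
        u_end = p++;

        while (*p && *p != ',') {        /* parameters of this link-value */
            const char *name, *val;
            size_t name_len, val_len;

            while (*p == ';' || *p == ' ' || *p == '\t')
                p++;
            name = p;
            while (*p && *p != '=' && *p != ';' && *p != ',' &&
                   *p != ' ' && *p != '\t')
                p++;
            name_len = (size_t)(p - name);
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p != '=')
                continue;                /* parameter without a value */
            p++;
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p == '"') {             /* quoted-string, may hold , and ; */
                val = ++p;
                while (*p && *p != '"') {
                    if (*p == '\\' && p[1])
                        p++;             /* skip the escaped character */
                    p++;
                }
                val_len = (size_t)(p - val);
                if (*p)
                    p++;                 /* closing quote */
            } else {
                val = p;
                while (*p && *p != ';' && *p != ',' &&
                       *p != ' ' && *p != '\t')
                    p++;
                val_len = (size_t)(p - val);
            }
            if (!seen_rel && token_equals(name, name_len, "rel")) {
                seen_rel = true;         /* later rel parameters are ignored */
                match = rel_list_contains(val, val_len, want);
            }
        }
        if (match) {
            *uri = u;
            *uri_len = (size_t)(u_end - u);
            return true;
        }
    }
    return false;
}
```

A caller would pass the header value and "duplicate", then look up the returned URI in the cache, calling again on the remainder of the string to examine further mirrors. Scanning character by character rather than jumping with memchr() keeps the quoted-string and angle-bracket cases in one place, at the cost of some speed.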