[mirrorbrain] Fwd: Apache Traffic Server caching proxy

From: Anthony Bryan <anthonybryan_at_gmail.com>
Date: Fri, 18 May 2012 17:37:02 -0400
I thought this list might be another good place to get feedback


---------- Forwarded message ----------
From: Jack Bates <jack.bates_at_gmail.com>
Date: Fri, May 18, 2012 at 7:41 AM
Subject: Re: Apache Traffic Server caching proxy
To: Metalink Discussion <metalink-discussion_at_googlegroups.com>


Hi, I started work on a plugin for Apache Traffic Server and I would
love any feedback (and maybe implementation advice) from the Metalink
community

Traffic Server is a caching proxy and the goal of this plugin is to
help it work better with files distributed from multiple mirrors or
content distribution networks. Currently, downloading a file from one
mirror is a cache miss even if the same file is already cached from a
different mirror. A lot of
download sites present users with a simple download button that
doesn't always redirect them to the same mirror, which defeats the
benefit of a caching proxy and frustrates users

I would love to hear any of your thoughts on how caching proxies could
work better with content distribution networks

For this first attempt at this plugin, the approach taken is to use
RFC 6249, Metalink/HTTP: Mirrors and Hashes. The plugin listens for
responses that are an HTTP redirect and have "Link: <...>;
rel=duplicate" headers, then scans the URLs for one that already
exists in the cache. If found then it transforms the response,
replacing the "Location: ..." header with the URL that already exists
in the cache
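
As a rough sketch of that selection step (not the code in the GitHub
repository), the C function below keeps the original Location if it is
cached and otherwise returns the first cached duplicate. The names
choose_location, is_cached_fn and url_in_list are hypothetical, and
the synchronous callback is an assumption made for illustration; a
real proxy's cache lookup may well be asynchronous.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cache test: returns nonzero if url is already cached. */
typedef int (*is_cached_fn)(const char *url, void *ctx);

/* Toy stand-in for the proxy cache: ctx is a NULL-terminated URL list. */
static int url_in_list(const char *url, void *ctx)
{
    const char *const *list = ctx;
    for (; *list != NULL; list++)
        if (strcmp(*list, url) == 0)
            return 1;
    return 0;
}

/*
 * Choose the URL to send in the Location header.
 * location:   URL from the original redirect
 * duplicates: URLs taken from "Link: <...>; rel=duplicate" headers
 * Returns location itself unless it is uncached and a duplicate is cached.
 */
const char *choose_location(const char *location,
                            const char *const *duplicates, size_t n,
                            is_cached_fn is_cached, void *ctx)
{
    if (is_cached(location, ctx))
        return location;
    for (size_t i = 0; i < n; i++)
        if (is_cached(duplicates[i], ctx))
            return duplicates[i];
    return location; /* nothing cached: leave the redirect unchanged */
}
```

Calling choose_location(location, dups, n, url_in_list, cached_urls)
with a NULL-terminated cached_urls array exercises the logic without a
real cache.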

The code is up on GitHub [1] and works just enough that, given a
response with a "Location: ..." header that's not cached and a "Link:
<...>; rel=duplicate" header that is cached, it will rewrite the
"Location: ..." header with the cached URL

I would love any feedback on this approach

We are also thinking of using RFC 3230, Instance Digests in HTTP.
Given a response with a "Location: ..." header that's not cached and a
"Digest: ..." header, the plugin would check if another URL with the
same digest already exists in the cache and rewrite the
"Location: ..." header with that URL if so

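A minimal sketch of the digest idea, assuming the header value is a
NUL-terminated, comma-separated list of algorithm=value pairs as in
RFC 3230, and assuming the plugin keeps some index from digest to
cached URL. The names digest_value, digest_entry and url_for_digest
are hypothetical, and the linear search stands in for whatever index a
real plugin would use.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/*
 * Find the value for algo (e.g. "SHA-256") in a Digest field value.
 * Algorithm names are compared case-insensitively. On success, *val
 * points into header (not NUL-terminated) and 1 is returned.
 */
int digest_value(const char *header, const char *algo,
                 const char **val, size_t *val_len)
{
    size_t alen = strlen(algo);
    const char *p = header;
    while (*p) {
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        const char *end = strchr(p, ',');
        if (end == NULL)
            end = p + strlen(p);
        if ((size_t)(end - p) > alen && p[alen] == '=' &&
            strncasecmp(p, algo, alen) == 0) {
            const char *v = p + alen + 1;
            while (end > v && (end[-1] == ' ' || end[-1] == '\t'))
                end--; /* trim trailing whitespace */
            *val = v;
            *val_len = (size_t)(end - v);
            return 1;
        }
        p = end;
    }
    return 0;
}

/* Hypothetical index entry: digest (base64 text) of a cached object. */
struct digest_entry {
    const char *digest;
    const char *url;
};

/* Return the cached URL whose digest equals val, or NULL if none. */
const char *url_for_digest(const struct digest_entry *entries, size_t n,
                           const char *val, size_t val_len)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(entries[i].digest) == val_len &&
            memcmp(entries[i].digest, val, val_len) == 0) /* base64 is case-sensitive */
            return entries[i].url;
    return NULL;
}
```

If url_for_digest returns a URL, that URL would replace the one in the
"Location: ..." header; otherwise the response is left alone.
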
Still more ideas include:

  * Remember URLs for the same file so future requests for any of
these URLs use the same cache key. A problem is how to prevent a
malicious domain from distributing false information about URLs it
doesn't control. This could be addressed with a whitelist of domains

  * Choose the best mirror, e.g. the one that is most cost efficient,
fastest, or closest

  * Use content digest to detect or repair download errors
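
For the whitelist idea in the first item, one possible shape is an
exact, case-insensitive host match before trusting a URL as an alias.
This is only a sketch: host_allowed is a hypothetical name, URLs with
userinfo ("user@host") are rejected outright rather than parsed, and
IPv6 literals are not handled.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Return 1 if the host of url exactly matches an entry in allowed. */
int host_allowed(const char *url, const char *const *allowed, size_t n)
{
    const char *auth = strstr(url, "://");
    if (auth == NULL)
        return 0;
    auth += 3;
    size_t alen = strcspn(auth, "/?#"); /* authority: [userinfo@]host[:port] */
    if (memchr(auth, '@', alen) != NULL)
        return 0; /* userinfo could disguise the real host */
    size_t hlen = 0;
    while (hlen < alen && auth[hlen] != ':')
        hlen++; /* drop the optional port */
    for (size_t i = 0; i < n; i++)
        if (strlen(allowed[i]) == hlen &&
            strncasecmp(auth, allowed[i], hlen) == 0)
            return 1;
    return 0;
}
```

Exact matching is deliberate here: a suffix or prefix match would let
a host such as "mirror.example.org.evil.test" pass for
"mirror.example.org".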

Finally, can anyone in the Metalink community recommend a reusable
C/C++ solution for checking if a "Link: ..." header has a "rel=duplicate"
parameter? For now I am parsing these headers from scratch with
memchr(), but I expect that I am neglecting some accumulated wisdom on
getting all the RFC rules right, and maybe interoperating with
nonconformant implementations. Please let me know if you know a better
way
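
For reference, here is a hand-rolled sketch of the kind of parser in
question, not a recommendation of an existing library. It assumes a
NUL-terminated field value and tries to cover the RFC 5988 cases that
a plain memchr() scan tends to miss: several link-values in one
header, quoted rel values listing more than one relation type,
case-insensitive names, and commas inside <...> or quoted strings. The
name next_duplicate_link is hypothetical, and backslash escapes inside
quoted strings are skipped over but not unescaped.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

static int is_ws(char c) { return c == ' ' || c == '\t'; }

/* Does the space-separated list s[0..len) contain token, ignoring case? */
static int has_token(const char *s, size_t len, const char *token)
{
    size_t tlen = strlen(token), i = 0;
    while (i < len) {
        while (i < len && s[i] == ' ')
            i++;
        size_t start = i;
        while (i < len && s[i] != ' ')
            i++;
        if (i - start == tlen && strncasecmp(s + start, token, tlen) == 0)
            return 1;
    }
    return 0;
}

/*
 * Find the next link-value at or after *pos whose first rel parameter
 * includes "duplicate". On success store the URI (not NUL-terminated)
 * in *uri and *uri_len, advance *pos past that link-value, return 1.
 * Return 0 when no further match exists.
 */
int next_duplicate_link(const char **pos, const char **uri, size_t *uri_len)
{
    const char *p = *pos;
    while (*p) {
        while (*p && *p != '<')
            p++;
        if (!*p)
            break;
        const char *u = ++p;
        while (*p && *p != '>')
            p++;
        if (!*p)
            break; /* unterminated <URI>: stop */
        size_t ulen = (size_t)(p - u);
        p++;
        int seen_rel = 0, dup = 0;
        while (*p && *p != ',') { /* parameters of this link-value */
            if (*p == ';' || is_ws(*p)) {
                p++;
                continue;
            }
            const char *name = p;
            while (*p && *p != '=' && *p != ';' && *p != ',' && !is_ws(*p))
                p++;
            size_t nlen = (size_t)(p - name);
            while (is_ws(*p))
                p++;
            const char *val = p;
            size_t vlen = 0;
            if (*p == '=') {
                p++;
                while (is_ws(*p))
                    p++;
                if (*p == '"') { /* quoted-string; may contain ',' or ';' */
                    val = ++p;
                    while (*p && *p != '"') {
                        if (*p == '\\' && p[1])
                            p++;
                        p++;
                    }
                    vlen = (size_t)(p - val);
                    if (*p)
                        p++;
                } else { /* unquoted token */
                    val = p;
                    while (*p && *p != ';' && *p != ',' && !is_ws(*p))
                        p++;
                    vlen = (size_t)(p - val);
                }
            }
            if (!seen_rel && nlen == 3 && strncasecmp(name, "rel", 3) == 0) {
                seen_rel = 1; /* later rel parameters are ignored */
                dup = has_token(val, vlen, "duplicate");
            }
        }
        if (*p == ',')
            p++;
        if (dup) {
            *pos = p;
            *uri = u;
            *uri_len = ulen;
            return 1;
        }
    }
    *pos = p;
    return 0;
}
```

A caller would set a cursor to the start of the field value and call
next_duplicate_link in a loop until it returns 0, checking each
returned URI against the cache.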

Here is a similar message [2] on the Traffic Server developers list,
with slightly more detail

We run Traffic Server here at a rural village in Rwanda for faster,
more reliable internet access. I am working on this as part of the
Google Summer of Code

[1] https://github.com/jablko/dedup
[2] http://mail-archives.apache.org/mod_mbox/trafficserver-dev/201205.mbox/%3C4FAE78FB.1070404%40nottheoilrig.com%3E


-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads

_______________________________________________
mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/

Received on Fri May 18 2012 - 21:37:22 GMT
