Re: [mirrorbrain] Authenticating downloads?

From: Peter Pöml <>
Date: Wed, 3 Nov 2010 14:27:49 +0100
Hi Oliver,

On Wed, Nov 03, 2010 at 11:21:31 +0000, Oliver Beattie wrote:
> Wow, thanks so much for your detailed reply… I certainly didn't expect
> anyone to put in as much effort as this — you're amazing! :)

You are welcome -- it is always nice to see MirrorBrain being considered
in a new scenario. Typically, I learn something new, and sometimes
MirrorBrain becomes more versatile. Thankfully, you provided enough info
to make some educated guesses and reply in a meaningful way :-)

> Actually, the current setup is all the files are served from one machine,
> where it Basic auth's the users against an LDAP database (which is only
> accessible at this one site). It's currently saturating a 1 Gbps link,
> and requests are timing out all the time. It's distributing rather large
> files (many gb) so we obviously need to move to something more scalable, and
> this seems like just the ticket. The users are distributed all across the
> world, but usually in big offices where there will be a large number of
> people downloading, so it seems logical to use something like mod_asn + MB
> that can take advantage of that if we setup local mirrors on their networks.

The ability to assign requests to local mirrors (with mod_asn) would
indeed be particularly interesting for you, then.

> I know it's not really relevant, but probably interesting anyway… we were
> rather hoping that the mirrors wouldn't have to communicate with a central
> database, so I figured the easiest way to do it would just be to put an
> expiration timestamp in the URL of ~10 secs or so, and also a signature
> generated with a private key. The mirrors would just need to calculate the
> signature themselves with the key, check the signatures match and the
> expiration hasn't passed, and serve the file. Any failure would just result
> in a 403. 10 lines of Python/Perl/PHP/Shell should do it, I'd think… the
> simpler the better :)

Good idea. 

If you come up with a good solution for checking the timestamp for
deployment on the mirrors, it would be cool to add the timestamp
generation directly in MirrorBrain. If this is independent of user
authentication and/or a specific user database (I think it would be),
it could be an interesting feature for other deployments as well.

Then everything you need would be doable with MirrorBrain
itself, without proxying, redirecting or scripting, right?

I'm thinking of an additional simple Apache directive like
 MirrorBrainRedirectStampKey <key>
which causes MB to append ?stamp=<timestamp_crypted_with_key> to
the URLs that it returns to clients.

Sounds like just a few lines of code. If you have an idea how to best
use/verify the stamp on the mirrors, and we come up with a reusable
procedure for that, I'd be happy to implement the server part.

With that concept, the decision of how long links are valid would be in
the hands of the mirrors, I think, which I hope is okay. Server clocks
need to be reasonably synchronized.
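As a sketch of how the stamp generation and the mirror-side check could fit together (the shared key, the 10-second validity, and the `<expiry>-<signature>` stamp format are all assumptions here, not anything MirrorBrain implements yet), an HMAC over the expiry time and the file path would do:

```python
import hashlib
import hmac
import time

# Shared secret; would correspond to the proposed MirrorBrainRedirectStampKey.
SECRET_KEY = b"example-shared-key"  # hypothetical value
VALIDITY_SECONDS = 10               # how long a stamped link stays valid

def make_stamp(path, now=None):
    """What the redirector could append as ?stamp=<expiry>-<signature>."""
    expiry = int(now if now is not None else time.time()) + VALIDITY_SECONDS
    sig = hmac.new(SECRET_KEY, f"{expiry}:{path}".encode(),
                   hashlib.sha256).hexdigest()
    return f"{expiry}-{sig}"

def check_stamp(path, stamp, now=None):
    """What a mirror would run: recompute the signature, then check expiry."""
    try:
        expiry_str, sig = stamp.split("-", 1)
        expiry = int(expiry_str)
    except ValueError:
        return False  # malformed stamp -> 403
    expected = hmac.new(SECRET_KEY, f"{expiry}:{path}".encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered path or wrong key -> 403
    return (now if now is not None else time.time()) <= expiry
```

Since the signature covers the path as well as the expiry, a valid stamp for one file can't be reused for another. The mirror-side half really is about ten lines, as you estimated.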

> > 2)
> > mod_mirrorbrain (the Apache module that implements the mirror selection,
> > mirrorlist generator and redirection) could be extended by a mode that does
> > all the work of mirror selection, but doesn't return a redirect in the end.
> > The selected mirror (and country data) is already saved in the
> > Apache-internal environment before the redirect happens, so this data is
> > accessible to modules/scripts running later. If the redirect is not done
> > (which could easily be made configurable to switch it off), a module/script
> > running later could do something else with this data. Again, this would
> > require a change to mod_mirrorbrain, but trivial enough and also useful for
> > other scenarios I guess.
> >
> This sounds like the "best solution" of the lot… I'd be very interested in
> this. It avoids the need for the extra redirect, which is desirable (though
> not crucial). I imagine this could be useful for a lot of scenarios… though
> none come to mind straight away haha!

Your scenario is one :-) I wonder, however, whether most people wouldn't
prefer to use PHP, and I don't know if PHP can access that request
handling phase with mod_php. It likely cannot with one of the FastCGI
approaches. Nevertheless, an Apache module can always do anything, and
having such a "mode" in MB would definitely be a good thing, adding
flexibility for potential uses.

> > 4)
> > A fourth way would not require any changes at all, and it would seem easy
> > to set up and integrate with: You could run MirrorBrain as a backend service
> > that is accessed by a frontend server. The frontend could handle the Basic
> > Auth, and send a request to the backend running MirrorBrain, using the path
> > of the requested file, and pass the original client IP in a HTTP header. The
> > client IP is then seen by mod_geoip, and also mod_asn if you use that
> > optional module, and mod_mirrorbrain can do the mirror selection, without
> > needing to be aware of authentication or temporary link generation.
> > mod_mirrorbrain can return the selected mirror either within a HTTP header
> > (Location header and X-MirrorBrain-Mirror header). It can also return
> > comprehensive data in form of a Metalink (which is XML), which includes
> > hashes and a randomized list of mirrors sorted by priority for the client.
> > The frontend could easily use this information to add the temporary link
> > path elements, and reply to the client with the final redirect.
> >
> I hadn't even thought of proxying it… d'oh! Actually, for now, I think
> you've probably come up with the most easily implemented solution. Our
> timescale for putting this into use is only a week or two, so I think this
> may well be the best option for us (and not really too hard, either). I
> think I'm going to have a quick bash at doing this now… I'll let you know
> how it goes :)
> Going forward, #2 you mentioned above would be my vote for if you were to
> modify MB, though, especially if it were quite trivial to implement. It
> would avoid the need for two webservers to be running on the machine
> obviously, and it feels quite "Apache-esque".

So the whole thing would probably look like below, if I understand
correctly:

   http://download/path/to/file          http://mirrorbrain/path/to/file
    ^                          \               ^                 |
    |                           \              |                 |
 download                        \             |                 |       
 request                        basic          |                 |
    |                            auth -------->                  v
  ======                           =============            ===========
  client                             frontend               mirrorbrain
  ======                           =============            ===========
  ^  |  ^                          script adding                 |
  |  |  |                           timestamp                    |
  |  |  |                             |     ^                    |
  |  |  |                             v      \                   |
  |  | http://mirror1/path/to/file?stamp=abc  \                  |
  |  |                                         \                 v
  |  |                                        http://mirror1/path/to/file
  |  |                                           (or entire mirror list)
file |
  |  v
  |  check stamp

Note that the frontend and backend don't need to be different servers.
They can run in the same Apache, e.g. in different virtual hosts (or
even in the same). The "frontend" would just need to do a little HTTP
request. Of course, this would employ a second Apache thread for every
request handled, but they'll be short-lived and probably won't be very
noticeable.
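A minimal sketch of what that "little HTTP request" from the frontend could look like (the backend address, the X-Forwarded-For header name, and the function name are assumptions; the header just has to be whatever mod_geoip/mod_asn are configured to read the client IP from):

```python
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Don't follow the backend's 302; we want to read its
    # Location header ourselves.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def pick_mirror(path, client_ip, backend="http://localhost:8080"):
    """Ask the MirrorBrain backend for a mirror; return the redirect target.

    Returns None if the backend served the file itself (no mirror known).
    """
    req = urllib.request.Request(backend + path,
                                 headers={"X-Forwarded-For": client_ip})
    opener = urllib.request.build_opener(NoRedirect)
    try:
        opener.open(req)
    except urllib.error.HTTPError as e:
        if e.code in (301, 302):
            # The mirror URL; append ?stamp=... here before
            # redirecting the client.
            return e.headers.get("Location")
        raise
    return None
```

The frontend would then append the stamp to the returned URL and send the final redirect to the client.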

You'd probably want to restrict access to the "backend" to localhost.
That could mean that MirrorBrain refuses to do its job, because of the
issue I mentioned before regarding requests that require authentication
(there is a simple, but too simplistic, check). I would need to fix that
first, in order to allow MB to run behind authentication. (Which should
be quickly done.)

The above would become simpler if MB could generate the stamp on its
own, and be perfectly "Apache-esque". Which is good. :-) What do you
think?

There is another thing to consider. Depending on your file tree, mirror
situation, and update and sync frequency, it might happen that
MirrorBrain doesn't know a mirror for some file. The default behaviour
then is to deliver the file directly, which is not something that you
want when you query it from a frontend running between it and the
client. This behaviour can be changed by configuring one or more
MirrorBrainFallback mirrors, which causes MB to always return a
redirect, assuming that the server(s) configured as such always have
all files. It would seem natural to me to use your existing download
server for that.

mirrorbrain mailing list

Received on Wed Nov 03 2010 - 13:28:00 GMT

This archive was generated by hypermail 2.3.0 : Sun Nov 07 2010 - 18:47:06 GMT