Re: [mirrorbrain] Authenticating downloads?

From: Oliver Beattie <oliver_at_obeattie.com>
Date: Wed, 3 Nov 2010 11:21:31 +0000
Hey Peter,

Wow, thanks so much for your detailed reply… I certainly didn't expect
anyone to put in as much effort as this — you're amazing! :)

I've added my replies inline below.

—Oliver


On 3 November 2010 09:47, Peter Pöml <peter_at_poeml.de> wrote:

> Hi Oliver,
>
> thank you for your interest, and welcome.
>
> Am 03.11.2010 um 08:09 schrieb Oliver Beattie:
> > We are considering implementing MirrorBrain for a download site of ours
> > that has outgrown its current solution, but I need to pick someone's
> > brain to find out if it's possible to do what we need. Basically, all
> > our downloads have to be authenticated. The way we would like to do
> > this is with Basic authentication (as is the current implementation)
> > on the MB server, and then to generate signed one-time URLs to each of
> > the mirrors (this would likely have a combination of an expiration
> > timestamp and a signature, signed with a private key).
>
> > I'm wondering whether this is possible with MirrorBrain? It's fine if
> > MB has to redirect to the same machine to a different path to generate
> > the new outgoing, signed URL, but it's not clear from the docs whether
> > this is feasible.
>
> Does this mean that you basically already have a file serving setup with
> Basic Auth and with temporarily valid links, and you would now like to know
> if MirrorBrain can be integrated into that, for load balancing?
>

Actually, in the current setup all the files are served from one machine,
which Basic-auths the users against an LDAP database (only accessible at
this one site). It's currently saturating a 1 Gbit/s link, and requests are
timing out all the time. It's distributing rather large files (many GB), so
we obviously need to move to something more scalable, and this seems like
just the ticket. The users are spread all over the world, but usually in big
offices where a large number of people will be downloading, so it seems
logical to use something like mod_asn + MB, which can take advantage of that
if we set up local mirrors on their networks.


>
> How are the temp links generated - I would assume via a database call, e.g.
> by calling a further script by redirecting to a different path locally?
>
> The temporary links which are generated would be valid links on the
> mirrors, I guess?
>

I know it's not really relevant, but probably interesting anyway… we were
rather hoping that the mirrors wouldn't have to communicate with a central
database, so I figured the easiest way would be to put an expiration
timestamp of ~10 seconds or so in the URL, along with a signature generated
with a private key. The mirrors would just need to compute the signature
themselves with the key, check that the signatures match and the expiration
hasn't passed, and serve the file. Any failure would just result in a 403.
Ten lines of Python/Perl/PHP/Shell should do it, I'd think… the simpler the
better :)
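
Roughly what I have in mind, just as a sketch (the "e"/"s" parameter names
and the shared key are made up here; each mirror would hold a copy of the
key, since they compute the signature themselves):

    # check a mirror would run before serving a file at `path`
    # (assumes the URL carries ?e=<unix expiry>&s=<hex hmac>)
    import hashlib, hmac, time

    SECRET = b'shared-secret-distributed-to-mirrors'

    def sign(path, expires):
        msg = ('%s:%d' % (path, expires)).encode()
        return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

    def is_valid(path, expires, signature):
        if time.time() > expires:
            return False                       # expired link -> 403
        return hmac.compare_digest(sign(path, expires), signature)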


>
>
> First of all, Basic Auth shouldn't pose a problem, although I haven't
> tested MirrorBrain in conjunction with it. I know that mod_mirrorbrain has a
> simple check for authentication requirements, so that it doesn't accidentally
> give access to protected files, and I suspect that this check is not smart
> enough to tell whether a request has actually been authenticated. Having said
> that, I'd be happy to fix this. That should not be difficult.
>
> Regarding the temporary links, I see the following possibilities:
>
> 1)
> It would be easily possible to have MB redirect to a local path instead of
> directly to a mirror. This would require either a small code change in
> mod_mirrorbrain, which I would be happy to implement for you, or to help you
> with. (A simple RewriteRule is not sufficient, because mod_rewrite runs
> before the content handler of mod_mirrorbrain.)


That's what I thought, too.


> A custom Apache module that hooks in later (or mod_python/mod_perl script)
> could do this as well. (Maybe also a PHP script via mod_php, but I'm not
> sure if mod_php allows scripts to run at an arbitrary phase of the request
> processing.)
>
> 2)
> mod_mirrorbrain (the Apache module that implements the mirror selection,
> mirrorlist generator and redirection) could be extended by a mode that does
> all the work of mirror selection, but doesn't return a redirect in the end.
> The selected mirror (and country data) is already saved in the
> Apache-internal environment before the redirect happens, so this data is
> accessible to modules/scripts running later. If the redirect is not done
> (which could easily be made configurable to switch it off), a module/script
> running later could do something else with this data. Again, this would
> require a change to mod_mirrorbrain, but trivial enough and also useful for
> other scenarios I guess.
>

This sounds like the "best solution" of the lot… I'd be very interested in
this. It avoids the need for the extra redirect, which is desirable (though
not crucial). I imagine this could be useful for a lot of scenarios… though
none come to mind straight away haha!
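
For the sake of argument, if mod_mirrorbrain exported the chosen mirror in
an environment variable (MIRRORBRAIN_MIRROR_URL below is a name I'm
inventing, and I haven't checked which request phase would actually get to
see it), a small mod_python handler could then build the signed redirect
itself, something like:

    import hashlib, hmac, time
    from mod_python import apache

    SECRET = b'shared-secret-distributed-to-mirrors'

    def handler(req):
        # hypothetical variable name -- whatever mod_mirrorbrain would set
        try:
            mirror = req.subprocess_env['MIRRORBRAIN_MIRROR_URL']
        except KeyError:
            return apache.DECLINED             # no mirror chosen, carry on
        expires = int(time.time()) + 10
        msg = ('%s:%d' % (req.uri, expires)).encode()
        sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
        req.headers_out['Location'] = '%s?e=%d&s=%s' % (mirror, expires, sig)
        return apache.HTTP_MOVED_TEMPORARILY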


>
> 3)
> A third way (or maybe hack) that I could think of would be to change the
> redirect (as you suggest) to prepend a different hostname and path to the
> URL, in order to redirect to a different path on the same machine, which
> takes care of the rest. Again, a simple change that would be easy to
> implement. If the redirect is not an external one (going back and forth
> between the client and server) but an internal redirect, maybe certain data
> (username or some cookies) could be passed as well, without them becoming
> visible to the outside. Hard to say if this would be useful for you without
> knowing details.
>
> 4)
> A fourth way would not require any changes at all, and it would seem easy
> to set up and integrate with: You could run MirrorBrain as a backend service
> that is accessed by a frontend server. The frontend could handle the Basic
> Auth, and send a request to the backend running MirrorBrain, using the path
> of the requested file, and pass the original client IP in an HTTP header. The
> client IP is then seen by mod_geoip, and also mod_asn if you use that
> optional module, and mod_mirrorbrain can do the mirror selection, without
> needing to be aware of authentication or temporary link generation.
> mod_mirrorbrain can return the selected mirror within HTTP headers (the
> Location header and an X-MirrorBrain-Mirror header). It can also return
> comprehensive data in the form of a Metalink (which is XML), which includes
> hashes and a randomized list of mirrors, sorted by priority for the client.
> The frontend could easily use this information to add the temporary link
> path elements, and reply to the client with the final redirect.
>

I hadn't even thought of proxying it… d'oh! Actually, for now, I think
you've probably come up with the most easily implemented solution. Our
timescale for putting this into use is only a week or two, so I think this
may well be the best option for us (and not really too hard, either). I
think I'm going to have a quick bash at doing this now… I'll let you know
how it goes :)
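
Roughly what I'm picturing for the frontend, by the way (the backend host
name and the use of X-Forwarded-For for the client IP are just my
assumptions about how it would be configured; the Location header is the one
you mentioned):

    # frontend sketch: Basic Auth has already happened in front of this;
    # assumes a HEAD against the backend yields the 302 + Location header
    import hashlib, hmac, http.client, time

    SECRET = b'shared-secret-distributed-to-mirrors'
    BACKEND = 'mb-backend.internal'            # hypothetical backend host

    def signed_redirect(path, client_ip):
        conn = http.client.HTTPConnection(BACKEND)
        conn.request('HEAD', path, headers={'X-Forwarded-For': client_ip})
        location = conn.getresponse().getheader('Location')
        if not location:
            return None                        # backend handled it itself
        expires = int(time.time()) + 10
        msg = ('%s:%d' % (path, expires)).encode()
        sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
        return '%s?e=%d&s=%s' % (location, expires, sig)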

Going forward, though, option #2 above would get my vote if you were to
modify MB, especially if it's fairly trivial to implement. It would
obviously avoid the need for two web servers running on the machine, and it
feels quite "Apache-esque".


>
> Please let me know if this info helps you.
> Peter


_______________________________________________
mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/

Note: To remove yourself from this mailing list, send a mail with the content
 	unsubscribe
to the address mirrorbrain-request_at_mirrorbrain.org
Received on Wed Nov 03 2010 - 11:21:53 GMT
