[mirror discuss] Re: [distro-dev] progress with MirrorBrain

From: Andrea Pescetti <pescetti_at_openoffice.org>
Date: Thu, 16 Jul 2009 19:32:15 +0200
(Keeping the CC, but please follow up on dev_at_distribution)

On 16/07/2009 Peter Poeml wrote:
> On Wed, Jul 15, 2009 at 11:49:13PM +0200, Andrea Pescetti wrote:
> >   There is the odd client which goes wild and issues the same request
> >   over and over again, which can skew numbers very much.
> > This is indeed a significant problem in the case of OOo and it would be
> > nice to be able to set a "threshold" (say, 10 downloads per day) valid
> > for each IP address and ignore, in statistics, all downloads exceeding
> > it. ...
> Good point. One way to tackle it (live on the download server collecting
> the numbers) could be to keep state of accessed files per IP address.
> Much in the same spirit as the Apache module mod_ip_count does.

Brilliant solution indeed! I find it very nice.
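
To make the threshold idea concrete, here is a rough sketch in Python of
counting downloads while ignoring everything above a per-IP daily limit.
The record layout (ip, day, url), the function name and the constant are
assumptions made up for this example; none of them come from MirrorBrain
or mod_ip_count.

```python
from collections import defaultdict

# Hypothetical limit, taken from the "say, 10 downloads per day" suggestion.
DAILY_THRESHOLD = 10


def count_downloads(requests, threshold=DAILY_THRESHOLD):
    """Count downloads, ignoring those beyond `threshold` per IP per day.

    `requests` is an iterable of (ip, day, url) tuples. This layout is an
    assumption for illustration, not a real log format.
    """
    seen = defaultdict(int)  # (ip, day) -> number of requests seen so far
    counted = 0
    for ip, day, _url in requests:
        seen[(ip, day)] += 1
        # Only requests up to the threshold contribute to the statistics.
        if seen[(ip, day)] <= threshold:
            counted += 1
    return counted
```

All requests are still served in this scheme; only the statistics are
capped, so a client that goes wild cannot add more than the limit per day.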

> That doesn't cover corporate networks, but if we don't store the URL but
> instead a hash of IP, URL, User agent and referer, it should work pretty
> well.

This would probably be overkill but it would be a nice addition too, so
we know we are not missing any legitimate downloads (of course, I agree
that we should serve all requests in any case).
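
As an illustration of the hash-based variant, the following Python sketch
counts a download once per distinct combination of IP, URL, user agent
and referer. The function names, the choice of SHA-1 and the tuple layout
are assumptions for this example only, not an existing implementation.

```python
import hashlib


def request_fingerprint(ip, url, user_agent, referer):
    """Return a hex digest identifying one (ip, url, user agent, referer) combination."""
    # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    raw = "\x00".join((ip, url, user_agent, referer))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def count_unique_downloads(requests):
    """Count each distinct fingerprint once.

    `requests` is an iterable of (ip, url, user_agent, referer) tuples;
    this layout is hypothetical.
    """
    seen = set()
    for ip, url, user_agent, referer in requests:
        seen.add(request_fingerprint(ip, url, user_agent, referer))
    return len(seen)
```

With this key, two different browsers behind the same corporate address
are counted separately, while one client repeating the identical request
is counted once.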

> Do you see this phenomenon a lot? I actually saw only maybe one such
> client during (the most busy time of) each major openSUSE release.

I can only say that it happens frequently. The marketing people know
better. The only public information I could find is on their blogs:
http://www.plio.it/node/75 (Italian, sorry!)

  Andrea Pescetti - Italian N-L Project Lead.

discuss mailing list
Archive: http://mirrorbrain.org/archive/discuss/

Received on Thu Jul 16 2009 - 18:21:59 GMT
