Hi Andrea!

Thanks a lot for your feedback, it's really appreciated.

On Wed, Jul 15, 2009 at 11:49:13PM +0200, Andrea Pescetti wrote:
> Peter Poeml wrote:
> > if the question is: will we be able to implement good, live,
> > detailed logging in _conjunction_ with MirrorBrain? Then the answer is a
> > clear yes.
>
> Good, thanks!
>
> > I have written up a concept for collecting download statistics here:
> > http://mirrorbrain.org/download-statistics/
>
> Nice document. I agree with most of it and find it a very reasonable
> solution. I just don't find how you plan to tackle point 6 in your list,
> i.e.,
>   There is the odd client which goes wild and issues the same request
>   over and over again, which can skew numbers very much.
> This is indeed a significant problem in the case of OOo and it would be
> nice to be able to set a "threshold" (say, 10 downloads per day) valid
> for each IP address and ignore, in statistics, all downloads exceeding
> it.
>
> Currently, as far as I know, this is managed semi-manually through
> interpolation of data from the previous days (but we don't have IP
> addresses in the available data, so there is some guessing involved
> too). Anyway, if the IP address is not lost in stored data and is made
> available for processing, we could compute this correction at a later
> stage.

Good point.

One way to tackle it (live on the download server collecting the
numbers) could be to keep state of accessed files per IP address, much
in the same spirit as the Apache module mod_ip_count, which does this to
protect server resources and to provide some DoS protection. I use a
patched version that uses mod_memcache with good success on a mirror.
http://en.opensuse.org/Mirror_Setup_Howto#mod_ip_count

That's very lightweight and it wouldn't be a problem to do this per file
with a reasonably low TTL. Blocking requires careful adjustment due to
corporate networks and web caches (multiple requests originating from
the same IP).
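To make the idea of per-file, per-IP state with a TTL a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: a plain dict stands in for memcached, and the class name DownloadCounter, the ttl and threshold parameters and their default values are made up for this example; none of this is what mod_ip_count or MirrorBrain actually provide.

```python
import time


class DownloadCounter:
    """Count downloads, ignoring excess repeats from one IP for one file.

    Hypothetical sketch: a dict stands in for memcached, and all names,
    the TTL and the threshold are illustrative assumptions.
    """

    def __init__(self, ttl=3600.0, threshold=1, clock=time.time):
        self.ttl = ttl              # seconds that per-(IP, file) state is kept
        self.threshold = threshold  # hits per (IP, file) counted within one TTL
        self.clock = clock          # injectable time source, handy for testing
        self.state = {}             # (ip, path) -> (window_start, hits)
        self.counts = {}            # path -> number of counted downloads

    def record(self, ip, path):
        """Register one request; return True if it went into the statistics."""
        now = self.clock()
        key = (ip, path)
        start, hits = self.state.get(key, (now, 0))
        if now - start >= self.ttl:
            start, hits = now, 0    # old state has expired, start a new window
        hits += 1
        self.state[key] = (start, hits)
        if hits > self.threshold:
            return False            # the request is still served, just not counted
        self.counts[path] = self.counts.get(path, 0) + 1
        return True
```

Note that the sketch never blocks anything; it only decides whether a request is counted. A real deployment would also need to expire old entries, which memcached does by itself when keys are stored with a TTL.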
Given that we wouldn't actually use the state to block any accesses, but
rather to keep counters from going through the roof, we could do this a
little more aggressively. At the same time, it might make sense to look
for X-Forwarded-For headers and give those requests some headroom. That
doesn't cover corporate networks, but if we don't store the URL but
instead a hash of IP, URL, user agent and referer, it should work pretty
well.

Does this make sense?

Do you see this phenomenon a lot? I actually saw maybe only one such
client during (the busiest time of) each major openSUSE release.

Peter

Received on Wed Jul 15 2009 - 23:10:14 GMT