[mirrorbrain] Re: download stats (was: XBMC using MB?)

From: Peter Pöml <poeml_at_cmdline.net>
Date: Wed, 3 Feb 2010 22:41:09 +0100
On 2 Jan 2010, at 09:24, Cory Fields wrote:

> Yup, MirrorBrain is up and working beautifully. We'll be deploying as
> soon as we can gather stats correctly and automatically. We also hope
> to use metalinks on the client side in the near future for larger
> (~30 MB+) downloads. They're working great via the file index; we
> just need to integrate them into our client.

About gathering download stats: I intended to put together enough
helpful details for you to implement what I have done for
OpenOffice.org. I implemented most of
http://mirrorbrain.org/download-statistics/ (code at
http://svn.mirrorbrain.org/svn/mod_stats/trunk/). Unfortunately, I
have so far failed to find the time to document it properly and to
iron out the little things in the code that are specific to this
first deployment. Anyway, I'd like to resurrect it.

mod_stats, the planned Apache module that counts in real time, hasn't
been written yet. As a quick solution, I came up with a script that
crawls the logs once a day, and it is reasonably fast anyway:

tools/dlcount.py tools/ooo.conf --db --db-home downloadstats \
    /var/log/apache2/download.example.com/$(date -d yesterday "+%Y/%m")/download.example.com-$(date -d yesterday "+%Y%m%d")-access_log.bz2

processed 149016 lines in 97.5082769394 seconds
found 3173 countables
saved data in 79.724547863 seconds

The database offers (through Django) some views on the data, such as
http://download.services.openoffice.org/stats/csv/20100201.csv, and
more views could easily be added.

(The `date -d yesterday` construct obviously assumes a cronolog setup
that writes one date-stamped log file per day into year/month
directories.)
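For those who would rather do the date arithmetic in a script than in
the shell, here is a minimal Python sketch that builds the same log
path. It assumes the layout implied by the command above (year/month
directories, one date-stamped file per day, compressed to .bz2 after
rotation); `LOG_ROOT`, `VHOST` and `yesterdays_log()` are hypothetical
names, not part of dlcount.py.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath

# Hypothetical defaults copied from the example command; adjust to your setup.
LOG_ROOT = "/var/log/apache2"
VHOST = "download.example.com"


def yesterdays_log(today: date, log_root: str = LOG_ROOT,
                   vhost: str = VHOST) -> PurePosixPath:
    """Return the path of the log file written on the day before `today`.

    Equivalent to the shell expressions $(date -d yesterday "+%Y/%m") and
    $(date -d yesterday "+%Y%m%d"), assuming one date-stamped file per day
    stored in year/month subdirectories (as a cronolog setup would produce).
    """
    day = today - timedelta(days=1)
    return (PurePosixPath(log_root) / vhost / day.strftime("%Y/%m")
            / f"{vhost}-{day.strftime('%Y%m%d')}-access_log.bz2")


if __name__ == "__main__":
    print(yesterdays_log(date(2010, 2, 1)))
    # prints /var/log/apache2/download.example.com/2010/01/download.example.com-20100131-access_log.bz2
```

Run on 1 February 2010, it picks the file for 31 January in the 2010/01
directory, which is why both the directory and the file name have to be
derived from yesterday's date rather than today's.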

To deploy the current codebase, one would primarily need to define the  
ruleset that parses the logs. This requires taking tools/ooo.conf or  
tools/go-oo.conf and adjusting the rules to fit the content of your  
logfile. I'd be happy to assist; this would obviously be made
considerably easier if there were walk-through documentation that
explains it all.
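To give an idea of what such a ruleset does, here is a minimal Python
sketch of rule-based log classification. It is not the syntax of
tools/ooo.conf: the rule format, the /files/<product>/<version>/... URL
layout, and the choice to count GET requests answered with 200 or 302
(the latter taken to be a redirect to a mirror) are all assumptions for
illustration.

```python
import re
from typing import Optional

# Request and status fields of an Apache common/combined log line.
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')

# Hypothetical rules: (pattern on the request path, label template).
# The first matching rule wins; the URL layout is an assumption.
RULES = [
    (re.compile(r"^/files/(?P<product>[\w-]+)/(?P<version>[\d.]+)/"
                r"[^/]+-(?P<os>win|linux|mac)\.\w+$"),
     "{product} {version} {os}"),
]

# Assumed to mean "delivered" (200) or "redirected to a mirror" (302).
COUNTED_STATUS = {"200", "302"}


def classify(line: str) -> Optional[str]:
    """Return a label for a countable download, or None if the line isn't one."""
    m = LOG_RE.search(line)
    if m is None or m["method"] != "GET" or m["status"] not in COUNTED_STATUS:
        return None
    for pattern, template in RULES:
        hit = pattern.match(m["path"])
        if hit:
            return template.format(**hit.groupdict())
    return None


if __name__ == "__main__":
    sample = ('192.0.2.1 - - [01/Feb/2010:10:00:00 +0100] '
              '"GET /files/app/1.0/app-1.0-win.exe HTTP/1.1" 302 280 "-" "-"')
    print(classify(sample))  # app 1.0 win
```

Each rule pairs a regular expression on the request path with a label
template; lines that no rule matches return None, so they can be
reported instead of silently disappearing.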

I used the script tools/dlcount.py for developing the rules, adding
various print statements to show what was happening. Later I added
the code that actually puts the counts into the database, so the
script would benefit from a cleaner separation of those two steps. It
should have some kind of debug mode that summarizes what has been
counted and what fell through the cracks.


mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/

Received on Wed Feb 03 2010 - 21:41:16 GMT