Author: poeml
Date: Thu Nov 26 15:48:22 2009
New Revision: 45

URL: http://svn.mirrorbrain.org/viewvc/mod_stats?rev=45&view=rev
Log:
work on the comment header.

Modified:
    trunk/tools/dlcount.py

Modified: trunk/tools/dlcount.py
URL: http://svn.mirrorbrain.org/viewvc/mod_stats/trunk/tools/dlcount.py?rev=45&r1=44&r2=45&view=diff
==============================================================================
--- trunk/tools/dlcount.py (original)
+++ trunk/tools/dlcount.py Thu Nov 26 15:48:22 2009
@@ -21,41 +21,37 @@
 #
 #
 # This script parses a MirrorBrain-enhanced access_log and does the following:
-# - a little ring buffer filters requests recurring within a sliding time window (keyed by ip+url+referer+user-agent)
-# - strip trailing http://... cruft
-# - remove duplicated slashes
-# - remove accidental query strings
-# - remove a possible .metalink suffix
-# - remove the /files/ prefix
+# - select lines on that the log analysis is supposed to run
+#   (StatsLogMask directive, which defaults to a regexp suitable for a MirrorBrain logfile)
+#   The expression also selects data from the log line, for example the
+#   country where a client request originated from.
+# - a little ring buffer filters requests recurring within a sliding time
+#   window (keyed by ip+url+referer+user-agent
+#   length of the sliding window: StatsDupWindow
+# - arbitrary log lines can be ignored by regexp (StatsIgnoreMask)
+# - IP addresses can be ignored by string prefix match (StatsIgnoreIP)
+# - apply prefiltering to the request (regular expressions with substitution)
+#   with one or more StatsPrefilter directives
+# - parse the remaining request url into the values to be logged
+#   (StatsCount directive)
+# - apply optional post-filtering to the parsed data (StatsPostfilter)
+#
 #
-# It applies filtering by
-# - status code being 200 or 302
-# - requests must be GET
-# - bouncer's IP which keeps coming back to download all files (from OOo)
+# The script should serve as model implementation for the Apache module which
+# does the same in realtime.
+#
+#
+# Usage:
+# ./dlcount.py /var/log/apache2/download.services.openoffice.org/2009/11/download.services.openoffice.org-20091123-access_log.bz2 | sort -u
+#
+# Uncompressed, gzip or bzip2 compressed files are transparently opened.
 #
-# It also captures the country where the client requests originate from.
-#
+#
 # This script uses Python generators, which means that it doesn't allocate
 # memory according to the log size. It rather works like a Unix pipe.
 # (The implementation of the generator pipeline is based on David Beazley's
 # PyCon UK 08 great talk about generator tricks for systems programmers.)
 #
-#
-#
-# I baked a first regexp which is able to parse most (OpenOffice.org) requests
-# from /stable and /extended. There are some exceptions (language code with 3
-# letters) and I didn't take care of /localized yet.
-#
-# The script should serve as model implementation for the Apache module which
-# does the same in realtime.
-#
-#
-# Usage:
-# ./dlcount.py /var/log/apache2/download.services.openoffice.org/2009/11/download.services.openoffice.org-20091123-access_log.bz2 | sort -u
-#
-# Uncompressed, gzip or bzip2 compressed files are transparently opened.
-#
-#
-#
 __version__='0.9'

Received on Thu Nov 26 2009 - 14:48:23 GMT
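
Editorial illustration, not part of the commit: the new comment header describes a chain of filter stages built from Python generators. The sketch below shows how three of those stages (duplicate suppression, IP prefix ignore, prefiltering) could be chained. It is not taken from dlcount.py. All function names, the dict layout of a request, and the sample regular expressions are hypothetical. It also assumes that the window length counts recent requests; the header only says "sliding time window", so the real StatsDupWindow may be defined differently. It is written for Python 3, whereas the 2009 script predates it.

```python
import re
from collections import deque


def drop_duplicates(requests, window=200):
    """Skip requests whose key was already seen among the last `window` keys.

    Assumption: the window is a count of recent requests, not a time span.
    """
    recent = deque(maxlen=window)  # ring buffer: oldest keys fall out
    for req in requests:
        key = (req['ip'], req['url'], req['referer'], req['agent'])
        if key in recent:
            continue
        recent.append(key)
        yield req


def drop_ignored_ips(requests, prefixes):
    """Skip requests whose client IP starts with one of the given strings."""
    prefixes = tuple(prefixes)
    for req in requests:
        if prefixes and req['ip'].startswith(prefixes):
            continue
        yield req


def prefilter(requests, substitutions):
    """Apply (pattern, replacement) pairs, in order, to the request URL."""
    compiled = [(re.compile(pattern), repl) for pattern, repl in substitutions]
    for req in requests:
        url = req['url']
        for pattern, repl in compiled:
            url = pattern.sub(repl, url)
        yield dict(req, url=url)


# Hypothetical sample data; a real run would parse these from log lines.
sample = [
    {'ip': '10.0.0.1', 'url': '/files//stable/a.exe?x=1', 'referer': '-', 'agent': 'curl'},
    {'ip': '10.0.0.1', 'url': '/files//stable/a.exe?x=1', 'referer': '-', 'agent': 'curl'},
    {'ip': '192.168.1.5', 'url': '/b.iso.metalink', 'referer': '-', 'agent': 'wget'},
    {'ip': '10.0.0.2', 'url': '/c.zip', 'referer': '-', 'agent': 'wget'},
]

# Illustrative substitutions: collapse slashes, drop query string and suffix.
substitutions = [(r'/+', '/'), (r'\?.*$', ''), (r'\.metalink$', '')]

pipeline = prefilter(
    drop_ignored_ips(drop_duplicates(sample, window=10), ['192.168.']),
    substitutions,
)
urls = [req['url'] for req in pipeline]
print(urls)
```

Each stage pulls one request at a time from the stage before it, so memory use depends on the window length rather than on the size of the log, which is the property the header attributes to the generator approach.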