Author: poeml Date: Thu Sep 2 03:11:46 2010 New Revision: 8098 URL: http://svn.mirrorbrain.org/viewvc/mirrorbrain?rev=8098&view=rev Log: docs/changes: preliminary release notes for 2.13.0 (still working on them) docs/configuration: notes on magnet links Modified: trunk/docs/changes.rst trunk/docs/configuration.rst trunk/docs/tuning.rst Modified: trunk/docs/changes.rst URL: http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/docs/changes.rst?rev=8098&r1=8097&r2=8098&view=diff ============================================================================== --- trunk/docs/changes.rst (original) +++ trunk/docs/changes.rst Thu Sep 2 03:11:46 2010 _at_@ -4,36 +4,280 @@ ============================ -*not released:* 2.13.0 (rXXXX, ...) +Release 2.13.0 (r8098, Sep 1, 2010) ------------------------------------ -* Issue #40 largely solved. - - The semantics for detection of up-to-date-ness of caches hashes are the same as - before. File size and mtime are the criteria. - - Need to describe the "failure case" when the hash table doesn't exist yet, but - mod_mirrorbrain can't prepare its statements then. Or: fix it. - - The problem is that when mod_mirrorbrain is already upgraded and reloaded, - but no script has created the missing hash table yet - - -* The requirement on the metalink package has been removed. - - -* Issue #41: Mirror lists now include hashes, link to PGP signature, file size and mtime. - -* Issue #14: Meta4 support. - -* Issue #38: Magnet link support. - - Hashes are hex-encoded, because Base32 encoding would be awkward to add and - there seems to be a transition to hex encoding. - -* MirrorBrainTrackerURL directive added. - -* Issue #42: Implemented hash server. +New features: + +* This release **fully supports IETF Metalinks**, as finalized in :rfc:`5854` early in 2010. + The extension ``.meta4`` triggers the IETF Metalink response. An HTTP Accept + header containing ``metalink4+xml`` also elicits this kind of response. This + closes `issue 14`_. The old (v3) Metalinks are still supported, and TCN + (transparent content negotiation) is supported for both variants. + +* As the "hash cache" needed to be restructured for this feature, a number of + additional features became possible. Inclusion of **various metadata in the + mirror lists** is possible now: (`issue 41`_): + + - file size and modification time + - SHA256 hash + - SHA1 hash + - MD5 hashes + - BitTorrent infohash + - link to Metalink + - link to Torrent + - zsync link + - Magnet link (needs testing) + - link to PGP signature (if applicable) + + The information respective links is displayed depending on the availability: + hashes need to be generated (or regenerated); PGP signatures need to be + present. + +* MirrorBrain is now a **hash/metadata server**. A so-called "top hash" + (cryptographic hash of the complete file) can now be requested. Depending on + the extension added to the URL, like ``.md5``, ``.sha1``, or ``.sha256``, the + respective representation is returned. This closes `issue 42`_. + + Like before, MirrorBrain can also store piece-wise hashes for chunks of the files. + + All hashes are now stored in the database. Design notes: + + Inside the database, the hashes are stored as compart binary blobs. For + transfer, they are converted to hexadecimal. This is due to the following + design decisiion: Storage is binary in so-called ``bytea`` columns. + PostgreSQL automatically escapes binary (bytea) data on output in its own + way. But this encoding is not very efficient in space. Hex encoding is more + efficient (it results in shorter strings, and thus less data to transfer over + the wire, and it's also faster). The escape format is kind phased out, and it + doesn't make sense to use it in a new application (which we are). + On the other hand, storage in bytea is as compact as it can be, which is good. + So we store the data in binary, and provide a database view which converts to + hex on the fly. The hex encoding function in PostgreSQL seems to be fast. + +* Despite of all this, hashing is **twice as fast** as before, not using the + external metalink binary any longer. All functionality of the + :program:`metalink-hasher` tool has been integrated into :program:`mb + makehashes`, which makes sure to never read data from disk more than once, + regardless of how many hashes are calculated. + +* MirrorBrain now has a **torrent generator embedded**. Torrents are generated in + realtime (from hashes cached in the database). See + :ref:`configuring_torrent_generation` for details. + +* MirrorBrain now has basic **zsync support**. The `zsync distribution method + <http://zsync.moria.org.uk/>`_ is rsync over HTTP, so to speak, and + MirrorBrain can generate zsync files on-the-fly. MirrorBrain supports the + simpler variant which doesn't look into compressed content. It is compatible + to the current zsync release (0.6.1). + + See :ref:`configuring_zsync_generation` for details. + + This feature is off by default, because Apache allocates large amounts of + memory for large rows from database; this may be worked around in the future. + + +* Support for `Magnet <http://magnet-uri.sourceforge.net/>`_ links (`issue 38`_). + See :ref:`magnet_links`. + + +* :program:`mb list`: + + - A new option ``-N|--number-of-files`` has been added, which displays the + number of files that a mirror is known to have. + + To achieve this, a new stored procedure :func:`mirr_get_nfiles` has been + implemented, which retrieves this number, given either a mirror id or its + name. It is added automatically when migrating from previous versions, and + made available in through the :mod:`mb.core.mirror_get_nfiles` method. + - ``mb list <mirror identifier>`` did not work due to a missing module import + in the Python script. This has been amended. + +* :program:`mb update`: + + - This command can now also update country & region info in mirror records (from GeoIP). + - A ``--dry-run`` option has been added, to allow seeing the changes before + applying them. + - An ``--all`` option has been added, which updates all metadata, same as when + giving ``-c -a -p --country --region`` all at once. + - The command now properly takes notice of hostnames that don't resolve in the + DNS (so further action cannot be taken). + +* :program:`mb db sizes`: + + - The command now reports also the size of the hashes table. + +* :program:`mb db vacuum`: + + - The database cleanup now takes into account that files in the filearr table + might not exist on any mirror, but only locally - so they could be + referenced in the hash table. + + + +.. FIXME + +* take note in the subprocess environment what the client requested and which + representation was actually sent. Those variables can be logged with + CustomLog want:%{WANT}e give:%{GIVE}e for instance. + + +.. _`issue 14`: http://mirrorbrain.org/issues/issue14 +.. _`issue 38`: http://mirrorbrain.org/issues/issue38 +.. _`Issue 40`: http://mirrorbrain.org/issues/issue40 +.. _`Issue 41`: http://mirrorbrain.org/issues/issue41 +.. _`issue 42`: http://mirrorbrain.org/issues/issue42 + + +Bug fixes: + +* :program:`mod_mirrorbrain`: + + - When a client IP's network prefix did not match a mirror's network prefix + exactly, the assignment of the client to this mirror would fail, even + though the client IP was (also) contained in the mirror's network prefix. + This has been rectified by properly checking for containment of the IP, + fixing `issue 52`_. + - Requests with PATH_INFO were not ignored, as they should be. The default + behaviour of Apache is to ignore such requests, and CGI or script handler + deviate from that. :program:`mod_mirrorbrain` now also correctly returns + ``404 Not Found`` for such requests. This fixes `issue 18`_, as well as + `openSUSE bug #546396 + <https://bugzilla.novell.com/show_bug.cgi?id=546396>`_ (which is not + publicly readable). + - When the only available mirror(s) had a limitation flag set (such as + ``region_only``), and a metalink was transparently negotiated, an empty + metalink would result. This is now prevented, and the file delivered + directly instead. Other representations (mirror lists, non-negotiated + metalinks, torrents, hashes) are generated also if there is no mirror. This + was tracked in `openSUSE bug #602434 + <https://bugzilla.novell.com/show_bug.cgi?id=602434>`_. The mirrorlist is + improved when there's no mirror, and can still list all hashes, and give + the direct download URL. + - Errors from the database adapter (lower DBD layer) are now resolved to + strings, where available. + - Some variable types have been corrected from int to ``apr_off_t``, using + :func:`apr_atoi64` instead of :func:`atoi`. This applies to: ``min_size``, + ``file_maxsize``, and the database identifier of a hash row. This at least + fixes the info message given when a file is excluded from redirection due + to its size. The checks seemed to work nevertheless, because the + ``min_size`` numbers were small and ``file_maxsize`` numbers large, which + helped to get the correct result when comparing. + + +* :program:`mb scan`: + + - Usage of FTP authentication was fixed (with credentials encoded into the + URL). The change done in January + http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/tools/scanner.pl?r1=7911&r2=7945 + was incomplete in so far that the FTP client used a wrong path now when + cd'ing into a directory (complete URL instead of only the path component). + This may have worked with some FTP servers, but it definitely didn't work + with vsftpd. Thanks to Deepak Gupta for raising this issue and providing + means to analyse it. + +* :program:`mb edit`: + + - Problems that occurred when copying and pasting data on the editing window + have been fixed (reported in `issue 30`_). + + +* :program:`mirrorprobe`: + + - A hard-to-catch exception is now handled. If Python's socket module ran + into a timeout while reading a chunked response, the exception would not be + passed correctly to the upper layer, so it could not be caught by its name. + We now wrap the entire thread into another exception, which would otherwise + be bad practice, but is probably okay here, since we already catch all + other exceptions. This should fix `issue 46`_. + - In case of exceptions we run into, allow logging the affected mirror's name. + - If an unhandled exception occurs, a note is printed. + + +* :program:`null-rsync`: + + - Broken links that are replaced by a directory, and point outside the tree, + are now correctly removed in the destination tree. (That's a really special + case.) + - Some error messages were improved. + + + +.. _`issue 18`: http://mirrorbrain.org/issues/issue18 +.. _`issue 30`: http://mirrorbrain.org/issues/issue30 +.. _`issue 46`: http://mirrorbrain.org/issues/issue46 +.. _`issue 52`: http://mirrorbrain.org/issues/issue52 + +Internal changes: + +* :program:`mod_mirrorbrain`: + + - Code was generally cleaned up and logging improved. + - A hex decoder for efficient handling of binary data from PostgreSQL was added. + - Old obsolete code has been removed, which was needed from before 2008/2009 + when mod_geoip didn't support continent codes yet. Since then, compiling + with GeoIP support built-in was still optionally possible, but this old + code is now removed. + - The code path has been cleaned up a lot for easier handling of different + representation, like hashes that are requested. + - The message which is logged when no hashes where found in the database has + been enhanced. + +* :program:`mb makehashes`: + + - Hashes are also stored for files which exists only locally, and not on any + mirror (and which weren't present in the ``filearr`` table yet, therefore). + + + +Documentations improvements: + +* A few hints about :ref:`tuning_postgresql` were added to the :ref:`tuning`. + +* The installation docs have been restructured: Now there's a new section + explaining the :ref:`initial_configuration`, and this part is linked from all + platform-specific sections as "next step" at their end. This should avoid + some confusion. Hand in hand with this change, a cleanup of things scattered + in all places is in progress. + +* A :ref:`initial_configuration_logging_setup` is described. + +* Notes about the necessity of :ref:`initial_configuration_file_tree` have been + added, and alternatives explained. + +* Installing from Debian packages: There is now a note about expired keys, and + how to renew them. + +* The obsolete MySQL database schema has been removed, which could + theoretically be useful for people aiming to run only mod_mirrorbrain, but + not the rest of the framework - but is confusing and may cause people assume + that MySQL is supported as backend. + + +Other improvements: + +* :program:`rsyncinfo`: + + - `This script <http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/tools/rsyncinfo.py?view=markup>`_ + is easier to use now. Instead of the arkward syntax it now also + takes simple rsync URLs. Before:: + + rsyncinfo size gd.tuwien.ac.at -m openoffice + + Now:: + + rsyncinfo size gd.tuwien.ac.at::openoffice + rsyncinfo size rsync://gd.tuwien.ac.at/openoffice + +* :program:`bdecode`: + + - A new tool `bdecode <http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/tools/bdecode.py?view=markup>`_ + to parse a Torrent file (or other BEncoded input), and + pretty-print it. Useful to work on the Torrent generator in + mod_mirrorbrain. It also reads from standard input:: + + curl -s <url> | bdecode.py + Modified: trunk/docs/configuration.rst URL: http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/docs/configuration.rst?rev=8098&r1=8097&r2=8098&view=diff ============================================================================== --- trunk/docs/configuration.rst (original) +++ trunk/docs/configuration.rst Thu Sep 2 03:11:46 2010 _at_@ -14,9 +14,9 @@ Generating Torrents ------------------- -When hashes are generated with :program:`mb makehashes`, and stored in the -database, MirrorBrain can generate not only Metalinks but also Torrents. The -required chunked hashes are the same. +From the hashes generated with :program:`mb makehashes`, MirrorBrain can +generate not only Metalinks, but also Torrents. The required chunked hashes are +the same. The generation is triggered by appending ``.torrent`` to an URL. _at_@ -180,6 +180,21 @@ The checksums occupy space in the database. To find out how much it is, the :program:`mb db sizes` command can be helpful. Note the size of the ``hash`` table. + + + +.. _magnet_links: + +Magnet links +------------ + +Hashes are hex-encoded, because Base32 encoding would be awkward to add and +there seems to be a transition to hex encoding. + +The ``urn:sha1`` scheme is currently also not supported, because it is required +to be Base32-encoded. Base32 encoding could be added in the future, of course. +Contributions welcome! + Modified: trunk/docs/tuning.rst URL: http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/docs/tuning.rst?rev=8098&r1=8097&r2=8098&view=diff ============================================================================== --- trunk/docs/tuning.rst (original) +++ trunk/docs/tuning.rst Thu Sep 2 03:11:46 2010 _at_@ -8,6 +8,8 @@ Depending on the size of your install, this can be mandatory. + +.. _tuning_apache: Tuning Apache ------------- _at_@ -114,6 +116,8 @@ +.. _tuning_postgresql: + Tuning PostgreSQL ----------------- _______________________________________________ mirrorbrain-commits mailing list Archive: http://mirrorbrain.org/archive/mirrorbrain-commits/ Note: To remove yourself from this list, send a mail with the content unsubscribe to the address mirrorbrain-commits-request_at_mirrorbrain.orgReceived on Thu Sep 02 2010 - 01:11:49 GMT
This archive was generated by hypermail 2.3.0 : Mon Feb 20 2012 - 23:47:04 GMT