Title: add support for mirrors and checksums in Link headers (RFC 6249)
Priority: wish
Status: resolved
Superseder:
Nosy List: ant, poeml
Assigned To: poeml
Keywords:

Created on 2009-10-09.00:49:17 by poeml, last changed by poeml.

msg37 Author: poeml Date: 2009-10-09.00:49:17
There is a proposal for transmitting information about mirrors and checksums to clients in Link 
headers, which looks like this:

  Link: <>; rel="duplicate"
  Link: <>; rel="duplicate"
  Link: <>; rel="describedby";
  Link: <>; rel="describedby";
  Link: <>; rel="describedby";
  Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

See the IETF draft.
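The Digest line above follows the instance-digest syntax of RFC 3230. Each value is an algorithm token followed by the base64 encoding of the raw digest bytes (not the hex string), and the token `SHA` means SHA-1. Below is a minimal Python sketch, assuming the whole file fits in memory. The function name `digest_header` is hypothetical.

```python
import base64
import hashlib

# RFC 3230 algorithm tokens mapped to hashlib names.
# "SHA" denotes SHA-1; "SHA-256" was registered later by RFC 5843.
ALGORITHMS = {"MD5": "md5", "SHA": "sha1", "SHA-256": "sha256"}


def digest_header(data: bytes, algorithm: str = "SHA") -> str:
    """Return a Digest header value such as 'SHA=<base64>' for data."""
    h = hashlib.new(ALGORITHMS[algorithm], data)
    # The value is base64 of the raw digest bytes, not of the hex string.
    return f"{algorithm}={base64.b64encode(h.digest()).decode('ascii')}"


if __name__ == "__main__":
    for alg in ALGORITHMS:
        print("Digest:", digest_header(b"", alg))
```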
msg108 Author: poeml Date: 2009-12-11.21:41:21
Most of this can be added easily; there is one thing that I need to change first, though:

At the moment, Metalink hashes are cached to disk in a form that is suitable for direct inclusion 
into Metalinks:

 # cat /srv/metalink-hashes/samba/srv/mirrors/samba/pub/samba/MIRRORS.txt.size_68
      	<hash type="md5">e8ad5924dcef6c25a3455230c46a4caa</hash>
      	<hash type="sha1">8094b506b9115abc2eb174a35e8bc84b8f72f0a9</hash>
      	<hash type="sha256">2104ed8aa2f4af920c1669585eeaabb0c94ace6cb92e67cbd3ab04b2bb7356b5</hash>
      	<pieces length="262144" type="sha1">
            <hash piece="0">8094b506b9115abc2eb174a35e8bc84b8f72f0a9</hash>

Or, an example with PGP signature:

        <signature type="pgp" file="openSUSE-11.2-NET-i586.iso.asc">
Version: GnuPG v1.0.7 (GNU/Linux)


        <hash type="md5">bfb98c4b2e079f9d147b53d3fc9495c5</hash>
        <hash type="sha1">8e5854c6e00b7a0f124c3060da4184e6d5f8d6b2</hash>
        <hash type="sha256">9de4a0b44f7c474929ece46481a783500078fb3b2f05b885069a74aff198fc7f</hash>
        <pieces length="524288" type="sha1">
            <hash piece="0">8fc60d0c4918bf53ad7858196633b05a4ca4b060</hash>
            <hash piece="1">a4860b0e708063900253be974e7ebcd9a50c660b</hash>
            <hash piece="214">11ddc7c352d9f017f6c00822c3d4be540dc9ad38</hash>
            <hash piece="215">e5ec3ff4693175b7da90f3bc1fdf1ee5d7f3f20a</hash>
            <hash piece="216">897256b6709e1a4da9daba92b6bde39ccfccd8c1</hash>

That has been fine so far, because Apache just has to open the file and can write it directly to the 
network while sending the Metalink.

Now we'll need access to the individual values in that snippet, so the storage format needs to 
change (the alternative, having Apache parse the XML, sounds like an ugly option). I'm thinking of a 
text-based format, optimized for parsing with low overhead.

Maybe a simple series of null-terminated strings, with no newlines as separators (so that the 
multi-line PGP signature string can be stored without modification):

hash <type> <hash string>\0
hash <type> <hash string>\0
hashpieces <type> <length>\0
hashpiece 0 <hash string>\0
hashpiece 1 <hash string>\0
hashpiece 2 <hash string>\0
pgp <signature string with embedded newlines>\0
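Below is a minimal Python sketch of writing and reading the proposed record format. It assumes that each record is the keyword followed by its fields, separated by single spaces and terminated by a NUL byte. The function names are hypothetical, and the real cache may end up looking different. The last field of each record takes the rest of the string, so the PGP signature keeps its embedded newlines and spaces.

```python
from typing import Iterator

# Number of space-separated fields after each keyword. The last field
# takes the rest of the record, so a PGP signature may contain spaces
# and newlines.
FIELDS = {"hash": 2, "hashpieces": 2, "hashpiece": 2, "pgp": 1}


def encode_records(records: list[tuple[str, ...]]) -> bytes:
    """Serialize (keyword, field, ...) tuples as NUL-terminated strings."""
    out = bytearray()
    for keyword, *fields in records:
        if len(fields) != FIELDS[keyword]:
            raise ValueError(f"{keyword!r} expects {FIELDS[keyword]} fields")
        record = " ".join([keyword, *fields]).encode("utf-8")
        if b"\0" in record:
            raise ValueError("records must not contain NUL bytes")
        out += record + b"\0"
    return bytes(out)


def decode_records(blob: bytes) -> Iterator[tuple[str, ...]]:
    """Yield (keyword, field, ...) tuples; unknown keywords raise KeyError."""
    if blob and not blob.endswith(b"\0"):
        raise ValueError("truncated record: missing trailing NUL")
    # Splitting on NUL leaves an empty string after the final terminator.
    for raw in blob.split(b"\0")[:-1]:
        keyword, _, rest = raw.decode("utf-8").partition(" ")
        yield (keyword, *rest.split(" ", FIELDS[keyword] - 1))
```

Parsing needs only one linear scan for NUL bytes plus a bounded split per record, which is the low-overhead property the proposal asks for.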

Maybe, looking around a bit in Apache, something else will spring to mind that is suitable for 
reading the data quickly.

Once the data can be read quickly, and only once, during the request processing phase, it's 
available for all sorts of wonderful things. In particular, we can easily implement a "checksum 
server" that returns the checksum for any file when .md5, .sha1 or .sha256 is appended to its URL. 
And of course we can send instance digests, as requested here.
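Below is a minimal sketch of the "checksum server" idea. It assumes the hashes for each file are already available as hex strings keyed by hash type, as in the cache above. The `hashes` dict stands in for the real lookup. The name `checksum_response` and the md5sum-style output line are illustrative choices, not MirrorBrain's actual behaviour.

```python
import posixpath
from typing import Dict, Optional

# URL suffixes the checksum server would recognise, mapped to hash types.
SUFFIXES = {".md5": "md5", ".sha1": "sha1", ".sha256": "sha256"}


def checksum_response(url_path: str,
                      hashes: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return an md5sum-style line for e.g. /foo.iso.sha256, or None.

    hashes maps a file path to {hash type: hex digest}; it is a
    hypothetical stand-in for the server's hash lookup.
    """
    base, suffix = posixpath.splitext(url_path)
    hash_type = SUFFIXES.get(suffix)
    if hash_type is None:
        return None  # not a checksum request; serve the file normally
    hex_digest = hashes.get(base, {}).get(hash_type)
    if hex_digest is None:
        return None  # unknown file or hash type not cached, e.g. a 404
    # Same layout as md5sum/sha1sum/sha256sum output (two spaces).
    return f"{hex_digest}  {posixpath.basename(base)}\n"
```

Emitting the same layout as the coreutils tools lets users check a download with `md5sum -c` or `sha256sum -c`.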
msg109 Author: poeml Date: 2009-12-11.21:43:08
Additional thought about the hash store file format: an identifier and version 
number should be added to the beginning.
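Below is a minimal sketch of such a header. It assumes the header is itself one NUL-terminated string of the form `<identifier> <version>`, so it fits in front of the record stream. The identifier `MBHASH` and version `1` are hypothetical values.

```python
MAGIC = b"MBHASH"  # hypothetical file identifier
VERSION = 1        # hypothetical format version


def add_header(body: bytes) -> bytes:
    """Prefix a hash store blob with an identifier and format version."""
    return MAGIC + b" " + str(VERSION).encode("ascii") + b"\0" + body


def split_header(blob: bytes) -> bytes:
    """Check identifier and version, then return the remaining body."""
    header, sep, body = blob.partition(b"\0")
    magic, _, version = header.partition(b" ")
    if not sep or magic != MAGIC:
        raise ValueError("not a hash store file")
    if not version.isdigit() or int(version) != VERSION:
        raise ValueError(f"unsupported format version {version!r}")
    return body
```

Rejecting unknown versions outright lets a reader refuse a stale cache file instead of misparsing it.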
msg136 Author: poeml Date: 2010-03-08.20:46:21
Cf. issue #40, where the hash cache redesign is now tracked.
msg159 Author: poeml Date: 2010-03-12.02:51:00
Issue 40, which was blocking this issue, is mostly done.
msg220 Author: poeml Date: 2010-09-05.23:52:30
A lot of groundwork for this has been done:

- Issue 40 is resolved. The hash cache was moved from file-based storage into the database.
- All required tools are in place.

Now it is just a matter of using the data and writing it to HTTP headers.

This should maybe be made configurable, because it makes Apache need slightly more 
resources, which may or may not be desired. (Of course, it would be cool if it 
just "happens", and the default should probably be that the headers are sent.)
msg366 Author: ant Date: 2012-03-28.16:05:56
Just to update this bug:

RFC 6249, Metalink/HTTP, describes this.

poeml, what should a student know about this bug? Where should they start within the source?

What configuration options do you need?
- On (default) / Off
- Number of mirrors to emit over HTTP?
- Which full-file hashes to include: all known full-file hashes, or just some?
msg367 Author: poeml Date: 2012-03-30.12:12:28
status of this feature:

Low-hanging fruit; all the hard work should already be done.

The following needs to be done:

- Check the database for hashes; if they exist, add them as HTTP headers
- Add a Link header to the .asc file, if available
- Add a Link header to the .torrent file, if available
- Add a Link header to the .meta4/.metalink file
- Add Link headers with links to mirrors
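Below is a minimal sketch of the header-building steps in the list above, covering everything except the mirror links. It makes three assumptions:

- the database returns hashes as hex strings keyed by type, as in the cache example earlier;
- the caller already knows whether the .asc and .torrent files exist;
- the Metalink is reachable at the file URL plus `.meta4`.

`metalink_http_headers` is a hypothetical name; mod_mirrorbrain itself is C code running inside Apache.

```python
import base64
from typing import Dict, List, Tuple

# Hash types as stored in the database, mapped to RFC 3230 Digest tokens.
DIGEST_TOKENS = {"md5": "MD5", "sha1": "SHA", "sha256": "SHA-256"}


def metalink_http_headers(url: str, hashes: Dict[str, str],
                          has_asc: bool, has_torrent: bool) -> List[Tuple[str, str]]:
    """Return (name, value) pairs for the describedby Links and Digests."""
    headers = []
    # The Metalink is always offered; .asc and .torrent only if they exist.
    describedby = [(".meta4", "application/metalink4+xml")]
    if has_asc:
        describedby.append((".asc", "application/pgp-signature"))
    if has_torrent:
        describedby.append((".torrent", "application/x-bittorrent"))
    for ext, mime in describedby:
        headers.append(("Link", f'<{url}{ext}>; rel=describedby; type="{mime}"'))
    # Hex digests from the database become base64 Digest values.
    for hash_type, token in DIGEST_TOKENS.items():
        if hash_type in hashes:
            raw = bytes.fromhex(hashes[hash_type])
            headers.append(("Digest", f"{token}={base64.b64encode(raw).decode('ascii')}"))
    return headers
```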

Regarding the number of mirrors, it is important to limit it, or the HTTP
response could easily become huge. The top ten mirrors should be more than
enough. Luckily, there is a convenient get_n_best_mirrors() function in
mod_mirrorbrain to get the desired list of best mirrors.
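Below is a minimal sketch of turning a ranked mirror list into the limited set of `rel=duplicate` Link headers. It uses the `pri` and `geo` parameters defined by RFC 6249. The `ranked` argument stands in for the result of get_n_best_mirrors(), whose actual C signature is not shown here. `Mirror` and `mirror_link_headers` are hypothetical names.

```python
from typing import List, NamedTuple


class Mirror(NamedTuple):
    base_url: str  # e.g. "http://mirror.example.net/pub/"
    country: str   # ISO 3166 country code, e.g. "de"


def mirror_link_headers(path: str, ranked: List[Mirror], limit: int = 10) -> List[str]:
    """Return Link header values for the best `limit` mirrors.

    `ranked` must already be ordered best-first (assumed to come from
    something like get_n_best_mirrors()).
    """
    values = []
    for pri, mirror in enumerate(ranked[:limit], start=1):
        url = mirror.base_url.rstrip("/") + "/" + path.lstrip("/")
        values.append(f"<{url}>; rel=duplicate; pri={pri}; geo={mirror.country.lower()}")
    return values
```

Capping the list at `limit` bounds the header size no matter how many mirrors hold the file.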
msg376 Author: poeml Date: 2012-04-11.19:41:52
So, HTTP replies can get long now. Here's what one looks like in my testing:

HTTP/1.1 302 Found
Date: Wed, 11 Apr 2012 19:40:39 GMT
Server: Apache/2.2.17 (Linux/SUSE)
X-AS: 8422
X-MirrorBrain-Realm: country
Link: <>; rel=describedby; type="application/metalink4+xml"
Link: <>; rel=describedby; type="application/pgp-signature"
Link: <>; rel=describedby; type="application/x-bittorrent"
Link: <>; rel=duplicate; pri=1; geo=de
Link: <>; rel=duplicate; pri=2; geo=de
Link: <>; rel=duplicate; pri=3; geo=de
Link: <>; rel=duplicate; pri=4; geo=de
Link: <>; rel=duplicate; pri=5; 
Digest: MD5=mertNzkLoFcfjShYKf9j/A==
Digest: SHA=SXw8fhX2ZMHasmbFbSWjpeUn/bQ=
Digest: SHA-256=WVwzYHQVWTdFBKJacO4Bz2Fz60XHjtpLf0IG9KRuOjM=
Content-Type: text/html; charset=iso-8859-1
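On the client side, a reply like this can be reduced to an ordered mirror list. Below is a minimal sketch using a simplified Link parser. It does not handle quoted parameter values that contain `;` or `,`, which RFC 5988 allows, and it puts links without a `pri` parameter last, which is a choice made here. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# <uri-reference> followed by ;-separated parameters (simplified).
LINK_RE = re.compile(r"<([^>]*)>\s*((?:;[^;]*)*)")


def parse_link(value: str) -> Tuple[str, Dict[str, str]]:
    """Split one Link header value into its URL and lower-cased params."""
    m = LINK_RE.match(value.strip())
    if m is None:
        raise ValueError(f"malformed Link value: {value!r}")
    params = {}
    for param in m.group(2).split(";"):
        name, sep, val = param.strip().partition("=")
        if sep:  # skip empty pieces, e.g. from a trailing ';'
            params[name.lower()] = val.strip('"')
    return m.group(1), params


def mirrors_by_priority(link_values: List[str]) -> List[str]:
    """Return rel=duplicate URLs ordered by their pri parameter."""
    dups = []
    for value in link_values:
        url, params = parse_link(value)
        if params.get("rel") == "duplicate":
            dups.append((int(params.get("pri", "999999")), url))
    return [url for _, url in sorted(dups)]
```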
msg379 Author: poeml Date: 2012-04-14.21:46:18
I think this bug can be closed. Code will be included in the next release.
Date                 User   Action  Args
2012-04-14 21:46:18  poeml  set     status: testing -> resolved
                                    messages: + msg379
2012-04-11 19:41:53  poeml  set     status: chatting -> testing
                                    messages: + msg376
2012-03-30 12:12:28  poeml  set     messages: + msg367
2012-03-28 16:05:56  ant    set     messages: + msg366
                                    title: add support for mirrors and checksums in Link headers -> add support for mirrors and checksums in Link headers (RFC 6249)
2010-09-05 23:53:14  poeml  set     assignedto: poeml
2010-09-05 23:52:30  poeml  set     messages: + msg220
2010-03-12 02:51:00  poeml  set     messages: + msg159
2010-03-08 20:46:21  poeml  set     messages: + msg136
2009-12-11 21:43:08  poeml  set     messages: + msg109
2009-12-11 21:41:21  poeml  set     messages: + msg108
2009-11-04 16:32:24  ant    set     nosy: + ant
2009-10-09 00:49:17  poeml  create