add support for mirrors and checksums in Link headers (RFC 6249) #11

Closed
poeml opened this issue Jun 5, 2015 · 0 comments
poeml commented Jun 5, 2015


Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue15

Title:       add support for mirrors and checksums in Link headers (RFC 6249)
Priority:    wish
Status:      resolved
Superseder:
Nosy List:   ant, poeml
Assigned To: poeml
Keywords:

msg37 (view) Author: poeml Date: 2009-10-09.00:49:17

There is a proposal for transmitting information about mirrors and checksums to clients using Link
headers which looks like:

Link: <http://www2.example.com/example.ext>; rel="duplicate"
Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
Link: <http://example.com/example.ext.torrent>; rel="describedby"; type="application/x-bittorrent"
Link: <http://example.com/example.ext.metalink>; rel="describedby"; type="application/metalink4+xml"
Link: <http://example.com/example.ext.asc>; rel="describedby"; type="application/pgp-signature"
Digest: SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=

See IETF draft http://tools.ietf.org/html/draft-bryan-metalinkhttp
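
For illustration, here is a minimal Python sketch of how a client might read such headers. It is not taken from MirrorBrain or from the draft; the function names parse_link and mirrors are made up for this example, and it assumes each header line carries exactly one link (comma-separated lists of links in a single header are not handled).

```python
import re

# One Link header value: <URI> followed by ";name=value" parameters.
_LINK_RE = re.compile(r'^\s*<([^>]*)>(.*)$')
_PARAM_RE = re.compile(r';\s*([A-Za-z0-9_-]+)\s*=\s*("([^"]*)"|[^;\s]+)')


def parse_link(value):
    """Split one Link header value into (url, params dict)."""
    m = _LINK_RE.match(value)
    if m is None:
        raise ValueError("not a Link header value: %r" % value)
    url, rest = m.group(1), m.group(2)
    params = {}
    for name, raw, quoted in _PARAM_RE.findall(rest):
        # Quoted values have their surrounding quotes stripped.
        params[name.lower()] = quoted if raw.startswith('"') else raw
    return url, params


def mirrors(link_values):
    """Return the URLs of all rel="duplicate" links, in header order."""
    return [url for url, p in map(parse_link, link_values)
            if p.get("rel") == "duplicate"]
```

Links with rel="describedby" can be picked out the same way and told apart by their type parameter.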

msg108 (view) Author: poeml Date: 2009-12-11.21:41:21

Most of this can be added easily; there is one thing that I need to change first, though:

At the moment, Metalink hashes are cached to disk in a form that is suitable for direct inclusion
into Metalinks:

cat /srv/metalink-hashes/samba/srv/mirrors/samba/pub/samba/MIRRORS.txt.size_68

    <verification>
      <hash type="md5">e8ad5924dcef6c25a3455230c46a4caa</hash>
      <hash type="sha1">8094b506b9115abc2eb174a35e8bc84b8f72f0a9</hash>
      <hash type="sha256">2104ed8aa2f4af920c1669585eeaabb0c94ace6cb92e67cbd3ab04b2bb7356b5</hash>
      <pieces length="262144" type="sha1">
          <hash piece="0">8094b506b9115abc2eb174a35e8bc84b8f72f0a9</hash>
      </pieces>
    </verification>

Or, an example with PGP signature:

    <verification>
      <signature type="pgp" file="openSUSE-11.2-NET-i586.iso.asc">

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQBK+ATuqE7a6JyACsoRArqBAJ0ViDK4IUQPKYz1qbXivJielVCkDACf
VCZ4fiIU8640lArqhzu9QuTRL0s=
=2F9I
-----END PGP SIGNATURE-----

      </signature>
      <hash type="md5">bfb98c4b2e079f9d147b53d3fc9495c5</hash>
      <hash type="sha1">8e5854c6e00b7a0f124c3060da4184e6d5f8d6b2</hash>
      <hash type="sha256">9de4a0b44f7c474929ece46481a783500078fb3b2f05b885069a74aff198fc7f</hash>
      <pieces length="524288" type="sha1">
          <hash piece="0">8fc60d0c4918bf53ad7858196633b05a4ca4b060</hash>
          <hash piece="1">a4860b0e708063900253be974e7ebcd9a50c660b</hash>
          [...]
          <hash piece="214">11ddc7c352d9f017f6c00822c3d4be540dc9ad38</hash>
          <hash piece="215">e5ec3ff4693175b7da90f3bc1fdf1ee5d7f3f20a</hash>
          <hash piece="216">897256b6709e1a4da9daba92b6bde39ccfccd8c1</hash>
      </pieces>
    </verification>

That was fine so far because Apache just has to open the file and can directly write it to the
network, while sending the metalink.

Now, we'll need access to the individual data in that snippet. Thus, the format of storing the data
needs to be changed (or the XML parsed by Apache, but that sounds like an ugly option). I'm thinking
of a text-based format. It should be optimized for parsing with low overhead.

Maybe a simple series of null-terminated strings, and no newlines (because then we can store the
multi-line PGP signature string without modifications):

hash \0
hash \0
hashpieces \0
hashpiece 0 \0
hashpiece 1 \0
hashpiece 2 \0
pgp \0
EOF
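
A rough Python sketch of this idea follows, purely as an illustration. The exact record layout is an assumption: each record is taken to be "key value" followed by a NUL byte, where the key is the first word (hash, hashpieces, hashpiece, pgp) and the value is everything after the first space. The function names are hypothetical.

```python
def dump_records(records):
    """Serialize (key, value) pairs as NUL-terminated "key value" strings."""
    out = bytearray()
    for key, value in records:
        if "\0" in key or " " in key or "\0" in value:
            raise ValueError("NUL is not allowed, and keys must not contain spaces")
        out += ("%s %s" % (key, value)).encode("utf-8") + b"\0"
    return bytes(out)


def load_records(blob):
    """Parse the blob back into a list of (key, value) pairs."""
    records = []
    # The last element after splitting is whatever follows the final NUL
    # (normally the empty string), so it is skipped.
    for chunk in blob.split(b"\0")[:-1]:
        key, _, value = chunk.decode("utf-8").partition(" ")
        records.append((key, value))
    return records
```

Because only NUL separates records, a value may contain newlines, so a multi-line PGP signature survives a round trip unchanged.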

Maybe, after looking around a bit in Apache, something else will catch the eye that is suitable for
reading in the data quickly.

Once the data can be read in once & quickly during the request processing phase, it's available to
do the most wonderful things. In particular we can also easily implement a "checksum server" that
returns a checksum for any file when .md5 or .sha1 or .sha256 is appended to a URL. And of course
we can send instance digests, as requested here.
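
The "checksum server" idea could look roughly like the following Python sketch. It only shows the URL handling; hashes_for is a hypothetical lookup callable standing in for whatever store holds the hashes, and the "digest  filename" response layout is an assumption borrowed from md5sum-style tools, not something specified here.

```python
import posixpath

# URL suffix -> hash type, as listed in the text above.
SUFFIXES = {".md5": "md5", ".sha1": "sha1", ".sha256": "sha256"}


def split_checksum_request(path):
    """Return (file_path, hash_type), or (path, None) for ordinary requests."""
    base, ext = posixpath.splitext(path)
    hash_type = SUFFIXES.get(ext.lower())
    if hash_type is None or not base:
        return path, None
    return base, hash_type


def checksum_response(path, hashes_for):
    """Build the text body for a checksum request, or None if not applicable.

    hashes_for is a hypothetical callable mapping a file path to a dict
    such as {"md5": "...", "sha1": "..."}, or None when nothing is known.
    """
    file_path, hash_type = split_checksum_request(path)
    if hash_type is None:
        return None
    digest = (hashes_for(file_path) or {}).get(hash_type)
    if digest is None:
        return None
    return "%s  %s\n" % (digest, posixpath.basename(file_path))
```

A real handler would also have to decide what to do when a file whose name genuinely ends in .md5 exists on disk; the sketch ignores that case.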

msg109 (view) Author: poeml Date: 2009-12-11.21:43:08

Additional thought about the hash store file format: An identifier and version
number should be added to the beginning.
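
As a sketch of what such a header might look like (in Python, for illustration only): the identifier string MBHASH and the version number 1 are invented for this example and are not taken from MirrorBrain. The header is written as one more NUL-terminated string in front of the payload.

```python
MAGIC = b"MBHASH"   # hypothetical identifier, not an actual MirrorBrain value
VERSION = 1         # hypothetical format version


def add_header(payload):
    """Prefix the payload with "<identifier> <version>" and a NUL terminator."""
    return MAGIC + b" " + str(VERSION).encode("ascii") + b"\0" + payload


def strip_header(blob):
    """Validate the header and return (version, payload)."""
    header, sep, payload = blob.partition(b"\0")
    if not sep:
        raise ValueError("missing header terminator")
    magic, _, version = header.partition(b" ")
    if magic != MAGIC:
        raise ValueError("not a hash store file")
    if not version.isdigit():
        raise ValueError("malformed version field")
    if int(version) > VERSION:
        raise ValueError("hash store version %d is too new" % int(version))
    return int(version), payload
```

Checking the identifier first lets a reader reject unrelated or truncated files cheaply, and the version number leaves room to change the record layout later.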

msg136 (view) Author: poeml Date: 2010-03-08.20:46:21

Cf. issue #40, where the hash cache redesign is tracked now.

msg159 (view) Author: poeml Date: 2010-03-12.02:51:00

Issue 40, which was blocking this issue, is mostly done.

msg220 (view) Author: poeml Date: 2010-09-05.23:52:30

A lot of groundwork for this has been done:

  • issue 40 is resolved. The hash cache was moved from file-based into the
    database.
  • all required tools are there.

Now it is just a matter of using the data and writing it to HTTP headers.

This should maybe be made configurable, because it causes Apache to need a little bit
more resources, which may or may not be desired. (Of course, it would be cool if
it just "happens", and the default should probably be that the headers are
included.)

msg366 (view) Author: ant Date: 2012-03-28.16:05:56

just to update this bug,

RFC 6249 ( http://tools.ietf.org/html/rfc6249 ) describes this, Metalink/HTTP.

poeml, what should a student know about this bug? where to start within the source?

what configurable options do you need?
On (default) /Off
Amount of Mirrors to emit over HTTP?
which full file hashes to include? all known full file hashes, or just some?

msg367 (view) Author: poeml Date: 2012-03-30.12:12:28

status of this feature:

low-hanging fruit; all the hard work should be done already.

The following needs to be done:

  • check the database if hashes exist. If yes, add them as HTTP headers
  • add Link header to .asc file if available
  • add Link header to .torrent file if available
  • add Link header to .meta4/.metalink file
  • add Link headers with links to mirrors

Regarding the number of mirrors: it is important to limit it, or the HTTP
response could easily become huge. The top ten mirrors should be more than
enough. Luckily, there is a convenient get_n_best_mirrors() function in
mod_mirrorbrain to get the desired list of best mirrors.
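
The steps above can be sketched in Python as follows. This is an illustration only, not the mod_mirrorbrain code (which is an Apache module written in C): the function name, the mirror dictionaries with "url" and "country" keys, and the has_asc/has_torrent flags are all assumptions, and the mirror list is assumed to arrive already sorted best-first, as get_n_best_mirrors() would provide. The URLs are wrapped in angle brackets as the Link header syntax requires.

```python
def link_headers(base_url, mirrors, has_asc=False, has_torrent=False,
                 max_mirrors=10):
    """Return ("Link", value) tuples for one file.

    mirrors is a best-first list of dicts with a "url" key and an
    optional "country" key (hypothetical structure for this sketch).
    """
    headers = [("Link", '<%s.meta4>; rel=describedby; '
                        'type="application/metalink4+xml"' % base_url)]
    if has_asc:
        headers.append(("Link", '<%s.asc>; rel=describedby; '
                                'type="application/pgp-signature"' % base_url))
    if has_torrent:
        headers.append(("Link", '<%s.torrent>; rel=describedby; '
                                'type="application/x-bittorrent"' % base_url))
    # Cap the mirror list so the response headers stay small.
    for pri, mirror in enumerate(mirrors[:max_mirrors], start=1):
        value = "<%s>; rel=duplicate; pri=%d" % (mirror["url"], pri)
        if mirror.get("country"):
            value += "; geo=%s" % mirror["country"]
        headers.append(("Link", value))
    return headers
```

The cap defaults to ten, following the suggestion above; whether the limit should be configurable is left open.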

msg376 (view) Author: poeml Date: 2012-04-11.19:41:52

So, HTTP replies can get long now -- here's what it looks like in my testing now:

HTTP/1.1 302 Found
Date: Wed, 11 Apr 2012 19:40:39 GMT
Server: Apache/2.2.17 (Linux/SUSE)
X-Prefix: 87.78.0.0/15
X-AS: 8422
X-MirrorBrain-Mirror: ftp.fernuni-hagen.de
X-MirrorBrain-Realm: country
Link: http://10.0.0.17/du.list.meta4; rel=describedby; type="application/metalink4+xml"
Link: http://10.0.0.17/du.list.asc; rel=describedby; type="application/pgp-signature"
Link: http://10.0.0.17/du.list.torrent; rel=describedby; type="application/x-bittorrent"
Link: http://ftp.fernuni-hagen.de/ftp-dir/pub/mirrors/www.openoffice.org/du.list; rel=duplicate; pri=1; geo=de
Link: http://sunsite.informatik.rwth-aachen.de/ftp/pub/mirror/OpenOffice/du.list; rel=duplicate; pri=2; geo=de
Link: ftp://ftp.uni-muenster.de/pub/software/OpenOffice/du.list; rel=duplicate; pri=3; geo=de
Link: http://ftp5.gwdg.de/pub/openoffice/du.list; rel=duplicate; pri=4; geo=de
Link: http://ftp-stud.hs-esslingen.de/pub/Mirrors/ftp.openoffice.org/du.list; rel=duplicate; pri=5; geo=de
Digest: MD5=mertNzkLoFcfjShYKf9j/A==
Digest: SHA=SXw8fhX2ZMHasmbFbSWjpeUn/bQ=
Digest: SHA-256=WVwzYHQVWTdFBKJacO4Bz2Fz60XHjtpLf0IG9KRuOjM=
Location: http://ftp.fernuni-hagen.de/ftp-dir/pub/mirrors/www.openoffice.org/du.list?time=1334173239&stamp=a4cdbfe80df5c0f28621b68e0a3ade69
Content-Type: text/html; charset=iso-8859-1
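
Note that the Digest values in this response are base64 encodings of the raw hash bytes, whereas the stored hashes shown earlier in this issue are hexadecimal. A small Python sketch of the conversion and of a client-side check follows; the function names are made up for this example, and only the three algorithm labels seen in the response are handled.

```python
import base64
import binascii
import hashlib

# Digest labels as they appear in the response, mapped to hashlib names.
ALGORITHMS = {"MD5": "md5", "SHA": "sha1", "SHA-256": "sha256"}


def digest_header(label, hex_digest):
    """Format one Digest header value from a hex digest: "<label>=<base64>"."""
    raw = binascii.unhexlify(hex_digest)
    return "%s=%s" % (label, base64.b64encode(raw).decode("ascii"))


def verify(data, header_value):
    """Check downloaded bytes against one Digest header value."""
    # Split at the first "=" only; base64 padding may add more "=" at the end.
    label, _, b64 = header_value.partition("=")
    name = ALGORITHMS.get(label.strip().upper())
    if name is None:
        raise ValueError("unsupported digest algorithm: %r" % label)
    expected = base64.b64decode(b64)
    return hashlib.new(name, data).digest() == expected
```

A client that follows the redirect to a mirror could call verify() on the downloaded bytes with the strongest Digest value it received from the original server.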

msg379 (view) Author: poeml Date: 2012-04-14.21:46:18

I think this bug can be closed. Code will be included in the next release.

History
Date                 User   Action  Args
2012-04-14 21:46:18  poeml  set     status: testing -> resolved
                                    messages: + msg379
2012-04-11 19:41:53  poeml  set     status: chatting -> testing
                                    messages: + msg376
2012-03-30 12:12:28  poeml  set     messages: + msg367
2012-03-28 16:05:56  ant    set     messages: + msg366
                                    title: add support for mirrors and checksums
                                    in Link headers -> add support for mirrors and
                                    checksums in Link headers (RFC 6249)
2010-09-05 23:53:14  poeml  set     assignedto: poeml
2010-09-05 23:52:30  poeml  set     messages: + msg220
2010-03-12 02:51:00  poeml  set     messages: + msg159
2010-03-08 20:46:21  poeml  set     messages: + msg136
2009-12-11 21:43:08  poeml  set     messages: + msg109
2009-12-11 21:41:21  poeml  set     messages: + msg108
2009-11-04 16:32:24  ant    set     nosy: + ant
2009-10-09 00:49:17  poeml  create

(end of migrated issue)