[mirrorbrain] Documenting Metalink/XML client, publishers, & cache behavior

From: Anthony Bryan <anthonybryan_at_gmail.com>
Date: Tue, 22 May 2012 18:55:24 -0400
while thinking about curl, wget, Chrome & Firefox adding Metalink
support, I realized there is no documentation for Metalink/XML
clients, just the XML format in RFC 5854. what about publishers & caches?

here are some initial thoughts...obviously the publisher section is
most important regarding MirrorBrain, but its interaction w/ caches
could be too.

what do you think? is there more to add?

Metalink/XML publishers

MUST use correct MIME type for metalink files

SHOULD advertise Metalink/XML file with Link HTTP header field from
regular download for "transparent metalink" usage ( Link:
<http://example.com/example.ext.meta4>; rel=describedby;
type="application/metalink4+xml" )
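
a rough sketch of how a publisher-side script could build that Link
header value, in Python. the function name and the constant are made up
for illustration; only the header format itself comes from the example
above.

```python
# MIME type used for Metalink/XML (version 4) files in the example above.
METALINK4_MIME = "application/metalink4+xml"


def metalink_link_header(metalink_url):
    """Return the value for a Link header field that advertises a metalink.

    Hypothetical helper: it only formats the string, it does not check
    that the URL exists or that the server really sends the header.
    """
    return '<%s>; rel=describedby; type="%s"' % (metalink_url, METALINK4_MIME)


# Header pair a server or CGI script could attach to the regular download.
header = ("Link", metalink_link_header("http://example.com/example.ext.meta4"))
print("%s: %s" % header)
```

the same string could of course be set statically in the web server
configuration instead of being generated.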

SHOULD publish with chunk hashes if error recovery ability is desired
(and files meet certain criteria like "large enough"; there is no point
for a 10 KB file).

MAY do Accept header transparent content negotiation (deprecated?)

SHOULD use Metalink/XML origin element and dynamic="true" if updated
metalinks will be offered.

Metalink proxy cache consumers

whitelist for trusted sources by domain name (e.g. kde.org, ubuntu.com, ...)
detect & log metalink usage so able to add to whitelist
SHOULD use preferred mirrors (those that are most cost efficient/better/local)
should they repair errors or use hashes? I guess so, but the client
will be verifying hashes too.
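
a minimal sketch of the whitelist idea in Python, assuming a cache that
sees the URL of each metalink it fetches. the function names and the
"seen" set are hypothetical, and matching on the domain plus its
subdomains is just one possible policy.

```python
from urllib.parse import urlsplit

# Example entries taken from the list above; a real cache would load these
# from its configuration.
TRUSTED_DOMAINS = {"kde.org", "ubuntu.com"}


def host_of(url):
    """Return the lower-cased host name of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower().rstrip(".")


def is_trusted(url, trusted=TRUSTED_DOMAINS):
    """True if the host is a trusted domain or a subdomain of one."""
    host = host_of(url)
    return any(host == d or host.endswith("." + d) for d in trusted)


def record_if_unknown(url, seen, trusted=TRUSTED_DOMAINS):
    """Remember hosts that served metalinks but are not whitelisted yet."""
    if not is_trusted(url, trusted):
        seen.add(host_of(url))


seen = set()
record_if_unknown("http://download.kde.org/example.ext.meta4", seen)
record_if_unknown("http://mirror.example.net/example.ext.meta4", seen)
print(sorted(seen))
```

comparing whole labels (the leading "." in the suffix check) keeps a host
like evilkde.org from matching kde.org.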

Metalink/XML clients

for some of the download behavior, RFC 6249, Section 7 could be edited
& re-used.

MUST! sanitize directory traversal information as specified in RFC
5854 Section
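
one conservative way a client could handle this, sketched in Python. it
rejects suspicious names outright instead of trying to clean them up;
that choice, the function name and the exact set of checks are
assumptions of this sketch, not text from the RFC.

```python
def safe_relative_path(name):
    """Return the name unchanged if it is a plain relative path, else None.

    Rejects absolute paths, drive letters, backslashes, NUL bytes, and any
    empty, "." or ".." path segment.
    """
    if not name or "\\" in name or "\x00" in name:
        return None
    if name.startswith("/"):
        return None  # absolute path
    if len(name) > 1 and name[1] == ":":
        return None  # looks like a drive letter, e.g. "C:"
    parts = name.split("/")
    if any(p in ("", ".", "..") for p in parts):
        return None
    return "/".join(parts)


for candidate in ("example.ext", "sub/dir/example.ext", "../../etc/passwd"):
    print(candidate, "->", safe_relative_path(candidate))
```

a client would then join the returned path onto its download directory
and refuse to write anything when the result is None.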

MUST process metalinks available by URI. MAY (or SHOULD?) process
local metalinks (like aria2's -M option).

MUST recognize by MIME type.

(what about misconfigured/unupdated server that does not have correct
MIME type?) SHOULD(?) client recognize metalink by file extension as

if HTTP client, MUST(?) support "transparent metalink" usage from
regular download to Metalink/XML advertised with Link header ( Link:
<http://example.com/example.ext.meta4>; rel=describedby;
type="application/metalink4+xml" )
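
a simplified sketch of the client side in Python: pull the metalink URL
out of a Link header value. the function name is hypothetical and the
parsing is deliberately naive (it would get confused by a ";" or "," inside
a quoted parameter value), so treat it as an illustration only.

```python
import re

METALINK4_MIME = "application/metalink4+xml"


def find_metalink_links(link_header):
    """Return URLs of links with rel=describedby and the Metalink/XML type."""
    urls = []
    # Split on commas that are followed by the "<" of the next link.
    for part in re.split(r",\s*(?=<)", link_header):
        match = re.match(r"\s*<([^>]*)>(.*)", part, re.S)
        if not match:
            continue
        url, rest = match.group(1), match.group(2)
        params = {}
        for param in rest.split(";"):
            if "=" in param:
                key, value = param.split("=", 1)
                params[key.strip().lower()] = value.strip().strip('"')
        rels = params.get("rel", "").lower().split()
        if "describedby" in rels and params.get("type", "").lower() == METALINK4_MIME:
            urls.append(url)
    return urls


value = ('<http://example.com/example.ext.meta4>; rel=describedby; '
         'type="application/metalink4+xml"')
print(find_metalink_links(value))
```

a client that finds such a URL could fetch the metalink and continue the
download from the mirrors listed there.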

if HTTP client, MAY do Accept header transparent content negotiation

if file with same name already exists, SHOULD verify full file hash
and if hash is correct, do not re-download the file?
if file exists and full file hash is incorrect, MAY repair file if
chunk hashes exist. otherwise, MAY write to other file name (file_2 or
file(2) like some apps already do).
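
for the "other file name" case, a small Python sketch of the file_2 style
naming. the helper name is hypothetical, and the existence check is
passed in as a parameter only so the logic can be tried without touching
the disk.

```python
import os


def alternate_name(path, exists=os.path.exists):
    """Return path if it is free, else the first free 'name_N.ext', N >= 2."""
    if not exists(path):
        return path
    root, ext = os.path.splitext(path)
    n = 2
    while exists("%s_%d%s" % (root, n, ext)):
        n += 1
    return "%s_%d%s" % (root, n, ext)


# Pretend these two names are already taken.
taken = {"example.ext", "example_2.ext"}
print(alternate_name("example.ext", exists=taken.__contains__))
```

between the check and the actual write another process could create the
same name, so a real client would want to open the file in exclusive
mode as well.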

SHOULD (or MUST?) verify full file hash after download completes. if
error, MUST describe as corrupted and MAY re-download or keep
SHOULD verify chunk hashes if available and re-fetch the chunks that
fail. SHOULD (or MAY?) be done during the initial download, MAY be done
after the download completes or to repair a file downloaded another way?
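
the verify-and-repair idea above can be sketched in Python like this. it
assumes the client already has the piece length and the list of piece
hashes from the metalink; the function names, the sha-256 default and
the in-memory demo data are assumptions for illustration.

```python
import hashlib
import io


def stream_digest(stream, algo="sha256", bufsize=65536):
    """Hash a whole binary stream without loading it into memory at once."""
    h = hashlib.new(algo)
    for block in iter(lambda: stream.read(bufsize), b""):
        h.update(block)
    return h.hexdigest()


def bad_pieces(stream, piece_length, piece_hashes, algo="sha256"):
    """Return the indexes of pieces whose hash does not match."""
    bad = []
    for index, expected in enumerate(piece_hashes):
        piece = stream.read(piece_length)
        if hashlib.new(algo, piece).hexdigest() != expected.lower():
            bad.append(index)
    return bad


def byte_range(index, piece_length, file_size):
    """Inclusive (first, last) byte offsets of a piece, e.g. for a Range request."""
    first = index * piece_length
    last = min(first + piece_length, file_size) - 1
    return first, last


# Tiny demo: in-memory bytes stand in for the published and downloaded file.
good = b"aaaabbbbcc"
hashes = [hashlib.sha256(good[i:i + 4]).hexdigest() for i in range(0, len(good), 4)]
damaged = io.BytesIO(b"aaaaXbbbcc")
print(bad_pieces(damaged, 4, hashes))  # prints [1]
print(byte_range(1, 4, len(good)))     # prints (4, 7)
```

only the byte ranges of the failed pieces need to be fetched again, and
the full file hash can be checked once more after the repair.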

SHOULD(?) use BitTorrent chunk hashes with HTTP/FTP downloads to
repair file if client supports torrents? (what if chunk hashes are
present in torrent and metalink, should one be preferred?)

if client supports Metalink/XML (3/4) AND Metalink/HTTP, which info
should be preferred (in case they differ)?

SHOULD make use of Metalink/XML origin element if dynamic="true" to
check for updated metalink?
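
a small Python sketch of how a client could read the origin element and
decide whether to look for an updated metalink. the function name and
the sample document are made up; the namespace is the Metalink/XML one
from RFC 5854.

```python
import xml.etree.ElementTree as ET

NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def dynamic_origin(metalink_xml):
    """Return the origin URL if it is marked dynamic="true", else None."""
    root = ET.fromstring(metalink_xml)
    origin = root.find("ml:origin", NS)
    if origin is None or origin.get("dynamic", "false").lower() != "true":
        return None
    return (origin.text or "").strip() or None


sample = (
    '<metalink xmlns="urn:ietf:params:xml:ns:metalink">'
    '<origin dynamic="true">http://example.com/example.ext.meta4</origin>'
    '<file name="example.ext"/>'
    '</metalink>'
)
print(dynamic_origin(sample))
```

when a URL comes back, the client could re-fetch it and compare the new
metalink against the one it already has.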

(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads

Received on Tue May 22 2012 - 22:55:45 GMT
