Re: [mirrorbrain] Re: Repository delta download and Mirrorbrain 2.13

From: Michael Schroeder <mls_at_suse.de>
Date: Thu, 21 Oct 2010 10:57:51 +0200

On Tue, Oct 19, 2010 at 05:54:17PM +0200, Peter Pöml wrote:
> Oh, there is no reason to not touch the current zsync support - it is
> experimental, documented as such, and not configured by default. IMO it
> is better to improve it and make it more flexible, than to add a second
> implementation which is even less flexible, and duplicates code. It also
> duplicates data in the database, and CPU time in calculating the hashes.
> Especially the code maintenance aspect would give me a headache. It's
> not as if this project had too much manpower ;-)

The difference is that mirrorbrain's zsync support is written with
the goal of delivering a "zsync" file. I just need the extra info in
the metalink.

> > Btw, you really should put the hash type of the pieces in the database
> > and not assume that sha-1 is always used.
> 
> I disagree here. The database columns for piecewise hashes are clearly
> named "sha1pieces" and "sha1piecesize" which makes sure that there is no
> misunderstanding. One doesn't need to fill them or use them, of course.

That's ok with me. I just wanted to point out that you'll need to
modify the database layout for every new hash type that you
want to support.
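
To illustrate the alternative, here is a minimal sketch of a layout
where the hash type is stored as data instead of being part of the
column name, so a new hash type needs no layout change. The table
name "piece_hashes", its columns, and the helper functions are
hypothetical and not part of mirrorbrain's actual schema; sqlite3 is
only used to keep the example self-contained.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a generic piece-hash table.

    Hypothetical schema: one row per (file, hash type), not one column
    pair per hash type as with "sha1pieces"/"sha1piecesize".
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE piece_hashes ("
        " file_id INTEGER NOT NULL,"
        " hash_type TEXT NOT NULL,"
        " piece_size INTEGER NOT NULL,"
        " pieces BLOB NOT NULL,"
        " PRIMARY KEY (file_id, hash_type))"
    )
    return conn


def store_pieces(conn, file_id, hash_type, piece_size, pieces):
    """Store the concatenated piece hashes of one type for one file."""
    conn.execute(
        "INSERT INTO piece_hashes VALUES (?, ?, ?, ?)",
        (file_id, hash_type, piece_size, pieces),
    )


def hash_types_for(conn, file_id):
    """Return the hash types stored for a file, sorted by name."""
    rows = conn.execute(
        "SELECT hash_type FROM piece_hashes WHERE file_id = ? ORDER BY hash_type",
        (file_id,),
    )
    return [row[0] for row in rows]
```

With such a layout, adding a further hash type is just another call
to store_pieces() with a different hash_type value.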

> > Only if you use mirrorbrain's zsync support, which I don't.
> 
> If there are other reasons not to use it, except fear of breaking
> it, could you explain them? Maybe there is something that I miss.

I don't need it, as I don't run zsync. I do metalink downloads
with the additional feature of not downloading blocks that are
already there. For this I need a single part of the zsync file,
the "rsums".
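
For readers unfamiliar with the idea, the following is a rough sketch
of how rolling checksums let a client find blocks it already has. It
uses an rsync-style weak checksum plus SHA-1 for confirmation; this is
an assumption for illustration and is not the exact rsum format or
algorithm that zsync uses. All function names are hypothetical, and
only full-sized blocks are handled.

```python
import hashlib

MOD = 1 << 16


def weak_sum(block):
    """rsync-style weak checksum (a, b) of one block of bytes."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def block_sums(data, size):
    """Return [(weak, sha1)] for every full block of the target file."""
    sums = []
    for off in range(0, len(data) - size + 1, size):
        block = data[off:off + size]
        sums.append((weak_sum(block), hashlib.sha1(block).digest()))
    return sums


def find_local_blocks(local, sums, size):
    """Map target block index -> offset in local data holding the same bytes."""
    lookup = {}
    for index, (weak, strong) in enumerate(sums):
        lookup.setdefault(weak, []).append((index, strong))
    found = {}
    if len(local) < size:
        return found
    a, b = weak_sum(local[:size])
    offset = 0
    while True:
        # A weak match is only a hint; confirm it with the strong hash.
        for index, strong in lookup.get((a, b), ()):
            window = local[offset:offset + size]
            if index not in found and hashlib.sha1(window).digest() == strong:
                found[index] = offset
        if offset + size >= len(local):
            break
        a, b = roll(a, b, local[offset], local[offset + size], size)
        offset += 1
    return found
```

Blocks whose index does not show up in the result are the ones that
still have to be fetched from a mirror.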

> > Oh sure, as it's supported by the standard.
> > 
> > A cleaner implementation would be to
> > - put the checksum type for the pieces in the database
> > - use "sha-1+zsync" as type with a hash size of 20+4 bytes instead
> >   of appending the zsync part.
> 
> Hm, sha-1+zsync (i.e., combining two hashes into one) doesn't sound like
> a very clean thing to me, and unlikely to be supported by anyone else.
> Do you think this could become a standard?

No, I think the type should be more like "array of strings". But
it's really up to you what the database should look like.

> If you suggest that hashes (with same block size) are combined, because
> it may seem cleaner to you, I think that may make sense only for clients
> that support all of the combined hashes, and the way you combine them
> (which is not in any standard yet).

Ah no, this has nothing to do with the client. It's just a way of
storing the data. You could do the same with the "complete file"
hash.
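
As a sketch of what "just a way of storing the data" could mean for
the 20+4 byte case mentioned above: each piece gets one fixed-size
record holding the SHA-1 digest followed by the 4-byte checksum, and
the records are concatenated. The function names, the big-endian
byte order, and treating the zsync part as an unsigned 32-bit integer
are assumptions made for illustration only.

```python
import struct

SHA1_LEN = 20   # bytes in a SHA-1 digest
RSUM_LEN = 4    # bytes assumed for the zsync part of each record
RECORD_LEN = SHA1_LEN + RSUM_LEN


def pack_pieces(pieces):
    """Concatenate (sha1_digest, rsum) pairs into one blob of 24-byte records."""
    out = bytearray()
    for digest, rsum in pieces:
        if len(digest) != SHA1_LEN:
            raise ValueError("expected a 20-byte SHA-1 digest")
        out += digest + struct.pack(">I", rsum)
    return bytes(out)


def unpack_pieces(blob):
    """Split a blob written by pack_pieces() back into (digest, rsum) pairs."""
    if len(blob) % RECORD_LEN:
        raise ValueError("blob length is not a multiple of the record size")
    pieces = []
    for off in range(0, len(blob), RECORD_LEN):
        digest = blob[off:off + SHA1_LEN]
        (rsum,) = struct.unpack(">I", blob[off + SHA1_LEN:off + RECORD_LEN])
        pieces.append((digest, rsum))
    return pieces
```

What gets written into the metalink is independent of this; the two
parts can still be emitted separately when the file is generated.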

> We also don't put a hash of type "sha-1+sha-256+md5+pgp" into the
> Metalink, consisting of 20+32+16+x bytes, or varying par jour, but
> instead we list them separately -- because in most people's book it
> seems to be easier to handle and cleaner.

Yes, that's ok with me. You need to add more columns for each new
hash type, though.

> Fair enough, but then I propose to make this configurable. Not everybody
> would want to have the additional data sent out with every metalink, I
> suppose. (But that's the smallest & easiest part :)

Oh, of course. I didn't say that you should integrate that patch
(as it is a hack). I just sent it to you so that you know what
we currently use.

> So from what I see, it should be possible that you rework your patch to
> use the existing data column, and calculate hashes once instead of
> twice. Or are there further obstacles with that approach?

Frankly, I don't have time to rework it. I don't think it makes
sense to misuse the zsync columns. If you want to integrate it,
you should probably add a new "zsyncpieces" column.

Cheers,
  Michael.

-- 
Michael Schroeder                                   mls_at_suse.de
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}

Received on Thu Oct 21 2010 - 11:33:07 GMT