[mirrorbrain] Re: Repository delta download und Mirrorbrain 2.13

From: Michael Schroeder <mls_at_suse.de>
Date: Tue, 19 Oct 2010 12:09:35 +0200
On Tue, Oct 19, 2010 at 04:26:58AM +0200, Peter Pml wrote:
> This change (and the hunk above) breaks the configuration of the chunk
> size via /etc/mirrorbrain.conf, where the value is read from (or
> substituted by 262144), see mb/mb/conf.py.
> If the same chunk size is not good for zsync (which I presume), it is
> still good for torrents and important for local adjustment.
> I suggest that either the same chunk size is used (if that makes sense),
> or a separate variable (zsync_chunk_size) is introduced to the
> configuration. You only need to add it to the DEFAULTS dictionary in
> mb/mb/conf.py, I think. See 

Oh yes, setting the chunk size to 65536 is just a quick hack I did.
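Peter's suggestion above would look roughly like the following. This is only a sketch: the exact shape of the DEFAULTS dictionary in mb/mb/conf.py and the `chunk_sizes` helper are assumed for illustration, not taken from the actual source.

```python
# Sketch of the proposed change to mb/mb/conf.py (structure assumed):
# keep the existing chunk_size for torrents/metalinks and add a separate
# zsync_chunk_size key so the zsync block size can be tuned independently.
DEFAULTS = {
    'chunk_size': 262144,        # existing default for torrent/metalink pieces
    'zsync_chunk_size': 65536,   # proposed new key for zsync blocks
}

def chunk_sizes(user_conf):
    """Hypothetical helper: merge user configuration over the defaults."""
    merged = dict(DEFAULTS)
    merged.update(user_conf)
    return merged['chunk_size'], merged['zsync_chunk_size']
```

With this, /etc/mirrorbrain.conf could override either value independently, and the hard-coded 65536 would only be the fallback.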

> I have difficulty understanding the above two hunks. You append the
> zsync hashes to the piece-wise hashes that are used in torrents and
> metalinks? Really?

Absolutely. ;-) I needed the patch to be compatible with older
mirrorbrain versions, so I just appended the zsync hashes.
As I said in my mail, this is just a hack to get it working again.
I didn't want to extend the database with yet another field.

Btw, you really should put the hash type of the pieces in the
database and not assume that sha-1 is always used.

> There already is a database field for zsync hashes, zsums, which is
> filled with self.hb.zsums. What did prevent you from using that?

Because I didn't want to break mirrorbrain's own zsync support.

> > +    def get_zsync_digest(self, buf, blocksize):
> > +        if len(buf) < blocksize:
> > +            buf = buf + ( '\x00' * ( blocksize - len(buf) ) )
> > +        r = zsync.rsum06(buf)
> > +        return "%02x%02x%02x%02x" % (ord(r[3]), ord(r[2]), ord(r[1]), ord(r[0]))
> > +
> This looks to me as if zsync checksumming is now done twice. That's a
> waste.

Only if you use mirrorbrain's zsync support, which I don't.
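For readers puzzling over the quoted hunk: it pads the final block with NUL bytes (zsync hashes fixed-size blocks) and then emits the 4-byte weak checksum in reversed byte order as hex. The Python 3 sketch below illustrates that logic; `rolling_rsum` is a stand-in rsync-style weak checksum, NOT the actual `zsync.rsum06` implementation, whose exact definition lives in the zsync C code.

```python
def rolling_rsum(buf):
    """Stand-in for zsync.rsum06: an rsync-style weak checksum with a
    16-bit byte sum and a 16-bit position-weighted sum, packed into
    4 big-endian bytes. The real zsync rsum differs in detail."""
    a = b = 0
    n = len(buf)
    for i, c in enumerate(buf):
        a = (a + c) & 0xFFFF
        b = (b + (n - i) * c) & 0xFFFF
    return ((b << 16) | a).to_bytes(4, 'big')

def get_zsync_digest(buf, blocksize, rsum=rolling_rsum):
    # zsync checksums fixed-size blocks, so pad the final short block.
    if len(buf) < blocksize:
        buf = buf + b'\x00' * (blocksize - len(buf))
    r = rsum(buf)
    # Emit the 4 checksum bytes in reversed (little-endian) order as hex,
    # mirroring the %02x formatting in the quoted patch.
    return '%02x%02x%02x%02x' % (r[3], r[2], r[1], r[0])
```

Only the padding and byte-order handling are taken from the patch; the checksum function itself is assumed.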

> >      def calc_btih(self):
> >          """ calculate a bittorrent information hash (btih) """
> >  
> Before I read on, I'd like to raise a general question regarding your
> patch. Do you think it is a good idea to extend the Metalink with a
> further <pieces> section?

Oh sure, as it's supported by the standard.

A cleaner implementation would be to
- put the checksum type for the pieces in the database
- use "sha-1+zsync" as type with a hash size of 20+4 bytes instead
  of appending the zsync part.
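The combined piece hash described above might be sketched like this. The function name and the `rsum` parameter are hypothetical; the 20+4 byte layout (SHA-1 digest followed by the zsync weak checksum, with the last block NUL-padded for the zsync part) follows the description in this mail.

```python
import hashlib

def combined_piece_hash(piece, blocksize, rsum):
    """Sketch of the proposed 'sha-1+zsync' piece hash: 20 SHA-1 bytes
    followed by the 4-byte zsync weak checksum, 24 bytes per piece.
    rsum is a stand-in for the zsync weak-checksum function."""
    sha1 = hashlib.sha1(piece).digest()       # 20 bytes over the raw piece
    padded = piece.ljust(blocksize, b'\x00')  # zsync pads the last block
    return sha1 + rsum(padded)                # 20 + 4 = 24 bytes total
```

A database column storing the piece hash type ("sha-1" vs. "sha-1+zsync") would then tell consumers how many bytes each piece entry occupies.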

> Which clients support this? I fear that there is only one, which is the
> openSUSE download client. I don't have anything against it, but before
> enlarging the database, adding code that has to be maintained for a long
> time, which costs work, I would like to see how this could be done best.

It makes sense to support multiple hash types in the pieces for the same
reasons as with the hashes for the complete file.

> For the moment, I wonder why the existing zsync support is not good
> enough. What's wrong with it? Should it not be fixed/improved, instead
> of adding a second database field with further zsync hashes?

No, I don't want to make another request for a zsync file (which also
contains the block checksums again, and is thus wasteful). For our
use case (minimizing transfer and latency) it's best to have them
in the metalink file.

> The existing zsync hash support works with the standard zsync client,
> while your proposed patch does not, since the hashes are contained in
> a Metalink (which only Metalink clients can read, which is not the case
> for the zsync client). 

I don't mind, as I don't use a standard zsync client.


Michael Schroeder                                   mls_at_suse.de
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg

mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/

Received on Tue Oct 19 2010 - 11:02:37 GMT
