Hi Micha,

On Tue, Oct 19, 2010 at 12:09:35PM +0200, Michael Schroeder wrote:
> On Tue, Oct 19, 2010 at 04:26:58AM +0200, Peter Pöml wrote:
> Oh yes, setting the chunk size to 65536 is just a quick hack I did.

Yup, I understood that meanwhile.

> > I have difficulty understanding the above two hunks. You append the
> > zsync hashes to the piece-wise hashes that are used in torrents and
> > metalinks? Really?
>
> Absolutely. ;-) I needed the patch to be compatible to older
> mirrorbrain versions so I just appended the zsync hashes.
> As I said in my mail, this is just a hack to get it working again.
> I didn't want to extend the database with yet another field.

Oh, there is no reason not to touch the current zsync support - it is
experimental, documented as such, and not configured by default. IMO it
is better to improve it and make it more flexible than to add a second
implementation which is even less flexible and duplicates code. It also
duplicates data in the database, and CPU time in calculating the hashes.
Especially the code maintenance aspect would give me a headache. It's
not as if this project had too much manpower ;-)

> Btw, you really should put the hash type of the pieces in the database
> and not assume that sha-1 is always used.

I disagree here. The database columns for piecewise hashes are clearly
named "sha1pieces" and "sha1piecesize", which makes sure that there is
no misunderstanding. One doesn't need to fill them or use them, of
course. Since the piecewise hashes are not used for final verification
(which a top hash is used for), there is no big incentive to offer many
types of hashes here - and it could cost a lot of space anyway. If
another type of piecewise hash is added, it should obviously not go
into the wrong column but get its own. Overloading the one column is
not a good idea, just as there are currently several separate columns
for different types of data, in order to access them efficiently.
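[For readers of the archive: the piecewise hashing discussed above boils down to hashing fixed-size chunks of a file independently, one SHA-1 digest per chunk. A minimal sketch follows; the default piece size of 262144 bytes is an illustrative choice, not necessarily what MirrorBrain uses, and in practice the real size would come from the "sha1piecesize" column.]

```python
import hashlib

def piecewise_sha1(path, piece_size=262144):
    """Hash a file in fixed-size pieces, one SHA-1 digest per piece.

    piece_size is illustrative; a real deployment would take it from
    the sha1piecesize column so clients can recompute matching pieces.
    """
    digests = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break  # end of file; the last piece may be shorter
            digests.append(hashlib.sha1(piece).hexdigest())
    return digests
```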
If you look at the database schema, you'll notice that there's more
than a single column named "blob". ;-) There is a defined layer of
abstraction for storing this kind of data, and I don't think there is a
need to leave that and implement a "microformat" on a different level.

> > There already is a database field for zsync hashes, zsums, which is
> > filled with self.hb.zsums. What did prevent you from using that?
>
> Cause I didn't want to break mirrorbrain's own zsync support.
> Compatibility...

Don't worry -- that part is still very experimental. And what would
break anyway? So scared? ;-)

> > This looks to me as if zsync checksumming is now done twice. That's
> > a waste.
>
> Only if you use mirrorbrain's zsync support, which I don't.

If there are other reasons not to use it, besides fear of breaking it,
could you explain them? Maybe there is something that I am missing.

> Oh sure, as it's supported by the standard.
>
> A cleaner implementation would be to
> - put the checksum type for the pieces in the database
> - use "sha-1+zsync" as type with a hash size of 20+4 bytes instead
>   of appending the zsync part.

Hm, sha-1+zsync (i.e., combining two hashes into one) doesn't sound
like a very clean thing to me, and it is unlikely to be supported by
anyone else. Do you think this could become a standard?

If you suggest combining hashes (with the same block size) because that
may seem cleaner to you, I think it makes sense only for clients that
support all of the combined hashes, and the way you combine them (which
is not in any standard yet). My gut feeling, however, is that there
will be clients that do not support all of the hashes, which would make
it difficult for them to cope with the metadata. Therefore, I think it
is cleaner to keep the hashes separate (and also less error-prone, and
less work for programmers) - just like you did, adding them alongside
the other hashes.
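[For context: the 20+4 bytes mentioned above are a 20-byte SHA-1 digest plus a 4-byte zsync weak sum per block. A sketch of computing the two per-block values, kept as separate results rather than one concatenated "sha-1+zsync" record. The weak sum here is the rsync-style rolling checksum that zsync's rsum is derived from; this is a simplified illustration, not the exact zsync on-the-wire encoding.]

```python
import hashlib

def block_checksums(block):
    """Return (weak, strong) checksums for one block, separately.

    weak:   4-byte rsync-style rolling checksum (zsync's rsum is a
            truncation of this family; simplified here)
    strong: 20-byte piecewise SHA-1 digest
    """
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) & 0xFFFF
    weak = (b << 16) | a                    # fits in 4 bytes
    strong = hashlib.sha1(block).digest()   # 20 bytes
    return weak, strong
```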
We also don't put a hash of type "sha-1+sha-256+md5+pgp" into the
Metalink, consisting of 20+32+16+x bytes, or varying by the day, but
instead we list them separately - because in most people's book that
seems to be easier to handle, and cleaner.

Or to rephrase this: there is a defined level at which the metadata is
abstracted (the IETF Metalink), and while introducing a microformat on
the side may be creative, it breaks out of that scheme. It would
require different ways of parsing, and it would break backwards
compatibility, which the current way of transmitting the metadata
doesn't.

> > For the moment, I wonder why the existing zsync support is not good
> > enough. What's wrong with it? Should it not be fixed/improved,
> > instead of adding a second database field with further zsync hashes?
>
> No, I don't want to do another request for a zsync file (which also
> contains the block checksums again, thus is wasteful). For our
> use case (minimizing transfer and latency) it's best to have them
> in the metalink file.

Fair enough, but then I propose to make this configurable. Not
everybody will want the additional data sent out with every metalink, I
suppose. (But that's the smallest & easiest part :)

> > The existing zsync hash support works with the standard zsync
> > client, while your proposed patch does not, since the hashes are
> > contained in a Metalink (which only Metalink clients can read,
> > which is not the case for the zsync client).
>
> I don't mind, as I don't use a standard zsync client.

You might not mind, but I do. I *do* care :-)

So from what I see, it should be possible for you to rework your patch
to use the existing data column, and calculate the hashes once instead
of twice. Or are there further obstacles with that approach?
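["Calculate the hashes once instead of twice" essentially means reading the file in a single pass and feeding every checksummer from the same buffer. A sketch of that idea, independent of MirrorBrain's actual code; zlib's adler32 stands in for the zsync weak sum purely for illustration, and the function and field names are made up for the example.]

```python
import hashlib
import zlib

def single_pass_hashes(path, piece_size=262144):
    """Read the file once, updating all checksummers from each chunk.

    Returns the whole-file (top) SHA-1 plus per-piece checksums.
    adler32 is a placeholder weak checksum, not zsync's real rsum.
    """
    top = hashlib.sha1()
    pieces = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(piece_size)
            if not chunk:
                break
            top.update(chunk)  # whole-file hash, fed incrementally
            pieces.append({
                "sha1": hashlib.sha1(chunk).hexdigest(),
                "zsum": zlib.adler32(chunk) & 0xFFFFFFFF,
            })
    return top.hexdigest(), pieces
```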
Peter

_______________________________________________
mirrorbrain mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain/
Note: To remove yourself from this mailing list, send a mail with the
content
	unsubscribe
to the address mirrorbrain-request_at_mirrorbrain.org

Received on Tue Oct 19 2010 - 15:54:25 GMT