Hash cache needs to be more flexible #40

Closed
poeml opened this issue Jun 5, 2015 · 0 comments
poeml commented Jun 5, 2015


Issue migrated (2015-06-05) from old issue tracker http://mirrorbrain.org/issues/issue40

Title        Hash cache needs to be more flexible
Priority     feature
Status       resolved
Superseder
Nosy List    ant, poeml
Assigned To  poeml
Keywords

msg135 (view) Author: poeml Date: 2010-03-08.20:44:36

The hash cache, in its current on-disk format, is too inflexible. It was fine in the past,
when Apache included the contents in v3 Metalinks. The snippets on disk were prepared
just for that. However, it's difficult to add further features like

  • hashes in HTTP headers
  • inclusion of hashes into RFC Metalinks (different format)
  • inclusion of hashes into the mirror lists
  • building a "hash server" (append .md5 to any URL and get the md5 sum)

So this is blocking several good things that could be done.
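
The "hash server" item in the list above could look roughly like the following. This is only an illustrative Python sketch, not MirrorBrain code: the suffix table, the function names, and the `lookup` callable (a stand-in for whatever hash store ends up being used) are all assumptions.

```python
# Hypothetical mapping from URL suffix to hash type (illustration only).
HASH_SUFFIXES = {".md5": "md5", ".sha1": "sha1"}


def split_hash_request(path):
    """Split '/a/b.iso.md5' into ('/a/b.iso', 'md5').

    Returns (path, None) when the path does not end in a known suffix.
    """
    for suffix, algo in HASH_SUFFIXES.items():
        base = path[: -len(suffix)]
        # Require a non-empty file name in front of the suffix.
        if path.endswith(suffix) and base and not base.endswith("/"):
            return base, algo
    return path, None


def answer_hash_request(path, lookup):
    """Return the stored hex digest for a hash request, or None.

    `lookup(file_path, algo)` is an assumed callable that returns the
    digest from the hash store, or None if nothing is stored.
    """
    file_path, algo = split_hash_request(path)
    if algo is None:
        return None
    return lookup(file_path, algo)
```

The point of the sketch is that the request handler only needs a per-file, per-algorithm lookup, which is exactly what the current on-disk snippets do not offer.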

Issue 15 contains some ramblings about this, but let's track this change here.

I currently think that moving the hashes into the database might be best. It would definitely be
a flexible option without the need to invent an on-disk format and write parsers for it.
Also, it would make the data available to a web frontend.
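
To make the database idea a bit more concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for the real database, and the table layout, column names, and function names are hypothetical, not the schema MirrorBrain uses.

```python
import hashlib
import sqlite3


def open_hash_db(path=":memory:"):
    """Open the database and create the (hypothetical) hashes table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashes ("
        " path TEXT PRIMARY KEY,"
        " size INTEGER NOT NULL,"
        " md5 TEXT NOT NULL,"
        " sha1 TEXT NOT NULL)"
    )
    return conn


def store_hashes(conn, path, data):
    """Compute the digests of `data` and insert or update the row for `path`."""
    row = (
        path,
        len(data),
        hashlib.md5(data).hexdigest(),
        hashlib.sha1(data).hexdigest(),
    )
    conn.execute("INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)", row)
    conn.commit()


def get_hashes(conn, path):
    """Return {'size', 'md5', 'sha1'} for `path`, or None if not stored."""
    cur = conn.execute(
        "SELECT size, md5, sha1 FROM hashes WHERE path = ?", (path,)
    )
    row = cur.fetchone()
    if row is None:
        return None
    return dict(zip(("size", "md5", "sha1"), row))
```

With the hashes in ordinary columns, each consumer (headers, Metalinks, mirror lists, a web frontend) can format them as it likes, and no custom parser is involved.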

Before the on-disk format is dropped, we can test how well it works with the database.

As a first step, I have now transferred all functionality from the external metalink-hasher
script into the "mb" tool. Thus, the database functionality is now available at no cost.

msg148 (view) Author: poeml Date: 2010-03-10.00:17:41

In svn trunk, there is now working code that also saves the hashes to the
database. This seems like a good step forward. The code needs more testing to become
robust enough to be used by mod_mirrorbrain.

msg150 (view) Author: poeml Date: 2010-03-11.23:53:05

This is largely done.

Code in metalink-hasher seems to work well, and creates hashes in the
database in addition to the on-disk storage which we keep available for
transition.

The new hashes in the database are not yet cleaned up when they become
obsolete. Maybe "mb db vacuum" should become involved in the cleanup,
but it would need to look into the file tree for that. It will probably
be necessary to let mb makehashes clean up per directory. Otherwise files
could very quickly accumulate.
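
A per-directory cleanup along those lines might be sketched like this. It is an assumption-laden illustration in Python with sqlite3 as a stand-in database: the `hashes(dirname, name, md5)` table and both function names are hypothetical, and this is not what mb makehashes actually does.

```python
import os
import sqlite3


def open_db(path=":memory:"):
    """Create the hypothetical hashes table used by this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashes (dirname TEXT, name TEXT, md5 TEXT)"
    )
    return conn


def cleanup_directory(conn, directory, list_dir=os.listdir):
    """Remove hash rows for files that have vanished from `directory`.

    `list_dir` is injectable so the file tree can be faked in tests.
    Returns the sorted names whose rows were deleted.
    """
    try:
        on_disk = set(list_dir(directory))
    except FileNotFoundError:
        on_disk = set()  # directory itself is gone: all its rows are stale
    rows = conn.execute(
        "SELECT name FROM hashes WHERE dirname = ?", (directory,)
    ).fetchall()
    stale = sorted(name for (name,) in rows if name not in on_disk)
    conn.executemany(
        "DELETE FROM hashes WHERE dirname = ? AND name = ?",
        [(directory, name) for name in stale],
    )
    conn.commit()
    return stale
```

Working one directory at a time keeps the comparison between database rows and the file tree small, and rows belonging to other directories are never touched.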

mod_mirrorbrain uses the new hashes from the database and falls back to
on-disk hashes for transition. The new hashes are already used in old
Metalinks, new Metalinks, and also in the mirror lists!
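
The transition logic described here (database first, on-disk hashes as fallback) can be pictured as below. mod_mirrorbrain is an Apache module, so this Python fragment only illustrates the lookup order; the function name and both lookup callables are made up for the example.

```python
def find_hashes(path, db_lookup, disk_lookup):
    """Return (hashes, source), preferring the database over on-disk hashes.

    Both lookups are assumed callables that take a path and return a
    dict of hashes or None. `source` is 'database', 'disk' or None.
    """
    hashes = db_lookup(path)
    if hashes is not None:
        return hashes, "database"
    hashes = disk_lookup(path)
    if hashes is not None:
        return hashes, "disk"
    return None, None
```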

msg160 (view) Author: poeml Date: 2010-03-12.02:57:44

Note to self: need to check whether empty files (0 byte size) are still
handled correctly, or if they need a special case.
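
For reference, this is the kind of check meant here, as a small Python sketch. The function name, the result layout, and the piece size are arbitrary choices for the example and not taken from the mb tool.

```python
import hashlib
import io


def hash_stream(stream, piece_size=262144):
    """Hash a binary stream piece by piece.

    Returns the total size, whole-file MD5 and SHA-1 hex digests, and a
    list of per-piece SHA-1 hex digests. `piece_size` is an arbitrary
    example value.
    """
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    pieces, size = [], 0
    while True:
        chunk = stream.read(piece_size)
        if not chunk:
            break
        size += len(chunk)
        md5.update(chunk)
        sha1.update(chunk)
        pieces.append(hashlib.sha1(chunk).hexdigest())
    return {
        "size": size,
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),
        "pieces": pieces,
    }


# A zero-byte file still has well-defined whole-file digests,
# but its list of piece hashes is empty.
empty = hash_stream(io.BytesIO(b""))
```

So the whole-file hashes are not the problem for empty files; the thing to verify is that every consumer copes with a size of 0 and an empty piece list.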

msg182 (view) Author: poeml Date: 2010-04-23.03:03:42

What's also missing is a way to switch generation of the "expensive" hashes,
like torrents and zsync, off (or on) via /etc/mirrorbrain.conf. Maybe with a file
mask or list of directories.

msg204 (view) Author: poeml Date: 2010-09-01.16:13:33

Generation of hashes for zsync and torrents can now be (separately) switched off
in /etc/mirrorbrain.conf.

For the zsync hashes, the default is "off", because Apache currently allocates
large amounts of memory for this large data.
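
As a rough picture of how a tool could read two such separate switches, here is a Python sketch. The section name and the option names are placeholders invented for this example; they are not the actual keys in /etc/mirrorbrain.conf. The torrent default is likewise an assumption, since only the zsync default is stated above.

```python
import configparser

# Hypothetical option names; the real keys in /etc/mirrorbrain.conf may differ.
EXAMPLE_CONF = """
[general]
example_zsync_hashes = 0
example_torrent_hashes = 1
"""


def read_hash_switches(text, section="general"):
    """Return which expensive hash types should be generated."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        # Off unless enabled, matching the zsync default described above.
        "zsync": parser.getboolean(
            section, "example_zsync_hashes", fallback=False
        ),
        # Assumed default; the issue does not state one for torrents.
        "torrent": parser.getboolean(
            section, "example_torrent_hashes", fallback=True
        ),
    }
```

Keeping the two switches independent means zsync generation can stay off because of the memory usage without affecting torrents.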

On another note, empty files seem to be handled as they should.

Hence, I regard this bug as resolved.

History
Date                 User   Action  Args
2010-09-01 16:13:33  poeml  set     status: testing -> resolved
                                    messages: + msg204
2010-04-23 03:05:37  poeml  set     status: in-progress -> testing
2010-04-23 03:03:42  poeml  set     messages: + msg182
2010-03-29 06:44:31  ant    set     nosy: + ant
2010-03-12 02:57:45  poeml  set     messages: + msg160
2010-03-11 23:53:06  poeml  set     messages: + msg150
2010-03-10 00:17:41  poeml  set     messages: + msg148
2010-03-08 20:44:37  poeml  create

(end of migrated issue)