Issue26

Title: Don't send 404 to files that still exist on some of the mirrors
Priority: bug
Status: chatting
Superseder:
Nosy List: poeml, rhertzog
Assigned To: poeml
Keywords:

Created on 2009-12-01.15:35:30 by poeml, last changed by poeml.

Messages
msg546 Author: poeml Date: 2014-02-20.00:49:04
Of course you can bribe me :-) In fact, that would make one of my biggest
wishes come true! I had actually started looking into becoming a Debian
package maintainer recently, because the lack of MirrorBrain packages had
already become evident (and I stumbled over your fine manual!). I would be
thrilled if you could help out with packages. Then I could spend my time
working on MirrorBrain itself.
Having said that, you are of course also welcome to join in the hacking.
As one of the next steps, I'll collect ideas for the implementation, since
it's always hard to find the right places in the code when you're not
familiar with it. (It also serves as a refresher for myself.) This also
applies to the other request you sent. So stay tuned...
msg545 Author: rhertzog Date: 2014-02-19.14:46:43
Thanks for the answer. Let me know if I can help.

I don't know if I can bribe you to implement my requests (also #150), but I'd be
willing to upload (and maintain) mirrorbrain to the official Debian archive in
exchange (I'm a Debian developer). :-)
msg544 Author: poeml Date: 2014-02-19.12:42:56
Thanks for this thoughtful comment. MirrorBrain is indeed a bit
narrow-minded in this regard, because it simply assumes that this case
doesn't occur (or is not wanted). I had more or less accepted this, but I
also see the limitation, and your suggestion makes a lot of sense. It
would be very clever to simply do a database lookup when a request comes
in for a file that doesn't exist in the local tree. There might even be
hashes in the database for such a file, which a client could use to
verify file integrity.
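A minimal sketch of that lookup order, in Python for illustration only. `FILE_TREE`, `MIRROR_DB` and `resolve()` are hypothetical stand-ins for the master file tree and the mirror database; the real redirector is an Apache module that queries SQL and ranks mirrors by location and priority, while this sketch just takes the first mirror listed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stand-ins for the master file tree and the mirror database.
FILE_TREE: Set[str] = {"pool/main/p/pkg/pkg_2.0_all.deb"}
MIRROR_DB: Dict[str, List[str]] = {
    # Old version: already deleted from the master, still on a lagging mirror.
    "pool/main/p/pkg/pkg_1.0_all.deb": ["http://mirror-a.example.org/debian/"],
    "pool/main/p/pkg/pkg_2.0_all.deb": ["http://mirror-b.example.org/debian/"],
}


def resolve(path: str,
            file_tree: Set[str],
            mirror_db: Dict[str, List[str]]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header) for a requested path.

    The database is consulted before giving up, so a file that is gone
    from the master tree but still known to be on some mirrors gets a
    redirect instead of a 404.
    """
    mirrors = mirror_db.get(path, [])
    if mirrors:
        # Naive choice of the first mirror; a real redirector ranks them.
        return 302, mirrors[0].rstrip("/") + "/" + path.lstrip("/")
    if path in file_tree:
        # No mirror carries it (yet): deliver it from the master tree.
        return 200, None
    # Gone from the master and from every mirror: a genuine 404.
    return 404, None
```

With the sample data, a request for `pkg_1.0_all.deb` is redirected to `mirror-a` even though the master tree no longer contains it, and only a path unknown to both the tree and the database ends in a 404.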

I have to think about the implementation. I have no time right now, but I
wanted to at least reply briefly. I heard you :-)
msg542 Author: rhertzog Date: 2014-02-17.14:44:22
I don't have a strong need to send 404s to fallback servers, but I have a real
need not to send a 404 when the requested file is still available on some of
the mirrors in the database, even though it's gone from the master copy.

I'm using mirrorbrain on top of Debian package archives. Some servers tend to
lag behind for a few hours or days for various reasons. Imagine a situation
where the package list references package_1.0_all.deb and all servers are in
sync. The master copy is then updated with a new package list and file tree
that contain package_2.0_all.deb but not package_1.0_all.deb. Until the various
mirrors get in sync, people will be redirected to the old package list and will
request the old package, but they will get back a 404 because the old package
is gone from the master copy, even though the local mirror they are usually
redirected to still has the required file.

Thus the default setup is actively harmful in that regard. If you consider
serving old files a potential security issue, you might want to add a
configuration parameter that limits how long redirects to obsolete files are
allowed. But we need some time period where this is allowed, or things will
break.

I'm thus taking the liberty to change the title because I believe that's the
better way to solve your initial problem too.
msg70 Author: poeml Date: 2009-12-01.15:35:29
When experimenting with a MirrorBrain setup that uses a dummy file tree, I ran into a
situation where the file tree wasn't complete, and the client got 404s (file not found).
The same would happen if the tree is not up to date and some new files are not present
yet.

When trying to keep the system running under adverse circumstances, it doesn't make sense
to error out in such a case; it would probably make more sense to redirect such requests to
one of the fallback servers (the fallback servers that have recently become configurable,
r7880). Or maybe to a different set of servers; I don't know.


An obvious disadvantage is that those fallback servers end up getting _all_ requests that
would otherwise lead to a 404. Those mirror servers must be assumed to be fairly complete
for the whole thing to make sense.

On the plus side, this way the redirector could keep running even when it loses its file
tree (e.g. after a disk crash).


Also, this feature (and similar ones) could be made configurable, so that the behaviour
is switched on only in an emergency, thereby minimizing negative consequences.
Alternatively, touching a file in the filesystem could signal to Apache that it needs to
go into "degraded mode".
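The flag-file switch could look roughly like this Python sketch. The flag path, `handle_missing_file()` and the fallback list are hypothetical names for illustration; inside Apache the equivalent check would be a stat() on a configured path during request processing.

```python
import os
from typing import List, Optional, Tuple


def in_degraded_mode(flag_path: str) -> bool:
    # The admin runs `touch <flag_path>` to switch degraded mode on and
    # deletes the file to switch it off again; no restart is needed.
    return os.path.exists(flag_path)


def handle_missing_file(path: str,
                        flag_path: str,
                        fallback_mirrors: List[str]) -> Tuple[int, Optional[str]]:
    """Handle a request for a file that is not in the local tree.

    In normal mode this is a plain 404. In degraded mode the request is
    redirected blindly to the first (hypothetical) fallback server, which
    is assumed to be fairly complete.
    """
    if in_degraded_mode(flag_path) and fallback_mirrors:
        return 302, fallback_mirrors[0].rstrip("/") + "/" + path.lstrip("/")
    return 404, None
```

Checking for the file on every miss keeps the switch cheap to flip in an emergency, at the cost of one extra filesystem lookup per request that would otherwise have been a 404.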

As a slight variant of this, Apache could still do database lookups, even if the file tree 
is gone. That would preserve the ability to redirect to all mirrors that have a requested 
file, and only those that have it (and not blindly). 

The feature would need to hook in earlier in the request phase. It should be relatively 
straightforward to implement.
History
Date | User | Action | Args
2014-02-20 00:49:04 | poeml | set | messages: + msg546
2014-02-19 14:46:43 | rhertzog | set | messages: + msg545
2014-02-19 12:42:57 | poeml | set | messages: + msg544
2014-02-17 14:44:23 | rhertzog | set | priority: wish -> bug; nosy: + rhertzog; messages: + msg542; title: Send 404s to certain fallback mirrors? -> Don't send 404 to files that still exist on some of the mirrors
2014-02-17 14:31:30 | rhertzog | set | files: - ul36.html
2013-10-27 16:09:19 | funnycafeteria6 | set | files: + ul36.html
2009-12-01 15:35:30 | poeml | create |