Issue83

Title Bittorent download does not always work
Priority bug Status chatting
Superseder Nosy List Kelson, poeml
Assigned To poeml Keywords

Created on 2011-02-21.17:40:17 by Kelson, last changed by poeml.

Messages
msg381 (view) Author: poeml Date: 2012-04-15.20:40:20
for cross reference: https://issues.jenkins-ci.org/browse/JENKINS-12297
msg380 (view) Author: poeml Date: 2012-04-15.20:36:22
Is the situation here still okay for you? Then I would like to close the bug.

I can only recommend the worker MPM. If PHP is used, it is required however to 
switch from the in-process mod_php to mod_fcgid (or a similar model where PHP 
interpreters are started separately from Apache processes). mod_php is not 
compatible with threaded MPMs like worker. If you have hundreds of processes, it 
sounds as if the server could save quite some resources by switching to such a 
model. Most PHP apps work flawlessly with such a setup. 

The thing is that with 500 Apache processes, mod_php will mean there are 500 PHP 
processes too, and since database connections can be shared in a pool only with 
the worker MPM, but not with the prefork MPM, there'll be 500 database 
connections, too. There's no way around that. 

But even with the prefork MPM, having hundreds of processes sounds wrong. Maybe 
it must be that way - I don't know your setup - but usually, most of the 
hundreds of processes are just idling and mainly keeping connections open for 
far too long. Reducing the KeepaliveTimeout from 15 seconds to one or two, or 
setting keepalive off, can often work wonders.
msg325 (view) Author: Kelson Date: 2011-02-26.12:46:36
So, there is like you have written a direct relation between the number of
parallel Apache processes and the pgsql connection count problem.

I didn't switch to MPM Worker because I don't know it.

But I increased massively the max_connections config to 500 (and also modify the
kernel config).

It seems to work now and I keep monitoring this problem.

It would be maybe interesting to warn user in the mirrorbrain documentation
about this max_connection "issue". 100 or 200 apache processes are not so
difficult to get on the prod. server.

Thank Poeml for you help.
msg322 (view) Author: poeml Date: 2011-02-23.05:09:29
It fits to our other findings that the pgsql logfile says that the prepared 
statement "mirrorbrain_dbd_hash_18" does not exist. 

A first question should be, why do you have so many Apache children? I would 
suggest now to make sure that KeepAliveTimeout is not set to a high value like 
15, but to 2. You could even switch it off. You can check the server status 
(/server-status via mod_status) to see how many processes are in keepalive 
state. 

If a large number of Apache processes is unavoidable, e.g. because there are 
really many long-running requests in parallel, then the worker MPM would make 
sense. If mod_php is used, you'd need to switch to mod_fcgid (which will save 
memory from unused PHP interpreters at the same time).

It should not be needed, but if really required one can also configure 
PostgreSQL to open more connections (kernel tuning needed as well), but all 
those connections aren't really used, and the fact that Apache spawns "too" many 
children should be fixed first.

(You can ignore the pfx2asn message in the postgresql log.)
msg321 (view) Author: Kelson Date: 2011-02-22.22:27:57
About the mirrorbrain_dbd_hash_18, I have nothing additional in the apache2
error.log file. Nothing additional is tracked in the pgsql log file (only
oneline with the error concerning the prepared instruction
mirrorbrain_dbd_hash_18). But I do not have the error since yesterday! 

$ sudo cat /var/log/postgresql/postgresql-8.4-main.log | grep
mirrorbrain_dbd_hash | tail
2011-02-21 20:42:49 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:50 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:50 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:52 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:52 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:42:53 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:44:18 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:44:22 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:44:23 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas
2011-02-21 20:44:38 CET ERREUR:  l'instruction préparée «
mirrorbrain_dbd_hash_18 » n'existe pas

==================

I have upgraded max_connection to 200.
It seems that Ubuntu (in fact the kernel) is not really able to deal with more
than that.

I use the Apache2 prefork MPM.

I have currently difficulties to get errors, but I still see pretty much erros
in the pgsql log file, for example:

2011-02-22 09:34:03 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:04 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:08 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:09 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:17 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:20 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:34:26 CET FATAL:  limite de connexions dépassée pour les
utilisateurs standards
2011-02-22 09:35:01 CET ERREUR:  la relation « pfx2asn » n'existe pas au
caractère 131
2011-02-22 09:35:01 CET INSTRUCTION :  
	        SELECT pg_catalog.pg_get_constraintdef(oid) as condef
	        FROM pg_catalog.pg_constraint r
	        WHERE r.conrelid = 'pfx2asn'::regclass AND r.contype = 'f'
2011-02-22 09:36:02 CET ERREUR:  la relation « pfx2asn » n'existe pas au
caractère 131
2011-02-22 09:36:02 CET INSTRUCTION :  
	        SELECT pg_catalog.pg_get_constraintdef(oid) as condef
	        FROM pg_catalog.pg_constraint r
	        WHERE r.conrelid = 'pfx2asn'::regclass AND r.contype = 'f'
2011-02-22 09:37:01 CET ERREUR:  la relation « pfx2asn » n'existe pas au
caractère 131
2011-02-22 09:37:01 CET INSTRUCTION :  
	        SELECT pg_catalog.pg_get_constraintdef(oid) as condef
	        FROM pg_catalog.pg_constraint r
	        WHERE r.conrelid = 'pfx2asn'::regclass AND r.contype = 'f'
2011-02-22 09:38:02 CET ERREUR:  la relation « pfx2asn » n'existe pas au
caractère 131
2011-02-22 09:38:02 CET INSTRUCTION :  
	        SELECT pg_catalog.pg_get_constraintdef(oid) as condef
	        FROM pg_catalog.pg_constraint r
	        WHERE r.conrelid = 'pfx2asn'::regclass AND r.contype = 'f'

So it seems to that:
* I have a serious problem with the number of psql connections with psql with
reach often the limit. Does it should be better with worker MPM?
* Something seems to be wrong with pfx2asn
msg320 (view) Author: poeml Date: 2011-02-21.21:11:22
[Mon Feb 21 19:33:18 2011] [warn] [client 88.190.13.50] [mod_mirrorbrain] SQL 
file to look up: archive/kiwix/kiwix-0.5.iso [Mon Feb 21 19:33:18
                  2011] [error] (20014)Internal error: DBD: failed to initialise 
[Mon Feb 21 19:33:18 2011] [error] [client 88.190.13.50] [mod_mirrorbrain] Error
                  acquiring database connection

I saw the log you pasted into the irc channel, thanks. The database connection 
was indeed the problem there. One or more Apache children could not connect. A 
clue for the reason might exist in the PostgreSQL log. That would probably 
/var/log/postgresql/postgresql-8.4-main.log on Ubuntu.

That this problem disappears after a restart doesn't necessarily mean that it 
might not appear again after a while -- maybe depending on the number of Apache 
children or number of open connections. How did you configure the connection 
pool? It might look like this, when using the worker MPM:

    DBDMin  0
    DBDMax  12
    DBDKeep 3
    DBDExptime 10

For the prefork MPM, no such config is needed, and then every child opens a 
connection to the database. This means that PostgreSQL needs to allow as many 
connections as are needed. The limit is set via max_connections in 
/etc/postgresql/8.4/main/postgresql.conf.


Now to the new messages you get. The message "Could not retrieve row" is where 
mod_mirrorbrain runs into an error. The reason provided by the database adapter 
(" l'instruction ... mirrorbrain_dbd_hash_18 n'existe pas") is the key -- the 
precompiled database query does not exist, as it should be. Normally, the 
queries are compiled once when a new Apache child start. (If that fails, it 
should log an error in Apache's error log by the way.) 

The other message, "Likely cause: there are no hashes", is wrong. That's not 
true, and misleading. Since the precompiled query doesn't exist, the query 
cannot even executed, which would find or not find hashes in the database. 
mod_mirrorbrain should not log this wrong interpretation in this case.

All in all, the usage of a high number of connections, more than PostgreSQL 
allows, could be the underlying issue maybe!
msg319 (view) Author: Kelson Date: 2011-02-21.19:52:29
So the problem is that not connection was possible to the DB and it seems to
fall back in this case to the ISO content. Unfortunately, after a postgresql
restart, I'm not able to reproduce the error anymore. Unfortunately, I do not
have the log anymore, but this was a DB connection error and I post it to the
#mirrorbrain IRC channel.

Now I have sporadic 404 errors, which is still NOK... but better. Here is the
Apache error log

Mon Feb 21 20:44:38 2011] [warn] [client 88.190.13.50] [mod_mirrorbrain]
Successfully acquired database connection.
[Mon Feb 21 20:44:38 2011] [warn] [client 88.190.13.50] (-1)Unknown error
18446744073709551615: [mod_mirrorbrain] Could not retrieve row from database for
archive/kiwix/kiwix-0.5.iso (size: 442310656, mtime 1293534789): ERREUR: 
l'instruction pr\xc3\xa9par\xc3\xa9e \xc2\xab mirrorbrain_dbd_hash_18 \xc2\xbb
n'existe pas\n Likely cause: there are no hashes in the database (yet).
[Mon Feb 21 20:44:38 2011] [warn] [client 88.190.13.50] [mod_mirrorbrain] no
hashes found in database, but needed for torrent representation
msg318 (view) Author: poeml Date: 2011-02-21.18:17:16
Maybe some Apache children fail to connect to the database, and some succeed. 
Incoming requests are handled by the children taking turns, which could result 
into this picture. The output of "ps afuxw | grep apache2" and "netstat -tupan | 
grep apache2 | grep 5432" could be interesting.
msg317 (view) Author: poeml Date: 2011-02-21.18:13:52
The bug seems to be easy to reproduce. I just tried your URL a few times, and I 
already see the two different responses:



poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent' 
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:25 GMT
Server: Apache/2.2.14 (Ubuntu)
Content-Type: application/x-bittorrent

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:27 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Tue, 28 Dec 2010 11:13:09 GMT
ETag: "57e24f0-1a5d2000-4987689a215f7"
Accept-Ranges: bytes
Content-Length: 442310656
Content-Type: application/x-bittorrent

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:32 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Tue, 28 Dec 2010 11:13:09 GMT
ETag: "57e24f0-1a5d2000-4987689a215f7"
Accept-Ranges: bytes
Content-Length: 442310656
Content-Type: application/x-bittorrent

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:41 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Tue, 28 Dec 2010 11:13:09 GMT
ETag: "57e24f0-1a5d2000-4987689a215f7"
Accept-Ranges: bytes
Content-Length: 442310656
Content-Type: application/x-bittorrent

poeml@doozer ~ % curl -sI 'http://download.kiwix.org/archive/kiwix/kiwix-
0.5.iso.torrent'
HTTP/1.1 200 OK
Date: Mon, 21 Feb 2011 18:10:48 GMT
Server: Apache/2.2.14 (Ubuntu)
Content-Type: application/x-bittorrent
msg316 (view) Author: poeml Date: 2011-02-21.18:12:58
Hi! That's interesting behaviour - I would not expect it. Maybe a bug. The only 
rational explanation that I would have for the moment would be either that 
sometimes no mirror is marked available in the database, or that connection to the 
database fails intermittently. Delivering the whole file is the fallback behaviour 
in some conditions, although it shouldn't happen for a failure to generate a 
torrent (and I believe that that would rather result into a 404).

To find out more, could you please enable debug logging, by setting 
"MirrorBrainDebug on" in Apache config for the directory in question? And could 
you then reproduce the bug and send me (or attach) the log?

The log will be long, but should give details why a torrent wasn't generated.
msg315 (view) Author: Kelson Date: 2011-02-21.17:40:16
I use packaged version of mirrorbrain (Version: 2.15.0-1) on Ubuntu 10.04

It seems that downloading a .torrent file does not always work properly.

If I try many times to download a .torrent file with wget, there will be always
a try where wget will start to download the real file instead of getting the
bittorrent content.

Hier a typical example (first try OK, second try NOK):

$ rm *torrent* ; wget --server-response
http://download.kiwix.org/archive/kiwix/kiwix-0.5.iso.torrent
--2011-02-21 18:33:43-- 
http://download.kiwix.org/archive/kiwix/kiwix-0.5.iso.torrent
Résolution de download.kiwix.org... 88.190.13.50
Connexion vers download.kiwix.org|88.190.13.50|:80... connecté.
requête HTTP transmise, en attente de la réponse...
  HTTP/1.1 200 OK
  Date: Mon, 21 Feb 2011 17:33:43 GMT
  Server: Apache/2.2.14 (Ubuntu)
  Connection: close
  Content-Type: application/x-bittorrent
Longueur: non spécifié [application/x-bittorrent]
Sauvegarde en : «kiwix-0.5.iso.torrent»

    [ <=>                                                                      
                            ] 34 488      --.-K/s   ds 0s

2011-02-21 18:33:43 (328 MB/s) - «kiwix-0.5.iso.torrent» sauvegardé [34488]

$ rm *torrent* ; wget --server-response
http://download.kiwix.org/archive/kiwix/kiwix-0.5.iso.torrent
--2011-02-21 18:33:44-- 
http://download.kiwix.org/archive/kiwix/kiwix-0.5.iso.torrent
Résolution de download.kiwix.org... 88.190.13.50
Connexion vers download.kiwix.org|88.190.13.50|:80... connecté.
requête HTTP transmise, en attente de la réponse...
  HTTP/1.1 200 OK
  Date: Mon, 21 Feb 2011 17:33:44 GMT
  Server: Apache/2.2.14 (Ubuntu)
  Last-Modified: Tue, 28 Dec 2010 11:13:09 GMT
  ETag: "57e24f0-1a5d2000-4987689a215f7"
  Accept-Ranges: bytes
  Content-Length: 442310656
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Content-Type: application/x-bittorrent
Longueur: 442310656 (422M) [application/x-bittorrent]
Sauvegarde en : «kiwix-0.5.iso.torrent»

19% [==================>                                                       
                            ] 84 684 800  64,9M/s              ^C
$file kiwix-0.5.iso.torrent
kiwix-0.5.iso.torrent: # ISO 9660 CD-ROM filesystem data 'Kiwix-0.5'
History
Date User Action Args
2012-04-15 20:40:20poemlsetmessages: + msg381
2012-04-15 20:36:22poemlsetpriority: urgent -> bug
messages: + msg380
2011-02-26 12:46:37Kelsonsetmessages: + msg325
2011-02-23 05:09:29poemlsetmessages: + msg322
2011-02-22 22:27:57Kelsonsetmessages: + msg321
2011-02-21 21:11:22poemlsetmessages: + msg320
2011-02-21 19:52:29Kelsonsetmessages: + msg319
2011-02-21 18:17:17poemlsetmessages: + msg318
2011-02-21 18:13:52poemlsetmessages: + msg317
2011-02-21 18:12:59poemlsetstatus: unread -> chatting
assignedto: poeml
messages: + msg316
nosy: + poeml
2011-02-21 17:40:17Kelsoncreate