[mirrorbrain-commits] r8225 - /trunk/docs/tuning.rst

From: <poeml_at_mirrorbrain.org>
Date: Sun, 14 Nov 2010 13:33:53 -0000
Author: poeml
Date: Sun Nov 14 14:33:50 2010
New Revision: 8225

URL: http://svn.mirrorbrain.org/viewvc/mirrorbrain?rev=8225&view=rev
Log:
docs/tuning:
- finally, the PostgreSQL tuning section has been completed (extensively).

Modified:
    trunk/docs/tuning.rst

Modified: trunk/docs/tuning.rst
URL: http://svn.mirrorbrain.org/viewvc/mirrorbrain/trunk/docs/tuning.rst?rev=8225&r1=8224&r2=8225&view=diff
==============================================================================
--- trunk/docs/tuning.rst (original)
+++ trunk/docs/tuning.rst Sun Nov 14 14:33:50 2010
_at_@ -75,6 +75,12 @@
 for testing purposes, or for upgrade purposes (killing Apache with SIGWINCH for
 graceful stop, and starting a new one while the old one still continues to
 serve requests to their end).
+
+Having said that, if you find that you need a hundred connections or more in an
+everyday situation, there is something wrong -- then you need to check if you
+chose the Apache threading model, check for the size of the database connection
+pool, and verify that there are no big bottlenecks in the database (which
+causes Apache to stall and stack up working threads and connections).
 
 
 Thread stack size
_at_@ -115,6 +121,12 @@
     KeepAliveTimeout 2
 
 
+.. note::
+   When it is getting *really* serious, like when you are slashdotted, don't
+   hesitate to simply switch KeepAlive off. You will see a drastic improvement,
+   and probably save you. (If you did your homework and your website is lean
+   and fast otherwise ;-) If it is fat and bloated, there is not much to do.)
+
 
 .. _tuning_postgresql:
 
_at_@ -133,20 +145,104 @@
 Memory sizing
 ^^^^^^^^^^^^^
 
-.. describe:: shared_buffers
+With a small database, using only a few megabytes, there will not be much need
+for tuning. With larger databases, that go into the hundreds of megabytes,
+tuning becomes important.
+
+.. note::
 
    Make sure to reserve enough memory for the database, especially if it will
    be large. As a rough first estimate, it is usually sufficient and optimal if
    the reserved RAM is about the same as the database size on disk.
    
-   There is a special command :program:`mb db sizes` that helps you to assess
-   the size of your databases. See :ref:`mb_db_sizes`.
-
-.. FIXME: describe in detail
-
-
-Other parameters
+Allocating memory to the database is done in the following way. PostgreSQL
+largely relies on buffer caching done by the OS. Thus, the first measure in
+"reserving" memory is to *not* run too much other stuff on the machine, which
+would compete for memory, or (in other words) having enough memory. In general,
+PostgreSQL's performance reaches its maximum when the whole dataset, including
+indexes, fits in to the amount of RAM available for caching. (That statement is
+true if the whole dataset is actually used -- if only parts are used, top
+performance will be reached already with less memory. MirrorBrain tends to use
+the whole dataset, at least during mirror scanning.)
+
+There is a special command :program:`mb db sizes` that helps you to assess
+the size of your databases. See :ref:`mb_db_sizes`. (Just note that changes may
+not be immediately reflected in the numbers, because the statistics are updated
+periodically by PostgreSQL.)
+
+
+.. describe:: shared_buffers
+
+   This parameter should usually be set to about 10-25% of the available RAM. 
+   Maximum value may be limited by the SHMMAX tunable of the OS. 
+
+   (Of course, if your database is only 10 MB in size, there is no benefit in
+   increasing this value that far. It obviously depends on the database size.)
+
+   Mirror scanning can incur heavy write activity, if there is a lot of
+   fluctuation in the file tree, and when done in a massive parallel way.
+   Scanning performance can benefit from higher values (25-50%) here. For read
+   performance, (as affecting Apache and mod_mirrorbrain) higher values are not
+   needed.
+
+.. describe:: effective_cache_size
+
+   This is the effective amount of caching between the actual PostgreSQL
+   buffers, and the OS buffers.
+
+   This does not create RAM allocations nor does it change how PostgreSQL uses
+   RAM -- it just gives PostgreSQL an assumption about the availability of
+   memory to the OS cache. This influences decisions in the query planner,
+   regarding usage of indexes.
+
+   In principle, this value could be set to the sum of ``cached + free`` in the
+   :program:`free -m` output. However, this value needs to be divided by the
+   number of processes using this memory simultaneously. To estimate the
+   latter, you could use :program:`top` to see how many PostgreSQL processes
+   are busy at the same time.
+
+   Anyway, it is better to set this parameter too low rather than too high,
+   because that could result in too many index scans.
+
+
+Other memory parameters that you might want to increase are:
+
+- ``maintenance_work_mem`` (generously)
+- ``work_mem`` (a bit)
+
+
+Connection setup
 ^^^^^^^^^^^^^^^^
+
+.. describe:: listen_addresses
+
+   You'll need to change the parameter listen_addresses if you 
+
+   a) run the web server on a different host than the database server, or if you 
+    
+   b) want to use the :program:`mb` admin tool from a different host than the the
+      database host.
+
+   The default is localhost only. Add '*' or comma-separated addresses.
+
+
+.. describe:: max_connections
+
+   The default of 100 should fit many cases. Apache's re-usal of connections is
+   so efficient (and MirrorBrain quickly done with answering queries) that a
+   handful connections is enough. However, if you use Apache's prefork MPM,
+   every child will use a connection. Thus, if you allow to have 200 Apache
+   processes running you will need to adjust max_connections accordingly. With
+   a threaded Apache, the connection pool is shared, so no problem. This is
+   further discussed above, in the notes regarding Apache tuning.
+
+
+
+Transaction log
+^^^^^^^^^^^^^^^
+
+The transaction log (called Write-Ahead-Log or WAL) is a central thing in
+PostgreSQL, and the configuration of its handling important.
 
 .. describe:: synchronous_commit
 
_at_@ -155,15 +251,89 @@
    you have other databases than MirrorBrain, which require a higher level of
    data integrity than MirrorBrain does.
 
-   
-.. describe:: listen_addresses
-
-   You'll need to change the parameter listen_addresses if you 
-
-   a) run the web server on a different host than the database server, or if you 
-    
-   b) want to use the :program:`mb` admin tool from a different host than the the
-      database host.
-
-   The default is localhost only. Add '*' or comma-separated addresses.
-
+
+.. describe:: wal_buffers
+
+   The default (64kB) may be increased to e.g. 256kB.
+
+
+.. describe:: checkpoint_segments
+
+   For big databases (hundreds of MB in size), increase this from 3 to 32.
+
+
+.. describe:: checkpoint_timeout
+
+   Increase to 15min.
+
+
+To log a checkpoint whenever one occurs, set ``log_checkpoints = on`` and
+``checkpoint_warning to 1h``.  
+
+
+Deadlocks
+^^^^^^^^^
+
+.. describe:: deadlock_timeout
+
+   As described below, set this parameter to 30s.
+
+Concurrent write access by different processes to the same rows causes a
+queue-up of those write-requests. A row can be written only by one process at a
+time. If a process waits too long, it gives up after a while. Its lock times
+out, so to speak, which is called a deadlock in this context. It's not a real
+deadlock in the common sense, it's just giving up after a while.
+
+Read activity (as done by Apache + mod_mirrorbrain, serving users) is not
+affected by write activity locks. Write activity is mainly caused by mirror
+scanning. Scanning then again is often done in parallel, to save time, so it is
+typical to have to wait for locks (when two scanners happen to want to write to
+similar regions in the database).
+
+The default time waiting for a lock is 1s in PostgreSQL, which is often too
+short for MirrorBrain. That could be too long for other applications in fact,
+but for a mirror scanner it doesn't matter if it has to wait many seconds now
+and then. In fact, it is best to increase the lock waiting time to something
+like 30 seconds. The deadlocks don't occur frequently when scanning, but when
+they occur, you don't want a scanner to give up on that part and have some
+missing files on the mirror later.
+
+Such deadlocks are more likely to occur when scanning a new mirror, which
+means that every database row has to be touched (for each file found on the
+mirror). Even more likely (actually, unavoidable) are they when you fill your
+database for the first time, after installing, and the rows are created at the
+first place. In that case, you will see deadlocks occur frequently. The best
+advice is to ignore them and simply scan once again, after the first run has
+completed.
+
+Later scans mainly see what they know already, so there is no reason to write
+to the database, which means that deadlocks don't occur.
+
+
+Enhancing logging
+^^^^^^^^^^^^^^^^^
+
+Logging can be enhanced with some details that might be relevant or helpful to
+tuning for MirrorBrain:
+
+
+.. describe:: log_line_prefix
+
+   To get more detailed log lines, set it to::
+        
+        '%m [%p:%l] %u_at_%d '
+
+   Note the trailing space!
+
+.. describe:: log_lock_waits
+
+   To log lock waits >= deadlock_timeout, set to ``on``.
+
+.. describe:: log_min_duration_statement
+
+   You could log all long-running queries by configuring this to e.g. ``2000``
+   (value is in milliseconds).
+
+
+
+




_______________________________________________
mirrorbrain-commits mailing list
Archive: http://mirrorbrain.org/archive/mirrorbrain-commits/

Note: To remove yourself from this list, send a mail with the content
 	unsubscribe
to the address mirrorbrain-commits-request_at_mirrorbrain.org
Received on Sun Nov 14 2010 - 13:33:58 GMT

This archive was generated by hypermail 2.3.0 : Sun Nov 14 2010 - 13:47:24 GMT