Author: poeml
Date: 2009-03-03 15:14:07 -0700 (Tue, 03 Mar 2009)
New Revision: 6676

Modified:
   trunk/tools/download-redirector-v2/ABOUT
Log:
ABOUT: turn this into a forwarding pointer

Modified: trunk/tools/download-redirector-v2/ABOUT
===================================================================
--- trunk/tools/download-redirector-v2/ABOUT 2009-03-03 20:19:38 UTC (rev 6675)
+++ trunk/tools/download-redirector-v2/ABOUT 2009-03-03 22:14:07 UTC (rev 6676)
@@ -1,161 +1,2 @@
-
-
-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-Note, this document is no longer maintained.
-See http://mirrorbrain.org/
-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
-
-
-
- The openSUSE download redirector
-
-
-
- The openSUSE download redirector (a.k.a. the MirrorBrain) automatically
-redirects clients (per HTTP redirection) to a mirror server near them. It works
-similar to the systems employed by sourceforge.net, mozilla.com or similar
-large organizations, which face a number of download requests which is too high
-to be practically handled by a single site. To find a mirror close to the
-client, the redirector employs geolocation of the client's IP address. If
-several mirrors are suitable, the redirector load-balances requests to the
-mirrors based on their capabilities.
-
-
-
-Implementation:
-
- The core of the redirector is mod_mirrorbrain, a module for the Apache HTTP
-server, written in C, and designed for high performance and scalability, with
-security in mind.
- Previous name of mod_mirrorbrain was mod_zrkadlo, pronounced "mod zurrcat
-low". Zrkadlo is Slovakian for mirror.
-
- Due to the fast-evolving nature of the file tree offered by openSUSE project,
-the redirector doesn't simply choose one mirror for a client once, but acts as
-granular as on file-level, because mirrors are known to be incomplete,
-especially if content changes often. To achieve this, the redirector is
-supported by an SQL database which knows the exact contents of each mirror. The
-database is periodically updated by scanning all mirrors with a scanner
-program. In addition, there is a probing program which intermittently checks
-each mirror for responsiveness, and which can disable or pause redirection to a
-certain mirror, should it fail.
-
-
-
-Features:
-
- - works transparently to the client, through HTTP redirection
- - can optionally return metalinks (http://metalinker.org), or human readable
-   mirror lists
- - supports transparent negotiation of metalinks (see
-   http://groups.google.com/group/metalink-discussion/web/transparent-metalinks)
- - operates with file level granularity
- - involves only a single database query per HTTP request, using a database
-   connection pool through the Apache DBD framework
- - mirror choice per country / continent, using GeoIP database
- - uses a randomized, weighted algorithm for mirror selection (each mirror
-   having a score)
- - optionally memorizes client<->mirror association through memcache daemon
- - can make sure that mirrors get only requests from the same country or region
-   (important for countries with poor internet connectivity)
- - mirrors can be special catch-all type, to integrate content delivery networks
- - is configurable in Apache style configuration, with automatic per-directory
-   configuration merging
- - optionally redirects dependent on file name pattern, file size, mime type,
-   user agent, request origin, ...
- - flexible logging options
- - has a debug mode which can be enabled directory-wise, and thus is
-   "compatible" with running production
- - the client IP address can be overridden for diagnostic purposes
- - canonicalizes file pathnames before database lookup, so the database needs to
-   hold only real files, and is not blown up by symlinks.
-
-
-
-So how does the redirector Apache module work?
-
- This page http://en.opensuse.org/Build_Service/Redirector shows pseudocode
-which gives an outline how it works.
-
-
-
-Software requirements:
-
-Frontend (the redirector):
- - Apache HTTP server 2.2.6 or newer
- - libGeoIP, apr_memcache (or apr-util > 1.3.2), mod_memcache, and mod_form
- - memcache daemon
-Backend (rest of the MirrorBrain framework):
- - MySQL server. The tables should be InnoDB tables, because only that engine
-   offers row-based write locks. Due to optimizations of InnoDB engine for high
-   performance it makes sense to have a separate MySQL instance for this
-   database.
-   Postgresql should also work, but it hasn't been tested.
- - Python, python-mysql, python-sqlobject for the mirrorprobe and database
-   maintenance
- - Perl for the scanner process
-
- There is a small mirror administration web frontend, built upon the TurboGears
-framework, but its development has just started.
-
-
-
-Hardware requirements:
-
- File storage is attached to the webserver. (Running the redirector without
-attached file storage is a feature which is not implemented, but considered.)
-The openSUSE project currently hosts > 700.000 files using 850 GB.
-The webserver needs few computational resources. If it has other tasks, besides
-redirecting, those other tasks mainly determine the needed resources. However,
-to handle high amounts of redirects, like hundreds per seconds, it is
-recommended to run Apache in a hybrid prefork/worker configuration, with e.g.
-32 threads per process, which results in a good pooling of database
-connections.
-
- The most computational resources are needed by the database server. For large
-file trees, they can be considerable, like the openSUSE project, which
-redirects for a total of > 500.000 files. For performance reasons, the database
-server must be able to hold the database and indices completely in memory. The
-openSUSE redirector database is currently served by a 4-way Xeon 3.4Ghz with 4
-gigs of Ram, which is sufficient for the mysql server itself, as well as 12
-parallel scanner processes, and can handle 1000-2000 requests per second.
-
-
-
-HA (High Availability) setup:
-
- For HA, the webserver, the database server and the connecting infrastructure
-needs to be redundant. A geographically distributed array of redirectors
-would be one way. Locally, it can be achieved by creating a hot standby for
-failover, or by running identical nodes with load sharing / balancing. This
-could be implemented by
- - deploying a hardware load balancer, which distributes requests to webserver
-   nodes, or using clusterip on the webserver nodes themselves to make them do
-   load sharing
- - two or more webserver nodes with identical setup for ease of maintenance
- - running a one or more mysql servers in slave configuration. Database queries
-   could be split so that write requests go only to the master, while read
-   requests go to master and slaves.
- - mysql-proxy does load balancing and r/w-splitting
-Obviously, there are different ways achieving HA, which will vary to local
-requirements.
-
-
-
-Links:
-http://en.opensuse.org/Build_Service/Redirector
-http://www.poeml.de/~poeml/talks/apachecon08-mirrors.pdf
-http://www.poeml.de/~poeml/talks/redirector/
-https://forgesvn1.novell.com/svn/opensuse/trunk/tools/download-redirector-v2
-http://www.metalinker.org/
-http://groups.google.com/group/metalink-discussion/web/transparent-metalinks
-http://www.maxmind.com/app/ip-location
-http://www.linux-ha.org/ClusterIP
-http://forge.mysql.com/wiki/MySQL_Proxy
-http://www.stdlib.net/~colmmacc/Apachecon-EU2005/scaling-apache-handout.pdf
-
-
-
-Acknowledgement
-This product includes GeoLite data created by MaxMind, available from
-http://maxmind.com/
+This document has moved to the main project page (online at
+http://mirrorbrain.org/)

_______________________________________________
Opensuse-svn mailing list
Opensuse-svn_at_forge.novell.com
http://forge.novell.com/mailman/listinfo/opensuse-svn
_______________________________________________
mirrorbrain-commits mailing list
Note: To remove yourself from this list, send a mail with the content
unsubscribe to the address mirrorbrain-commits-request_at_mirrorbrain.org
Received on 2009-03-03Z22:14:31
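The ABOUT file removed in revision 6676 describes two related parts of mirror selection. The redirector chooses mirrors per country or continent using GeoIP, and it uses "a randomized, weighted algorithm for mirror selection (each mirror having a score)". The following is a minimal Python sketch of that idea. It is not mod_mirrorbrain's code, which is C running inside Apache. The `Mirror` record, its field names, the rule that a score of 0 disables a mirror, and the fallback order country, then continent, then any mirror are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Mirror:
    # Hypothetical mirror record; field names are illustrative only.
    name: str
    country: str    # e.g. "de", as a GeoIP lookup might report it
    continent: str  # e.g. "EU"
    score: int      # relative capacity weight; 0 means "do not use"


def choose_mirror(mirrors, client_country, client_continent, rng=random):
    """Sketch only, not mod_mirrorbrain's implementation.

    Prefer mirrors in the client's country, then on the client's
    continent, then anywhere. Within the first non-empty group, pick
    randomly, weighted by score: a mirror with score 200 receives about
    twice as many requests as one with score 100 in the same group.
    """
    usable = [m for m in mirrors if m.score > 0]
    tiers = (
        lambda m: m.country == client_country,
        lambda m: m.continent == client_continent,
        lambda m: True,
    )
    for in_tier in tiers:
        group = [m for m in usable if in_tier(m)]
        if group:
            weights = [m.score for m in group]
            return rng.choices(group, weights=weights, k=1)[0]
    return None  # no usable mirror; a caller might serve the file itself
```

With hypothetical mirrors in Germany (score 100), Austria (score 50) and the USA (score 200), a client from Germany always gets the German mirror. A client from France matches no mirror in its own country, so the European tier applies, and the German and Austrian mirrors are chosen at roughly 2:1 odds. Passing a seeded `random.Random` as `rng` makes the choice reproducible for debugging.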
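The removed ABOUT text also says the module "canonicalizes file pathnames before database lookup, so the database needs to hold only real files, and is not blown up by symlinks". The following Python sketch shows that step, assuming a local document root and using `os.path.realpath` to resolve symlinks. The function name, the use of a docroot-relative path as the database key, and the choice to reject paths that resolve outside the docroot are illustrative assumptions, not a description of mod_mirrorbrain's C code.

```python
import os


def canonical_relpath(docroot, request_path):
    """Hypothetical helper: map a request path to a canonical lookup key.

    Resolves every symlink, so that aliases such as a "current" link
    pointing at a release directory collapse onto the one real file.
    The result is relative to the real docroot. Returns None when the
    resolved path escapes the docroot (e.g. "../" or an outward symlink).
    """
    root = os.path.realpath(docroot)
    full = os.path.realpath(os.path.join(root, request_path.lstrip("/")))
    if os.path.commonpath([root, full]) != root:
        return None
    return os.path.relpath(full, root)
```

Because every alias resolves to the same key, the scanner needs to record only one row per real file and mirror. Symlinked directory trees do not multiply the table size.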