[mirrorbrain] RFC: geographic distance ordering for finer mirror selection

From: Peter Pöml <poeml_at_cmdline.net>
Date: Fri, 11 Dec 2009 21:04:12 +0100

I entered a feature request into the issue tracker today:

It needs some discussion; see below. How do you think this should best be
integrated with the other criteria for mirror selection?

The proposal is this:

> In some countries (US, Germany) there can be a wealth of mirrors. A
> specifically suited mirror can be picked when there's one in the same AS
> or in the same network as the client. But otherwise, any mirror from the
> country will be picked, which could be at the opposite end of the country
> (East coast vs. West coast in the US; north vs. south in Germany).
>
> Another scenario is that no mirror is found in the client's country, but
> there are several on its continent. Which one should be chosen? Currently,
> it is a random choice, which means that, for instance, a client in any
> European country could be sent to any European mirror; in most cases, it
> would probably be better to use a mirror in a neighbouring country or the
> closest one that can be found.
>
> Here's how a finer mirror selection could be achieved in both these cases.
>
> The GeoLite city(!) database provides geographical coordinates. The
> coordinates of the mirrors could be filled into the database's mirror
> records when mirrors are created, or later, similarly to their network and
> AS data.
>
> When clients' IP addresses are looked up via GeoIP during request
> processing, their geographical coordinates would (should) be available as
> well.
>
> Using a planar approximation of the Haversine formula, the distance to
> available mirrors can be calculated, and the mirrors sorted by their
> closeness to the client.
>
> A simple approximation, which is not only easy to implement but should also
> incur very little overhead, is described by the following formula:
>
>     distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
>
> where (x1, y1) and (x2, y2) are the coordinates of the client and of the
> mirror. Thus, the available mirrors can be ordered according to their
> distance to the client.
>
> References:
> http://www.movable-type.co.uk/scripts/latlong.html (Haversine formula)
> http://www.movable-type.co.uk/scripts/gis-faq-5.1.html (simple approximation)
>
> (The simple approximation should be suitable insofar as the 180° meridian
> runs pretty much between Alaska and Russia, and apart from islands in the
> Pacific Ocean, all countries should be usefully covered by it. I don't know
> much about the Pacific Ocean, but it is very likely that those clients
> resolve to satellite links anyway, and need to be treated specially.)
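To make the comparison concrete, here is a minimal Python sketch of the simple approximation next to the Haversine formula from the references above. The function names, the (lat, lon) layout and the approximate city coordinates are assumptions made for illustration only; none of this is MirrorBrain code. Note that the planar formula counts a degree of longitude the same as a degree of latitude, so it overstates east-west distances away from the equator. For merely ranking mirrors that may well be acceptable.

```python
import math

def planar_distance(lat1, lon1, lat2, lon2):
    """Pythagorean distance in degrees, as in the proposal's formula."""
    return math.sqrt((lon2 - lon1) ** 2 + (lat2 - lat1) ** 2)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km on a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding errors near antipodal points.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))

def sort_by_distance(client, mirrors, distance=planar_distance):
    """Order mirrors by closeness to the client.

    client  -- (lat, lon) tuple
    mirrors -- list of (name, lat, lon) tuples (hypothetical layout)
    """
    return sorted(
        mirrors,
        key=lambda m: distance(client[0], client[1], m[1], m[2]),
    )

# Illustrative data only: approximate coordinates of a client in Berlin
# and three made-up mirrors named after German cities.
BERLIN = (52.52, 13.40)
MIRRORS = [
    ("munich", 48.14, 11.58),
    ("hamburg", 53.55, 10.00),
    ("nuremberg", 49.45, 11.08),
]

if __name__ == "__main__":
    for name, lat, lon in sort_by_distance(BERLIN, MIRRORS):
        print(name, round(planar_distance(*BERLIN, lat, lon), 2))
```

Passing `distance=haversine_km` to `sort_by_distance` gives the ordering by great-circle distance instead, which makes it easy to check on real mirror data where the two orderings actually differ.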

> There is one problem, though: We use a weighted randomization to assign
> mirrors that would be equally suitable. That is useful (and used) for load
> balancing. If we introduce ordering by geographic distance, the question
> arises how to use that together with mirror priorities. Alternatives /
> thoughts:
> - Should we let geographic distance override priorities? Can the latter
>   become obsolete? I don't really think so.
> - Should we use geographic ordering only in certain scenarios: when no
>   country mirror is available, and when dealing with certain countries (US)?
> - Should we just stratify according to geographic distance, instead of
>   sorting by distance? That is, select the 3 or 5 mirrors closest by
>   distance, and then do the weighted randomization on them to pick one?
> - Should the configured priority value and the measured geographic distance
>   be mixed together into a single number that is then used for mirror
>   picking/ordering?
> - Should we do geographic distance ordering only within a set of mirrors
>   that share the same priority?
> - At any rate, the static assignment of fallback mirrors (through the
>   other_countries field) should have precedence. For instance, Russian
>   clients should still be sent to German mirrors according to the configured
>   country-level fallback, instead of being sent to China merely because it
>   is closer, since China and Russia have little connectivity between them.
> - Since geographic distance itself is only a rough approximation of the
>   suitability of a mirror for a client, certain analogies that one could
>   think of, like "radius to serve", are relatively moot and not worth
>   implementing.
> This needs to be discussed.
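Two of the alternatives above, stratifying by distance before the weighted randomization and ordering by distance only within one priority, can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the dict layout, the `prio` key, the function names and the sample mirrors are all hypothetical, and a higher `prio` is assumed to mean that a mirror should receive more requests.

```python
import math
import random

def planar_distance(client, mirror):
    """Pythagorean distance in degrees between a (lat, lon) client
    and a mirror dict with 'lat' and 'lon' keys."""
    return math.hypot(mirror["lon"] - client[1], mirror["lat"] - client[0])

def pick_stratified(client, mirrors, k=3, rng=random):
    """Keep the k closest mirrors, then pick one of them at random,
    weighted by priority. Assumes priorities are positive numbers."""
    nearest = sorted(mirrors, key=lambda m: planar_distance(client, m))[:k]
    weights = [m["prio"] for m in nearest]
    return rng.choices(nearest, weights=weights, k=1)[0]

def order_within_priority(client, mirrors):
    """Highest priority first; distance only breaks ties between
    mirrors that share the same priority."""
    return sorted(
        mirrors,
        key=lambda m: (-m["prio"], planar_distance(client, m)),
    )

# Illustrative data only: a client in Berlin and four made-up mirrors.
CLIENT = (52.52, 13.40)
MIRRORS = [
    {"name": "hamburg", "lat": 53.55, "lon": 10.00, "prio": 100},
    {"name": "nuremberg", "lat": 49.45, "lon": 11.08, "prio": 50},
    {"name": "munich", "lat": 48.14, "lon": 11.58, "prio": 100},
    {"name": "lisbon", "lat": 38.72, "lon": -9.14, "prio": 200},
]

if __name__ == "__main__":
    print(pick_stratified(CLIENT, MIRRORS, k=3)["name"])
    print([m["name"] for m in order_within_priority(CLIENT, MIRRORS)])
```

The sample data shows the trade-off: with `k=3` the distant high-priority mirror is never picked for this client, whereas ordering within priority puts it first regardless of distance.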

Your comments would be appreciated!


Received on Fri Dec 11 2009 - 20:04:19 GMT
