#jsDisabledContent { display:none; } My Account |  Register |  Help

Geometric median

Article Id: WHEBN0010324687
Reproduction Date:

 Title: Geometric median Author: World Heritage Encyclopedia Language: English Subject: Collection: Publisher: World Heritage Encyclopedia Publication Date:

Geometric median

The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions. It is also known as the 1-median,[1] spatial median,[2] Euclidean minisum point,[2] or Torricelli point.[3]

The geometric median is an important estimator of location in statistics,[4] where it is also known as the L1 estimator.[5] It is also a standard problem in facility location, where it models the problem of locating a facility to minimize the cost of transportation.[6]

The special case of the problem for three points in the plane (that is, m = 3 and n = 2 in the definition below) is sometimes also known as Fermat's problem; it arises in the construction of minimal Steiner trees, and was originally posed as a problem by Pierre de Fermat and solved by Evangelista Torricelli.[7] Its solution is now known as the Fermat point of the triangle formed by the three sample points.[8] The geometric median may in turn be generalized to the problem of minimizing the sum of weighted distances, known as the Weber problem after Alfred Weber's discussion of the problem in his 1909 book on facility location.[2] Some sources instead call Weber's problem the Fermat–Weber problem,[9] but others use this name for the unweighted geometric median problem.[10]

Wesolowsky (1993) provides a survey of the geometric median problem. See Fekete, Mitchell & Beurer (2005) for generalizations of the problem to non-discrete point sets.

Contents

• Definition 1
• Properties 2
• Special cases 3
• Computation 4
• Characterization of the geometric median 5
• Generalizations 6
• Notes 7
• References 8

Definition

Formally, for a given set of m points x_1, x_2, \dots, x_m\, with each x_i \in \mathbb{R}^n, the geometric median is defined as

Geometric Median =\underset{y \in \mathbb{R}^n}{\operatorname{arg\,min}} \sum_{i=1}^m \left \| x_i-y \right \|_2

Note that argmin means the value of the argument y which minimizes the sum. In this case, it is the point y from where the sum of all Euclidean distances to the x_i's is minimum.

Properties

• For the 1-dimensional case, the geometric median coincides with the median. This is because the univariate median also minimizes the sum of distances from the points.[11]
• The geometric median is unique whenever the points are not collinear.[12]
• The geometric median is equivariant for Euclidean similarity transformations, including translation and rotation.[11][5] This means that one would get the same result either by transforming the geometric median, or by applying the same transformation to the sample data and finding the geometric median of the transformed data. This property follows from the fact that the geometric median is defined only from pairwise distances, and doesn't depend on the system of orthogonal Cartesian coordinates by which the sample data is represented. In contrast, the component-wise median for a multivariate data set is not in general rotation invariant, nor is it independent of the choice of coordinates.[5]
• The geometric median has a breakdown point of 0.5.[5] That is, up to half of the sample data may be arbitrarily corrupted, and the median of the samples will still provide a robust estimator for the location of the uncorrupted data.

Special cases

• For 3 (non-collinear) points, if any angle of the triangle formed by those points is 120° or more, then the geometric median is the point making that angle. If all the angles are less than 120°, the geometric median is the point inside the triangle which subtends an angle of 120° to each three pairs of triangle vertices.[11] This is also known as the Fermat point of the triangle formed by the three vertices. (If the three points are collinear then the geometric median is the point between the two other points, as is the case with a one-dimensional median.)
• For 4 coplanar points, if one of the four points is inside the triangle formed by the other three points, then the geometric median is that point. Otherwise, the four points form a convex quadrilateral and the geometric median is the crossing point of the diagonals of the quadrilateral. The geometric median of four coplanar points is the same as the unique Radon point of the four points.[13]

Computation

Despite the geometric median's being an easy-to-understand concept, computing it poses a challenge. The centroid or center of mass, defined similarly to the geometric median as minimizing the sum of the squares of the distances to each point, can be found by a simple formula — its coordinates are the averages of the coordinates of the points — but no such formula is known for the geometric median, and it has been shown that no explicit formula, nor an exact algorithm involving only arithmetic operations and kth roots can exist in general. Therefore only numerical or symbolic approximations to the solution of this problem are possible under this model of computation.[14]

However, it is straightforward to calculate an approximation to the geometric median using an iterative procedure in which each step produces a more accurate approximation. Procedures of this type can be derived from the fact that the sum of distances to the sample points is a convex function, since the distance to each sample point is convex and the sum of convex functions remains convex. Therefore, procedures that decrease the sum of distances at each step cannot get trapped in a local optimum.

One common approach of this type, called Weiszfeld's algorithm after the work of Endre Weiszfeld,[15] is a form of iteratively re-weighted least squares. This algorithm defines a set of weights that are inversely proportional to the distances from the current estimate to the samples, and creates a new estimate that is the weighted average of the samples according to these weights. That is,

\left. y_{i+1}=\left( \sum_{j=1}^m \frac{x_j}{\| x_j - y_i \|} \right) \right/ \left( \sum_{j=1}^m \frac{1}{\| x_j - y_i \|} \right).

This method converges for almost all initial positions, but may fail to converge when one of its estimates falls on one of the given points. It can be modified to handle these cases so that it converges for all initial points.[12]

Bose, Maheshwari & Morin (2003) describe more sophisticated geometric optimization procedures for finding approximately optimal solutions to this problem. As Nie, Parrilo & Sturmfels (2008) show, the problem can also be represented as a semidefinite program.

Characterization of the geometric median

If y is distinct from all the given points, xj, then y is the geometric median if and only if it satisfies:

0 = \sum_{j=1}^m \frac {x_j - y} {\left \| x_j - y \right \|}.

This is equivalent to:

\left. y = \left( \sum_{j=1}^m \frac{x_j}{\| x_j - y \|} \right) \right/ \left( \sum_{j=1}^m \frac{1}{\| x_j - y \|} \right),

which is closely related to Weiszfeld's algorithm.

In general, y is the geometric median if and only if there are vectors uj such that:

0 = \sum_{j=1}^m u_j

where for xjy,

u_j = \frac {x_j - y} {\left \| x_j - y \right \|}

and for xj = y,

\| u_j \| \leq 1 .

An equivalent formulation of this condition is

\sum _{1\le j\le m, x_j\ne y} \frac {x_j - y} {\left \| x_j - y \right \|} \le \left|\{ \,j\mid 1\le j\le m, x_j= y\,\}\right|.

Generalizations

The geometric median can be generalized from Euclidean spaces to general Riemannian manifolds (and even metric spaces) using the same idea which is used to define the Fréchet mean on a Riemannian manifold.[16] Let M be a Riemannian manifold with corresponding distance function d(\cdot, \cdot), let w_1, \ldots, w_n be n weights summing to 1, and let x_1, \ldots, x_n be n observations from M. Then we define the weighted geometric median m (or weighted Fréchet median) of the data points as

m = \underset{x \in M}{\operatorname{arg\,min}} \sum_{i=1}^n w_i d(x,x_i) .

If all the weights are equal, we say simply that m is the geometric median.

Notes

1. ^ The more general k-median problem asks for the location of k cluster centers minimizing the sum of distances from each sample point to its nearest center.
2. ^ a b c Drezner et al. (2002)
3. ^ Cieslik (2006).
4. ^ Lawera & Thompson (1993).
5. ^ a b c d Lopuhaä & Rousseeuw (1991)
6. ^ Eiselt & Marianov (2011).
7. ^ Krarup & Vajda (1997).
8. ^ Spain (1996).
9. ^ Brimberg (1995).
10. ^ Bose, Maheshwari & Morin (2003).
11. ^ a b c Haldane (1948)
12. ^ a b Vardi & Zhang (2000)
13. ^ Cieslik (2006), p. 6; Plastria (2006). The convex case was originally proven by Giovanni Fagnano.
14. ^ Bajaj (1986); Bajaj (1988). Earlier, Cockayne & Melzak (1969) proved that the Steiner point for 5 points in the plane cannot be constructed with ruler and compass
15. ^ Weiszfeld (1937); Kuhn (1973); Chandrasekaran & Tamir (1989).
16. ^ Fletcher, Venkatasubramanian & Joshi (2009).

References

• Bose, Prosenjit; Maheshwari, Anil; Morin, Pat (2003). "Fast approximations for sums of distances, clustering and the Fermat–Weber problem". Computational Geometry: Theory and Applications 24 (3): 135–146.
• Brimberg, J. (1995). "The Fermat–Weber location problem revisited".
• Chandrasekaran, R.; Tamir, A. (1989). "Open questions concerning Weiszfeld's algorithm for the Fermat-Weber location problem".
• Cieslik, Dietmar (2006). Shortest Connectivity: An Introduction with Applications in Phylogeny. Combinatorial Optimization 17. Springer. p. 3.
• Cockayne, E. J.; Melzak, Z. A. (1969). "Euclidean constructability in graph minimization problems".
• Drezner, Zvi; Klamroth, Kathrin; Schöbel, Anita; Wesolowsky, George O. (2002). "Facility Location: Applications and Theory". Springer, Berlin. pp. 1–36.
• Eiselt, H. A.; Marianov, Vladimir (2011). Foundations of Location Analysis. 155series=International Series in Operations Research & Management Science. Springer. p. 6.
• Fekete, Sándor P.;
• Fletcher, P. Thomas; Venkatasubramanian, Suresh; Joshi, Sarang (2009). "The geometric median on Riemannian manifolds with application to robust atlas estimation". Neuroimage 45 (1 Suppl): s143–s152.
• Krarup, Jakob; Vajda, Steven (1997). "On Torricelli's geometrical solution to a problem of Fermat". IMA Journal of Mathematics Applied in Business and Industry 8 (3): 215–224.
• Lawera, Martin; Thompson, James R. (1993). "Proceedings of the 38th Conference on the Design of Experiments". U.S. Army Research Office Report 93–2. pp. 99–126.
• Lopuhaä, Hendrick P.;
• Nie, Jiawang; Parrilo, Pablo A.;
• Ostresh, L. (1978). "Convergence of a Class of Iterative Methods for Solving Weber Location Problem".
• .
• Spain, P. G. (1996). "The Fermat Point of a Triangle". Mathematics Magazine 69 (2): 131–133.
• Vardi, Yehuda; Zhang, Cun-Hui (2000). "The multivariate L1-median and associated data depth". Proceedings of the National Academy of Sciences of the United States of America 97 (4): 1423–1426 (electronic).
• Wesolowsky, G. (1993). "The Weber problem: History and perspective". Location Science 1: 5–23.
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and USA.gov, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for USA.gov and content contributors is made possible from the U.S. Congress, E-Government Act of 2002.

Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.