Sampling Theory
- Motivation and Basic Concepts
The state space for motion planning, C, is uncountably infinite, yet a sampling-based planning algorithm can consider at most a countable number of samples. If the algorithm runs forever, this may be countably infinite, but in practice we expect it to terminate early after only considering a finite number of samples. This mismatch between the cardinality of C and the set that can be probed by an algorithm motivates careful consideration of sampling techniques. Once the sampling component has been defined, discrete planning methods from Chapter 2 may be adapted to the current setting. Their performance, however, hinges on the way the C-space is sampled.
Since sampling-based planning algorithms are often terminated early, the particular order in which samples are chosen becomes critical. Therefore, a distinction is made between a sample set and a sample sequence. A unique sample set can always be constructed from a sample sequence, but many alternative sequences can be constructed from one sample set.
Denseness Consider constructing an infinite sample sequence over C. What would be some desirable properties for this sequence? It would be nice if the sequence eventually reached every point in C, but this is impossible because C is uncountably infinite. Strangely, it is still possible for a sequence to get arbitrarily close to every element of C (assuming C ⊆ R^m). In topology, this is the notion of denseness. Let U and V be any subsets of a topological space. The set U is said to be dense in V if cl(U) = V (recall the closure of a set from Section 4.1.1). This means adding the boundary points to U produces V. A simple example is that (0, 1) ⊂ R is dense in [0, 1] ⊂ R. A more interesting example is that the set Q of rational numbers is both countable and dense in R. Think about why. For any real number, such as π ∈ R, there exists a sequence of fractions that converges to it. This sequence of fractions must be a subset of Q. A sequence (as opposed to a set) is called dense if its underlying set is dense. The bare minimum for sampling methods is that they produce a dense sequence. Stronger requirements, such as uniformity and regularity, will be explained shortly.
A random sequence is probably dense Suppose that C = [0, 1]. One of the simplest ways conceptually to obtain a dense sequence is to pick points at random. Suppose I ⊂ [0, 1] is an interval of length ε. If k samples are chosen independently at random,3 the probability that none of them falls into I is (1 − ε)^k. As k approaches infinity, this probability converges to zero. This means that the probability that any nonzero-length interval in [0, 1] contains no points converges to zero. One small technicality exists. The infinite sequence of independently, randomly chosen points is only dense with probability one, which is not the same as being guaranteed. This is one of the strange outcomes of dealing with uncountably infinite sets in probability theory. For example, if a number in [0, 1] is chosen at random, the probability that π/4 is chosen is zero; however, it is still possible. (The probability is just the Lebesgue measure, which is zero for a set of measure zero.) For motion planning purposes, this technicality has no practical implications; however, if k is not very large, then it might be frustrating to obtain only probabilistic assurances, as opposed to absolute guarantees of coverage. The next sequence is guaranteed to be dense because it is deterministic.
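As a quick numerical illustration of the (1 − ε)^k argument (a sketch added here, not part of the original text), the miss probability can be evaluated directly:

```python
# (1 - eps)^k: probability that k independent uniform samples on [0, 1]
# all miss a fixed interval I of length eps.
eps = 0.01
for k in (10, 100, 1000, 10000):
    print(k, (1 - eps) ** k)
# Already at k = 1000 the miss probability is about 4.3e-05; it decays
# exponentially in k, which is why a random sequence is dense with
# probability one.
```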
The van der Corput sequence A beautiful yet underutilized sequence was published in 1935 by van der Corput, a Dutch mathematician [952]. It exhibits many ideal qualities for applications. At the same time, it is based on a simple idea. Unfortunately, it is only defined for the unit interval. The quest to extend many of its qualities to higher dimensional spaces motivates the formal quality measures and sampling techniques in the remainder of this section.
To explain the van der Corput sequence, let C = [0, 1]/∼, in which 0 ∼ 1, which can be interpreted as SO(2). Suppose that we want to place 16 samples in C. An ideal choice is the set S = {i/16 | 0 ≤ i < 16}, which evenly spaces the points at intervals of length 1/16. This means that no point in C is further than 1/32 from the nearest sample. What if we want to make S into a sequence? What is the best ordering? What if we are not even sure that 16 points are sufficient? Maybe 16 is too few or even too many.
The first two columns of Figure 5.2 show a naive attempt at making S into a sequence by sorting the points by increasing value. The problem is that after i = 8, half of C has been neglected. It would be preferable to have a nice covering of C for any i. Van der Corput's clever idea was to reverse the order of the bits when the sequence is represented with binary decimals. In the original sequence, the most significant bit toggles only once, whereas the least significant bit toggles in every step. By reversing the bits, the most significant bit toggles in every step, which means that the sequence alternates between the lower and upper halves of C. The third and fourth columns of Figure 5.2 show the original and reversed-order binary representations. The resulting sequence dances around [0, 1]/∼ in a nice way, as shown in the last two columns of Figure 5.2. Let ν(i) denote the ith point of the van der Corput sequence.
3See Section 9.1.2 for a review of probability theory.
i  | Naive Sequence | Binary | Reverse Binary | Van der Corput Points in [0, 1]/∼
1  | 0              | .0000  | .0000          | 0
2  | 1/16           | .0001  | .1000          | 1/2
3  | 1/8            | .0010  | .0100          | 1/4
4  | 3/16           | .0011  | .1100          | 3/4
5  | 1/4            | .0100  | .0010          | 1/8
6  | 5/16           | .0101  | .1010          | 5/8
7  | 3/8            | .0110  | .0110          | 3/8
8  | 7/16           | .0111  | .1110          | 7/8
9  | 1/2            | .1000  | .0001          | 1/16
10 | 9/16           | .1001  | .1001          | 9/16
11 | 5/8            | .1010  | .0101          | 5/16
12 | 11/16          | .1011  | .1101          | 13/16
13 | 3/4            | .1100  | .0011          | 3/16
14 | 13/16          | .1101  | .1011          | 11/16
15 | 7/8            | .1110  | .0111          | 7/16
16 | 15/16          | .1111  | .1111          | 15/16
Figure 5.2: The van der Corput sequence is obtained by reversing the bits in the binary decimal representation of the naive sequence.
In contrast to the naive sequence, each ν(i) lies far away from ν(i + 1). Furthermore, the first i points of the sequence, for any i, provide reasonably uniform coverage of C. When i is a power of 2, the points are perfectly spaced. For other i, the coverage is still good in the sense that the number of points that appear in any interval of length l is roughly il. For example, when i = 10, every interval of length 1/2 contains roughly 5 points.
The length, 16, of the naive sequence is actually not important. If instead 8 is used, the same ν(1), . . ., ν(8) are obtained. Observe in the reverse binary column of Figure 5.2 that this amounts to removing the last zero from each binary decimal representation, which does not alter their values. If 32 is used for the naive sequence, then the same ν(1), . . ., ν(16) are obtained, and the sequence continues nicely from ν(17) to ν(32). To obtain the van der Corput sequence from ν(33) to ν(64), six-bit sequences are reversed (corresponding to the case in which the naive sequence has 64 points). The process repeats to produce an infinite sequence that does not require a fixed number of points to be specified a priori. In addition to the nice uniformity properties for every i, the infinite van der Corput sequence is also dense in [0, 1]/∼. This implies that every open subset must contain at least one sample.
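The bit-reversal construction is easy to implement. The following minimal Python sketch (not from the original text; the function name is illustrative) computes ν(i) by reversing the binary digits of i − 1 about the radix point:

```python
def van_der_corput(i):
    """Return nu(i), the ith van der Corput point in [0, 1), obtained by
    reversing the binary digits of i - 1 about the radix point."""
    n = i - 1               # the naive sequence starts at 0, so nu(1) = 0
    point, scale = 0.0, 0.5
    while n > 0:
        point += scale * (n & 1)   # least significant bit becomes most significant
        n >>= 1
        scale *= 0.5
    return point

# Reproduces the last column of Figure 5.2:
print([van_der_corput(i) for i in range(1, 9)])
# [0.0, 0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```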
You have now seen ways to generate nice samples in a unit interval both randomly and deterministically. Sections 5.2.2–5.2.4 explain how to generate dense samples with nice properties in the complicated spaces that arise in motion planning.
- Random Sampling
Now imagine moving beyond [0, 1] and generating a dense sample sequence for any bounded C-space, C ⊆ R^m. In this section the goal is to generate uniform random samples. This means that the probability density function p(q) over C is uniform.
Wherever relevant, it will also mean that the probability density is consistent with the Haar measure. We will not allow any artificial bias to be introduced by selecting a poor parameterization. For example, picking uniform random Euler angles does not lead to uniform random samples over SO(3). However, picking uniform random unit quaternions works perfectly because quaternions use the same parameterization as the Haar measure; both choose points on S3.
Random sampling is the easiest of all sampling methods to apply to C-spaces.
One of the main reasons is that C-spaces are formed from Cartesian products, and independent random samples extend easily across these products. If X = X1 × X2, and uniform random samples x1 and x2 are taken from X1 and X2, respectively, then (x1, x2) is a uniform random sample for X. This is very convenient in implementations. For example, suppose the motion planning problem involves 15 robots that each translate for any (xt, yt) ∈ [0, 1]^2; this yields C = [0, 1]^30. In this case, 30 points can be chosen uniformly at random from [0, 1] and combined into a 30-dimensional vector. Samples generated this way are uniformly randomly distributed over C. Combining samples over Cartesian products is much more difficult for nonrandom (deterministic) methods, which are presented in Sections 5.2.3 and 5.2.4.
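Because uniform samples extend across Cartesian products, the implementation is trivial; a minimal sketch for the 15-robot example (using Python's standard random module):

```python
import random

# C = [0, 1]^30: 15 translating robots, each contributing (x_t, y_t).
# Independent uniform samples on each factor yield a uniform sample on C.
q = [random.random() for _ in range(30)]
```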
Generating a random element of SO(3) One has to be very careful about sampling uniformly over the space of rotations. The probability density must correspond to the Haar measure, which means that a random rotation should be obtained by picking a point at random on S3 and forming the unit quaternion. An extremely clever way to sample SO(3) uniformly at random is given in [883] and is reproduced here. Choose three points u1, u2, u3 ∈ [0, 1] uniformly at random. A uniform, random quaternion is given by the simple expression

h = (√(1 − u1) sin 2πu2, √(1 − u1) cos 2πu2, √u1 sin 2πu3, √u1 cos 2πu3). (5.15)
A full explanation of the method is given in [883], and a brief intuition is given here. First drop down a dimension and pick u1, u2 ∈ [0, 1] to generate points on S2. Let u1 represent the value for the third coordinate, (0, 0, u1) ∈ R3. The slice of points on S2 for which u1 is fixed for 0 < u1 < 1 yields a circle on S2 that corresponds to some line of latitude on S2. The second parameter selects the longitude, 2πu2. Fortunately, the points are uniformly distributed over S2. Why? Imagine S2 as the crust on a spherical loaf of bread that is run through a bread slicer. The slices are cut in a direction parallel to the equator and are of equal thickness. The crusts of each slice have equal area; therefore, the points are uniformly distributed. The method proceeds by using the fact that S3 can be partitioned into a spherical arrangement of circles (known as the Hopf fibration); there is an S1 copy for each point in S2. The method above is used to provide a random point on S2 using u2 and u3, and u1 produces a random point on S1; they are carefully combined in (5.15) to yield a random rotation. To respect the antipodal identification for rotations, any quaternion h found in the lower hemisphere (i.e., a < 0) can be negated to yield −h. This does not distort the uniform random distribution of the samples.
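A direct implementation of (5.15) takes only a few lines. The sketch below is illustrative (the function name is not from the text); it also applies the negation step described above for the antipodal identification:

```python
import math
import random

def random_quaternion():
    """Uniform random unit quaternion h = (a, b, c, d) via (5.15),
    from three uniform samples u1, u2, u3 in [0, 1]."""
    u1, u2, u3 = random.random(), random.random(), random.random()
    h = (math.sqrt(1 - u1) * math.sin(2 * math.pi * u2),
         math.sqrt(1 - u1) * math.cos(2 * math.pi * u2),
         math.sqrt(u1) * math.sin(2 * math.pi * u3),
         math.sqrt(u1) * math.cos(2 * math.pi * u3))
    # Respect the antipodal identification for rotations: negate h when
    # the first component a < 0; this preserves uniformity.
    return tuple(-x for x in h) if h[0] < 0 else h
```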
Generating random directions Some sampling-based algorithms require choosing motion directions at random.4 From a configuration q, the possible directions of motion can be imagined as being distributed around a sphere. In an (n + 1)-dimensional C-space, this corresponds to sampling on Sn. For example, choosing a direction in R2 amounts to picking an element of S1; this can be parameterized as θ ∈ [0, 2π]/∼. If n = 4, then the previously mentioned trick for SO(3) should be used. If n = 3 or n > 4, then samples can be generated using a slightly more expensive method that exploits spherical symmetries of the multidimensional Gaussian density function [341]. The method is explained for Rn+1; boundaries and identifications must be taken into account for other spaces. For each of the n + 1 coordinates, generate a sample ui from a zero-mean Gaussian distribution with the same variance for each coordinate. Following from the Central Limit Theorem, ui can be approximately obtained by generating k samples at random over [−1, 1] and adding them (k ≥ 12 is usually sufficient in practice). The vector (u1, u2, . . . , un+1) gives a random direction in Rn+1 because each ui was obtained independently, and the level sets of the resulting probability density function are spheres. We did not use uniform random samples for each ui because this would bias the directions toward the corners of a cube; instead, the Gaussian yields spherical symmetry. The final step is to normalize the vector by taking ui/‖u‖ for each coordinate.
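A minimal sketch of the Gaussian method (the function name is illustrative; here Python's random.gauss is used directly rather than the sum-of-uniforms approximation mentioned above):

```python
import math
import random

def random_direction(dim):
    """Uniform random direction on S^(dim-1), i.e., a unit vector in
    R^dim: sample each coordinate from a zero-mean Gaussian, then
    normalize so that the level-set spherical symmetry yields uniformity."""
    u = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]   # u_i / ||u|| for each coordinate
```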
Pseudorandom number generation Although there are advantages to uniform random sampling, there are also several disadvantages. This motivates the consideration of deterministic alternatives. Since there are trade-offs, it is important to understand how to use both kinds of sampling in motion planning. One of the first issues is that computer-generated numbers are not random.5 A pseudorandom number generator is usually employed, which is a deterministic method that simulates the behavior of randomness. Since the samples are not truly random, the advantage of extending the samples over Cartesian products does not necessarily hold. Sometimes problems are caused by unforeseen deterministic dependencies. One of the best pseudorandom number generators for avoiding such troubles is the Mersenne twister [684], for which implementations can be found on the Internet.

4The directions will be formalized in Section 8.3.2 when smooth manifolds are introduced. In that case, the directions correspond to the set of possible velocities that have unit magnitude. Presently, the notion of a direction is only given informally.

5There are exceptions, which use physical phenomena as a random source [808].
To help see the general difficulties, the classical linear congruential pseudorandom number generator is briefly explained [619, 738]. The method uses three integer parameters, M, a, and c, which are chosen by the user. The first two, M and a, must be relatively prime, meaning that gcd(M, a) = 1. The third parameter, c, must be chosen to satisfy 0 ≤ c < M. Using modular arithmetic, a sequence can be generated as

y_{i+1} = (a y_i + c) mod M, (5.16)

by starting with some arbitrary seed 1 ≤ y0 ≤ M. Pseudorandom numbers in [0, 1] are generated by the sequence

x_i = y_i/M. (5.17)

The sequence is periodic; therefore, M is typically very large (e.g., M = 2^31 − 1). Due to periodicity, there are potential problems of regularity appearing in the samples, especially when applied across a Cartesian product to generate points in Rn. Particular values must be chosen for the parameters, and statistical tests are used to evaluate the samples either experimentally or theoretically [738].
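A sketch of (5.16) and (5.17) follows; the parameters shown are the classical "minimal standard" choice (M = 2^31 − 1, a = 16807, c = 0), which satisfies the constraints above, but any valid user-chosen values work the same way:

```python
def lcg(y0, a=16807, c=0, M=2**31 - 1):
    """Linear congruential generator: yields x_i = y_i / M in [0, 1],
    with y_{i+1} = (a*y_i + c) mod M, per (5.16) and (5.17)."""
    y = y0
    while True:
        y = (a * y + c) % M
        yield y / M

gen = lcg(y0=12345)
samples = [next(gen) for _ in range(5)]
```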
Testing for randomness Thus, it is important to realize that even the "random" samples are deterministic. They are designed to optimize performance on statistical tests. Many sophisticated statistical tests of uniform randomness are used. One of the simplest, the chi-square test, is described here. This test measures how far computed statistics are from their expected value. As a simple example, suppose C = [0, 1]^2 and is partitioned into a 10 by 10 array of 100 square boxes. If a set P of k samples is chosen at random, then intuitively each box should receive roughly k/100 of the samples. An error function can be defined to measure how far from true this intuition is:

e(P) = Σ_{i=1}^{100} (b_i − k/100)^2, (5.18)

in which b_i is the number of samples that fall into box i. It is shown [521] that e(P) follows a chi-squared distribution. A surprising fact is that the goal is not to minimize e(P). If the error is too small, we would declare that the samples are too uniform to be random! Imagine k = 1,000,000 and exactly 10,000 samples appear in each of the 100 boxes. This yields e(P) = 0, but how likely is this to ever occur? The error must generally be larger (it appears in many statistical tables) to account for the irregularity due to randomness.
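Computing the error in (5.18) takes only a few lines; a sketch (function name illustrative, assuming points are stored as (x, y) pairs in [0, 1]^2):

```python
def chi_square_error(P):
    """e(P) from (5.18): bin the k samples into a 10-by-10 grid over
    [0, 1]^2 and sum the squared deviations from the expected count k/100."""
    k = len(P)
    b = [0] * 100
    for (x, y) in P:
        col = min(int(10 * x), 9)   # clamp x = 1.0 into the last box
        row = min(int(10 * y), 9)
        b[10 * row + col] += 1
    return sum((bi - k / 100) ** 2 for bi in b)
```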
This irregularity can be observed in terms of Voronoi diagrams, as shown in Figure 5.3. The Voronoi diagram partitions R2 into regions based on the samples. Each sample x has an associated Voronoi region Vor(x). For any point y ∈ Vor(x), x is the closest sample to y using Euclidean distance. The different sizes and shapes of these regions give some indication of the required irregularity of random sampling. This irregularity may be undesirable for sampling-based motion planning and is somewhat repaired by the deterministic sampling methods of Sections 5.2.3 and 5.2.4 (however, these methods also have drawbacks).

Figure 5.3: Irregularity in a collection of (pseudo)random samples can be nicely observed with Voronoi diagrams: (a) 196 pseudorandom samples; (b) 196 pseudorandom samples.
- Low-Dispersion Sampling
This section describes an alternative to random sampling. Here, the goal is to optimize a criterion called dispersion [738]. Intuitively, the idea is to place samples in a way that makes the largest uncovered area be as small as possible. This generalizes the idea of grid resolution. For a grid, the resolution may be selected by defining the step size for each axis. As the step size is decreased, the resolution increases. If a grid-based motion planning algorithm can increase the resolution arbitrarily, it becomes resolution complete. Using the concepts in this section, it may instead reduce its dispersion arbitrarily to obtain a resolution complete algorithm. Thus, dispersion can be considered as a powerful generalization of the notion of "resolution."
Dispersion definition The dispersion6 of a finite set P of samples in a metric space (X, ρ) is7
δ(P) = sup_{x∈X} min_{p∈P} ρ(x, p). (5.19)
6The definition is unfortunately backward from intuition. Lower dispersion means that the points are nicely dispersed. Thus, more dispersion is bad, which is counterintuitive.
7The sup represents the supremum, which is the least upper bound. If X is closed, then the sup becomes a max. See Section 9.1.1 for more details.
Figure 5.4: Reducing the dispersion means reducing the radius of the largest empty ball: (a) L2 dispersion; (b) L∞ dispersion.

Figure 5.5: The Sukharev grid and a nongrid lattice: (a) 196-point Sukharev grid; (b) 196 lattice points.
Figure 5.4 gives an interpretation of the definition for two different metrics. An alternative way to consider dispersion is as the radius of the largest empty ball (for the L∞ metric, the balls are actually cubes). Note that at the boundary of X (if it exists), the empty ball becomes truncated because it cannot exceed the boundary. There is also a nice interpretation in terms of Voronoi diagrams. Figure 5.3 can be used to help explain L2 dispersion in R2. The Voronoi vertices are the points at which three or more Voronoi regions meet. These are points in C for which the nearest sample is far. An open, empty disc can be placed at any Voronoi vertex, with a radius equal to the distance to the three (or more) closest samples. The radius of the largest disc among those placed at all Voronoi vertices is the dispersion. This interpretation also extends nicely to higher dimensions.
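Evaluating (5.19) exactly requires the Voronoi diagram, but a simple approximation is to take the supremum over a fine auxiliary grid of test points. A minimal sketch for the L2 metric on [0, 1]^2 (function name illustrative; the estimate underestimates the true supremum by at most roughly the test-grid spacing):

```python
import math

def dispersion_estimate(P, m=200):
    """Approximate the L2 dispersion of a point set P in [0, 1]^2 by
    taking the sup in (5.19) over an (m+1)-by-(m+1) grid of test points."""
    worst = 0.0
    for i in range(m + 1):
        for j in range(m + 1):
            x = (i / m, j / m)
            nearest = min(math.dist(x, p) for p in P)  # min over p in P
            worst = max(worst, nearest)                # sup over test points
    return worst
```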
Making good grids Optimizing dispersion forces the points to be distributed more uniformly over C. This causes them to fail statistical tests, but the point distribution is often better for motion planning purposes. Consider the best way to reduce dispersion if ρ is the L∞ metric and X = [0, 1]^n. Suppose that the number of samples, k, is given. Optimal dispersion is obtained by partitioning [0, 1]^n into a grid of cubes and placing a point at the center of each cube, as shown for n = 2 and k = 196 in Figure 5.5a. The number of cubes per axis must be ⌊k^{1/n}⌋, in which ⌊·⌋ denotes the floor. If k^{1/n} is not an integer, then there are leftover points that may be placed anywhere without affecting the dispersion. Notice that k^{1/n} just gives the number of points per axis for a grid of k points in n dimensions. The resulting grid will be referred to as a Sukharev grid [922].
The dispersion obtained by the Sukharev grid is the best possible. Therefore, a useful lower bound can be given for any set P of k samples [922]:

δ(P) ≥ 1/(2 ⌊k^{1/d}⌋). (5.20)

This implies that keeping the dispersion fixed requires exponentially many points in the dimension, d.
At this point you might wonder why L∞ was used instead of L2, which seems more natural. This is because the L2 case is extremely difficult to optimize (except in R2, where a tiling of equilateral triangles can be made, with a point in the center of each one). Even the simple problem of determining the best way to distribute a fixed number of points in [0, 1]^3 is unsolved for most values of k. See [241] for extensive treatment of this problem.
Suppose now that other topologies are considered instead of [0, 1]^n. Let X = [0, 1]^n/∼, in which the identification produces a torus. The situation is quite different because X no longer has a boundary. The Sukharev grid still produces optimal dispersion, but it can also be shifted without increasing the dispersion. In this case, a standard grid may also be used, which has the same number of points as the Sukharev grid but is translated to the origin. Thus, the first grid point is (0, 0), which is actually the same as 2^n − 1 other points by identification. If X represents a cylinder and the number of points, k, is given, then it is best to just use the Sukharev grid. It is possible, however, to shift each coordinate that behaves like S1. If X is rectangular but not a square, a good grid can still be made by tiling the space with cubes. In some cases this will produce optimal dispersion.
For complicated spaces such as SO(3), no grid exists in the sense defined so far. It is possible, however, to generate grids on the faces of an inscribed Platonic solid and lift the samples to Sn with relatively little distortion [987]. For example, to sample S2, Sukharev grids can be placed on each face of a cube. These are lifted to obtain the warped grid shown in Figure 5.6a.
Example 5.15 (Sukharev Grid) Suppose that n = 2 and k = 9. If X = [0, 1]^2, then the Sukharev grid yields points for the nine cases in which either coordinate may be 1/6, 1/2, or 5/6. The L∞ dispersion is 1/6. The spacing between the points along each axis is 1/3, which is twice the dispersion. If instead X = [0, 1]^2/∼, which represents a torus, then the nine points may be shifted to yield the standard grid. In this case each coordinate may be 0, 1/3, or 2/3. The dispersion and spacing between the points remain unchanged.

Figure 5.6: (a) A distorted grid can even be placed over spheres and SO(3) by putting grids on the faces of an inscribed cube and lifting them to the surface [987]. (b) A lattice can be considered as a grid in which the generators are not necessarily orthogonal.
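A short sketch that generates a Sukharev grid over [0, 1]^n (function name illustrative; leftover points beyond a full grid are simply omitted here). For k = 9 and n = 2 it reproduces the coordinates 1/6, 1/2, 5/6 of Example 5.15:

```python
from itertools import product

def sukharev_grid(k, n):
    """Centers of a grid of cubes in [0, 1]^n with floor(k^(1/n)) points
    per axis, e.g., axis values 1/6, 1/2, 5/6 for k = 9, n = 2."""
    # round(..., 10) guards against floating-point error in k**(1/n)
    # before truncating, so perfect powers are not accidentally floored.
    per_axis = int(round(k ** (1.0 / n), 10))
    axis = [(2 * i + 1) / (2 * per_axis) for i in range(per_axis)]
    return list(product(axis, repeat=n))
```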
One nice property of grids is that they have a lattice structure. This means that neighboring points can be obtained very easily by adding or subtracting vectors. Let gj be an n-dimensional vector called a generator. A point on a lattice can be expressed as

x = Σ_{j=1}^{n} k_j g_j (5.21)

for n independent generators, as depicted in Figure 5.6b. In a 2D grid, the generators represent "up" and "right." If X = [0, 100]^2 and a standard grid with integer spacing is used, then the neighbors of the point (50, 50) are obtained by adding (0, 1), (0, −1), (−1, 0), or (1, 0). In a general lattice, the generators need not be orthogonal. An example is shown in Figure 5.5b. In Section 5.4.2, lattice structure will become important and convenient for defining the search graph.
Infinite grid sequences Now suppose that the number, k, of samples is not given. The task is to define an infinite sequence that has the nice properties of the van der Corput sequence but works for any dimension. This will become the notion of a multi-resolution grid. The resolution can be iteratively doubled. For a multi-resolution standard grid in Rn, the sequence will first place one point at the origin. After 2^n points have been placed, there will be a grid with two points per axis. After 4^n points, there will be four points per axis. Thus, after 2^{ni} points for any positive integer i, a grid with 2^i points per axis will be represented. If only complete grids are allowed, then it becomes clear why they appear inappropriate for high-dimensional problems. For example, if n = 10, then full grids appear after 1, 2^{10}, 2^{20}, 2^{30}, and so on, samples. Each doubling in resolution multiplies the number of points by 2^n. Thus, to use grids in high dimensions, one must be willing to accept partial grids and define an infinite sequence that places points in a nice way.
The van der Corput sequence can be extended in a straightforward way as follows. Suppose X = T2 = [0, 1]^2/∼. The original van der Corput sequence started by counting in binary. The least significant bit was used to select which half of [0, 1] was sampled. In the current setting, the two least significant bits can be used to select the quadrant of [0, 1]^2. The next two bits can be used to select the quadrant within the quadrant. This procedure continues recursively to obtain a complete grid after k = 2^{2i} points, for any positive integer i. For any k, however, there is only a partial grid. The points are distributed with optimal L∞ dispersion.

This same idea can be applied in dimension n by using n bits at a time from the binary sequence to select the orthant (n-dimensional quadrant). Many other orderings produce L∞-optimal dispersion. Selecting orderings that additionally optimize other criteria, such as discrepancy or L2 dispersion, are covered in [639, 644]. Unfortunately, it is more difficult to make a multi-resolution Sukharev grid. The base becomes 3 instead of 2; after every 3^{ni} points a complete grid is obtained. For example, in one dimension, the first point appears at 1/2. The next two points appear at 1/6 and 5/6. The next complete one-dimensional grid appears after there are 9 points.
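A sketch of this ordering (one of the many orderings that achieve optimal L∞ dispersion; the function name is illustrative), consuming n bits of the sample index at a time, least significant group first:

```python
def grid_sequence_point(i, n):
    """ith point (from i = 0) of a multi-resolution standard grid on the
    n-torus [0, 1)^n/~: each group of n bits of i selects the orthant at
    the next resolution level. For n = 1 this is the van der Corput sequence."""
    point = [0.0] * n
    scale = 0.5
    while i > 0:
        for j in range(n):            # one bit per coordinate at this level
            point[j] += scale * (i & 1)
            i >>= 1
        scale *= 0.5
    return point

# First four points for n = 2 visit all four quadrants:
print([grid_sequence_point(i, 2) for i in range(4)])
# [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]]
```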
Dispersion bounds Since the sample sequence is infinite, it is interesting to consider asymptotic bounds on dispersion. It is known that for X = [0, 1]^n and any Lp metric, the best possible asymptotic dispersion is O(k^{−1/n}) for k points and n dimensions [738]. In this expression, k is the variable in the limit and n is treated as a constant. Therefore, any function of n may appear as a constant (i.e., O(f(n) k^{−1/n}) = O(k^{−1/n}) for any positive f(n)). An important practical consideration is the size of f(n) in the asymptotic analysis. For example, for the van der Corput sequence from Section 5.2.1, the dispersion is bounded by 1/k, which means that f(n) = 1. This does not seem good because for values of k that are powers of two, the dispersion is 1/(2k). Using a multi-resolution Sukharev grid, the constant becomes 3/2 because it takes a longer time before a full grid is obtained. Nongrid, low-dispersion infinite sequences exist that have f(n) = 1/ln 4 [738]; these are not even uniformly distributed, which is rather surprising.
- Low-Discrepancy Sampling
In some applications, selecting points that align with the coordinate axis may be undesirable. Therefore, extensive sampling theory has been developed to determine methods that avoid alignments while distributing the points uniformly. In sampling-based motion planning, grids sometimes yield unexpected behavior because a row of points may align nicely with a corridor in Cfree. In some cases, a solution is obtained with surprisingly few samples, and in others, too many samples are necessary. These alignment problems, when they exist, generally drive the variance higher in computation times because it is difficult to predict when they will help or hurt. This provides motivation for developing sampling techniques that try to reduce this sensitivity.

Figure 5.7: Discrepancy measures whether the right number of points fall into boxes. It is related to the chi-square test but optimizes over all possible boxes.
Discrepancy theory and its corresponding sampling methods were developed to avoid these problems for numerical integration [738]. Let X be a measure space, such as [0, 1]^n. Let R be a collection of subsets of X that is called a range space. In most cases, R is chosen as the set of all axis-aligned rectangular subsets; hence, this will be assumed from this point onward. With respect to a particular point set, P, and range space, R, the discrepancy [965] for k samples is defined as (see Figure 5.7)

D(P, R) = sup_{R∈R} | |P ∩ R|/k − µ(R)/µ(X) |, (5.22)

in which |P ∩ R| denotes the number of points in P ∩ R. Each term in the supremum considers how well P can be used to estimate the volume of R. For example, if µ(R) is 1/5, then we would hope that about 1/5 of the points in P fall into R. The discrepancy measures the largest volume estimation error that can be obtained over all sets in R.
Asymptotic bounds There are many different asymptotic bounds for discrepancy, depending on the particular range space and measure space [682]. The most widely referenced bounds are based on the standard range space of axis-aligned rectangular boxes in [0, 1]^n. There are two different bounds, depending on whether the number of points, k, is given. The best possible asymptotic discrepancy for a single sequence is O(k^{−1} log^n k). This implies that k is not specified. If, however, for every k a new set of points can be chosen, then the best possible discrepancy is O(k^{−1} log^{n−1} k). This bound is lower because it considers the best that can be achieved by a sequence of point sets, in which every point set may be completely different. In a single sequence, the next set must be extended from the current set by adding a single sample.
Relating dispersion and discrepancy Since balls have positive volume, there is a close relationship between discrepancy, which is measure-based, and dispersion, which is metric-based. For example, for any P ⊂ [0, 1]^n,

δ(P, L∞) ≤ D(P, R)^{1/d}, (5.23)

which means low-discrepancy implies low-dispersion. Note that the converse is not true. An axis-aligned grid yields high discrepancy because of alignments with the boundaries of sets in R, but the dispersion is very low. Even though low-discrepancy implies low-dispersion, lower dispersion can usually be obtained by ignoring discrepancy (this is one less constraint to worry about). Thus, a trade-off must be carefully considered in applications.
Low-discrepancy sampling methods Due to the fundamental importance of numerical integration and the intricate link between discrepancy and integration error, most sampling literature has led to low-discrepancy sequences and point sets [738, 893, 937]. Although motion planning is quite different from integration, it is worth evaluating these carefully constructed and well-analyzed samples. Their potential use in motion planning is no less reasonable than using pseudorandom sequences, which were also designed with a different intention in mind (satisfying statistical tests of randomness).
Low-discrepancy sampling methods can be divided into three categories: 1) Halton/Hammersley sampling; 2) (t,s)-sequences and (t,m,s)-nets; and 3) lattices. The first category represents one of the earliest methods, and is based on extending the van der Corput sequence. The Halton sequence is an n-dimensional generalization of the van der Corput sequence, but instead of using binary representations, a different basis is used for each coordinate [430]. The result is a reasonable deterministic replacement for random samples in many applications. The resulting discrepancy (and dispersion) is lower than that for random samples (with high probability). Figure 5.8a shows the first 196 Halton points in R2.
Choose n relatively prime integers p1, p2, . . . , pn (usually the first n primes, p1 = 2, p2 = 3, . . ., are chosen). To construct the ith sample, consider the base-p representation for i, which takes the form i = a0 + p a1 + p^2 a2 + p^3 a3 + · · ·. The following point in [0, 1] is obtained by reversing the order of the digits and moving the decimal point (as was done in Figure 5.2):

r(i, p) = a0/p + a1/p^2 + a2/p^3 + a3/p^4 + · · · . (5.24)

For p = 2, this yields the ith point in the van der Corput sequence. Starting from i = 0, the ith sample in the Halton sequence is

(r(i, p1), r(i, p2), . . . , r(i, pn)). (5.25)
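A sketch of (5.24) and (5.25) (function names illustrative): the radical inverse r(i, p) reverses the base-p digits of i about the radix point, and a Halton point stacks one radical inverse per coordinate:

```python
def radical_inverse(i, p):
    """r(i, p) from (5.24): reverse the base-p digits of i about the
    radix point, so digit a_j contributes a_j / p^(j+1)."""
    r, scale = 0.0, 1.0 / p
    while i > 0:
        r += scale * (i % p)
        i //= p
        scale /= p
    return r

PRIMES = [2, 3, 5, 7, 11, 13]   # extend this list for n > 6

def halton(i, n):
    """ith Halton point (from i = 0) in [0, 1]^n, per (5.25)."""
    return [radical_inverse(i, PRIMES[j]) for j in range(n)]
```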
Suppose instead that k, the required number of points, is known. In this case, a better distribution of samples can be obtained. The Hammersley point set [431] is an adaptation of the Halton sequence. Using only n − 1 distinct primes and starting at i = 0, the ith sample in a Hammersley point set with k elements is

(i/k, r(i, p1), . . . , r(i, p_{n−1})). (5.26)

Figure 5.8b shows the Hammersley set for n = 2 and k = 196.
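The Hammersley set follows immediately from the same radical inverse, per (5.26); a sketch reusing radical_inverse and PRIMES from the previous block:

```python
def hammersley(k, n):
    """Hammersley point set of k elements in [0, 1]^n, per (5.26):
    first coordinate i/k, followed by n - 1 radical inverses."""
    return [[i / k] + [radical_inverse(i, PRIMES[j]) for j in range(n - 1)]
            for i in range(k)]
```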
The construction of Halton/Hammersley samples is simple and efficient, which has led to widespread application. They both achieve asymptotically optimal discrepancy; however, the constant in their asymptotic analysis increases more than exponentially with dimension [738]. The constant for the dispersion also increases exponentially, which is much worse than for the methods of Section 5.2.3.
Figure 5.8: The Halton and Hammersley points are easy to construct and provide a nice alternative to random sampling that achieves more regularity: (a) 196 Halton points; (b) 196 Hammersley points. Compare the Voronoi regions to those in Figure 5.3. Beware that although these sequences produce asymptotically optimal discrepancy, their performance degrades substantially in higher dimensions (e.g., beyond 10).
Improved constants are obtained for sequences and finite points by using (t,s)-sequences and (t,m,s)-nets, respectively [738]. The key idea is to enforce zero discrepancy over particular subsets of R known as canonical rectangles, and all remaining ranges in R will contribute small amounts to discrepancy. The most famous and widely used (t,s)-sequences are Sobol' and Faure (see [738]). The Niederreiter-Xing (t,s)-sequence has the best-known asymptotic constant, (a/n)^n, in which a is a small positive constant [739].
The third category is lattices, which can be considered as a generalization of grids that allows nonorthogonal axes [682, 893, 959]. As an example, consider Figure 5.5b, which shows 196 lattice points generated by the following technique. Let α be a positive irrational number. For a fixed k, generate the ith point according to (i/k, {iα}), in which {·} denotes the fractional part of the real value (modulo-one arithmetic). In Figure 5.5b, α = (√5 + 1)/2, the golden ratio. This procedure can be generalized to n dimensions by picking n − 1 distinct irrational numbers. A technique for choosing the α parameters by using the roots of irreducible polynomials is discussed in [682]. The ith sample in the lattice is

(i/k, {iα1}, . . . , {iα_{n−1}}). (5.27)
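A sketch of (5.27) for the planar case of Figure 5.5b, using the golden ratio (function name illustrative):

```python
import math

def golden_lattice(k):
    """k lattice points (i/k, {i*alpha}) in [0, 1]^2 with
    alpha = (sqrt(5) + 1)/2, per (5.27); {.} is the fractional part."""
    alpha = (math.sqrt(5) + 1) / 2
    return [(i / k, (i * alpha) % 1.0) for i in range(k)]
```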
Recent analysis shows that some lattice sets achieve asymptotic discrepancy that is very close to that of the best-known nonlattice sample sets [445, 938]. Thus, restricting the points to lie on a lattice seems to entail little or no loss in performance, but has the added benefit of a regular neighborhood structure that is useful for path planning. Historically, lattices have required the specification of k in advance; however, there has been increasing interest in extensible lattices, which are infinite sequences [446, 938].