In this chapter we consider the discrete wavelet transform (DWT). We will see that when certain criteria are met it is possible to completely reconstruct the original signal using infinite summations of discrete wavelet coefficients rather than continuous integrals (as required for the CWT). This leads to a fast wavelet transform for the rapid computation of the discrete wavelet transform and its inverse. We will then see how to perform a discrete wavelet transform on discrete input signals of finite length: the kind of signal we might be presented with in practice. We will also briefly consider biorthogonal wavelets, which come in pairs, and some space is devoted to two-dimensional discrete wavelet transforms. The chapter ends with wavelet packets: a generalization of the discrete wavelet transform which allows for adaptive partitioning of the time-frequency plane.
In chapter 2, the wavelet function was defined at scale a and location b as
A natural way to sample the parameters a and b is to use a logarithmic discretization of the a scale and link this, in turn, to the size of steps taken between b locations. To link b to a we move in discrete steps to each location b which are proportional to the a scale. This kind of discretization of the wavelet has the form
The wavelet transform of a continuous signal, x(t), using discrete wavelets of the form of equation (3.2) is then
Figure 3.1 The nearly tight Mexican hat wavelet frame with a_0 = 2^{1/2} and b_0 = 0.5. Three consecutive locations of the Mexican hat wavelet for scale indices m (top) and m + 1 (lower) and location indices n, n + 1, n + 2. That is, a = 2^{m/2} and a = 2^{(m+1)/2} respectively, and three consecutive b locations separated by a/2.
Common choices for the discrete wavelet parameters a_0 and b_0 are 2 and 1 respectively. This power-of-two logarithmic scaling of both the dilation and translation steps is known as the dyadic grid arrangement. The dyadic grid is perhaps the simplest and most efficient discretization for practical purposes and lends itself to the construction of an orthonormal wavelet basis. Substituting a_0 = 2 and b_0 = 1 into equation (3.2), we see that the dyadic grid wavelet can be written as
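With a_0 = 2 and b_0 = 1 the dyadic grid wavelet takes the following standard form (sketched here, not quoted from the numbered equation; the convention, consistent with the rest of the chapter, is that a larger scale index m corresponds to a larger scale a = 2^m and locations b = n2^m):

```latex
\psi_{m,n}(t) = \frac{1}{\sqrt{2^{m}}}\,\psi\!\left(\frac{t - n2^{m}}{2^{m}}\right)
             = 2^{-m/2}\,\psi\!\left(2^{-m}t - n\right)
```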
Discrete dyadic grid wavelets are commonly chosen to be orthonormal. These wavelets are both orthogonal to each other and normalized to have unit energy. This is expressed as
Using the dyadic grid wavelet of equation (3.7a), the discrete wavelet transform (DWT) can be written as:
Orthonormal dyadic discrete wavelets are associated with scaling functions and their dilation equations. The scaling function is associated with the smoothing of the signal and has the same form as the wavelet, given by
We can represent a signal x(t) using a combined series expansion using both the approximation coefficients and the wavelet (detail) coefficients as follows:
Figure 3.2 Smooth approximation of a sine wave using a block pulse scaling function. (a) Simple block scaling function shown at scale 1 (scale index m = 0) and location n = 0, i.e. φ_{0,0}(t) (shown bold), together with its dilations at that location. (b) Sine wave of one period. (c) Selected smooth approximations, x_m(t), of the sine wave at increasing scales. The width of one of the scaling functions φ_{m,n}(t) at scale index m is depicted in each figure by the arrows. Note that the example has been set up so that one period of the sinusoid corresponds to scale 2^7. Note also that at small scales x_m(t) approaches x(t).
The scaling equation (or dilation equation) describes the scaling function ϕ(t) in terms of contracted and shifted versions of itself as follows:
This construction ensures that the wavelets and their corresponding scaling functions are orthogonal. This wavelet equation is commonly seen in practice. In this chapter, however, we will consider only wavelets of compact support which have a finite number of scaling coefficients, N _{ k }. For this case we can define the wavelet function as
Often the reconfigured coefficients used for the wavelet function are written more compactly as
The Haar wavelet is the simplest example of an orthonormal wavelet. Its scaling equation contains only two nonzero scaling coefficients and is given by
Figure 3.3 Discrete orthonormal wavelets. (a) The Haar scaling function in terms of shifted and dilated versions of itself. (b) The Haar wavelet in terms of shifted and dilated versions of its scaling function. (c) Three consecutive scales shown for the Haar wavelet family specified on a dyadic grid, e.g. from top to bottom: ψ_{m,n}(t), ψ_{m+1,n}(t) and ψ_{m+2,n}(t). (d) Three Haar wavelets at three consecutive scales on a dyadic grid. (e) Three Haar wavelets at different scales. This time the Haar wavelets are not defined on a dyadic grid and are hence not orthogonal to each other. (f) A Meyer wavelet and associated scaling function (right).
From equation (3.14) we can see that the approximation coefficients at scale index m + 1 are given by
Similarly the wavelet coefficients can be found from the approximation coefficients at the previous scale using the reordered scaling coefficients b _{ k } as follows:
We can go in the opposite direction and reconstruct S _{ m,n } from S _{ m+1,n } and T _{ m+1,n }. We know already from equation (3.17) that x _{ m−1}(t) = x _{ m }(t) + d _{ m }(t); we can expand this as
So far we have considered the discrete orthonormal wavelet transform of a continuous function x(t), where it was shown how the continuous function could be represented as a series expansion of wavelet functions at all scales and locations (equation (3.10a)) or a combined series expansion involving the scaling and wavelet functions (equation (3.16)). In this section, and from here on, we will consider discrete input signals specified at integer spacings. To fit into a wavelet multiresolution framework, the discrete signal input into the multiresolution algorithm should be the signal approximation coefficients at scale index m = 0, defined by
In practice our discrete input signal S_{0,n} is of finite length N, which is an integer power of 2: N = 2^M. Thus the range of scales we can investigate is 0 < m ⩽ M. Substituting both m = 0 and m = M into equation (3.16), and noting that we have a finite range of n which halves at each scale, we can see that the signal approximation at scale index m = 0 (the input signal) can be written as the smooth signal at scale M plus a combination of detail signals as follows:
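In the chapter's notation this finite-length expansion can be sketched as follows (a reconstruction of the standard result, not a quotation of the numbered equation; n runs over its finite, scale-dependent range at each scale index):

```latex
x(t) = \sum_{n} S_{M,n}\,\phi_{M,n}(t) \;+\; \sum_{m=1}^{M} \sum_{n} T_{m,n}\,\psi_{m,n}(t)
```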
Figure 3.4 Covering the time axis with dyadic grid wavelets. (a) Eight Daubechies D4 wavelets covering the time axis at scale m. (b) Four Daubechies D4 wavelets covering the time axis at scale m + 1. These wavelets are twice the width of those in (a).
We can rewrite equation (3.44) as
The term on the far right of equation (3.45) represents the series expansion of the fluctuating components of the finite length signal at various scales in terms of its detail coefficients. The detail signal approximation corresponding to scale index m is defined for a finite length signal as
We saw in equation (3.19) that the signal approximation at a specific scale was a combination of the approximation and detail at the next lower scale. If we rewrite this equation as
Figure 3.5 Multiresolution decomposition of a chirp signal containing a short burst of noise. (a) Signal details d_m(t). (The signals have been displaced from each other on the vertical axis to aid clarity.) (b) Signal approximations x_m(t).
We have glossed over much of the mathematical detail of multiresolution analysis here. Most mathematical accounts of the subject begin with a discussion of orthogonal nested subspaces and the signal approximations and details, x_m(t) and d_m(t), as projections on to these spaces. This tack has not been followed here; see for example Mallat (1998), Blatter (1998), Sarkar et al (1998) or Williams and Amaratunga (1994). In this chapter we concentrate on the mechanics, rather than the mathematics, of multiresolution analysis.
Once we have our discrete input signal S _{0,n }, we can compute S _{ m,n } and T _{ m,n } using the decomposition algorithm given by equations (3.36) and (3.37). This can be done for scale indices m > 0, up to a maximum scale determined by the length of the input signal. To do this, we use an iterative procedure as follows. First we compute S _{1,n } and T _{1,n } from the input coefficients S _{0,n }, i.e.
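For the Haar wavelet, these two relations reduce to pairwise sums and differences of adjacent approximation coefficients. The sketch below is a minimal illustration, assuming the orthonormal 1/√2 normalization (the convention that reproduces the numerical values quoted in the worked examples later in the chapter); the function name `haar_step` is ours, not the text's.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_step(S):
    """One iteration of the Haar decomposition: produce the
    approximation (S_{m+1,n}) and detail (T_{m+1,n}) coefficients
    from the approximation coefficients S_{m,n} at the scale below."""
    S = np.asarray(S, dtype=float)
    approx = (S[0::2] + S[1::2]) / SQRT2   # pairwise sums
    detail = (S[0::2] - S[1::2]) / SQRT2   # pairwise differences
    return approx, detail

# First iteration on the eight-sample signal used later in the chapter
S0 = [4.0, 5.0, -4.0, -1.0, 0.0, 1.0, 2.0, 0.0]
S1, T1 = haar_step(S0)
print(np.round(S1, 3))   # approximations S_{1,n}
print(np.round(T1, 3))   # details T_{1,n} ≈ (-0.707, -2.121, -0.707, 1.414)
```

Iterating `haar_step` on successive approximation vectors carries the decomposition down to the single coefficient S_{M,0}.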
The decomposition of approximation coefficients into approximation and detail coefficients at subsequent levels can be illustrated schematically. Figure 3.6 shows an alternative schematic of the same process, illustrating the decomposition and insertion of the approximation and detail coefficients at each iteration within the wavelet transform vector for an arbitrary input signal vector containing 32 components. The wavelet transform vector after the full decomposition has the form W^{(M)} = (S_M, T_M, T_{M−1}, …, T_m, …, T_2, T_1), where T_m represents the subvector containing the coefficients T_{m,n} at scale index m (where n is in the range 0 to 2^{M−m} − 1). We can halt the transformation process before the full decomposition. If we do this, say at an arbitrary level m_0, the transform vector has the form W^{(m_0)} = (S_{m_0}, T_{m_0}, T_{m_0−1}, …, T_2, T_1), where m_0 can take the range 1 ⩽ m_0 ⩽ M − 1. In this case, the transform vector does not contain a single approximation component but rather the sequence of approximation components S_{m_0,n}. However, the transform vector always contains N = 2^M components. For example, we can see from figure 3.6 that stopping the algorithm at m = 2 results in W^{(2)} = (S_2, T_2, T_1). Remember, the range of n is a function of scale index m, hence this vector contains eight S_{2,n} components, eight T_{2,n} components and 16 T_{1,n} components, matching the 32 components of the original input signal vector. The range of n is indicated below the full decomposition vector in figure 3.6. Notice also that we can express the original input signal as the transform vector at scale index zero, i.e. W^{(0)}.
Figure 3.6 Schematic diagram of the Haar filtering algorithm. → S_{m,n} used to derive S_{m+1,n}; → S_{m,n} used to derive T_{m+1,n}; T_{m,n} taken to next level as T_{m,n}, i.e. no further manipulation.
An example of a wavelet decomposition using a discrete wavelet is shown in figure 3.7. The input signal is composed of a section of a sine wave, some noise and a flatline. The signal is decomposed using a Daubechies D6 wavelet. A member of this family is shown in figure 3.7(b). (We will look more closely at the Daubechies wavelet family later in this chapter.) The discrete transform plot is shown in figure 3.7(c), where the dyadic grid arrangement may be seen clearly. This plot is simply a discretized dyadic map of the detail coefficients, T _{ m,n }, where the coefficients at larger scales have correspondingly longer boxes (as the wavelets cover larger segments of the input signal). In addition to the detail coefficients, T _{ m,n }, the remaining approximation coefficient S _{ M,0} is added to the bottom of the plot. As we would expect it covers the whole time axis. We can see from the transform plot that the dominant oscillation is picked up at scale index m = 6 and the high frequency noise is picked up within the middle segment of the transform plot at the smaller scales. We can use the reconstruction algorithm (equation (3.42)) to get back the original input signal S _{0,n } from the array of detail coefficients shown in figure 3.7(c). Alternatively, as with the continuous transform, we can reconstruct a modified version of the input signal by using only selected coefficients in the reconstruction. This is shown in figures 3.7(d) and (e), where only the coefficients corresponding to scales m = 5 to 8 are kept (the others are set to zero) and the signal is reconstructed. This has removed a significant amount of the noise from the signal although the sinusoidal waveform is less smooth than the original in figure 3.7(a). We will come across more sophisticated ways to remove noise and other signal artefacts later, in section 3.4.2 of this chapter.
After a full decomposition, the energy contained within the coefficients at each scale is given by
Figure 3.7 Discrete wavelet transform of a composite signal. (a) Original composite signal. (b) A member of the Daubechies D6 wavelet family. (c) Discrete transform plot. (Note dyadic structure: large positive coefficient values are white and large negative values black.) (d) Coefficient removal. (e) Reconstructed signal using only retained coefficients in (d). The original composite signal (a) is composed of three segments: a sinusoid, uniformly distributed noise and a flatline. The signal is decomposed using Daubechies D6 wavelets (b) to give the dyadic array of transform coefficients plotted in (c). The coefficients corresponding to scales 5 to 9 are kept (d) and used to reconstruct the signal in (e). Note that a grey scale is used to depict the coefficient values, where the maximum value is white and the minimum value is black.
The total energy of the discrete input signal
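For an orthonormal wavelet basis these energies can be sketched as follows (a reconstruction of the standard result, not a quotation of the numbered equations): the energy at scale index m comes from the detail coefficients at that scale, and the total signal energy is recovered from all the transform coefficients together, in the manner of Parseval's relation:

```latex
E_m = \sum_{n=0}^{2^{M-m}-1} \left(T_{m,n}\right)^2, \qquad
\sum_{n=0}^{N-1}\left(S_{0,n}\right)^2 = \left(S_{M,0}\right)^2 + \sum_{m=1}^{M} E_m
```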
There are three main methods used in practice to index the coefficients resulting from the discrete wavelet transform. It is worth discussing these now. All three methods are popular in the scientific literature and appear often in many of the examples of the practical application of wavelet analysis in the subsequent chapters of this book. We will use the full decomposition of a 32 component input signal as an illustration.
Scale indexing: The scale indexing system (m, n) corresponding to an input signal of length N = 2^M is shown schematically on a dyadic grid in figure 3.8(b) for the discrete signal shown in figure 3.8(a). The lowest scale on the grid, m = 1, corresponds to a spacing of 2^1 = 2 on the data set. The discrete input signal is at scale index m = 0. We have already come across this grid structure in the plot of the transform coefficients in figure 3.7(c). Such plots give a good visual indication of the covering of the time-scale plane by the wavelets and their relative importance in making up the signal.
Sequential indexing: Again, we have already come across this form of indexing (figure 3.6). Figure 3.8(c) contains a schematic of the transform coefficients given in sequential format. We know that a signal N samples long produces N wavelet vector components. It makes sense, therefore, to find an ordering of these coefficients to fit into a vector of length N. The discrete time series, S_{0,n}, n = 0, 1, 2, …, N − 1, can be decomposed into N − 1 detail coefficients plus one approximation coefficient, where N is an integer power of 2: N = 2^M. After a full decomposition these transform coefficients can be resequenced from the two-dimensional dyadic array T_{m,n} to the transform vector, whose components W_i^{(M)}
have the same range as the original signal (i = 0, 1, 2, …, N − 1). The vector component index i is found from the dyadic grid indices m and n through the relationship i = 2^{M−m} + n. In addition, the superscript M in parentheses denotes a full decomposition of the signal over M scales. The transform vector components, W_i^{(M)}, are plotted in figure 3.8(c). The last half of the series in the figure represents the coefficients corresponding to the smallest wavelet scale (index m = 1). The next quarter, working backwards, represents the coefficients corresponding to scale index m = 2, the next eighth the m = 3 coefficients, and so on, back to W_1^{(M)}, which is the single coefficient for scale index m = M. The remaining component, W_0^{(M)}, is the single approximation coefficient (S_{M,0}). As mentioned in section 3.3.2, if we halt the decomposition algorithm at scale m_0 (before full decomposition) this results in an intermediate transform vector W^{(m_0)} which contains a number of approximation coefficients, S_{m_0,n}, at its beginning.
Figure 3.8 Schematic diagram of alternative indexing methods for dyadic grid wavelet transform coefficients. (a) Original signal containing 32 components (scale index m = 0). (b) Scale indexing, T_{m,n}. (c) Sequential indexing, W_i^{(M)}, where i = 2^{M−m} + n (corresponding scale indexing is shown at the bottom of the plot). (d) Level indexing, T_{l,n}.
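The index relationship is easy to check numerically. The helper below is a hypothetical illustration (the name is ours), which simply evaluates i = 2^{M−m} + n, with the approximation coefficient S_{M,0} taken to occupy position i = 0:

```python
def sequential_index(m, n, M):
    """Position of detail coefficient T_{m,n} in the transform
    vector of a length-2^M signal: i = 2^(M - m) + n."""
    return 2 ** (M - m) + n

M = 5  # a 32-component signal, as in figure 3.8

# The finest scale (m = 1) occupies the last half of the vector:
# i runs from 16 to 31 for n = 0, ..., 15.
print(sequential_index(1, 0, M), sequential_index(1, 15, M))

# The single detail coefficient at m = M sits at i = 1,
# leaving i = 0 for the approximation coefficient S_{M,0}.
print(sequential_index(M, 0, M))
```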
Level indexing: Level indices, l, are often used instead of scale indices, m. The level index l is simply equal to M − m, so the number of wavelets used to cover the signal at a specific level is 2^l: level l = 0 corresponds to a single wavelet and the scale of the whole time series, level l = 1 corresponds to two wavelets and the scale of one half of the time series, and so on. In addition, it is standard in this nomenclature to denote the remaining approximation coefficient as occurring at level l = −1. The wavelet array becomes T_{l,n}. Figure 3.8(d) shows the l, n indexing system. Level indexing is used when the signal is specified over the unit interval and the analysis is set up in terms of resolution rather than scale (i.e. the lowest resolution l = 0 corresponds to the length of the whole signal, whereas the smallest scale m = 0 corresponds to the distance between each signal point).
Now we will illustrate the methods described above using a Haar wavelet in the decomposition of a discrete input signal, S_{0,n}: n = 0, 1, 2, …, N − 1. To do so, we employ the decomposition algorithm given by equations (3.36) and (3.37). The Haar wavelet has two scaling coefficients, c_0 = 1 and c_1 = 1. Substituting these into equation (3.36) we can obtain the approximation coefficients at the next scale through the relationship
The first iteration of the decomposition algorithm gives
Figure 3.9 Iteration of the wavelet transform vector in the decomposition of a simple signal. → S_{m,n} used to derive S_{m+1,n}; → S_{m,n} used to derive T_{m+1,n}; T_{m,n} taken to next level as T_{m,n}, i.e. no further manipulation.
The transform coefficient vector, after this first iteration, is then
To return from the transform coefficients to the original discrete signal we work in reverse. The inverse Haar transform can be written simply as: at even locations 2n:
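As a numerical sketch of this inverse step (again assuming the orthonormal Haar convention with a 1/√2 factor; `haar_inverse_step` is an illustrative name): even-numbered locations receive the scaled sum of the approximation and detail coefficients, odd-numbered locations the scaled difference.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_inverse_step(approx, detail):
    """Reconstruct S_{m,n} from S_{m+1,n} and T_{m+1,n}."""
    approx = np.asarray(approx, dtype=float)
    detail = np.asarray(detail, dtype=float)
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / SQRT2   # even locations 2n
    out[1::2] = (approx - detail) / SQRT2   # odd locations 2n + 1
    return out

# First-iteration coefficients of the eight-sample example signal
S1 = [6.364, -3.536, 0.707, 1.414]
T1 = [-0.707, -2.121, -0.707, 1.414]
print(np.round(haar_inverse_step(S1, T1), 2))   # ≈ (4, 5, -4, -1, 0, 1, 2, 0)
```

Feeding in the first-iteration coefficients of the eight-sample example used later in the chapter returns the original samples (to within the rounding of the quoted coefficients).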
Figure 3.10 Haar decomposition and reconstruction of a simple ramp signal. The original discrete input signal is decomposed through an initial iteration to (b), the first decomposition, then further to (c), the second and final decomposition. From the coefficients at the full decomposition (c), the following can be constructed: the scale index 2 approximation (the signal mean) (d), the scale index 2 detail (e) and the scale index 1 detail (f). (g) The transform coefficient plot associated with the transform values given in (c). (h) Schematic diagram of the addition of the rescaled Haar wavelets to reconstruct the original signal. (Adding the reconstructions in (d), (e) and (f) returns the approximation of the signal at scale index m = 0.)
We will perform the reconstruction on the transform vector we have just computed:
Iterating again gives
Figure 3.10 attempts to show visually the decomposition of the signal into the Haar wavelet components. Figure 3.10(a) contains the original signal, figure 3.10(b) plots the coefficients after the first iteration of the decomposition algorithm and figure 3.10(c) plots the coefficients after the second iteration of the algorithm. The coefficients contained in figure 3.10(c) correspond in turn to a single scaling function at scale index m = 2, a wavelet at scale index m = 2 and two wavelets at scale index m = 1. These are shown respectively in figures 3.10(d)–(f). The transform coefficient plot for this signal is equally simple, consisting of four coefficient values partitioned as shown in figure 3.10(g).
We can find the corresponding approximation and details of the signal by taking the inverse transform of the coefficients at each scale. First the approximation coefficient is used to determine the signal approximation at the largest scale:
We will now consider the case where we have a discrete experimental signal collected, say, using some piece of experimental apparatus, and we want to perform a wavelet decomposition of it using the multiresolution algorithm. In addition, we would also like to represent the approximation and detail signal components discretely at the resolution of the input signal. This discrete signal, which we will denote x_i, is of finite length N, i.e. i = 0, 1, …, N − 1. It has been acquired at discrete time intervals ∆t (the sampling interval) to give the discrete time signal x(t_i): i = 0, 1, 2, …, N − 1. The sampling of the signal provides a finite resolution to the acquired signal. This discretization of the continuous signal is then mapped on to a discrete signal x_i, where the sampling interval has been normalized to 1. In doing so we must remember ∆t and reintroduce it when required, for example when computing signal frequencies in real applications.
Common practice is to input the discretely sampled experimental signal, x _{ i }, directly as the approximation coefficients at scale m = 0, and begin the multiresolution analysis from there. However, it is not correct to use the sampled time series x _{ i } directly in this way. We should really use the approximation coefficients S _{0,n } obtained from the original continuous signal at scale m = 0, as defined by equation (3.43). In practice we usually do not know exactly what x(t) is. As S _{0,n } is a weighted average of x(t) in the vicinity of n, then it is usually reasonable to input x _{ i } as S _{0,n } if our signal is slowly varying between samples at this scale. That is we simply set
In practice, continuous approximations, x_m(t), and details, d_m(t), of the signal are not constructed from the multiresolution analysis, especially when inputting the signal x_i as the approximation coefficients at scale index m = 0. Instead, either the approximation and detail coefficients, S_{m,n} and T_{m,n}, are displayed at their respective scales or, alternatively, they are used to construct representations of the signal at the scale of the input signal (m = 0). The latter is sometimes preferable as the scalings of the displays are visually comparable. As an example, let us consider the Haar wavelet decomposition of a simple discrete signal containing eight components: x_i = (4, 5, −4, −1, 0, 1, 2, 0). The signal is shown in figure 3.11(a). We can see that the first half of the signal has a marked step from 5 to −4, whereas the second half of the signal appears much smoother. The transform vector after a full decomposition is
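That transform vector can be verified with a few lines of code. This is a sketch assuming the orthonormal Haar convention (pairwise sums and differences scaled by 1/√2); the function name is illustrative.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_full_decomposition(x):
    """Full Haar decomposition of a length-2^M signal into the
    transform vector (S_M, T_M, T_{M-1}, ..., T_1)."""
    S = np.asarray(x, dtype=float)
    details = []
    while len(S) > 1:
        # pairwise differences give the details at the next scale up
        details.insert(0, (S[0::2] - S[1::2]) / SQRT2)
        # pairwise sums give the approximations at the next scale up
        S = (S[0::2] + S[1::2]) / SQRT2
    return np.concatenate([S] + details)

x = [4, 5, -4, -1, 0, 1, 2, 0]
W = haar_full_decomposition(x)
print(np.round(W, 3))
# ≈ (2.475, 0.354, 7.000, -0.500, -0.707, -2.121, -0.707, 1.414)
```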
Let us now look at what happens to the reconstructed signal when we remove the smallest-scale (i.e. highest-frequency) detail coefficients in turn (figures 3.11(c)–(e)). Removing the components T_{1,n}, the modified transform vector becomes (2.475, 0.354, 7.000, −0.500, 0, 0, 0, 0), where the coefficients set to zero are shown bold. Performing the inverse transform on this vector with the coefficients at the smallest wavelet scale set to zero removes the high-frequency details at this scale and returns a smoother version of the input signal. We will denote this smooth version of the input signal as x_{1,i}. This operation is shown schematically as
Figure 3.11 Multiresolution decomposition as scale thresholding. (a) x_{0,i} (= x_{1,i} + d_{1,i}). (b) Wavelet coefficients (S_{3,0}, T_{3,0}, T_{2,0}, T_{2,1}, T_{1,0}, T_{1,1}, T_{1,2}, T_{1,3}). (c) x_{1,i} (= x_{2,i} + d_{2,i} = x_{0,i} − d_{1,i}). (d) d_{1,i}. (e) x_{2,i} (= x_{3,i} + d_{3,i} = x_{1,i} − d_{2,i}). (f) d_{2,i}. (g) x_{3,i} (signal mean) (= x_{2,i} − d_{3,i}). (h) d_{3,i}.
Now we remove the detail coefficients associated with the next smallest scales from the transform vector to get (2.475, 0.354, 0, 0, 0, 0, 0, 0). Reconstructing the signal using this modified transform vector, we get x _{2,i } = (1, 1, 1, 1, 0.75, 0.75, 0.75, 0.75) shown in figure 3.11(e). Again we can show this schematically as
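The smoothing operations above can be reproduced end to end: decompose, zero the detail coefficients at the chosen scales, then invert. This is a self-contained sketch under the same orthonormal Haar assumption (the function names are ours):

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def decompose(x):
    """Return (S_M, [T_1, T_2, ..., T_M]); details[0] is the finest scale."""
    S = np.asarray(x, dtype=float)
    details = []
    while len(S) > 1:
        details.append((S[0::2] - S[1::2]) / SQRT2)
        S = (S[0::2] + S[1::2]) / SQRT2
    return S, details

def reconstruct(S, details):
    """Invert the decomposition, working from the coarsest scale down."""
    for T in reversed(details):
        out = np.empty(2 * len(S))
        out[0::2] = (S + T) / SQRT2
        out[1::2] = (S - T) / SQRT2
        S = out
    return S

S3, (T1, T2, T3) = decompose([4, 5, -4, -1, 0, 1, 2, 0])
# Zero the detail coefficients at the two smallest scales (m = 1, 2)
x2 = reconstruct(S3, [np.zeros_like(T1), np.zeros_like(T2), T3])
print(np.round(x2, 2))   # ≈ (1, 1, 1, 1, 0.75, 0.75, 0.75, 0.75)
```

The result matches the x_{2,i} quoted in the text and plotted in figure 3.11(e).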
Removing the last detail coefficient, the modified transform vector becomes (2.475, 0, 0, 0, 0, 0, 0, 0) which, when used to reconstruct the signal, produces x _{3,i } = (0.875, 0.875, 0.875, 0.875, 0.875, 0.875, 0.875, 0.875). This can be shown as
We can see by comparing figures 3.11(c), (e) and (g) that the high-frequency information has been successively stripped from the discrete input signal (figure 3.11(a)) by successively removing the discrete details at scale indices m_0 = 1, 2 and 3 in turn. These discrete signal details are shown in figures 3.11(d), (f) and (h). If we want to generate the details of the discrete signal at any scale using the multiresolution algorithm, we perform the inverse transform using only the detail coefficients at that scale (we could also subtract two successive discrete approximations). For example, to compute the detail at the lowest wavelet scale (index m = 1), which we will denote d_{1,i}, the following multiresolution operation is required:
In the above example we were effectively taking the information stored in the approximation and detail coefficients S _{ m,n } and T _{ m,n } and expressing it at scale index m = 0. The discrete approximation and detail contributions x _{ m,i } and d _{ m,i } are related to their continuous counterparts x _{ m }(t) and d _{ m }(t) through the scaling equation at scale m = 0 as follows:
Figure 3.11(c) shows x _{1,i }, i.e. where only the detail signal d _{1,i } corresponding to the smallest scale wavelets (m = 1) has been removed (figure 3.11(d)). Figure 3.11(e) shows x _{2,i } where both detail signals d _{1,i } and d _{2,i}, corresponding to the smallest and next smallest scale wavelets, have been removed from the original discrete input signal. d _{2, i } is shown in figure 3.11(f). Figure 3.11(g) shows x _{3,i }, where the detail signals d _{ m,i } corresponding to all the wavelet scales (m = 1,2 and 3) have been stripped from the original signal leaving only the signal mean component. This can be written as
Let us look again at the discrete input signal used in the previous section (figure 3.12(a)). Once we have performed the full decomposition, we are free to alter any of the coefficients in the transform vector before performing the inverse. We can set groups of components to zero, as we did in the last section (figure 3.11), or set selected individual components to zero. We can reduce the magnitudes of some components rather than set them to zero. In fact, we can manipulate the components in a variety of ways depending on what we want to achieve. Notice, however, that the transform vector contains a range of values: some large and some small. Let us see what happens if we throw away the smallest valued coefficients in turn and perform the inverse transforms. We start with the smallest valued coefficient, 0.354. Setting it to zero, the modified transform vector becomes (2.475, 0, 7.000, −0.500, −0.707, −2.121, −0.707, 1.414). Performing the inverse on this vector gives the reconstructed discrete signal (3.875, 4.875, −4.125, −1.125, 0.125, 1.125, 2.125, 0.125), as shown in figure 3.12(b). There is no discernible difference between the original discrete signal and the reconstruction. We now remove the next smallest valued coefficient (smallest in an absolute sense), the −0.5 coefficient. The transform vector becomes (2.475, 0, 7.000, 0, −0.707, −2.121, −0.707, 1.414) and the reconstructed signal is (3.875, 4.875, −4.125, −1.125, 0.375, 1.375, 1.875, −0.125). Again we have to look carefully at figure 3.12(c) to discern any difference between the original discrete signal and the reconstruction. Next we remove the two −0.707 components to get (2.475, 0, 7.000, 0, 0, −2.121, 0, 1.414). Reconstructing, we get (4.375, 4.375, −4.125, −1.125, 0.875, 0.875, 1.875, −0.125). This reconstruction is shown in figure 3.12(d) where we can now notice obvious smoothing between both the first and second signal point and the fifth and sixth.
Next setting the last transform vector component to zero, we reconstruct to get (4.375, 4.375, −4.125, −1.125, 0.875, 0.875, 0.875, 0.875). This is shown in figure 3.12(e). Removing the −2.121 component leaves the coefficient vector as (2.475, 0, 7.000, 0, 0, 0, 0, 0) and the reconstruction becomes (4.375, 4.375, 2.625, −2.625, 0.875, 0.875, 0.875, 0.875) as shown in figure 3.12(f). Removing the 2.475 component (the signal mean component) leaves the coefficient vector with only a single component: (0, 0, 7.000, 0, 0, 0, 0, 0). Reconstructing using this component leaves a single (wavelet shaped) fluctuation contained within the first half of the signal (figure 3.12(g)).
Figure 3.12 Signal reconstruction using thresholded wavelet coefficients. The coefficient vectors used in the reconstructions are given below each reconstructed signal. Note that (a) is the original signal as it uses all of the original eight wavelet coefficients.
Can you see what has happened? The least significant components have been smoothed out first, leaving the more significant fluctuating parts of the signal intact. What we have actually done is to threshold the wavelet coefficients at increasing magnitudes. First, all the components whose magnitude was equal to or below 0.354 were removed; i.e. 0.354 was the threshold. Then 0.5 was set as the threshold, then 0.707 and so on. By doing this we removed the least significant influences on the signal first. Hence, the shape of the reconstructed signal resembles the original even with a large number of coefficients set to zero. Compare this magnitude thresholding of the coefficients to the scaledependent smoothing of figure 3.11, which removed the components at each scale in turn, beginning with the smallest scales. Note that in practice we would normally deal with the signal mean coefficient separately, either retaining it regardless of magnitude or removing it at the beginning of the thresholding process.
We can define the scale-dependent smoothing of the wavelet coefficients as
Magnitude thresholding is normally carried out to remove noise from a signal, to partition signals into two or more (and not necessarily noisy) components, or simply to smooth the data. It involves the reduction or complete removal of selected wavelet coefficients in order to separate out the behaviour of interest from within the signal. There are many methods for selecting and modifying the coefficients. The two most popular are hard and soft thresholding. Unlike scale-dependent smoothing, which removes all small-scale coefficients below the scale index m* regardless of amplitude, hard and soft thresholding remove, or reduce, the smallest amplitude coefficients regardless of scale. This is shown schematically in figure 3.13(b). To hard threshold the coefficients, a threshold, λ, is set which is related to some mean value of the wavelet coefficients at each scale, e.g. standard deviation, mean absolute deviation, etc. Those coefficients above the threshold are deemed to correspond to the coherent part of the signal, and those below the threshold are deemed to correspond to the random or noisy part of the signal. Hard thresholding is of the form:
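Both rules are one-liners on an array of coefficients. Stated here as a sketch in the usual formulation: hard thresholding keeps coefficients with |T| > λ unchanged and zeroes the rest, while soft thresholding additionally shrinks the surviving coefficients towards zero by λ (the function names are ours).

```python
import numpy as np

def hard_threshold(T, lam):
    """Keep coefficients whose magnitude exceeds lam; zero the rest."""
    T = np.asarray(T, dtype=float)
    return np.where(np.abs(T) > lam, T, 0.0)

def soft_threshold(T, lam):
    """Zero coefficients below lam and shrink the survivors
    towards zero by lam."""
    T = np.asarray(T, dtype=float)
    return np.sign(T) * np.maximum(np.abs(T) - lam, 0.0)

coeffs = np.array([0.354, 7.0, -0.5, -2.121, 1.414])
print(hard_threshold(coeffs, 1.0))   # ≈ (0, 7, 0, -2.121, 1.414)
print(soft_threshold(coeffs, 1.0))   # ≈ (0, 6, 0, -1.121, 0.414)
```

Note how soft thresholding produces the continuous input-output relationship sketched on the right of figure 3.13(c), whereas hard thresholding leaves a jump at ±λ.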
Figure 3.13 Scale-dependent smoothing and coefficient thresholding. (a) Scale-dependent smoothing. (b) Amplitude thresholding. (c) The relationship between the original coefficients and hard (left) and soft (right) thresholded coefficients.
Figure 3.14 shows examples of both hard and soft thresholding of a test signal composed of two sinusoids plus Gaussian white noise (figures 3.14(a) and (b)). The wavelet coefficients obtained from a Daubechies D10 decomposition are shown in figure 3.14(c). These have been hard thresholded for thresholds set at λ = 2, 4, 6 and 8 respectively. The reconstructions corresponding to each of the thresholds are shown in figures 3.14(d)–(g). We can see that for low thresholds some of the high frequency noise is retained, whereas for high thresholds the signal is excessively smoothed. From visual inspection, an optimum threshold would seem to lie somewhere in the region of λ = 4. One commonly used measure of the optimum reconstruction is the mean square error between the reconstructed signal and the original signal and, in fact, this error is found to be a minimum near this value of the threshold. The corresponding soft thresholded reconstructions are shown in figures 3.14(h) and (i) for λ = 2 and 4 respectively.
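The hard and soft thresholding rules just described can be sketched in a few lines (a minimal NumPy sketch; the function names are ours and no particular wavelet library is assumed):

```python
import numpy as np

def hard_threshold(coeffs, lam):
    """Hard thresholding: zero every coefficient whose magnitude is at or
    below the threshold lam; coefficients above it are kept unchanged."""
    out = coeffs.copy()
    out[np.abs(out) <= lam] = 0.0
    return out

def soft_threshold(coeffs, lam):
    """Soft thresholding: zero the small coefficients and shrink the
    surviving ones towards zero by lam."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)

# Illustrative coefficients (not from any figure in the text)
w = np.array([5.0, -0.3, 2.1, 0.9, -4.2, 0.1])
hard = hard_threshold(w, 1.0)  # keeps 5.0, 2.1, -4.2; zeros the rest
soft = soft_threshold(w, 1.0)  # shrinks the survivors to 4.0, 1.1, -3.2
```

Because soft thresholding shrinks the retained coefficients as well, the reconstructed signal carries less energy than the hard-thresholded one at the same λ, as noted below for the universal threshold.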
Often we do not know the exact form of either the underlying signal or the corrupting noise. The choice of threshold is therefore nontrivial and we can apply a number of signal- and/or noise-based criteria in producing a value pertinent to the problem under consideration. The threshold can, for example, be a constant value applied to the coefficients across all scales or some of the scales, or its value can vary according to scale. One of the most popular and simplest thresholds in use is the universal threshold, defined as
Figure 3.14 Hard and soft thresholding. (a) Original time series composed of two sinusoids, both with unit amplitude, one twice the periodicity of the other. (b) Time series in (a) with added Gaussian noise (zero mean and unit standard deviation). (c) Wavelet coefficients in sequential format derived using the Daubechies D10 wavelet shown. (d) Hard thresholded coefficients, λ = 2 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (e) Hard thresholded coefficients, λ = 4 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (f) Hard thresholded coefficients, λ = 6 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (g) Hard thresholded coefficients, λ = 8 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (h) Soft thresholded coefficients, λ = 2 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (i) Soft thresholded coefficients, λ = 4 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.)
Universal soft thresholding usually produces a signal reconstruction with less energy than that for hard thresholding using the same threshold value, as the retained coefficients are all shrunk towards zero. This can be seen by comparing figure 3.14(e) with 3.14(i) for λ = 4. Hence, it is often the case in practice that the universal threshold used for hard thresholding is divided by about 2 when employed as a soft threshold. Another problem encountered when implementing the universal threshold in practice is that we do not know the value of σ for our signal. In this case, a robust estimate of σ is used, typically set to the median absolute deviation (MAD) of the wavelet coefficients at the smallest scale divided by 0.6745 to calibrate with the standard deviation of a Gaussian distribution. Thus the universal threshold becomes
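As an illustration, this MAD-based universal threshold might be computed as follows (a hedged sketch: the function name is ours, and we take the MAD about the median of the finest-scale detail coefficients; some implementations take the median of the absolute coefficients directly, assuming zero median):

```python
import numpy as np

def universal_threshold(finest_details, n):
    """Universal threshold lambda = sigma_hat * sqrt(2 ln N), where sigma_hat
    is the median absolute deviation of the smallest-scale wavelet
    coefficients divided by 0.6745 (calibrating the MAD to the standard
    deviation of a Gaussian distribution)."""
    d = np.asarray(finest_details, dtype=float)
    sigma_hat = np.median(np.abs(d - np.median(d))) / 0.6745
    return sigma_hat * np.sqrt(2.0 * np.log(n))
```

For coefficients dominated by unit-variance Gaussian noise, sigma_hat is close to 1 and the threshold grows only slowly, as sqrt(2 ln N), with the signal length N.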
As we saw with the Haar wavelet transform earlier in section 3.3.5, the coefficients are ordered in two distinct sequences: one acts as a smoothing filter for the data, the other extracts the signal detail at each scale. The Haar wavelet is extremely simple in that it has only two scaling coefficients and both are equal to unity. In this section we will look at a family of discrete wavelets of which the Haar wavelet is the simplest member: the Daubechies wavelets. The scaling functions associated with these wavelets satisfy the conditions given in section 3.2.4, as all orthogonal wavelets do. In addition to satisfying these criteria, Daubechies required that her wavelets had compact support (i.e. a finite number, N_k, of scaling coefficients) and were smooth to some degree. The smoothness of the wavelet is associated with a moment condition which can be expressed in terms of the scaling coefficients as
In this chapter we have already looked in detail at the Daubechies wavelet with only two scaling coefficients (the D2 or Haar wavelet). Let us now look at the Daubechies wavelet which has four scaling coefficients, the D4. The ‘D’ represents this particular family of Daubechies wavelets and the ‘4’ represents the number of nonzero scaling coefficients, N_k. (Note that Daubechies wavelets are also often defined by the number of zero moments they have, which equals N_k/2, in which case the sequence runs D1, D2, D3, D4, …, etc.)
Figure 3.15 A selection of Daubechies wavelets and scaling functions with their energy spectra. All wavelets and scaling functions are shown only over their respective support—outside their support they have zero value. Note also that only the positive part of the spectrum is given; an identical mirror image is present at negative frequencies.
From equation (3.20), we know that the scaling equation for a fourcoefficient wavelet is
We can compute the scaling function from the D4 coefficients using equation (3.75). To do this we must rewrite it as
The scaling coefficients, c_k, for the Daubechies wavelets D2 to D20 are:

D2 (Haar): 1, 1

D4: 0.6830127, 1.1830127, 0.3169873, −0.1830127

D6: 0.47046721, 1.14111692, 0.650365, −0.19093442, −0.12083221, 0.0498175

D8: 0.32580343, 1.01094572, 0.89220014, −0.03957503, −0.26450717, 0.0436163, 0.0465036, −0.01498699

D10: 0.22641898, 0.85394354, 1.02432694, 0.19576696, −0.34265671, −0.04560113, 0.10970265, −0.00882680, −0.01779187, 4.71742793e−3

D12: 0.15774243, 0.69950381, 1.06226376, 0.44583132, −0.31998660, −0.18351806, 0.13788809, 0.03892321, −0.04466375, 7.83251152e−4, 6.75606236e−3, −1.52353381e−3

D14: 0.11009943, 0.56079128, 1.03114849, 0.66437248, −0.20351382, −0.31683501, 0.10084647, 0.11400345, −0.05378245, −0.02343994, 0.01774979, 6.07514995e−4, −2.54790472e−3, 5.00226853e−4

D16: 0.07695562, 0.44246725, 0.95548615, 0.82781653, −0.02238574, −0.40165863, 6.68194092e−4, 0.18207636, −0.02456390, −0.06235021, 0.01977216, 0.01236884, −6.88771926e−3, −5.54004549e−4, 9.55229711e−4, −1.66137261e−4

D18: 0.05385035, 0.34483430, 0.85534906, 0.92954571, 0.18836955, −0.41475176, −0.13695355, 0.21006834, 0.04345268, −0.09564726, 3.54892813e−4, 0.03162417, −6.67962023e−3, −6.05496058e−3, 2.61296728e−3, 3.25814671e−4, −3.56329759e−4, −5.5645514e−5

D20: 0.03771716, 0.26612218, 0.74557507, 0.97362811, 0.39763774, −0.35333620, −0.27710988, 0.18012745, 0.13160299, −0.10096657, −0.04165925, 0.04696981, 5.10043697e−3, −0.01517900, 1.97332536e−3, 2.81768659e−3, −9.69947840e−4, −1.64709006e−4, 1.32354367e−4, −1.87584156e−5
Figure 3.16 Filtering of the signal: decomposition. The original signal at scale index m = 0 (i.e. S_{0,n}) is filtered to produce the approximation coefficients S_{1,n}. This is done by sliding the lowpass filter along the signal one step at a time. Subsampling removes every second value. The diagram shows only the retained values, i.e. effectively the filter coefficient vector jumps two steps at each iteration. Next, the sequence S_{1,n} is filtered in the same way to get S_{2,n} and so on. The detail coefficients are found by using the same method but employing the highpass wavelet filter coefficients.
We know that the Daubechies D4 wavelet has two vanishing moments, thus it can suppress both constant signals and linear signals. This means that for an infinitely long constant or linear signal all the coefficients are zero. However, end effects occur if the signal is of finite length. We will use the linear ramp signal (0, 1, 2,…, 31), shown in figure 3.17(a), as an example. The first iteration produces the transform vector
Figure 3.17 The multiresolution decomposition of a ramp signal using Daubechies wavelets. (a) Original signal. (b) Decomposition 1 (D4 wavelet). (c) Decomposition 2 (D4 wavelet). (d) Decomposition 3 (D4 wavelet). (e) Decomposition 4 (D4 wavelet). (f) Decomposition 5, full decomposition (D4 wavelet). (g) Full decomposition using D6 wavelets. (h) Full decomposition using D12 wavelets. (Note change in scale of the vertical axes.)
The full D2 (Haar) decomposition of the ramp signal is
On the other hand, all Daubechies wavelets with more than four scaling coefficients have more than two vanishing moments and all can therefore suppress linear signals. For example, the D6 wavelet can suppress mean, linear and quadratic signals. The D6 decomposition of the ramp signal, which employs the scaling coefficients (c _{0}, c _{1}, c _{2}, c _{3}, c _{4}, c _{5}) = (0.470, 1.141, 0.650, −0.191, −0.121, 0.0498), is
Let us revisit the filtering process described in the last section and shown in figure 3.16. In signal processing terms, the approximation coefficients at resolution m, S_{m,n}, are convolved with the lowpass filter. This is done by moving the filter along the signal one step at a time. The filtered output is then subsampled (or downsampled), where every second value is chosen, to give the approximation coefficient vector at scale m + 1. The approximation coefficients at resolution m, S_{m,n}, are also convolved with the highpass filter and subsampled in the same way to give the detail signal coefficients at scale m + 1. The detail components T_{m+1,n} are kept and the approximation components S_{m+1,n} are again passed through the lowpass and highpass filters to give components T_{m+2,n} and S_{m+2,n}. This process is repeated over all scales to give the full decomposition of the signal. Each step in the decomposition filtering process is shown schematically in figure 3.18(a). The sequences $\left(1/\sqrt{2}\right)$
c_k and $\left(1/\sqrt{2}\right)$ b_k are contained respectively within the lowpass and highpass filter vectors used within the wavelet decomposition algorithm. The filter coefficients are sometimes referred to as ‘taps’.

For signal reconstruction (equation (3.42)), the filtering process is simply reversed, whereby the components at the larger scales are fed back through the filters. This is shown in figure 3.18(b). The approximation and detail components at scale index m + 1 are first upsampled (zeros are inserted between their values) and then passed through the lowpass and highpass filters respectively. This time, however, the filter coefficients are reversed in order and are shifted back along the signal. An example of the reconstruction filtering is shown in figure 3.19 where the component S_{m,5} is found from the sequences S_{m+1,n} and T_{m+1,n}. Note that the leading (righthand) filter coefficient defines the location of the S_{m,n} coefficient. Hence the computation of the S_{m,0}, S_{m,1} and S_{m,2} coefficients involves components of the filter vector lying off the lefthand end of the upsampled signal at scale m + 1. These are simply wrapped back around to the righthand end of the signal.
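The decomposition and reconstruction filtering just described can be sketched as follows, using the D4 scaling coefficients with wraparound at the signal ends (a minimal sketch: function names are ours, and we assume the usual convention b_k = (−1)^k c_{N_k−1−k} for the wavelet coefficients):

```python
import numpy as np

# D4 scaling coefficients c_k; the filter taps used by the algorithm are
# c_k / sqrt(2) (lowpass) and b_k / sqrt(2) (highpass).
c = np.array([0.6830127, 1.1830127, 0.3169873, -0.1830127])
b = np.array([(-1) ** k * c[len(c) - 1 - k] for k in range(len(c))])

def decompose_step(s):
    """Convolve with the lowpass and highpass filters and subsample by two,
    wrapping the filter around the end of the signal."""
    n = len(s)
    approx = np.zeros(n // 2)
    detail = np.zeros(n // 2)
    for i in range(n // 2):
        for k in range(len(c)):
            approx[i] += c[k] / np.sqrt(2) * s[(2 * i + k) % n]
            detail[i] += b[k] / np.sqrt(2) * s[(2 * i + k) % n]
    return approx, detail

def reconstruct_step(approx, detail):
    """Upsample (zeros between values) and feed back through the filters."""
    n = 2 * len(approx)
    s = np.zeros(n)
    for i in range(len(approx)):
        for k in range(len(c)):
            s[(2 * i + k) % n] += (c[k] * approx[i] + b[k] * detail[i]) / np.sqrt(2)
    return s

ramp = np.arange(32, dtype=float)   # the ramp signal of figure 3.17
a, d = decompose_step(ramp)
```

Applied to the ramp, all the detail coefficients are (numerically) zero except the last one, whose filter wraps around the signal end: this is exactly the end effect discussed above for signals of finite length. Feeding the coefficients back through `reconstruct_step` recovers the ramp.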
Figure 3.18 Schematic of the signal filtering. (a) Schematic diagram of the filtering of the approximation coefficients to produce the approximation and detail coefficients at successive scales. The subsample symbol means take every second value of the filtered signal. HP and LP are, respectively, the highpass and lowpass filters. In practice the filter is moved along one location at a time on the signal, hence filtering plus subsampling corresponds to skipping every second value as shown in figure 3.16. (b) Schematic diagram of the filtering of the approximation and detail coefficients to produce the approximation coefficients at successively smaller scales. The upsample symbol means add a zero between every second value of the input vector. The coefficients in the HP′ and LP′ filters are in reverse order to their counterparts used in the decomposition shown in (a). Figures (a) and (b) represent a subband coding scheme.
Figure 3.19 Filtering of the signal: reconstruction. The smooth and detail coefficients at scale index m are passed back through the filters to reconstruct the smooth coefficients at the higher resolution, S_{m−1,n}. This is done by sliding the lowpass and highpass filters along their respective coefficient sequences with zeros inserted between the coefficients. (Refer back to equation (3.42) where, remember, ‘n’ is the coefficient location index at the lower scale and ‘k’ is the index at the higher scale.)
We can produce a discrete approximation of the scaling function at successive scales if we set all the values of the transform vector to zero except the first one and pass this vector repeatedly back up through the lowpass filter. This is illustrated in the sequence of plots contained in figure 3.20. Figure 3.20(a) shows the initial transform vector, 64 components long, with only the first component set to unity, the rest set to zero, i.e. (1,0,0,0,…, 0). After the first iteration, four components can be seen. These are equal to $\left({c}_{0}/\sqrt{2},\text{\hspace{0.17em}}\text{\hspace{0.17em}}{c}_{1}/\sqrt{2},\text{\hspace{0.17em}}\text{\hspace{0.17em}}{c}_{2}/\sqrt{2},\text{\hspace{0.17em}}\text{\hspace{0.17em}}{c}_{3}/\sqrt{2},\text{\hspace{0.17em}}\text{\hspace{0.17em}}0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}0,\text{\hspace{0.17em}}\text{\hspace{0.17em}}0,\dots 0\right)$
(notice that this is the decomposition filter). The next iteration (figure 3.20(c)) produces 10 nonzero values in the transform vector, the next, 22 nonzero values, and so on. The transform vector components take on the appearance of the scaling function quite quickly. What we are actually doing is reconstructing S_{0,n} based on the initial vector (1, 0, 0, 0, …, 0) equal to the following transform vectors: (S_{1,0}, S_{1,1}, S_{1,2}, …, T_{1,0}, T_{1,1}, T_{1,2}, …) for figure 3.20(b); (S_{2,0}, S_{2,1}, S_{2,2}, …, T_{1,0}, T_{1,1}, T_{1,2}, …) for figure 3.20(c); (S_{3,0}, S_{3,1}, S_{3,2}, …, T_{1,0}, T_{1,1}, T_{1,2}, …) for figure 3.20(d), and so on. Notice that, as we employ wraparound to deal with the edge effects, the final signal shown in figure 3.20(g) is a constant value equal to the signal mean. Figure 3.20(h) shows a highresolution representation of a Daubechies scaling function generated by iterating nine times with an initial transform vector containing 4096 components.

Figure 3.20 The reconstruction of a Daubechies D4 scaling function. (a) The transform vector (1, 0, 0, 0, 0, 0, …). (b) One iteration, i.e. S_{1,0} = 1. (c) Two iterations, i.e. S_{2,0} = 1. (d) Three iterations, i.e. S_{3,0} = 1. (e) Four iterations, i.e. S_{4,0} = 1. (f) Five iterations, i.e. S_{5,0} = 1. (g) Six iterations of the transform vector with 64 components, i.e. S_{6,0} = 1. This results in a constant signal equal to the mean due to wraparound. (h) After nine iterations of a transform vector (1, 0, 0, 0, 0, …) with 4096 components.
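The iteration of figure 3.20 can be sketched as below. Note this is the doubling-length variant of the cascade: each pass upsamples and lowpass filters, doubling the vector, which converges to the same scaling-function shape as the fixed-length reconstruction described in the text (the function name is ours):

```python
import numpy as np

c = np.array([0.6830127, 1.1830127, 0.3169873, -0.1830127])  # D4 coefficients

def cascade(v):
    """One pass of the reconstruction lowpass filter: upsample by two and
    convolve with c_k / sqrt(2), with periodic wraparound."""
    n = 2 * len(v)
    out = np.zeros(n)
    for i, value in enumerate(v):
        for k in range(len(c)):
            out[(2 * i + k) % n] += c[k] / np.sqrt(2) * value
    return out

v = np.zeros(16)
v[0] = 1.0          # a single unit approximation coefficient
for _ in range(5):  # each pass doubles the length and refines the shape
    v = cascade(v)
# v now traces out a discrete approximation to the D4 scaling function
```

After the first pass the four nonzero values are exactly c_k/√2, matching the four components seen after one iteration in figure 3.20(b).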
You can verify that the constant signal of figure 3.20(g) stems from the wraparound employed when reconstructing from the full decomposition wavelet vector with S_{M,0} set to unity. In this case, for the first iteration, adding a zero to get (S_{M,0}, 0) and running the reconstruction filter over both components leads to two approximation coefficients (S_{M−1,0}, S_{M−1,1}) equal to $\left(\left({c}_{0}+{c}_{2}\right)/\sqrt{2},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\left({c}_{1}+{c}_{3}\right)/\sqrt{2}\right)=\left(0.707,\text{\hspace{0.17em}}0.707\right)$
. Adding zeros to this vector and iterating again produces four approximation coefficients (0.5, 0.5, 0.5, 0.5) and so on until we obtain the full reconstruction after the sixth iteration which contains constant values of S_{0,n} equal to ${\left(1/\sqrt{2}\right)}^{6}=0.125$. We can see this is true for the general case as, from equation (3.74) with m = 0 together with equation (3.21), it can be shown that

If we repeat the reconstruction filtering, this time setting the first detail coefficient to unity, we can generate an approximation to a wavelet. Notice that, in doing so, only the first iteration of the reconstruction algorithm requires the reordered scaling coefficient sequence for the wavelet, b_k, and subsequent reconstruction filtering then uses the scaling coefficient sequence c_k. Figure 3.21 shows the wavelet reconstruction process. From these examples, where a single component was set to unity, we can see that the transform vector components, when passed back through the filters during the reconstruction process, increasingly spread their influence (information) over the transform vector in terms of either discrete scaling functions (if the original component was an approximation coefficient) or discrete wavelet functions (if the original component was a detail coefficient). Figures 3.20 and 3.21 illustrate the discrete approximations of the wavelet and scaling functions at various scales implicit within the multiresolution algorithm. The discrete scaling and wavelet functions at scale 1 are, in fact, the filter coefficient vectors $\left(1/\sqrt{2}\right)$
c_k and $\left(1/\sqrt{2}\right)$ b_k respectively.

Looking back at figure 3.15 we can see that Daubechies wavelets are quite asymmetric. To improve symmetry while retaining simplicity, Daubechies proposed Symmlets (also spelled symlets) as a modification to her original wavelets. Symmetric wavelet filters are desirable in some applications, e.g. image coding, as it is argued that our visual system is more tolerant of symmetric errors. In addition, symmetry makes it easier to deal with image boundaries. Daubechies came up with Symmlets by ‘juggling with their phase’ during their construction. Figure 3.22 contains two examples of Symmlets together with their scaling functions. They have N_k/2 − 1 vanishing moments, support length N_k − 1 and filter length N_k. However, true symmetry (or antisymmetry) cannot be achieved for orthonormal wavelet bases with compact support, with one exception: the Haar wavelet, which is antisymmetric. Coiflets are another wavelet family found by Daubechies. They are also nearly symmetrical and have vanishing moments for both the scaling function and wavelet: the wavelet has N_k/3 moments equal to zero and the scaling function has N_k/3 − 1 moments equal to zero. They have support length N_k − 1 and filter length N_k. Three examples of Coiflet wavelets are shown in figure 3.23. The number of coefficients, N_k, used to define Coiflets increases in multiples of six.
Figure 3.21 The reconstruction of a Daubechies D4 wavelet. (a) One iteration with only T_{1,0} initially set equal to 1. (b) Two iterations with only T_{2,0} initially set equal to 1. (c) Three iterations with only T_{3,0} initially set equal to 1. (d) Four iterations with only T_{4,0} initially set equal to 1.
Figure 3.22 Symmlets and their associated scaling functions. (a) S6. (b) S10. (The scaling functions are shown dotted.)
Figure 3.23 Coiflets and their associated scaling functions. (a) C6. (b) C12. (c) C18. (Scaling functions are shown dotted in figures.)
Translation invariance (or shift invariance) is an important property associated with some wavelet transforms but not others. It simply means that if the signal is shifted along in time, all the transform coefficients simply move along by the same amount. For the dyadic grid structure of the discrete wavelet transform, however, this is clearly not the case: only if you shift along by the grid spacing at a given scale do the coefficients at that scale and below become translation invariant. Even for the discretization of the continuous wavelet transform, the transform values are translation invariant only if shifted by an integer multiple of the discrete time steps. This is illustrated in figure 3.24, where a simple sinusoidal signal is decomposed using the discrete orthonormal Haar wavelet and a continuous Mexican hat wavelet discretized at each time step. The original signal shown on the lefthand side of figure 3.24(a) is composed of 10 cycles of a sinusoid made up of 1024 discrete data points. The middle plot in figure 3.24(a) contains the Haar transform coefficients of the signal plotted in their dyadic grid formation. The righthand plot contains the continuous wavelet transform using the Mexican hat wavelet. The coarse structure of the dyadic grid is evident when compared with the smooth, high resolution Mexican hat transform plot.
We will assume the signal continues indefinitely and shift the signal window along to see what happens to the transform plots. The variation in the coefficients of the Haar transform with displacement of the time window along the signal can be seen by looking down the central plots in the figure. Shifting the window forward (or back) along the signal by a scale of 2^{m} produces the same coefficients in the transform plot at and below that scale shifted in time by 2^{m}. This can be observed in figures 3.24(b)–(d) where the size of the shift is indicated in the transform plot of figure 3.24(a). The coefficient level at and below which the coefficients remain the same as the original is indicated by the arrows at the lefthand side of each transform plot. Shifting the signal by an arbitrary amount (not a power of two) leads to a completely different set of coefficients, as can be seen in figure 3.24(e), where the signal is shifted by an eighth of a sinusoidal cycle (12.8 data points). The translation invariance of the continuous transform plots is obvious by looking down the righthand figures where, for any arbitrary shift in the signal, the wavelet transform values are simply translated with the shift.
Figure 3.24 Translation invariance. (a) Original signal, 1024 data points. (b) Shift by 64 data points. (c) Shift by 128 data points. (d) Shift by 256 data points. (e) Shift by 1/8 cycle.
We can make a translation invariant version of the discrete wavelet transform, known by a variety of names including the redundant, stationary, translation invariant, maximal overlap or nondecimated wavelet transform, by simply skipping the subsampling part of the filtering process described in section 3.5.1. This results in the same number of wavelet coefficients being generated at each step, equal to the number of signal components N. The decomposition is the same as that shown in figure 3.16 except that every value is retained as the filter moves one step at a time along the signal. In addition, the filters have to be stretched through the addition of zeros between coefficients; hence this algorithm is called the ‘à trous’ algorithm, from the French for ‘with holes’. An average basis inverse can be performed which gives the average of all possible discrete wavelet transform reconstructions over all possible choices of time origin in the signal. This is sometimes useful for statistical applications including denoising. In addition to being translation invariant, the redundant discrete wavelet transform is extendable to signals of arbitrary length. We do not consider the redundant discrete wavelet transform herein; rather, the reader is referred to the literature cited at the end of this chapter.
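A minimal sketch of this undecimated (‘à trous’) transform, using Haar filters for brevity (the function name is ours; practical implementations follow the same pyramid structure with longer filters):

```python
import numpy as np

def swt_haar(x, levels):
    """Stationary (undecimated) Haar transform: no subsampling is performed,
    and at each level the filter is dilated by widening the gap ('hole')
    between its two taps. Periodic wraparound handles the signal ends."""
    x = np.asarray(x, dtype=float)
    approx = x.copy()
    details = []
    for m in range(levels):
        step = 2 ** m                     # hole spacing at this level
        shifted = np.roll(approx, -step)  # approx[n + step], with wraparound
        detail = (approx - shifted) / np.sqrt(2)
        approx = (approx + shifted) / np.sqrt(2)
        details.append(detail)
    return approx, details
```

Every level retains all N coefficients, and shifting the input by one sample simply shifts every coefficient sequence by one sample: the translation invariance that the decimated transform lacks.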
For certain applications real, symmetric wavelets are required. One way to obtain symmetric wavelets is to construct two sets of biorthogonal wavelets: ψ _{ m,n } and its dual, ${\tilde{\psi}}_{m,n}$
. One set is used to decompose the signal and the other to reconstruct it. For example, we can decompose the signal using ψ_{m,n} as follows:

Figure 3.25 Biorthogonal spline wavelets. (a) Biorthogonal (1,5) spline wavelets: ϕ(t), ψ(t) (left) and $\tilde{\phi}(t)$, $\tilde{\psi}(t)$ (right). (b) Biorthogonal (2,4) spline wavelets: ϕ(t), ψ(t) (left) and $\tilde{\phi}(t)$, $\tilde{\psi}(t)$ (right). (c) Biorthogonal (3,7) spline wavelets: ϕ(t), ψ(t) (left) and $\tilde{\phi}(t)$, $\tilde{\psi}(t)$ (right). (Scaling functions shown dotted.)
In many applications the data set is in the form of a two-dimensional array, for example the heights of a machined surface or natural topography, or perhaps the intensities of an array of pixels making up an image. We may want to perform a wavelet decomposition of such arrays either to compress the data or to carry out a wavelet-based parametric characterization of the data. To perform a discrete wavelet decomposition of a two-dimensional array we must use two-dimensional discrete wavelet transforms. We can simply generate these from tensor products of their one-dimensional orthonormal counterparts. The most common (and simplest) arrangement is to use the same scaling in the horizontal and vertical directions. This gives square transforms. (Other forms are possible, e.g. rectangular transforms where the horizontal and vertical scaling vary independently, and also two-dimensional transforms which are not simply tensor products. These are outside the scope of this text.) The two-dimensional Haar scaling and wavelet functions are: two-dimensional scaling function
The multiresolution decomposition of the two-dimensional coefficient matrices can be expressed as
Figure 3.26 The two-dimensional discrete Haar wavelet at scale index 1. (a) Scaling function. (b) Horizontal wavelet. (c) Vertical wavelet. (d) Diagonal wavelet.
Figure 3.27 Schematic diagram of the matrix manipulation required to perform wavelet decomposition on a two-dimensional grid.
Figure 3.28 A two-dimensional signal containing a single spike.
Let us look at a very simple example, a matrix with a single nonzero component
Figure 3.29 Schematic diagram of the matrix manipulation required to perform the Haar wavelet decomposition of the spike signal. Components of the discrete scaling and wavelet matrices are shown in bold, components of the original discrete signal are shown shaded.
Figure 3.30 An 8 × 8 ‘step’ array together with its associated Haar decompositions. (a) Original step. (b) Iteration 1. (c) Iteration 2. (d) Iteration 3. Note that black dots have been added to the nonzero coefficients in plots (c) and (d) to aid clarity.
Let us look at another simple example, the step shown in figure 3.30(a) and given in matrix form as
first decomposition W ^{(1)}
We can now use the detail coefficients to reconstruct the array at different scales. This is shown in figure 3.31 and is performed using the detail coefficients at each scale of interest. In the same way as we did for the onedimensional signals earlier in section 3.4, discrete detail arrays ${D}_{m}^{h},\text{\hspace{0.17em}}{D}_{m}^{d}$
and ${D}_{m}^{v}$ can be reconstructed at the scale of the input array using each of the detail coefficients, ${T}_{m}^{h},\text{\hspace{0.17em}}{T}_{m}^{d}$ and ${T}_{m}^{v}$. A combined detail can be found at scale index m, given by

Figure 3.31 Schematic diagram of the matrix manipulation required to derive the detail matrices at scale index m = 1. (a) The three detail coefficient submatrices at scale index 1 for the step. (b) Computing the horizontal discrete detail from the detail coefficients at scale index 1.
Figure 3.32 Schematic diagram of the matrix manipulation required to compute the horizontal detail at scale index m = 2.
Figure 3.33 The details of the step array. (a) Original step array X_0. (b) D_1. (c) D_2. (d) D_3. (e) X_3. (f) Sum of (b), (c), (d) and (e) giving original signal (a).
Figure 3.34 An image and its first two decomposition matrices. (a) Original array S_0. (b) Transform array V^{(1)} containing submatrices S_1, T_1^v, T_1^h and T_1^d. (c) Transform array V^{(2)} containing submatrices S_2, T_1^v, T_1^h, T_1^d, T_2^v, T_2^h and T_2^d.
Figure 3.35 Haar decomposition of a surface data set. (a) Original data set: 128 × 128 array of surface heights taken from a river bed. (b) Scale index m = 1 discrete detail D_1. (c) Scale index m = 2 discrete detail D_2. (d) Scale index m = 3 discrete detail D_3. (e) Scale index m = 4 discrete detail D_4. (f) Scale index m = 5 discrete detail D_5. (g) Scale index m = 6 discrete detail D_6. (h) Scale index m = 7 discrete detail D_7. (i) Sum of first three discrete details D_1 + D_2 + D_3 (= (b) + (c) + (d)). (j) Scale index m = 3 discrete approximation X_3 = X_0 − (D_1 + D_2 + D_3) (= (a) − (b) − (c) − (d)). (Greyscale used in all images: maximum = white, minimum = black.)
Figures 3.34 and 3.35 show examples of much larger data sets, both using the Haar wavelet transform. Figure 3.34 shows an image (‘Lena’) together with its first two decomposition matrices, where the approximation and detail coefficient submatrices can be clearly seen. Figure 3.35 shows an example of the Haar decomposition of a more irregular array. The 128 × 128 array shown in figure 3.35(a) contains the heights of a measured rough surface. The details of this array, at scale indices m = 1–7, are shown in figures 3.35(b)–(h). Figure 3.35(i) contains the summation of the first three details and figure 3.35(j) shows the resulting approximation at scale index m = 3 when these details are subtracted from the original array. The blocky nature of the Haar decomposition is noticeable from the plots. The two-dimensional Haar wavelet is very simple in its form and, as with its one-dimensional counterpart, there are more complex forms of Daubechies wavelets and, of course, many other wavelet families to choose from. Some examples of these wavelets are given in figure 3.36. Using these wavelets results in overlap of the wavelet at the array edge and therefore requires the use of wraparound or other methods to deal with the data edges.
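One level of the square two-dimensional decomposition is easy to sketch with the Haar wavelet: filter along the rows, then along the columns, to obtain the approximation and the three detail submatrices (a minimal sketch; the function names and the horizontal/vertical labelling convention are ours, and conventions differ between authors):

```python
import numpy as np

def haar_1d(a, axis):
    """One-level Haar split along one axis: pairwise sums and differences,
    each divided by sqrt(2)."""
    even = a.take(np.arange(0, a.shape[axis], 2), axis=axis)
    odd = a.take(np.arange(1, a.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_2d_step(x):
    """One-level 2D Haar decomposition via the tensor product of 1D splits:
    returns the approximation S plus the horizontal, vertical and diagonal
    detail submatrices."""
    lo, hi = haar_1d(np.asarray(x, dtype=float), axis=1)  # along the rows
    s, th = haar_1d(lo, axis=0)                           # then the columns
    tv, td = haar_1d(hi, axis=0)
    return s, th, tv, td
```

For an N × N array this produces four (N/2) × (N/2) submatrices, which are conventionally stored in the four quadrants of the transform array, as in figure 3.34(b); the transform preserves the total energy of the array.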
Figure 3.36 Examples of two-dimensional orthonormal wavelets: Daubechies, Symmlets and Coiflets. (a) D2 Haar wavelet. (b) D4. (c) D8. (d) D12. (e) D12, another perspective. (f) D20. (g) Symmlet S8. (h) Coiflet C6. (Wavelets shown viewed at various angles and elevations.)
As we saw in chapter 2 (figure 2.29) the resolution of the wavelet transform is not uniform in the time–frequency plane. The Heisenberg boxes expand in frequency and contract in time as fluctuations at smaller and smaller scales are explored. The short term Fourier transform (STFT), on the other hand, covers the time–frequency plane with tiles of constant aspect ratio (figure 2.30). We also looked briefly at matching pursuits which offer another way of extracting time–frequency information. In this section we will consider another method which can adapt to the signal and hence allows for more flexibility in the partitioning of the time–frequency plane: the wavelet packet transform.
Wavelet packet (WP) transforms are a generalization of the discrete wavelet transform. Wavelet packets involve particular linear combinations of wavelets, and the wavelet packet decomposition of a signal is performed in a manner similar to the multiresolution algorithm given earlier for the discrete wavelet transform. The difference is that, in the WP signal decomposition, both the approximation and detail coefficients are further decomposed at each level. This leads to the decomposition tree structure depicted at the top of figure 3.37. Compare this with the schematic of the wavelet decomposition given in figure 3.6. At each stage in the wavelet algorithm, the detail coefficients are simply transferred down, unchanged, to the next level. However, in the wavelet packet algorithm, all the coefficients at each stage are further decomposed. In this way, we end up with an array of wavelet packet coefficients with M levels, each with N coefficients. A total of N coefficients from this M × N array can then be selected to represent the signal. The standard wavelet transform decomposition coefficients are contained within the WP array, shown by the bold boxes in figure 3.37. A new nomenclature is employed in the figure to indicate the operations that have been performed on each set of coefficients. S produces the approximation components of the previous set of coefficients by lowpass filtering, and T the detail components through highpass filtering. We simply add the letter S or T to the lefthand end of the coefficient name to indicate the most recent filtering procedure. SSTS_n, for example, corresponds to the original signal lowpass filtered, then highpass filtered, then passed twice through the lowpass filter. Notice also that the subscript contains only the location index n. The scale index m is omitted as it is obviously equal to the number of letters S and T in the coefficient name.
As with the original wavelet transform, the number of coefficients at each scale depends upon the scale, with one coefficient in each coefficient group at the largest scale M and N/2 coefficients at the smallest scale m = 1. Hence, the coefficient index spans n = 0, 1, …, 2^{M−m} − 1.
Figure 3.37 Schematic diagram of wavelet packet decomposition.
At each stage in the decomposition, the wavelet packet algorithm partitions the time–frequency plane into rectangles of constant aspect ratio. These become wider (in time) and narrower (in frequency) as the decomposition proceeds. This is shown schematically at the bottom of figure 3.37 for each scale. A variety of tilings of the time–frequency plane is possible using the wavelet packet coefficients. For example, we could keep all the coefficients at one level and discard all the others. This would tile the plane in boxes of constant shape, just like one of those shown at the bottom of figure 3.37. Other tilings are possible, some examples of which are shown in figure 3.38. The standard wavelet transform is just one of all the possible tiling patterns. Figure 3.39 contains the wavelet packet coefficient selections corresponding to the tiling patterns of figure 3.38.
Figure 3.38 Schematic diagrams of allowable wavelet packet tilings of the time–frequency plane. The right-hand tiling is that used in the wavelet transform algorithm and corresponds to the components contained in the bold boxes shown in the previous figure.
Figure 3.39 The choice of wavelet packet components leading to the tilings shown in figure 3.38.
The optimal or ‘best’ coefficient selection (and hence tiling arrangement) is chosen to represent the signal based on some predefined criterion. This criterion is normally based on an information cost function which aims to retain as much information in as few coefficients as possible. The most common measure of information used is the Shannon entropy measure. This is defined for a discrete distribution p_i, i = 0, 1, …, N − 1, as

S = −∑_{i=0}^{N−1} p_i log(p_i)

where the convention 0 log(0) = 0 is used.
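As a sketch of how this measure behaves on a set of wavelet packet coefficients, the helper below treats the normalized coefficient energies p_i = c_i²/∑c_j² as the discrete distribution. This normalization is one common convention, and the function name is ours:

```python
import numpy as np

def shannon_entropy(coeffs):
    """Shannon entropy of a coefficient set, treating the normalized
    coefficient energies p_i = c_i**2 / sum(c**2) as a discrete
    distribution; 0*log(0) is taken as 0."""
    c = np.asarray(coeffs, dtype=float)
    energy = np.sum(c ** 2)
    if energy == 0.0:
        return 0.0
    p = c ** 2 / energy
    p = p[p > 0]                       # drop zero-probability terms
    return float(-np.sum(p * np.log(p)))
```

The entropy is largest when the energy is spread evenly over the coefficients (log N for N equal coefficients) and zero when it is concentrated in a single coefficient, which is exactly the behaviour the cost function exploits.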
The set of N wavelet packet coefficients which contains the least entropy is selected to represent the signal. That is, we want the signal information to be concentrated within as few coefficients as possible. To find these coefficients, the WP array, such as the one we saw in figure 3.37, is inspected from the bottom upwards. At each scale, each pair of partitioned coefficient sets (the ‘children’) is compared with the set from which they were derived (their ‘parent’). If the combined children’s coefficients have a smaller entropy than those of their parent, then they are retained. If not, the parent’s coefficients are retained. When the children are selected, their entropy value is assigned to their parent so that subsequent entropy comparisons can be made further up the tree. This is shown schematically in figure 3.40. Once the whole WP array has been inspected in this way, we obtain an optimal tiling of the time–frequency plane (with respect to the localization of coefficient energies). This tiling provides the best basis for the signal decomposition.
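The bottom-up inspection can be sketched as follows. This sketch assumes the WP array is stored as a list of dicts, with tree[m] mapping each S/T label of length m to its coefficient group, and uses the additive cost −∑ c_i² ln(c_i²) of Coifman and Wickerhauser, which, unlike the normalized entropy, sums correctly across sibling groups. This is a standard realization of the parent/children comparison described above, not necessarily the exact variant used for the figures:

```python
import numpy as np

def wp_cost(c):
    """Additive Shannon-type cost -sum(c_i^2 * ln c_i^2), with
    0*ln(0) taken as 0."""
    e = np.asarray(c, dtype=float) ** 2
    e = e[e > 0]
    return float(-np.sum(e * np.log(e)))

def best_basis(tree):
    """Bottom-up best-basis search over a full wavelet packet array.
    Returns the set of S/T names whose coefficient groups form the
    minimum-cost (best) basis."""
    M = len(tree) - 1
    cost = {name: wp_cost(c) for name, c in tree[M].items()}
    best = {name: {name} for name in tree[M]}        # leaves keep themselves
    for m in range(M - 1, -1, -1):                   # inspect from the bottom up
        for name, c in tree[m].items():
            parent = wp_cost(c)
            children = cost['S' + name] + cost['T' + name]
            if children < parent:                    # children retained; their
                cost[name] = children                # cost is passed up the tree
                best[name] = best['S' + name] | best['T' + name]
            else:                                    # parent retained
                cost[name] = parent
                best[name] = {name}
    return best['']
```

The names returned identify a disjoint cover of the time–frequency plane, i.e. one of the allowable tilings of figure 3.38.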
Figure 3.40 Wavelet packet coefficient selection.
Figure 3.41 Wavelet packet decomposition of a simple signal. (a) Signal (top) with wavelet packet decomposition (below). The coefficient values are plotted as heights. The scale indices, m = 1 to 6, are given down the left-hand side of the plot. Trace WP contains the best selection of wavelet packets and trace WT contains the wavelet transform decomposition for comparison. (b) The time–frequency tiling associated with each wavelet packet decomposition in (a). Larger coefficient values are plotted darker. (c) The time–frequency tiling associated with the best wavelet packet decomposition (left) and wavelet decomposition (right). (d) The 16 largest coefficients from (c): wavelet packet decomposition (left) and wavelet decomposition (right). (e) The reconstruction of the signal using only the 16 largest coefficients given in (d): wavelet packet (left) and wavelet (right).
Figure 3.41 illustrates the wavelet packet method on a simple discrete signal composed of a sampled sinusoid plus a spike. The signal is 64 data points in length. A Haar wavelet is used to decompose the signal. Figure 3.41(a) shows the wavelet packet coefficients below the original signal for each stage in the WP algorithm. The coefficients are displayed as histograms. The bottom two traces contain the coefficients corresponding to the best wavelet packet basis and the ‘traditional’ discrete wavelet basis respectively. The WP tiling of the coefficient energies in the time–frequency plane for each scale is given in figure 3.41(b). The larger coefficient energies are shaded darker in the plot. In figure 3.41(c) the optimal WP tiling is compared with the wavelet transform tiling of the time–frequency plane. The plots in figure 3.41(d) outline the 16 largest coefficients in both time–frequency planes of figure 3.41(c). These are used to reconstruct the signals shown in figure 3.41(e). The 16 largest wavelet packet coefficients contain 98.6% of the signal energy, whereas the 16 wavelet transform coefficients contain 96.5%. The wavelet packet reconstruction using the selected coefficients is visibly smoother than the reconstruction using the traditional wavelet transform coefficients. Figure 3.42 contains the same signal as figure 3.41; this time, however, the WP decomposition is performed using a Daubechies D20 wavelet (refer back to figure 3.15). Using this wavelet results in a different tiling of the time–frequency plane for the WP method (compare the left-hand plots of figures 3.42(b) and 3.41(c)). Again, the 16 largest coefficients are used in the signal reconstruction. The oscillatory parts of both reconstructions shown in figure 3.42(d) are visibly smoother than their Haar counterparts in the previous figure (figure 3.41(e)). We expect this as the D20 is more smoothly oscillatory than the Haar.
Note, however, comparing figures 3.42(c) and 3.41(d), that the signal spike leads to a single high-frequency tile for both Haar decompositions but, respectively, five and four high-frequency tiles for the D20 wavelet. The more compact support of the Haar wavelet has allowed for a better localization of the signal spike, but it also makes it less able than the D20 to represent the smooth oscillations in the signal. The energies of the reconstructed signals for the D20 decompositions using only the largest 16 coefficients are 99.8% (WP) and 99.7% (WT), indicating the data compression possibilities of the techniques.
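Energy percentages of the kind quoted above are straightforward to compute once the coefficients of an orthonormal decomposition are available; the helper below is illustrative and does not reproduce the book's data:

```python
import numpy as np

def energy_fraction(coeffs, k):
    """Fraction of the total signal energy carried by the k
    largest-magnitude coefficients of an orthonormal decomposition
    (for which coefficient energy equals signal energy)."""
    c = np.sort(np.abs(np.asarray(coeffs, dtype=float)))[::-1]  # descending |c|
    return float(np.sum(c[:k] ** 2) / np.sum(c ** 2))
```

Comparing energy_fraction(wp_coeffs, 16) with energy_fraction(wt_coeffs, 16) for a given signal reproduces the kind of WP-versus-WT comparison made in the text.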
Figure 3.42 Wavelet packet decomposition of a simple signal using a Daubechies D20 wavelet. (a) The wavelet packet decomposition of the signal shown at the top of figure 3.41(a). (b) The time–frequency tiling associated with the best wavelet packet decomposition (left) and wavelet decomposition (right). (c) The 16 largest coefficients from (b): wavelet packet decomposition (left) and wavelet decomposition (right). (d) The reconstruction of the signal using only the 16 largest coefficients given in (c): wavelet packet (left) and wavelet (right).
(You may find it helpful to jot down your understanding of each of them.)
discrete wavelet transform
wavelet/detail coefficients
wavelet frames
tight frame
orthonormal basis
dyadic grid
scaling function
approximation coefficients
inverse discrete wavelet transform
discrete approximation
continuous approximation
signal detail
multiresolution
scaling equation
scaling coefficients
compact support
fast wavelet transform
decomposition algorithm
reconstruction algorithm
wraparound
scale index
sequential indexing
level indexing
hard thresholding
soft thresholding
scale thresholding
Daubechies wavelets
subsampled
upsampled
Symmlets
Coiflets
translation invariance
redundant/stationary/translation invariant/nondecimated/maximal overlap discrete wavelet transform
biorthogonal wavelets
wavelet packet transforms
Shannon entropy measure
Papers describing the discrete wavelet transform at an introductory level include those by Kim and Aggarwal (2000), Depczynski et al (1997), Asamoah (1999) and Graps (1995). The paper by Williams and Amaratunga (1994) contains a good explanation of the derivation of the Daubechies D4 scaling coefficients and multiresolution analysis; they present a clear account of the wavelet filtering of a signal using matrices. Other useful introductory papers are those by Sarkar et al (1998), Meneveau (1991a), Jawerth and Sweldens (1994), Newland (1994a–c) and Strang (1989, 1993); the original paper on multiresolution analysis is by Mallat (1989). In his book, Newland (1993a) gives a more detailed account of the conditions that must be satisfied for discrete orthonormal wavelets. Diou et al (1999) and Toubin et al (1999) provide some useful information on projecting the details and approximations determined at one scale onto another scale. Chui (1997) provides a little more mathematical detail in a readable account of discrete wavelet transforms and their role in signal analysis. Strang and Nguyen's (1996) text concentrates on the connection between wavelets and the filter banks used in digital signal processing. Daubechies' (1992) book provides a good grounding in all aspects of wavelet transforms, containing among other things useful text on wavelet frames and wavelet symmetry. It also contains more information on the construction of Symmlets, Coiflets and biorthogonal wavelets. This chapter has concentrated on two compact, discrete orthonormal wavelets: the Haar wavelet and the Daubechies D4 wavelet. There are many other wavelets we have not considered, most notably the Shannon, Meyer and Battle–Lemarié wavelets. More information on these can be found in, for example, Chui (1997), Mallat (1998), Daubechies (1992) and Blatter (1998). More examples of the multiresolution analysis of simple discrete signals can be found in the book by Walker (1999).
Books providing a more mathematical account of discrete wavelet transforms are those by Hernandez and Weiss (1996), Benedetto and Frazier (1994), Chui (1992a, b), Walter (1994) and Percival and Walden (2000). There is also a lot of useful information on the web; the appendix contains a list of useful websites from which to begin a search.
The book by Starck et al (1998) contains a lot of good introductory material on two-dimensional discrete wavelet transforms, covering many of their practical applications including remote sensing, image compression, multiscale vision models, object detection and multiscale clustering. A nice illustrative paper on the application of two-dimensional wavelet transforms is that by Jiang et al (1999), who investigate the three-dimensional surface of orthopaedic joint prostheses. We considered only square two-dimensional transforms in this chapter, where the horizontal and vertical scalings were kept the same. If they are allowed to vary independently, we get rectangular transforms. Also possible are two-dimensional transforms which are not simply tensor products but wavelets constructed ‘intrinsically’ for higher dimensions (Jawerth and Sweldens, 1994). There are many applications of wavelet packets cited in the rest of this book. Quinquis (1998) provides a nice introduction to wavelet packets, while the paper by Hess-Nielsen and Wickerhauser (1996) and Wickerhauser's (1994) book provide a more in-depth account.
We mentioned briefly the redundant wavelet transform, a variant of the discrete wavelet transform which produces N coefficients at each level and is translation invariant. This has been found useful in statistical applications: see for example Coifman and Donoho (1995), Lang et al (1996) and Nason and Silverman (1995). A concise summary of the various wavelet thresholding methods developed over recent years is to be found in the paper by Abramovich et al (2000), together with a comprehensive list of references on the subject. In addition, the book by Ogden (1997) provides a more detailed overview together with numerous examples. The universal threshold method is detailed in the paper by Donoho and Johnstone (1994), together with another global method, the minimax thresholding method. Donoho and Johnstone (1995) also developed a scheme which uses the wavelet coefficients at each scale to choose a scale-dependent threshold. This method is known as the SURE or SureShrink method after ‘Stein's Unbiased Risk Estimate’ on which it is based. They suggest a HYBRID method to be used in practice for decompositions where, at some scales, the wavelet representation is sparse, i.e. a large number of the coefficients are zero (or very near zero). The HYBRID method uses the SURE method, but defaults to the universal threshold for scales where the data representation is sparse due to the noise overwhelming the little signal contributed from the nonzero coefficients. A number of other methods have been proposed, including those based on cross-validation, e.g. Nason (1996), Jansen and Bultheel (1999); Bayesian approaches, e.g. Chipman et al (1997), Vidakovich (1998), Abramovich et al (1998); and the Lorentz curve (Katul and Vidakovich, 1996, 1998). Translation invariant denoising is considered by Coifman and Donoho (1995), who compute the estimated signal using all possible discrete shifts of the signal.
See also the papers by Donoho and Johnstone (1995, 1998), Donoho (1995), Johnstone and Silverman (1997), Hall and Patil (1996), Efromovich (1999), Downie and Silverman (1998), Moulin (1994), Krim and Schick (1999) and Krim et al (1999). We will come across the use of thresholding extensively in the subsequent application chapters of this book. In particular, thresholding is revisited in chapter 4, section 4.2.3, where another thresholding method, the Lorentz threshold, is explained.
For data compression applications such as speech and audio transmission, we often want to represent the signal with as few detail coefficients as possible without losing too much information. We can do this by finding a wavelet which decomposes the signal into a few large amplitude detail coefficients, which we retain, and many coefficients close to zero, which we discard. How many coefficients we set to zero affects the subsequent quality of the reconstruction. In practice, for larger data sets and more suitable wavelets, we can get good compression ratios without the loss of significant detail, where the term ‘significant’ is determined by the user. We have not considered full compression schemes here, including quantization and encoding; the reader is referred in the first instance to the simplified explanation of Walker (1999) and the more detailed treatments in Chui (1997) and Mallat (1998).
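A minimal sketch of this keep-the-largest step (a hypothetical helper only; it performs no quantization or encoding):

```python
import numpy as np

def compress(coeffs, keep):
    """Keep the `keep` largest-magnitude coefficients and zero the
    rest; the zeroed positions need not be stored or transmitted."""
    c = np.asarray(coeffs, dtype=float)
    out = np.zeros_like(c)
    idx = np.argsort(np.abs(c))[::-1][:keep]   # indices of the largest |c_i|
    out[idx] = c[idx]
    return out
```

The reconstruction quality then depends on how much of the signal energy the retained coefficients carry, which is where a wavelet well matched to the signal pays off.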
Note that all examples in this chapter have used wraparound to deal with the signal edges, i.e. the part of the wavelet spilling over the end of the signal is placed back at the start. This is the simplest and one of the most common treatments of edge effects for a finite length signal, and it results in exactly the same number of decomposition coefficients as original signal components. However, it is not always the best for the application. Other methods were shown in chapter 2, figure 2.35. In addition, there are families of boundary wavelets which are intrinsically defined on finite length signals (see for example the discussions in Jawerth and Sweldens (1994) and Chui (1997)). Take particular care when using off-the-shelf wavelet software packages as they may employ other boundary methods as the default setting, for example zero padding, which results in slightly more than N coefficients from the full decomposition. In addition, we have employed one version of the scaling coefficient reordering b_k (equation (3.25)). Again, this is very much software dependent and you will find alternative orderings employed by various software packages.
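A single decomposition step with wraparound can be sketched as below. This is an illustrative implementation: the wavelet (detail) filter is derived from the scaling filter by one common version of the b_k reordering, and, as noted above, other packages may use a different ordering or a different boundary default.

```python
import numpy as np

def dwt_step_periodic(x, h):
    """One decomposition step with wraparound: the scaling filter h and
    the derived wavelet filter g are applied with circular indexing, so
    an N-sample input always yields N/2 approximation plus N/2 detail
    coefficients -- no extra boundary coefficients."""
    h = np.asarray(h, dtype=float)
    # one common b_k reordering: g_i = (-1)^i * h_{Nk-1-i} (software dependent)
    g = h[::-1] * (-1.0) ** np.arange(len(h))
    n = len(x)
    a = np.zeros(n // 2)
    d = np.zeros(n // 2)
    for k in range(n // 2):
        for i in range(len(h)):
            a[k] += h[i] * x[(2 * k + i) % n]   # index wraps past the signal end
            d[k] += g[i] * x[(2 * k + i) % n]
    return a, d
```

With an orthonormal filter such as the Daubechies D4, the wraparound convention keeps the step energy preserving and the total coefficient count equal to the signal length.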