# The discrete wavelet transform

Authored by: Paul S. Addison

# The Illustrated Wavelet Transform Handbook

Print publication date:  July  2002
Online publication date:  July  2002

Print ISBN: 9780750306928
eBook ISBN: 9781420033397

10.1201/9781420033397.ch3

#### 3.1  Introduction

In this chapter we consider the discrete wavelet transform (DWT). We will see that when certain criteria are met it is possible to completely reconstruct the original signal using infinite summations of discrete wavelet coefficients rather than continuous integrals (as required for the CWT). This leads to a fast wavelet transform for the rapid computation of the discrete wavelet transform and its inverse. We will then see how to perform a discrete wavelet transform on discrete input signals of finite length: the kind of signal we might be presented with in practice. We will also consider briefly biorthogonal wavelets, which come in pairs, and some space is devoted to two-dimensional discrete wavelet transforms. The chapter ends with wavelet packets: a generalization of the discrete wavelet transform which allows for adaptive partitioning of the time-frequency plane.

#### 3.2.1  Frames

In chapter 2, the wavelet function was defined at scale a and location b as

3.1 $\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\left(\frac{t-b}{a}\right)$
In this section the wavelet transform of a continuous time signal, x(t), is considered where discrete values of the dilation and translation parameters, a and b, are used.

A natural way to sample the parameters a and b is to use a logarithmic discretization of the a scale and link this, in turn, to the size of steps taken between b locations. To link b to a, we move between locations b in discrete steps whose size is proportional to the scale a. This kind of discretization of the wavelet has the form

3.2 $\psi_{m,n}(t) = \frac{1}{\sqrt{a_0^m}}\,\psi\left(\frac{t - n b_0 a_0^m}{a_0^m}\right)$
where the integers m and n control the wavelet dilation and translation respectively; $a_0$ is a specified fixed dilation step parameter set at a value greater than 1, and $b_0$ is the location parameter which must be greater than zero. The control parameters m and n are contained in the set of all integers, both positive and negative. It can be seen from the above equation that the size of the translation steps, $\Delta b = b_0 a_0^m$, is directly proportional to the wavelet scale, $a_0^m$.

The wavelet transform of a continuous signal, x(t), using discrete wavelets of the form of equation (3.2) is then

3.3a $T_{m,n} = \int_{-\infty}^{\infty} x(t)\,\frac{1}{a_0^{m/2}}\,\psi\left(a_0^{-m}t - n b_0\right)\,dt$
which can also be expressed as the inner product
3.3b $T_{m,n} = \langle x, \psi_{m,n} \rangle$
where $T_{m,n}$ are the discrete wavelet transform values given on a scale–location grid of index m, n. For the discrete wavelet transform, the values $T_{m,n}$ are known as wavelet coefficients or detail coefficients. These two terms are used interchangeably in this chapter, as they are in the general wavelet literature. To determine how ‘good’ the representation of the signal is in wavelet space using this decomposition, we can resort to the theory of wavelet frames, which provides a general framework for studying the properties of discrete wavelets. Wavelet frames are constructed by discretely sampling the time and scale parameters of a continuous wavelet transform as we have done above. The family of wavelet functions that constitute a frame is such that the energy of the resulting wavelet coefficients lies within a certain bounded range of the energy of the original signal, i.e.
3.4 $AE \le \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} |T_{m,n}|^2 \le BE$
where $T_{m,n}$ are the discrete wavelet coefficients, A and B are the frame bounds, and E is the energy of the original signal given by equation (2.19) in chapter 2: $E = \int_{-\infty}^{\infty} |x(t)|^2\,dt = \|x(t)\|^2$, where our signal, x(t), is defined to have finite energy. The values of the frame bounds A and B depend upon both the parameters $a_0$ and $b_0$ chosen for the analysis and the wavelet function used. (For details of how to determine A and B see Daubechies (1992).) If A = B the frame is known as ‘tight’. Such tight frames have a simple reconstruction formula given by the infinite series
3.5 $x(t) = \frac{1}{A} \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} T_{m,n}\,\psi_{m,n}(t)$
A tight frame with A (= B) > 1 is redundant, with A being a measure of the redundancy. However, when A = B = 1 the wavelet family defined by the frame forms an orthonormal basis. If A is not equal to B, a reconstruction formula can still be written:
3.6 $x'(t) = \frac{2}{A+B} \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} T_{m,n}\,\psi_{m,n}(t)$

Figure 3.1   The nearly tight Mexican hat wavelet frame with $a_0 = 2^{1/2}$ and $b_0 = 0.5$. Three consecutive locations of the Mexican hat wavelet for scale indices m (top) and m + 1 (lower) and location indices n, n + 1, n + 2. That is, $a = a_0^m$ and $a = a_0^{m+1}$ respectively, and three consecutive b locations separated by a/2.

where x′(t) is the reconstruction which differs from the original signal x(t) by an error which depends on the values of the frame bounds. The error becomes acceptably small for practical purposes when the ratio B/A is near unity. It has been shown, for the case of the Mexican hat wavelet, that if we use $a_0 = 2^{1/v}$, where v ≥ 2 and $b_0 \le 0.5$, the frame is nearly tight or ‘snug’ and for practical purposes it may be considered tight. (This fractional discretization, v, of the power-of-two scale is known as a voice.) For example, setting $a_0 = 2^{1/2}$ and $b_0 = 0.5$ for the Mexican hat leads to A = 13.639 and B = 13.673, and the ratio B/A equals 1.002. The closer this ratio is to unity, the tighter the frame. Thus discretizing a Mexican hat wavelet transform using these scale and location parameters results in a highly redundant representation of the signal but with very little difference between x(t) and x′(t). The nearly tight Mexican hat wavelet frame with these parameters ($a_0 = 2^{1/2}$ and $b_0 = 0.5$) is shown in figure 3.1 for two consecutive scales m and m + 1 and at three consecutive locations, n = 0, 1 and 2.
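
As a rough numerical check of the frame condition (3.4), the Python sketch below discretizes the Mexican hat wavelet with $a_0 = 2^{1/2}$ and $b_0 = 0.5$ and sums $|T_{m,n}|^2$ for a Gaussian test signal. The test signal, the integration grid and the truncation of the m and n sums are illustrative assumptions, so the computed energy ratio only approximates the band [A, B].

```python
import numpy as np

def mexican_hat(t):
    # Unit-energy Mexican hat wavelet.
    return (2.0 / np.sqrt(3.0)) * np.pi ** (-0.25) * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

a0, b0 = 2.0 ** 0.5, 0.5
dt = 0.005
t = np.arange(-10.0, 10.0, dt)
x = np.exp(-t ** 2 / 2.0)            # smooth finite-energy test signal (assumed)
E = np.sum(x ** 2) * dt              # signal energy, equation (2.19)

coeff_energy = 0.0
for m in range(-8, 15):              # truncated range of scale indices (assumed)
    a = a0 ** m
    n_max = int(12.0 / a) + 10       # locations where the wavelet overlaps the signal
    for n in range(-n_max, n_max + 1):
        psi = a ** (-0.5) * mexican_hat(t / a - n * b0)   # equation (3.2)
        T = np.sum(x * psi) * dt                          # equation (3.3a)
        coeff_energy += T ** 2

ratio = coeff_energy / E
print(ratio)   # close to the frame bounds A = 13.639 and B = 13.673
```

Because the m sum is truncated, the printed ratio sits slightly below the exact band, but it clearly shows the redundancy (a ratio far above 1) of this snug frame.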

#### 3.2.2  Dyadic grid scaling and orthonormal wavelet transforms

Common choices for the discrete wavelet parameters $a_0$ and $b_0$ are 2 and 1 respectively. This power-of-two logarithmic scaling of both the dilation and translation steps is known as the dyadic grid arrangement. The dyadic grid is perhaps the simplest and most efficient discretization for practical purposes and lends itself to the construction of an orthonormal wavelet basis. Substituting $a_0 = 2$ and $b_0 = 1$ into equation (3.2), we see that the dyadic grid wavelet can be written as

3.7a $\psi_{m,n}(t) = \frac{1}{\sqrt{2^m}}\,\psi\left(\frac{t - n2^m}{2^m}\right)$
or, more compactly, as
3.7b $\psi_{m,n}(t) = 2^{-m/2}\,\psi(2^{-m}t - n)$
Note that this has the same notation as the general discrete wavelet given by equation (3.2). From here on in this chapter we will use $\psi_{m,n}(t)$ to denote only dyadic grid scaling with $a_0 = 2$ and $b_0 = 1$.

Discrete dyadic grid wavelets are commonly chosen to be orthonormal. These wavelets are both orthogonal to each other and normalized to have unit energy. This is expressed as

3.8 $\int_{-\infty}^{\infty} \psi_{m,n}(t)\,\psi_{m',n'}(t)\,dt = \begin{cases} 1 & \text{if } m = m' \text{ and } n = n' \\ 0 & \text{otherwise} \end{cases}$
That is to say, the product of each wavelet with all others in the same dyadic system (i.e. those which are translated and/or dilated versions of each other) integrates to zero. This means that the information stored in a wavelet coefficient $T_{m,n}$ is not repeated elsewhere and allows for the complete regeneration of the original signal without redundancy. In addition to being orthogonal, orthonormal wavelets are normalized to have unit energy. This can be seen from equation (3.8) as, when m = m′ and n = n′, the integral gives the energy of the wavelet function, equal to unity. Orthonormal wavelets have frame bounds A = B = 1 and the corresponding wavelet family is an orthonormal basis. (A basis is a set of vectors, a combination of which can completely define the signal, x(t). An orthonormal basis has component vectors which, in addition to being able to completely define the signal, are perpendicular to each other.) The discrete dyadic grid wavelet lends itself to a fast computer algorithm, as we shall see later.
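
The orthonormality condition (3.8) can be checked numerically for a small set of dyadic grid wavelets. The sketch below uses the Haar wavelet (introduced in section 3.2.5) because it has a simple closed form; the grid resolution and the ranges of m and n are illustrative choices.

```python
import numpy as np

def haar(t):
    # Haar mother wavelet, equation (3.32).
    return np.where((t >= 0.0) & (t < 0.5), 1.0, 0.0) - np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0)

def psi(m, n, t):
    # Dyadic grid wavelet, equation (3.7b).
    return 2.0 ** (-m / 2.0) * haar(2.0 ** (-m) * t - n)

dt = 1.0 / 1024.0
t = np.arange(-16.0, 16.0, dt)
members = [(m, n) for m in range(-1, 3) for n in range(-2, 3)]
fields = {mn: psi(mn[0], mn[1], t) for mn in members}

# Every pair of wavelets gives inner product 0, and each wavelet has unit energy.
for mn1 in members:
    for mn2 in members:
        inner = np.sum(fields[mn1] * fields[mn2]) * dt
        expected = 1.0 if mn1 == mn2 else 0.0
        assert abs(inner - expected) < 1e-9
print("orthonormality (3.8) holds for", len(members), "Haar wavelets")
```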

Using the dyadic grid wavelet of equation (3.7a), the discrete wavelet transform (DWT) can be written as:

3.9 $T_{m,n} = \int_{-\infty}^{\infty} x(t)\,\psi_{m,n}(t)\,dt$
By choosing an orthonormal wavelet basis, $\psi_{m,n}(t)$, we can reconstruct the original signal in terms of the wavelet coefficients, $T_{m,n}$, using the inverse discrete wavelet transform as follows:
3.10a $x(t) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} T_{m,n}\,\psi_{m,n}(t)$
requiring the summation over all integers m and n. This is actually equation (3.5), with A = 1 due to the orthonormality of the chosen wavelet. Equation (3.10a) is often seen written in terms of the inner product
3.10b $x(t) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \langle x, \psi_{m,n} \rangle\,\psi_{m,n}(t)$
where the combined decomposition and reconstruction processes are clearly seen: going from x(t) to $T_{m,n}$ via the inner product $\langle x, \psi_{m,n} \rangle$, then back to x(t) via the infinite summations. In addition, as A = B and A = 1, we can see from equation (3.4) that the energy of the signal may be expressed as
3.11 $\int_{-\infty}^{\infty} |x(t)|^2\,dt = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} |T_{m,n}|^2$
Before continuing it is important to make clear the distinct difference between the DWT and the discretized approximations of the CWT covered in chapter 2. The discretization of the continuous wavelet transform, required for its practical implementation, involves a discrete approximation of the transform integral (i.e. a summation) computed on a discrete grid of a scales and b locations. The inverse continuous wavelet transform is also computed as a discrete approximation. How close an approximation to the original signal is recovered depends mainly on the resolution of the discretization used and, with care, usually a very good approximation can be recovered. On the other hand, for the DWT, as defined in equation (3.9), the transform integral remains continuous but is determined only on a discretized grid of a scales and b locations. We can then sum the DWT coefficients (equation (3.10a)) to get the original signal back exactly. We will see later in this chapter how, given an initial discrete input signal, which we treat as an initial approximation to the underlying continuous signal, we can compute the wavelet transform and inverse transform discretely, quickly and without loss of signal information.

#### 3.2.3  The scaling function and the multiresolution representation

Orthonormal dyadic discrete wavelets are associated with scaling functions and their dilation equations. The scaling function is associated with the smoothing of the signal and has the same form as the wavelet, given by

3.12 $\phi_{m,n}(t) = 2^{-m/2}\,\phi(2^{-m}t - n)$
They have the property
3.13 $\int_{-\infty}^{\infty} \phi_{0,0}(t)\,dt = 1$
where $\phi_{0,0}(t) = \phi(t)$ is sometimes referred to as the father scaling function or father wavelet (cf. mother wavelet). (Remember from chapter 2 that the integral of a wavelet function is zero.) The scaling function is orthogonal to translations of itself, but not to dilations of itself. The scaling function can be convolved with the signal to produce approximation coefficients as follows:
3.14 $S_{m,n} = \int_{-\infty}^{\infty} x(t)\,\phi_{m,n}(t)\,dt$
From the last three equations, we can see that the approximation coefficients are simply weighted averages of the continuous signal factored by $2^{m/2}$. The approximation coefficients at a specific scale m are collectively known as the discrete approximation of the signal at that scale. A continuous approximation of the signal at scale m can be generated by summing a sequence of scaling functions at this scale factored by the approximation coefficients as follows:
3.15 $x_m(t) = \sum_{n=-\infty}^{\infty} S_{m,n}\,\phi_{m,n}(t)$
where $x_m(t)$ is a smooth, scaling-function-dependent, version of the signal x(t) at scale index m. This continuous approximation approaches x(t) at small scales, i.e. as m → −∞. Figure 3.2(a) shows a simple scaling function, a block pulse, at scale index 0 and location index 0: $\phi_{0,0}(t) = \phi(t)$—the father function—together with two of its corresponding dilations at that location. It is easy to see that the convolution of the block pulse with a signal (equation (3.14)) results in a local weighted averaging of the signal over the nonzero portion of the pulse. Figure 3.2(b) shows one period of a sine wave, x(t), contained within a window. Figure 3.2(c) shows various approximations of the sine wave generated using equations (3.14) and (3.15) with the scaling function set to a range of widths, $2^0$ to $2^7$. These widths are indicated by the vertical lines and arrows in each plot. Equation (3.14) computes the approximation coefficients $S_{m,n}$ which are, as mentioned above for this simple block scaling function, the weighted average of the signal over the pulse width. The approximation coefficients are then used in equation (3.15) to produce an approximation of the signal which is simply a sequence of scaling functions placed side by side, each factored by their corresponding approximation coefficient. This is obvious from the blocky nature of the signal approximations. The approximation at the scale of the window ($= 2^7$) is simply the average over the whole sine wave, which is zero. As the scale decreases, the approximation is seen to approach the original waveform. The simple block pulse scaling function used in this example is associated with the Haar wavelet, which we will come to shortly.
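
The construction behind figure 3.2 can be sketched numerically: for the block pulse scaling function, equations (3.14) and (3.15) reduce to replacing the signal by its average over each block of width $2^m$. The sampling rate below is an assumed illustrative value.

```python
import numpy as np

fs = 64                                     # samples per unit time (assumed)
t = np.arange(0.0, 128.0, 1.0 / fs)         # window of width 2**7 = 128
x = np.sin(2.0 * np.pi * t / 128.0)         # one period of a sine wave

def block_approximation(x, t, m):
    # x_m(t): the signal averaged over each block of width 2**m and held
    # constant across the block (equations (3.14) and (3.15) for a block pulse).
    width = 2 ** m
    xm = np.empty_like(x)
    for n in range(int(t[-1]) // width + 1):
        sel = (t >= n * width) & (t < (n + 1) * width)
        if sel.any():
            xm[sel] = x[sel].mean()
    return xm

errors = [np.sqrt(np.mean((x - block_approximation(x, t, m)) ** 2)) for m in range(8)]

# The error grows with scale, and at the window scale (2**7) the approximation
# collapses to the overall mean of the sine wave, which is zero.
assert all(errors[i] < errors[i + 1] for i in range(7))
assert abs(block_approximation(x, t, 7)).max() < 1e-10
print([round(e, 4) for e in errors])
```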

We can represent a signal x(t) using a combined series expansion using both the approximation coefficients and the wavelet (detail) coefficients as follows:

3.16 $x(t) = \sum_{n=-\infty}^{\infty} S_{m_0,n}\,\phi_{m_0,n}(t) + \sum_{m=-\infty}^{m_0} \sum_{n=-\infty}^{\infty} T_{m,n}\,\psi_{m,n}(t)$
We can see from this equation that the original continuous signal is expressed as a combination of an approximation of itself, at arbitrary scale index $m_0$, added to a succession of signal details from scales $m_0$ down to negative infinity. The signal detail at scale m is defined as
3.17 $d_m(t) = \sum_{n=-\infty}^{\infty} T_{m,n}\,\psi_{m,n}(t)$
hence we can write equation (3.16) as
3.18 $x(t) = x_{m_0}(t) + \sum_{m=-\infty}^{m_0} d_m(t)$

Figure 3.2   Smooth approximation of a sine wave using a block pulse scaling function. (a) Simple block scaling function shown at scale 1 (scale index m = 0) and location n = 0, i.e. $\phi_{0,0}(t)$ (shown bold), together with its dilations at that location. (b) Sine wave of one period. (c) Selected smooth approximations, $x_m(t)$, of the sine wave at increasing scales. The width of one of the scaling functions $\phi_{m,n}(t)$ at scale index m is depicted in each figure by the arrows. Note that the example has been set up so that one period of the sinusoid corresponds to scale $2^7$. Note also that at small scales $x_m(t)$ approaches x(t).

From this equation it is easy to show that
3.19 $x_{m-1}(t) = x_m(t) + d_m(t)$
which tells us that if we add the signal detail at an arbitrary scale (index m) to the approximation at that scale we get the signal approximation at an increased resolution (i.e. at a smaller scale, index m − 1). This is called a multiresolution representation.

#### 3.2.4  The scaling equation, scaling coefficients and associated wavelet equation

The scaling equation (or dilation equation) describes the scaling function ϕ(t) in terms of contracted and shifted versions of itself as follows:

3.20 $\phi(t) = \sum_k c_k\,\phi(2t - k)$
where $\phi(2t - k)$ is a contracted version of $\phi(t)$ shifted along the time axis by an integer step k and factored by an associated scaling coefficient, $c_k$. (Take note of the similar but different terminology—scaling equation and scaling function.) Equation (3.20) basically tells us that we can build a scaling function at one scale from a number of scaling functions at the previous, smaller scale. The solution to this two-scale difference equation gives the scaling function $\phi(t)$. For the sake of simplicity, in the rest of the chapter we concern ourselves only with wavelets of compact support. These have sequences of nonzero scaling coefficients which are of finite length. Integrating both sides of the above equation, we can show that the scaling coefficients must satisfy the following constraint:
3.21 $\sum_k c_k = 2$
In addition, in order to create an orthogonal system we require that
3.22 $\sum_k c_k\,c_{k+2k'} = \begin{cases} 2 & \text{if } k' = 0 \\ 0 & \text{otherwise} \end{cases}$
This also tells us that the sum of the squares of the scaling coefficients is equal to 2. The same coefficients are used in reverse with alternate signs to produce the differencing of the associated wavelet equation, i.e.
3.23 $\psi(t) = \sum_k (-1)^k\,c_{1-k}\,\phi(2t - k)$

This construction ensures that the wavelets and their corresponding scaling functions are orthogonal. This wavelet equation is commonly seen in practice. In this chapter, however, we will consider only wavelets of compact support, which have a finite number of scaling coefficients, $N_k$. For this case we can define the wavelet function as

3.24 $\psi(t) = \sum_k (-1)^k\,c_{N_k - 1 - k}\,\phi(2t - k)$
This ordering of the scaling coefficients used in the wavelet equation allows our wavelets and their corresponding scaling equations to have support over the same interval, $[0, N_k - 1]$. (The ordering of equation (3.23) leads to wavelet and scaling functions displaced from each other, except for the Haar wavelet, where $N_k = 2$.) Note that, if the number of scaling coefficients is not finite, we cannot use this reordering and must revert to an ordering of the type given by equation (3.23). We will stick to the ordering specified by equation (3.24) in this text.

Often the reconfigured coefficients used for the wavelet function are written more compactly as

3.25 $b_k = (-1)^k\,c_{N_k - 1 - k}$
where the sum of all the coefficients $b_k$ is zero. Using this reordering of the coefficients, equation (3.24) can be written as
3.26 $\psi(t) = \sum_{k=0}^{N_k - 1} b_k\,\phi(2t - k)$
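
As a concrete check of the constraints (3.21) and (3.22) and the reordering (3.25), the sketch below uses the four Daubechies D4 scaling coefficients, which are known in closed form (quoted here in the sum-to-2 normalization of equation (3.21)).

```python
import numpy as np

s3 = np.sqrt(3.0)
c = np.array([1.0 + s3, 3.0 + s3, 3.0 - s3, 1.0 - s3]) / 4.0   # D4, N_k = 4
Nk = len(c)

assert abs(c.sum() - 2.0) < 1e-12                 # equation (3.21)
for kp in [-1, 0, 1]:                             # equation (3.22)
    total = sum(c[k] * c[k + 2 * kp] for k in range(Nk) if 0 <= k + 2 * kp < Nk)
    assert abs(total - (2.0 if kp == 0 else 0.0)) < 1e-12

b = np.array([(-1.0) ** k * c[Nk - 1 - k] for k in range(Nk)])  # equation (3.25)
assert abs(b.sum()) < 1e-12                       # wavelet coefficients sum to zero
print("c =", np.round(c, 6))
print("b =", np.round(b, 6))
```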
From equations (3.12) and (3.20) and examining the wavelet at scale index m + 1, we can see that for arbitrary integer values of m the following is true:
3.27a $2^{-(m+1)/2}\,\phi\left(\frac{t}{2^{m+1}} - n\right) = 2^{-m/2}\,2^{-1/2} \sum_k c_k\,\phi\left(\frac{t}{2^m} - 2n - k\right)$
which may be written more compactly as
3.27b $\phi_{m+1,n}(t) = \frac{1}{\sqrt{2}} \sum_k c_k\,\phi_{m,2n+k}(t)$
That is, the scaling function at an arbitrary scale is composed of a sequence of shifted scaling functions at the next smaller scale, each factored by their respective scaling coefficients. Similarly, for the wavelet function we obtain
3.28 $\psi_{m+1,n}(t) = \frac{1}{\sqrt{2}} \sum_k b_k\,\phi_{m,2n+k}(t)$

#### 3.2.5  The Haar wavelet

The Haar wavelet is the simplest example of an orthonormal wavelet. Its scaling equation contains only two nonzero scaling coefficients and is given by

3.29 $\phi(t) = \phi(2t) + \phi(2t - 1)$
that is, its scaling coefficients are $c_0 = c_1 = 1$. We get these coefficient values by solving equations (3.21) and (3.22) simultaneously. (From equation (3.21) we see that $c_0 + c_1 = 2$ and from equation (3.22) $c_0 c_0 + c_1 c_1 = 2$.) The solution of the Haar scaling equation is the single block pulse shown in figure 3.3(a) and defined as
3.30 $\phi(t) = \begin{cases} 1 & 0 \le t < 1 \\ 0 & \text{elsewhere} \end{cases}$
This is, in fact, the scaling function used earlier in figure 3.2 to generate the signal approximations of the sine wave. Reordering the coefficient sequence according to equation (3.25), we can see that the corresponding Haar wavelet equation is
3.31 $\psi(t) = \phi(2t) - \phi(2t - 1)$
The Haar wavelet is shown in figure 3.3(b) and defined as
3.32 $\psi(t) = \begin{cases} 1 & 0 \le t < \frac{1}{2} \\ -1 & \frac{1}{2} \le t < 1 \\ 0 & \text{elsewhere} \end{cases}$

Figure 3.3   Discrete orthonormal wavelets. (a) The Haar scaling function in terms of shifted and dilated versions of itself. (b) The Haar wavelet in terms of shifted and dilated versions of its scaling function. (c) Three consecutive scales shown for the Haar wavelet family specified on a dyadic grid, e.g. from top to bottom: $\psi_{m,n}(t)$, $\psi_{m+1,n}(t)$ and $\psi_{m+2,n}(t)$. (d) Three Haar wavelets at three consecutive scales on a dyadic grid. (e) Three Haar wavelets at different scales. This time the Haar wavelets are not defined on a dyadic grid and are hence not orthogonal to each other. (f) A Meyer wavelet and associated scaling function (right).

The mother wavelet for the Haar wavelet system, $\psi(t) = \psi_{0,0}(t)$, is formed from two dilated unit block pulses sitting next to each other on the time axis, with one of them inverted. From the mother wavelet we can construct the Haar system of wavelets on a dyadic grid, $\psi_{m,n}(t)$. This is illustrated in figure 3.3(c) for three consecutive scales. The orthogonal nature of the family of Haar wavelets in a dyadic grid system is obvious from figure 3.3(d), where it can be seen that the positive and negative parts of the Haar wavelet at any scale coincide with a constant (positive or negative) part of the Haar wavelet at the next larger scale (and all subsequent larger scales). In addition, Haar wavelets at the same scale index m on a dyadic grid do not overlap. Hence, it is obvious that the product of the Haar wavelet with any other in the same dyadic grid integrates to zero. Figure 3.3(e) shows three Haar wavelets which are not specified on a dyadic grid. The non-orthogonal nature of the Haar wavelets across scales, when specified in this way, is obvious from the plot. In addition, if these wavelets overlap each other along each scale this also destroys orthogonality. (Although this is not the case for other orthonormal wavelets—see later.) Finally, note that the Haar wavelet is of finite width on the time axis; that is, it has compact support. As was stated in the last section, wavelets which have compact support have a finite number of scaling coefficients, and these are the type of wavelet we concentrate on in this chapter. Not all orthonormal wavelets have compact support. Figure 3.3(f), for example, shows a Meyer wavelet, which is orthonormal with infinite support, although, as with all wavelets, it is localized, decaying relatively quickly from its central peak.

#### 3.2.6  Coefficients from coefficients: the fast wavelet transform

From equation (3.14) we can see that the approximation coefficients at scale index m + 1 are given by

3.33 $S_{m+1,n} = \int_{-\infty}^{\infty} x(t)\,\phi_{m+1,n}(t)\,dt$
Using equation (3.27b) this can be written as
3.34 $S_{m+1,n} = \int_{-\infty}^{\infty} x(t) \left[\frac{1}{\sqrt{2}} \sum_k c_k\,\phi_{m,2n+k}(t)\right] dt$
We can rewrite this as
3.35 $S_{m+1,n} = \frac{1}{\sqrt{2}} \sum_k c_k \left[\int_{-\infty}^{\infty} x(t)\,\phi_{m,2n+k}(t)\,dt\right]$
The integral in brackets gives the approximation coefficients $S_{m,2n+k}$ for each k. We can therefore write this equation as
3.36 $S_{m+1,n} = \frac{1}{\sqrt{2}} \sum_k c_k\,S_{m,2n+k} = \frac{1}{\sqrt{2}} \sum_k c_{k-2n}\,S_{m,k}$
Hence, using this equation, we can generate the approximation coefficients at scale index m + 1 from the approximation coefficients at the previous scale, weighted by the scaling coefficients.

Similarly the wavelet coefficients can be found from the approximation coefficients at the previous scale using the reordered scaling coefficients b k as follows:

3.37 $T_{m+1,n} = \frac{1}{\sqrt{2}} \sum_k b_k\,S_{m,2n+k} = \frac{1}{\sqrt{2}} \sum_k b_{k-2n}\,S_{m,k}$
We can see now that, if we know the approximation coefficients $S_{m_0,n}$ at a specific scale $m_0$ then, through the repeated application of equations (3.36) and (3.37), we can generate the approximation and detail wavelet coefficients at all scales larger than $m_0$. Notice that, to do this, we do not even need to know exactly what the underlying continuous signal x(t) is, only $S_{m_0,n}$. Equations (3.36) and (3.37) represent the multiresolution decomposition algorithm. The decomposition algorithm is the first half of the fast wavelet transform which allows us to compute the wavelet coefficients in this way, rather than computing them laboriously from the convolution of equation (3.9). Iterating equations (3.36) and (3.37) performs respectively a lowpass and highpass filtering of the input (i.e. the coefficients $S_{m,2n+k}$) to get the outputs ($S_{m+1,n}$ and $T_{m+1,n}$). The vectors containing the sequences $(1/\sqrt{2})c_k$ and $(1/\sqrt{2})b_k$ represent the filters: $(1/\sqrt{2})c_k$ is the lowpass filter, letting through low signal frequencies and hence a smoothed version of the signal, and $(1/\sqrt{2})b_k$ is the highpass filter, letting through the high frequencies corresponding to the signal details. We will come back to the filtering process in more detail in later sections of this chapter.
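
A single iteration of the decomposition algorithm can be sketched with the Haar coefficients $c = (1, 1)$ and $b = (1, -1)$; the input approximation coefficients below are an arbitrary example, not taken from the text.

```python
import numpy as np

c = np.array([1.0, 1.0])      # Haar scaling coefficients, equation (3.29)
b = np.array([1.0, -1.0])     # reordered wavelet coefficients, equation (3.25)
S_m = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])   # S_{m,n} (example)

# Lowpass (3.36) and highpass (3.37) filtering, keeping every second output
# via the index 2n + k.
S_next = np.array([np.dot(c, S_m[2 * n:2 * n + 2]) for n in range(len(S_m) // 2)]) / np.sqrt(2.0)
T_next = np.array([np.dot(b, S_m[2 * n:2 * n + 2]) for n in range(len(S_m) // 2)]) / np.sqrt(2.0)

# The filtering halves the number of coefficients and conserves energy.
assert len(S_next) == len(T_next) == len(S_m) // 2
assert np.isclose(np.sum(S_next ** 2) + np.sum(T_next ** 2), np.sum(S_m ** 2))
print(S_next)   # scaled pairwise sums (running averages)
print(T_next)   # scaled pairwise differences (details)
```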

We can go in the opposite direction and reconstruct $S_{m,n}$ from $S_{m+1,n}$ and $T_{m+1,n}$. We know already from equation (3.19) that $x_{m-1}(t) = x_m(t) + d_m(t)$; we can expand this as

3.38 $x_{m-1}(t) = \sum_n S_{m,n}\,\phi_{m,n}(t) + \sum_n T_{m,n}\,\psi_{m,n}(t)$
Using equations (3.27b) and (3.28) we can expand this equation in terms of the scaling function at the previous scale as follows:
3.39 $x_{m-1}(t) = \sum_n S_{m,n} \frac{1}{\sqrt{2}} \sum_k c_k\,\phi_{m-1,2n+k}(t) + \sum_n T_{m,n} \frac{1}{\sqrt{2}} \sum_k b_k\,\phi_{m-1,2n+k}(t)$
Rearranging the summation indices, we get
3.40 $x_{m-1}(t) = \sum_n S_{m,n} \frac{1}{\sqrt{2}} \sum_k c_{k-2n}\,\phi_{m-1,k}(t) + \sum_n T_{m,n} \frac{1}{\sqrt{2}} \sum_k b_{k-2n}\,\phi_{m-1,k}(t)$
We also know that we can expand $x_{m-1}(t)$ in terms of the approximation coefficients at scale m − 1, i.e.
3.41 $x_{m-1}(t) = \sum_n S_{m-1,n}\,\phi_{m-1,n}(t)$
Equating the coefficients in equation (3.41) with equation (3.40), we note that the index k at scale index m relates to the location index n at scale index m − 1. In addition, location index n in equation (3.40) is not equivalent to location index n in equation (3.41), as the former corresponds to scale index m, with associated discrete location spacings $2^m$, and the latter to scale index m − 1, with discrete location spacings $2^{m-1}$. Hence the n indices are twice as dense in the latter expression. The simplest way to proceed before equating the two expressions is to swap the indices k and n in equation (3.40) which, after some algebra, produces the reconstruction algorithm:
3.42 $S_{m-1,n} = \frac{1}{\sqrt{2}} \sum_k c_{n-2k}\,S_{m,k} + \frac{1}{\sqrt{2}} \sum_k b_{n-2k}\,T_{m,k}$
where we have reused k as the location index of the transform coefficients at scale index m to differentiate it from n, the location index at scale m − 1. Hence, at the smaller scale, m − 1, the approximation coefficients can be found in terms of a combination of approximation and detail coefficients at the next scale, m. Note that if there are only a finite number of nonzero scaling coefficients ($= N_k$), then $c_{n-2k}$ is nonzero only when the index n − 2k lies in the range 0 to $N_k - 1$. The reconstruction algorithm is the second half of the fast wavelet transform (FWT). Note that in the literature the fast wavelet transform, discrete wavelet transform, decomposition/reconstruction algorithms, fast orthogonal wavelet transform, multiresolution algorithm, pyramid algorithm, tree algorithm and so on, are all used to mean the same thing. It becomes even more confusing when other discretizations of the continuous wavelet transform are referred to as the discrete wavelet transform. Take care!
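
The decomposition and reconstruction halves of the FWT can be sketched together for one level, using the Daubechies D4 coefficients and wraparound (periodic) indexing at the boundary; the random input sequence is an arbitrary example.

```python
import numpy as np

s3 = np.sqrt(3.0)
c = np.array([1.0 + s3, 3.0 + s3, 3.0 - s3, 1.0 - s3]) / 4.0   # D4, sum-to-2 normalization
Nk = len(c)
b = np.array([(-1.0) ** k * c[Nk - 1 - k] for k in range(Nk)])  # equation (3.25)

def decompose(S):
    # Equations (3.36) and (3.37) with periodic (wraparound) indexing.
    N = len(S)
    S1 = np.array([sum(c[k] * S[(2 * n + k) % N] for k in range(Nk)) for n in range(N // 2)]) / np.sqrt(2.0)
    T1 = np.array([sum(b[k] * S[(2 * n + k) % N] for k in range(Nk)) for n in range(N // 2)]) / np.sqrt(2.0)
    return S1, T1

def reconstruct(S1, T1):
    # Equation (3.42): each coarse coefficient spreads its filter taps back
    # out, again with wraparound.
    N = 2 * len(S1)
    S = np.zeros(N)
    for n in range(len(S1)):
        for k in range(Nk):
            S[(2 * n + k) % N] += (c[k] * S1[n] + b[k] * T1[n]) / np.sqrt(2.0)
    return S

rng = np.random.default_rng(0)
S0 = rng.standard_normal(16)
S1, T1 = decompose(S0)
assert np.allclose(reconstruct(S1, T1), S0)                      # perfect reconstruction
assert np.isclose(np.sum(S1 ** 2) + np.sum(T1 ** 2), np.sum(S0 ** 2))  # energy, equation (3.11)
print("perfect reconstruction with D4 and wraparound")
```

Because the orthogonality conditions (3.21) and (3.22) make the periodic decomposition matrix orthogonal, the reconstruction is simply its transpose, which is why the round trip is exact to machine precision.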

#### 3.3.1  Approximations and details

So far we have considered the discrete orthonormal wavelet transform of a continuous function x(t), where it was shown how the continuous function could be represented as a series expansion of wavelet functions at all scales and locations (equation (3.10a)) or a combined series expansion involving the scaling and wavelet functions (equation (3.16)). In this section, and from here on, we will consider discrete input signals specified at integer spacings. To fit into a wavelet multiresolution framework, the discrete signal input into the multiresolution algorithm should be the signal approximation coefficients at scale index m = 0, defined by

3.43 $S_{0,n} = \int_{-\infty}^{\infty} x(t)\,\phi(t - n)\,dt$
which, as we now know from equations (3.36) and (3.37), will allow us to generate all subsequent approximation and detail coefficients, $S_{m,n}$ and $T_{m,n}$, at scale indices greater than m = 0. In this section we will assume that we have been given $S_{0,n}$. Section 3.4 considers further the question of discrete input data which may not be $S_{0,n}$.

In practice our discrete input signal $S_{0,n}$ is of finite length N, which is an integer power of 2: $N = 2^M$. Thus the range of scales we can investigate is $0 < m \le M$. Substituting both m = 0 and $m_0 = M$ into equation (3.16), and noting that we have a finite range of n which halves at each scale, we can see that the signal approximation at scale m = 0 (the input signal) can be written as the smooth signal at scale M plus a combination of detailed signals as follows:

3.44 $\sum_{n=0}^{2^M - 1} S_{0,n}\,\phi_{0,n}(t) = S_{M,n}\,\phi_{M,n}(t) + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m} - 1} T_{m,n}\,\psi_{m,n}(t)$
This is the form we use to describe our finite length discrete signal in terms of its discrete wavelet expansion. The covering of a finite length time segment with wavelets is illustrated in figure 3.4 for Daubechies D4 wavelets at two successive scales. The lower scale covers the time window using eight wavelets, and the larger scale uses four wavelets. One of the wavelets in each plot is shown bold for clarity. The wavelets which spill over the end of the window have been wrapped around back to the beginning. Known as wraparound, this is the simplest and one of the most common treatments of the boundary for a finite length signal, and we will employ it throughout the rest of this chapter. Note, however, that by employing this method we assume that the signal segment under investigation represents one period of a periodic signal: we are in effect pasting the end of the signal back on to the beginning. Obviously, if the signal is not periodic, and in practice it usually is not, then we create artificial singularities at the boundary, which result in large detail coefficients generated near the boundary.

Figure 3.4   Covering the time axis with dyadic grid wavelets. (a) Eight Daubechies D4 wavelets covering the time axis at scale m. (b) Four Daubechies D4 wavelets covering the time axis at scale m + 1. These wavelets are twice the width of those in (a).

We can rewrite equation (3.44) as

3.45 $x_0(t) = x_M(t) + \sum_{m=1}^{M} d_m(t)$
where the mean signal approximation at scale M is
3.46 $x_M(t) = S_{M,n}\,\phi_{M,n}(t)$
As the approximation coefficients are simply factored, weighted averages of the signal, when wraparound is employed to deal with the boundaries the single approximation component $S_{M,n}$ is related to the mean of the input signal through the relationship $\overline{S_{0,n}} = S_{M,n}/2^{M/2}$, where the overbar denotes the mean of the sequence $S_{0,n}$. In addition, when wraparound has been used to deal with the boundaries, the mean signal approximation at the largest scale, $x_M(t)$, is a constant valued function equal to the input signal mean. (We will see why this is so later in section 3.5.1.)
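
This relationship can be verified with a small Haar example: iterating the decomposition step (3.36) with $c = (1, 1)$ down to the largest scale leaves a single coefficient $S_{M,0}$ equal to $2^{M/2}$ times the input mean. The input sequence below is an assumed example with $N = 2^M$, M = 3.

```python
import numpy as np

x0 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])   # S_{0,n} (example)
M = int(np.log2(len(x0)))

S = x0.copy()
for _ in range(M):
    # Haar step of equation (3.36) with c = (1, 1): pairwise sums over sqrt(2).
    S = (S[0::2] + S[1::2]) / np.sqrt(2.0)

S_M0 = S[0]
# The single remaining coefficient satisfies mean(S_0) = S_{M,0} / 2**(M/2).
assert np.isclose(S_M0 / 2.0 ** (M / 2.0), x0.mean())
print(S_M0, x0.mean())
```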

The term on the far right of equation (3.45) represents the series expansion of the fluctuating components of the finite length signal at various scales in terms of its detail coefficients. The detail signal approximation corresponding to scale index m is defined for a finite length signal as

3.47 $d_m(t) = \sum_{n=0}^{2^{M-m}-1} T_{m,n}\,\psi_{m,n}(t)$
As we saw above (equation (3.45)), adding the approximation of the signal at scale index M to the sum of all detail signal components across scales 1 ≤ m ≤ M gives the approximation of the original signal at scale index 0. Figure 3.5(a) shows the details of a chirp signal with a short burst of noise added to the middle of it. A Daubechies D20 wavelet was used in the decomposition (see an example of this wavelet later in figure 3.15(e)). The original signal is shown at the top of the plot. Below the signal the details for ten wavelet scales, d_1(t) to d_10(t), are shown. The bottom trace is the remaining signal approximation x_10(t). Adding together all these details plus the remaining approximation (which is the signal mean) returns the original signal. Two things are noticeable from the plot. First, there is a shift to the left of the large amplitude details with increasing scale, as we would expect as the chirp oscillation increases in frequency from left to right. The second thing to notice is that the high frequency burst of noise is captured at the smallest scales, again as we would expect.

We saw in equation (3.19) that the signal approximation at a specific scale was a combination of the approximation and detail at the next lower scale. If we rewrite this equation as

Figure 3.5   Multiresolution decomposition of a chirp signal containing a short burst of noise. (a) Signal details d_m(t). (The signals have been displaced from each other on the vertical axis to aid clarity.) (b) Signal approximations x_m(t).

3.48 $x_m(t) = x_{m-1}(t) - d_m(t)$
and begin at scale m − 1 = 0, that of the input signal, we can see that at scale index m = 1, the signal approximation is given by
3.49a $x_1(t) = x_0(t) - d_1(t)$
at the next scale (m = 2) the signal approximation is given by
3.49b $x_2(t) = x_0(t) - d_1(t) - d_2(t)$
and at the next scale by
3.49c $x_3(t) = x_0(t) - d_1(t) - d_2(t) - d_3(t)$
and so on, corresponding to the successive stripping of high frequency information (contained within the d m (t)) from the original signal. Figure 3.5(b) contains successive approximations x m (t) of the chirp signal. The top trace is the original signal x 0(t). Subsequent smoothing of the signal takes place throughout the traces from the top to the bottom of the figure. As we saw in equation (3.48), the difference between each of the approximation traces x m−1(t) and x m (t) is the detail component d m (t). We can view these differences in the detail component traces of figure 3.5(a).

We have glossed over much of the mathematical detail of multiresolution analysis here. Most mathematical accounts of the subject begin with a discussion of orthogonal nested subspaces and the signal approximations and details, x_m(t) and d_m(t), as projections on to these spaces. This tack has not been followed here; see for example Mallat (1998), Blatter (1998), Sarkar et al (1998) or Williams and Amaratunga (1994). In this chapter we concentrate on the mechanics, rather than the mathematics, of multiresolution analysis.

#### 3.3.2  The multiresolution algorithm—an example

Once we have our discrete input signal S 0,n , we can compute S m,n and T m,n using the decomposition algorithm given by equations (3.36) and (3.37). This can be done for scale indices m > 0, up to a maximum scale determined by the length of the input signal. To do this, we use an iterative procedure as follows. First we compute S 1,n and T 1,n from the input coefficients S 0,n , i.e.

3.50a $S_{1,n} = \frac{1}{\sqrt{2}} \sum_k c_k S_{0,2n+k}$
and
3.50b $T_{1,n} = \frac{1}{\sqrt{2}} \sum_k b_k S_{0,2n+k}$
In the same way, we can then find S 2,n and T 2,n from the approximation coefficients S 1,n , i.e.
3.51a $S_{2,n} = \frac{1}{\sqrt{2}} \sum_k c_k S_{1,2n+k}$
and
3.51b $T_{2,n} = \frac{1}{\sqrt{2}} \sum_k b_k S_{1,2n+k}$
Next, we can find S_{3,n} and T_{3,n} from the approximation coefficients S_{2,n}, and so on, up to those coefficients at scale index M, where only one approximation and one detail coefficient is computed: S_{M,0} and T_{M,0}. At scale index M we have performed a full decomposition of the finite-length, discrete input signal. We are left with an array of coefficients: a single approximation coefficient value, S_{M,0}, plus the detail coefficients, T_{m,n}, corresponding to discrete wavelets of scale a = 2^m and location b = 2^m n. The finite time series is of length N = 2^M. This gives the ranges of m and n for the detail coefficients as 1 ≤ m ≤ M and 0 ≤ n ≤ 2^{M−m} − 1 respectively. Notice that the range of n successively halves at each iteration, as it is a function of scale index m for a finite length signal. At the smallest wavelet scale, index m = 1, 2^M/2^1 = N/2 coefficients are computed; at the next scale 2^M/2^2 = N/4 are computed; and so on, at larger and larger scales, until the largest scale (m = M), where only one (= 2^M/2^M) coefficient is computed. The total number of detail coefficients for a discrete time series of length N = 2^M is then 1 + 2 + 4 + 8 + ⋯ + 2^{M−1}, or $\sum_{m=1}^{M} 2^{M-m} = 2^M - 1 = N - 1$. In addition to the detail coefficients, the single approximation coefficient S_{M,0} remains. This is related to the signal mean, as we saw above, and is required in addition to the detail coefficients to fully represent the discrete signal. Thus a discrete input signal of length N can be broken down into exactly N components without any loss of information using discrete orthonormal wavelets. In addition, no signal information is repeated in the coefficient representation. This is known as zero redundancy.
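The iterative procedure can be sketched in a few lines of Python. This is our own minimal sketch (the function name `haar_dwt` is hypothetical), specialized to the Haar scaling coefficients c_0 = c_1 = 1, so that b_0 = 1 and b_1 = −1:

```python
import math

def haar_dwt(s):
    """Full decomposition via equations (3.50) and (3.51), specialized to
    the Haar scaling coefficients c0 = c1 = 1 (hence b0 = 1, b1 = -1)."""
    approx = list(s)                       # S_{0,n}: the input signal
    details = []                           # details[m-1] will hold the T_{m,n}
    while len(approx) > 1:
        half = len(approx) // 2
        details.append([(approx[2 * n] - approx[2 * n + 1]) / math.sqrt(2)
                        for n in range(half)])
        approx = [(approx[2 * n] + approx[2 * n + 1]) / math.sqrt(2)
                  for n in range(half)]
    return approx[0], details              # S_{M,0} and [T_1, T_2, ..., T_M]

S_M0, details = haar_dwt([1.0, 2.0, 3.0, 4.0])
# An N = 4 input yields N - 1 = 3 detail coefficients plus one approximation
assert sum(len(d) for d in details) == 3 and abs(S_M0 - 5.0) < 1e-12
```

For a longer wavelet (more scaling coefficients c_k) the inner sums simply run over more terms, with wraparound applied where the filter spills over the end of the coefficient sequence.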

The decomposition of approximation coefficients into approximation and detail coefficients at subsequent levels can be illustrated schematically. Figure 3.6 shows a schematic of this process, illustrating the decomposition and insertion of the approximation and detail coefficients at each iteration within the wavelet transform vector for an arbitrary input signal vector containing 32 components. The wavelet transform vector after the full decomposition has the form W^{(M)} = (S_M, T_M, T_{M−1}, …, T_m, …, T_2, T_1), where T_m represents the sub-vector containing the coefficients T_{m,n} at scale index m (where n is in the range 0 to 2^{M−m} − 1). We can halt the transformation process before the full decomposition. If we do this, say at an arbitrary level m_0, the transform vector has the form $W^{(m_0)} = (S_{m_0}, T_{m_0}, T_{m_0-1}, \ldots, T_2, T_1)$, where m_0 can take the range 1 ⩽ m_0 ⩽ M − 1. In this case, the transform vector does not contain a single approximation component but rather the sequence of approximation components $S_{m_0,n}$. However, the transform vector always contains N = 2^M components. For example, we can see from figure 3.6 that stopping the algorithm at m_0 = 2 results in W^{(2)} = (S_2, T_2, T_1). Remember, the range of n is a function of scale index m, hence this vector contains eight S_{2,n} components, eight T_{2,n} components and sixteen T_{1,n} components, matching the 32 components of the original input signal vector. The range of n is indicated below the full decomposition vector in figure 3.6. Notice also that we can express the original input signal as the transform vector at scale index zero, i.e. W^{(0)}.
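The assembly of the transform vector, including halting at an intermediate level m_0, can be sketched as follows (a minimal Haar-based sketch of ours; the function name `haar_transform_vector` is hypothetical):

```python
import math

def haar_transform_vector(s, m0):
    """Build W^(m0) = (S_{m0}, T_{m0}, ..., T_2, T_1) by halting the
    Haar decomposition after m0 iterations."""
    approx = list(s)
    detail_blocks = []                      # [T_1, T_2, ..., T_{m0}]
    for _ in range(m0):
        half = len(approx) // 2
        detail_blocks.append([(approx[2 * n] - approx[2 * n + 1]) / math.sqrt(2)
                              for n in range(half)])
        approx = [(approx[2 * n] + approx[2 * n + 1]) / math.sqrt(2)
                  for n in range(half)]
    vector = approx[:]                      # the S_{m0,n} sub-vector first ...
    for block in reversed(detail_blocks):   # ... then T_{m0}, ..., T_2, T_1
        vector += block
    return vector

signal = [float(i) for i in range(32)]      # N = 32, so M = 5
W2 = haar_transform_vector(signal, 2)
# Stopping at m0 = 2: eight S_{2,n}, eight T_{2,n} and sixteen T_{1,n}
assert len(W2) == 32
```

Whatever the halting level, the vector always carries exactly N = 2^M components, and W^{(0)} is simply the input signal itself.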

Figure 3.6   Schematic diagram of the Haar filtering algorithm. ---→ S_{m,n} used to derive S_{m+1,n}; → S_{m,n} used to derive T_{m+1,n}; ---- T_{m,n} taken to the next level as T_{m,n}, i.e. no further manipulation.

An example of a wavelet decomposition using a discrete wavelet is shown in figure 3.7. The input signal is composed of a section of a sine wave, some noise and a flatline. The signal is decomposed using a Daubechies D6 wavelet. A member of this family is shown in figure 3.7(b). (We will look more closely at the Daubechies wavelet family later in this chapter.) The discrete transform plot is shown in figure 3.7(c), where the dyadic grid arrangement may be seen clearly. This plot is simply a discretized dyadic map of the detail coefficients, T m,n , where the coefficients at larger scales have correspondingly longer boxes (as the wavelets cover larger segments of the input signal). In addition to the detail coefficients, T m,n , the remaining approximation coefficient S M,0 is added to the bottom of the plot. As we would expect it covers the whole time axis. We can see from the transform plot that the dominant oscillation is picked up at scale index m = 6 and the high frequency noise is picked up within the middle segment of the transform plot at the smaller scales. We can use the reconstruction algorithm (equation (3.42)) to get back the original input signal S 0,n from the array of detail coefficients shown in figure 3.7(c). Alternatively, as with the continuous transform, we can reconstruct a modified version of the input signal by using only selected coefficients in the reconstruction. This is shown in figures 3.7(d) and (e), where only the coefficients corresponding to scales m = 5 to 8 are kept (the others are set to zero) and the signal is reconstructed. This has removed a significant amount of the noise from the signal although the sinusoidal waveform is less smooth than the original in figure 3.7(a). We will come across more sophisticated ways to remove noise and other signal artefacts later, in section 3.4.2 of this chapter.

#### 3.3.3  Wavelet energy

After a full decomposition, the energy contained within the coefficients at each scale is given by

3.52 $E_m = \sum_{n=0}^{2^{M-m}-1} (T_{m,n})^2$

Figure 3.7   Discrete wavelet transform of a composite signal. (a) Original composite signal. (b) A member of the Daubechies D6 wavelet family. (c) Discrete transform plot. (Note dyadic structure: large positive coefficient values are white and large negative values black.) (d) Coefficient removal. (e) Reconstructed signal using only the retained coefficients in (d). The original composite signal (a) is composed of three segments: a sinusoid, uniformly distributed noise and a flatline. The signal is decomposed using Daubechies D6 wavelets (b) to give the dyadic array of transform coefficients plotted in (c). The coefficients corresponding to scales 5 to 9 are kept (d) and used to reconstruct the signal in (e). Note that a grey scale is used to depict the coefficient values, where the maximum value is white and the minimum value is black.

A wavelet-based power spectrum of the signal may be produced using these scale-dependent energies. To do so, we require a frequency measure which is a reciprocal of the wavelet dilation, e.g. the passband centre of the power spectrum of the wavelet. A wavelet power spectrum can then be produced for the signal which is directly comparable with both its Fourier and continuous wavelet counterparts (see chapter 2, section 2.9). This topic is dealt with in detail in chapter 4, section 4.2.1, in connection with the statistical measures used to analyse turbulent fluid signals.

The total energy of the discrete input signal

3.53 $E = \sum_{n=0}^{N-1} (S_{0,n})^2$
is equal to the sum of the squared detail coefficients over all scales plus the square of the remaining approximation coefficient, S M,0 as follows:
3.54 $E = (S_{M,0})^2 + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} (T_{m,n})^2$
In fact, the energy contained within the transform vector at all stages of the multiresolution decomposition remains constant. We can, therefore, write the conservation of energy more generally as
3.55 $E = \sum_{i=0}^{N-1} \left(W_i^{(m)}\right)^2$
where $W i ( m )$ are the individual components of the transform vector W (m) ordered as described in the previous section. When m = 0, this equation corresponds to the summation of the component energies of the input signal (equation (3.53)) and when m = M it corresponds to the summation of the energies within the coefficients at full decomposition (equation (3.54)).
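The stage-by-stage conservation of energy is easily verified numerically. The sketch below (ours; Haar wavelet assumed for simplicity) builds the transform vector W^{(m)} at each stage of the decomposition and checks that its energy never changes:

```python
import math

signal = [4.0, 5.0, -4.0, -1.0, 0.0, 1.0, 2.0, 0.0]   # N = 8, so M = 3

# Haar decomposition, recording the energy of W^(m) at every stage m = 0..M
approx, details = list(signal), []
energies = []
for m in range(4):
    vector = approx[:]                  # (S_m, T_m, T_{m-1}, ..., T_1)
    for block in reversed(details):
        vector += block
    energies.append(sum(w * w for w in vector))
    if len(approx) > 1:                 # one more decomposition step
        half = len(approx) // 2
        details.append([(approx[2 * n] - approx[2 * n + 1]) / math.sqrt(2)
                        for n in range(half)])
        approx = [(approx[2 * n] + approx[2 * n + 1]) / math.sqrt(2)
                  for n in range(half)]

# Equation (3.55): the energy is the same at every stage of the decomposition
assert all(abs(e - energies[0]) < 1e-9 for e in energies)
```

Here `energies[0]` is the input signal energy of equation (3.53) and `energies[-1]` is the full-decomposition energy of equation (3.54).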

#### 3.3.4  Alternative indexing of dyadic grid coefficients

There are three main methods used in practice to index the coefficients resulting from the discrete wavelet transform. It is worth discussing these now. All three methods are popular in the scientific literature and appear often in many of the examples of the practical application of wavelet analysis in the subsequent chapters of this book. We will use the full decomposition of a 32 component input signal as an illustration.

Scale indexing: The scale indexing system (m, n) corresponding to an input signal of length N = 2 M is shown schematically on a dyadic grid in figure 3.8(b) for the discrete signal shown in figure 3.8(a). The lowest scale on the grid m = 1 corresponds to a spacing of 21 = 2 on the data set. The discrete input signal is at scale index m = 0. We have already come across this grid structure in the plot of the transform coefficients in figure 3.7(c). Such plots give a good visual indication of the covering of the timescale plane by the wavelets and their relative importance in making up the signal.

Sequential indexing: Again, we have already come across this form of indexing (figure 3.6). Figure 3.8(c) contains a schematic of the transform coefficients given in sequential format. We know that a signal N samples long produces N wavelet vector components. It makes sense, therefore, to find an ordering of these coefficients to fit into a vector of length N. The discrete time series, S_{0,n}, n = 0, 1, 2, …, N − 1, can be decomposed into N − 1 detail coefficients plus one approximation coefficient, where N is an integer power of 2: N = 2^M. After a full decomposition these transform coefficients can be resequenced from the two-dimensional dyadic array T_{m,n} to the transform vector, where the components $W_i^{(M)}$ have the same range as the original signal (i = 0, 1, 2, …, N − 1). The vector component index i is found from the dyadic grid indices m and n through the relationship i = 2^{M−m} + n. In addition, the superscript M in parentheses denotes a full decomposition of the signal over M scales. The transform vector components $W_i^{(M)}$ are plotted in figure 3.8(c). The last half of the series in the figure represents the coefficients corresponding to the smallest wavelet scale (index m = 1). The next quarter, working backwards, represents the coefficients corresponding to scale index m = 2, the next eighth the m = 3 coefficients, and so on, back to $W_1^{(M)}$, which is the single detail coefficient for scale index m = M. The remaining component $W_0^{(M)}$ is the single approximation coefficient (S_{M,0}). As mentioned in section 3.3.2, if we halt the decomposition algorithm at scale m_0 (before full decomposition) this results in an intermediate transform vector $W^{(m_0)}$ which contains a number of approximation coefficients, $S_{m_0,n}$, at its beginning.
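The index mapping can be stated in two lines of code (our sketch; the helper name `seq_index` is hypothetical), here for an N = 32 signal:

```python
# Sequential index i = 2^(M - m) + n for the detail coefficient T_{m,n};
# index i = 0 holds the single approximation coefficient S_{M,0}.
M = 5                                   # N = 2^M = 32 samples

def seq_index(m, n):
    return 2 ** (M - m) + n

# The smallest scale (m = 1) fills the last half of the vector ...
assert seq_index(1, 0) == 16 and seq_index(1, 15) == 31
# ... the next scale (m = 2) the preceding quarter ...
assert seq_index(2, 0) == 8 and seq_index(2, 7) == 15
# ... down to the single m = M detail coefficient at i = 1.
assert seq_index(M, 0) == 1
```

Note that the mapping never produces i = 0, which is why that slot is free to hold S_{M,0}.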

Figure 3.8   Schematic diagram of alternative indexing methods for dyadic grid wavelet transform coefficients. (a) Original signal containing 32 components (scale index m = 0). (b) Scale indexing, T_{m,n}. (c) Sequential indexing, $W_i^{(M)}$, where i = 2^{M−m} + n (the corresponding scale indexing is shown at the bottom of the plot). (d) Level indexing, T_{l,n}.

Level indexing: Level indices, l, are often used instead of scale indices, m. The level index l is simply equal to M − m. The number of wavelets used to cover the signal at a specific level is 2^l: for example, level l = 0 corresponds to a single wavelet and the scale of the whole time series; level l = 1 corresponds to two wavelets and the scale of one half of the time series; and so on. In addition, it is standard in this nomenclature to denote the remaining approximation coefficient at level l = −1. The wavelet array becomes T_{l,n}. Figure 3.8(d) shows the l, n indexing system. Level indexing is used when the signal is specified over the unit interval and the analysis is set up in terms of resolution rather than scale (i.e. the lowest resolution l = 0 corresponds to the length of the whole signal, whereas the smallest scale m = 0 corresponds to the distance between each signal point).

#### 3.3.5  A simple worked example: the Haar wavelet transform

Now we will illustrate the methods described above using a Haar wavelet in the decomposition of a discrete input signal, S_{0,n}: n = 0, 1, 2, …, N − 1. To do so, we employ the decomposition algorithm given by equations (3.36) and (3.37). The Haar wavelet has two scaling coefficients, c_0 = 1 and c_1 = 1. Substituting these into equation (3.36) we can obtain the approximation coefficients at the next scale through the relationship

3.56 $S_{m+1,n} = \frac{1}{\sqrt{2}}\left[S_{m,2n} + S_{m,2n+1}\right]$
Similarly, through equation (3.37) we can obtain the detail coefficients at subsequent scales using
3.57 $T_{m+1,n} = \frac{1}{\sqrt{2}}\left[S_{m,2n} - S_{m,2n+1}\right]$
Using these equations, we will perform the Haar wavelet decomposition of the (very) simple discrete signal (1, 2, 3, 4). As the signal contains only four data points, we can perform only two iterations of the Haar decomposition algorithm given by equations (3.56) and (3.57). After two iterations we expect to obtain four transform coefficients: three wavelet coefficients T_{m,n}, two at scale index m = 1 (T_{1,0}, T_{1,1}) and one at scale index m = 2 (T_{2,0}), plus a signal mean coefficient at scale index m = 2 (S_{2,0}). This is illustrated through the iteration of the transform vector in figure 3.9 and also through a schematic of the coefficients in figure 3.10.

The first iteration of the decomposition algorithm gives

$T_{1,0} = \frac{1}{\sqrt{2}}[1 - 2] = -\frac{1}{\sqrt{2}} \qquad T_{1,1} = \frac{1}{\sqrt{2}}[3 - 4] = -\frac{1}{\sqrt{2}}$
$S_{1,0} = \frac{1}{\sqrt{2}}[1 + 2] = \frac{3}{\sqrt{2}} \qquad S_{1,1} = \frac{1}{\sqrt{2}}[3 + 4] = \frac{7}{\sqrt{2}}$

Figure 3.9   Iteration of the wavelet transform vector in the decomposition of a simple signal. ---→ S_{m,n} used to derive S_{m+1,n}; → S_{m,n} used to derive T_{m+1,n}; ---- T_{m,n} taken to the next level as T_{m,n}, i.e. no further manipulation.

The transform coefficient vector, after this first iteration, is then

$W^{(1)} = (S_{1,0}, S_{1,1}, T_{1,0}, T_{1,1}) = \left(\frac{3}{\sqrt{2}}, \frac{7}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$
The second iteration (only involving the remaining approximation coefficients S 1,0 and S 1,1) yields
$T_{2,0} = \frac{1}{\sqrt{2}}\left[\frac{3}{\sqrt{2}} - \frac{7}{\sqrt{2}}\right] = -2 \qquad \text{and} \qquad S_{2,0} = \frac{1}{\sqrt{2}}\left[\frac{3}{\sqrt{2}} + \frac{7}{\sqrt{2}}\right] = 5$
The transform coefficient vector after this second iteration is now
$W^{(2)} = (S_{2,0}, T_{2,0}, T_{1,0}, T_{1,1}) = \left(5, -2, -\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$
The signal mean is found from the remaining approximation coefficient, i.e. $S_{2,0}/(\sqrt{2})^2 = 2.5$. We can also see that the energy of the original discrete signal ($1^2 + 2^2 + 3^2 + 4^2 = 30$) is equal to the energy of the transform coefficient vector after the first iteration
$\left(\frac{3}{\sqrt{2}}\right)^2 + \left(\frac{7}{\sqrt{2}}\right)^2 + \left(-\frac{1}{\sqrt{2}}\right)^2 + \left(-\frac{1}{\sqrt{2}}\right)^2 = 30$
and the second iteration, giving the full decomposition
$5^2 + (-2)^2 + \left(-\frac{1}{\sqrt{2}}\right)^2 + \left(-\frac{1}{\sqrt{2}}\right)^2 = 30$
which is a result we expect from the conservation of energy condition expressed in equations (3.52) to (3.55).
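The worked example above can be reproduced in a few lines (our own check; nothing beyond equations (3.56) and (3.57) is used):

```python
import math

r2 = math.sqrt(2)
signal = [1.0, 2.0, 3.0, 4.0]

# First iteration (equations (3.56) and (3.57)):
S1 = [(signal[0] + signal[1]) / r2, (signal[2] + signal[3]) / r2]  # 3/sqrt(2), 7/sqrt(2)
T1 = [(signal[0] - signal[1]) / r2, (signal[2] - signal[3]) / r2]  # -1/sqrt(2) twice

# Second iteration, acting on the approximation coefficients only:
S2 = (S1[0] + S1[1]) / r2          # 5
T2 = (S1[0] - S1[1]) / r2          # -2

W2 = [S2, T2, T1[0], T1[1]]        # the full-decomposition transform vector
assert abs(S2 - 5.0) < 1e-12 and abs(T2 + 2.0) < 1e-12
# Energy is conserved: 1^2 + 2^2 + 3^2 + 4^2 = 30
assert abs(sum(w * w for w in W2) - 30.0) < 1e-12
```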

To return from the transform coefficients to the original discrete signal we work in reverse. The inverse Haar transform can be written simply as: at even locations 2n:

3.58 $S_{m,2n} = \frac{1}{\sqrt{2}}\left[S_{m+1,n} + T_{m+1,n}\right]$

Figure 3.10   Haar decomposition and reconstruction of a simple ramp signal. The original discrete input signal is decomposed through an initial iteration to (b), the first decomposition, then further to (c), the second and final decomposition. From the coefficients at the full decomposition (c), the following can be constructed: the scale index 2 approximation (the signal mean) (d), the scale index 2 detail (e) and the scale index 1 detail (f). (g) The transform coefficient plot associated with the transform values given in (c). (h) Schematic diagram of the addition of the rescaled Haar wavelets to reconstruct the original signal. (Adding the reconstructions in (d), (e) and (f) returns the approximation of the signal at scale index m = 0.)

and at odd locations 2n + 1:
3.59 $S_{m,2n+1} = \frac{1}{\sqrt{2}}\left[S_{m+1,n} - T_{m+1,n}\right]$
For the Haar wavelet transform, we can derive these reconstruction equations directly from equations (3.56) and (3.57). Alternatively, we could derive them from the reconstruction algorithm of equation (3.42).
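Equations (3.58) and (3.59) translate directly into code. The minimal sketch below (ours; the helper name `haar_inverse_step` is hypothetical) applies one inverse iteration at a time:

```python
import math

def haar_inverse_step(approx, detail):
    """One iteration of equations (3.58) and (3.59): interleave the
    even-location and odd-location reconstructions."""
    out = []
    for s, t in zip(approx, detail):
        out.append((s + t) / math.sqrt(2))   # S_{m,2n}
        out.append((s - t) / math.sqrt(2))   # S_{m,2n+1}
    return out

# Reconstruct (1, 2, 3, 4) from W^(2) = (5, -2, -1/sqrt(2), -1/sqrt(2))
r2 = math.sqrt(2)
S1 = haar_inverse_step([5.0], [-2.0])              # -> [3/sqrt(2), 7/sqrt(2)]
S0 = haar_inverse_step(S1, [-1 / r2, -1 / r2])     # -> [1, 2, 3, 4]
assert all(abs(a - b) < 1e-12 for a, b in zip(S0, [1, 2, 3, 4]))
```

Each iteration doubles the number of approximation coefficients until the input resolution (scale index 0) is reached.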

We will perform the reconstruction on the transform vector we have just computed:

$W^{(2)} = (S_{2,0}, T_{2,0}, T_{1,0}, T_{1,1}) = \left(5, -2, -\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$
The first iteration using the reconstruction pair (equations (3.58) and (3.59)) yields
$S_{1,0} = \frac{1}{\sqrt{2}}(S_{2,0} + T_{2,0}) = \frac{3}{\sqrt{2}} \qquad S_{1,1} = \frac{1}{\sqrt{2}}(S_{2,0} - T_{2,0}) = \frac{7}{\sqrt{2}}$
which results in
$(S_{1,0}, S_{1,1}, T_{1,0}, T_{1,1}) = \left(\frac{3}{\sqrt{2}}, \frac{7}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$

Iterating again gives

$S_{0,0} = \frac{1}{\sqrt{2}}(S_{1,0} + T_{1,0}) = 1 \qquad S_{0,1} = \frac{1}{\sqrt{2}}(S_{1,0} - T_{1,0}) = 2$
$S_{0,2} = \frac{1}{\sqrt{2}}(S_{1,1} + T_{1,1}) = 3 \qquad S_{0,3} = \frac{1}{\sqrt{2}}(S_{1,1} - T_{1,1}) = 4$
hence we get back the original signal (S 0,0, S 0,1, S 0,2, S 0,3) = (1, 2, 3, 4).

Figure 3.10 attempts to show visually the decomposition of the signal into the Haar wavelet components. Figure 3.10(a) contains the original signal, figure 3.10(b) plots the coefficients after the first iteration of the decomposition algorithm and figure 3.10(c) plots the coefficients after the second iteration of the algorithm. The coefficients contained in figure 3.10(c) correspond in turn to a single scaling function at scale index m = 2, a wavelet at scale index m = 2 and two wavelets at scale index m = 1. These are shown respectively in figures 3.10(d)–(f). The transform coefficient plot for this signal is equally simple, consisting of four coefficient values partitioned as shown in figure 3.10(g).

We can find the corresponding approximation and details of the signal by taking the inverse transform of the coefficients at each scale. First the approximation coefficient is used to determine the signal approximation at the largest scale:

3.60 $x_2(t) = S_{2,0}\,\phi_{2,0}(t)$
where the scaling function $\phi_{2,0}(t)$, for the Haar wavelet, is simply a block pulse of length 4 and amplitude $1/(\sqrt{2})^2 = \frac{1}{2}$, beginning at t = 0. Thus x_2(t) is simply a block pulse of length 4 and magnitude $S_{2,0} \times \frac{1}{2} = 2.5$, which is, in fact, the signal mean. A plot of x_2(t) is shown in figure 3.10(d). Similarly, we can obtain the signal detail component at scale index 2 as follows:
3.61 $d_2(t) = T_{2,0}\,\psi_{2,0}(t)$
This detail component (shown in figure 3.10(e)) is simply a single Haar wavelet spanning the data set with coefficient value T_{2,0} = −2, hence magnitude $-2/(\sqrt{2})^2 = -1$. Next the detail signal component at scale index 1 is found from
3.62 $d_1(t) = \sum_{n=0}^{1} T_{1,n}\,\psi_{1,n}(t) = T_{1,0}\,\psi_{1,0}(t) + T_{1,1}\,\psi_{1,1}(t)$
which is simply two Haar wavelets at scale index 1, side by side, with amplitudes given by $T_{1,0}/\sqrt{2} = (-1/\sqrt{2})/\sqrt{2} = -0.5$ and $T_{1,1}/\sqrt{2} = (-1/\sqrt{2})/\sqrt{2} = -0.5$. The detail component d_1(t) is plotted in figure 3.10(f). We already know from equation (3.45) that the approximation of the signal at scale index 0 can be found by adding together all the detail components plus the signal approximation at scale index M, i.e.
3.63a $x_0(t) = x_M(t) + \sum_{m=1}^{M} d_m(t)$
We have already seen an example of this partitioning of the signal into approximation and detail components in figure 3.5 for a chirp signal. For the Haar case considered here M = 2, hence
3.63b $x_0(t) = x_2(t) + d_2(t) + d_1(t)$
This is simply the addition of the approximation and detail components shown in figures 3.10(d)–(f). This is shown schematically in figure 3.10(h).
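Sampling the piecewise-constant components x_2(t), d_2(t) and d_1(t) at the four signal points makes the addition explicit (our own numerical check of equation (3.63b); the amplitudes follow the Haar scaling function and wavelet shapes described above):

```python
import math

r2 = math.sqrt(2)
# Coefficients from the full Haar decomposition of (1, 2, 3, 4)
S20, T20, T10, T11 = 5.0, -2.0, -1 / r2, -1 / r2

# Haar scaling function phi_{2,0} sampled at the four signal points:
# a block pulse of amplitude 1/(sqrt(2))^2 = 1/2 (equation (3.60))
x2 = [S20 * 0.5] * 4                                    # [2.5, 2.5, 2.5, 2.5]

# Haar wavelet psi_{2,0}: +1/2 over the first half, -1/2 over the second
d2 = [T20 * 0.5, T20 * 0.5, -T20 * 0.5, -T20 * 0.5]     # [-1, -1, 1, 1]

# Two side-by-side scale-index-1 wavelets of amplitude 1/sqrt(2) (equation (3.62))
d1 = [T10 / r2, -T10 / r2, T11 / r2, -T11 / r2]         # [-0.5, 0.5, -0.5, 0.5]

# Equation (3.63b): adding approximation and details returns the signal
x0 = [a + b + c for a, b, c in zip(x2, d2, d1)]
assert all(abs(v - w) < 1e-12 for v, w in zip(x0, [1, 2, 3, 4]))
```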

#### 3.4.1  Discrete experimental input signals

We will now consider the case where we have a discrete experimental signal collected, say, using some piece of experimental apparatus, and we want to perform a wavelet decomposition of it using the multiresolution algorithm. In addition, we would also like to represent the approximation and detail signal components discretely at the resolution of the input signal. This discrete signal, which we will denote x i , is of finite length N, i.e. i = 0, 1, …, N − 1. It has been acquired at discrete time intervals ∆t (the sampling interval) to give the discrete time signal x(t i ): i = 0, 1, 2, …, N − 1. The sampling of the signal provides a finite resolution to the acquired signal. This discretization of the continuous signal is then mapped on to a discrete signal x i , where the sampling interval has been normalized to 1. In doing so we must remember ∆t and add it when required, for example in real applications when computing signal frequencies.

Common practice is to input the discretely sampled experimental signal, x_i, directly as the approximation coefficients at scale m = 0, and begin the multiresolution analysis from there. However, it is not strictly correct to use the sampled time series x_i directly in this way. We should really use the approximation coefficients S_{0,n} obtained from the original continuous signal at scale m = 0, as defined by equation (3.43). In practice we usually do not know exactly what x(t) is. As S_{0,n} is a weighted average of x(t) in the vicinity of n, it is usually reasonable to input x_i as S_{0,n} if our signal is slowly varying between samples at this scale. That is, we simply set

3.64 $S_{0,n} = x_n$
where, at scale index m = 0, both the coefficient location index n and signal discretization index i have the same range (0 to N − 1) and are equal to each other. In the rest of this chapter we will assume that the sampled experimental signal has been input directly as S 0,n . The literature is full of studies of real data where this has been done and, if anything, it is the rule rather than the exception. Obviously, it does not make it any more correct! In fact, Strang and Nguyen (1996) call the use of the sampled signal directly within the transform a ‘wavelet crime’ and suggest various ways to preprocess the sampled signal prior to performing the analysis.

In practice, continuous approximations, x_m(t), and details, d_m(t), of the signal are not constructed from the multiresolution analysis, especially when inputting the signal x_i as the approximation coefficients at scale index m = 0. Instead, either the approximation and detail coefficients, S_{m,n} and T_{m,n}, are displayed at their respective scales or, alternatively, they are used to construct representations of the signal at the scale of the input signal (m = 0). The latter is sometimes preferable as the scalings of the displays are visually comparable. As an example, let us consider the Haar wavelet decomposition of a simple discrete signal containing eight components: x_i = (4, 5, −4, −1, 0, 1, 2, 0). The signal is shown in figure 3.11(a). We can see that the first half of the signal has a marked step from 5 to −4, whereas the second half of the signal appears much smoother. The transform vector after a full decomposition is

$W^{(3)} = (S_{3,0}, T_{3,0}, T_{2,0}, T_{2,1}, T_{1,0}, T_{1,1}, T_{1,2}, T_{1,3}) = (2.475, 0.354, 7.000, -0.500, -0.707, -2.121, -0.707, 1.414) = \left(\frac{7}{(\sqrt{2})^3}, \frac{1}{(\sqrt{2})^3}, \frac{14}{2}, -\frac{1}{2}, -\frac{1}{\sqrt{2}}, -\frac{3}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, \frac{2}{\sqrt{2}}\right)$
(You can find these values for yourself using the procedure described in section 3.3.5.) This vector is shown schematically in figure 3.11(b). Performing the inverse transform on this vector leads us back to the original discrete input signal. Referring back to the schematic of the wavelet decomposition given in section 3.3.2, we can similarly represent the reconstruction from the transform vector, W (3), as

Remember that the original signal x i was input into the multiresolution algorithm as S 0,n .

Let us now look at what happens to the reconstructed signal when we remove the smallest-scale (i.e. highest-frequency) detail coefficients in turn (figures 3.11(c)–(e)). Removing the components T_{1,n}, the modified transform vector becomes (2.475, 0.354, 7.000, −0.500, 0, 0, 0, 0), where the coefficients set to zero are shown in bold. Performing the inverse transform on this vector with the coefficients at the smallest wavelet scale set to zero removes the high-frequency details at this scale and returns a smoother version of the input signal. We will denote this smooth version of the input signal as x_{1,i}. This operation is shown schematically as

Figure 3.11   Multiresolution decomposition as scale thresholding. (a) x_{0,i} (= x_{1,i} + d_{1,i}). (b) Wavelet coefficients (S_{3,0}, T_{3,0}, T_{2,0}, T_{2,1}, T_{1,0}, T_{1,1}, T_{1,2}, T_{1,3}). (c) x_{1,i} (= x_{2,i} + d_{2,i} = x_{0,i} − d_{1,i}). (d) d_{1,i}. (e) x_{2,i} (= x_{3,i} + d_{3,i} = x_{1,i} − d_{2,i}). (f) d_{2,i}. (g) x_{3,i} (signal mean) (= x_{2,i} − d_{3,i}). (h) d_{3,i}.

The reconstructed discrete signal becomes x 1,i = (4.5, 4.5, −2.5, −2.5, 0.5, 0.5, 1, 1). That is, the signal is smoothed by the averaging of each pair of signal values as shown in figure 3.11(c). Note that we use the nomenclature x 1,i for the approximation signal x 1(t) expressed in terms of discrete coefficients at the scale of the original input signal (i.e. m = 0). Remember from section 3.2.3 that the approximation coefficients, S m,n , provide a discrete approximation of the signal at scale index m. By passing the coefficients through the reconstruction filter we can express the contributions of these discrete approximations at the scale of the original signal (scale index 0). In the rest of this section, we will see how to express all the discrete approximations and details at the scale of the input signal with index m = 0.

Now we remove the detail coefficients associated with the next smallest scales from the transform vector to get (2.475, 0.354, 0, 0, 0, 0, 0, 0). Reconstructing the signal using this modified transform vector, we get x 2,i = (1, 1, 1, 1, 0.75, 0.75, 0.75, 0.75) shown in figure 3.11(e). Again we can show this schematically as

where we have put the prime on S′_{1,n} to indicate that it has different-valued components from S_{1,n} above in the full reconstruction, as it is computed with the T_{2,n} components set to zero. We can see that the nomenclature is beginning to get a little awkward here, as x_{2,i} is in effect S′_{0,n}: a modified version of the input signal. In x_{2,i}, the subscript 2 relates to the largest scale of the original approximation coefficients used at the beginning of the reconstruction, and the index i reminds us that we are back at the input resolution of the original signal (m = 0) with N components.

Removing the last detail coefficient, the modified transform vector becomes (2.475, 0, 0, 0, 0, 0, 0, 0) which, when used to reconstruct the signal, produces x 3,i = (0.875, 0.875, 0.875, 0.875, 0.875, 0.875, 0.875, 0.875). This can be shown as

We have now removed all signal detail: all the components of the vector x_{3,i} have constant value. The signal has been progressively smoothed until all that is left is a row of constant values equal to the signal mean (figure 3.11(g)).

We can see by comparing figures 3.11(c), (e) and (g) that the high-frequency information has been successively stripped from the discrete input signal (figure 3.11(a)) by successively removing the discrete details at scale indices m_0 = 1, 2 and 3 in turn. These discrete signal details are shown in figures 3.11(d), (f) and (h). If we want to generate the details of the discrete signal at any scale using the multiresolution algorithm, we perform the inverse transform using only the detail coefficients at that scale (we could also subtract two successive discrete approximations). For example, to compute the detail at the lowest wavelet scale (index m = 1), which we will denote d_{1,i}, the following multiresolution operation is required:

d_{1,i} is in fact the contribution of the wavelets at scale index m = 1 (i.e. ψ_{1,0}(t), ψ_{1,1}(t), ψ_{1,2}(t) and ψ_{1,3}(t)) to the original signal x_i, expressed at discrete points at the scale of the input function (m = 0) at locations i = 0, 1, …, N − 1. Similarly we can calculate the contribution d_{2,i} using the following operation:

where, obviously, the S′_{2,n} vector has all of its components equal to zero, as it comes from S_{3,n} and T_{3,n} with elements set to zero. Then d_{2,i} is the contribution of the wavelets at scale index m = 2 (which for this example come from the two wavelets ψ_{2,0}(t) and ψ_{2,1}(t)) to the original signal x_i given at discrete points at the scale of the input function m = 0, i.e. at locations i = 0 to N − 1. Similarly, beginning with T_{3,n}, we can calculate the contribution d_{3,i}.
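The scale-thresholding procedure for this eight-point example can be sketched as follows (our own Haar-based sketch; the helper names `decompose` and `reconstruct` are hypothetical). Coefficients at unwanted scales are simply replaced by zeros before the inverse transform is applied:

```python
import math

r2 = math.sqrt(2)

def decompose(s):
    """Full Haar decomposition: returns S_{M,0} and [T_1, T_2, ..., T_M]."""
    approx, details = list(s), []
    while len(approx) > 1:
        half = len(approx) // 2
        details.append([(approx[2 * n] - approx[2 * n + 1]) / r2 for n in range(half)])
        approx = [(approx[2 * n] + approx[2 * n + 1]) / r2 for n in range(half)]
    return approx[0], details

def reconstruct(S_M0, details):
    """Inverse Haar transform from S_{M,0} and [T_1, ..., T_M]."""
    approx = [S_M0]
    for T in reversed(details):          # T_M first, then T_{M-1}, ...
        approx = [v for s, t in zip(approx, T) for v in ((s + t) / r2, (s - t) / r2)]
    return approx

x = [4.0, 5.0, -4.0, -1.0, 0.0, 1.0, 2.0, 0.0]
S30, (T1, T2, T3) = decompose(x)

# x_{1,i}: invert with the smallest-scale details T_{1,n} set to zero
x1 = reconstruct(S30, [[0.0] * 4, T2, T3])
assert all(abs(a - b) < 1e-9
           for a, b in zip(x1, [4.5, 4.5, -2.5, -2.5, 0.5, 0.5, 1.0, 1.0]))

# d_{1,i}: invert keeping only T_{1,n} (approximation and other details zeroed)
d1 = reconstruct(0.0, [T1, [0.0, 0.0], [0.0]])
assert all(abs(a - (b + c)) < 1e-9 for a, b, c in zip(x, x1, d1))
```

The same zeroing pattern yields x_{2,i}, x_{3,i}, d_{2,i} and d_{3,i} of figure 3.11.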

In the above example we were effectively taking the information stored in the approximation and detail coefficients S m,n and T m,n and expressing it at scale index m = 0. The discrete approximation and detail contributions x m,i and d m,i are related to their continuous counterparts x m (t) and d m (t) through the scaling equation at scale m = 0 as follows:

3.65 $x m ( t ) = ∑ i = 0 N − 1 x m , i ϕ 0 , i ( t ) [ = ∑ n = 0 2 M − m − 1 S m , n ϕ m , n ( t ) ]$
and
3.66 $d m ( t ) = ∑ i = 0 N − 1 d m , i ϕ 0 , i ( t ) [ = ∑ n = 0 2 M − m − 1 T m , n ψ m , n ( t ) ]$
Notice that the scaling function ϕ 0,i (t) is not only used to compute the continuous approximation from x m,i but also the detail components from d m,i . This is because the scaling coefficient sequence for the wavelet, b k , has already been used in the initial stages of the reconstruction sequence to take T m,n to S m−1,n . Thereafter the contributions can be expressed in terms of scaling functions, i.e. from S m−1,n to S m−2,n and so on. You can verify equations (3.65) and (3.66) using the reconstruction algorithm of equation (3.42) (noting that one of its terms will be zero) and the scaling function relationship given by equation (3.27b).

Figure 3.11(c) shows x 1,i , i.e. where only the detail signal d 1,i corresponding to the smallest scale wavelets (m = 1) has been removed (figure 3.11(d)). Figure 3.11(e) shows x 2,i where both detail signals d 1,i and d 2,i, corresponding to the smallest and next smallest scale wavelets, have been removed from the original discrete input signal. d 2, i is shown in figure 3.11(f). Figure 3.11(g) shows x 3,i , where the detail signals d m,i corresponding to all the wavelet scales (m = 1,2 and 3) have been stripped from the original signal leaving only the signal mean component. This can be written as

3.67a $x 3 , i = x 0 , i − ∑ m = 1 3 d m , i$
or, in general, over M scales as
3.67b $x M , i = x 0 , i − ∑ m = 1 M d m , i$
where x 0,i is simply the input signal x i . This equation stems directly from its continuous counterpart of equation (3.45) using equations (3.65) and (3.66). We have now seen how we can express everything discretely. We can use a suitable discrete input signal within the multiresolution algorithm, and we can express the approximation and details of this signal at discrete points at the input resolution; and, as we will see later in section 3.5.1, even the wavelet and scaling functions are expressed discretely within the multiresolution algorithm, built up by the repeated iteration of the scaling coefficient vectors.
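The scale-by-scale bookkeeping above can be checked numerically. Below is a minimal Python/NumPy sketch (not from the book) of the Haar multiresolution steps; the input signal (4, 5, −4, −1, 0, 1, 2, 0) is an assumption chosen to match this section's worked example (its mean is 0.875, as quoted in the text). It computes each detail contribution d_{m,i} and verifies equation (3.67a).

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_step(s):
    """One Haar decomposition step: S_{m+1,n}, T_{m+1,n} from S_{m,n}."""
    s = np.asarray(s, dtype=float)
    return (s[0::2] + s[1::2]) / SQRT2, (s[0::2] - s[1::2]) / SQRT2

def haar_inverse_step(s, t):
    """One Haar reconstruction step (inverse of haar_step)."""
    out = np.empty(2 * len(s))
    out[0::2] = (s + t) / SQRT2
    out[1::2] = (s - t) / SQRT2
    return out

def detail_contribution(x, m):
    """d_{m,i}: inverse transform keeping only the detail coefficients at scale m."""
    s = np.asarray(x, dtype=float)
    for _ in range(m):
        s, t = haar_step(s)
    rec = haar_inverse_step(np.zeros_like(s), t)   # zero the approximation at scale m
    for _ in range(m - 1):                         # carry back to scale 0 through zero details
        rec = haar_inverse_step(rec, np.zeros_like(rec))
    return rec

x = np.array([4.0, 5.0, -4.0, -1.0, 0.0, 1.0, 2.0, 0.0])   # assumed example signal
d = [detail_contribution(x, m) for m in (1, 2, 3)]
x3 = x - sum(d)     # equation (3.67a)
print(x3)           # every component equals the signal mean 0.875
```

The detail contributions sum with the smoothest approximation back to the original signal, which is exactly the decomposition pictured in figure 3.11.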

#### 3.4.2  Smoothing, thresholding and denoising

Let us look again at the discrete input signal used in the previous section (figure 3.12(a)). Once we have performed the full decomposition, we are free to alter any of the coefficients in the transform vector before performing the inverse. We can set groups of components to zero, as we did in the last section (figure 3.11), or set selected individual components to zero. We can reduce the magnitudes of some components rather than set them to zero. In fact, we can manipulate the components in a variety of ways depending on what we want to achieve. Notice, however, that the transform vector contains a range of values: some large and some small. Let us see what happens if we throw away the smallest valued coefficients in turn and perform the inverse transforms. We start with the smallest valued coefficient 0.354. Setting it to zero, the modified transform vector becomes (2.475, 0, 7.000, −0.500, −0.707, −2.121, −0.707, 1.414). Performing the inverse on this vector gives the reconstructed discrete signal (3.875, 4.875, −4.125, −1.125, 0.125, 1.125, 2.125, 0.125), as shown in figure 3.12(b). There is no discernible difference between the original discrete signal and the reconstruction. We now remove the next smallest valued coefficient (smallest in an absolute sense), the −0.5 coefficient. The transform vector becomes (2.475, 0, 7.000, 0, −0.707, −2.121, −0.707, 1.414) and the reconstructed signal is (3.875, 4.875, −4.125, −1.125, 0.375, 1.375, 1.875, −0.125). Again we have to look carefully at figure 3.12(c) to discern any difference between the original discrete signal and the reconstruction. Next we remove the two −0.707 components to get (2.475, 0, 7.000, 0, 0, −2.121, 0, 1.414). Reconstructing we get (4.375, 4.375, −4.125, −1.125, 0.875, 0.875, 1.875, −0.125). This reconstruction is shown in figure 3.12(d) where we can now notice obvious smoothing between both the first and second signal point and the fifth and sixth.
Next setting the last transform vector component to zero, we reconstruct to get (4.375, 4.375, −4.125, −1.125, 0.875, 0.875, 0.875, 0.875). This is shown in figure 3.12(e). Removing the −2.121 component leaves the coefficient vector as (2.475, 0, 7.000, 0, 0, 0, 0, 0) and the reconstruction becomes (4.375, 4.375, −2.625, −2.625, 0.875, 0.875, 0.875, 0.875) as shown in figure 3.12(f). Removing the 2.475 component (the signal mean component) leaves the coefficient vector with only a single component: (0, 0, 7.000, 0, 0, 0, 0, 0). Reconstructing using this component leaves a single (wavelet shaped) fluctuation contained within the first half of the signal (figure 3.12(g)).

Figure 3.12   Signal reconstruction using thresholded wavelet coefficients. The coefficient vectors used in the reconstructions are given below each reconstructed signal. Note that (a) is the original signal as it uses all of the original eight wavelet coefficients.

Can you see what has happened? The least significant components have been smoothed out first, leaving the more significant fluctuating parts of the signal intact. What we have actually done is to threshold the wavelet coefficients at increasing magnitudes. First, all the components whose magnitude was equal to or below 0.354 were removed; i.e. 0.354 was the threshold. Then 0.5 was set as the threshold, then 0.707 and so on. By doing this we removed the least significant influences on the signal first. Hence, the shape of the reconstructed signal resembles the original even with a large number of coefficients set to zero. Compare this magnitude thresholding of the coefficients to the scale-dependent smoothing of figure 3.11, which removed the components at each scale in turn, beginning with the smallest scales. Note that in practice we would normally deal with the signal mean coefficient separately, either retaining it regardless of magnitude or removing it at the beginning of the thresholding process.

We can define the scale-dependent smoothing of the wavelet coefficients as

3.68 $T m , n scale = { 0 m ≤ m * T m , n m > m *$
where m* is the index of the threshold scale. Figure 3.13(a) shows a schematic diagram of scale-dependent smoothing for sequentially indexed coefficients W i . (Assuming a full decomposition, we have dropped the M superscript from $W i M$ .) In this case the thresholding criterion is defined as
3.69 $T i scale = { 0 i ≥ 2^{M − m*} W i i < 2^{M − m*}$
where the range of the sequential index i is from 0 to N − 1 and N is the length of the original signal; hence $i = 2^{M − m*}$ is the first location index within the transform vector where the coefficients are set to zero. (Remember that the smallest scale coefficients are placed at the right-hand end of the transform vector and hence have the highest index values.) Note that in practice the coefficients at the very largest scales are sometimes also set to zero to remove drift effects from the signal. The reconstructed signal using the scale thresholded coefficient vector with scale threshold m* is simply the smooth approximation of the signal at scale m* expressed at the scale of the input signal, m = 0, i.e. x m*,i (see the previous section).
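In code, equation (3.69) amounts to zeroing the tail of the transform vector. The sketch below (Python; the transform vector is the full decomposition of this chapter's N = 8 example, so M = 3) is an assumption-laden illustration, not the book's implementation.

```python
import numpy as np

def scale_smooth(w, M, m_star):
    """Equation (3.69): keep W_i for i < 2**(M - m_star), zero the rest,
    i.e. discard every detail coefficient at scales m <= m_star."""
    w = np.asarray(w, dtype=float).copy()
    w[2 ** (M - m_star):] = 0.0
    return w

# full transform vector of the worked example (N = 8, so M = 3)
w = np.array([2.475, 0.354, 7.000, -0.500, -0.707, -2.121, -0.707, 1.414])
smoothed = scale_smooth(w, M=3, m_star=1)
print(smoothed)   # the four smallest-scale detail coefficients are now zero
```

Reconstructing from `smoothed` would give x 1,i , the approximation of figure 3.11(c).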

Magnitude thresholding is normally carried out to remove noise from a signal, to partition signals into two or more (and not necessarily noisy) components, or simply to smooth the data. It involves the reduction or complete removal of selected wavelet coefficients in order to separate out the behaviour of interest from within the signal. There are many methods for selecting and modifying the coefficients. The two most popular are hard and soft thresholding. Unlike scale-dependent smoothing, which removes all small scale coefficients below the scale index m* regardless of amplitude, hard and soft thresholding remove, or reduce, the smallest amplitude coefficients regardless of scale. This is shown schematically in figure 3.13(b). To hard threshold the coefficients, a threshold, λ, is set which is related to some mean value of the wavelet coefficients at each scale, e.g. standard deviation, mean absolute deviation, etc. Those coefficients above the threshold are deemed to correspond to the coherent part of the signal, and those below the threshold are deemed to correspond to the random or noisy part of the signal. Hard thresholding is of the form:

3.70 $W i hard = { 0 | W i | < λ W i | W i | ≥ λ$
Hard thresholding makes a decision simply to keep or remove the coefficients. (Hard thresholding was performed in the example of figure 3.12.) Soft thresholding

Figure 3.13   Scale-dependent smoothing and coefficient thresholding. (a) Scale-dependent smoothing. (b) Amplitude thresholding. (c) The relationship between the original coefficients and hard (left) and soft (right) thresholded coefficients.

recognizes that the coefficients contain both signal and noise, and attempts to isolate the signal by removing the noisy part from all coefficients. Soft thresholding is of the form
3.71 $W i soft = { 0 | W i | < λ sign ( W i ) ( | W i | − λ ) | W i | ≥ λ$
where all coefficients below the threshold, λ, are set to zero and all the coefficients whose magnitude is greater than λ are shrunk towards zero by an amount λ. Figure 3.13(c) shows a schematic diagram of the relationship between the original and thresholded coefficients for both hard and soft thresholding. The retained coefficients shown in the figure are kept as they are when hard thresholding is employed and their magnitude is reduced by λ when soft thresholding is employed.
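Equations (3.70) and (3.71) can be sketched directly. In the Python snippet below, the transform vector is the chapter's example and the threshold λ = 0.8 is an illustrative value of our own choosing.

```python
import numpy as np

def hard_threshold(w, lam):
    """Equation (3.70): keep coefficients with |W_i| >= lambda, zero the rest."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) >= lam, w, 0.0)

def soft_threshold(w, lam):
    """Equation (3.71): zero below lambda, shrink the survivors towards zero by lambda."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([2.475, 0.354, 7.000, -0.500, -0.707, -2.121, -0.707, 1.414])
print(hard_threshold(w, 0.8))   # survivors kept as they are
print(soft_threshold(w, 0.8))   # survivors shrunk by 0.8
```

Note that both rules zero the same set of coefficients; they differ only in what happens to the coefficients that survive.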

Figure 3.14 shows examples of both hard and soft thresholding of a test signal composed of two sinusoids plus Gaussian white noise (figures 3.14(a) and (b)). The wavelet coefficients obtained from a Daubechies D10 decomposition are shown in figure 3.14(c). These have been hard thresholded for thresholds set at λ = 2, 4, 6 and 8 respectively. The reconstructions corresponding to each of the thresholds are shown in figures 3.14(d)–(g). We can see that for low thresholds some of the high frequency noise is retained, whereas for high thresholds the signal is excessively smoothed. From visual inspection, an optimum threshold would seem to lie somewhere in the region of λ = 4. One commonly used measure of the optimum reconstruction is the mean square error between the reconstructed signal and the original signal and, in fact, it is found to be minimum near to this value of the threshold. The corresponding soft thresholded reconstructions are shown in figures 3.14(h) and (i) for λ = 2 and 4 respectively.

Often we do not know the exact form of either the underlying signal or the corrupting noise. The choice of threshold is therefore non-trivial and we can apply a number of signal and/or noise based criteria in producing a value pertinent to the problem under consideration. The threshold can, for example, be a constant value applied to the coefficients across all scales, some of the scales, or its value can vary according to scale. One of the most popular and simplest thresholds in use is the universal threshold defined as

3.72 $λ U = ( 2 ln ⁡ N ) 1 / 2 σ$
where $(2 \ln N)^{1/2}$ is the expected maximum value of a white noise sequence of length N and unit standard deviation, and σ is the standard deviation of the noise in the signal under consideration. Thus if the underlying coherent part of the signal is zero (i.e. the signal contains only noise), using λU as the threshold gives a high probability of setting all the coefficients to zero. For large samples the universal threshold will remove, with high probability, all the noise in the reconstruction, but part of the underlying function might also be lost; hence the universal threshold tends to over-smooth in practice. (This method is often known as the VISUSHRINK method as the resulting smooth signal estimate is visually more appealing.) In addition, in practice it is usual to leave the coefficients at the largest scales untouched even if they do not pass the universal threshold. The example of the sinusoids plus noise shown in figure 3.14 contains Gaussian noise with σ = 1 and N = 1024; the universal threshold for this data is then λU = 3.723. Within a wide range of practical data sizes ($2^6$–$2^{19}$), only in about one tenth of the realizations will any pure noise variable exceed the threshold. In this sense the universal thresholding method gives a ‘noise-free’ reconstruction.
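A quick numerical check of equation (3.72) for the figure 3.14 example (N = 1024 samples, noise σ = 1):

```python
import math

def universal_threshold(N, sigma):
    """Equation (3.72): lambda_U = sqrt(2 ln N) * sigma."""
    return math.sqrt(2.0 * math.log(N)) * sigma

# the figure 3.14 example: N = 1024, Gaussian noise with sigma = 1
print(round(universal_threshold(1024, 1.0), 3))   # 3.723
```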

Figure 3.14   Hard and soft thresholding. (a) Original time series composed of two sinusoids, both with unit amplitude, one twice the periodicity of the other. (b) Time series in (a) with added Gaussian noise (zero mean and unit standard deviation). (c) Wavelet coefficients in sequential format derived using the Daubechies D10 shown. (d) Hard thresholded coefficients, λ = 2 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (e) Hard thresholded coefficients, λ = 4 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (f) Hard thresholded coefficients, λ = 6 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (g) Hard thresholded coefficients, λ = 8 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (h) Soft thresholded coefficients, λ = 2 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.) (i) Soft thresholded coefficients, λ = 4 (top), and corresponding reconstructed time series (bottom). (The original time series of figure (a) is shown dashed.)

Universal soft thresholding usually produces a signal reconstruction with less energy than that for hard thresholding using the same threshold value, as the retained coefficients are all shrunk towards zero. This can be seen by comparing figure 3.14(e) with 3.14(i) for λ = 4. Hence, it is often the case in practice that the universal threshold used for hard thresholding is divided by about 2 when employed as a soft threshold. Another problem encountered when implementing the universal threshold in practice is that we do not know the value of σ for our signal. In this case, a robust estimate $\hat{σ}$ is used, typically set to the median of absolute deviation (MAD) of the wavelet coefficients at the smallest scale divided by 0.6745 to calibrate with the standard deviation of a Gaussian distribution. Thus the universal threshold becomes

3.73 $λ U = ( 2 ln ⁡ N ) 1 / 2 MAD 0.6745 = ( 2 ln ⁡ N ) 1 / 2 σ ^$
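A sketch of the robust estimate in Python follows. The synthetic coefficients and the zero-median convention (pure-noise wavelet coefficients have median close to zero, so MAD reduces to the median absolute value) are assumptions made for illustration.

```python
import numpy as np

def sigma_mad(detail):
    """Equation (3.73) numerator: median absolute value of the smallest-scale
    wavelet coefficients, calibrated by 0.6745 for Gaussian noise.
    (Assumes the coefficients have zero median, as pure noise coefficients do.)"""
    d = np.asarray(detail, dtype=float)
    return np.median(np.abs(d)) / 0.6745

rng = np.random.default_rng(42)
noisy_details = rng.normal(0.0, 2.0, size=4096)   # synthetic finest-scale coefficients
print(sigma_mad(noisy_details))                    # close to the true sigma = 2
```

Because the estimate is a median, a few large signal-bearing coefficients at the finest scale barely perturb it, which is the point of using MAD rather than the sample standard deviation.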
Many other thresholding methods have been proposed including the minimax method, the SURE method, the HYBRID method, cross-validation methods, the Lorentz method and various Bayesian approaches. Some of these produce a global threshold of constant value across scales and others provide a scale-dependent threshold. We do not consider these in detail here. More information is provided at the end of the chapter.

#### 3.5  Daubechies wavelets

As we saw with the Haar wavelet transform earlier in section 3.3.5, the coefficients are ordered in two distinct sequences: one acts as a smoothing filter for the data, the other extracts the signal detail at each scale. The Haar wavelet is extremely simple in that it has only two scaling coefficients and both are equal to unity. In this section we will look at a family of discrete wavelets of which the Haar wavelet is the simplest member—the Daubechies wavelets. The scaling functions associated with these wavelets satisfy the conditions given in section 3.2.4, as all orthogonal wavelets do. In addition to satisfying these criteria, Daubechies required that her wavelets had compact support (i.e. a finite number, N k of scaling coefficients) and were smooth to some degree. The smoothness of the wavelet is associated with a moment condition which can be expressed in terms of the scaling coefficients as

3.74 $∑ k = 0 N k − 1 ( − 1 ) k c k k m = 0$
for integers m = 0,1,2, …, N k /2 − 1. These wavelets have N k /2 vanishing moments, which means that they can suppress parts of the signal which are polynomial up to degree N k /2 − 1. Or, to put it another way, Daubechies wavelets are very good at representing polynomial behaviour within the signal. Some examples of Daubechies wavelets, their scaling functions and associated energy spectra are given in figure 3.15. The support lengths of the Daubechies wavelets are N k − 1, i.e. the D2 (Haar) wavelet, as we already know, has a support length of 1, the D4 wavelet has support length of 3, the D6 a support length of 5 and so on. We can see from figure 3.15 that the scaling function lets through the lower frequencies and hence acts as a lowpass filter, and the associated wavelet lets through the higher frequencies and hence acts as a highpass filter. In addition, we can see that the spectra are oscillatory in nature, with bumps decreasing in amplitude from the lower to higher frequencies. The magnitudes of the secondary bumps in the spectra reduce as the number of scaling coefficients, and hence the number of vanishing moments of the wavelet, increase.

In this chapter we have already looked in detail at the Daubechies wavelet with only two scaling coefficients (the D2 or Haar wavelet). Let us now look at the Daubechies wavelet which has four scaling coefficients, the D4. The ‘D’ represents this particular family of Daubechies wavelet and the ‘4’ represents the number of nonzero scaling coefficients, N k . (Note that Daubechies wavelets are also often defined by the number of vanishing moments they have, which equals N k /2, in which case the sequence runs D1, D2, D3, D4, …, etc.)

Figure 3.15   A selection of Daubechies wavelets and scaling functions with their energy spectra. All wavelets and scaling functions are shown only over their respective support—outside their support they have zero value. Note also that only the positive part of the spectrum is given; an identical mirror image is present at negative frequencies.

From equation (3.20), we know that the scaling equation for a four-coefficient wavelet is

3.75 $ϕ ( t ) = c 0 ϕ ( 2 t ) + c 1 ϕ ( 2 t − 1 ) + c 2 ϕ ( 2 t − 2 ) + c 3 ϕ ( 2 t − 3 )$
and, from equation (3.23), that the corresponding wavelet function is
3.76 $ψ ( t ) = c 3 ϕ ( 2 t ) − c 2 ϕ ( 2 t − 1 ) + c 1 ϕ ( 2 t − 2 ) − c 0 ϕ ( 2 t − 3 )$
To find the values of the scaling coefficients for the D4 wavelet we use equations (3.21), (3.22) and (3.74). From equation (3.21) we get
3.77 $c 0 + c 1 + c 2 + c 3 = 2$
from equation (3.22) we get
3.78 $c 0 2 + c 1 2 + c 2 2 + c 3 2 = 2$
and from equation (3.74) with m = 0 we get
3.79 $c 0 − c 1 + c 2 − c 3 = 0$
and again using equation (3.74), this time setting m = 1, we get
3.80 $− c 1 + 2 c 2 − 3 c 3 = 0$
Four scaling coefficients which satisfy the above four equations are
$c 0 = 1 + 3 4 c 1 = 3 + 3 4 c 2 = 3 − 3 4 c 3 = 1 − 3 4$
and so do
$c 0 = 1 − 3 4 c 1 = 3 − 3 4 c 2 = 3 + 3 4 c 3 = 1 + 3 4$
One set leads to ϕ(t) and the other to ϕ(−t). We will adopt the first set, which are c 0 = 0.6830127, c 1 = 1.1830127, c 2 = 0.3169873 and c 3 = −0.1830127 respectively. The scaling coefficients for the Daubechies wavelet system for larger numbers of coefficients are found by numerical computation. The coefficients for Daubechies wavelets up to D20 are given in table 3.1.
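The four conditions (3.77)–(3.80) can be verified numerically for this set of coefficients:

```python
import math

s3 = math.sqrt(3.0)
c = [(1 + s3) / 4, (3 + s3) / 4, (3 - s3) / 4, (1 - s3) / 4]   # first D4 solution

checks = {
    "(3.77) sum = 2":            sum(c),
    "(3.78) sum of squares = 2": sum(ck ** 2 for ck in c),
    "(3.79) m = 0 moment":       c[0] - c[1] + c[2] - c[3],
    "(3.80) m = 1 moment":       -c[1] + 2 * c[2] - 3 * c[3],
}
for name, value in checks.items():
    print(f"{name}: {value:+.10f}")
```

The second solution (the time-reversed set) passes the same checks, as equations (3.77)–(3.80) do not distinguish a filter from its reversal.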

We can compute the scaling function from the D4 coefficients using equation (3.75). To do this we must rewrite it as

3.81 $ϕ j ( t ) = c 0 ϕ j − 1 ( 2 t ) + c 1 ϕ j − 1 ( 2 t − 1 ) + c 2 ϕ j − 1 ( 2 t − 2 ) + c 3 ϕ j − 1 ( 2 t − 3 )$
where the subscript j is the iteration number. Then choosing an arbitrary initial shape for the scaling function ϕ 0(t) we find ϕ 1(t), then using ϕ 1(t) we find ϕ 2(t), and so on, iterating until ϕ j (t) = ϕ j−1(t) (or at least until ϕ j (t) is close enough to ϕ j−1(t) for our purposes). Once we have an approximation for ϕ(t) we could define the wavelet directly using equation (3.76). However, to perform the D4 wavelet transform of a discrete signal in practice we are not required to compute the wavelet or scaling functions directly; rather we simply employ the scaling coefficients within the multi-resolution algorithm (equations (3.36) and (3.37)) in the same manner as we did for the Haar wavelet. In this case, the approximation coefficients are computed using
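The iteration of equation (3.81) can be sketched as follows (Python; the grid resolution, iteration count and box-function starting guess are illustrative choices, not prescribed by the book). The D4 scaling function is supported on [0, 3], and its exact value at t = 1 is known to be (1 + √3)/2, which gives us a check on convergence.

```python
import numpy as np

c = np.array([0.6830127, 1.1830127, 0.3169873, -0.1830127])   # D4 scaling coefficients

def cascade(c, n_iter=20, pts_per_unit=64):
    """Iterate the dilation equation (3.81), phi_j(t) = sum_k c_k phi_{j-1}(2t - k),
    on a fixed dyadic grid, starting from a box function on [0, 1)."""
    support = len(c) - 1                      # D4 support is [0, 3]
    n = support * pts_per_unit
    t = np.arange(n) / pts_per_unit
    phi = np.where(t < 1.0, 1.0, 0.0)         # initial guess phi_0(t)
    for _ in range(n_iter):
        new = np.zeros(n)
        for k, ck in enumerate(c):
            idx = 2 * np.arange(n) - k * pts_per_unit   # sample points of phi(2t - k)
            valid = (idx >= 0) & (idx < n)
            new[valid] += ck * phi[idx[valid]]
        phi = new
    return t, phi

t, phi = cascade(c)
print(phi[64])          # phi(1); the exact value is (1 + sqrt(3))/2 = 1.36603...
print(phi.sum() / 64)   # approximates the integral of phi, which is 1
```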

### Table 3.1   Daubechies wavelet coefficients D2 to D20

| Wavelet | Scaling coefficients c k (k = 0, 1, …, N k − 1) |
|---|---|
| D2 | 1, 1 |
| D4 | 0.6830127, 1.1830127, 0.3169873, −0.1830127 |
| D6 | 0.47046721, 1.14111692, 0.650365, −0.19093442, −0.12083221, 0.0498175 |
| D8 | 0.32580343, 1.01094572, 0.89220014, −0.03957503, −0.26450717, 0.0436163, 0.0465036, −0.01498699 |
| D10 | 0.22641898, 0.85394354, 1.02432694, 0.19576696, −0.34265671, −0.04560113, 0.10970265, −0.00882680, −0.01779187, 4.71742793e−3 |
| D12 | 0.15774243, 0.69950381, 1.06226376, 0.44583132, −0.31998660, −0.18351806, 0.13788809, 0.03892321, −0.04466375, 7.83251152e−4, 6.75606236e−3, −1.52353381e−3 |
| D14 | 0.11009943, 0.56079128, 1.03114849, 0.66437248, −0.20351382, −0.31683501, 0.10084647, 0.11400345, −0.05378245, −0.02343994, 0.01774979, 6.07514995e−4, −2.54790472e−3, 5.00226853e−4 |
| D16 | 0.07695562, 0.44246725, 0.95548615, 0.82781653, −0.02238574, −0.40165863, 6.68194092e−4, 0.18207636, −0.02456390, −0.06235021, 0.01977216, 0.01236884, −6.88771926e−3, −5.54004549e−4, 9.55229711e−4, −1.66137261e−4 |
| D18 | 0.05385035, 0.34483430, 0.85534906, 0.92954571, 0.18836955, −0.41475176, −0.13695355, 0.21006834, 0.04345268, −0.09564726, 3.54892813e−4, 0.03162417, −6.67962023e−3, −6.05496058e−3, 2.61296728e−3, 3.25814671e−4, −3.56329759e−4, −5.5645514e−5 |
| D20 | 0.03771716, 0.26612218, 0.74557507, 0.97362811, 0.39763774, −0.35333620, −0.27710988, 0.18012745, 0.13160299, −0.10096657, −0.04165925, 0.04696981, 5.10043697e−3, −0.01517900, 1.97332536e−3, 2.81768659e−3, −9.69947840e−4, −1.64709006e−4, 1.32354367e−4, −1.87584156e−5 |
3.82a $S_{m+1,n} = \frac{1}{\sqrt{2}} \sum_{k=0}^{N_k-1} c_k S_{m,2n+k} = \frac{1}{\sqrt{2}} [ c_0 S_{m,2n} + c_1 S_{m,2n+1} + c_2 S_{m,2n+2} + c_3 S_{m,2n+3} ] = 0.483 S_{m,2n} + 0.837 S_{m,2n+1} + 0.224 S_{m,2n+2} − 0.129 S_{m,2n+3}$
We therefore take the product of a four-digit sequence of the signal by the scaling coefficient vector $(1/\sqrt{2}) c_k$ (the lowpass filter) to generate the approximation component at the first scale. The mechanics of this are shown schematically in figure 3.16. The four-digit sequence, $(1/\sqrt{2}) c_k$, slides across the signal at scale index m in jumps of two generating, at each jump, an approximation component at

Figure 3.16   Filtering of the signal: decomposition. The original signal at scale index m = 0 (i.e. S 0, n ) is filtered to produce the approximation coefficients S 1,n . This is done by sliding the lowpass filter along the signal one step at a time. Subsampling removes every second value. The diagram shows only the retained values, i.e. effectively the filter coefficient vector jumps two steps at each iteration. Next, the sequence S 1,n is filtered in the same way to get S 2,n and so on. The detail coefficients are found by using the same method but employing highpass wavelet filter coefficients.

the next scale (m + 1). To generate the corresponding coefficients for the wavelet we use the reconfigured scaling coefficient sequence $(1/\sqrt{2}) b_k$ (the highpass filter), where $b_k = (−1)^k c_{N_k−1−k}$, as follows:
3.82b $T_{m+1,n} = \frac{1}{\sqrt{2}} \sum_{k=0}^{N_k-1} b_k S_{m,2n+k} = \frac{1}{\sqrt{2}} [ c_3 S_{m,2n} − c_2 S_{m,2n+1} + c_1 S_{m,2n+2} − c_0 S_{m,2n+3} ] = −0.129 S_{m,2n} − 0.224 S_{m,2n+1} + 0.837 S_{m,2n+2} − 0.483 S_{m,2n+3}$
As an example, we will now consider the D4 wavelet decomposition of the signal x i = (1, 0, 0, 0, 0, 0, 0, 0) which we input into the multiresolution algorithm as S 0,n . It has N = 8 components, so we iterate the transform decomposition algorithm three times. After the first iteration the transform vector becomes (0.483, 0.000, 0.000, 0.224, −0.129, 0.000, 0.000, 0.837) containing four approximation components followed by four detail coefficients, i.e. (S 1,0, S 1,1, S 1,2, S 1,3, T 1,0, T 1,1, T 1,2, T 1,3). Remember that we employ wraparound to deal with the edges of the signal. When either of the filter pairs is placed at 2n = 6, the 2n + 2 and 2n + 3 locations are not contained within the signal vector (i = 0, 1, …, N − 1). Thus, the last two filter coefficients of 0.224 and −0.129 (approximation) or 0.837 and −0.483 (detail) are placed back at the start of the signal vector, the first of each coinciding with the unit value at the beginning of the signal, hence producing respectively the values 0.224 or 0.837. These values are placed at locations S 1,3 and T 1,3 respectively. The next iteration is performed on the four approximation components S 1,0, S 1,1, S 1,2, S 1,3, producing two approximation components followed by two detail coefficients. These are added to the four detail coefficients obtained at the previous iteration to give the vector (0.204, 0.296, −0.171, 0.354, −0.129, 0.000, 0.000, 0.837). The third and final iteration produces (0.354, −0.065, −0.171, 0.354, −0.129, 0.000, 0.000, 0.837). This vector has the form (S 3,0, T 3,0, T 2,0, T 2,1, T 1,0, T 1,1, T 1,2, T 1,3), where the signal mean can be found from the first coefficient, i.e. $S_{3,0}/2^{3/2} = 0.354/2^{3/2} = 0.125$.
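This single decomposition step, with wraparound, can be sketched in Python as follows (the explicit double loop is written for clarity rather than efficiency):

```python
import numpy as np

c = np.array([0.6830127, 1.1830127, 0.3169873, -0.1830127])   # D4 lowpass
b = np.array([(-1) ** k * c[3 - k] for k in range(4)])         # b_k = (-1)^k c_{Nk-1-k}

def d4_step(s):
    """One D4 decomposition step (equations (3.82a,b)) with wraparound."""
    s = np.asarray(s, dtype=float)
    half = len(s) // 2
    approx = np.zeros(half)
    detail = np.zeros(half)
    for n in range(half):
        for k in range(4):
            sk = s[(2 * n + k) % len(s)]        # wraparound at the signal end
            approx[n] += c[k] * sk / np.sqrt(2.0)
            detail[n] += b[k] * sk / np.sqrt(2.0)
    return approx, detail

x = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])
s1, t1 = d4_step(x)
# first iteration matches (0.483, 0, 0, 0.224, -0.129, 0, 0, 0.837)
print(np.round(np.concatenate([s1, t1]), 3))
```

Applying `d4_step` again to `s1`, and then to the resulting two-component approximation, reproduces the second and third iterations quoted above.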

We know that the Daubechies D4 wavelet has two vanishing moments, thus it can suppress both constant signals and linear signals. This means that for an infinitely long constant or linear signal all the coefficients are zero. However, end effects occur if the signal is of finite length. We will use the linear ramp signal (0, 1, 2,…, 31), shown in figure 3.17(a), as an example. The first iteration produces the transform vector

$W ( 1 ) = ( S 1 , 0 , S 1 , 1 , S 1 , 2 , … , S 1 , 15 , T 1 , 0 , T 1 , 1 , T 1 , 2 , … , T 1 , 15 ) = ( 0.897 , 3.725 , 6.553 , 9.382 , 12.21 , 15.039 , 17.867 , 20.696 , 23.524 , 26.352 , 29.181 , 32.009 , 34.838 , 37.666 , 40.495 , 40.291 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , − 11.314 )$

Figure 3.17   The multiresolution decomposition of a ramp signal using Daubechies wavelets, (a) Original signal, (b) Decomposition 1 (D4 wavelet), (c) Decomposition 2 (D4 wavelet), (d) Decomposition 3 (D4 wavelet), (e) Decomposition 4 (D4 wavelet), (f) Decomposition 5, full decomposition (D4 wavelet), (g) Full decomposition using D6 wavelets, (h) Full decomposition using D12 wavelets. (Note change in scale of the vertical axes.)

where all the detail coefficients at the lowest scale, T 1,n , are zero except for the end coefficient T 1,15. This nonzero coefficient is caused by the signal wraparound where the wavelet filter vector $(1/\sqrt{2}) b_k$ placed at the end of the signal encounters the sequence 30, 31, 0, 1. We can see that the computation for the corresponding detail component is
$T 1 , 15 = − 0.129 × 30 − 0.224 × 31 + 0.837 × 0 − 0.483 × 1 = − 11.314$
A histogram of the coefficients is shown in figure 3.17(b). Notice also the edge effect in the approximation coefficients where they all increase linearly in value except the last one due to the wraparound with the scaling filter. The second iteration produces the transform vector
$W ( 2 ) = ( 3.804 , 11.804 , 19.804 , 27.804 , 35.804 , 43.804 , 52.196 , 52.981 , 0 , 0 , 0 , 0 , 0 , 0 , 1.464 , − 15.321 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , − 11.314 )$
where we see that the next set of detail coefficients (T 2,0 to T 2,7) are zero except those at the end (figure 3.17(c)). There are two nonzero coefficients now due to the edge effect in the smooth coefficients at the previous scale. If we repeat the decomposition to the largest scale we get the fully decomposed signal
$W ( 5 ) = ( 87.681 , − 31.460 , 6.405 , − 29.530 , 0 , 0 , 3.624 , − 21.149 , 0 , 0 , 0 , 0 , 0 , 0 , 1.464 , − 15.321 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , − 11.314 )$
As we get ‘edge’ coefficients at all scales, we get no zero coefficients at scales m = 4 and m = 5 where there are respectively only two (T 4,0, T 4,1) and one (T 5,0) coefficient.

The full D2 (Haar) decomposition of the ramp signal is

$W ( 5 ) = ( 87.7 , − 45.3 , − 16.0 , − 16.0 , − 5.7 , − 5.7 , − 5.7 , − 5.7 , − 2.0 , − 2.0 , − 2.0 , − 2.0 , − 2.0 , − 2.0 , − 2.0 , − 2.0 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 , − 0.7 )$
Notice that there are no zero coefficients, as we would expect, since the Haar wavelet has only one vanishing moment and therefore does not suppress linear signals.

On the other hand, all Daubechies wavelets with more than four scaling coefficients have more than two vanishing moments and all can therefore suppress linear signals. For example, the D6 wavelet can suppress mean, linear and quadratic signals. The D6 decomposition of the ramp signal, which employs the scaling coefficients (c 0, c 1, c 2, c 3, c 4, c 5) = (0.470, 1.141, 0.650, −0.191, −0.121, 0.0498), is

$W ( 5 ) = ( 87.681 , − 2.259 , − 29.201 , 25.344 , 0.149 , 9.909 , − 22.224 , 6.2484 , 0 , 0 , 0 , 0 , 0 , 4.22 , − 17.636 , 4.766 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , − 15.175 , 3.861 )$
Notice that this time the two end coefficients are nonzero. This is because the filter is longer. It now has six components and hence two end locations of the filter on the discrete signal overlap and wraparound to the beginning of the signal. A D8 decomposition would generate three end coefficients, a D10 would generate four and so on. The full decompositions of the ramp signal using both the D6 and D12 wavelets are shown in figures 3.17(g) and (h) for comparison. The more scaling coefficients the wavelet has, the higher the number of its vanishing moments and hence the higher the degree of polynomial it can suppress. However, the more scaling coefficients that a wavelet has, the larger its support length and hence the less compact it becomes. This makes it less localized in the time domain and hence less able to isolate singularities in the signal (including edge effects). This is the trade-off which must be considered when selecting the best wavelet for the data analysis task.
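The vanishing-moment and wraparound behaviour described above can be checked numerically. The sketch below (Python; it reuses a straightforward D4 decomposition step of our own) shows that for the ramp every first-level detail coefficient vanishes except the wraparound coefficient T 1,15:

```python
import numpy as np

c = np.array([0.6830127, 1.1830127, 0.3169873, -0.1830127])   # D4 lowpass
b = np.array([(-1) ** k * c[3 - k] for k in range(4)])         # D4 highpass

def d4_step(s):
    """One D4 decomposition step (equations (3.82a,b)) with wraparound."""
    s = np.asarray(s, dtype=float)
    half = len(s) // 2
    approx = np.zeros(half)
    detail = np.zeros(half)
    for n in range(half):
        for k in range(4):
            sk = s[(2 * n + k) % len(s)]
            approx[n] += c[k] * sk / np.sqrt(2.0)
            detail[n] += b[k] * sk / np.sqrt(2.0)
    return approx, detail

ramp = np.arange(32, dtype=float)    # the ramp signal (0, 1, ..., 31)
s1, t1 = d4_step(ramp)
# two vanishing moments annihilate the linear ramp everywhere except at the
# wraparound, where T_{1,15} is about -11.314
print(np.round(t1, 3))
```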

#### 3.5.1  Filtering

Let us revisit the filtering process described in the last section and shown in figure 3.16. In signal processing, the approximation coefficients at resolution m, S m,n , are convolved with the lowpass filter. This is done by moving the filter along the signal one step at a time. The approximation coefficients are then subsampled (or downsampled) where every second value is chosen to give the approximation coefficient vector at scale m + 1. The approximation coefficients at resolution m, S m,n , are also convolved with the highpass filter and subsampled in the same way to give the detail signal coefficients at scale m + 1. The detail components T m+1,n are kept and the approximation components S m+1,n are again passed through the lowpass and highpass filters to give components T m+2,n and S m+2,n . This process is repeated over all scales to give the full decomposition of the signal. Each step in the decomposition filtering process is shown schematically in figure 3.18(a). The sequences $(1/\sqrt{2}) c_k$ and $(1/\sqrt{2}) b_k$ are contained respectively within the lowpass and highpass filter vectors used within the wavelet decomposition algorithm. The filter coefficients are sometimes referred to as ‘taps’.

For signal reconstruction (equation (3.42)), the filtering process is simply reversed, whereby the components at the larger scales are fed back through the filters. This is shown in figure 3.18(b). The approximation and detail components at scale index m + 1 are first upsampled (zeros are inserted between their values) and then passed through the lowpass and highpass filters respectively. This time, however, the filter coefficients are reversed in order and the filters are shifted back along the signal. An example of the reconstruction filtering is shown in figure 3.19 where the component $S_{m,5}$ is found from the sequences $S_{m+1,n}$ and $T_{m+1,n}$. Note that the leading (right-hand) filter coefficient defines the location of the $S_{m,n}$ coefficient. Hence the computation of the $S_{m,0}$, $S_{m,1}$ and $S_{m,2}$ coefficients involves components of the filter vector lying off the left-hand end of the upsampled signal at scale m + 1. These are simply wrapped back around to the right-hand end of the signal.
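The reconstruction step can be sketched in the same way (again our own sketch, not the book's code, with periodic wraparound and the D4 filter): each coefficient at the larger scale scatters its filter-weighted contribution back over the upsampled signal, which is equivalent to upsampling with zeros and filtering with the reversed filters.

```python
import numpy as np

# Daubechies D4 filters (c_k sum to 2; b_k are the reversed,
# alternately-signed c_k).
c = np.array([(1 + np.sqrt(3)) / 4, (3 + np.sqrt(3)) / 4,
              (3 - np.sqrt(3)) / 4, (1 - np.sqrt(3)) / 4])
b = ((-1) ** np.arange(4)) * c[::-1]

def dwt_level(s):
    """One decomposition step (lowpass/highpass filter then subsample)."""
    N = len(s)
    idx = (2 * np.arange(N // 2)[:, None] + np.arange(4)) % N
    return s[idx] @ c / np.sqrt(2), s[idx] @ b / np.sqrt(2)

def idwt_level(approx, detail):
    """One reconstruction step: each scale-(m+1) coefficient spreads its
    (1/sqrt(2))c_k or (1/sqrt(2))b_k weighted contribution back over the
    scale-m signal, with wraparound at the ends."""
    N = 2 * len(approx)
    s = np.zeros(N)
    for k in range(len(approx)):
        for i in range(4):
            s[(2 * k + i) % N] += (c[i] * approx[k]
                                   + b[i] * detail[k]) / np.sqrt(2)
    return s

s = np.cos(np.arange(32) / 3.0)
a, d = dwt_level(s)
s_rec = idwt_level(a, d)
```

Perfect reconstruction follows from the orthonormality of the filters: `s_rec` recovers `s` up to rounding error.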

Figure 3.18   Schematic of the signal filtering. (a) Schematic diagram of the filtering of the approximation coefficients to produce the approximation and detail coefficients at successive scales. The subsample symbol means take every second value of the filtered signal. HP and LP are, respectively, the highpass and lowpass filters. In practice the filter is moved along one location at a time on the signal, hence filtering plus subsampling corresponds to skipping every second value as shown in figure 3.16. (b) Schematic diagram of the filtering of the approximation and detail coefficients to produce the approximation coefficients at successively smaller scales. The upsample symbol means add a zero between every second value of the input vector. The coefficients in the HP′ and LP′ filters are in reverse order to their counterparts used in the decomposition shown in (a). Figures (a) and (b) represent a subband coding scheme.

Figure 3.19   Filtering of the signal: reconstruction. The smooth and detail coefficients at scale index m are passed back through the filters to reconstruct the smooth coefficients at the higher resolution S m−1,n . This is done by sliding the lowpass and highpass filters along their respective coefficient sequence with zeros inserted between the coefficients. (Refer back to equation (3.42) where, remember, ‘n’ is the coefficient location index at the lower scale and ‘k’ is the index at the higher scale.)

We can produce a discrete approximation of the scaling function at successive scales if we set all the values of the transform vector to zero except the first one and pass this vector repeatedly back up through the lowpass filter. This is illustrated in the sequence of plots contained in figure 3.20. Figure 3.20(a) shows the initial transform vector, 64 components long, with only the first component set to unity, the rest set to zero, i.e. (1, 0, 0, 0, …, 0). After the first iteration, four nonzero components can be seen. These are equal to $(c_0/\sqrt{2},\, c_1/\sqrt{2},\, c_2/\sqrt{2},\, c_3/\sqrt{2},\, 0, 0, 0, \ldots, 0)$ (notice that this is the decomposition filter). The next iteration (figure 3.20(c)) produces 10 nonzero values in the transform vector, the next 22 nonzero values, and so on. The transform vector components take on the appearance of the scaling function quite quickly. What we are actually doing is reconstructing $S_{0,n}$ based on the initial vector (1, 0, 0, 0, …, 0) equal to the following transform vectors: $(S_{1,0}, S_{1,1}, S_{1,2}, \ldots, T_{1,0}, T_{1,1}, T_{1,2}, \ldots)$ for figure 3.20(b); $(S_{2,0}, S_{2,1}, S_{2,2}, \ldots, T_{1,0}, T_{1,1}, T_{1,2}, \ldots)$ for figure 3.20(c); $(S_{3,0}, S_{3,1}, S_{3,2}, \ldots, T_{1,0}, T_{1,1}, T_{1,2}, \ldots)$ for figure 3.20(d), and so on. Notice that, as we employ wraparound to deal with the edge effects, the final signal shown in figure 3.20(g) is a constant value equal to the signal mean. Figure 3.20(h) shows a high-resolution representation of a Daubechies scaling function generated by iterating nine times with an initial transform vector containing 4096 components.
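This iteration is easy to reproduce. The sketch below (our own, using the D4 coefficients and the wraparound convention of the text) reconstructs a $2^M$-component vector from $S_{m,0} = 1$ for a chosen number of iterations m:

```python
import numpy as np

# Daubechies D4 scaling coefficients.
c = np.array([(1 + np.sqrt(3)) / 4, (3 + np.sqrt(3)) / 4,
              (3 - np.sqrt(3)) / 4, (1 - np.sqrt(3)) / 4])

def scaling_approx(m, M=6):
    """Set S_{m,0} = 1 (all other coefficients zero) and pass the vector
    back up through the lowpass filter m times, doubling the length of the
    approximation vector each time (wraparound at the ends)."""
    a = np.zeros(2 ** (M - m))
    a[0] = 1.0
    for _ in range(m):
        out = np.zeros(2 * len(a))
        for k in range(len(a)):
            for i in range(4):
                out[(2 * k + i) % len(out)] += c[i] * a[k] / np.sqrt(2)
        a = out
    return a

v1 = scaling_approx(1)   # one iteration: the four values c_k / sqrt(2)
v6 = scaling_approx(6)   # six iterations of the 64-component vector
```

One iteration gives the four nonzero values $c_k/\sqrt{2}$, while six iterations of the 64-component vector give the constant value $(1/\sqrt{2})^6 = 0.125$, as described for figure 3.20(g); `scaling_approx(9, M=12)` produces the 4096-component high-resolution version.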

Figure 3.20   The reconstruction of a Daubechies D4 scaling function. (a) The transform vector (1, 0, 0, 0, 0, 0, …). (b) One iteration, i.e. $S_{1,0} = 1$. (c) Two iterations, i.e. $S_{2,0} = 1$. (d) Three iterations, i.e. $S_{3,0} = 1$. (e) Four iterations, i.e. $S_{4,0} = 1$. (f) Five iterations, i.e. $S_{5,0} = 1$. (g) Six iterations of the transform vector with 64 components, i.e. $S_{6,0} = 1$. This results in a constant signal equal to the mean due to wraparound. (h) After nine iterations of a transform vector (1, 0, 0, 0, 0, …) with 4096 components.

You can verify that the constant signal of figure 3.20(g) stems from the wraparound employed when reconstructing from the full decomposition wavelet vector with $S_{M,0}$ set to unity. In this case, for the first iteration, adding a zero to get $(S_{M,0}, 0)$ and running the reconstruction filter over both components leads to two approximation coefficients $(S_{M-1,0}, S_{M-1,1})$ equal to $((c_0 + c_2)/\sqrt{2},\ (c_1 + c_3)/\sqrt{2}) = (0.707, 0.707)$. Adding zeros to this vector and iterating again produces four approximation coefficients (0.5, 0.5, 0.5, 0.5), and so on until we obtain the full reconstruction after the sixth iteration, which contains constant values of $S_{0,n}$ equal to $(1/\sqrt{2})^6 = 0.125$. We can see that this is true in general as, from equation (3.74) with m = 0 together with equation (3.21), it can be shown that

3.83 $\sum_{k\ \mathrm{even}} c_k = \sum_{k\ \mathrm{odd}} c_k = 1$

Hence, the combined effect of wraparound plus the zeros inserted in the reconstruction process causes $S_{M,0}$ to be multiplied by both $(1/\sqrt{2})\sum_{k\ \mathrm{even}} c_k$ and $(1/\sqrt{2})\sum_{k\ \mathrm{odd}} c_k$ in turn to find $S_{M-1,0}$ and $S_{M-1,1}$, which both have the same value, simply equal to $S_{M,0}/\sqrt{2}$. This transform vector now has two constant values (the rest zero). The next iteration produces four constant nonzero values, and so on. It is easy to verify that using Daubechies wavelets with different numbers of nonzero scaling coefficients, $N_k$, will always result in constant-valued transform vector components.
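Equation (3.83) is easy to check numerically. The snippet below (a quick sketch of ours) verifies it for the Haar and D4 scaling coefficients in the convention of this chapter, where the $c_k$ sum to 2:

```python
import numpy as np

# Scaling coefficients in the convention sum(c_k) = 2.
haar = np.array([1.0, 1.0])
d4 = np.array([(1 + np.sqrt(3)) / 4, (3 + np.sqrt(3)) / 4,
               (3 - np.sqrt(3)) / 4, (1 - np.sqrt(3)) / 4])

# Equation (3.83): the even-indexed and odd-indexed coefficients each sum
# to 1, so a lone S_{M,0} reconstructs to the constant S_{M,0}/sqrt(2)
# at each step.
sums = {name: (ck[0::2].sum(), ck[1::2].sum())
        for name, ck in (("Haar", haar), ("D4", d4))}
```

The same property holds for the D6, D8, … coefficient sets.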

If we repeat the reconstruction filtering, this time setting the first detail coefficient to unity, we can generate an approximation to a wavelet. Notice that, in doing so, only the first iteration of the reconstruction algorithm requires the reordered scaling coefficient sequence for the wavelet, $b_k$; subsequent reconstruction filtering then uses the scaling coefficient sequence $c_k$. Figure 3.21 shows the wavelet reconstruction process. From these examples, where a single component was set to unity, we can see that the transform vector components, when passed back through the filters during the reconstruction process, increasingly spread their influence (information) over the transform vector in terms of either discrete scaling functions (if the original component was an approximation coefficient) or discrete wavelet functions (if the original component was a detail coefficient). Figures 3.20 and 3.21 illustrate the discrete approximations of the wavelet and scaling functions at various scales implicit within the multiresolution algorithm. The discrete scaling and wavelet functions at scale 1 are, in fact, the filter coefficient vectors $(1/\sqrt{2})c_k$ and $(1/\sqrt{2})b_k$ respectively.

#### 3.5.2  Symmlets and Coiflets

Looking back at figure 3.15 we can see that Daubechies wavelets are quite asymmetric. To improve symmetry while retaining simplicity, Daubechies proposed Symmlets (also spelled symlets) as a modification to her original wavelets. Symmetric wavelet filters are desirable in some applications, e.g. image coding, as it is argued that our visual system is more tolerant of symmetric errors. In addition, symmetry makes it easier to deal with image boundaries. Daubechies came up with Symmlets by ‘juggling with their phase’ during their construction. Figure 3.22 contains two examples of Symmlets together with their scaling functions. They have $N_k/2 - 1$ vanishing moments, support length $N_k - 1$ and filter length $N_k$. However, true symmetry (or antisymmetry) cannot be achieved for orthonormal wavelet bases with compact support, with one exception: the Haar wavelet, which is antisymmetric. Coiflets are another wavelet family found by Daubechies. They are also nearly symmetric and have vanishing moments associated with both the scaling function and the wavelet: the wavelet has $N_k/3$ moments equal to zero and the scaling function has $N_k/3 - 1$ moments equal to zero. They have support length $N_k - 1$ and filter length $N_k$. Three examples of Coiflet wavelets are shown in figure 3.23. The number of coefficients, $N_k$, used to define Coiflets increases in multiples of six.

Figure 3.21   The reconstruction of a Daubechies D4 wavelet. (a) One iteration with only $T_{1,0}$ initially set equal to 1. (b) Two iterations with only $T_{2,0}$ initially set equal to 1. (c) Three iterations with only $T_{3,0}$ initially set equal to 1. (d) Four iterations with only $T_{4,0}$ initially set equal to 1.

Figure 3.22   Symmlets and their associated scaling functions. (a) S6. (b) S10. (The scaling functions are shown dotted.)

Figure 3.23   Coiflets and their associated scaling functions. (a) C6. (b) C12. (c) C18. (Scaling functions are shown dotted in figures.)

#### 3.6  Translation invariance

Translation invariance (or shift invariance) is an important property associated with some wavelet transforms but not others. It simply means that if you shift the signal along, all your transform coefficients simply move along by the same amount. However, for the dyadic grid structure of the discrete wavelet transform this is clearly not the case: only if you shift the signal along by the grid spacing at a given scale do the coefficients become translation invariant at that scale and below. Even for the discretization of the continuous wavelet transform, the transform values are translation invariant only if the signal is shifted by an integer multiple of the discrete time steps. This is illustrated in figure 3.24 where a simple sinusoidal signal is decomposed using the discrete orthonormal Haar wavelet and a continuous Mexican hat wavelet discretized at each time step. The original signal shown on the left-hand side of figure 3.24(a) is composed of 10 cycles of a sinusoid made up of 1024 discrete data points. The middle plot in figure 3.24(a) contains the Haar transform coefficients of the signal plotted in their dyadic grid formation. The right-hand plot contains the continuous wavelet transform using the Mexican hat wavelet. The coarse structure of the dyadic grid is evident when compared with the smooth, high-resolution Mexican hat transform plot.

We will assume the signal continues indefinitely and shift the signal window along to see what happens to the transform plots. The variation in the coefficients of the Haar transform with displacement of the time window along the signal can be seen by looking down the central plots in the figure. Shifting the window forward (or back) along the signal by $2^m$ data points produces the same coefficients in the transform plot at and below that scale, shifted in time by $2^m$. This can be observed in figures 3.24(b)–(d) where the size of the shift is indicated in the transform plot of figure 3.24(a). The coefficient level at and below which the coefficients remain the same as the original is indicated by the arrows at the left-hand side of each transform plot. Shifting the signal by an arbitrary amount (not a power of two) leads to a completely different set of coefficients, as can be seen in figure 3.24(e), where the signal is shifted by an eighth of a sinusoidal cycle (12.8 data points). The translation invariance of the continuous transform plots is obvious by looking down the right-hand figures where, for any arbitrary shift in the signal, the wavelet transform values are simply translated with the shift.
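The dyadic-shift behaviour is easy to demonstrate at a single scale. In this sketch (our own, using level-1 Haar filters on a periodic signal), shifting the signal by two samples shifts the level-1 detail coefficients along by one place, while a one-sample shift produces entirely different coefficients:

```python
import numpy as np

def haar_level1(s):
    """Level-1 Haar approximation and detail coefficients."""
    return ((s[0::2] + s[1::2]) / np.sqrt(2),
            (s[0::2] - s[1::2]) / np.sqrt(2))

t = np.arange(1024)
signal = np.sin(2 * np.pi * 10 * t / 1024)      # 10 cycles, 1024 points

_, d = haar_level1(signal)
_, d_shift2 = haar_level1(np.roll(signal, -2))  # shift forward by 2 samples
_, d_shift1 = haar_level1(np.roll(signal, -1))  # shift forward by 1 sample
```

`np.roll` implements the assumption that the signal continues periodically; a shift by the grid spacing (2 at this scale) merely relabels the coefficients, while an odd shift re-pairs the samples entirely.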

Figure 3.24   Translation invariance. (a) Original signal, 1024 data points. (b) Shift by 64 data points. (c) Shift by 128 data points. (d) Shift by 256 data points. (e) Shift by 1/8 cycle.

We can make a translation invariant version of the discrete wavelet transform. Known by a variety of names, including the redundant, stationary, translation invariant, maximal overlap and non-decimated wavelet transform, it is obtained by simply skipping the subsampling part of the filtering process described in section 3.5.1. This results in the same number of wavelet coefficients being generated at each step, equal to the number of signal components N. The decomposition is the same as that shown in figure 3.16 except that every value is retained as the filter moves one step at a time along the signal. In addition, the filters have to be stretched through the addition of zeros between their coefficients, hence this algorithm is known as the ‘à trous’ algorithm, from the French for ‘with holes’. An average basis inverse can be performed which gives the average of all possible discrete wavelet transform reconstructions over all possible choices of time origin in the signal. This is sometimes useful for statistical applications, including denoising. In addition to being translation invariant, the redundant discrete wavelet transform is extendable to signals of arbitrary length. We do not consider the redundant discrete wavelet transform herein; rather, the reader is referred to the sources given at the end of this chapter.
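A single level of the non-decimated transform can be sketched as follows (a sketch of ours with Haar filters and periodic wraparound): no subsampling is performed, and at level j the filter is stretched by inserting $2^j - 1$ zeros between its taps.

```python
import numpy as np

def atrous_level(s, h, j):
    """Filter s with h upsampled 'with holes': tap i acts at offset
    i * 2**j. Every output location is kept, so len(out) == len(s)."""
    out = np.zeros(len(s))
    for i, hi in enumerate(h):
        out += hi * np.roll(s, -(i * 2 ** j))   # periodic (wraparound)
    return out

lowpass = np.array([1.0, 1.0]) / np.sqrt(2)     # Haar (1/sqrt(2)) c_k
highpass = np.array([1.0, -1.0]) / np.sqrt(2)   # Haar (1/sqrt(2)) b_k

x = np.random.default_rng(1).normal(size=64)
a0 = atrous_level(x, lowpass, 0)    # level-1 approximations, undecimated
d0 = atrous_level(x, highpass, 0)   # level-1 details, undecimated
a1 = atrous_level(a0, lowpass, 1)   # next level uses the stretched filter
```

The decimated transform coefficients are embedded in the undecimated ones: taking every second value of `a0` recovers the ordinary level-1 Haar approximation coefficients.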

#### 3.7  Biorthogonal wavelets

For certain applications real, symmetric wavelets are required. One way to obtain symmetric wavelets is to construct two sets of biorthogonal wavelets: $\psi_{m,n}$ and its dual, $\tilde{\psi}_{m,n}$. One set is used to decompose the signal and the other to reconstruct it. For example, we can decompose the signal using $\psi_{m,n}$ as follows:
3.84 $T_{m,n} = \int_{-\infty}^{\infty} x(t)\, \psi_{m,n}(t)\, \mathrm{d}t$
and perform the inverse transform using $\tilde{\psi}_{m,n}$:
3.85 $x(t) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} T_{m,n}\, \tilde{\psi}_{m,n}(t)$
Alternatively, we can decompose the signal using
3.86 $\tilde{T}_{m,n} = \int_{-\infty}^{\infty} x(t)\, \tilde{\psi}_{m,n}(t)\, \mathrm{d}t$
and reconstruct using
3.87 $x(t) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} \tilde{T}_{m,n}\, \psi_{m,n}(t)$
Biorthogonal wavelets satisfy the biorthogonality condition:
3.88 $\int_{-\infty}^{\infty} \psi_{m,n}(t)\, \tilde{\psi}_{m',n'}(t)\, \mathrm{d}t = \delta_{m,m'}\, \delta_{n,n'}$
Using biorthogonal wavelets allows us to have perfectly symmetric and antisymmetric wavelets. Further, they allow certain desirable properties to be incorporated separately within the decomposition wavelet and the reconstruction wavelet. For example, $\psi_{m,n}$ and $\tilde{\psi}_{m,n}$ can have different numbers of vanishing moments. If $\psi_{m,n}$ has more vanishing moments than $\tilde{\psi}_{m,n}$, then decomposition using $\psi_{m,n}$ suppresses higher-order polynomials and aids data compression. Reconstruction with the wavelets $\tilde{\psi}_{m,n}$, which have fewer vanishing moments, then leads to a smoother reconstruction. This can sometimes be a useful property, for example in image processing. Figure 3.25 shows three examples of compactly supported biorthogonal spline wavelets and their duals commonly used in practice, together with their associated scaling functions.

Figure 3.25   Biorthogonal spline wavelets. (a) Biorthogonal (1,5) spline wavelets: $\phi(t)$, $\psi(t)$ (left) and $\tilde{\phi}(t)$, $\tilde{\psi}(t)$ (right). (b) Biorthogonal (2,4) spline wavelets: $\phi(t)$, $\psi(t)$ (left) and $\tilde{\phi}(t)$, $\tilde{\psi}(t)$ (right). (c) Biorthogonal (3,7) spline wavelets: $\phi(t)$, $\psi(t)$ (left) and $\tilde{\phi}(t)$, $\tilde{\psi}(t)$ (right). (Scaling functions shown dotted.)

#### 3.8  Two-dimensional wavelet transforms

In many applications the data set is in the form of a two-dimensional array, for example the heights of a machined surface or natural topography, or perhaps the intensities of an array of pixels making up an image. We may want to perform a wavelet decomposition of such arrays either to compress the data or to carry out a wavelet-based parametric characterization of the data. To perform a discrete wavelet decomposition of a two-dimensional array we must use two-dimensional discrete wavelet transforms. We can simply generate these from tensor products of their one-dimensional orthonormal counterparts. The most common (and simplest) arrangement is to use the same scaling in the horizontal and vertical directions. These give square transforms. (Other forms are possible, e.g. rectangular transforms where the horizontal and vertical scaling vary independently and also two-dimensional transforms which are not simply tensor products. These are outside the scope of this text.) The two-dimensional Haar scaling and wavelet functions are: two-dimensional scaling function

3.89a $\phi(t_1, t_2) = \phi(t_1)\, \phi(t_2)$
two-dimensional horizontal wavelet
3.89b $\psi^h(t_1, t_2) = \phi(t_1)\, \psi(t_2)$
two-dimensional vertical wavelet
3.89c $\psi^v(t_1, t_2) = \psi(t_1)\, \phi(t_2)$
two-dimensional diagonal wavelet
3.89d $\psi^d(t_1, t_2) = \psi(t_1)\, \psi(t_2)$
Remember that in both the last chapter and this one we have been using t as our independent variable, whether temporal or (less commonly) spatial. In this section $t_1$ and $t_2$ represent spatial coordinates.

The multiresolution decomposition of the two-dimensional coefficient matrices can be expressed as

3.90a $S_{m+1,(n_1,n_2)} = \frac{1}{2} \sum_{k_1} \sum_{k_2} c_{k_1} c_{k_2}\, S_{m,(2n_1+k_1,\, 2n_2+k_2)}$
3.90b $T^h_{m+1,(n_1,n_2)} = \frac{1}{2} \sum_{k_1} \sum_{k_2} b_{k_1} c_{k_2}\, S_{m,(2n_1+k_1,\, 2n_2+k_2)}$
3.90c $T^v_{m+1,(n_1,n_2)} = \frac{1}{2} \sum_{k_1} \sum_{k_2} c_{k_1} b_{k_2}\, S_{m,(2n_1+k_1,\, 2n_2+k_2)}$
3.90d $T^d_{m+1,(n_1,n_2)} = \frac{1}{2} \sum_{k_1} \sum_{k_2} b_{k_1} b_{k_2}\, S_{m,(2n_1+k_1,\, 2n_2+k_2)}$
where $k_1$, $k_2$ are the scaling coefficient indices and $n_1$, $n_2$ are the location indices at scale m + 1. (Compare with the one-dimensional case of equations (3.36) and (3.37).) We can simply use discrete versions of the two-dimensional wavelets at scale index 1 to perform

Figure 3.26   The two-dimensional discrete Haar wavelet at scale index 1. (a) Scaling function. (b) Horizontal wavelet. (c) Vertical wavelet. (d) Diagonal wavelet.

the multiresolution analysis of the array. The discrete two-dimensional Haar scaling and wavelet functions in matrix form at scale index m = 1 are
scaling function: $\frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$&emsp;vertical wavelet: $\frac{1}{2}\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}$&emsp;horizontal wavelet: $\frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}$&emsp;diagonal wavelet: $\frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$
As these are at scale index 1, they are simply tensor products of the scaling and wavelet filter coefficients $c_k/\sqrt{2}$ and $b_k/\sqrt{2}$. These discrete functions are shown schematically in figure 3.26. Note that the $\frac{1}{2}$ factor before each matrix is simply the square of the $1/\sqrt{2}$ factor preceding the corresponding one-dimensional functions. For discrete scaling and wavelet functions at larger scales, this normalization factor becomes $1/2^m$. These four 2×2 matrices are required for the Haar wavelet decomposition of the two-dimensional array. The general idea of the two-dimensional wavelet decomposition is shown schematically in figure 3.27. The original input array is represented by $X_0$, defined at scale index m = 0. As with the one-dimensional case, its components are input as the approximation coefficients at scale index 0, i.e. the matrix $S_0$. After the first wavelet decomposition, a decomposition array is formed at scale index 1 which is split into four distinct submatrices: the vertical detailed

Figure 3.27   Schematic diagram of the matrix manipulation required to perform wavelet decomposition on a two-dimensional grid.

components $T_1^v$, the horizontal detailed components $T_1^h$, the diagonal detailed components $T_1^d$ and the remaining approximation components $S_1$. The decomposition array is the same size as the original array. As with the discrete wavelet decomposition of one-dimensional signals, the detail coefficients are subsequently left untouched and the next iteration further decomposes only the approximation components in the submatrix $S_1$. This results in detail coefficients contained within submatrices $T_2^v$, $T_2^h$ and $T_2^d$ at scale index 2 and approximation coefficients within submatrix $S_2$. This procedure may be repeated M times for a $2^M \times 2^M$ array to give a number of coefficient submatrices $T_m^v$, $T_m^h$ and $T_m^d$ of size $2^{M-m} \times 2^{M-m}$, where m = 1, 2, …, M.
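One level of this two-dimensional Haar decomposition can be sketched directly from the four 2×2 matrices (our own sketch; the `np.block` call arranges the four submatrices into the quadrant layout of figure 3.27):

```python
import numpy as np

def haar2d_level(S):
    """One 2-D Haar decomposition step: apply the four 2x2 matrices
    (normalization 1/2 included) to each non-overlapping 2x2 block and
    arrange the results as [[approx, vertical], [horizontal, diagonal]]."""
    a = S[0::2, 0::2]   # top-left element of each 2x2 block
    b = S[0::2, 1::2]   # top-right
    c = S[1::2, 0::2]   # bottom-left
    d = S[1::2, 1::2]   # bottom-right
    approx = (a + b + c + d) / 2
    vert = (a - b + c - d) / 2
    horiz = (a + b - c - d) / 2
    diag = (a - b - c + d) / 2
    return np.block([[approx, vert], [horiz, diag]])

X = np.random.default_rng(2).normal(size=(8, 8))
W = haar2d_level(X)
```

Repeated application to the shrinking approximation quadrant gives the full decomposition; because the four matrices form an orthonormal set, the energy (sum of squared elements) is conserved at every step.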

Figure 3.28   A two-dimensional signal containing a single spike.

Let us look at a very simple example, a matrix with a single nonzero component

3.91 $X_0 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$
This array could represent a single pixel turned on in a display or perhaps a spike protruding from a flat surface. It is visualized in figure 3.28. The two-dimensional Haar multiresolution decomposition of this array is carried out by scanning over it with each of the four discrete wavelet matrices in turn. This is shown schematically in figure 3.29. On the first iteration (from scale m = 0 to 1), the scaling function 2×2 matrix is scanned over the input array. We require four of these matrices to cover this 4×4 array. The components of the original array are multiplied in turn by the scaling function array and the resulting matrix product $S_1$ is placed in the top left-hand quadrant of the first iteration matrix. Similarly, the vertical wavelet 2×2 matrix is scanned over the array and each resulting value is placed in the top right-hand quadrant of the first iteration array. This is illustrated in the $T_1^v$ matrix marked on the right-hand side of the figure. The procedure is repeated for the horizontal and diagonal wavelet components, placing the products in the bottom left-hand and bottom right-hand quadrants respectively. The normalization factor of 1/2 has been left out of the matrix manipulation until the end to aid clarity. Hence, after the first iteration, the resulting wavelet transform matrix is
3.92 $W^{(1)} = \begin{bmatrix} 0.5 & 0 & -0.5 & 0 \\ 0 & 0 & 0 & 0 \\ -0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$
The next iteration uses only the approximation components in the top left-hand quadrant (shown within the dotted box). The four component values of this array

Figure 3.29   Schematic diagram of the matrix manipulation required to perform the Haar wavelet decomposition of the spike signal. Components of the discrete scaling and wavelet matrices are shown in bold, components of the original discrete signal are shown shaded.

are then interrogated by each of the scaling and wavelet matrices in turn. This produces the second iteration matrix
3.93 $W^{(2)} = \begin{bmatrix} 0.25 & 0.25 & -0.5 & 0 \\ 0.25 & 0.25 & 0 & 0 \\ -0.5 & 0 & 0.5 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$
This is the second and last iteration we can perform in the decomposition of the 4×4 (i.e. M = 2) array. This matrix is made up of $T_1^v$, $T_1^h$, $T_1^d$, $T_2^v$, $T_2^h$, $T_2^d$ and, in the top left-hand corner, the approximation component $S_2$, which is related to the mean of the original array. Notice that, at each iteration, the resulting matrix has the same energy as the original matrix, where the energy is the sum of the squared components of the matrix, i.e.

Figure 3.30   An 8 × 8 ‘step’ array together with its associated Haar decompositions. (a) Original step. (b) Iteration 1. (c) Iteration 2. (d) Iteration 3. Note that black dots have been added to the nonzero coefficients in plots (c) and (d) to aid clarity.

3.94 $E = \sum_{i=0}^{2^M-1} \sum_{j=0}^{2^M-1} (X_{0,i,j})^2 = \sum_{i=0}^{2^M-1} \sum_{j=0}^{2^M-1} \big(W^{(m)}_{i,j}\big)^2$
where $X_{0,i,j}$ and $W^{(m)}_{i,j}$ are, respectively, the elements of the input and wavelet decomposition matrices located on row i and column j. E is equal to 1 for the array considered above. We can easily verify that E is equal to 1 for the intermediate decomposition matrix as well as the full decomposition matrix. You can confirm this conservation of energy property in all the examples which follow.
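We can confirm the energy conservation for the spike example numerically (a sketch of ours, assuming a one-level Haar decomposition function of the kind sketched earlier; all names are our own):

```python
import numpy as np

def haar2d_level(S):
    """One 2-D Haar step (normalization 1/2 included in each matrix)."""
    a, b, c, d = S[0::2, 0::2], S[0::2, 1::2], S[1::2, 0::2], S[1::2, 1::2]
    return np.block([[(a + b + c + d) / 2, (a - b + c - d) / 2],
                     [(a + b - c - d) / 2, (a - b - c + d) / 2]])

X0 = np.zeros((4, 4))
X0[1, 1] = 1.0                          # the spike of equation (3.91)

W1 = haar2d_level(X0)                   # first iteration
W2 = W1.copy()
W2[:2, :2] = haar2d_level(W1[:2, :2])   # decompose only the S_1 quadrant

energies = [np.sum(M**2) for M in (X0, W1, W2)]
```

The energy E is 1 at every stage, and the top left-hand quadrant of `W2` contains the four 0.25 values of equation (3.93).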

Let us look at another simple example, the step shown in figure 3.30(a) and given in matrix form as

3.95 $X_0 = \begin{bmatrix} 2&2&2&2&2&2&2&2 \\ 2&2&2&2&2&2&2&2 \\ 2&2&2&2&2&2&2&2 \\ 2&2&2&2&2&2&2&2 \\ 2&2&2&2&2&2&2&2 \\ 1&1&1&1&1&1&1&1 \\ 1&1&1&1&1&1&1&1 \\ 1&1&1&1&1&1&1&1 \end{bmatrix}$
This 8×8 array can be decomposed through three iterations using the Haar wavelet as follows:

first decomposition W (1)

3.96a $\begin{bmatrix} 4&4&4&4&0&0&0&0 \\ 4&4&4&4&0&0&0&0 \\ 3&3&3&3&0&0&0&0 \\ 2&2&2&2&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 1&1&1&1&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \end{bmatrix}$
second decomposition W (2)
3.96b $\begin{bmatrix} 8&8&0&0&0&0&0&0 \\ 5&5&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 1&1&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 1&1&1&1&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \end{bmatrix}$
third (full) decomposition W (3)
3.96c $\begin{bmatrix} 13&0&0&0&0&0&0&0 \\ 3&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 1&1&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 1&1&1&1&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \end{bmatrix}$
Note that, after the three iterations, only the approximation component and the components in regions corresponding to the horizontal wavelet decomposition (labelled $T_1^h$, $T_2^h$ and $T_3^h$) contain nonzero elements. This is because the original array contains only a single horizontal feature—the horizontal discontinuity or step. Another thing to notice is that the 64 nonzero elements of the original array have been transformed into only eight nonzero elements, i.e. a compression of the data has taken place. The original array, together with the three levels of decomposition, is shown as histograms in figure 3.30.
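The full decomposition of the step is easily checked numerically (again a sketch of ours using a one-level Haar decomposition function; the compression shows up as the drop from 64 to eight nonzero elements):

```python
import numpy as np

def haar2d_level(S):
    """One 2-D Haar step (normalization 1/2 included)."""
    a, b, c, d = S[0::2, 0::2], S[0::2, 1::2], S[1::2, 0::2], S[1::2, 1::2]
    return np.block([[(a + b + c + d) / 2, (a - b + c - d) / 2],
                     [(a + b - c - d) / 2, (a - b - c + d) / 2]])

# The 8x8 step of equation (3.95): five rows of 2 over three rows of 1.
X0 = np.vstack([2 * np.ones((5, 8)), np.ones((3, 8))])

W = X0.copy()
for m in range(3):                    # three iterations for an 8x8 array
    n = 8 // 2 ** m
    W[:n, :n] = haar2d_level(W[:n, :n])
```

After the loop, `W[0, 0]` holds the approximation coefficient 13, only eight elements are nonzero, and the energy of `W` equals that of `X0`.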

We can now use the detail coefficients to reconstruct the array details at different scales. This is shown in figure 3.31 and is performed using the detail coefficients at each scale of interest. In the same way as we did for the one-dimensional signals earlier in section 3.4, discrete detail arrays $D_m^h$, $D_m^d$ and $D_m^v$ can be reconstructed at the scale of the input array using each of the sets of detail coefficients, $T_m^h$, $T_m^d$ and $T_m^v$. A combined detail at scale index m is then given by
3.97 $D_m = D_m^h + D_m^d + D_m^v$
As with the one-dimensional case, the array can be represented as a sum of the discrete details plus an array mean.
3.98 $X_0 = X_M + \sum_{m=1}^{M} D_m$
where $X_M$ is the smooth version of the input array at the largest scale index, M, expressed at the scale of the input array, m = 0. For example, to reconstruct the detail array components at the smallest scale (m = 1), each component of the three coefficient submatrices, $T_1^h$, $T_1^d$ and $T_1^v$, is used in conjunction with the corresponding Haar wavelet (horizontal, diagonal or vertical) to produce the array details at this level: $D_1^h$, $D_1^d$ and $D_1^v$. The m = 1 submatrices are shown in figure 3.31(a). These three 4×4 coefficient submatrices are each expanded into an 8×8 array corresponding to the horizontal, diagonal and vertical details of the original array using the corresponding discrete wavelets. This is shown schematically in figure 3.31(b) for the horizontal detail at scale index 1, i.e. $D_1^h$. The expansion of the transform coefficients through the discrete Haar wavelet into the detailed components is shown explicitly for two of the $T_1^h$ components in the figure. We can see that the detail matrices $D_1^d$ and $D_1^v$ will have all elements equal to zero as both $T_1^d$ and $T_1^v$ contain only zero elements. Hence the combined detail component matrix is

Figure 3.31   Schematic diagram of the matrix manipulation required to derive the detail matrices at scale index m = 1. (a) The three detail coefficient submatrices at scale index 1 for the step. (b) Computing the horizontal discrete detail from the detail coefficients at scale index 1.

3.99 $D_1 = D_1^h + D_1^d + D_1^v = \begin{bmatrix} 0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \\ 0.5&0.5&0.5&0.5&0.5&0.5&0.5&0.5 \\ -0.5&-0.5&-0.5&-0.5&-0.5&-0.5&-0.5&-0.5 \\ 0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0 \end{bmatrix}$
Similarly, we can obtain the detail matrices at scale index 2, $D_2^h$, $D_2^d$ and $D_2^v$, using the scale 2 coefficients, respectively $T_2^h$, $T_2^d$ and $T_2^v$, and the discrete wavelet at scale index 2. This is shown for $D_2^h$ in figure 3.32. Note that the normalization factor at scale 2 is $1/2^2$. The scale 3 coefficient submatrices, $T_3^h$, $T_3^d$ and $T_3^v$, are simply [3], [0] and [0] respectively. It is easily seen that this gives the detail at the largest wavelet

Figure 3.32   Schematic diagram of the matrix manipulation required to compute the horizontal detail at scale index m = 2.

scale equal to
3.100 $D_3 = \begin{bmatrix} 0.375&0.375&0.375&0.375&0.375&0.375&0.375&0.375 \\ 0.375&0.375&0.375&0.375&0.375&0.375&0.375&0.375 \\ 0.375&0.375&0.375&0.375&0.375&0.375&0.375&0.375 \\ 0.375&0.375&0.375&0.375&0.375&0.375&0.375&0.375 \\ -0.375&-0.375&-0.375&-0.375&-0.375&-0.375&-0.375&-0.375 \\ -0.375&-0.375&-0.375&-0.375&-0.375&-0.375&-0.375&-0.375 \\ -0.375&-0.375&-0.375&-0.375&-0.375&-0.375&-0.375&-0.375 \\ -0.375&-0.375&-0.375&-0.375&-0.375&-0.375&-0.375&-0.375 \end{bmatrix}$
where each component is either $3/2^3$ or $-3/2^3$. In fact, this matrix is simply a single large horizontal discrete Haar wavelet scaled by $3/2^3$. The approximation coefficient (= 13) leads to a matrix of values all equal to the mean of the original array (= $13/2^3 = 1.625$), i.e.
3.101 $X_3 = \begin{bmatrix} 1.625&1.625&1.625&1.625&1.625&1.625&1.625&1.625 \\ 1.625&1.625&1.625&1.625&1.625&1.625&1.625&1.625 \\ 1.625&1.625&1.625&1.625&1.625&1.625&1.625&1.625 \\ 1.625&1.625&1.625&1.625&1.625&1.625&1.625&1.625 \\ 1.625&1.625&1.625&1.625&1.625&1.625&1.625&1.625 \\ 1.625&1.625&1.625&1.625&1.625&1.625&1.625&1.625 \\ 1.625&1.625&1.625&1.625&1.625&1.625&1.625&1.625 \\ 1.625&1.625&1.625&1.625&1.625&1.625&1.625&1.625 \end{bmatrix}$
Figure 3.33 shows the reconstructions at each scale together with the mean component. From the figure we can see that if we add them all together, D 1 + D 2 + D 3 + X 3, we get back the original input array X 0.
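For the Haar wavelet, the smooth array $X_m$ expressed at the input scale is simply the average over $2^m \times 2^m$ blocks, so the details $D_m$ can be formed as differences of successive block averages. The sketch below (our own construction, not the book's matrix-scanning algorithm) checks equation (3.98) and the values quoted for $D_1$, $D_3$ and $X_3$ on the step:

```python
import numpy as np

def block_average(X, m):
    """Haar smooth array X_m expressed at the input scale: the average
    over non-overlapping 2**m x 2**m blocks, expanded back up."""
    n = 2 ** m
    N = X.shape[0] // n
    A = X.reshape(N, n, N, n).mean(axis=(1, 3))
    return np.kron(A, np.ones((n, n)))

X0 = np.vstack([2 * np.ones((5, 8)), np.ones((3, 8))])  # the step array

A = [block_average(X0, m) for m in range(4)]   # A[0] is X0 itself
D = [A[m - 1] - A[m] for m in range(1, 4)]     # D_1, D_2, D_3
```

Summing the three details and the smooth array `A[3]` (all elements 1.625) recovers `X0`, as in figure 3.33(f).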

Figure 3.33   The details of the step array. (a) Original step array X 0. (b) D 1. (c) D 2. (d) D 3. (e) X 3. (f) Sum of (b), (c), (d) and (e) giving original signal (a).

Figure 3.34   An image and its first two decomposition matrices. (a) Original array $S_0$. (b) Transform array $W^{(1)}$ containing submatrices $S_1$, $T_1^v$, $T_1^h$ and $T_1^d$. (c) Transform array $W^{(2)}$ containing submatrices $S_2$, $T_1^v$, $T_1^h$, $T_1^d$, $T_2^v$, $T_2^h$ and $T_2^d$.

Figure 3.35   Haar decomposition of a surface data set. (a) Original data set 128 × 128 array of surface heights taken from a river bed. (b) Scale index m = 1 discrete detail D 1. (c) Scale index m = 2 discrete detail D 2. (d) Scale index m = 3 discrete detail D 3. (e) Scale index m = 4 discrete detail D 4. (f) Scale index m = 5 discrete detail D 5. (g) Scale index m = 6 discrete detail D 6. (h) Scale index m = 7 discrete detail D 7. (i) Sum of first three discrete details D 1 + D 2 + D 3 (= (b) + (c) + (d)). (j) Scale index m = 3 discrete approximation X 3 = X 0 − (D 1 + D 2 + D3 ) (= (a) − (b) − (c) − (d)). (Greyscale used in all images: maximum = white, minimum = black.)

Figures 3.34 and 3.35 show examples of much larger data sets, both using the Haar wavelet transform. Figure 3.34 shows an image (‘Lena’) together with its first two decomposition matrices, where the approximation and detail coefficient submatrices can be clearly seen. Figure 3.35 shows an example of the Haar decomposition of a more irregular array. The 128 × 128 array shown in figure 3.35(a) contains the heights of a measured rough surface. The details of this array, at scale indices m = 1–7, are shown in figures 3.35(b)–(h). Figure 3.35(i) contains the summation of the first three details and figure 3.35(j) shows the resulting approximation at scale index m = 3 when these details are subtracted from the original array. The blocky nature of the Haar decomposition is noticeable from the plots. The two-dimensional Haar wavelet is very simple in its form and, as with its one-dimensional counterpart, there are more complex forms of Daubechies wavelets and, of course, many other wavelet families to choose from. Some examples of these wavelets are given in figure 3.36. Using these wavelets results in overlap of the wavelet at the array edge and therefore requires the use of wraparound or other methods to deal with the data edges.

Figure 3.36   Examples of two-dimensional orthonormal wavelets: Daubechies, Symmlets and Coiflets. (a) D2 Haar wavelet. (b) D4. (c) D8. (d) D12. (e) D12, another perspective. (f) D20. (g) Symmlet S8. (h) Coiflet C6. (Wavelets shown viewed at various angles and elevations.)

#### 3.9  Adaptive transforms: wavelet packets

As we saw in chapter 2 (figure 2.29), the resolution of the wavelet transform is not uniform in the time–frequency plane. The Heisenberg boxes expand in frequency and contract in time as fluctuations at smaller and smaller scales are explored. The short-time Fourier transform (STFT), on the other hand, covers the time–frequency plane with tiles of constant aspect ratio (figure 2.30). We also looked briefly at matching pursuits, which offer another way of extracting time–frequency information. In this section we will consider another method which can adapt to the signal and hence allows for more flexibility in the partitioning of the time–frequency plane: the wavelet packet transform.

Wavelet packet (WP) transforms are a generalization of the discrete wavelet transform. Wavelet packets involve particular linear combinations of wavelets, and the wavelet packet decomposition of a signal is performed in a manner similar to the multiresolution algorithm given earlier for the discrete wavelet transform. The difference is that, in the WP signal decomposition, both the approximation and detail coefficients are further decomposed at each level. This leads to the decomposition tree structure depicted at the top of figure 3.37. Compare this with the schematic of the wavelet decomposition given in figure 3.6. At each stage in the wavelet algorithm, the detail coefficients are simply transferred down, unchanged, to the next level. In the wavelet packet algorithm, however, all the coefficients at each stage are further decomposed. In this way, we end up with an array of wavelet packet coefficients with M levels, each containing N coefficients. A total of N coefficients from this M × N array can then be selected to represent the signal. The standard wavelet transform decomposition coefficients are contained within the WP array, shown by the bold boxes in figure 3.37. A new nomenclature is employed in the figure to indicate the operations that have been performed on each set of coefficients: S produces the approximation components of the previous set of coefficients by lowpass filtering, and T produces the detail components through highpass filtering. We simply add the letter S or T to the left-hand end of the coefficient name to indicate the most recent filtering operation. $SSTS_n$, for example, corresponds to the original signal lowpass filtered, then highpass filtered, then passed twice through the lowpass filter. Notice also that the subscript contains only the location index n; the scale index m is omitted as it is simply equal to the number of letters S and T in the coefficient name.
As with the original wavelet transform, the number of coefficients in each coefficient group depends upon the scale: there is one coefficient per group at the largest scale, m = M, and N/2 coefficients per group at the smallest scale, m = 1. Hence, the location index spans $n = 0, 1, \ldots, 2^{M-m} - 1$.
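The full decomposition tree described above can be sketched in a few lines of code. The sketch below is our own illustration (not an implementation from the text), assuming NumPy and using the Haar filter pair for the S (lowpass) and T (highpass) operations:

```python
import numpy as np

def haar_split(c):
    """One filter-pair stage: S (lowpass/approximation) and T (highpass/detail),
    each followed by downsampling by two."""
    s = (c[0::2] + c[1::2]) / np.sqrt(2.0)
    t = (c[0::2] - c[1::2]) / np.sqrt(2.0)
    return s, t

def wp_decompose(x, levels):
    """Full wavelet packet table: level m holds 2**m coefficient groups.
    Unlike the DWT, the detail coefficients are split again at every level,
    so every level carries N coefficients in total."""
    table = [[np.asarray(x, dtype=float)]]
    for _ in range(levels):
        row = []
        for c in table[-1]:
            row.extend(haar_split(c))   # split both S and T branches
        table.append(row)
    return table

x = np.arange(8, dtype=float)
table = wp_decompose(x, 3)   # M = 3 levels for an N = 8 signal
```

Each row of `table` below the first holds N coefficients partitioned into 2**m groups; following the leftmost (all-S) branch down the table reproduces the standard DWT approximation coefficients.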

Figure 3.37   Schematic diagram of wavelet packet decomposition.

At each stage in the decomposition, the wavelet packet algorithm partitions the time-frequency plane into rectangles of constant aspect ratio. These become wider (in time) and narrower (in frequency) as the decomposition proceeds. This is shown schematically at the bottom of figure 3.37 for each scale. A variety of tilings of the time-frequency plane is possible using the wavelet packet coefficients. For example, we could keep all the coefficients at one level and discard those at all other levels. This would tile the plane with boxes of constant shape, like one of the tilings shown at the bottom of figure 3.37. Other tilings are possible, some examples of which are shown in figure 3.38. The standard wavelet transform corresponds to just one of the possible tiling patterns. Figure 3.39 contains the wavelet packet coefficient selections corresponding to the tiling patterns of figure 3.38.

Figure 3.38   Schematic diagrams of allowable wavelet packet tiling of the time-frequency plane. The right-hand tiling is that used in the wavelet transform algorithm and corresponds to the components contained in the bold boxes shown in the previous figure.

Figure 3.39   The choice of wavelet packet components leading to the tilings shown in figure 3.38.

The optimal or ‘best’ coefficient selection (and hence tiling arrangement) is chosen to represent the signal based on some predefined criterion. This criterion is normally based on an information cost function which aims to concentrate as much information as possible within as few coefficients as possible. The most common measure of information used is the Shannon entropy measure. This is defined for a discrete distribution p_i, i = 0, 1, …, N − 1, as

3.102 $S(p) = -\sum_i p_i \log(p_i)$
where, for this case, the $p_i$ are the normalized energies (i.e. squared magnitudes) of the wavelet packet coefficients under consideration. Low entropies occur when the larger coefficient energies are concentrated at only a few discrete locations. The minimum possible entropy of zero occurs when $p_i = 1$ for only one value of i, the other probabilities being zero; in this case all the information needed to represent the signal is condensed within a single coefficient. The maximum entropy occurs when there is an equal distribution of coefficient energies: $p_i = 1/N$ and the signal information is evenly spread throughout all the coefficients. We can see that the $p_i$ act as a discrete probability distribution of the energies. (More information on the Shannon entropy measure, together with an illustrative figure, is given in chapter 4, section 4.2.4.)
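Equation (3.102) is straightforward to implement. A minimal sketch, assuming NumPy, with the $0 \log 0$ terms taken as zero by convention:

```python
import numpy as np

def shannon_entropy(coeffs):
    """Shannon entropy (equation (3.102)) of the normalized coefficient
    energies p_i = |c_i|^2 / sum |c|^2, with 0*log(0) taken as zero."""
    e = np.abs(np.asarray(coeffs, dtype=float)) ** 2
    p = e / e.sum()
    p = p[p > 0]                 # drop zero-probability terms
    return float(-np.sum(p * np.log(p)))

# all energy in one coefficient: the minimum entropy of zero
e_min = shannon_entropy([1.0, 0.0, 0.0, 0.0])
# energy spread evenly over N coefficients: the maximum entropy, log(N)
e_max = shannon_entropy([1.0, 1.0, 1.0, 1.0])
```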

The set of N wavelet packet coefficients with the lowest entropy is selected to represent the signal. That is, we want the signal information to be concentrated within as few coefficients as possible. To find these coefficients, the WP array, such as the one we saw in figure 3.37, is inspected from the bottom upwards. At each scale, each pair of partitioned coefficient sets (the ‘children’) is compared with the set from which it was derived (the ‘parent’). If the combined children’s coefficients have a smaller entropy than those of their parent, then they are retained; if not, the parent’s coefficients are retained. When the children are selected, their entropy value is assigned to their parent so that subsequent entropy comparisons can be made further up the tree. This is shown schematically in figure 3.40. Once the whole WP array has been inspected in this way, we obtain an optimal tiling of the time-frequency plane (with respect to the localization of coefficient energies). This tiling provides the best basis for the signal decomposition.
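The bottom-up parent–child comparison can be sketched recursively. This is a simplified illustration of the idea, assuming NumPy and Haar filters: for clarity it normalizes energies within each group, whereas practical best-basis implementations often use a strictly additive cost measure. The names `haar_split`, `entropy` and `best_basis` are our own:

```python
import numpy as np

def haar_split(c):
    """S (lowpass) and T (highpass) Haar filters, downsampled by two."""
    return ((c[0::2] + c[1::2]) / np.sqrt(2.0),
            (c[0::2] - c[1::2]) / np.sqrt(2.0))

def entropy(c):
    """Shannon entropy of the normalized coefficient energies."""
    e = c ** 2
    if e.sum() == 0:
        return 0.0
    p = e / e.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def best_basis(c, depth):
    """Keep the children's coefficient groups only if their combined entropy
    beats the parent's; otherwise keep the parent whole. Returns the selected
    groups and the entropy value passed up the tree."""
    if depth == 0 or len(c) < 2:
        return [c], entropy(c)
    s, t = haar_split(c)
    kept_s, e_s = best_basis(s, depth - 1)
    kept_t, e_t = best_basis(t, depth - 1)
    if e_s + e_t < entropy(c):
        return kept_s + kept_t, e_s + e_t   # children win
    return [c], entropy(c)                   # parent wins

x = np.zeros(8)
x[3] = 1.0                      # a lone spike: already maximally concentrated
groups, _ = best_basis(x, 3)    # the parent is kept whole at every level
```

Whatever selection wins, the retained groups always partition the signal's N coefficient slots exactly once.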

Figure 3.40   Wavelet packet coefficient selection.

Figure 3.41   Wavelet packet decomposition of a simple signal. (a) Signal (top) with wavelet packet decomposition (below). The coefficient values are plotted as heights. The scale indices, m = 1 to 6, are given down the left-hand side of the plot. Trace WP contains the best selection of wavelet packets and trace WT contains the wavelet transform decomposition for comparison. (b) The time-frequency tiling associated with each wavelet packet decomposition in (a). Larger coefficient values are plotted darker. (c) The time-frequency tiling associated with the best wavelet packet decomposition (left) and wavelet decomposition (right). (d) The 16 largest coefficients from (c): wavelet packet decomposition (left) and wavelet decomposition (right). (e) The reconstruction of the signal using only the 16 largest coefficients given in (d): wavelet packet (left) and wavelet (right).

Figure 3.41 illustrates the wavelet packet method on a simple discrete signal composed of a sampled sinusoid plus a spike. The signal is 64 data points in length. A Haar wavelet is used to decompose the signal. Figure 3.41(a) shows the wavelet packet coefficients below the original signal for each stage in the WP algorithm. The coefficients are displayed as histograms. The bottom two traces contain the coefficients corresponding to the best wavelet packet basis and the ‘traditional’ discrete wavelet basis respectively. The WP tiling of the coefficient energies in the time-frequency plane for each scale is given in figure 3.41(b). The larger coefficient energies are shaded darker in the plot. In figure 3.41(c) the optimal WP tiling is compared with the wavelet transform tiling of the time–frequency plane. The plots in figure 3.41(d) outline the 16 largest coefficients in both time–frequency planes of figure 3.41(c). These are used to reconstruct the signals shown in figure 3.41(e). The 16 largest wavelet packet coefficients contain 98.6% of the signal energy, whereas the 16 wavelet transform coefficients contain 96.5% of the signal energy. The wavelet packet reconstruction using the selected coefficients is visibly smoother than the reconstruction using the traditional wavelet transform coefficients. Figure 3.42 contains the same signal as figure 3.41. However, this time the WP decomposition is performed using a Daubechies D20 wavelet (refer back to figure 3.15). Using this wavelet results in a different tiling of the time-frequency plane for the WP method (compare the left-hand plots of figures 3.42(b) and 3.41(c)). Again, the 16 largest coefficients are used in the signal reconstruction. The oscillatory parts of both reconstructions shown in figure 3.42(d) are visibly smoother than their Haar counterparts in the previous figure (figure 3.41(e)). We expect this as the D20 is more smoothly oscillatory than the Haar. 
Note, however, comparing figures 3.42(c) and 3.41(d), we see that the signal spike leads to a single high-frequency tile for both Haar decompositions but to five and four high-frequency tiles, respectively, for the D20 wavelet. The more compact support of the Haar wavelet has allowed for a better localization of the signal spike, but it does make it less able than the D20 to represent the smooth oscillations in the signal. The energies of the reconstructed signals for the D20 decompositions using only the largest 16 coefficients are 99.8% (WP) and 99.7% (WT), indicating the data compression possibilities of the techniques.
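The energy-retention figures quoted above can be reproduced in spirit with a few lines of code. The sketch below, assuming NumPy, builds an illustrative 64-point sinusoid-plus-spike signal (the exact amplitudes used in figure 3.41 are not specified, so the percentage will differ from the text), takes a full Haar DWT and measures the fraction of signal energy carried by the 16 largest coefficients:

```python
import numpy as np

def haar_dwt(x):
    """Full Haar DWT of a dyadic-length signal. The transform is orthonormal,
    so signal energy equals coefficient energy (Parseval)."""
    coeffs, a = [], np.asarray(x, dtype=float)
    while len(a) > 1:
        coeffs.append((a[0::2] - a[1::2]) / np.sqrt(2.0))   # detail
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)              # approximation
    coeffs.append(a)
    return np.concatenate(coeffs[::-1])

# illustrative 64-point signal: sampled sinusoid plus a spike (amplitudes assumed)
n = np.arange(64)
x = np.sin(2 * np.pi * n / 16)
x[40] += 4.0

w = haar_dwt(x)
largest = np.sort(np.abs(w))[::-1][:16]
retained = np.sum(largest ** 2) / np.sum(w ** 2)   # fraction of signal energy
print(f"energy retained by the 16 largest coefficients: {100 * retained:.1f}%")
```

Because the transform is orthonormal, the fraction of coefficient energy retained equals the fraction of signal energy in the corresponding reconstruction.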

Figure 3.42   Wavelet packet decomposition of a simple signal using a Daubechies D20 wavelet. (a) The wavelet packet decomposition of the signal shown at the top of figure 3.41(a). (b) The time-frequency tiling associated with the best wavelet packet decomposition (left) and wavelet decomposition (right). (c) The 16 largest coefficients from (b): wavelet packet decomposition (left) and wavelet decomposition (right). (d) The reconstruction of the signal using only the 16 largest coefficients given in (c): wavelet packet (left) and wavelet (right).

#### 3.10.1  Chapter keywords and phrases

(You may find it helpful to jot down your understanding of each of them.)

discrete wavelet transform; wavelet/detail coefficients; wavelet frames; tight frame; orthonormal basis; dyadic grid; scaling function; approximation coefficients; inverse discrete wavelet transform; discrete approximation; continuous approximation; signal detail; multiresolution; scaling equation; scaling coefficients; compact support; fast wavelet transform; decomposition algorithm; reconstruction algorithm; wraparound; scale index; sequential indexing; level indexing; hard thresholding; soft thresholding; scale thresholding; Daubechies wavelets; subsampled; upsampled; Symmlets; Coiflets; translation invariance; redundant/stationary/translation-invariant/non-decimated/maximal-overlap discrete wavelet transform; biorthogonal wavelets; wavelet packet transforms; Shannon entropy measure

#### 3.10.2  Further resources

Papers describing the discrete wavelet transform at an introductory level include those by Kim and Aggarwal (2000), Depczynski et al (1997), Asamoah (1999) and Graps (1995). The paper by Williams and Amaratunga (1994) contains a good explanation of the derivation of the Daubechies D4 scaling coefficients and multiresolution analysis. They present a clear account of the wavelet filtering of a signal using matrices. Other useful introductory papers are those by Sarkar et al (1998), Meneveau (1991a), Jawerth and Sweldens (1994), Newland (1994a-c) and Strang (1989, 1993); the original paper on multiresolution analysis is by Mallat (1989). In his book, Newland (1993a) gives a more detailed account of the conditions that must be satisfied for discrete orthonormal wavelets. Diou et al (1999) and Toubin et al (1999) provide some useful information on projecting the details and approximations determined at one scale onto another scale. Chui (1997) provides a little more mathematical detail in a readable account of discrete wavelet transforms and their role in signal analysis. Strang and Nguyen's (1996) text concentrates on the connection between wavelets and filter banks used in digital signal processing. Daubechies' (1992) book provides a good grounding in all aspects of wavelet transforms, containing among other things useful text on wavelet frames and wavelet symmetry. It also contains more information on the construction of Symmlets, Coiflets and biorthogonal wavelets. This chapter has concentrated on two compact, discrete orthonormal wavelets: the Haar wavelet and the Daubechies D4 wavelet. There are many other wavelets we have not considered, most notably the Shannon, Meyer and Battle-Lemarié wavelets. More information on these can be found in, for example, Chui (1997), Mallat (1998), Daubechies (1992) and Blatter (1998). More examples of the multiresolution analysis of simple discrete signals can be found in the book by Walker (1999).
Books providing a more mathematical account of discrete wavelet transforms are those by Hernandez and Weiss (1996), Benedetto and Frazier (1994), Chui (1992a,b), Walter (1994) and Percival and Walden (2000). There is also a lot of useful information on the web. The appendix contains a list of useful websites from which to begin a search.

The book by Starck et al. (1998) contains a lot of good introductory material on two-dimensional discrete wavelet transforms, covering many of their practical applications including remote sensing, image compression, multiscale vision models, object detection and multiscale clustering. A nice illustrative paper on the application of two-dimensional wavelet transforms is that by Jiang et al (1999), who investigate the three-dimensional surface of orthopaedic joint prostheses. We considered only square two-dimensional transforms in this chapter, where the horizontal and vertical scalings were kept the same. If they are allowed to vary independently, we get rectangular transforms. Also possible are two-dimensional transforms which are not simply tensor products but wavelets constructed ‘intrinsically’ for higher dimensions (Jawerth and Sweldens, 1994). There are many applications of wavelet packets cited in the rest of this book. Quinquis (1998) provides a nice introduction to wavelet packets, while the paper by Hess-Nielsen and Wickerhauser (1996) and Wickerhauser’s (1994) book provide a more in-depth account.

We mentioned briefly the redundant wavelet transform, a variant of the discrete wavelet transform which produces N coefficients at each level and is translation invariant. This has been found useful in statistical applications: see for example Coifman and Donoho (1995), Lang et al (1996) and Nason and Silverman (1995). A concise summary of the various wavelet thresholding methods developed over recent years is to be found in the paper by Abramovich et al (2000) together with a comprehensive list of references on the subject. In addition, the book by Ogden (1997) provides a more detailed overview together with numerous examples. The universal threshold method is detailed in the paper by Donoho and Johnstone (1994) together with another global method, the minimax thresholding method. Donoho and Johnstone (1995) also developed a scheme which uses the wavelet coefficients at each scale to choose a scale-dependent threshold. This method is known as the SURE or SureShrink method after ‘Stein's Unbiased Risk Estimate’ on which it is based. They suggest a HYBRID method to be used in practice for decompositions where, at some scales, the wavelet representation is sparse, i.e. a large number of the coefficients are zero (or very near zero). The HYBRID method uses the SURE method, but defaults to the universal threshold at scales where the data representation is sparse and the noise overwhelms the small signal contribution from the few nonzero coefficients. A number of other methods have been proposed, including those based on cross validation, e.g. Nason (1996), Jansen and Bultheel (1999); Bayesian approaches, e.g. Chipman et al (1997), Vidakovic (1998), Abramovich et al (1998); and the Lorentz curve (Katul and Vidakovic, 1996, 1998). Translation invariant denoising is considered by Coifman and Donoho (1995), who compute the estimated signal using all possible discrete shifts of the signal.
See also the papers by Donoho and Johnstone (1995, 1998), Donoho (1995), Johnstone and Silverman (1997), Hall and Patil (1996), Efromovich (1999), Downie and Silverman (1998), Moulin (1994), Krim and Schick (1999) and Krim et al (1999). We will come across the use of thresholding extensively in the subsequent application chapters of this book. In particular, thresholding is revisited in chapter 4, section 4.2.3, where another thresholding method, the Lorentz threshold, is explained.

For data compression applications such as speech and audio transmission, we often want to compress the signal into as few detail coefficients as possible without losing too much information. We can do this by finding a wavelet which decomposes the signal into a few large-amplitude detail coefficients, which we retain, and many coefficients close to zero, which we discard. How many coefficients we set to zero affects the subsequent quality of the reconstruction. In practice, for larger data sets and suitably chosen wavelets, we can get good compression ratios without the loss of significant detail, where the term ‘significant’ is determined by the user. We have not considered full compression schemes here, including quantization and encoding; the reader is referred in the first instance to the simplified explanation of Walker (1999) and the more detailed treatments in Chui (1997) and Mallat (1998).
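A minimal hard-thresholding sketch, assuming NumPy and a Haar decomposition; the piecewise-constant test signal is our own illustration, chosen because the Haar transform represents it with very few nonzero coefficients:

```python
import numpy as np

def haar_dwt(x):
    """Full Haar DWT of a dyadic-length signal (orthonormal)."""
    coeffs, a = [], np.asarray(x, dtype=float)
    while len(a) > 1:
        coeffs.append((a[0::2] - a[1::2]) / np.sqrt(2.0))   # detail
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)              # approximation
    coeffs.append(a)
    return np.concatenate(coeffs[::-1])

def hard_threshold(w, frac=0.1):
    """Zero every coefficient smaller than frac * max|w|; the survivors are
    the 'significant' coefficients retained for compression."""
    thr = frac * np.abs(w).max()
    return np.where(np.abs(w) >= thr, w, 0.0)

x = np.repeat([1.0, 4.0, 2.0, -1.0], 16)        # piecewise-constant, length 64
w = haar_dwt(x)
wt = hard_threshold(w)
ratio = len(wt) / np.count_nonzero(wt)          # naive compression ratio
energy_kept = np.sum(wt ** 2) / np.sum(w ** 2)  # fraction of energy retained
```

For this signal only four coefficients survive, yet they carry all of the signal energy; for real signals the trade-off between `ratio` and `energy_kept` is set by the threshold.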

Note that all examples in this chapter have used wraparound to deal with the signal edges, i.e. the part of the wavelet spilling over the end of the signal is placed back at the start. This is the simplest and one of the most common treatments of edge effects for a finite-length signal, and it results in exactly the same number of decomposition coefficients as original signal components. However, it is not always the best for the application. Other methods were shown in chapter 2, figure 2.35. In addition, there are families of boundary wavelets which are intrinsically defined on finite-length signals (see for example the discussions in Jawerth and Sweldens (1994) and Chui (1997)). Take particular care when using off-the-shelf wavelet software packages as they may employ other boundary methods as the default setting, for example zero padding, which results in slightly more than N coefficients from the full decomposition. In addition, we have employed one version of the scaling coefficient reordering b_k (equation (3.25)). Again this is very much software dependent and you will find alternative orderings employed by various software packages.
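The difference between wraparound and zero padding is easy to demonstrate. The sketch below, assuming NumPy, applies the D4 lowpass filter to an 8-point signal using circular (wraparound) convolution and, for comparison, ordinary linear convolution with implicit zero padding; note that filter normalization and ordering conventions vary between software packages:

```python
import numpy as np

# Daubechies D4 scaling (lowpass) filter coefficients
h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))

x = np.arange(8, dtype=float)
N = len(x)

# wraparound: the filter is applied circularly, so the part spilling over the
# end of the signal is placed back at the start -> exactly N/2 coefficients
circ = np.array([np.dot(h, np.roll(x, -2 * k)[:4]) for k in range(N // 2)])

# zero padding: linear convolution spills past both signal ends -> more than N/2
lin = np.convolve(x, h[::-1])[::2]

print(len(circ), len(lin))   # 4 coefficients with wraparound, 6 with padding
```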
