# Processing, Operations, and Simulations

Authored by: David J. Whitehouse

# Handbook of Surface and Nanometrology

Print publication date:  December  2010
Online publication date:  December  2010

Print ISBN: 9781420082012
eBook ISBN: 9781420082029

10.1201/b10415-4

#### Abstract

Although the subject of engineering surfaces covers a large number of interdisciplinary subjects ranging from, say, economics on the one hand to chemistry on the other, there are a number of areas in which a degree of unity can be seen. One of these, perhaps the most important, is in the processing of measured information to enable it to be used most effectively for control of production or design purposes. Processing information is taken to mean making changes to the original data to isolate, separate, or identify features which may be of particular significance. Certain operations which keep on recurring will be highlighted. There are also other considerations which accompany processing such as filtering. For example, in some cases it may be necessary to construct surface profiles or areal maps from signals obtained from a surface by means other than by scanning i.e., optical diffraction or it may be necessary to investigate ways of generating surfaces in the computer data from. In this section a few of these will first be described and then illustrated with particular reference to some of the issues pertinent to the subject matter being discussed. Parts of the chapter will be tutorial in character to help ensure that a common base of understanding is available.

#### Comment

Although the subject of engineering surfaces covers a large number of interdisciplinary subjects ranging from, say, economics on the one hand to chemistry on the other, there are a number of areas in which a degree of unity can be seen. One of these, perhaps the most important, is in the processing of measured information to enable it to be used most effectively for control of production or design purposes. Processing information is taken to mean making changes to the original data to isolate, separate, or identify features which may be of particular significance. Certain operations which keep on recurring will be highlighted. There are also other considerations which accompany processing such as filtering. For example, in some cases it may be necessary to construct surface profiles or areal maps from signals obtained from a surface by means other than by scanning i.e., optical diffraction or it may be necessary to investigate ways of generating surfaces in the computer data from. In this section a few of these will first be described and then illustrated with particular reference to some of the issues pertinent to the subject matter being discussed. Parts of the chapter will be tutorial in character to help ensure that a common base of understanding is available.

Using all or any combination of these “processing” methods should allow a researcher, designer, or other user not only to understand more fully the background to the subject but also to progress the technique of surface analysis and measurement further. As this part of the subject of surface metrology is continuously being advanced it will only be possible to outline lines of advance rather than to delve in great detail.

The signal being processed is that obtained from the surface by any means. This could mean digital or analog information obtained from a measuring instrument. It can, however, equally be a signal obtained by utilizing a mathematical model of the surface.

The first of these fundamental operations to be considered is that of numerical techniques, and the second is filtering in one form or another. These are singled out because of their special importance. Obviously the techniques have overlapping regions so that the discussion of filtering will necessarily involve some discrete concepts. However, the repetition of such important concepts is not a redundant exercise. Some mention will be made of splines wavelets, fractals, and various other issues roughly mirroring the more general concepts of Chapter 2.

Filtering is examined in the same sense as that used in system vibration analysis; that is to say that the frequency characteristics of the input signal are operated on by the same method whether electrical, mechanical, or computational. One reason for doing this is to introduce the important concepts of system response and convolution. These terms have wider mathematical significance than equivalent expressions used in either of the individual disciplines, for example they happen to have equivalents in random process theory. Other subjects will be included wherever deemed necessary.

The other point about the numerical analysis of random surfaces is that, to some degree, this is exploring the properties of discrete random surfaces. To this extent there is some overlap with the methods of characterizing surfaces given in Chapter 2.

#### 3.1  Digital Methods

In what follows it should be assumed that digital methods are being used unless specified. Various optical processing methods will be considered in Chapter 4. Graphical methods, which are useful if the existing recorded data have to be checked and can also be used as a last resort to verify the results of simulations are mentioned.

Digital methods arise in three major areas: the first is in the analysis of the various waveforms, the second is in simulations, and the third is the use of computers to control instruments, machines, etc. Here various aspects of the analysis of waveforms using digital methods will be considered; simulations such as generation and reconstruction will be discussed where appropriate: control will be left until Chapter 4. Digital methods have virtually replaced analog methods in surface metrology because they are more flexible and more accurate than analog methods.

Three basic considerations have to be taken into account in order to get useful digital results. These are the sampling, quantization, and the numerical technique. Each is briefly considered together with typical examples. Emphasis is given to those problems unique to surface metrology. The starting point is taken to be a typical waveform which could have been obtained by any number of methods.

#### 3.1.1  Sampling

The operation of sampling both in time (or space) and frequency is shown in Figure 3.1 in which (a) represents a typical waveform z(t) to be sampled and (b) its frequency characteristic which is shown limited at a frequency B. In what follows the word “time” will be synonymous with “space”: strictly speaking surfaces are analyzed in terms of spatial frequency not temporal.

The process of taking a single sample of a time signal is equivalent to multiplying the time function by a unit impulse δ at t 1 and integrating. Thus

3.1 $z ( t 1 ) = ∫ - ∞ ∞ z ( τ ) ( t 1 - τ ) d τ$

Figure 3.1   Pictorial representation of Nyquist sampling theorem showing (a) signal profile, (b) spectrum of signal profile, (c) sampling points comb, (d) spectrum of sampled comb, (e) sampled signal, (f) spectrum of sampled signal, (g) sample comb with interval too wide, (h) spectrum of (g) showing spectral overlap causing aliasing.

where τ is a dummy variable. This operation is called the sampling property of impulses. Sampling at regular intervals “h” in time is equivalent to multiplying the waveform by an impulse train, where each impulse is separated by h, and then integrating. The equivalent operation in frequency is shown in Figure 3.1b, d, and f. The Fourier transform of an impulse train is itself an impulse train whose spacing is 1/h as shown in Figure 3.1d. Because time sampling involves a multiplication, the equivalent operation in frequency is a convolution. Thus Figure 3.1f shows the effect of convoluting the frequency impulse train with that of the frequency characteristic of the waveform (shown symmetrical about the zero-frequency axis).

Figure 3.2   Use of filter to reduce bandwidth.

The criterion for good sampling is that all information should be recoverable. From Figure 3.2 it is obvious that passing this sampled signal through a low-pass filter whose low-frequency cut is higher than B will remove the other bands of frequency introduced by the sampling, namely A, B, C, etc. But this is only possible if the other bands do not encroach into the band around zero frequency and this is only possible providing 1/h > 2B, otherwise the situation shown in Figure 3.1h arises in which an overlap occurs. Cutting out the high-frequency bands, even with an infinitely sharp cut filter still does not isolate the original frequency because some degree of scrambling has taken place. The extent of this scrambling can be seen by the cross-hatched area shown in Figure 3.1h. It is in effect a folding back of the frequency characteristic, on itself, about the mid-line. Problems arising from this folding will be described shortly. However, one important fact emerges: in order to be sure of preserving all the information in an analog signal of frequency B it is necessary to sample in time at a spacing which is a maximum of 1/2B long. This is called the Nyquist criterion.

Figure 3.3   Aliasing.

Unfortunately signals in the real world do not have a distinct cut-off in frequency B as shown in Figure 3.2. Various insignificant frequencies are invariably present therefore precluding satisfactory sampling and opening the way to the folding (or aliasing) effect. Consequently, it is usual to decide on the highest frequency of real interest, to filter the signal by analog means to remove higher frequencies and then to sample.

Notice that the Nyquist criterion has to be relaxed slightly depending upon how sharp the analog filter can cut. Simply sampling at 1/2B will cause a fold-over at B due to the inability of filters to attenuate infinitely sharply. A guard band G of frequency is usually incorporated to cater for the finite drop-off of the filter.

G is taken to be about 0.5B so that the sample rate becomes ~ 1/3B in time. It is possible to use a digital filter for this preprocessing only if the data are previously sampled at a much higher rate than the Nyquist rate for the frequency of interest. But this is sometimes wasteful in effort. It has the advantage that artificial filters can be used (described in the section on filters).

To illustrate the sort of misleading results which can occur when there is an interaction between sampling rate and signal frequency, consider Figure 3.3. This shows that by sampling at a distance slightly different from that of the true wavelength a false wavelength appears. This is similar to “beats” between waves.

“Aliasing” is a similar effect. Apparent low frequencies are introduced by the folding of frequencies around the sampling frequency. It becomes impossible to detect whether a signal of frequency f is genuine or whether it is the result of an aliased signal f 2fs or is it simply f 1 as shown in Figure 3.4.

Other forms of sampling can reduce these problems. Second-order sampling is still periodic in nature but within each period two measurements are taken, usually close together. This sort of sampling has been used on signals having a band-pass frequency characteristic. Random sampling has also been used where fewer samples need to be taken but the problem then arises of unscrambling the data afterwards.

Figure 3.4   Folded frequency response.

Summarizing, samples should be taken at about 3 × the rate of the highest frequency required and known to be present. Sampling much more often than this can be wasteful and only results in highly correlated data which can give biased results and lead to numerical problems.

In surface texture measurement the signal representing the roughness waveform has already been smoothed relative to the true surface profile because of the finite size of the stylus tip, which acts as a mechanical filter, or optically by the limited resolution of the sensor so that the problem of trying to decide the highest frequencies has to some extent been solved prior to measurement. In practical instruments the tip is about 2.5 μm, which implies that sampling should take place every micrometer or so. Similarly the wavelength of light sets an optical limit as discussed together with the restrictions of other sensors in Chapter 4.

#### 3.1.2  Quantization

This is not concerned with the way in which an analog signal is turned into a time series of data points it is concerned with the conversion of an analog waveform into a digital form. This always means a choice between two signal levels: the separation being determined by the discrimination of the analog to digital (A/D) convertor.

Figure 3.5   Quantization.

A point on the analog signal at P in Figure 3.5 will have to take either the value of level A or level B, whichever is the nearer. Having to take discrete values is the process of digitization.

This breaking down of the continuous signal into discrete levels can introduce errors known as quantization errors. They do not refer to instrumental accuracy and such errors are usually small. For instance, some idea can be obtained by using Sheppard’s grouped data result. With this it is easy to show that, if q is the separation of levels, then an RMS noise ε of $q / 12$

will be introduced into any assessment of the RMS value of the digital signal above that of the signal itself. Normally in metrology the quantization interval, expressed as a percentage of the signal value, is about 0.1%, and hence $ε = q / 12 = 0.03 %$ which is negligible. It only becomes significant if the separation of levels, that is the quantization interval, becomes comparable with the signal size. In almost all practical cases in metrology the quantization intervals are equal over the whole range, but use has been made of unequal intervals in the measurement of autocorrelation, for example.

#### 3.1.3  Effect of Computer Word Length

Often spurious resolution can appear to develop during a calculation, especially when converting from one type of word in the computer to another, for instance from integer to floating arithmetic. The number 10 becomes, say, 10.000; the decimal digits to the right of the decimal point are not significant in this case. Suppose the number 10 represents a profile ordinate that the A/D convertor could not resolve better than the unity digit. The signal itself may have been anywhere between 10 and 11. Certainly the probability of it being 10.000 is remote, but this does not mean that all floating point numbers derived from integer numbers are incorrect. As an example, if a profile is made up of a string of integer numbers 583, 621, 718, etc., then the low-pass-filtered profile may be expressed as 592.3 for instance, because the uncertainty in a mean (weighted mean for filtering) is less than that of individual numbers. It still does not mean, however, that all the printout of the floating point numbers is significant. As a rule of thumb, if q is the uncertainty in the individual numbers then the mean can be taken significantly to about one decimal digit further than the individual values. Note that if the A/D converter is two digits and the word length of the computer is 16 bits (four digits), the last digit is not likely to be significant if a simple convolution type of exercise is being carried out. This has nothing to do with numerical analysis problems—it is purely the accuracy in the values themselves, whether theoretically correct or not!

Another form of number notation called the scientific or E notation appears at first sight to offer both high resolution and range that is 0.63152 E 23 would mean 0.63152 × 1023.

The reason for this apparent benefit is that most computers use two words or more to present a number in the scientific notation. Thus, for a 16-bit word, 32 bits are available, usually 24 bits for the mantissa and 6 bits for the exponent, so an increase in accuracy is only at the expense of store. This used to be a problem but is less so today. Whether this accuracy is real or not depends on the initial data and the arithmetic operations carried out.

Problems of this nature are always arising in curve fitting, matrix inversion, etc. Their individual solution depends on the particular problem and the type of computer. There is always an improvement if care is taken.

Some obvious checks should be made to ensure that sensible values are being obtained. One example of this is in the measurement of slope. Here the model, quantization, and sampling again conspire to confuse the issue (see Figure 3.6). This figure shows a case where all three are poor: the quantization interval is too big, the sampling is too fast, and the model (say three-point) is too restrictive. Let the sampling interval be q/10.

Figure 3.6   Balance needed between quantization; model and sampling example in slope measurement.

Then the slopes measured at points 1, 2, 3, … will all be unity until the point E is reached. Then the slope will be q/2 ÷ q/10 = 5, giving an angle of 78! Not only does this look ridiculous, it cannot make physical sense if the device used for obtaining the analog signal is a stylus with a semi-angle of 45°. Real slopes are greater than this cannot be seen. Also, curvature measurement as revealed digitally cannot or should not be greater than that of the stylus itself! Common-sense rules like this often show up numerical errors. A point to note here is that, although some emphasis has been placed upon the problems of digital analysis, they are in no way simplified by reverting to analog methods. What usually is the case is that the digital method forces attention onto the real problems.

#### 3.1.4  Numerical Analysis—the Digital Model

One of the main reasons why the results obtained by different people often do not agree—even given the same data—is that not enough attention is given to numerical techniques. In this section some of the basic operations needed in analysis are examined, followed by particular problems more specific to engineering surfaces. No attempt is made to cover the subject as a whole. Suitable references enable the operations usually employed—including differentiation, integration, interpolation, extrapolation, curve fitting—to be separately studied.

#### 3.1.4.1  Differentiation

There is a tendency among people versed in analog ways to take a very simple formula for the first differential (Figure 3.7a). Thus, the differential between points z 1 and z –1 at z 0 is

3.2 $z i − z − 1 2 h .$

In fact, this is only the tangent. More than just the two ordinates are needed to get a good estimate of the first differential. One usually adequate formula involves the use of seven ordinates, equally spaced by h.

Thus the differential

3.3 $d z d x | z = z 0 = 1 60 h | z 3 − 9 z 2 + 45 z 1 − 45 z − 1 + 9 z − 2 − z − 3 | .$

The errors in this are of the order of (l/140) μδ7 z 0, where μ is the averaging operator between ordinates and δ is the central differencing operator. These are very small providing that ordinates outside z 3 and z –3 are well behaved. A similar error in the three-point formula given above is of the order of (1/6) μδz0, which turns out to be

Figure 3.7   Numerical differentiation.

$1 3 h ( z 2 − 2 z 1 + 2 z − 1 − z − 2 )$

when expressed in terms of ordinates. These error formulae show that if the ordinates outside those used are significantly different then errors can creep in Ref. [1].

The numerical Formulae 3.2 and 3.3 are examples of Lagrangian formulae.

By choosing a formula encompassing a wide range of ordinates the chances of rogue ordinates affecting the true derivatives are reduced. Hence the need for seven rather than three-point analysis. Similarly, to use

3.4 $d 2 z d x 2 | z = z 0 = 1 h 2 ( z 1 + z − 1 − 2 z 0 )$

as a formula for the second differential is sometimes dangerous. The errors here are of the order of

3.5 $1 12 δ 4 z 0$

that is

3.6 $1 12 ( z 2 − 4 z 1 + 6 z 0 − 4 z − 1 + z − 2 ) .$

An equivalent seven-point formula that reduces noise is

3.7 $d 2 z d x 2 | z = z 0 = 1 180 h 2 ( 2 z 3 − 27 z 2 + 270 z 1 − 490 z 0 + 270 z − 1 − 27 z − 2 + 2 z − 3 )$

with error (1/560) δ8 z 0.

Note the fundamental point about these formulae: it is still the central three ordinates that are dominant; the adjacent ordinates merely apply some degree of control over the value obtained should z 1, z 0, or z 1 be freaks or in error. Similarly, the z 3 and z –3 values act as constraints on z 2 and z –2, and so on.

$h d z d x | z = z 0 = ( Δ + 1 2 Δ − 1 6 Δ 3 ) z 0$

Alternative formulae to these exist. It is possible to extend the number of ordinates on either side indefinitely, their effect getting smaller and smaller. It is also possible to evaluate the differentials in terms of backward and forward differences rather than central differences. For example, where Δ is the forward difference, whence

3.8 $d z d x | z = z 0 = 1 12 h ( z 3 − 6 z 2 + 8 z 1 − 10 z 0 − 3 z − 1 ) .$

The derivative at z 0 has been evaluated by ordinates obtained later in the sequence of data points, similar formulae can be obtained for second derivatives, etc. The only usual reason for using these is because of the difficulty of using symmetrical numerical formulae at the beginning and ending of a set of data points. This enables all the waveform to be differentiated leaving a gap at the front or the end of the data rather than leaving gaps at both ends as is the case for central difference.

#### 3.1.4.2  Integration

The tendency when attempting to integrate is to regard the sum of ordinates multiplied by their spacing as equal to the value of the integral i.e., the area under a curve. Although this is true enough when the number of terms is large it is not so when the number of ordinates is small. The numerical formulae for any function being integrated are not simply the sum of the evaluated points; the values can only be added after modification

The simplest modification is to halve the first and last ordinates and to add the rest unchanged. This is no more than a statement of the trapezoidal rule for numerical integration and it corresponds, in fact, to joining the function values by straight lines instead of each ordinate being represented by a block. The area is made up of trapezoids rather than rectangles. It is a special case of Gregory’s formula.

Further improvement can be achieved by fitting curves between the points of evaluation. The quadratic curve gives rise to Simpson’s rule for numerical integration. Interpreted in a purely geometric way this gives the sum of the areas under second-degree parabolas. If there are n + 1 pairs of data points the integration formula is

3.9 $∫ x 0 x n b d x = h 3 [ b 0 + b n + 4 ( b 1 + b 3 + ... + b n + 1 ) + 2 ( b 2 + b 4 + .... + b n − 2 ) ] ,$

where the b’s are the ordinates of the curve. For n + 1 pairs if n is a multiple of 3:

3.10 $∫ x 0 x n b d x = 3 h 8 [ b 0 + b n + 3 ( b 1 + b 2 + b 4 + b 5 + ... + b n - 2 + b n - 1 ) ] .$

Other more elaborate formulae exist involving unequal spacing of the function values. These require fewer values for a given accuracy for the reasons given below. However, it is well known that for most engineering applications the equal-interval formulae show a surprising accuracy, especially Simpson’s rule. If, however, more accuracy is required then other techniques may have to be adopted. Lagrange interpolation formulae involve fitting a polynomial through any set of points not necessarily equally spaced. This polynomial therefore represents the function as a whole. Even when the intervals are equal, Lagrangian techniques have the advantages of permitting interpolation without the need to construct a difference table. They have the disadvantage common to all polynomial curve-fitting methods of requiring some knowledge of the degree of the polynomial needed to achieve any real accuracy.

In integration, use can also be made of the unequal interval in the Gauss’ formulae for numerical integration. Although these and other formulae exist they are most useful in those situations where a limited amount of well-understood data is present. Usually there is enough data to get the required accuracy for the basic operations using equal intervals and first- or second-degree interpolation.

#### 3.1.4.3  Interpolation and Extrapolation

In many respects these two areas of digital analysis are the most fundamental because in the process of sampling continuous signals have been transformed into discrete values in time (space). Reconstitution of the original signal in between sampled values is of prime importance, as is the ability to project the signal outside its immediate vicinity. Fortunately in surface metrology a lot of data are usually available, and consequently the use of such techniques is rarely required. In essence a polynomial is fitted to the measured or listed points and made to conform with the fixed points. Examination of the polynomial behavior between and also outside the fixed values is then possible. In particular, the most used polynomial is called the Lagrange interpolating polynomial.

Well-known formulae exist for interpolation and extrapolation, in particular those due to Everett and Bessel [1] for equally spaced data and Lagrange for unequally spaced data. An example of the use of interpolation formulae in surface topography comes in the field of contact, where mechanical models of surface peaks are used. Using interpolation it is possible to find the position of the maximum value of a peak even if it is not touched by the sampled points. For instance, if the ordinates are z –1 and z + 1 where z 1z –1 then the apex is not at z 0 but at a distance V from z 0 given by

3.11 $V = ( h / 2 ) ( z - 1 - z 1 ) 2 z 0 - z 1 - z - 1 .$

It is further possible to work out curvatures at the apex of this peak. Interpolation can also help to cut down the number of mean line points that need to be calculated for the mean line assessment.

#### 3.2.1  Some Numerical Problems Encountered in Surface Metrology

Some examples will be given by way of illustration of some of the typical problems encountered in surface metrology. Those chosen will not be exhaustive in content but should give a fairly representative cover of the types of problem likely to be encountered.

There are a limited number of features of most interest. From the section on surface characterization much emphasis was placed on the peak and slope measurement of a profile and, in particular, in two lateral dimensions (i.e., the areal case). These features can be extended to include peak height distributions, curvature distributions, and how they vary with height, slope, and associated parameters.

It is only in the last 30 years that digital methods have become readily available to make possible the measurement of these important parameters. However, simply theorizing about parameters is not enough—they have to be measured. This apparently straightforward task is fraught with problems as Section 3.2.2 depicts. In the 1950s, parameters were restricted to those that could be measured with simple analog circuitry. This in itself imposed natural constraints upon the wavelengths which could be measured. Recorders, amplifiers, and meters have bandwidth limitations which cannot be overcome but the restriction on digital methods is less distinct; the experimenter is confronted head-on with the sampling, quantization, and numerical model problem. Sometimes these considerations are outside the experience or training of the investigator. Unfortunately, the parameters of use to the surface metrologist are just those parameters that are difficult to measure. In what follows it will become apparent that the correlation function of the surface is of fundamental importance in assessing the change in value of such parameters with sampling. Instrumental limitations such as the stylus tip or optical resolution will be incorporated more fully in Chapter 4, and parameter variability in Chapter 5.

#### 3.2.2  Definitions of a Peak and Density of Peaks

The problem of defining peaks and assessing peak properties often arises in metrology, especially in contact theory. For instance, it is often of interest to know the density of peaks of a surface profile [2]. The question that has to be posed is how this count of peaks depends on the digitizing interval, the quantization interval, and the definition of the peak. All three can affect the count. This is one of the reasons why it is so very difficult to get agreement between researchers, even given the same data.

Figure 3.8   (a) Possible definition of peaks and (b) effect of quantization on peaks.

Take, for example, the problem of the definition. One of the most used definitions is a three-point model as shown in Figure 3.8a. If the central ordinate of three consecutive ordinates is the highest then the three together constitute a peak. An alternative one is also shown in the figure in which four ordinates are used, the central two being higher than the others for a definition. Many similar possibilities exist. In any case the number of peaks counted will be different for the same data. For example, a peak counted by the three-point method could get ignored using the four or more ordinate models.

Also, differences in definition have been used within the three-point method. Some investigators have imposed a height difference constraint on the definition. For instance, the central ordinate has to be a height z’ above the higher of the other two before a peak is registered, that is as in Figure 3.8a, z 0z 1 > z’. This constraint reduces the count as before.

#### 3.2.3  Effect of Quantization on Peak Parameters

The quantization interval can influence the count as is shown in Figure 3.8b. It can be seen that using exactly the same waveform, simply increasing the quantization interval by a factor of 2 means that, in the case of Figure 3.8b, the three-point peak criterion fails, whereas in the other case it does not.

So, even the A/D resolution can influence the count. In order to get some ideas of the acceptable quantization interval it should be a given ratio of the full-scale signal size, subject to the proviso that the interval chosen gives sufficient accuracy. As an example of the quantitative effect of quantization, consider a signal that has a uniform probability density. If the range of this density is split up into m1 levels (i.e., m blocks) then it can be shown that the ratio of peaks to ordinates is given by

3.12 $ratio = 1 3 - 1 2 m + 1 m 2 .$

This makes the assumption that the samples are independent and that the three-ordinate model is used as a definition of a peak.

Examination of the formula shows that, when m is large, the ratio is 1/3. This makes sense because, for independent samples with no quantization restriction, one would expect one-third of all samples to be peaks. Similarly, when m = 1, the ratio is zero. Again this makes sense because no information is being conveyed.

Various other values of m are listed in Table 3.1. Therefore, it is obvious that even representing the signal by one decimal digit goes a long way toward giving acceptable results. An extra 15% can be obtained by going to the second decimal digit. On this basis, taking measurements to the third place seems unnecessary. Taking them to the fourth place certainly is.

A similar formula to the one for rectangular (or uniform) distributions can be obtained for the very important Gaussian distribution. Again using the three-point analysis the probability of an ordinate being a peak is given at a level between quantization levels of nΔz and (n–1)Δz by

### Table 3.1   Digital vs. Analog Peaks

Percentage Drop

m

Ratio Peaks/ Ordinates

From Continuous Signal

1

0.125

70

3

0.19

45

4

0.21

37

5

0.23

30

10

0.25

15

100

0.33

<1

3.13 $∫ ( n - 1 ) Δ z n Δ z 1 2 π exp ( - z 2 2 ) d z | ∫ - ∞ ( n - 1 ) Δ z 1 2 π exp ( - z 2 2 ) d z | 2$

assuming that the z values have a zero mean and unit variance.

Taking this over all possible values of interval allowed, in this case taken to be symmetrical about z = 0 and extending m/2x blocks to either extreme of 3σ where σ is the RMS value of the distribution, the probability is

3.14 $1 8 ∑ − 3 σ ≅ Δ z n = 3 σ ≅ Δ z [ erf ( n Δ z 2 ) erf ( ( n − 1 ) Δ z 2 ) ] 1 + erf ( ( n − 1 ) Δ z 2 ) | 2$

which gives the following results:

$m = 2 probability = 0.125 m = 4 = 0.167 m = 6 = 0.211 m = 12 = 0.266.$

These generally follow the rectangular case except for being slightly lower. In both cases the value is asymptotic to 1/3, which is correct for the three-point analysis.

#### 3.2.4  Effect of Numerical Analysis on Peak Parameters

While investigating the effect of quantization, it is informative to investigate the effect of different models. This is especially valid in peak behavior because it is in the region of a peak that changes in level tend to be small and the effect of quantization can dominate the result. To combat this predicament it is sometimes useful to use a modified three-ordinate model in which, if the next-to-central ordinate is at the same digital level as the central ordinate, the judgment of whether or not a peak exists is deferred for one ordinate. If this is still not conclusive it is deferred further until an adjacent ordinate is lower.

Under these circumstances one would expect rather more peaks to be revealed than when restricting the model strictly to three ordinates. The effect of this modification can be taken account of in probability merely by noting that the probability of the situation shown in Figure 3.9 is the same as for the three-point one except that the central ordinate has been repeated three times, that is the probability is $P 1 3 P 2 2$

where P 1 is the probability of an ordinate lying between (n–1) Δz and nΔz, and P 2 is the probability of an ordinate being below (n–1)Δz.

Taking all such possibilities into account gives an additive sequence of probabilities. Thus

3.15 $P 1 P 2 2 + P 1 2 P 2 2 + P 1 3 P 2 2 + ... + P 1 n P 2 2 = P 1 P 2 2 1 - P 1 .$

The formulae derived previously for the three-point model definition of a peak can therefore be modified to give the following:

1. For a rectangular distribution
3.16 $probability = 2 m 2 - 2 m - 1 6 m ( m - 1 ) .$

Figure 3.9   n-ordinate definition of peak.

2. For the Gaussian distribution
3.17 $[ erf ( n Δ z / 2 ) − erf ( n − 1 ) Δ z / 2 ] 1 8 ∑ n = − m / 2 n = + m / 2 [ 1 + erf ( ( n − 1 ) Δ z / 2 ) ] 2 1 2 [ erf ( n Δ z / 2 ) − erf ( ( n − 1 ) Δ z / 2 ) ] .$

Both of these formulae have the effect of considerably increasing the probability of a peak, especially at small values of m (see Figure 3.10). For both in fact, if the quantization is purely linear, that is 0 + 1 or ±1, a complete change in the count can result depending simply upon which numerical model is chosen to represent a peak—on the same original profile.

As will now be obvious, each one of the constraints of sampling, quantization, and numerical definition can aggravate the effects of the others.

Again, notice that as m→∞ for both formulae the value of the probability tends to 1/3 because as m→∞ the chances of more than one ordinate lying within the infinitesimal limit become increasingly remote.

Figure 3.10   Peak density-effect of quantization, model, and sampling numerical problems in system parameter evaluation—an example of peak density count.

The main reason for taking a transducer range of more than one decimal digit is that it is not always possible to ensure that the signal is a significant percentage of the range of the A/D convertor. For very smooth surfaces this is often the case when the part is fitted. Another reason that will be amplified later is concerned with the best use of computers in metrology. The latest trend is to let the computer take out a large part of the manual setting-up by accepting all the signal from the workpiece—including errors of form, mis-alignment, waviness, etc.—removing all the errors that are unwanted, digitally, and then magnifying the remainder for evaluation. This technique presupposes that the remaining signal is known with sufficient accuracy (i.e., to a sufficient resolution) to be useful. In order to ensure this, the whole of the original signal has to be digitized to a reasonably small quantization interval. This is because the process of removing the large error signal leaves only a small percentage of the total, as seen in Figure 3.11.

Figure 3.11   (a) Total signal and (b) magnified remnant.

#### 3.2.5  Effect of Sample Interval on the Peak Density Value

The real issue here is not the physical distance between the discretely measured sampled ordinates, but how this distance relates to the autocorrelation function of the surface. In order to investigate the way in which ordinate spacing affects surface parameters it is necessary to know what constraints exist on the choice of a typical correlation function.

Perhaps the obvious choice of autocorrelation function is the exponential form shown in Figure 3.12. The argument for adopting this is that most finished surfaces exhibit such autocorrelation functions in regions other than at the origin. This is a direct result of the Poissonian nature of surface generation incurred because of the random cutting action. τmax is an

estimate of the correlation length (for ρ = 0.1) and is not to be confused with τcorr which is the general term for the correlation length given by Equation 3.22.

There are, however, theoretical objections to the use of the exponential autocorrelation function, for example there are certain features of the exponential function at the origin which are undesirable. In particular, the density of peaks Dp is given, for an ordinary, random process, by

3.18 $D p = 1 2 π ( A iv ( 0 ) − A " ( 0 ) ) 1 / 2 ,$

where A”(0) is the second differential of the autocorrelation function at the origin and A iv(0) is the fourth differential. For the exponential function these are undefined because A’(0) and A”’(0) are non-zero: they should be! This becomes necessary because the autocorrelation function is an even function. In practice, this is not a serious problem because there is always some degree of smoothing of the profile caused by the finite resolution of the measuring instrument. In what follows the exponential correlation will be assumed for simplicity unless specified otherwise.

Consider for example, as before, the measurement of peak density or the probability that an ordinate is a peak. Using the three-point model the latter becomes N, where N is given by

3.19 $N = 1 π tan − 1 ( 3 − ρ 1 + ρ ) 1 / 2$

Figure 3.12   Exponential autocorrelation function.

and ρ is the correlation between ordinates separated by h. For an exponential correlation function h = β ln(l/ρ) as shown in Figure 3.12.

The peak density is given simply by Equation 3.20 and gives the number of peaks in the unit length in which h is measured.

3.20 $N / h .$

In the case of a general autocorrelation function shown, for example in Figure 3.13, two correlation coefficients are needed for the three-point analysis, ρ1 corresponding with the autocorrelation value at h and ρ2 the value at 2h. The formula relating the peak density to the correlation function and hence to the spacing is modified somewhat to give

3.21 $N h = 1 π h tan − 1 [ ( 3 − 4 ρ 1 + ρ 2 1 − ρ 2 ) 1 / 2 ] .$

How the value of peak density changes with h can easily be calculated for an exponential autocorrelation function and is shown in Figure 3.14.

Varying ρ from 0 to 1 gives values of N from 1/3 to 1/4, respectively, in theory at least. The value of 1/3 is to be expected because when ρ = 0 the sampled ordinates are independent. The separation when this occurs is usually a matter of definition. One such definition, shown in Figure 3.12, is the value of h such that ρ~0.1 to measure is τmax. Another definition, which is more reliable is τcorr thus,

Figure 3.13   General autocorrelation function.

Figure 3.14   Peak density-effect of correlation.

3.22 $τ corr = ∫ 0 ∞ | A ( τ ) | d τ .$

For an exponential correlation function normalized i.e., A(0) = 1, τcorr = 1, so the value of correlation ρ at this is exp(–1) which is 1/e~35%.

When the ordinates are highly correlated for the exponential correlation the density is, in theory at least, as ρ→1, N→1/4.

When h is small the practical value of peak density always falls to zero. This can be seen by postulating any correlation function having well-behaved derivatives at the origin and inserting the ρ1 and ρ2 values measured from it into the general formula for N given above in Equation 3.21. Take for example the case when the correlation function is Gaussian.

Figure 3.15   Autocorrelation functions Gaussian vs. exponential.

### Table 3.2   Peak Densities for Different Sample Intervals, Gaussian, and Exponential Correlation

 Correlation h = 0.5 (ρ1) 2h = 1 (ρ2) Probability N/h Gaussian 0.82 0.46 0.164 Exponential 0.61 0.37 0.280 Correlation h = 1 2h = 2 Gaussian 0.47 0.01 0.297 Exponential 0.37 0.13 0.301

The comparison between this and the exponential is as follows.

From the standard Gaussian form the normalized autocorrelation function is

3.23 $A G ( τ ) = exp ( − π τ 2 4 )$

and the comparison for the exponential is

3.24 $A E ( τ ) = exp ( − τ ) .$

These are plotted in Figure 3.15.

Consider the formula for the probability that an ordinate is a peak given above as Equation 3.21. When h = 1

It can be seen from Table 3.2 that there are considerable differences between the various types of correlation function. In fact for the Gaussian correlation the relationship is very nearly linear.

Summarizing, therefore,

3.25 $N ≈ 3 + h 12 for exponential N ≈ k h for Gaussian,$

where k has a value between 1/3 and 1/4 depending on the approximation. The relevant point here is that the probability of an ordinate being a peak using the three-point numerical model is nearly constant for all sample intervals for the exponential correlation function surface but is nearly a linear function of the sample interval for surfaces having a Gaussian correlation. For small values of ĥ0.1 many surfaces, even the exponential, will exhibit a linear relationship between probability of a peak and sample interval similar to that of the Gaussian in Equation 3.25b because of the smoothing produced by the instrument probe, whether using a stylus or any another method. Remember that for values of N the sample interval h has to be expressed as a fraction of the correlation length (which is about equal to one-third of the high-spot wavelength).

All digital features can be incorporated into the evaluation of N. Thus, for example, the exponential case gives

3.26 $N = 1 8 h [ erf ( n Δ q 2 1 − ρ 1 + ρ ) − erf ( ( n − 1 ) Δ y 2 ) ] [ 1 + erf ( ( n − 1 ) Δ y 2 1 − ρ 1 + ρ ) ] 2 / { 1 − 1 2 [ erf ( n Δ y 2 1 − ρ 1 + ρ ) − erf ( ( n − 1 ) 2 1 − ρ 1 + ρ ) ] } .$

This incorporates quantization, sampling, and the numerical model and is shown in Figure 3.10 for ρ values of 0, 0.7, and 0.9, the last two being typical of those used in practice. Note that in this equation N is the peak density. Removal of the h in the RHS of the equation makes N the probability.

From what has been said it is clear that problems of surface analysis, such as peak definition, arise from three different factors: the quantization, the sampling, and the model. These three cannot be divorced from each other. Not to realize this interrelationship can be the source of much trouble when different investigators compare results. Any one of these three variables can be used as a fixed datum: the model, because it is used universally; the quantization, perhaps because of an accuracy requirement on individual measurements or subsequent analysis; and/or the sampling, because of instrumentation problems (stylus resolution) or the functional significance of the harmonic picked out.

To illustrate this interdependence, suppose that it is important to make sure that a particular numerical model is universally valid, say the simple three-point method for peaks. A criterion could be used in which the allowable loss of peak data is, for instance, 5%. If the surface is Gaussian some quantitative picture can be obtained.

From Figure 3.10 it can be seen that in order to ensure that only 5% of peak information is lost using the three-ordinate model, about 50 or 60 quantization intervals are required throughout the range of the signal. Because the ordinates in this model are Gaussian it is possible to work out a relationship between the average difference of height from ordinate to ordinate and to equate this difference of adjacent ordinate heights as E(z 1z 2)2 = 2σ2(1–2), where σ is the RMS value of the surface and ρ is the value of the correlation between sets of data separated by such an interval in space that the autocorrelation is ρ.

The average separation Δz is $2 / π$

the square root of the variance. Thus

3.27 $Δ z = 2 σ π 1 − ρ 2 = 1.13 σ 1 − ρ 2 .$

For ρ = 0, that is independence between ordinates, Δz = 1.13σ.

If it is assumed that the range of the signal is ±3σ then the acceptable relation between z and q, the quantization interval, is

$1.13 × ( 50 ~ 60 ) 6 ~ 12$

that is Δz ~ 12q. (It is not just in the measurement of peaks that problems lie. There are similar problems in slope measurement and curvature, as will be seen later in Sections 3.2.6.2 and 3.2.6.3.)

Remembering that this ratio must be preserved for all other digital sample intervals in order to maintain the 95% integrity, the ratio of q to σ for any other interval can be determined. Take for example the case where ρ = 0.7. The value of Δz becomes 0.58σ. Hence q = Δz/12~0.05σ Therefore, to cover the range of 6σ for the signal requires about 120 quantization intervals q.

Thus, in order to maintain the integrity of the model from a sampling giving ρ = 0 to one giving ρ = 0.7, the number of q levels has to be increased from a nominal 60 to 120, a factor of 2. If this integrity is not preserved comparative deviations in peak count cannot be attributed to the surface profile itself. A similar quantitative result is obtained if the differences between ordinates around the peaks themselves are investigated rather than those of the profile as a whole.

Other constraints may have to be added, for instance the inability of the stylus to resolve small-wavelength undulations or the need to exclude the short undulations for functional reasons. Two things are plain: first, the three variables of numerical model, sampling, and quantization are related and, second, at least one of these three needs to be fixed, preferably by using a functional requirement—which is usually a sampling consideration. So the preferred rule is to choose the sampling to match the function and work out a suitable quantization and numerical model in terms of the sampling. In a lot of work on contact wear and friction, it is the features of long wavelength picked out by independent samples that dominate performance and therefore need to be unambiguously measured.

#### 3.2.6.1  Peak Height Measurement

The method adopted previously in which the exponential autocorrelation function is used as a basis will be continued here. Expressions are derived which can easily be transformed into the general autocorrelation case, the height distribution should be nominally Gaussian.

The mean peak height $z ¯ p$

as a function of correlation between ordinates of ρ is given by

3.28

and for the general autocorrelation by

3.29 $z ¯ p = [ ( 1 − ρ 1 ) / π ] 1 / 2 2 N$

where N is the probability that an ordinate is a peak; $z ¯ p$

is normalized with respect to σ.

Therefore, in the simple exponential case the mean peak height expressed in terms of the RMS value of the surface migrates from $1.5 / π$

to zero as the correlation changes from zero to unity. This height value would have to be multiplied by $π / 2$ if the deviations were to be expressed in terms of the R a value of the surface. This behavior is modified slightly in the case of the general autocorrelation because although, as one would expect, the independent result is the same, the limiting behavior as ρ1 and ρ2 tend to unity is different. There is a limiting value of mean peak height for ρ1 = ρ2 = 1 which depends on the particular type of autocorrelation function modeled. Thus, for a highly oscillatory autocorrelation function, the limiting value of $z ¯ p$ could be near to or even higher than the RMS value Rq of the surface. In any event it must be positive.

The standard deviation of peak height σ p (ρ) is also affected in the same way. Thus for an exponential autocorrelation function

3.30

which, for ρ = 0, gives σ p = 0.74 and for ρ = l gives σ p = 1, the same value as the profile itself. The standard deviation of mean peak height for a general correlation is

3.31 $σ p ( ρ 1 , ρ 2 ) = [ 1 + ( 1 - ρ 1 ) 2 π N ( 1 - ρ 2 3 - 4 ρ 1 + ρ 2 ) 1 / 2 − ( 1 - ρ 1 ) 4 π N 2 ] 1 / 2 ,$

where N is defined for the general case in Equation 3.22.

Some idea of the effect of quantization can be obtained by using Sheppard’s correction for grouped data. Thus if σ pq (ρ) is the quantized value

3.32 $σ p q 2 ( ρ ) = o p 2 ( ρ ) + q 2 / 1 2 .$

With a typical worst-case value of q = 0.5, Rq reveals that an error of a few per cent in σ p will result. Obviously peak height measurements can also be in error due to numerical modeling. For example, the three-point model is only an approximation, and is in effect a vertical parabola fitted through three ordinates. What is normally assumed is that the central ordinate is at the apex of the parabola. In fact this is only true if those ordinates adjacent to the central one are of equal height; if not, the error in peak height δz p is given by

3.33 $δ z p = 1 8 ( z + 1 - z - 1 ) 2 ( 2 z 0 - z + 1 - z - 1 ) .$

Taking typical values of z 0, z 1, and z 1 to be 3, 1, and –1, respectively, an error of about 3% results. Errors are generally much smaller for highly correlated data.

#### 3.2.6.2  Peak Curvature

The curvature of peaks is especially sensitive to extraneous effects. This is mainly due to the fact that curvature is defined in terms of derivatives of the signal which are notoriously prone to noise. To get an estimate of the size of digital effects, consider the basic formula for curvature:

3.34 $C = d 2 z / d x 2 ( 1 + ( d z / d x ) 2 ) 3 / 2 .$

Usually, because in the vicinity of a peak dz/dx~0 the curvature can be closely approximated by the second differential, so the curvature C can be most simply expressed by the three-ordinate Lagrangian formula

3.35 $C = 2 z 0 - z + 1 - z - 1 h 2 = C ′ h 2$

where C′ is curvature measured in terms of ordinate differences only.

As before, the mean peak curvature can be expressed digitally as $C ¯ ′ ( ρ )$

where

3.36 $C ¯ ′ ( ρ ) = ( 3 - ρ ) ( 1 - ρ ) / π 2 cos - 1 [ ( 1 + ρ ) / 2 ] 1 / 2 .$

To get $C ¯ ( ρ )$

, this is simply divided by h 2.

For the general case $C ¯ ( ρ )$

is given by

3.37 $C ¯ ( ρ ) = 3 - 4 ρ 1 + ρ 2 2 N h 2 [ π ( 1 - ρ 1 ) ] 1 / 2 R q 2 .$

Comparing, for example, $C ¯ ′ ( ρ )$

at ρ = 0.1 and ρ = 0.9 shows a 10–1 difference, but in practice the difference in $C ¯ ( ρ )$ will be larger because, not only are the differences between ordinates changing as shown in $C ¯ ′ ( ρ )$ but the actual value of h changes and it is this which has the dominant effect. Changes in curvature of 50–1 can be obtained on ordinary surfaces by changing the sampling rates!

A simple maximum probable estimate for the errors in C due to quantization is δC:

3.38 $δ C = 4 q h 2$

where h is, as before, the ordinate spacing and q is the quantization interval. For typical values of q and h it would appear at first sight that the error is not likely to be serious. This is not necessarily so because near to a peak the differences between ordinates is small and can even be of the same order as q, with the result that very large errors can occur. Increasing the ordinate spacing h in this case not only increases the ordinate differences but reduces δC in the above expression.

It is possible to get a limit for q in terms of ordinate differences above which useless estimates of curvature will be obtained. The mean value of ordinate differences is given by

3.39 $( 3 − ρ ) ( 1 − ρ ) 1 / 2 4 π .$

Hence the upper limit for q could be taken as the inequality

3.40 $q ≪ ( 3 − ρ ) 1 − ρ 4 π R q ~ 1 − ρ 2 π R q .$

Knowing q, ρ can be found (assuming the exponential model of a given exponent), from which the minimum ordinate spacing can be worked out. For example, for ρ = 0.7 it works out that q = 0.17Rq , which implies that there should be about 35 quantization intervals across the range of the signal.

Numerical model errors are also important in measuring curvature. The three-ordinate model used earlier is merely the simplest second-differential formula. It is only strictly valid to use it when the fourth-order central differences are small when compared with it. A factor of one-tenth is typical, that is

3.41 $2 z 0 − ( z + 1 + z − 1 ) ≈ 2 z 0 − ( z + 1 + z − 1 ) + δ 4 z 0 / 12.$

This in turn is most likely to be true when the correlation between ordinates is high and the profile smoothly fits between discrete sampled ordinates.

For higher accuracy better Lagrangian formulae exist; for example, the seven-point model gives

3.42 $C = 1 180 h 2 ( 2 y 3 − 27 y 2 + 270 y 1 − 490 y 0 + 270 y − 1 + 27 y − 2 + 2 y − 1 )$

in which case the eighth central differences have to be small.

One way to assess the suitability of a curvature formula is to find out where it fails to be a good second differential. The way to do this is to consider the formula to be a digital filter by finding the Fourier transform of the numerical sequence and thence to find the break point where the gain does not increase as ω2. This is because a theoretically correct second differential should have a transfer function of ω2 for all ω.

Hence if C(x) is the operator on f(x) to get f”(x), that is

3.43 $f " ( x ) = f ( x ) * C ( x ) ,$

where * denotes convolution

• C(x) = 2δz (x–3h)–27δz (x–2h) + 270…

Thus, if A(ω) is the transform of C(x):

3.44

which results in

3.45

If the frequency characteristic is plotted out it will be observed that it is only when ω~2/h that it behaves as a true second differential. For higher values of ω the differentiating property breaks down. For the three-point differential model, ω has to be even shorter, so even though the three-point analysis is suitable for detecting peaks without the complication of quantization, it is severely restrictive when measuring curvature and should only be used when the shortest wavelengths present are four or five times the spacing interval h.

#### 3.2.6.3  Profile Slopes

Exactly the same arguments can be used when attempting to measure slopes: the first-differential formulae referred to in an earlier section still hold. In this case, however, the breakdown of the formulae is that point where the frequency characteristic fails to increase as ω and not ω2. A similar conclusion to that for C is reached [3].

In terms of the autocorrelation function and discrete ordinates, the mean absolute slope is given by

3.46 $or m ¯ = R q h 1 − ρ π or Δ q = R q h 2 1 − ρ from E [ z − 1 − z + 1 ] 2 ( 2 h ) 2 m ¯ = R q h 1 − ρ 2 π$

for a general correlation function. As before for curvature of peaks

3.47 $f ′ ( x ) = M ( x ) * f ( x ) .$

where M is impulse train operator not the slope m corresponding to C for curvature in Equation 3.43.

Hence

3.48 $M ( x ) = 1 12 h [ δ z ( x − 2 h ) − 8 δ z ( x − h ) ]$

for the five-point model from which the spectrum Am (ω) is given by

3.49

Expanding this as a power series in sin ωh:

3.50 $A m 5 ( ω ) = j h ( ω h − ( ω h ) 5 40 − ( ω h ) 7 252 + K )$

from which it is obvious that only the first term represents a true differentiation being proportional to ω; the others are errors due to the limitation of the five-point method.

The actual error at a specific frequency can be found by using Equation 3.49. Thus, as an example, consider a sine wave sampled at four points per cycle. It will have a wavelength of 4h and ω = 2π4h, the true value is Am (ω):

3.51 $i.e., A m ( ω ) = j ω = j π 2 h$

where as

3.52

which is 15% down on the true value.

Note that the formula for the average values of slope curvature and other features in terms of the spacing and the correlation coefficients of the single, double, and in fact multiple spacings between ordinates can be extended.

There is a general pattern that can be used to get some idea of the applicability of surface parameters in terms of digital characteristics. This is shown in simple form in Figure 3.16.

In the figure, region A gives the wrong results because the ordinates are too highly correlated and the small differences between them (perhaps due to the finite tip of the stylus or the limited resolution of the optical device) do not show at all, because the quantization interval Δz does not see the differences, so region A reflects instrument resolution quantization. Region B tends to produce realistic values where the correlation is about 0.7. In this region the instrument limitations are minimal and the reasonably fine structure of the surface as well as the gross structure are picked up. In region C the fine structure is lost because ρ~0, aliasing can be a problem and only the gross structure is picked up. Loss of information and misleading information are very possible. For region D the answer is completely dominated by the numerical model— the data is completely uncorrelated. The probability of an ordinate being a peak is 1/3 if the three-ordinate model is used and 1/5 if the five-ordinate model is used, so region D is numerical model limited.

Figure 3.16   Fidelity of surface parameters.

The graph shown in Figure 3.16 demonstrates the limited range in which some sort of result can be obtained which is not corrupted by either the instrumentation or the digital processing. The question arises, therefore, whether the data mean anything anyway, because the answer appears to be dependent on the sampling even if no quantization effects are present! There are two issues: one is whether a digital technique would ever agree with an analog one; and the other is whether surfaces are well behaved enough in the sense of differentiability to allow a unique answer for a feature which did not continually change as the sample interval and hence “scale of size” changed. The first point will be addressed first, not because it is simpler but because it can be attempted. Even then there are two aspects to it: the profile problem and the “areal” problem. Consider the profile first and using conventional terminology.

Take for example the case of the mean number of peaks m0 in unit distance and the mean number of crossings n 0 in unit distance in terms of D 2 and D 4 where

3.53 $D r = d r A ( h ) d h r | h = 0 .$

These are as revealed in Chapter 2 and Section 3.3.2

3.54 $m 0 = 1 2 π ( D 4 − D 2 ) 1 / 2 , n 0 = 1 π ( D 2 − D 0 ) 1 / 2 .$

The results here can be compared with those given in digital form in Equation 3.55 for m 0 and n 0:

3.55 $1 π tan − 1 ( 3 − 4 ρ 1 + ρ 2 1 − ρ 2 ) and 1 π cos − 1 ρ 1$

by first expressing the autocorrelation function as a Taylor expansion and then investigating the behavior as the sampling interval h is made to approach zero when ρ1 = ρ(h) and ρ2 = ρ(2h). Thus

3.56 $ρ ( h ) = 1 − ( − D 2 ) h 2 2 ! + D 4 h 4 4 ! + O ( h 4 ) .$

Using this expansion and inserting the values for ρ1 and ρ2 in the equations for peak distributions and other features described above, it can be shown [4] that they become the results obtained by Rice in 1944 [5] and by Bendat [6] thereby satisfactorily allowing the conclusion that the digital method does indeed converge onto that which would have been obtained from a continuous surface with perfect instrumentation. Unfortunately this happy state of affairs is subject to two massive assumptions: that Equation 3.56 is allowable (i.e., D 2 and D 4 exist) and that the results obtained digitally from a profile are acceptable simplifications of the areal digital contour of a surface.

Assuming that these two conditions are acceptable—a point which will be raised later—it is possible to use the limiting case of the discrete analysis to express any parameters hitherto not obtained and yet which are of considerable importance in the continuous theory of tribology (which is in fact what counts in the macrophysical world).

Thus the joint probability density function (PDF) of the peak height and curvature is given as the probability density of an ordinate being a peak of height z 0 and curvature C shown as ρ(C, z 0 / peak) where

3.57 $ρ ( C , z 0 | peak ) = C [ 2 π D 4 ( D 4 − D 2 2 ) ] 1 / 2 exp | − ( D 4 z 0 2 + 2 D 2 z 0 C + C 2 ) 2 ( D 4 − D 2 2 ) | for C > 0.$

Also, the correlation coefficient between peak height and curvature is

3.58 $corr ( C , z 0 | peak ) = ( − D 2 ) ( D 4 ) 1 / 4 ( 2 − π / 2 1 + ( 1 − π / 2 ) D 2 2 / D 4 ) 1 / 2$

among others, which demonstrates that useful results can be obtained by moving between the discrete and continuous cases for surface metrology.

#### 3.2.7  Summary of Profile Digital Analysis Problems

What has been indicated is how many of the most useful surface parameters of surface roughness vary in the general sense as a function of the digital (discrete) parameters. Hopefully, this gives an insight into the possible relationships that might be obtained on real surfaces. However, obviously the results in the foregoing section pose the question of whether the correlation function is measured and from this whether the measured values of the various tribological parameters of interest are predicted or should the actual parameters be measured directly from the surface using a data logging system and a computer? The former is simple and less variable because the expressions given in terms of the correlation coefficients are concerned with the average behavior of surfaces, assuming them to be nominally Gaussian in statistics. From the formulae given, which illustrate the way in which the parameters can be estimated from the correlation coefficients, it has to be said that the specific behavior of different surfaces will depend on the actual type of correlation function. For this reason, to estimate the relationships between parameters it is necessary to assume a correlation function which approximates to the surface in question. This is a mathematical modeling problem of surfaces.

However, as a first guess the exponential correlation function is usually sufficiently adequate for values away from the origin that is ρ = 0.7. It also fits in with fractal-type surfaces. For correlation values larger than 0.7, a Gaussian correlation could be used. It is considered that in many applications of surfaces the function depends on two surfaces and not one: it is the properties of the gap between them that are important. Thus one advantage of specifying surfaces in terms of the correlation coefficients is that, at least to a first approximation, the gap properties (neglecting actual separation) can be estimated from the additive properties of the mating surfaces.

Thus if σ2 A is the variance (R 2 q ) value of surface A and ρ A (h) + ρ A (2h) are its coefficients, and similarly for surface B, then

3.59 $σ g 2 = σ A 2 + σ B 2 ρ g ( h ) = [ σ A 2 ρ A ( h ) + σ B 2 ρ B ( h ) / σ g 2 ρ g ( 2 h ) = [ σ A 2 ρ A ( 2 h ) + σ B 2 ρ B ( 2 h ) / σ g 2 .$

Use of the density of peaks and zero crossings instead of correlation coefficients is not attractive because they are not additive. See Chapter 7 for a full examination of these issues.

Using the nomenclature n 0 and m 0 as before, the comparison with Equation 3.59 becomes

3.60 $σ g 2 = σ A 2 + σ B 2 n 0 g = [ ( σ A 2 n 0 A 2 + σ B 2 n 0 B 2 ) / σ g 2 ] 1 / 2 m 0 g = σ A 2 n 0 A 2 m 0 A 2 + σ B 2 n 0 B 2 m 0 B 2 σ A 2 n 0 A 2 + σ B 2 n 0 B 2 .$

Any errors tend to be cumulative. (More will be said about this aspect of characterization in Chapter 7.)

The latter method of obtaining tribological data (i.e., the formal data logging method) gives better estimates of extreme properties in the sense that the odd, very high peak or valley may be picked up, but it does tend to be more time consuming to gather the necessary data. Obviously in the former method the correlation coefficients have to be determined. Another way to be discussed is concerned with estimation by other than stylus means.

#### 3.2.8  Areal (3D) Filtering and Parameters

The same basic techniques can be used for areal (3D) filtering as were used in profile filtering. However, areal manipulation offers opportunities for a better understanding of the manufacturing process and the function of the workpiece than was possible with profile information.

The starting point is the surface data f(x,y). This has a frequency (wavelength) content given by its Fourier transform

3.61 $F ( ω , v ) = ∫ - ∞ ∞ ∫ - ∞ ∞ f ( x , y ) exp ( - j ( x ω + y v ) ) d x d y .$

F(w,v) can be modified by a filter function, say, H(w, v) so that the filtered areal data is now F′ (w,v) where

3.62 $F ′ ( ω , v ) = ∫ ∫ F ( ω , v ) H ( ω , v ) d x d y$

this can be inversely transformed to give the filtered areal data f′(x,y):

3.63 $f ′ ( x , y ) = 1 4 π 2 ∫ ∫ F ′ ( ω , v ) exp ( + j ( x ω + y v ) ) d ω d v .$

The data need not be analyzed using the Fourier transform. Dong [7] has been using wavelet transforms to examine areal (3D) data.

The way to calculate the areal spectrum is by carrying out a spectrum over all the profile y values of which there may be N and finding the average and then carrying out an equivalent operation in the orthogonal direction and averaging [8]. The fast Fourier transform (FFT) is used for these calculations. The resultant output is a 2D FFT.

The same arguments apply to areal filtering as to profile filtering, namely that a conventional filter does not need to have knowledge of the surface topography in order to be applied effectively. The penalty for this freedom is losing data at the boundaries because of the finite size of the impulse response of the filter (Figure 3.17).

It is possible to play some tricks with the data in order to salvage some of the “lost” area. One is simply to fold A to A” and repeat the data (Figure 3.18).

The shaded area and the repeat areas A’B’C’D’ A”B”C”D” can now be utilized.

Unfortunately, although there is no discontinuity between the levels at A A’ and B B” and A A”’ B B”’, etc., the differentials at the boundaries will be discontinuous so that slope and curvature data will be corrupt. The first differentials could also be made to fit on a first order surface but higher differentials cannot.

Figure 3.17   Area available for assessment.

Figure 3.18   Technique for utilizing all data-overlap scheme.

Polynomials can be effectively fitted to the areal data providing that the longer wavelengths can be matched to the order of the polynomial which presupposes prior knowledge. Failure to do this will produce spurious results.

There has been some interest in using spline functions e.g., Ref. [9]. It is instructive to carry out the spline fit for a profile and the extend it to the area fit. Following Radhakrishnan [9] and using his nomenclature the following method can be adopted.

The equation of the spline can be written as

3.64 $P ( x i ) = ∑ i = 1 n N i k ( x ) B i .$

Nik are the B spline basis functions given by

3.65 $N k = { 1 . x i ≤ x ≤ x i + 1 0 o t h e r w i s e N i k ( x ) = x - x i x i + k - x i ⋅ N i k - 1 ( x ) + x i + k - x x i + k - x i ⋅ N i + 1 , k - 1 ( x ) ,$

where xi is the “i” th knot value, Bi is the “i”th control value = values at knot, N is the number of control values, and k is order of the curve.

If xi, yi are data points

$∂ ∂ B i ∑ [ ( p ( x i ) - y i ) ] 2 = 0.$

The canonical equations obtained match the number of unknowns so that the Bs can be found.

When simplified this becomes

3.66 $∑ i N i k ( x i ) ∑ i 2 = 1 B i 2 N i 2 k 2 ( n i ) = ∑ y 1 N i k ( x i ) .$

As i is stepped to n, n equations result from which the n control values of the mean curve are obtained.

For two dimensions, Equation 3.66 can be extended to give Equation 3.67 [9, 59].

3.67 $P ( x j y i ) ∑ i [ P ( x i , y i − Z i ) ] 2 = 0$
3.68 $= ∑ ∑ z i N i 1 l n ( x i ) M j 1 . k n y ( i ) .$

Varying i 1 from 1 to n and j 1 from 1 to M gives a set of m × n equations which, when solved, give the control values of the mean surface. Parameters from this mean surface are extensions of the one dimension profile case.

So,

3.69 $S a = 1 A ∫ ∫ A | z ( ∂ c , y ) | d x d y$
3.70 $= 1 M N ∑ j = 1 N ∑ i = 1 M | z ( i , j ) | .$

Other parameters can also be obtained similarly. It should be remembered that the true spline is a cubic equation and is meant to be a line of minimum potential energy. It was originally used in ship building. An elastic or bendy piece of wood was pressed against a set of n stanchions. The resulting curve made contact at n points. The curve in between restraints could be measured and drawn. The shape of the strip of wood is naturaly such that it has a minimum potential energy. Mathematical splines are derived from this.

#### 3.2.9.1  General

Note here that areal is taken to be the 3D picture having two independent variables, x and y, the term 3D is sometimes used instead of areal but is not formally correct and is not recommended.

In the earlier sections on the characterization of surface roughness it became clear that a major objective is to measure the areal dimensional characteristics of the surface rather than to restrict the discussion to single profiles. This expansion of subject immediately brings into focus such things as specification and measurement of lay, the relationship between the moments of the spectrum of the area versus those measured from one or more profiles. Exactly the same situation exists in the digital domain. It has already been highlighted that there are differences between the theoretical values of features for profile and areal measurements according to Longuet-Higgins [10] and Nayak [11]. However, the questions emerge as to the differences which might arise between profile discrete measurement on the one hand and the convergence of areal discrete estimates of surfaces to the theoretical values on the other.

Exactly the same sort of calculation can be carried out as before, yielding some very interesting and important results.

Two attempts have been made to analyze the areal discrete problem for a random surface, the first by Whitehouse and Phillips [4] and the second by Greenwood [12]. The former results will be given here, yet the more digestible results of the latter will also be included for comparison. Both approaches can be considered to be a typology of surfaces in their own right (see Chapter 2).

So far profile information has been examined. It is understood that such information is only a convenient approximation to the real surface geometry. Exploring the areal properties on a continuous basis in the way that Longuett-Higgins and Nayak have is most informative but does not actually relate directly to what is measured. All measurements are now carried out digitally. It will be observed that areal discrete measurements cannot converge to the values for a continuous waveform because of the discontinuous spatial bandwidth of any sampling pattern. In certain circumstances, for example isotropic surfaces or when the two perpendicular components are independent, measuring the spectra either in one direction in isotropy or in two directions for independent lay. These enable Nyquist criteria to be tested if the pass conversion is possible. It is the purpose of this next section to investigate the nature of the differences that can and do occur in parameters which are functionally important, such as summit curvatures and height distributions.

Some of the theory used in investigating discrete properties overlaps the theory used for surface characterization using the discrete parameters reported in Chapter 2. The analytical results are necessarily complicated but are absolutely essential if the results for surfaces obtained by experimenters are ever to be made consistent with each other.

Also, whereas the idea of tribology parameter prediction using the correlation function for a profile might not seem very attractive, it most certainly is for areal estimation because of the very great difficulty in measuring over an area and maintaining geometric fidelity between traces. Simply taking disjointed profile tracks is not good enough—each trace has to be integrated into the overall mapping scheme in height and starting point. Another point which adds to the complexity is that, in areal measurement, there is not necessarily a straightforward grid system for data ordinates to be used. Before discussing the alternative methods, the analysis for the areal equivalent of the three-point method will be given.

In one dimension a continuous definition of a local maximum or peak on a profile only requires one first-order and one second-order differential, and only three ordinates are needed for the discrete definition of a peak. However, in two dimensions, the continuous definition of a local maximum (or summit, in the terminology of Nayak) requires two first-order and three second-order differentials, and a minimum of five ordinates are usually considered necessary for the discrete definition of a summit. Sayles and Thomas [13] gave two discrete definitions of a summit, one using the five nearest-neighbor ordinates, and the other nine. Here a number of possibilities will be considered; to start with, the conventional five-ordinate definition will be used. The definition states that an ordinate at height z 0 above an arbitrary datum is a five-point summit if it is higher than the four neighboring ordinates, which are at a distance h from it, on the two orthogonal Cartesian axes. If the differences between z 0 and each of the four neighboring ordinates are denoted by s 1, s 2, s 3 and s 4, then the condition for the ordinate z 0 to be a summit is that s 1, s 2, s 3, and s 4 are all positive.

The summit density is the proportion of ordinates that are summits. The summit height is the height z 0 of an ordinate that is a summit.

It seems natural to extend to two independent dimensions the discrete definition of Whitehouse and Archard [14] for peak curvatures, which was given by

3.71 $C h = ( s 1 + s 2 ) h$

where s 1 and s 2 are defined later. This extension uses the average orthogonal peak curvatures, and gives the discrete definition of five-point summit curvature as

3.72 $C h ( 2 ) = 1 2 ( s 1 + s 2 + s 3 + s 4 ) / h 2$

which again is a linear combination of the ordinates.

#### 3.2.9.2  The Expected Summit Density and the Distributions of Summit Height and Curvature

The distributions of peak height and curvature for a profile in one dimension were obtained by Whitehouse and Phillips [4] for a surface with ordinates from a Gaussian (normal) distribution. (A multivariate normal distribution (MND) for a vector Z will be denoted by ZÑ[μ; V] where μ is the vector of means and Z is the variance–covariance matrix, and the PDF is given by ϕ(m)(z’−μ′; V).) They obtained these results from the theory of truncated random variables. This was because the peak height distribution is the conditional distribution of z 0, the profile height, given that s 1 and s 2 are positive, or, in other words, the distribution of z 0 conditional on s 1 and s 2 being positive, where

3.73 $s 1 = z 0 - z − 1 s 2 = z 0 − z 1$

and remembering that z 1 and z –1 are the preceding to z 0 and succeeding ordinate values, respectively, on the profile. Similarly the distribution of peak curvature is the conditional distribution of Ch given that s 1 and s 2 are positive. Hence the results of Whitehouse and Phillips [4] can be obtained by using the results for truncated random variables, with m = 2, $Y 0 ≡ Z 0 , X = ( 2 − 2 ρ 1 ) − 1 / 2 ( s 1 , s 2 ) ′ d = ( 1 / 2 − 1 2 ρ 1 ) 1 / 2$

and

3.74 $V = ( 1 1 2 ( 1 − 2 ρ 1 + ρ 2 ) / ( 1 − ρ 1 ) 1 2 ( 1 − 2 ρ 1 + ρ 2 ) / ( 1 − ρ 1 ) 1 )$

where ρ1 = ρ(h), ρ2 = ρ(2h).

The derivations can also be used to obtain the more important 3D (areal) distributions of five-point summit height and curvature. For this analysis the surface height measurements will be assumed to have a MND and, because the surface is assumed to be isotropic, the distribution properties of a profile are invariant with respect to the direction of the profile. Hence

3.75 $z 0 , ( 2 - 2 ρ 1 ) - 1 / 2 ( s 1 , s 2 , s 3 , s 4 ) ∼ N [ 0 ; V 5 ] ,$

where

3.76 $V 5 = ( 1 d d d d d 1 a b b d a 1 b b d b b 1 a d b b a 1 ) ,$
3.77 $a = 1 2 ( 1 - 2 ρ 1 + ρ 2 ) / ( 1 - ρ 1 ) and b = 1 2 ( 1 - 2 ρ 1 + ρ 3 ) / ( 1 - ρ 1 )$

where ρ1 = ρ(h), ρ2 = ρ(2h), $ρ 3 = ρ ( h 2 )$

, and ρ(h) is the correlation coefficient between ordinates a distance h apart.

If γ5 is the event (s 1 > 0, s 2 > 0, s 3 > 0, s 4 > 0) then the expected five-point summit density is the probability of Y 5 occurring, and the distribution of five-point summit height is the conditional distribution of Z 0 given that Y 5 has occurred. These can be obtained from m = 4,

3.78 $X = ( 2 - 2 ρ 1 ) - 1 / 2 ( s 1 , s 2 , s 3 , s 4 ) ′$
3.79 $d = ( 1 2 - 1 2 ρ 1 ) 1 / 2$

and V obtained from V 5 by removing the first row and column (and denoted by V 4) so that

3.80 $λ = 1 + a + 2 b .$

Then the PDF of the five-point summit height distribution is given by

3.81 $p ( z 0 / Y 5 ) = φ ( 4 ) ( z 0 [ ( 1 - ρ 1 ) / ( 1 + ρ 1 ) ] 1 / 2 ; V c ) φ ( z 0 ) / φ ( 4 ) ( 0 , V 4 ) ,$

where Vc is given by

3.82 $[ 1 a c b c b c a c 1 b c b c b c b c 1 a c b c b c a c 1 ]$
3.83 $a c = ( ρ 2 - ρ 1 2 ) / ( 1 - ρ 1 2 ) and b c = ( ρ 3 - ρ 1 2 ) / ( 1 - ρ 1 2 ) .$

The expected (average or mean) five-point summit height is given by E(z 0|Y 5), where E denotes the statistical expectation:

3.84 $E [ z 0 1 γ 5 ] = 2 [ ( 1 - ρ 1 / π ) ] 1 / 2 Φ ( 3 ) ( 0 ; B 4 ) Φ ( 4 ) ( 0 ; V 4 ) ,$

where

3.85 $B 4 = ( 1 − b 2 a − b 2 b ( 1 − a ) a − b 2 1 − b 2 b ( 1 − a ) b ( 1 − a ) b ( 1 − a ) 1 − a 2 ) .$

From Equation 3.82 it can be seen that the expected five-point summit height depends on two orthant probabilities Φ(3)(0; B4) and Φ(4)(0;V 4), which have to be evaluated. From Whitehouse and Archard [14], Φ(3)(0;B4) is given by

3.86 $Φ ( 3 ) ( 0 ; B 4 ) = 1 2 - ( 4 π ) - 1 cos - 1 [ ( a - b 2 ) / ( 1 - b 2 ) ] - ( 2 π ) - 1 cos - 1 { b ( 1 - a ) [ ( 1 - a 2 ) ( 1 - b 2 ) ] - 1 / 2 } .$

Cheng [15] has evaluated ϕ(4)(0;V 4), so using this result the expected five-point summit density has been found, using Plackett’s [16] method. As this orthant probability only depends on the parameters a and b, it will be denoted by ϕ(4) [a,b]. Then

3.87 $pr ( Y 5 ) = Φ ( 4 ) [ a , b ] = Φ ( 4 ) [ a , 0 ] + ∫ 0 b ∂ Φ ( 4 ) [ a , t ] δ t d t = [ 1 4 + ( 2 π ) - 1 sin - 1 a ] 2 + ( 2 π ) - 1 sin - 1 b + π - 2 ∫ 0 b ( 1 - t 2 ) - 1 / 2 sin - 1 [ t ( 1 - a ) ( 1 + a - 2 t ) - 1 ] d t .$

This result was given by Cheng [15] in terms of the dilogarithm function.

The distribution of a five-point summit having a height z 0 conditional on a curvature C (2) h is normal and is given by

3.88 $( Z 0 | C h ( 2 ) = c h ( 2 ) ) ∼ N [ h 2 ( 1 + a + 2 b ) - 1 c h ( 2 ) , 1 − 2 ( 1 - ρ 1 ) / ( 1 + a + 2 b ) ] .$

This is henceforth called the conditional distribution of summit height given curvature. Thus the expected five-point summit curvature is given by

3.89 $E [ C h ( 2 ) | Y 5 ] = h - 2 ( 1 + a + 2 b ) E [ Z 0 | Y 5 ] .$

Hence, by the application of the theory of Gaussian (normal) truncated random variables, it has been possible to obtain the expectations of the five-point summit height, curvature, and density in terms of the correlation coefficients. These are the basic tribological parameters required. The results have been arrived at by using a five-point model for summits (called the tetragonal model) and this implies a rectangular grid of sampled data. These results therefore can be regarded as the tribological parameters of a discrete random surface with values at the intersections of a rectangular grid. As long as the correlation is ρ1 and ρ2 at h and 2h in both directions or scaled accordingly the results are exact. A wide variety of surfaces can be modeled by allowing ρ1 and ρ2 to vary.

#### 3.2.9.3  The Effect of the Sample Interval and Limiting Results

The distributions of five-point summit height and curvature have been derived in terms of the correlation coefficients between ordinates. These correlation coefficients are ρ1 for ordinates a distance h apart, ρ2 for ordinates 2h apart, and ρ3 for ordinates $2 h$

apart. If the surface is isotropic then

3.90 $ρ 1 = ρ ( h ) ρ 2 = ρ ( 2 h ) and ρ 3 = ρ ( h 2 ) .$

So ρ1, ρ2, and ρ3 will vary as h varies, depending on the shape of the autocorrelation function of the surface.

As h approaches zero

3.91 $lim h → 0 ρ 1 ( h ) = lim h → 0 ρ 2 ( h ) = lim h → 0 ρ 3 ( h ) = 1$

and as h approaches infinity

3.92 $lim h → ∞ ρ 1 ( h ) = lim h → ∞ ρ 2 ( h ) = lim h → ∞ ρ 3 ( h ) = 0.$

If ρ1, ρ2, and ρ3 are plotted in three dimensions then, as h varies, the curve will start at (1,1,1) for h = 0 and end at (0,0,0) for h = +∞ In order that the matrix V4–d2J is positive definite it is necessary for this curve to lie in the region bounded by ρ2 < 1 and $1 / 2 ( 1 + ρ 2 ) + 2 ρ 1 2 < ρ 3 < 1 2 ( 1 + ρ 2 )$

.

Results for the summit height have been obtained by Nayak [11] for the continuous surface, so it is possible to compare his results with those obtained for the discrete results of this chapter as the sampling interval h approaches zero. The expected summit height depends on ρ1, ρ2, and ρ3 through a and b, and

$lim h → ∞ a = - 1 and lim h → ∞ b = 0$

The autocorrelation function of the surface can be approximated by Taylor’s expansion

$ρ ( h ) = 1 + D 2 h 2 / 2 ! + D 4 h 4 / 4 ! + O ( h 4 )$

where D 2 and D 4 are the second and fourth derivatives of the autocorrelation function at the origin and if

3.93 $η = - D 2 ( D 4 ) - 1 / 2 < 2 3 .$

The limiting value for the expected five-point summit height is given by

3.94 $lim E [ Z 0 | Y 5 ] = 1 6 π + 2 sin - 1 ( 1 / 3 ) + 4 2 ( 1 2 π ) 1 / 2 η = 1.688 ( 1 2 π ) 1 / 2 η .$

Nayak [11] showed that the expected continuous summit height for the areal case was

3.95 $E [ z | Y N ] = ( 4 2 π ) ( 1 2 π ) 1 / 2 η = 1 . 8 0 1 ( 1 2 π ) 1 / 2 η ,$

which is comparable with the expected continuous peak height for the profile

3.96 $E [ z | Y p ] = ( 1 2 π ) 1 / 2 η$

a result given by Whitehouse and Phillips [4]. Then it can be seen that the limit of the expected five-point summit height (Equation 3.94) is 69% larger than the expected peak limit (Equation 3.96) as opposed to 80% larger than the expectation of the distribution of summit height for the continuous definition of Nayak [11] (Equation 3.94). However, this is only an overall reduction of about 6% and suggests that the discrete five-point definition may be adequate. Compare this with the seven-point result, Equations 3.128 and 3.132 below.

It is possible to obtain Φ(4)(z 0[(l–ρ1)/1 + ρ1)]1/2; V c) by the methods of [15] and [16], and hence to obtain the PDF of the limiting distribution of the five-point summit height as h converges to zero. This is given by

3.97

where

3.98 $T ( 2 ) ( ω , v ) = 1 2 π ∫ 0 t ) ( 1 + x 2 ) - 2 exp ( - ω 2 2 ( 1 + x 2 ) ) d x$

and

3.99 $w = η ( 1 - η 2 ) - 1 / 2 z 0 .$

This PDF is compared with the PDF of continuous summit height, given by

3.100

which was obtained by Nayak [11] for three values η = 0, $1 / 3$

and $2 / 3$ , as shown in Figure 3.19. For η = 0 both distributions are the standardized normal distribution. When $η = 2 / 3$ , the PDF of the continuous summit height is

3.101 $p ( z | Y N ) = 2 3 { z 2 - 1 + 2 π [ φ ( z ) ] 2 } φ ( z ) for z > 0 = 0 for z ≤ 0$

while the limiting distribution of the five-point summit height is given by

3.102

Figure 3.19   The probability density function (PDF) of the distribution of summit height (full line) and the limiting distribution of the five-point summit height (broken line) for η = 0, square root 1/3, and square root 2/3.

Nayak [11] used the definition of mean summit curvature K m given by Sokolnikoff as minus the average of the second partial derivatives in orthogonal directions.

3.103 $K m = - 1 2 [ ∂ 2 z ∂ x 2 + ∂ 2 z ∂ y 2 ]$

With this definition the conditional distribution of continuous summit height given the curvature is a normal distribution with

3.104 $E p [ z / K m , Y N ] = 3 2 η K m ( D 4 ) - 1 / 2$

and

3.105

This is also the limit of the conditional distribution of z 0 given C (2) h . Hence, the limit as h tends to zero of the expected five-point summit curvature will be 6% smaller than the expectation of the continuous summit curvature. Thus the operational procedure of sampling in a plane and taking the sample interval to zero gives different results from the continuous values for the surface!

Greenwood [12] approached the problem of profile measurement in the same basic way using the multinormal distribution from the joint distribution of height z, slope m, and curvature k:

3.106 $p ( z , m , k ) = 1 2 π 3 / 2 1 - r 2 × exp ( - 1 2 ( 1 - t 2 ) ( ξ 2 + t 2 - 2 ξ t ) ) exp ( - s 2 2 )$

where ξ = z/σ, s = m m, and t = –k k , and ξ, s, and t are the standardized height, slope, and curvature; z, m, and k normalized to their standard deviations σ, σ m , and σ k , respectively. In Equation 3.103 r is σ2 m /σσ k and corresponds to the variable α = m 0 m 4/m 2 2 as t –2.

Greenwood introduces another variable, θ to replace in part the sampling interval defined by

3.107 $sin θ = h σ k / 2 σ m$

where σ m is the standard deviation of the slope and σ k that of curvature. If h is the sampling interval, this transcription from h and the correlation coefficients ρ(0), ρ(h), and ρ(2h) preferred by Whitehouse and Phillips for a profile makes some of the subsequent equations simpler in form. However, the concept of sampling interval is masked, which is a disadvantage to the investigator.

Formulae for the joint distributions of height and curvature are found, as are those for summit curvature distributions (Figure 3.20).

Figure 3.20   Effect of surface character on surface features.

The reader is requested to consult the paper by Greenwood [12] to decide the simplest approach to the areal problem. However, it is gratifying to note that the results and predictions of both methods are compatible.

#### 3.2.10  Patterns of Sampling and Their Effect on Discrete Properties (Comparison of Three-, Four-, Five- and Seven-Point Analysis of Surfaces)

An investigation of the various sampling pattern options open to a researcher will be given here to see if there is any advantage to be had by changing from the conventional rectilinear sampling pattern. The real issue is to find what sampling pattern best covers the area and picks out summits. Obviously, there are instrumental factors that have to be taken into account as well as purely theoretical ones. One such factor is the ease with which a complex sampling pattern can be achieved by hardware comprising a specimen table driven by two orthogonal, motorized slides.

One thing that emerges from these sampling procedures is that the properties so revealed are substantially different from each other.

#### 3.2.10.1  Four-Point Sampling Scheme in a Plane

A comparison of possible sampling schemes for sampling in an areal plane will now be presented. They are illustrated in Figures 3.21 and 4.73. In all cases the distance between measurements is h.

First, there is sampling along a straight line (from a profile of the surface). This sampling scheme only takes measurements in one dimension of the plane. This has been presented for completeness and because it was the first case considered by Whitehouse and Archard [9] and Whitehouse and Phillips [4]. In Figure 3.21a it is illustrated with k = 2.

Second, a sampling scheme could be used that would take measurements at the vertices of a hexagonal grid. The summit properties could be defined using four ordinates, that is the measurement at a vertex and the three adjacent ordinates at a distance h from the chosen vertex. This is the case when k = 3 and is referred to as the trigonal symmetry case.

Figure 3.21   Sampling patterns: (a) three points (digonal), (b) four points (trigonal), k = 3, and (c) five points (tetragonal), k = 4.

In order to produce such a hexagonal grid pattern on a surface it would be necessary to sample along parallel lines separated alternately by a distance of ½h and h. The spacing between ordinates along a line would be $3 h$

but the position at which the first ordinate is measured would be 0 for the (4j–3)rd and 4jth lines and sinθ = hσ k /2σ m $3 h$ for the (4j–2)nd and (4j–1)st lines for j > 1. This is illustrated in Figure 3.21b. Alternatively, it would be possible to sample along parallel lines a distance of 1/2. $3 h$ apart, but this would involve a different position for the first ordinates and the spacing between ordinates would alternatively be h and 2h.

Third, there is sampling on a square grid. This was considered by Whitehouse and Phillips [4] and Greenwood [5] and will be referred to as the tetragonal symmetry case. It is illustrated in Figure 3.21c with k = 4. The sampling scheme requires sampling along parallel lines separated by a distance h and with a spacing between ordinates along a line of h.

Results will also be given for the hexagonal grid or trigonal symmetry and the hexagonal case with k = 6. For purposes of comparison, the cases when k = 2, 3, and 4 will also be considered from earlier. The notation will be used below. If the m random variables x = (x 1, x 2,..., xm ) have a joint multivariable Gaussian (normal) distribution with mean μ and variance–covariance matrix V then this is denoted by xN[μ,V]. Also the convention of using an upper-case letter for a random variable and a lower-case letter for a realization of the random variable will be followed, as in the previous section.

#### 3.2.10.2  The Hexagonal Grid in the Trigonal Symmetry Case

Results have been obtained for the PDF and expectation (mean) of peak (or summit) height, the density of summits and the expected peak (or summit) curvature in the cases when k = 2 by Whitehouse and Phillips [17] and when k = 4 by Whitehouse and Phillips [4] and Greenwood [12]. The results for the hexagonal grid (k = 3) in the trigonal symmetry and the hexagonal case k = 6 will now be given [14]. These can be obtained from the general results of truncated random variables in the appendix of Whitehouse and Phillips [4].

For measurements with four ordinates let z 0 be the height of the central ordinate and s 1, s 2, and s 3 be the differences between this ordinate and the three adjacent ordinates at a distance h. The ordinate z 0 will be defined to be a four-point summit if s 1, s 2, and s 3 are all positive. By analogy with the three-point and five-point definitions of curvature the discrete definition of four-point curvature is

3.108 $C = 2 ( s 1 + s 2 + s 3 ) 3 h 2 .$

Assuming that the surface height measurements have a multivariate Gaussian distribution and that the surface is isotropic then

3.109 $( z 0 , 2 - 2 ρ 1 ) - 1 / 2 ( s 1 , s 2 , s 3 ) ) ∼ N [ 0 ; V 4 ] .$

The probability that s 1, s 2, and s 3 are all positive gives the probability that an ordinate is a four-point summit. Thus, again using the nomenclature of Cheng [15],

3.110 $V 4 = ( 1 d d d d 1 a a d a 1 a d a a 1 )$

with

3.111 $d = ( 1 2 - 1 2 ρ 1 ) 1 / 2 and a = 1 2 ( 1 - 2 ρ 1 + ρ 3 ) ( 1 - ρ 1 ) ,$

where ρ1 = ρ(h) and $ρ 3 = ρ ( 3 h )$

.

If Y 4 is the event (s 1 > 0, s 2 > 0, s 3 > 0) then the distribution of four-point summit height is the conditional distribution of z 0 given that Y 4 has occurred. This can be obtained using the results of Whitehouse and Phillips [4] with m = 3:

3.112 $X = ( 2 - 2 ρ 1 ) - 1 / 2 ( s 1 , s 2 , s 3 )$
3.113 $d = ( 1 2 - 1 2 ρ 1 ) 1 / 2$

and

3.114 $V = V 3 = ( 1 a a a 1 a a a 1 )$

so that

3.115 $λ = 1 + 2 α .$

Then the PDF of the four-point summit height distribution is given by

3.116 $p ( z 0 | Y 4 ) = Φ ( 3 ) ( z 0 [ ( 1 - ρ 1 ) / ( 1 + ρ 1 ) ] 1 / 2 ; V c ) Φ ( z 0 ) Φ ( 3 ) ( 0 ; V 3 )$

in exactly the same way as for the tetragonal case where

3.117 $V c = ( 1 a c a c a c 1 a c a c a c 1 )$

and

3.118 $a c = ( ρ 3 - ρ 1 2 ) / ( 1 - ρ 1 2 )$

Φ(n)(z’; V) is the cumulative distribution function at z’ of the n-dimensional multivariate Gaussian distribution with zero expectation and variance–covariance matrix V. z is used for (z, z 2, z 3,...) = z’. Φ(x) is the PDF of the univariate standard Gaussian distribution.

The denominator of Equation 3.116 is the orthant probability which gives the probability that an ordinate is a four-point summit. Hence

3.119 $pr ( Y 4 ) = Φ ( 3 ) ( 0 ; V 3 ) = 1 2 - 3 4 ( π ) - 1 cos - 1 a .$

The expected (mean) four-point summit height is given by

3.120 $E [ z 0 | Y 4 ] = 3 ( 1 - ρ 1 ) 1 / 2 Φ ( 2 ) ( 0 ; B 3 ) 2 π Φ ( 3 ) ( 0 ; V 3 )$

where

3.121 $B 3 = [ 1 − a 2 a ( 1 − a ) a ( 1 − a ) 1 − a 2 ] .$

Hence

3.122 $E [ z 0 1 Y 4 ] = 3 ( 1 - ρ 1 ) 1 / 2 [ 1 4 + 1 2 π sin - 1 ( 1 - 2 ρ 1 + ρ 3 3 - 4 ρ 1 + ρ 3 ) ] / { 2 π [ 1 2 - 3 4 π cos - 1 ( 1 - 2 ρ 1 + ρ 3 2 - 2 ρ 1 ) ] } .$

The distribution of the height z 0 of a four-point summit conditional on curvature C is Gaussian with an expectation given by

3.123 $E [ z 0 | C = c , Y 4 ] = 3 h 2 ( 1 − ρ 1 ) c 4 ( 2 − 3 ρ 1 + ρ 3 )$

and variance given by

3.124

This is the same as the distribution of the height z 0 of an ordinate conditional on the four-point curvature but not conditional on the ordinate being a summit, and is a result which holds for the three values of k = 2, 3, and 4. This is because Vk is of the form

3.125 $V k = ( 1 d l ′ d l V ¯ )$

where V is a correlation matrix with a constant row (column) sum ensured by cofactors dl and dl’ This result enables the expected (k + 1)-point peak (or summit) curvature to be obtained from the expected (k + 1)-point peak (or summit) height.

Hence the expected four-point summit curvature is given by

3.126 $E [ C | Y 4 ] = 4 ( 2 - 3 ρ 1 + ρ 3 ) 3 h 2 ( 1 - ρ 1 ) E [ Z 0 | T 4 ] = 2 ( 2 - 3 ρ 1 + ρ 3 ) [ 1 4 + 1 2 π sin - 1 ( 1 - 2 ρ 1 + ρ 3 3 - 4 ρ 1 + ρ 3 ) ] / { h 2 [ π ( 1 - ρ 1 ) ] 1 / 2 [ 1 2 - 3 4 π cos - 1 ( 1 - 2 ρ 1 + ρ 3 2 - 2 ρ 1 ) ] } .$

It is also possible to obtain the following simple connection between the variances of the (k + 1)-point peak (or summit) height and curvature:

3.127 $v a r [ Z 0 1 T k + 1 ] = v a r ( Z 0 1 C , Y k + 1 ) + [ E [ Z 0 1 C = C , T k + 1 ] C ] 2 v a r ( C , T k + 1 )$

This relation was shown by Whitehouse and Phillips [4] for k = 1 (with z 0y 0) and by Greenwood [12] for k = 4.

So, by the application of the theory of Gaussian truncated random variables, it has been possible to obtain connections between the expectations and variances of four-point summit and curvature. Also, it is straightforward to show, without getting involved in the calculation that the probability that an ordinate is a summit for a hexagonal sampling model is with

### Table 3.3   Expected Summit (Peak) Density

3.128 $p r ( Y 7 ) = Φ ( 6 ) ( 0 ; V 6 ) ,$
3.129 $V 6 = ( 1 1 2 b c b 1 2 1 2 1 1 2 b c b b 1 2 1 1 2 b c c b 1 2 1 1 2 b b c b 1 2 1 1 2 1 2 b c b 1 2 1 )$

where

3.130 $b = ( 1 - 2 π 1 + π 3 ) / ( 2 - 2 π 1 ) c = ( 1 - 2 π 1 + π 2 ) / ( 2 - 2 π 1 ) π t = π ( t Δ h )$

and Δ = 21/2/31/4, giving

3.131 $E [ z | Y 7 ] = 3 ( 1 - ρ 1 ) / π ⋅ φ 5 ( 0 , B 6 ) p r ( Y 7 ) ,$

where B 6 is the variance-covariance matrix of the conditional distribution of the differences s 1,s 2,...,s 5 given s 6, from which [20]

3.132 $E [ z | Y 7 ] = 3 ( 1 − ρ 1 ) / π p r ( Y 7 ) [ 6 π - 1 2 tan - 1 ( 1 / 2 ) - 2 π ] 8 3 π 2 Δ ( D 4 - D 2 ) 1 / 2 h .$

From this the limiting values as h→0 can be found and inserted in Table 3.3.

It can be seen from Figure 3.20 that the values of the summit properties approach that of the continuous case as k increases—a not unexpected result. It has also been shown for the hexagonal case k = 6 that the data storage is 13% less than for the rectangular case and in many instances the processing is quicker (see Ref. [52]). Also, if faults are present in the surface they are more easily detected. This result is shown by Whitehouse and Phillips [18].

The most important point to emerge is that the best results are those where the sampling pattern follows most closely the areal bandwidth pattern of the surface. For example, the hexagonal case k = 6 is most suitable for isotropic surfaces which have circular symmetry about the origin in frequency (and wavelength) [19]. In the case of anisotropic surfaces a suitably scaled sample interval in both directions with a rectangular grid for the tetragonal case will probably be best.

So simply applying a sampling procedure to a surface is not good enough. The sampling pattern should, whenever possible, image the surface properties. There is obviously another good case here for looking at the surface before trying to evaluate it.

#### 3.2.10.3  The Effect of Sampling Interval and Limiting Results on Sample Patterns

It is important to investigate the variation of parameters with h because it is due to the large number of possible differences in sampling interval that the scatter of measured values of parameters occurs between investigators [15].

The distributions of four-point summit height and curvature have been derived in terms of correlation coefficients between ordinates. These two correlation coefficients are ρ1 for ordinates a distance h apart, and $ρ 3$

for ordinates a distance $3 h$ apart. If the surface is isotropic and the autocorrelation function is ρ(x), then ρ1 = ρ(h), and $ρ 3 = ρ ( 3 h )$ . So ρ1 and ρ3 will vary as h varies, depending on the shape of the autocorrelation function of the surface.

Results for the summit height have been obtained for the continuous surface, so it is possible to compare these results with those obtained earlier for the discrete results as the sampling interval h converges to zero.

To do this it is necessary, as before, to make assumptions about the behavior of the autocorrelation function ρ(h) near the origin. It will be assumed as before that

3.133 $ρ ( h ) = 1 + D 2 h 2 / 2 ! + D 4 h 4 / 4 ! + O ( h 4 )$

where

3.134 $D 2 < 0 D 4 > 0 and η = − D 2 ( D 4 ) − 1 / 2 < 5 6 .$

D 2 and D 4 are the second and fourth derivatives of the autocorrelation function at the origin.

Comparison will be made for the estimates of parameters measuring peak and summit properties of the surface. This will be done for the four cases of three-, four-, five-, and seven-point estimates.

The first parameter that will be considered is the density of peaks or summits. These parameters are known for a continuous random Gaussian surface and were given for peaks as

3.135 $D peak = 1 2 π D 4 − D 2 .$

by Rice [5] and for summits as

3.136 $D sum = 1 6 π 3 ( D 4 − D 2 )$

by Nayak [11].

The density of the peaks or summits is the number of peaks per unit length or summits per unit area, using the (k + 1)-point definition of peak for k = 2 and summit for k = 3, 4, and 6. The expected density of peaks or summits is given by the product of pr(Y k + 1) and the density of ordinates, where T k + 1 is the event (s 1 > 0,..., sk > 0) and s 1 to sk are the differences between the central ordinate and the k adjacent ordinates at a distance h.

The limiting behavior of pr(Y k + 1) as h tends to zero, the density of ordinates and the limit of the expected density of peaks (or summits) are given in Table 3.3. The limits are given in terms of the limiting results for a continuous surface given by Refs. [4, 11]. It can be seen that the density of peaks (when k = 2) converges to the continuous limit. This is not the case for summits (when k = 3, 4, and 6). In these cases the density would be overestimated by 73%, 31%, and 4%, respectively (see Figure 3.20).

The second parameter that will be considered is the average peak (or summit) height. The results are known for a continuous random Gaussian surface and were given for peaks as

3.137

by Rice [5] and Whitehouse and Phillips [4] and for summits as

3.138

by Nayak [11]. Hence the average summit height is 80% higher than the average peak height.

### Table 3.4   Expected Summit (Peak) Height E[Z 01/2Yk + 1]

Again the expected height of peaks (when k = 2) converges to the continuous limit for peaks on a profile, but this is not the case for summits (when k = 3 and 6) as is seen in Table 3.4. In these cases the expected summit height is underestimated by 13% for the four-point case, by 6% for the five-point case, and by only 1% for the hexagonal case.

Because the conditional distribution of height given curvature is Gaussian with a mean which is a linear function of curvature, for all values of k, the expected summit curvature will converge in the same manner as the expected summit height.

To study the qualitative effect of the change of the sampling interval h on the digital measurements of an isotropic surface it is necessary to specify a model for the autocorrelation function of the surface. Note that any autocorrelation could be used. For the model to fit in with observed autocorrelation functions of surfaces it would be desirable to have a negative exponential function with a multiplicative periodic function. Whitehouse and Phillips [4] “smoothed” the exponential cosine function to give a function that was smooth at the origin. This alternative approach replaced the negative exponential function by another function which is smooth at the origin but behaves like the negative exponential function for large lag values. Lukyanov [21] has used such models extensively. Both of these are close to the autocorrelation functions of many typical practical surfaces. Note, however, that the results are quite general and that the specific models are only used to give an idea of actual scale. This model autocorrelation function is given by

3.139 $ρ ( h ) = sec h ( 1 2 π h A ( θ ) ) cos ( 2 π θ h A ( θ ) )$

where

3.140

where h is the sampling interval. The values of θ used are 0 and 1/2. For this autocorrelation function

3.141 $D 2 = - ( 1 / 2 π ) 2 [ 1 + ( 4 θ ) 2 ] [ A ( θ ) ] 2$

and

3.142 $D 4 = + ( 1 / 2 π ) 4 [ 5 + 6 ( 4 θ ) 2 + ( 4 θ ) 4 ] [ A ( θ ) ] 4 .$

A point needs to be made here. The rather complicated correlation function (Equation 3.139) is not the only function for which the analysis will work. It will work for any correlation function having ρ1, ρ2, etc., as values at the given spacings. This is wide enough to cover most, if not all, reasonable surfaces. The reason for this complex correlation function is simply to ensure that it has near perfect properties at the origin and elsewhere just to forestall any criticism. For practical cases the exponential, Gaussian, Lorentzian or whatever correlation function could have been picked to give an idea of the quantitative effects of sampling on the tribological properties of random surfaces.

What these results show is that by measuring the correlation values on a surface at spacings h, 2h, $3 h$

, etc., the tribo-logical summit parameters can be found simply by inserting the values of ρ1, ρ2, etc., into the derived formulae. There is no need to measure the surface itself to get curvatures etc., providing that it is reasonably Gaussian (to within a skew of ±1, see Staufert [22]). The very tricky parameters can be simply calculated by knowing the correlation values.

For theoretical purposes a model of the surface can be devised as above and the values of ρ1, ρ2, etc., calculated. The tribological parameters can then be similarly evaluated.

Figure 3.22   Expected density of summits (four and five points) of peaks (three points).

Figure 3.23   Expected density of summits (four and five points) of peaks (three points), θ = 1/2.

The expected density of summits is given in Figures 3.22 and 3.23 and the expected height of peaks or summits is given in Figures 3.24 and 3.25 for the autocorrelation function for θ = 0 and 1/2.

Figure 3.24   Expected height of summits (four and five points) of peaks (three points) θ = 0.

Figure 3.25   Expected height of summits (four and five points) of peaks (three points), θ = 1/2.

The expected four-, five-, and seven-point summits differ little as the sampling interval h exceeds one correlation length. For smaller sampling intervals the four-point expected density of summits exceeds that for the five- and seven-point expectations.

#### 3.2.11  Discussion

The technique of using discrete measurements has application in fields where it is expensive or time consuming to obtain large amounts of data. The reason for the analysis into sampling schemes is to try and see whether taking measurements using a non-conventional sampling scheme would produce any advantages to outweigh the disadvantage of complexity. The advantages are less information to collect, easier analytical derivation of theoretical results and simpler numerical methods.

The sampling schemes that were considered all had the property that the information could be collected by sampling along parallel straight lines with a fixed sampling interval. (It might be necessary, however, to have a variable starting point, though this would follow a regular pattern.)

This ensured that if a measurement (ordinate) was chosen when using a particular scheme it would always have the same number of adjacent ordinates at a distance h (the chosen sampling interval), provided the chosen ordinate is not on the boundary.

From the point of view of simplicity of sampling mechanism the square grid (k = 4) in the tetragonal case is the best. In this case the spacing between the lines is constant and equal to the sampling interval h along the line. Also the starting points for the sampling all lie along a straight line. However, other schemes do have advantages to offset their complexity.

The trigonal (k = 3) case has the advantage that measurements of slope can be taken in three directions as opposed to two for the tetragonal (k = 4) case. Though the theoretical results have been restricted to the consideration of isotropic surfaces it may still be of practical value to be able to check the assumption of isotropicity in more than two directions.

The trigonal (k = 3) case can be obtained by an alternative sampling method but this involves alternating the sampling interval from h to 2h. This alternative method is equivalent to rotating the grid through π/6. For k = 6 the sampling is very straightforward, as can be seen from Figure 3.21.

From the point of view of collecting digital information, the trigonal (k = 3) case is preferable as “less” information is collected. The density of ordinates is $( 4 / 3 3 ) h 2 ( = 0.77 / h 2 )$

(= 0.77/h2) compared with 1/h2 for the square grid (k = 4), so in the same area 23% fewer ordinates would be needed. The advantage of this would need to be weighed against the disadvantages.

Another advantage of the trigonal (k = 3) case is that fewer ordinates are used when defining the properties of the extremities. To check the definition of a four-point summit only three conditions have to be obeyed, as opposed to four conditions for the five-point summit and six for the seven point. It should also be noted that some properties of the discrete defined random variables, such as the limiting values of the (k + 1)-point summit (or peak) height as the sampling interval tends to infinity, are simply a function of the numerical definition and are independent of the surface being measured. The problem is that the surface has to be measured to get any values for the properties.

Any discrete measurement of a surface must lose information compared with a complete “map” of the surface. This is inevitable! However, ideally, any discrete measurement should produce results that converge to the results for the continuous surface as the sampling interval h tends to zero.

For sampling along a straight line (k = 2) it has been seen that the discrete results do converge to those for the continuous profile. They do not, however, converge to the results of the areal measurement. For example, D 2 peak = 0.83 Dsum, so that assuming independent measurements at right angles would produce a limit that is 17% too small.

For areal measurements when sampling with k = 3, 4, or 6, the limiting results for expected summit density and expected summit height do not converge to the results for the continuous surface. In the case of expected summit density the limit is 73% too large for k = 3, 31% too large for k = 4 and 4% for k = 6. Again for expected summit height, the case k = 3 is worse than for k = 4 and k = 6 but the differences are not so large. This suggests that some surface parameters may be estimated by discrete methods fairly well but others may not. For the case of average profile slope three sampling schemes agree (for k = 2, 3, and 4) but this is, of course, an essentially profile parameter.

In order to consider the merits of sampling schemes it is necessary to study their theoretical properties. By doing this it is possible to obtain new insights into the general problem, but only by using models which lead to tractable mathematics. The three sampling schemes with k = 2, 3, and 4 considered here have been chosen because they have a common property that enables them to be investigated using the analytical results of theoretical statistics. Using the trigonal (k = 3) symmetry case leads to a simpler mathematical model than for the tetragonal (k = 4) symmetry case, as this reduces the dimension by one. However, taken as a whole it may be that a hexagonal sampling plan where k = 6 offers the maximum benefit in terms of the three criteria mentioned above. One message which has emerged is that the conventional grid pattern method of sampling is not necessarily the best. The implications of this in general pattern recognition and image analysis scanning systems could be significant.

The result given here has been concerned primarily with the effect of sampling patterns on the values of parameters obtained from measured surfaces. It has not therefore been aimed at investigating the actual nature of surfaces in general. Well-behaved correlation functions have been assumed and certain specific examples have been used to give the researcher an idea of the value of parameter changes that might be expected to occur on typical measured surfaces. This has been justified by the fact that to some extent all instruments used for obtaining the data have a short-wavelength filter incorporated, whether it is a stylus or a light spot, which tends to force the correlation at the origin to be smooth. However, there can be no denying that the very nature of random manufacture encourages the presence of exponential and other misbehaved characteristics in the correlation function. The effect of sampling patterns on such fractal (multiscale) surfaces will be the subject of further comment in Chapter 7.

Rosen [62] has made some interesting comments concerning sampling strategies. He proposes that the sampling rate should be flexible during a measurement: the rate depending upon the nature of the surface. Alternative “sparse matrices” with variable sampling distances and sampling patterns which include star and cross as well as the conventional orthogonal sampling are suggested, Also “adaptive sampling” which uses the surface “ as found” as the basis for determining the spacing, as an alternative to fixed sampling. He maintains that the level of confidence of a reading should match the importance of the feature being measured. His philosophy is that the measurement strategy should be related to the variation of the topography over the whole product and not just one part of it. A good way to approach this problem, he says is functionally: a functional element like a piston ring sliding along a surface will “feel” the variation of the cylinder liner roughness and will react accordingly regarding the oil film formation and the sealing properties, which will strongly affect the life, energy consumption, emissions, and power loss of the engine.

He splits the problem into two: one tackling the local variations and the other the global variations.

The technique proposes the production of parameter variation maps, both globally and locally, to make sure that critical measurements are taken where needed most and that only enough measurements are taken to ensure a specific level of confidence and that once achieved the measurements stop. This way the dual objectives are met namely that the measurements are significant and that the time used is a minimum. The criteria for this sampling plan are admirable but the problem is that there would have to be as many plans as functions and consequently would be too flexible to be practical.

In the above sections much emphasis has been placed on statistical parameters, distributions and functions. Some effort has to be expended to make sure that the values obtained are realistic. The next section therefore will revert back to a simple discussion of the processing of these parameters. However, because of its central importance in random process theory, the role and evaluation of Fourier transforms will be described here.

#### 3.3.1  Amplitude Probability Density Function

This contains the height information in the profile. It has been used extensively in one form or another in surface analysis. Various terms are used to describe the function, its derivatives and its integral.

In Figure 3.26 the value for the amplitude probability density function (APDF) at a given height is called p(z) and is the number of profile ordinates having a numerical value between z and z + δz divided by the total number of profile ordinates. There are two basic ways of evaluating the APDF digitally. The first and most obvious way is to select a height interval, say between z and z + δz, and to scan through the whole profile data, counting how many ordinates lie within this height range. The height interval is changed and the operation is repeated. This is carried on until all the vertical range of the profile has been covered. In this method the height interval is selected before the counting is started. The other method involves examining every profile ordinate in sequence, finding to which height interval it belongs and registering a count in the store position corresponding to this level. The advantage of the last method is that the whole of the amplitude distribution (APDF) is found after just one traverse and the only store requirement is that of the number of height intervals to cover the height range; the profile itself need not be stored. Measurement of the central moments, skew and kurtosis can be made from the APDF directly.

Flow Chart 3.1   Global sampling strategy. (From Rosen, B-G., XII Int. Colloquium on Surfaces, Chemnitz, January 28–29, 2008. With permission.)

Figure 3.26   Statistical height distributions.

For instance, the RMS value Rq is given by

3.143 $R q 2 = ∑ i = 1 n ( i Δ z ) 2 p i$

where Δz is the quantization interval of the pockets and where the mean

3.144 $z ¯ = ∑ i = 1 n ( i Δ z ) p i$

and there are N levels of pi , in the APDF. Notice that the Rq value is subject to an error of $Δ z / 12$

which is a penalty for being in discrete rather than continuous form. These moments can be measured more accurately without recourse to increasing the store as running summations in terms of each other. It is not necessary to determine pi as such, the natural quantization interval of the incoming data q being the only limitation in accuracy. Thus

3.145 $R q = | 1 m ∑ i = 1 m z i 2 − ( 1 m ∑ i = 1 m z i ) 2 | 1 / 2$

where the number of ordinates is m and the z values are measured from an arbitrary datum.

#### 3.3.2  Moments of the Amplitude Probability Density Function

3.146 $skew = [ 1 m ∑ i = 1 m z i 3 ∑ z i ∑ z i 2 + 2 m 3 ( ∑ z i ) 3 ] / [ 1 m ∑ z i 2 − ( 1 m ∑ z i ) 2 ] 3 / 2$
3.147 $Kurtosis = [ 1 m ∑ z i 4 + 4 m 2 ∑ z i ∑ z i 3 + 6 m 3 ( ∑ z i 2 ) − 3 m 4 ( ∑ z i ) 4 − 3 ] / [ 1 m ∑ z i 2 − ( 1 m ∑ z i ) 2 ] 2 .$

In other words the central moments can all be expressed in terms of the moments about an arbitrary level. It is the central moments which contain the information in surface metrology.

All the important moments can be found directly from the APDF, even the Ra value. It is given by

3.148 $R a = 2 ( z ¯ ∑ i = 1 z i < z ¯ N p i − ∑ i = 1 z i < z ¯ Δ z i ⋅ p i )$

so that, providing the APDF is measured as the input signal enters, all the moments, peak values (subject to a Δz limit on accuracy) and Ra can be measured without actually storing the data!

The distribution function is merely the cumulative sum of pi , up to a given level. Thus

3.149 $P j = ∑ i = 1 j p i .$

The bearing ratio, material ratio, or Abbott–Firestone curve often used in surface metrology are all 1–Pj .

One of the problems in assessing the profile parameters without evaluating pi , is that the number of ordinates can be so great that the summations may overload the computational word length.

These techniques are not restricted to the measurement of a surface profile; any waveform can be analyzed in the same way whether it is derived from the profile itself, the slope of the profile or whatever. There is a problem, however, in isolating the different components of, say, texture. This is because, although these quick and economic techniques for statistical moment evaluation do not require knowledge of the mean level, they do assume that the signal follows the general direction of the surface. In other words, it is not possible to eliminate the need for a reference line but it is possible to relax the stringent positioning of it relative to the profile. Fortunately this is not a problem in measuring slope or curvature; the general trend of the waveform will be zero because of the differentiation that takes place.

#### 3.3.3  Autocorrelation Function

There are a number of ways in which the autocorrelation function can be evaluated; which one is chosen depends on time and space factors. Most often the temporal (serial) form is used for convenience although the ensemble (parallel) method is strictly the correct method. The most obvious way for the temporal method is to store all the initial filtered data of the profile, say m points, z 1, z 2,..., zm , and then to evaluate, step by step, the correlation coefficient corresponding to each lag point, making sure that the average values are taken. Thus for a lag jΔx the correlation array

3.150 $A ( j Δ x ) = 1 m − j + 1 ∑ i = 1 m − j + 1 z i z i + j / 1 m ∑ i = 1 m z i 2 .$

This method requires that all the data are stored, which may well not be convenient. Under these circumstances an alternative way is possible which, although taking longer, requires considerably less storage. The data are sifted and operated on as it becomes available from the input; the correlation array is built up in parallel rather than serial as in the first method.

Organizationally the operation is as shown in Figure 3.27. After the m ordinates have been operated on the array A(1(r)j) is normalized with respect to the variance of zi with the mean value removed and the result in A(1(r)j) is the autocorrelation function. Only j data storage locations have been required as opposed to m for the previous method. Because j is of the order of m/10 a considerable gain in space can be achieved. This method is used when storage is at a premium. Fortunately, with the spectacular increase in storage capability schemes like this are becoming redundant.

Figure 3.27   Autocorrelation computation.

By far the most often used method is that involving the FFT simply because of the time saving. The method is different from the others because it involves getting the power spectral density first and then deriving the autocorrelation from it.

#### 3.3.4  Autocorrelation Measurement Using the Fast Fourier Transform

Schematically the calculation follows the steps below:

• Profile z(x)→z(kΔx) (sampled profile) fast Fourier transform
• Discrete Fourier transform using fast Fourier transform
• P(hΔf) power spectral density
• Discrete inverse Fourier transform using the fast Fourier transform
• A(iΔτ) autocorrelation function

In this technique the basic FFT operation is performed twice, once to get the power spectrum and once to get the autocorrelation function from it. This is the reverse of the old method in which the power spectrum is obtained from the autocorrelation function. Note that P is not the distribution function of equation 3.149.

Apart from the obvious time advantage there is a storage advantage. For autocorrelation this involves a realization that two steps are required and, furthermore, because of the nature both of the data and of the transformation some economy can be made. In this case the original data—the profile, say—is real. This means that the discrete Fourier transform (DFT) will be conjugate even (the amplitude terms have even symmetry about the frequency origin while the phases have odd) and consequently only half the transform needs to be evaluated—the other half can be deduced and the storage allocated to these points can be used elsewhere. The original data points are completely replaced in the store by an equal number of transformed points in the FFT as described in Section 3.8.2. Furthermore, in undergoing the inverse discrete Fourier transform a further gain can be effected because, although the data (in this case the Fourier transform of the original profile) is conjugate even, only the cosine transformation needs to be used—the phase is lost in the autocorrelation function. Hence space which would normally be used to house the imaginary part of the complex numbers (resulting from the FFT) can be used for other purposes.

All these methods use the “time average” or “temporal” version of the autocorrelation function for evaluation rather than the ensemble average where it is understood in the case of surfaces that the term temporal really means spatial average because it is the surface geometry which is being used and not a voltage or other time function.

3.151 $A ( τ ) = E [ z ( x ) z ( x + τ ) ] .$

This choice is for purely practical reasons. It is easier to evaluate A(τ) from one record than to measure it from a number of records in parallel. It does assume ergodicity, however.

A similar function known as the structure function S(τ) can be obtained by measuring the expected variance between ordinates z 1 and z 2 separated by τ

3.152 $S ( τ ) = E [ ( z 1 - z 2 ) 2 ] = 2 σ 2 [ 1 - A ( τ ) ] .$

The structure function is a form of cumulative autocorrelation function. It has the advantage that any slope on the waveform is eliminated and the value z 1z 2 can be evaluated directly by means of a skid-stylus combination without the need to have two independent styluses or without having to store any data (providing that σ is known). If σ is not known, a digital filter can be used to preprocess the data in real time using very little storage. From the point of view of classification it is not necessary that σ be known because it is a constant unless fractal—the essential variation of the structure with the lag τ is the same as the autocorrelation function, that is

3.153 $d S ( τ ) d τ = - k d A ( τ ) d τ .$

#### 3.3.5  Power Spectral Density

The power spectral density can be evaluated from the data either via the autocorrelation function or the Fourier transform of the data. The former method is possible because of the Wiener–Khinchine relationship linking the two together. Thus in continuous form

3.154 $P ( ω ) = 2 ∫ 0 ∞ A ( τ ) cos ( ω π ) d τ .$

In practice the record from which the autocorrelation function has been measured is finite, of length L. Thus

3.155 $A ( τ ) = 1 L - τ ∫ 0 L - τ f ( x ) f ( x + τ ) d x .$

This limits the available length of autocorrelation that can be used to get a power spectral density. This truncation of the autocorrelation causes some problems in the frequency domain. These problems are twofold: one is that the statistical reliability of the information is limited, and the other is that the shape of the truncation causes spurious side effects in the spectrum. Truncation in the extent of the record is equivalent to multiplying the data waveform by a box function. This box function when transposed into frequency is highly resonant; it produces “ringing” around simple peaks in the spectrum, for instance frequencies corresponding to the feed of a tool. The transform of the box function is a sine function, which has a considerable lobbing up to 25% of the peak value. To reduce this in practice a weighting function can be applied to the autocorrelation function prior to transformation. A criterion for the shape of this is that ringing does not occur, or at least to only a few per cent.

The box function is an example of what is in effect a “lag window.” This lag window would normally be a box function, but it can be shaped in the correlation domain to have a minimum effect in the frequency domain. The frequency equivalent of the lag window is called a spectral window. The criterion for a well-behaved spectral window is that it should have a highly concentrated central lobe with side lobes as small and rapidly decaying as possible [24].

The most used lag window WL (τ) is that due to Hanning, which has the formula

3.156

where τmax is the maximum autocorrelation lag allowed by reliability considerations. An alternative is also due to Hamming

3.157

Some other examples are shown in Figure 3.28, which also illustrates the well-used Gaussian window. This is unique in that the spectral window is the same shape as the lag window.

Assuming that the Hanning window is chosen the digital equivalent formula for the power spectrum is given by

3.158

For the power spectrum the sample points are taken Δτ apart, usually equal to Δx, the spacing of the ordinates or a multiple of it.

Another way to measure the power spectrum is directly from the data using the FFT routine described earlier. First the periodogram |F(ω)2 is obtained by transforming the real data. This will yield N transform points corresponding to N real data points. To get the PSD from this it is necessary to apply the spectral window corresponding to the Hanning (or other) lag window. Now this is operated on the frequency data by means of convolution. Thus

Figure 3.28   Windows for the correlation function.

3.159 $P ( ω 0 ) = 0.2 5 ( F ω − 1 ) ) 2 + 0.5 ( F ( ω 0 ) ) 2 + 0.2 5 ( F ( ω + 1 ) ) 2$

for the Hanning window, where ω–1, ω0, and ω1 are adjacent frequencies in the array. For the special case where ω = 0:

3.160 $P ( 0 ) = 0.5 ( F ( 0 ) ) 2 + 0.5 ( F ( ( ω + 1 ) ) 2$

and for the Hamming window

3.161 $P ( ω 0 ) = 0.2 3 ( F ( ω − 1 ) ) 2 + 0.5 4 ( F ( ω 0 ) ) 2 + 0.2 3 ( F ( ω + 1 ) ) 2 .$

#### 3.4.1  General

Filtering methods are the natural way to isolate specific bands of information of the surface. Obviously, originally the most usual way to apply filtering techniques was to pass the signal through a passive or active electrical network so that the breakdown of the signal occurred in frequency. This is not the natural domain of the surface, which is spatial, so the recent introduction of digital methods has been an important step in surface analysis. Each digital measurement can now be referred more easily to a point on the surface. Fundamentally the great advantage of filters, as mentioned in the previous Chapter, is that they take the signal as it is and do not assume any particular waveform for the surface. They take the waveform “as received” and operate on it, unlike best-fit polynomial curves which can completely distort the remnant signal if the order of the polynomial is mismatched compared with the general shape of the surface.

#### 3.4.2  Convolution Filtering

The first step is to work out the impulse response of the filter and express it in digital form or sequence. Thus, let h(n) be the digital sequence representing the impulse response and z(n) be the profile signal. Then the output from the filter g(n) becomes

3.162 $g ( n ) = Σ m = 0 n h ( m ) z ( n - m )$

where h(m) and z(n–m) are zero outside the limits.

The relationship between the weighting function and the impulse response has been explained in Section 3.2.2.

### Table 3.5   Ordinate Spacing and Normalizing Factor

No of Ordinates per Cut-off Length

K

0.25 mm Cut-off

0.08 mm Cut-off

2.5 mm Cut-off

50

0.02

0.005

0.016

0.05

150

0.0067

0.001 67

0.003 33

0.0167

Figure 3.29   Application of weighting function. (a) Standard and (b) phase-corrected function (Gaussian filter).

The working formula is

3.163 $g = K ( a 0 b 0 + a 1 b 1 + ...+ a m b m ) .$

Usually a 1, a 2 are equally spaced ordinates, but not necessarily; K is a constant equal to 1/the number of samples per cut-off length. The b terms are the digitized values of the weighting function (reversed impulse response of the filter).

The required spacings, and the associated value of K, are shown in Table 3.5.

How the weighting function appears digitally with respect to the profile (shown with the stylus) is illustrated in Figure 3.29. Here the shape of the weighting function is that of the 2CR filter Figure 3.29a and b for the Gaussian case.

A pictorial demonstration of how the weighting function is constructed can be seen in Figure 3.30. The 2CR filter is shown because it is asymmetrical, i.e., causal, and shows clearly the fundamental difference between the weighting function and the impulse response, but once the weighting function has been determined the principle is exactly the same for all other weighting functions. Basically the weighting function for the standard type of filter is the time (space) inverted version of the impulse response, but in the case of phase-corrected filters, the weighting function is the same as the impulse response because it is an even function in x. In all roughness filters there is an impulse which corresponds to the profile and a time dependent portion from which the weighting function is obtained and which is used to obtain the mean line.

To get the mean line value multiply each weighting factor by the profile ordinate in line with it, add the products and divide by the normalizing factor, which is 1/K. This gives the height of the mean line above the same datum and in the same units used for the profile ordinates.

Each mean line ordinate that has been calculated in this way refers to that profile ordinate which is directly in line with the zero time value of the weighting function or its center of symmetry. The weighting factor corresponding to the maximum b 0 of the weighting function corresponds to the first term in a causal system: the b 0 value for example in the standard 2CR filter.

Figure 3.30   (a) Profile, (b) unit impulse response of 2CR filter, (c) inverted impulse response (without impulse), and (d) tabulated version of weighting function1.

Figure 3.31   Gaussian weighting function from convoluted box functions.

A comparison of Figure 3.29a and b immediately illustrates the difference in the shape of the two weighting functions and, more importantly, it shows the fact that the mean line position acts in a different place relative to the weighting function for a phase-corrected weighting function than it does for a weighting function which is not phase-corrected.

Subtraction of the mean line value from the profile ordinate to which it refers gives the filtered profile at that point. Taking the average of the differences (without taking account of sign) over a number of cut-off lengths, usually two, gives the Ra parameter.

#### 3.4.2.1  Repeated Convolutions

An interesting property of the Gaussian function can be used in one application in surface metrology, and this is the use of a Gaussian filter to exclude waviness. The property is that if a number of windows are convoluted together they will always eventually produce an effect which is Gaussian, irrespective of the shape of the window. A good example is the box function (Figure 3.33). Three convolutions of the box function will produce an equivalent weighting function window which is already very close to Gaussian (Figure 3.31).

Some instrument manufacturers advocate the Gaussian filter simply because it has got this property. It means that if a running-average procedure is repeated three times, the end result is as if the profile signal had been subjected to a low-pass Gaussian filter. The very simplicity of the technique makes it fast and inexpensive.

In this form the weighting function has a low-pass characteristic that is the waviness profile. To get the surface roughness value of the surface data corresponding to the mid-point of the weighting function should be selected and the waviness line, as found by this Gaussian weighting function taken from it (Figure 3.32).

The Gaussian shape has some advantages. Probably the most important is that it is recognizable in industry, being the basis for the acceptance curve used in statistical process control. This means that production engineers are comfortable working with it. Also the curve is always positive and it falls off rather faster than the 2CR transmission curve. Large components of waviness therefore do not impinge on the roughness.

Figure 3.32   High pass weighting function.

The weighting function is given by

3.164 $h ( x ) = 1 α λ c exp ( - π ( x λ c α ) 2 )$

and its transform giving the transmission curve of Equation 3.165 where x is measured from the axis of symmetry.

3.165

Another more practical reason for using Gaussian type filters is that they minimize the RMS duration of the product of h(x) and H(w) where w x = 2π/x.

Bodschwinna [26] proposed a modified form of the Gaussian filter incorporating a second order polynomial to reduce the influence of end effects caused by the finite duration of the weighting function. The combination of least-square polynomial as well as filtering is somewhat messy. However, he suggests a form h(x) given by

3.166 $h ( x ) = 1 λ c π C [ 1.5 - π 2 C λ c 2 x 2 ] exp ( - π 2 C λ 0 2 x 2 )$

which has a transmission

3.167 $H ( 1 x ) = [ 1 + C ( λ c λ ) ] exp ( - C ( λ c λ ) 2 ) .$

Krystek [60] uses a spline to get rid of form and waviness and he produces a complicated form for the filter. It is questionable whether the extra complication is needed.

Wavelets have also been proposed as a means of filtering. The ability to vary the resolution and range—usually in octave bands [25] makes them useful in fractal analysis [27]. The term “mathematical zoom lens” has been used for wave-let analysis. Apart from the standard filtering use of wave-lets, which has a questionable credibility it seems that defect detection using “raised wavelets” has some benefits [28].

#### 3.4.3  Box Functions

While on the subject of box functions, it should be recognized that, when convoluted with the profile signal, a box function is a simple running average, the extent of the average being the length of the box. This average is taken to be at the midpoint of the box (Figure 3.33).

The process of using the averaging procedure was advocated by Reason [29] as a means of producing a mean line of the surface texture. It was then called the “mid-point locus” line. Unfortunately it has a poor frequency response (Figure 3.34). That is,

3.168 $B ( x λ c ) ⇔ sin [ 2 π | ( λ c / x | ] 2 π | λ c / x | .$

However, the “averaging” procedure produced by a running box function is a convolution and, therefore, the whole process acts as a filter. Furthermore, it is in effect a phase-corrected filter providing that the output is taken to act at the center of the box. It therefore represents the first attempt at phase-corrected (or linear phase) working in surface metrology.

It is obvious that in order to get no distortion of the signal but just an attenuation at certain frequencies a phase-corrected characteristic should be used. Another way to get phase-corrected characteristics, which is simple, is to use a “double-pass” method. This is useful when only analog methods are available, but the technique has also been used digitally. To see how this technique works imagine a filter has an impulse response h(t) whose Fourier transform is H(ω) Then if the signal is put through this filter and the output reversed and put through the same filter again the final output is

3.169

Figure 3.33   Mid point locus mean line.

Figure 3.34   Mid-point locus frequency characteristic.

This is achieved because turning the output round is equivalent to conjugation, that is

3.170 $h ( T - t ) ⇔ exp ( - j ω T ) h * ( ω ) ,$

where T is the time taken to reverse the output signal and enter it into the filter again. So the first pass is H(ω). The second pass (in the reversed mode) is simply

3.171

This equation has the linear phase term exp(–jωT) in it. This is phase corrected about the same time T.

Physically this means that to enable the signal (suitably filtered) to take on phase-corrected characteristics it has to be delayed by time T. It is impossible to get phase-corrected properties without the delay because an impulse response cannot be an even function about t = 0 and all realizable systems are causal (i.e., do not exist before t = 0). This argument is the same in spatial terms.

To use this technique it is essential that the square root of the intended final transmission can be taken, that is H(ω) must exist. This means that any characteristic, if capable of being expressed as H(ω)2, can be made into a phase-corrected filter by using this reversal technique. Once this is done the calculation is very fast and the storage small.

#### 3.4.4  Effect of Truncation

The finite length of the impulse response may call for comment. Many impulse responses have infinite extent (IIR) (infinite impulse response). Practically, in communication theory they are difficult to use because of the need to store masses of data points. What happens is that the impulse response is curtailed as in the case when only 100 ordinates are used. This results in truncation errors. These can be quantified easily enough because, instead of the frequency output being G(ω) = H(ω)Z(ω) it becomes

3.172 $G ′ ( ω ) = H ( ω ) * B ( ω ) Z ( ω )$

where B(ω) is the frequency spectrum of the truncating function, usually a box function. This shows the other operation performed by the box function, a multiplication rather than a convolution in averaging.

The effect of truncation is shown in Figure 3.35.

The amount of truncation allowable is usually decided by the percentage error produced on one of the averaging parameters, such as the Ra value. It has been found that making the weighting function between two and three cut-offs long suffices for an error of less than 5%. Even this can be greatly reduced by increasing (or decreasing) the weighting factor ordinates so that, if the truncation length is a, then

3.173 $∫ - a / 2 a / 2 h ( x ) d x = 1$

Figure 3.35   Truncated function (left) and normalized Fourier (right) for: (a) no truncation, (b) little truncation (c) big truncation truncating function.

rather than the ideal

3.174 $∫ − ∞ ∞ h ( x ) d x = 1 .$

If this crude compensation is used then at least the average frequencies are correct.

Later in this chapter recursive filters will be discussed. Here it will be found that they have less of a truncation problem than the convolution methods because the equivalent weighting function builds up as the number of evaluated points increases.

Errors in parameters resulting from truncated weighting functions are easy to evaluate for Rq and, hence, assuming a correction factor, for Ra . But for peak parameters the problem is much more complicated.

However, it can be argued that communication criteria should not be dominant in surface metrology. Two questions have to be asked. The first is whether an IIR means anything functionally when two surfaces make contact. The second is whether the requirements for filtering should be different for the functional applications of surfaces and for the control of manufacture.

Referring to the first question, it could well be that the effective impulse response could use, as a weighting function, the pressure ellipse of the Hertzian contact zone [30] (Figure 3.36). The term functional filtering has been used [31, 32]. The advantage of such an approach is that it more nearly represents what goes on in a functional situation. The disadvantage is that the instrument maker cannot provide every possibility: the obvious answer is to let the customer choose the filter nearest to his/her requirement.

Figure 3.36   Use of weighting function in functional simulation.

The advantage of the convolution method, using a weighting function representing a functional condition has many advantages. One is that it is versatile. The filter characteristics can be chosen to fit the function, within reason; the function can effectively be shaped to fit. The disadvantage is that the arithmetic operation is expensive on storage and takes a long time to perform. Simple tricks like the equal-weight method outlined below can be used. There are, however, other methods which preserve the equal-spacing criteria of ordinates.

#### 3.4.5.1  Overlap Methods

The first technique is called the overlap-add method. If the profile is long (N 1) and the weighting function h(n) short (N 2), the profile is split up into samples (of N 3 ordinates). Convenient values of N 2 would correspond to the sampling length

3.175

The convolution becomes

3.176

The duration of each of the convolutions of Equation 3.176 is (N 3 + N 2–1) samples—so there is a region of (N 2–1) samples over which the ith convolution overlaps the (k + 1) convolution and the outputs from each therefore overlap and so have to be added; hence the name overlap-add when referred to the outputs.

The second method to be used, which achieves an output reasonably quickly, is called the overlap-save method. This differs from the previous one in that it involves overlapping input sections rather than output sections and is used most often in the filtering of periodic signals, as in the case of roundness. This technique makes use of circular convolution.

In this method the input is sectioned into overlapping sections of length L and overlap M–1. Then each of the input sections is convoluted with the filter and the resultant outputs are added together, but the first M–1 outputs from each section are discarded because these will have been the last M–1 values of the previous section. These methods are valuable in metrology when using conventional computers because they do allow some output to be obtained very quickly rather than to have to wait until all the input data is stored. This time factor is not essential for post-process inspection but can be serious when in-process measurement is envisaged.

Different methods can also be used to speed up the processing time, one of which follows.

The evaluation of filtered results in the frequency domain can obviously be done by taking the data, doing a FFT on it, and then multiplying the spectrum by the desired filter characteristic. To get the filtered profile all that is needed is to carry out an inverse FFT. This is much quicker because all convolution operations are replaced by multiplications. However, if functional filtering is the objective, for example in contact situations, the spatial convolution method is more suitable because the weighting function can be shaped (as, for example, to the shape of a pressure distribution) to run across the surface (Figure 3.36).

#### 3.4.5.2  Equal-Weight Methods

In the general convolution form for filtering shown earlier, the output at time t is given by g(t) where

3.177

The constant here is a constant of proportionality dependent on the ordinate spacing h. In this equation the bi represent weighting function ordinates derived from the impulse response of the filter. Letting the constant be h’ (equal to the reciprocal of the density of ordinates in a sampling length), the equation can be written in a different form which makes better use of the data. The equation shows that each ordinate ai is multiplied by a weighting function: bi × h’ represents a strength or area of weighting function to be associated with ordinate ai . Each ordinate of the profile coming within the weighting function has a multiplier associated with it of a different value (i.e., different area in the numerical integration). It is possible to simplify the calculation by arranging that some profile ordinates have equal-value multipliers instead. The filtering is achieved by the selection of the position of ordinates. Thus, in, only those ordinates selected would be operated on. The areas are all of equal value equal to V, say. Then the output of the filter becomes

3.178 $V × Σ i = 0 k a i .$

This method has a number of advantages. First k is usually much smaller than N, perhaps 15 instead of 200. Second the a values of the profile are simply added and not multiplied, therefore it is much less prone to freak values than the equal-spacing weighting function. The only disadvantage is that the ai values are not equally spaced: their location is inversely proportional to the height of the weighting function. Their location addresses have to be stored in much the same way that the weighting function ordinates in the conventional form have to be stored. However, the look-up process is much faster than the multiplication process and gains in speed of an order of magnitude are possible. This numerical procedure for equal weight can also be applied to a graphical method—which needs to be possible so as to enable calibration routines to be established for people without computers. So, instead of multiplying equally spaced ordinates each by a different weighting factor, a useful approximation can be arrived at by multiplying unequally spaced ordinates by a constant factor, which reduces to adding up the graph ordinates at these spacings and multiplying by the factor (or dividing by its reciprocal if that is more convenient). The use of 26 ordinates was found to give reasonable accuracy and a convenient divisor of 20. A 26-ordinate template designed for 0.8 mm cut-off and a horizontal magnification of 100 has been used with surprisingly accurate results ~5% for the Ra value.

#### 3.4.6  Recursive Filters

The convolution method requires a lot of storage and computation. The output depends on previous and present inputs. Since a filter effectively contains a memory of past inputs, it might be expected that considerable savings in total effort could be obtained by calculating the past and present inputs and past outputs.

This technique is analogous to feedback in a linear control system and for this reason is called recursive. With conventional filtering methods such as that involved in the 2CR filter, the savings in time and storage over the convolution method are substantial. For instance, the example given earlier, which involved 100 weighting factors requires 100 multiplications and additions for each output point. The recursive approach reduces this to the order of only four each.

To appreciate this method (which is similar in every way to the autoregressive moring average equation (ARMA) models used for surface characterization examined in Chapter 2) an example will be given.

#### 3.4.6.1  The Discrete Transfer Function

A convenient notation for the discussion of sampled systems is the z transform in which the term z –1 is the unit-delay operator. Then a sequence of sampled values can be represented as the coefficients of a power series in z –1. For instance, the infinite sequence X = 1, B, B 2, B 3, can be written as

3.179

The unit-delay operator is related to the z transform operator as

3.180 $z - k ⇒ δ ( n - k )$

There are several ways of converting the continuous filter into a digital one. The standard method is that known as the impulse-invariant technique in which the discrete impulse response is identical to the sampled impulse response of the continuous filter. In this the continuous function is expanded into partial fractions and then transformed by

3.181 $1 p + a → 1 1 - exp ( - a T ) z - 1 .$

where T is the sampling interval so that although digital filters can be directly realized, it is more usual to approach the problem from a linear analysis. The Laplace operator is p.

Tables exist for transformations of this type. The difficulty of this approach is that the gain of the digital filter is proportional to the sampling frequency, which may cause problems if this frequency is not always the same value. Also, it may be noted that the continuous transfer function may not easily resolve into a form suitable for transformation.

A way of avoiding the need to expand the transfer function is to map the p plane onto the z plane.

A suitable way of doing this is to use the bilinear substitution:

3.182 $p → 2 T ( 1 - z - 1 1 + z - 1 ) .$

One of the main difficulties of this method may be shown by considering the unit circle in the z plane. Now

3.183 $z = exp ( - j ω d T ) p = z − 1 z + 1$

becomes

3.184

so that, as required, the imaginary axis of the p plane is mapped onto the unit circle of the z plane. However, it also follows that

3.185 $ω a = tan ( ω d T 2 ) .$

Thus the method causes a non-linear warping of the frequency scale. It would, therefore, generally be necessary to precompensate the continuous function.

A third method uses an approximation of the convolution integral on a continuous transfer function that has been either factorized or resolved into partial fractions. This method, of which the impulse-variant approach is a special case, has a gain that is independent of sampling rate and is more readily applied to a high-pass filter than the impulse-invariant transformation.

As computational requirements suggested the use of a cascade approach and the 2CR filter causes no factoring problems, the third method of transformation can be used and this will be discussed in more detail.

For convenience in the following analysis, the single-stage filter will be considered throughout. The two-stage filter is derived directly from it by squaring the transfer function. The basic form of filter section is the low pass and such sections in cascade have their transfer functions multiplied. So let the general low-pass filter have the form

3.186

Considering a single stage

3.187 $H ( p ) = Z ( p ) X ( p ) = 1 p + a$

and expressing its time domain response by the convolution integral, assuming the filter to be initially relaxed gives, for the output z(t) and input x(t),

3.188 $z ( t ) = ∫ 0 t exp [ - a ( t - τ ) x ( τ ) d τ ]$
$z ( t ) = exp ( − a t ) ∫ 0 t exp ( a τ ) x ( τ ) d τ .$

In a digital system, time is necessarily discrete (it is no longer “real time”) and Equation 3.188 may be set to successive values nT and nT-τ where n is an integer:

3.189 $z ( n T ) = exp ( - a n T ) ∫ 0 n T exp ( a τ ) x ( τ ) d τ$
3.190 $z ( n T − T ) = exp [ - a ( n T − T ) ] ∫ 0 n T − T exp ( a τ ) x ( τ ) d τ .$

Multiplying Equation 3.190 by exp(–aT) and subtracting from Equation 3.189 gives

3.191

This integral requires that an analytic expression for x(τ) exists within the interval (nT–T) to nT. An approximate solution can be obtained by applying the constraint, which is in any case normally imposed by the A/D conversion, that x(τ) is constant in the interval nT–T ≤ τ < nT. Then Equation 3.191 becomes

3.192 $z ( n T ) = exp ( - a T ) z ( n T - T ) + exp ( - a n T ) x ( n T - T ) exp ( a n T ) a [ 1 - exp ( - a T ) ] z ( n T ) = exp ( - a T ) z ( n T - T ) + [ 1 - exp ( a T ) ] a x ( n T - T ) .$

Since time (nT–T) is one unit delay from time nT the discrete transfer function can be found directly to be

3.193 $H ( z - 1 ) = Z ( z - 1 ) X ( z - 1 ) = 1 a [ 1 - exp ( - a T ) ] z - 1 ( 1 - exp ( - a T ) z - 1 ) .$

Having established the transfer function for the low-pass section, the other elementary sections can be derived. For instance, the bandpass can be expressed as the difference of two low-pass sections. Of particular interest is the high-pass section, which has a transfer function

3.194 $H ( p ) = p p + a = 1 - a p + a .$

Thus the high-pass is realized as a direct link of the profile in parallel with and opposite to a low-pass section. However, in the discrete system, if this approach were used, the fact that time is not continuous means that the signal applied to the low-pass section could not affect the output until a finite time later and so would not combine properly with the direct-link signal. The filter response would thus deteriorate, so it is necessary, for best performance, to introduce a compensating delay in the direct-link path. To find the required delay, it is necessary to estimate the phase angle of the discrete low-pass section. Using Equations 3.192 and 3.194 and grouping the constants gives

3.195

Substituting p = j and applying de Moivre’s theorem gives

3.196

so that the phase angle is

3.197 $φ = - ω T - β - tan - 1 ( B sin ω T 1 - B cos ω T )$

where β is a constant.

For B close to unity, which is equivalent to the product |aT| being much less than unity, Equation 3.197 simplifies to

3.198 $φ = - ω T - β - tan - 1 ( cot ω T 2 ) .$

From this, the group delay of the section is

3.199 $- d φ d ω = T 2$

so that a delay of half a sampling period should be introduced into the direct path. The high-pass section now has the form

3.200 $h h p ( z - 1 ) = z 1 / 2 - A z - 1 1 - B z - 1$

which is not a realizable linear digital filter since its transfer function is not rational in z –1. A further transformation is therefore required and a suitable one is z –1z –2 which in the continuous filter would be p→2p. Thus, to retain the same frequency response under this transformation, the cut-off frequency must be doubled and the fully realized high-pass section is

3.201 $H h p ( z - 1 ) = z - 1 - [ 1 - exp ( - 2 a T ) ] z - 2 1 - exp ( - 2 a T ) z - 2 .$

Thus the required two-stage high-pass filter has the transfer function [30]

3.202 $H ( z - 1 ) = | z - 1 - [ 1 - exp ( - 2 a T ) ] z - 2 1 - exp ( - 2 a T ) z - 2 | 2 .$

The discrete transfer function of Equation 3.201 has a pole near the Nyquist frequency (1/2T Hz) which is caused by the frequency-doubling procedure of the foregoing analysis and so will not be usable near this frequency.

It is the correction for group delay used here that gives the improved high-pass characteristics of the convolution integral approach over that of the impulse-invariant method. If the analysis embodied in Equations 3.196 through 3.200 were carried out on the impulse-invariant-derived transfer function, it would be found that the group delay was T/2, that is, a half time-period advance would be needed as compensation. Although this could be arranged when processing previously stored data, it is not a physically realizable system since it requires information about the future. It may be noted that this problem is similar to that encountered when a phase-corrected filter is being realized, because a true linear phase method requires both linear and group delay correction.

The single-stage low-pass section of Equation 3.193 can be rearranged to give

3.203 $z = A X z - 1 + B Z z - 1$

where X and Z are the input and output sequences, respectively.

The high-pass section described by Equation 3.201 can be implemented as it stands or multiplied out to give

3.204 $H h p ( z - 1 ) = z - 1 - A z - 2 - B z - 3 1 - B z - 2 .$

It will be noted that the parallel representation, Equation 3.193, requires less arithmetic than the canonical form. In general, therefore, this approach should be both faster and more accurate.

#### 3.4.6.2  An Example

A particular application of the recursive 2CR filter is in the fast assessment of profiles. With the limited memory size available at the time, the saving of the need to store a weighting function is significant and, for on-line work, the filter should preferably work in “real time.” In this sense, “real time” means that the time taken to process one sample in the filter is less than the smallest sampling period used. For the reasons discussed in the previous sections, the chosen method of implementation could be a cascade implementation of the “convolution-approximation” section.

The requirements of other programs with which the filter has to be used demands that, initially, there should be 400 samples per cut-off. (This requirement is to ensure that a reasonably well-defined 100:1 bandwidth for average wavelength measurements could be produced.) However, at small cut-offs (sampling lengths) this represents samples considerably less than one stylus width apart and requires high sampling rates. (It is likely that, in general, less severe restrictions will be imposed.) As an example, for the two-stage filter to demonstrate 50% transmission at the cut-off it can be shown that for each stage

3.205 $a = 2 π λ λ c$

or, if there are N ordinates per cut-off, a = 2π/NT.

Thus for an N of 400, the exp(–2aT) will be very close to unity and, consequently, 1–exp(–2aT) will be near zero. The multipliers that are used in the filter are fractional and so cannot be handled by the integer arithmetic. There are methods of pre- and post-scaling coefficients and coordinates which allow fractional quantities to be handled but in this case the near zero value is likely to introduce considerable errors. In the interests of accuracy it may be better to use floating point arithmetic although, since this depends on computer software, it is slower in operation.

Recursive methods can only be used in cases where the discrete transfer function can be written in terms of a rational ratio of polynomials such as

3.206 $H ( z - 1 ) = a 0 + a 1 z - 1 + a 2 z - 2 ... 1 + b 1 z - 1 + b 2 z - 2 ... .$

The question arises of whether this can be suitable for linear phase (phase-corrected) filters. Another point is whether the poles and zeros of Equation 3.206 are realizable. In practice the latter is decided by the transmission characteristics of the filter and whether or not it suffers discontinuities. From the earlier discussion on optimized filters it is clear that the 3:1 and 2:1 filters are not well behaved in so far as they have discontinuous slope in the transform and so cannot be directly implemented in a recursive mode. However, the fact that these filters cannot be implemented does not mean that linear phase filters as a whole cannot. One simple way of implementing filters of the kind shown above relies on the ability to describe the transfer function in a square root mode.

Remember that, if the impulse response is h(t) whose transform is H(ω) then reversing the output can be shown to be equivalent to conjugation, with an additional linear phase term; that is, if h(t)⇔H(ω) then

3.207 $h ( T - t ) ⇔ exp ( - j ω T ) H * ( ω )$

where H*(ω) is the complex conjugate of H(ω). Hence, passing the signal through a filter, storing the output, reversing it and passing it back through the filter yields

3.208 $H ( ω ) H * ( ω ) exp ( j ω t ) = | H ( ω ) | 2 exp ( - j ω t ) .$

If a recursive method is possible then this method is very fast.

#### 3.4.7.1  Areal Case

The FFT routine can be used very effectively in surface metrology because it is very easy to implement any desired filter characteristic. Because the input data (i.e., the surface) is real, the storage is reduced by a factor of 2 relative to conventional complex data. Once transformed the desired amplitude characteristic is simply a multiplication of the transform data by the required characteristic and then this is followed by an inverse FFT to reconstitute the filtered signals. Phase effects are invariant if the linear phase filter is being simulated.

As shown in the earlier section on FFTs there is a very big speed advantage of filtering in this way. However, there is the problem that all the data has to be stored before filtering can take place. This is not the case in the direct convolution method, so depending upon the storage capability either one or the other can be used. It is invariably a compromise between storage, speed, and accuracy. From the latter point of view the effective weighting function length is, as in the recursive method, equal to the length of the data.

Whatever is true for profile filtering is even more true of 2D filtering. One method due to Rabiner and Gold [8] allows a 2D (i.e., areal) filtering to take place in terms of profile filtering methods. If x, y are integer numbers in the xy plane and ν, ω in the frequency domain, then the areal transform of z(x,y) is Z(ν,ω) where

3.209

or

3.210 $Z ( v , ω ) = Σ x = 0 N 1 - 1 exp [ - j ( 2 π x ω N 1 ) ] { Σ y = 0 N 2 - 1 z ( x , y ) exp [ - j ( 2 π y v N 2 ) ] } .$

The term in braces in Equation 3.210 is a series of N 1 1D DFTs obtained by varying x from 0 to N 1–1 as shown in Figure 3.37.

So the transforms for each profile can be taken with x fixed for each profile, yielding N 1 transforms. These give the transform for a given ω given Z(ν,ω). Once Z(ν,ω) is mapped it can be multiplied (because it is in the frequency domain) by whatever 2D filtering characteristic is required, so fast convolution using the FFT is probably the most important means for realizing 2D filters. Note, however, that every one of the profile graphs shown in the figure has to have mechanical fidelity relative to each other. It is no good measuring each profile using a skid instrument because this will lose the reference level for each profile and the 2D spectrum will have little macro-geometry spectrum detail.

The filtered characteristic would become

3.211 $Z f ( v , ω ) = Z ( v , ω ) W ( v , ω )$

where W is the amplitude characteristic of the required filter in two dimensions. From this the filtered surface zf (x,y) emerges as

Figure 3.37   Areal FFT.

3.212 $z f ( x , y ) = Σ v=0 N 1 - 1 Σ ω =0 N 2 - 1 Z ( v , ω ) W ( v , ω ) exp [ j ( 2 π x v N 1 + 2 π y ω N 2 ) ] .$

This is much more efficient than the direct convolution. A typical value of the number of points in a surface map is 250,000. To give some idea of the advantage in time of this method, although the ratio of data points (assuming 250 per profile) is 250:1 the ratio of the FFT method to the direct method is about 50:1 for a profile and about 14,000:1 for the surface area, so the gain in time is tremendous. The problem for surface metrology is visualization; a true contact acting over an area is easy to imagine and is a functional effect. Unfortunately, because its effect is non-linear it cannot readily be transformed into the frequency domain. It can therefore still be advantageous to develop convolution-type operations in the spatial domain in order to simulate contact and other functional situations.

#### 3.5  Examples of Numerical Problems in Straightness and Flatness

The best-fit line has a slope m given by

3.213 $m = Σ X Σ Z - N Σ x z ( Σ x ) 2 - N Σ x 2$

and m is relatively small.

If the spacings of measurement in the x direction are taken to be of equal increments then the slope becomes

3.214 $m = 12 ∑ N i = 1 i z i - 6 ( n + 1 ) ∑ N i = 1 z i N ( N 2 - 1 ) = Σ i = 1 N z i [ 1 2 i N ( N 2 - 1 ) - 6 ( N + 1 ) N ( N 2 - 1 ) ]$

or

3.215 $m = ∑ i = 1 N z i ( k 1 i - k 2 )$

where

3.216 $k 1 = 1 2 N ( N 2 - 1 ) and k 2 = 6 N ( N - 1 ) .$

So, from the computational point of view, each z can be multiplied by a weighting factor which, for a given fixed N, can always be the same.

Similarly for flatness, the slopes become

3.217 $M 1 = 1 2 ∑ i = - N / 2 i = N / 2 ( i ∑ j = - M / 2 j = M / 2 z i j ) M N ( N + 1 ) ( N + 2 )$
3.218 $M 2 = 1 2 ∑ i = - M / 2 i = M / 2 ( i ∑ j = - N / 2 j = N / 2 z i j ) M N ( M + 1 ) ( M + 2 )$

assuming that the measurements are being taken from the center of the coordinate system.

Typical of the dilemmas that can arise is the play-off between the quantization and the sampling of a roughness waveform to determine the best-fit least-squares line. Usually one requires a small quantization interval and high sampling to get good results. But if one of these is not available it may not be advantageous to hold to the other; a mutual relaxation may sometimes be necessary. In this example, the least-squares mean line slope m as above is given by

3.219 $m = 12 ∑ N i = 1 i z i - 6 ( N + 1 ) ∑ N i = 1 z i N ( N 2 - 1 )$

where the z values are profile ordinates taken at N equal unit intervals along the surface. The essential numerical point to notice is that the numerator is made up of the difference between two summations. Two quantization factors influence the value of m that is obtained; first, the resolution of the digital measurement of the analog signal, and second the resolution in the computer, that is the word length. The best way to show these effects is by a simple example.

Suppose that five measurements are taken on a surface using a very high-resolution A/D converter and let these be processed in a computer of a long word length. Let the numbers as measured be 10.000, 10.000, 10.000, 10.000, 10.1999. Substituting these into the equation for m yields

$m = 1 8 11.9 4 - 1 8 07.1 6 4 1 2 0 = 0.0 3 98 = 2.2 7 9 0 o .$

If now the same measurements had been taken with an A/D convertor capable of seeing only three decimal digits, the numbers would be 10.0, 10.0, 10.0, 10.0, 10.2 giving

$m = 1806 - 1803.6 1 2 0 = 0.0200 = 1.145 8 o .$

Again if a high-resolution A/D convertor was used and the computer could only work to four decimal digits (13 bits), then

$m = 1811 - 1807 1 2 0 = 0.0333 = 1.90 9 o .$

Finally if the A/D convertor measures three digits and the computer four, then

$m = 1806 - 1803 1 2 0 = 0.025 = 1.43 2 o .$

Four different answers have been obtained (2.279°, 1.1458°, 1.909°, and 1.432°) using what were intended to be the same data from the surface profile—nearly 100% variation!

Part of the discrepancy is due to the small variations between the numbers being suppressed by the limited resolution of the A/D convertor and part due to the small difference between the two summations in the numerator being suppressed or modified by the limited word length. Both have entered into it. Yet another factor has to be taken into account and this is the sampling rate. If there were three times as many numbers, each of the originals counted three times, what effect does this have? If the A/D convertor has three digits and the computer four, the answer should be 0.025 = 1.432° in the five-ordinate case, but it is 0.0089 = 0.511° for the 15 ordinates 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.1, 10.1, 10.1, which contain exactly the same information. The reason for this further discrepancy is that merely adding more numbers into each of the summations of the numerator eventually leads to the least significant numbers (which contain the vital information) being rounded off or truncated. This is due to the limited word length. Again, as before, too much sampling aggravates the problems associated with quantization.

What is to be done? A number of possibilities are open. The first is only to use independent samples; do not use redundant information by sampling too fast. Next make sure that the real variations are preserved in the computer. This can be achieved in the example above by removing an estimate of the mean value of the data from each measurement before working out the numerator. Under these circumstances, the data becomes 0, 0, 0, 0, 0.1 and the danger of the summation overflowing the word length limitation is considerably reduced. In addition to these measures, it is always possible to use double word length arithmetic if the computer allows it, and it is also possible to increase the resolution of the A/D convertor at the expense of extra work and slower data capture rates. In general it is more advantageous to anyone working on the numerical analysis problem than it is on validating the data itself. The latter is of limited use without the former.

#### 3.6.1.1  Least-Squares Evaluation of Geometric Elements

In assessing the departure from straightness, roundness, flatness, sphericity, cylindricity, conality, and other geometric forms the method of least-squares occupies a dominant place, being the preferred method often included in international and national standards. Although this would appear to be straightforward, it is not so because of the many algorithms that could be used to achieve the assessment. Depending on which is used the result could be obtained quickly and accurately or, sometimes, not at all. Also, another problem that arises is very often the parameter is not linear, in which case, prior to applying least-squares some linearization has to take place, from which an approximate solution is obtained and used in the next iteration.

In the text so far the least-squares method has been used in a number of situations. There is, however, one general approach that might prove to be useful. This is based on the technique developed at NPL by Forbes [33] for use in coordinate-measuring problems. The technique is valid for surface metrology problems providing linearization is taken into account. The algorithms employ a common approach which has stable parameterization of the elements. What has to be watched carefully is the nature of the data obtained from the measuring instrument. Surface metrology instruments are different from dimensional measuring machines. In measuring an arc, for example, both can produce different forms of data; one can usually easily be linearized, the other cannot! Very unstable results can be produced if one sort of algorithm is applied to the wrong sort of data.

The actual shapes of elements are those already considered in Chapter 2 and which are specified in BS 7172 [34].

The rationale is as follows. It concerns parameterization, which in turn concerns the way in which the problem of solving for the best-fit solution is posed.

As an example, the best-fit plane will be discussed. This plane or any other can be specified by a point in it, say (x 0, y 0, z 0), and the direction cosines of the normal perpendicularly such that a 2 + b 2 + c 2 = 1. Thus, any plane can be defined by six parameters which are not all independent of each other.

If a plane exists as above and it is required to see how a set of points fit to it the intuitive thing to do is to measure the distance of each point from it. For a point x , y , z the distance d is given by

3.220 $d = a ( x - x 0 ) + b ( y - y 0 ) + c ( z - z 0 ) .$

The ith point has a distance di :

3.221 $d i = a ( x i - x 0 ) + b ( y i - y 0 ) + c ( z i - z 0 )$

and the one way to estimate the goodness of fit is to look at S = ∑d 2 i. This sum S depends on the parameters of the plane x 0, y 0, z 0 and a, b, c. These parameters have to be chosen to minimize S.

The steps for a least-squares fit are therefore as follows.

1. Choose parameters to describe the position, shape and sometimes also the orientation and size of the geometrical part.
2. Derive a formula for the distance of a point from this geometric object.
3. Express the sum of squares S in terms of a set of data points and their distances from the geometric object.
4. Develop a stable algorithm for determining the parameters such as a, b, c, x 0, y 0, z 0.

A precursor to evaluating the best-fit least-squares to improve the numerical accuracy is to find the centroid of the data points x, y, z and to remove these from the general data points so that the new xi, yi, zi are equal to the original xi , yi , zi – –x, –y, –z.

The case for the best-fit line and best-fit plane reduce to the simple case of solving a matrix equation that is finding the eigenvectors of matrices.

#### 3.6.1.2  Optimization

In general the function $S ( u ) = Σ i M ( u )$

has to be minimized with m data points and, say, n parameters to be minimized with respect to u where

3.222 $u = ( u 1 , u 2 , ... , u n ) T$

where T indicates the transpose and u, the vector form of u.

#### 3.6.1.3  Linear Least Squares

Here di is a linear function of the parameters u and there exist constraints aij and bi such that

3.223 $d i = a i 1 u 1 + ...+ a i j u j + ... a i n u n - b i .$

This is a set of linear equations which can be put in the matrix form Au = b:

3.224 ${ a 1 1 a 1 2 a i n a 2 1 a 2 2 a 2 n ⋮ . ⋮ a m l ... a m n l } { u 1 u 2 ⋮ u m } = { b 1 b 2 ⋮ b n } .$

In general m > n so that all the equations will not be satisfied simultaneously. Reliable and efficient algorithms do exist to solve Equation 3.285 in the least-squares sense:

3.225 $A T A u = A T b .$

These are the “normal equations.”

In Chapter 2 it has already been shown that, in surface metrology, this approach is really sufficient, because in the case of a circle and sphere, for example, the limaçon approach basically changes the equation problem from being that of a quadratic to that of being linear. It is this feature which differentiates the coordination measurement from that of the surface metrology approach. Rarely is it impossible to linearize the system from the point of view of surface metrology. This is because the “surface skin,” of whatever shape, is in practice so very much smaller than the dimension or position of the geometric object. Surface metrology instruments are built to see only the former; coordinate-measuring machines have to cater for the latter.

However, because the two methods are converging, as the scale of size reduces as discussed in Chapter 8, it is informative to take the coordinate-measuring approach because ultimately surface metrology will be incorporated.

#### 3.6.1.4  Eigenvectors and Singular Value Decomposition

For linear systems a good algorithm follows.

Given a square matrix B, an eigenvector u of B is such that

3.226 $B u = λ u$

for some eigenvalue λ The case in question is such that

3.227 $B = A T A ,$

as in Equation 3.226, for some m × n rectangular matrix A where m > n. In this situation a stable numerical solution is obtained by finding a “singular value decomposition” (SVD) of the matrix A. In this A can be written as a product

3.228 $A = U S V T$

with U and V orthogonal matrices and S a diagonal matrix containing the singular values of A. If i is as in Equation 3.227 the squares of the diagonal elements of S are the eigenvalues of B and the columns of V are the corresponding eigenvectors. These are usually produced as standard output from most software implementations. SVD is now standard for many solutions. See, for example Ref. [44].

#### 3.6.2.1  Planes

The steps are as follows.

1. Specify point x 0, y 0, z 0 on a plane
2. Evaluate direction cosines (a, b, c) of a normal to a plane; note that any point on a plane satisfies
$a ( x - x 0 ) + b ( y - y 0 ) + c ( z - z 0 ) = 0$
3. Evaluate distance from plane
$d i = a ( x i - x 0 ) + b ( y i - y 0 ) + c ( z i - z 0 )$
4. Describe the algorithm

The best-fit plane P passes through the centroid $x ¯ , y ¯ , z ¯$

and this specifies a point in the plane P. It is required to find the direction cosines of P. For this (a, b, c) is the eigenvector associated with the smallest eigenvalue of

3.229 $B = A T A$

where A is the m × 3 matrix whose ith row is $( x i - x ¯ , y i - y ¯ , z i - z ¯ )$

; alternatively (a, b, c) is the singular vector associated with the smallest singular value of A. Thus, an algorithm to find the best-fit line in 3D is:

1. Calculate the centroid $x ¯ , y ¯ , z ¯$
2. Form matrix A from the data points and $x ¯ , y ¯ , z ¯$
3. Find the SVD (singular value decomposition) of A and choose the singular vector (a, b, c) corresponding to the smallest singular value. The best-fit plane is therefore $x ¯ , y ¯ , z ¯$ , a, b, c.

Similarly for the best-fit line to data in two dimensions.

#### 3.6.2.2  Circles and Spheres

These shapes involving an axis of revolution are usually evaluated by linearization of the basic equations mechanically as stated by the process of radius suppression by mechanically shifting the instrument reference to a position near to the surface skin of the geometric element being measured. Failing this an iterative method has to be used. The Gauss–Newton iterative method can be used when the relationship between the distances di and the parameters uj is non-linear. Hence an iterative scheme has to be used. This is similar to the Deming method given in Chapter 2. The situation is shown in Figure 3.38.

One iteration of the Newton algorithm for computing the zero of a function is as follows.

Figure 3.38   Gauss–Newton method.

Suppose that there is a first estimate u0 of where the function u crosses the u axis. Then:

1. Evaluate f(u0 )
2. Form a tangent to the graph at (u0 , f(u0 )) as shown in Figure 3.38
3. Find u1 where the tangent crosses the u axis

Then

3.230 $u 1 = u 0 + f ( u 0 ) f ' ( u 0 ) = u 0 + p .$

u 1 is now the new estimate of where f(u) crosses the u axis. This is repeated until the result is close enough to u*.

Basically the Gauss–Newton method is as follows.

Suppose there is a first estimate u of u*. Then solve the linear least-squares system

3.231 $J p = - d$

where J is the m × n Jacobean matrix whose ith row is the gradient of di with respect to u, that is

3.232 $J i j = ∂ d i ∂ u j .$

This is evaluated at u and the ith component of d is di (u). Finally, the estimate of the solution is

These steps are repeated until u is close enough to u*. Ideally, changes in the iteration should be small for this method to be quick in convergence and stable.

For example, for the best-fit circle:

1. Specify circle center x 0, y 0, radius r. Note that (x–x 0)2 + (y–y 0)2 = r 2
2. Obtain distance from the circle point:
3.233 $d i = r i - r r i = [ ( x i - x 0 ) 2 + ( y i - y 0 ) 2 ] 1 / 2$
3. The elements of the Jacobean J are
3.234 $∂ d i ∂ x 0 = - ( x i - x 0 ) / r i ∂ d i ∂ y 0 = - ( y i - y 0 ) / r i ∂ d i ∂ r = - 1 .$
4. Algorithm: knowing x 0, y 0, and r for the circle center and radius estimates, use them in a Gauss–Newton iteration. Form J p = d from the d of Equation 3.233 and the J of Equation 3.234
5. Solve
3.235 $J ( p x 0 p Y o p r ) = - d$
for p
6. Update the x 0, y 0, r according to
3.236 $x 0 ≔ x 0 + p x 0 y 0 ≔ y 0 + p y 0 r ≔ r + r r .$

Carry on until successful and the algorithm has converged.

The linear best-fit circle can be evaluated by an approximation described earlier used by Scott [61] (Chapter 2, Ref. [172]). In this model

3.237 $S = ∑ f i 2$

is minimized, where $f i = r i 2 - r 2$

rather than ri–r as in the linear case—the trick is to make the $f i 2$ linear.

By changing the parameters f can be made into a linear function of x 0, y 0, and ρ = x 2 0 + y 2 0 r 2,

3.238 $f i = ( x i - x 0 ) 2 + ( y 0 - y i ) 2 - r 2 = - 2 x i x 0 - 2 y i y 0 - ( x 0 2 + y 0 2 - r 2 ) + ( x i 2 + y i 2 ) .$

Thus

3.239 $A ( x 0 Y o ρ ) = b$

(from which x 0, y 0, and ρ are found) where the elements of the ith row of A are the coefficients $( 2 x i 1 2 y i 1 − 1 )$

and the ith element of b is $x i 2 + y i 2$ .

An estimate of r is

3.240 $x 0 2 + y 0 2 - ρ .$

This can be used to get a first estimate of the parameter for the non-linear method if required.

Both the linear and non-linear methods described above can be used for spheres.

#### 3.6.2.3  Cylinders and Cones

It has been suggested [33] that a modified Gauss–Newton iterative routine should be used in the case of cylinders because one of the parameters is the direction of a line that is the axis of the cylinder. Such a line x 0, y 0, z 0, a, b, c, in 3D can be specified by four parameters together with two rules to allow the other two to be obtained:

Rule 1: represent a direction (a, b, 1)

Rule 2: given the direction above, ensure z 0 = –ax 0by 0

For nearly vertical lines these two rules give stable parameterization for a, b, x 0, and y 0. The problem of finding the distance of a data point to an axis is quite complicated. The following strategy is therefore followed based on the fact that for axes which are vertical and pass through the origin, a = b = x 0 = y 0 = 0 and all expressions become simple.

The strategy is as follows:

1. Iterate as usual but, at the beginning of each iteration, translate and rotate the data so that the trial best-fit cylinder (corresponding to the current estimates of the parameters) has a vertical axis passing through the origin.
2. This means that when it is time to evaluate the Jacobean matrix the special orientation can be used to simplify the calculations. At the end of the iteration use the inverse rotation and translation to update the parameterizing vectors x 0, y 0, z 0, a, b, c, and thereby determine the new positions and orientation of the axis.

Note:

To rotate a point (x, y, z) apply a 3 × 3 matrix U to the vector (x, y, z) T ; the inverse rotation can be achieved by using the transpose of U:

3.241 $( u v w ) = U ( x y z ) . ( x y z ) = U T ( u v w )$

A simple way to construct a rotation matrix U to rotate a point so that it lies on the z axis is to have U of the form

3.242 $U = ( C 2 0 S 2 0 1 0 - S 2 0 C 2 ) ( 1 0 0 0 C 1 S 1 0 S 1 C 1 )$

where Ci = cosθ and Si = sinθ i , i = 1, 2. So if it is required to rotate (a,b,c) to a point on the z axis, choose θ1 so that bC 1 + cS 1 = 0 and θ2 = aC 2 + (cC1 bS 1)S 2 = 0.

These notes are only suggestions. There are other methods that can be used but these are most relevant to geometric parts like cylinders, which in surface metrology can usually be oriented to be in reasonable positions that is for a cylinder nearly vertical.

Care should be taken to make sure that the algorithm is still stable if reasonable positions for the part cannot be guaranteed.

Cylinder

1. Specify a point x 0, y 0, z 0 on its origin, a vector a, b, c pointing along the axis and radius r
2. Choose a point on the axis. For nearly vertical cylinders
3.243 $z 0 = - a x 0 - b y 0 c = 1 .$
3. Distance of the chosen point to cylinder is
3.244 $d i = r i - r r i = ( u i 2 + v i 2 + w i 2 ) 1 / 2 ( a 2 + b 2 + c 2 ) 1 / 2$

where

3.245 $u i = c ( y i - y 0 ) - b ( z i - z 0 ) v i = a ( z i - z 0 ) - c ( x i - x 0 ) w i = b ( x i - x 0 ) - a ( y i - y 0 ) .$

To implement the Gauss–Newton algorithm to minimize the sum of the square distances the partial deviation needs to be obtained with the five parameters x 0, y 0, a, b, r (the five independent variables for a cylinder). These are complicated unless

3.246
3.247 $∂ d i ∂ x 0 = - x i / r i ∂ d i ∂ y 0 = - y i / r i ∂ d i ∂ a = - x i z i / r i ∂ d i ∂ b = - y i z i / r i ∂ d i ∂ r = - 1 .$

Algorithm Operation

1. Translate data so that the point on the axis lies at the origin:
$( x i , y i , z i ) ≔ ( x i , y i , z i ) - ( x 0 , y 0 , z 0 ) .$
2. Transform the data by a rotation matrix U which rotates a, b, c to a point on the z axis:
3.248 $( x i y i z i ) ≔ U ( x i y i z i ) .$
3. Form the right-hand side vector d and Jacobean according to Expressions 3.244, 3.246, and 3.247.
4. Solve the linear least-squares system for P x0, etc.
3.249 $J { P x 0 P Y 0 p a P b P r } = - d .$
5. Update the parameter estimates to
3.250 $( x 0 Y o z 0 ) ≔ ( x 0 Y o z 0 ) + U T { P P Y 0 - P x 0 P a - P y 0 P b } ( a b c ) ≔ U T ( P a P b 1 ) r ≔ r + P r .$

These steps are repeated until the algorithm has converged. In Step 1 always start with (a copy of) the original data set rather than a transformed set from the previous iteration.

If it is required to have (x 0, y 0, z 0) representing the point on the line nearest to the origin, then one further step is put in:

3.251 $( x 0 Y o z 0 ) ≔ ( x 0 Y o z 0 ) - ( a x 0 + b y 0 + c z 0 a 2 + b 2 + c 2 ) ( a b c ) .$

(See Forbes [33] for those situations where no estimates are available.) Luckily, in surface metrology, these iterative routines are rarely needed. Also it is not yet clear what will be the proportion of workpieces in the miniature domain that will have axes of centro-symmetry. Until now, in micro-dynamics the roughness of rotors and stators has made it difficult to measure shape. However, there is no doubt that shape will soon be a major factor and then calculations such as the one above will be necessary.

Cones

These can be tackled in the same way except that there are now six independent parameters from (x 0, y 0, z 0), (a, b, c), ϕ, and t, where t is shown in Figure 3.39. z 0 and c can be obtained dependent on the other parameters.

Figure 3.39   Coordinate arrangement for cone.

Specify the cone, a point x 0, y 0, z 0 on its axis and a vector (a, b, c) along its axis, and the angle ϕ at the apex giving information about where on the axis the cone is to be positioned. Parameterization requires a systematic way to decide which point on the axis to choose, along with a constraint on (a, b, c). For this

3.252 $c = 1 and z 0 = S 0 − a x 0 − b y 0$

for some constraint S 0, which is position sensitive and the choice of which has to be determined or specified with care depending on whether the cone is narrow angle or wide angle.

The distance of the point from the cone, di, is given by

3.253 $d i = e i cos ( φ / 2 ) + f i sin ( φ / 2 ) - t ,$

where ei is the distance from x i, yi, zi to the line specified by (x 0, y 0, z 0) and (a, b, c).

Again, as for the cylinder, making x 0 = y 0 = a = b = 0:

3.254 $e i = r i = x i 2 + y i 2 f i = z i − S 0$

and

3.255 $∂ d i ∂ x 0 = - x i cos ( φ / 2 ) / r i ∂ d i ∂ y 0 = - y i cos ( φ / 2 ) / r i ∂ d i ∂ a = - x i w i / r i ∂ d i ∂ b = - y i w i / r i ∂ d i ∂ φ = + w i / 2 ∂ d i ∂ t = - 1$

with

3.256 $w i = ( z i - S 0 ) cos ( φ / 2 ) - r i sin ( φ / 2 ) .$

For most cones S 0 is chosen such that

3.257

Algorithm Description

For cones with moderate apex angle (< 0.9π)let S 0 = 0:

1. Translate data so that the point on the axis lies at the origin:
$( x i , y i , z i ) ≔ ( x i , y i , z i , ) - ( x 0 , y 0 , z 0 , ) .$
2. Transform the data by a rotation matrix U that rotates (a, b, c) to a point on the z axis:
3.258 $( x i y i z i ) ≔ U ( x i y i z i ) .$
3. Form RHS vector d and J according to the above with S 0 = 0
4. Solve the linear least-squares system
3.259 $J { P x 0 P y 0 P a P b P φ P t } = - d .$
5. Update the parameter estimate
3.260 $( x 0 Y o z 0 ) ≔ ( x 0 Y o z 0 ) + U T { P P y 0 - P x 0 P a - P y 0 P b }$
3.261 $( a b c ) ≔ U T ( P a P b 1 )$
3.262 $φ ≔ φ + P φ t = t + P t .$
6. This step is the same as for the cylinder

For the special case when the periphery of the component is incomplete, the best-fit are has to be generated in order to provide a reference. This has been developed earlier [36]. It suffices here to repeat the formula. The formula used is the simplified version where the angular reference is taken to be the bisector of the arc angle 2θ.

Thus the center of curvature and radius are given by

3.263

These results have already been given in Chapter 2 in the roundness section. They can obviously be extended to the 3D case.

Equation 3.263 gives the best-fit conditions for a partial arc which can enclose any amount of the full circle. Often it is necessary to find the unique best-fit center to a concentric pair of circular arcs. This involves minimizing the total sum of squares of the deviations. Arc 1 has sum of squares S 1 and arc 2S 2:

3.264 $S 1 = Σ i = 1 M ( r 1 i - R 1 - x ¯ cos θ i - y ¯ sin θ i ) 2 S 2 = Σ i = 1 N ( r 2 j - R 2 - x ¯ cos θ j - y ¯ sin θ j ) 2 .$

Minimizing S 1 + S 2 and differentiating these polar equations with respect to $x ¯ , y ¯$

R 1 and R 2 gives

3.265

These equations are useful when data are available in the polar form. But when data are available in the Cartesian form the other criterion, namely minimizing the deviation from the property of the conic, is useful as described below. In this case, the equations of the arcs are written as

3.266 $x 2 + y 2 - u x - v y - D 1 = 0 x 2 + y 2 - u x - v y - D 2 = 0$

and the total sum of the squares of the deviation from the property of the arc/conic are defined as

3.267 $E s = ∑ ( x i 2 + y i 2 - u x i - v y i - D 1 ) 2 + ∑ ( x j 2 + y j 2 - u x j - v y j - D 2 ) 2$

where Ds are dummy radii. Differentiating partially with respect to u, ν, D 1, D 2, the equations in matrix form to find the solution of u, ν, D 1, D 2 are given by

3.268

Then

3.269 $x ¯ = u / 2 y ¯ = v / 2 R 1 = D 1 + ( u 2 + v 2 ) / 4 R 2 = D 2 + ( u 2 + v 2 ) / 4.$

Obviously the key to solving these sorts of problems is how to make the equations linear enough for simple solution. This is usually done automatically by the choice of instrument used to obtain the data. The fact that a roundness instrument has been used means that the center a, b is not far from the axis of rotation—which allows the limaçon approximation to be valid. If a Cooridinate measuring machine (CMM) had been used this would not be the case unless the center portions were carefully arranged.

The best-fit methods above have hinged on the best-fit limaçon technique because this is the natural way in which a roundness instrument sees the signal. Should the data be obtained with the radius not suppressed it can be treated as a circle. The equation for minimization then becomes

• k(x 2 + y 2) + ux + vy − 1 = 0

and the sum of errors S is

$S = ∑ [ k ( x i 2 + y i 2 ) + u x i + v y i − 1 ] 2 .$

Differentiating partially with respect to k, u, and ν to minimize S, the sum of squared residuals gives the matrix form

3.270 $( ∑ ( x i 2 + y i 2 ) 2 ∑ x i ( x i 2 + y i 2 ) ∑ y i ( x i 2 + y i 2 ) ∑ x i ( x i 2 + y i 2 ) ∑ x i ∑ x i y i ∑ y i ( x i 2 + y i 2 ) ∑ x i y i ∑ y i ) ( k u v ) = ( ∑ ( x i 2 + y i 2 ) ∑ x i ∑ y i ) .$

Then the unknowns $x ¯$

, $y ¯$ , and R are given by

$x ¯ = u / ( 2 k ) y ¯ = v / ( 2 k ) R = ( x ¯ 2 + y ¯ 2 + 1 ) / k .$

An alternative least-squares approach will be given at the end of the chapter for all shapes.

#### 3.6.3.1  Minimum Zone Method

The other basic method adopted in most standards is the minimum zone method. This is found by an iterative method based on the simplex method.

This method is a search routine (of which the Steifel exchange is just a subset) which is designed to climb mathematical maxima or minima. The simplex figure is obtained from the geometrical figure used in the search process. In 2D it is an equilateral triangle and in 3D it is a triangular pyramid or tetrahedron.

The basic principle is as follows:

1. Express the problem of minimization in mathematical terms:
$R i = [ ( x i − x ¯ ) 2 + ( y i − y ¯ ) 2 ] .$
The objective or object function is to get a function, say B, minimized, that is
• B = min[max R 1 − minR i ].
2. Select a set of feasible values for the independent variables and use this as a starting point. This starting point is invariably chosen as the best-fit center, that is
$x ¯ = 2 ∑ x i / N y ¯ = 2 ∑ y i / N .$
3. Evaluate the objective function from this center, that is the B value from $x ¯$ and $y ¯$
4. Choose, by means of the simplex figure, a second location near to $x ¯$ , $y ¯$
5. Compare the objective. If it is smaller move to this new point; if not try again

Rules for minimization:

1. Reject highest point of B
3. Terminate after a given number of iterations

This is one of many hill-climbing techniques but it is probably the simplest.

#### 3.6.3.2  Minimax Methods—Constrained Optimization

The criteria for parameters encountered in surface metrology fall into two categories: those based on least squares and those based on peaks and valleys or extrema.

Consider, for example, roundness. Typical references are the best-fit circle, the minimum circumscribing circle, the maximum inscribed circle and the minimum zone. The zonal methods, unlike the best-fit methods, rely for their basis on a limited number of high peaks and valleys rather than all of the data, as is the case with the best-fit methods. The problem in the extreme cases is, fundamentally, one of some sort of maximization or minimization, subject to a number of constraints. Thus, for example, the requirement for finding the plug gauge circle is to maximize the radius of a circle subject to the constraint that all data points lie on or outside the circle. In other words, the requirement for finding the plug gauge circle (or limaçon) is to find the maximum “radius” for which a limaçon may be constructed such that the limaçon lies completely inside the data representing the data being measured.

A number of methods are available for finding the optimum radius or zone subject to such constraints. Originally the assessment was based on a trial-and-error estimate. For roundness measurement, this involved drawing many circles on a chart in order to get the center which maximized, minimized, or optimized the circle or zone.

#### 3.6.3.3  Simplex Methods

This technique, although crude, could give answers accurate to a few per cent providing that the workpiece was reasonably centered—in which case it is acceptable to use compasses and to draw circles on the chart.

There is a simple method which gives a good approximation to the true method. As an example of this consider the roundness profile shown in Figure 3.40. Imagine that the minimum circumscribing circle (ring gauge method) is being used.

In all problems like this there are a number of ways of tackling the situation. Often there is a problem between technological relevance and mathematical elegance. In metrology the former has to take precedence. It has been pointed out many times by Lotze [37] that agreement on terminology and methodology should pre-empt any agreement on which algorithms to use for the calculation of any of the parameters in dimensional as well as in surface metrology. This is very true because as yet there is no agreement as to the procedures, let alone algorithms, for use in minimax routes (for example [38]). This has caused a lot of confusion between surface metrologists and coordinate-measuring machine metrologists. To illustrate this, a simple approach will be given to show that it is easily possible to get to within a reasonable range of the true value in a search routine. This approach will be followed by demonstrating that it is one application of linear programing. The basic elements of linear programing in metrology will be reviewed together with an indication of the concept of the elegant dual linear programing methods for use in metrology. Finally, an example of an algorithmic approach will be given following Dhanish and Shunmugam Ref. [39] which could, in principle, be applied to dimensional and surface metrology, with the proviso that great care in setting up the components is taken.

Select a point in Figure 3.40 at O1. This can be arbitrary or it can be based on a least-squares center [36]. Draw a circle from it with a minimum radius to touch just one point P 1. Move in the direction of OP’ at each step sweeping a radius around and reducing it until two points are at the same minimum radius. Then move the center along the bisector again reducing the radius until a further point P 3 is touched. The center position O 3 is then very often the exact center frame on which the ring gauge circle is centered. This is because only three points are needed to define a circle, the center coordinates and the radius, so three contacting points are required. The same is true for the maximum inscribed circle. Four points are needed for the minimum zone circle: two for the center and two for the two radii. A sphere needs four, a cylinder five, etc.

The method described above is not necessarily unique, even for the circumscribing circle, which should, in principle, give a unique center. The plug gauge method does not, neither does the minimum zone.

It should be pointed out here that in the case of roundness measurement it should be limaçons and not circles that are being described so that the evaluation should be of the least circumscribing limaçon, etc.

Although the method described above works in almost all cases it is not formal. The best method for constrained optimization is to use linear programing techniques. The simple method outlined above contains the basic elements of the technique, so for completeness the basic method will be outlined below and then followed by some examples. Although the basic idea for this optimizing technique has been used in the past quite independently [40], the formal derivation has been achieved by Chetwynd [41] and his terminology will be followed here. However, other attempts have also been used. It will become obvious that the subject is quite complicated and that a number of different approaches to get a workable, cheap, and fast solution are possible.

Linear programing implies constrained optimization involving either minimization or maximizing a function (called an objective function) while keeping other relationships within predefined bounds. If these relationships are linear, then they can be expressed as a set of linear parameters and the optimization becomes linear. This so-called linear programing is fundamental to the understanding and operation of the “exchange”-type algorithms which are now used extensively in metrology. See, for example, Figure 3.42.

Take the measurement of straightness, for example. The criteria, expressed in instrument coordinates, are, given a sequence of Cartesian datum points (xi, yi ), and the minimum zone value Z

3.271

Figure 3.40   Simple algorithm for minimum circumscribing circle.

for all (xi, yi ) simultaneously. This illustrates a convenient parameterization, namely a single line (slope m, intercept c) together with a zone of acceptability of width 2h set symmetrically about it. Equation 3.271 is a linear program in (m, c, h). (It is also a simple form of the minimax polynomial fit for which the so-called Steifel exchange algorithm offers an efficient solution. This may be derived from, and owes its efficiency to the properties of, the associated linear program, Figure 3.42.)

Standards present a method, originally for calculating the least-squares parameters, which has been extensively studied and is known as the “limaçon approximation” for roundness measurement. Referring to the notation adopted, the eccentric circle is reduced, providing e«R, to

3.272

This is a linearization of the parameters about the origin, which is produced mechanically by the instrument. Linearization about any other point involves considerable extra complexity of the coefficients. Whenever the limaçon approximation is valid in surface metrology the calculation of limiting reference circles becomes a linear program. For example, the minimum circumscribing figure to a set of data points (ri, θ i ) is expressible as

3.273

for all i. Others may be expressed in the form of either Equation 3.272 or Equation 3.273.

Before proceeding to develop algorithms from these formulations, it is useful to establish a practical context and a mathematical notation by first illustrating the earlier work on reference circles and reviewing, extremely briefly, the main points of linear programing theory following Chetwynd [41].

#### 3.7.1  General

A linear program is an optimization in which the objective function (e.g., minimizing a zone or maximizing a radius) and all the constraints are linear in the parameters. Using vector notation for brevity, it can be expressed as

3.274

where, for m positive parameters, x , and n constraints, c is an m vector, b an n vector and A an m × n matrix.

It is known (there is extensive literature on this subject) that the optimum solution occurs when each of the constraints (c) is satisfied to its limit by one of the parameters. Hence only certain combinations of parameter values need be examined. An orderly search through these is obtained by using the simplex method in which iterations involve only elementary row operations on the matrix–vector representation. Simplex organizes these vectors as a partitioned matrix (a tableau). This has the form

3.275

where K is A augmented by an n × n identity matrix and c is correspondingly extended by n zero elements. This adds n “slack variables” to the original parameters. If the ith parameter is limiting a particular constraint, the column Ki in K will have value 1 in the row corresponding to that constraint and zero in all other elements. The set of defining parameters so identified form the “basis.” Initially the basis is the n slack variables. Iterations attempt to match parameters to constraints in such a way that Z is rapidly maximized. It is usual always to maintain the feasibility of the current iteration by ensuring that no constraint is ever violated that is that no element of b′ becomes negative. This is one of the problems not easily addressed in the simplex method. The prime indicates the vector that currently occupies the position originally occupied by b . At each iteration, the largest positive element of c’T is chosen and its column brought actively into the solution (this is the strategy of “steepest descent”). When no positive elements remain in c’T , optimality has been achieved and the solution values are readily interpreted from the tableau.

At any iteration, the columns which originally consisted of the identity matrix carry a complete and interpretable record of the row transformations carried out on the tableau. Likewise, the columns of the current basis carry the same information in the inverse of their original form. The computationally efficient method of revised simplex does not update the full tableau but merely notes what would have been done at each iteration. The work required relates to that of inverting n × n matrices. It may, therefore, be advantageous to use a dual program. For any m × n linear program (termed the primal), an n × m dual can be defined as

3.276

where K is now the augmented form of AT (compare Equation 3.275) and the optimization has changed from minimization to maximization or vice versa. It contains exactly the same information as before, subject to the correct relative interpretation of specific elements.

#### 3.7.2  Dual Linear Programs in Surface Metrology

Straightness, flatness, and all routine roundness measurements involve reference fittings which appear naturally as linear programs. For more complex geometries, the errors inherent in parameter linearization may be judged acceptable when weighed against the computational efficiency of simplex. All the resulting formulations essentially have features in common indicating that the dual program will offer the most efficient solutions.

The sign of the parameters required for simplex cannot be guaranteed with metrological data and so each parameter is replaced by a pair having equal magnitude but opposite sign. Even then the number of constraints usually dominates the number of parameters. Thus, a circumscribing limaçon fit involves six parameters and the minimum zone seven, but typical measurements involve several hundred profile points each generating a constraint; each generates two in the case of the minimum zone because it may contribute to the inner or outer circles of the zone. The sources of the difficulties encountered with early attempts at circle fittings are now apparent. They did not exploit the simplex method of searching only certain basic solutions and, furthermore, they worked with a primal or conventional formulation involving, say, six parameters and 500 constraints or points, rather than dual which, while having 500 parameters, has only six constraints. This makes the computing long-winded, so, in moving from the primal to the dual, the roles of vectors b and c are interchanged. If at any iteration the dual is maintained in a feasible condition (all elements of c positive), the corresponding primal would be interpreted as being in an optimal, but generally infeasible, condition. The implications of dual feasibility are critical to what is to follow. Consider a physical interpretation for the case of a circumscribing limaçon (or circle). The primal feasibility condition amounts to starting with a figure which is too large but which certainly encloses the profile and then shrinking it to the smallest radius that still closes the profile. Dual feasibility would entail initially choosing a figure which is the smallest to enclose some of the data points and then expanding it as little as possible so as to include all the data— the same problem looked at a different way.

Note:

If a primal has three parameters, the dual has three constraints. The corresponding geometric observation is that a circle is defined by exactly three contacts with the data, which makes physical sense.

#### 3.7.2.1  Minimum Radius Circumscribing Limaçon

Transferring the primal, geometrical statement of the minimum radius circumscribing limaçon to the dual, the initial tableau can be written as a minimization:

3.277

At any iteration giving a feasible solution, the basis will be formed from three of these columns, so taking three general contact points at θ i , θ j , and θ k means that the basis β is given by

3.278 $β − 1 = ( cos θ i cos θ j cos θ k sin θ i sin θ j sin θ k 1 1 1 ) .$

No significance (such as θ i < θ j, for example) can be read into this matrix; the relative positioning of columns depends upon the workings of revised simplex in previous iterations. The determinant of β–1 is given by the sum of the cofactors of its third row, that is by the same cofactors which identify the elements of the third column of β. The non-negativity of the elements of the third column of β, thus, requires that these cofactors, Δ ij , Δ jk , Δ ki , must have the same sign where

3.279 $Δ i j = | cos θ i cos θ j sin θ i sin θ j | = sin ( θ j = θ i )$

and similarly for the others. Using Cartesian coordinates, the cofactor can be expressed as

3.280 $Δ i j = 1 r i r j | x i x j y i y j |$

and related to this is a function

3.281 $Δ i = 1 r i r | x i x y i y | = 0$

which (apart from an indeterminacy at the origin, of little importance here) is a straight line passing through (xi, yi ) and (0, 0) and dividing the xy plane into the two areas where Δ i > 0 and where Δ i < 0. The line is also the locus of all points having θ i as their argument. Noting the order of indices, dual feasibility requires that Δ ij and Δ ik have opposite sign and so lie on opposite sides of the line. An exactly similar argument applies to the other points and Δ = 0 or Δ k = 0. If point k is to lie on the opposite side of Δ jr = 0 from point j and on the opposite side of Δ jr = 0 from point i, it can only occupy the sector shown in Figure 3.41. As it is only in this geometry that Δ ik and Δ jk will have opposite signs, as required for dual feasibility, the following theorem, termed here “the 180° rule”, is proved. (A point known for a long time but proved by Chetwynd [43].)

Figure 3.41   The 180° rule in simplex iteration.

#### The 180° Rule

A circumscribing limaçon on a given origin to a set of points is the minimim radius circumscribing limaçon to those points if it is in contact with three of them such that no two adjacent contact points subtend an angle at the origin of more than 180°, where the term “adjacent” implies that the angle to be measured is that of the sector not including the third contact point.

A complete simplex iteration for the minimum radius circumscribing limaçon in the dual consists of selecting any point which violates the reference (conventionally, the point giving the largest violation is chosen) and substituting it for one of the points defining the reference in such a way that dual feasibility is maintained. The 180° rule allows the general iteration to be simplified to the following exchange algorithm:

1. Choose any three data points such that no two adjacent ones subtend an angle at the origin of more than 180°.
2. Construct a reference limaçon through these three points.
3. If no data points lie outside this limaçon the solution is found, otherwise choose the point which violates the reference by the largest amount.
4. Replace one of the reference points by this new point such that the 180° rule is still obeyed and go back to Step 2.

The exchange between any new point and the contacts is always unique.

An exchange algorithm depends upon the iterations moving monotonically toward an optimum solution in order to guarantee that cyclical exchanges do not occur. Here this is the case, because, as the exchange is unique at each iteration, it must be identical to the variable change at the simplex iteration of the linear program and that is known to converge monotonically.

A similar procedure needs to be followed to get the formal linear programing matrix for the minimum zone but essentially the result is as follows.

The conditions for the optimum solution to the minimum radial zone limaçons give rise to the following geometric interpretation. Expressing the zone as a band of width 2h placed symmetrically about a single limaçon:

1. All data points must lie not more than a distance h, measured radially from the origin, from the limaçon
2. There must be four data points all lying exactly h from the limaçon such that they lie with increasing angle θ i alternately inside and outside the limaçon

It may be noted that this alternation property is not unique to this problem; it occurs, for instance, in the Steifel exchange algorithm for best-fit polynomials, which may also be derived by a linear programing method [42]. These rules may be used to formulate an exchange algorithm. Thus:

1. Choose arbitrarily four data points.
2. Fit to these a reference limaçon such that they are radially equidistant from it and lie alternately on either side of it with increasing angle.
3. If no other points are further from the reference the solution is found.
4. Otherwise substitute the point which lies furthest from the reference for one of the four defining points such that the new set of points lie alternately on either side of the reference and return to Step 2.

#### 3.7.3  Minimum Zone, Straight Lines, and Planes

The minimum separation parallel and straight lines belong to the well-documented class of minimax polynomials, that is curves having the smallest possible maximum divergence from the data. The condition for this to occur is that, relative to an nth-order polynomial, the data must have n + 2 maxima and minima, all of equal magnitude. The solution can be found by the Steifel exchange algorithm, which proceeds by fitting the polynomial according to this condition to n + 2 points and then exchanging points further away from it with the defining set, while maintaining the condition. In terms of the minimum zone straight lines there will be three points, two contacting one line and one the other in an alternate sequence, which are iterated by exchanges (see Figure 3.42).

The minimum zone planes can be expressed, in instrument coordinates, as

3.282

for all data points (xi, yi, zi ). a, b, and c are sign unrestricted and h ≤ 0. Noting that h = 0 is valid only for the trivial condition that all points are coplanar, then it may be asserted that four points will be represented in the basis of the dual, which can be expressed as

Figure 3.42   Steifel exchange mechanism.

3.283 $β − 1 = ( S i x i S j x i S k x i S l x i S i y i S j y i S k y i S l y i S i S j S k S l 1 1 1 1 )$

where Si etc., take values of +1 or –1 according to whether (xi, yi, zi ) contacts the upper or lower of the minimum zone planes. As before, dual feasibility is guaranteed if all terms in the final column of β are positive, which will be true providing that the cofactors of the final row of β–1 all have the same sign. These cofactors are

3.284 $− S j S k S l Δ j k l − S i S j S l Δ i j l S i S k S l Δ i k l S i S j S k Δ i j k .$

They must all have the same sign. Consider the determinant equation representing the boundary between positive and negative regions of Δ jkl:

3.285 $Δ j k = | x j x k x y j y k y 1 1 1 | = 0.$

It is a plane parallel to the z axis (since it is independent of z), passing through points (zl, yl ) and (xk, yk ). Dual feasibility requires that if Si = Sl (contacts with the same place) Δ jkl and Δ jki must have different signs and vice versa. So if the ith and lth contacts are with the same plane they lie on opposite sides of Δ jkr = 0, but if they contact different planes they both lie on the same side of Δ jkr = 0. A parallel argument shows that the same is true for all pairs of points.

These relationships show the relative positions of contacts that give dual feasibility (Figure 3.43). There can be two contacts with each of the minimum zone planes, in which case the plan of lines joining the alternate types must form a convex quadrilateral or a 3:1 split, in which case the single contact must lie in the plan of the triangle formed by the other three contacts.

Figure 3.43   Contacts which give dual feasibility.

Even with this most simple of 3D zone fits, the advantage of using specific exchange algorithms rather than a general revised simplex solution in an automatic system is becoming questionable—hence the difficulty of standardization.

Note:

It may, at first glance, seem surprising that limaçon fitting rather than the apparently simpler case of flat surfaces has been used as the primary example. This section, being primarily concerned with linear programing, does not report detailed comparisons between algorithms but some comments are needed. The following remarks are based on practical tests made by Chetwynd [41].

A typical roundness “profile” would have 512 equally spaced radial ordinates each resolved over a 10- or 12-bit working range. Exchange algorithm systems have now been working with data of this type in both industrial and research environments for several years and their robustness has been established. Even with an arbitrary choice of points for the initial basis, the exchange algorithm virtually always solves for the minimum circumscribing limaçon in five or less iterations, while the minimum zone only occasionally needs more than five on real engineering profiles. The earlier (primal-based) algorithms were run with poorly defined end conditions, typically making 32 relatively coarse-stepped iterations and then 32 finer steps, after which the process was terminated with a result assumed to be close to the desired optimum. The dual techniques yield at least a 10:1 saving in the number of iterations as well as giving a fully determined convergence and so better accuracy. With both algorithms the iteration is dominated by the almost identical computation and checking of the updated figure, so the program size and the cycle times are closely similar on similar machines programed in the same language. A 10:1 speed increase is also obtained.

The direct use of revised simplex on dual programs representing limaçon fitting has been studied using a specially developed package containing only the subroutines essential for solving this class of problem. Memory requirements are only slightly larger than those of exchange algorithms and execution is typically about 20% slower. This is due to the simple way artificial variables are treated. This difference can be removed at the cost of extra program length.

The limaçon fits to roundness measurement have simple exchange rules that can be expressed in a few numerical comparisons and logic operations. Thus in a specialized system both a size reduction and a speed increase would be obtained by replacing the direct use of revised simplex on the dual by an exchange algorithm. However, the exchange logic is specific, so if several different references are to be implemented there will be less shared code. With more complex geometries it is of even greater importance that the efficiency of dual-based methods is obtained. Yet, even with the simplest 3D geometry, the exchange rules are becoming quite complicated. Using a conclusion made by Chetwynd, duality theory has shown that the “obvious” geometrical method is rarely the best approach.

The study of exchange algorithms gives a very clear insight into the geometrical implications of reference fitting. This is important for metrology. Measurement should always be based on engineering relevance rather than a mathematically convenient abstraction. The exchange algorithm also provides a good method for solution by hand should it be necessary. Relatively flexible measurement systems are likely to use a more general implementation of a revised simplex algorithm. This is no cause for concern: both are firmly based on the same theoretical foundation.

Note:

Even these considerations should show that the use of dual and exchange methods require a considerable knowledge of the subject so that the parameters to be measured and the constraints to be applied can be formulated properly. The gains are massive if the mathematical background is sound, but it should never be forgotten that these methods rely on a relatively small number of geometrical points to satisfy the ultimate optimum solution. This is a weakness because of the fleeting reliability of such extrema. If this is a consideration in one dimension such as in roundness and straightness, it is even more so in the 2D cases of sphericity, cylindricity, etc. In these cases, especially that of cylindricity, the whole problem becomes very complicated because the straightforward requirement of five constraints becomes masked in the data produced by a surface metrology instrument due to tilt and eccentricity effects. For a more detailed discussion on this subject, see Ref. [43].

#### 3.7.4.1  General Algorithmic Approach

Because there is no absolutely correct algorithm or even a fully agreed algorithm for the minimax problems in surface and dimensional metrology it is informative to give an example here. The algorithm chosen is to determine the minimax zone value of a number of different forms commonly found in industry, ranging from the flat to the cylindrical in parallel with the best-fit least-squares procedures given earlier.

The reason why the minimum zone has been selected is because, next to least squares, it is often the preferred way of specifying the tolerance on a form. In fact the ISO standards refer to it often but do not attempt to indicate how to obtain it! This is not surprising because nobody has yet agreed how to do it. The optimum obviously depends on a number of things, not the least being simplicity and technological relevance rather than mathematical elegance.

Such a straightforward approach has been suggested by Dhanish and Shunmugam [39]. This approach assumes that the form parameters have already been linearized, which is consistent with the constraints of most surface metrology instruments and potentially with coordinate-measuring machines providing that the part is very well aligned. Problems such as tilt, eccentricity, and direction of measurement have been ignored. The basic problem with tilt and eccentricity has only really been attempted properly by Chetwynd.

#### 3.7.4.2  Definitions

Let the function of the surface be b, and ϕ the function which best fits the surface. Assuming that the form shape has been linearized in surface metrology

3.286 $φ = a i 1 u 1 + ... + a i j u j + ... + a i n u n .$

n is the number of variables, aij denotes the value of the jth variable at the ith place, and uj is the coefficient of the jth variable.

As in least squares the deviation ei = ϕ i bi is

3.287 $e i = a i 1 u 1 + ... + a i n − b i .$

The form error is computed as

3.288 $h i = | e max ⁡ | + | e min |$

where e max and e min are the maximum and minimum errors.

The general problem for the minimum zone is, given aij and bi find uj of ϕ such that max|ei | is a minimum (following Dhanish and Shunmugam’s method below). The steps are as follows:

1. Choose a set of m + 1 points as a reference set. It is usual to use a set based on the least-squares approximation, usually the points giving maximum deviation from the least squares. Call these points ν kj so that ν hj = aij ; i takes the values in the reference set. The steps from 2 onwards operate only on the reference set
2. Find λ k , k = 1…m + 1, solutions of the equations
3.289 $Σ k = 1 m + 1 λ k v i j = 0.$
Since these are m homogeneous equations with m + 1 unknown the solution in determinant form is
3.290 $λ k = ( − 1 ) k + 1 | v i j | j = 1... m , i = 1... m + 1 ( i ≠ k ) .$
3. Calculate the reference deviation d where
3.291 $d = ∑ k = 1 m + 1 λ k b k / ( − ∑ k = 1 m + 1 | λ k | ) .$
If the denominator is zero the reference is degenerate, so choose a new reference. This is usually done by rotating the points to include the next point in order and dropping the first.
4. Find the values of ϕ k for this reference set:
3.292 $φ k = b k + sgn ( λ k ) d k = 1... m + 1.$
If λ k = 0 then Haar’s condition is violated and the situation is ambiguous with respect to sign, as mentioned earlier, so both the positive and negative d have to be tried. For both + and – values continue to Step 6 with that sign which gives the lowest value of error. Also this sign shows the direction to go.
5. Find the leveled reference by interpolation:
3.293 $φ k = ∑ j = 1 m u j v k j k = 1... m + 1$
so m + 1 equations exist for m coefficients u 1 u 2…so that any m equations will suffice
6. Find the value d i of this reference function at all the given points. The value of d i whose absolute value is the highest is referred to as the supreme error e* if
3.294 $| e * | ≤ d .$
Then the criterion of optimality is satisfied and the search is stopped. Then the function φ is the Chebychev approximation sought and the form error is calculated based on Equation 3.288 for h i. Otherwise the point i* corresponding to the supreme error e* enters the reference set and the search is continued.
7. Find the point to be discarded from the reference set by the Steifel exchange mechanism used by Chetwynd. Solve for μ k in the equations
3.295 $∑ k = 1 m + 1 μ k v k j + a i j = 0 j = 1... m$

Since there are m + 1 variables and only m equations a suitable value for one of the μ could be assumed; then calculate qk where

3.296 $q k = μ k / λ k k = 1... m + 1.$

If sgn(d)e* > 0 the point with minimum qk should be removed from the reference set, otherwise the point with maximum qk is discarded. (If any λ k = 0 a “trial exchange” has to be tried. Each point in the reference set is replaced by the point of supreme error one at a time and the trial reference deviation is computed for each set. The combination leading to the largest value of |d| is taken as the next reference step.) With this reference set, go back to Step 2. Note that the trial reference is not the exchange therein but alternative to it in the special case λ k = 0 where the exchange will not work.

Any example taken from Ref. [39] shows how this algorithm works. There are worked examples given for flatness, roundness, cylindricity, and sphericity.

However, even though the parameterization has been linearized there is still quite a number of special situations which have to be taken into account for obtaining a general algorithm. This detail is much greater when the system has to be linearized by iteration and when the effects of misalignment and eccentricity are taken into account. It seems that the more versatile and wide ranging the instrumentation becomes, the more difficult it will be to finalize a simple general algorithm. It may be that the best that will be done is to agree on a procedure whereby the algorithms will be able to be well behaved. For the fine detail of algorithms for working out minimax problems see books on computational geometry (e.g., Ref. [44]).

#### 3.8.1  General Properties

The Fourier transform may be of use whenever frequency, either in reciprocal time or wavenumber, is of importance. This is often the case in engineering, in a temporal sense in manufacturing systems and spatially in metrology. Examples of its use in correlation and filtering have already been mentioned as well as its use in random process theory.

The general form of the equation relating a time function f(t) to a frequency function F(ω) is

3.297 $F ( ω ) = ∫ − ∞ ∞ f ( t ) exp ( − j ω t ) d t .$

Alternatively this can be written with the factor 1/2π in front of the integral. The corresponding equation connecting f(t) with F(ω) is

3.298 $f ( t ) = 1 2 π ∫ − ∞ ∞ F ( ω ) exp ( j ω t ) d ω .$

As long as there is a 2π relationship between these two equations it does not matter in which domain it occurs. Some authors use a factor of $1 / 2 π$

outside both integrals to provide some symmetry. The first equation is referred to as the Fourier integral. As the transforms are usally considered to be time functions in behavior they will be designated f rather than z which is most often used in surface geometry. Before embarking upon the FFT algorithm there will be a short listing of some of the essential properties of the Fourier integral as follows:

1. If f(t) is real, as is usually the case in metrology, then the real and imaginary parts of the F(ω) namely R(ω) and X(ω)are given by
3.299 $R ( ω ) = ∫ - ∞ ∞ f ( t ) cos ω t d t$
3.300 $X ( ω ) = - ∫ - ∞ ∞ f ( t ) sin ω t d t$
that is R(ω) is an even function and X(ω) is odd. Thus R(–ω) = R(ω) and X(–ω) = –X(ω)
2. If f(t) is even, that is f(–t) = f(t),
$R ( ω ) = 2 ∫ - ∞ ∞ f ( t ) cos ( ω t ) d$
$X ( ω ) = 0$
which is particularly relevant when f(t) takes the form of an autocorrelation function.
3. If f(t) is odd, that is f(–t) = f(t),
3.301 $R ( ω ) = 0 X ( ω ) = - 2 ∫ - ∞ ∞ f ( t ) sin ( ω t ) d t$
4. Linearity: if F 1(ω) and F 2(ω) are the Fourier integrals of time functions f 1(t) and f 2(t) then
3.302 $f 1 ( t ) + f 2 ( t ) ↔ F 1 ( ω ) + F 2 ( ω )$
where the symbol ↔ indicates Fourier transformation.
5. Scaling:
3.303 $f ( K t ) = 1 K F ( ω K ) .$
6. Shifting in time by t 0:
$f ( t - t 0 ) ↔ exp ( - j t 0 ω ) F ( ω )$
a property which is useful when applying the theory of the phase-corrected (linear phase) filter as seen earlier. A similar relationship occurs in frequency. Thus
3.304 $F ( ( ω - ω 0 ) ↔ exp ( j ω 0 t ) f ( t ) .$
7. Differentiation:
• Time:
3.305 $d n f d t n ↔ ( j ω ) n F ( ω )$
a theorem which is useful for evaluating the frequency characteristic of a complicated impulse response. A similar theorem exists which enables the reverse to be done in frequency differentiation.
• Frequency:
3.306 $d n F d ω n ↔ ( - j t ) n f ( t ) .$
8. Convolution: this has already been used in filtering techniques. In condensed form, if f 1(t)↔F 1(ω)f 2(t) ↔F 2(ω) then written
3.307 $∫ - ∞ ∞ f 1 ( τ ) f 2 ( t - τ ) d τ ↔ F 1 ( ω ) × F 2 ( ω ) f 1 ( τ ) * f 2 ( τ ) = F 1 ( c 0 ) × F 2 ( ω )$
where the symbol * denotes convolution. These relationships will be used extensively in Chapter 4. The DFT may be written
3.308 $F ( k ) = 1 N ∑ n = 0 N - 1 f ( n ) exp ( - 2 π j N n k ) n = 0 , ... , N - 1 .$

#### 3.8.2  Fast Fourier Transform

The problem with the DFT as it stands is that it takes quite a long time to compute. This situation has been changed by the rediscovery of the FFT algorithm. This is not a special transform, it is a generic term enfolding a whole set of algorithms for computing the DFT. Some algorithms are matched to specific types of data etc. Generally, using the FFT, a reduction in the number of operations by a factor of log2 N/N is possible, where “operation” is usually taken to mean a complex multiplication and an addition. Thus the larger the number of data points N the greater the advantage. The FFT is also efficient in terms of storage because intermediate results in the calculation are stored in the original data places so that no extra storage is required for those extras beyond that of the data. Yet another benefit derives directly from the computation reduction: less computation means less numerical errors. Thus the FFT has three remarkable advantages: it is faster, more accurate and requires minimal storage. The disadvantage is that there have to be as many spectral points evaluated as there are data points, which may be more than is needed.

#### 3.8.2.1  Analytic Form

This is a fast algorithm of efficiently computing F(k)

3.309 $F [ k ] = ∑ n = 0 N - 1 f ( n ) exp ( - j 2 π k n N ) .$

For N = 2 r there are many variants.

Let k and n be represented in binary form as

3.310 $k = k r - 1 2 r - 1 + k r - 2 2 r - 2 + ... + k 0 n = n r - 1 2 r - 1 + ... + n 0 W = exp ( j 2 π / N ) .$

Then

3.311 $F ( k r - 1 , k r - 2 , ... , k 0 ) = ∑ n 0 = 0 1 ∑ n 1 = 0 1 ⋅ ⋅ ⋅ ∑ n r − 1 = 0 1 f ( n r - 1 , n r - 2 , ... , n 0 ) × W k 0 n r - 1 2 r - 1 × W ( k 1 2 1 + k 0 ) n r - 2 2 r - 2 × ... × W ( k r - 1 2 r - 1 + k r - 2 2 r - 2 + ... + k 0 ) n 0 .$

Performing each of the summations separately and labeling the intermediate results the following is obtained:

3.312

This operation is performed in N log N complex multiplications through the indirect method above. If F[k] is evaluated directly as

3.313 $∑ n = 0 N - 1 f ( n ) exp ( - j 2 π n N )$

it takes the order of N 2 complex multiplications, so the FFT is obviously faster. If N is large, as it usually is for the data set in any surface metrology, the gain in time is very great. The fact that there has to be as much store for the spectrum as there is for the data are no longer a disadvantage, although at one time when storage was at a premium in computers this was.

The only better way of achieving further gains is to use one or other of the clipped transforms, such as the Walsh transform [23]. As far as speed is concerned, some instrumental techniques can dispense altogether with digital methods and use coherent optical methods, as will be seen in the section on Fourier diffraction techniques in Chapter 4. In this case the transform is produced at the speed of light. A further advantage is that because the light is shone onto the surface, no digitization of the surface is required so that optical methods are very valid in surface metrology. The possibility of an optical method that is practical for evaluating the Wigner function is also feasible because it is a real function

In what follows, some conventional digital ways of filtering will be given with particular emphasis on the types of filter used in surface metrology.

Some of the comparative properties of the Fourier transform, ambiguity function and Wigner distribution are given in Table 3.6.

As previously mentioned the transform pair involves a pair of complementary transforms, one with positive exponents and one with negative exponents, the pair taken together having a normalization factor of 1/2π or 1/N for the DFT. This choice is arbitrary. The equation with positive exponential

### Table 3.6   Comparison of the Properties of the Fourier Transform, the Ambiguity Function, and the Wigner Distribution Function

is usually called the direct transform and the negative exponential the inverse transform (IDFT):

3.314 $f ( t ) = 1 2 π ∫ ∫ - ∞ ∞ F ( ω ) exp ( j ω ) d ω .$

Sometimes both are referred to as the DFT and sometimes the IDFT is called the Fourier series, as in most textbooks.

To see how the discrete form of the FFT works consider the discrete form of f(t): f(n)

3.315 $f ( n ) = ∑ k = 0 N - 1 F ( k ) exp ( 2 π N j n ) n = 0 , ... , N = 1 .$

Equation 3.315 can be written

3.316 $f ( n ) = ∑ k = 0 N - 1 F ( k ) W n k where W = exp ( 2 π j N )$

and Wnk = exp[(2π j/N)nk] which has a period of N.

The basic method used in the FFT algorithm is to make use of the symmetry of the exponential function.

Wnk is a sinusoid of period N and it also displays odd symmetry about k = N/2 for fixed n, etc. Thus W nk = –Wn (k + N/2).

This cyclic property is of fundamental importance. It can be shown that the efficiency of the FFT routine depends on the data sequence length. In fact it is optimized for the case where N = 3 m, where m is an integer. However, for digital computer operation it is more convenient to use N = 2 m or 4 m without significant loss in efficiency. Obviously base 2 is more suitable for computational work than base 3.

To see how it is possible to derive a base 2 FFT without recourse to the general theory, consider Equation 3.315. If N is divisible by 2 the summation can be split into two smaller sums involving odd and even indices. Thus

3.317 $f ( n ) = ∑ k = 0 N / 2 - 1 F ( 2 k ) W 2 n k + ∑ k = 0 N / 2 - 1 F ( 2 k + L ) W ( 2 k + 1 )$

so that the first sum uses values of F(0) to F(N–2) and the second sum uses values of F(l) to F(N–1). Hence f(n) may be written in terms of odd and even indices. Thus

3.318 $f ( n ) = E ( n ) + W n O ( n ) where W n = exp ( 2 π j n N )$

and

3.319 $E ( n ) = ∑ k = 0 N / 2 - 1 F ( 2 k ) W 2 n k O ( n ) = ∑ k = 0 N / 2 - 1 F ( 2 k + 1 ) W 2 n k .$

Wn is a constant for constant n.

The two series in Equation 3.319 will be recognized as discrete transforms of the odd and even-indexed terms of the original series in Equation 3.317. Note that the 2 in the index of W is appropriate because these two smaller series only have a length of N/2 and not N. It is this factor that is decisive in cutting the number of operations. To see this, consider Equation 3.317. To evaluate one f(n) requires N operations and to evaluate N values of f(n) therefore requires N 2 operations where operation refers to a complex multiplication and addition. However, the situation has changed in Equation 3.318 because there are only N/2 terms in the even series and N/2 terms in the odd series. To evaluate all the even components of all the F(n) therefore requires (N/2) 2 operations and to work out all the odd components of the f(n) also requires (N/2)2. The two, together with the N multiplications of Wn and N additions of the E(n) and O(n), give (N 2/2 + N) operations. For N large this tends to N 2/2, a gain over the direct method of 50% in computation. Similarly, if N/2 is also divisible by 2 a gain may be made by breaking down the two series. This reduction process can be continued M times until finally no further decomposition can take place. The overall number of operations carried out in the entire operation is N log2 N as distinct from N 2 in the direct approach, the ratio being log2 N/N. Thus for N = 256 the computational time is only 3% of that used in the direct method.

This iterative process is shown in Figure 3.44, which illustrates how an eight-point transform can be produced by three iterations. Notice that the input points are not arranged in their natural order. The reason for this becomes clear from the figure; at each stage a pair of points depends only on the pair previously residing at the same locations. Thus the results of each iterative step can overwrite the operands without subsequently affecting future calculations. The computation can proceed “in place”; no extra storage is needed for intermediate results. The order into which the original sequence has to be arranged is successively determined by separating out the odd from the even indices from the progressively shorter sequences. It can be shown that each element of the sequence should be placed at an address given by reversing the order of the bits in the binary representation of its original address.

Because of the increasing use of the FFT routine some consideration will be given here actually to implementing the routine on a small computer, and one or two special tricks associated with metrology will be indicated. This will be by no means an exhaustive account but it should serve to illustrate some of the features of the technique. It will be followed by some general applications to metrology problems on surfaces.

#### 3.8.2.2  Practical Realization

The term practical realization will be taken here to mean a form of the algorithm suitable for use with a general purpose digital computer which can be efficiently applied to typical transform problems. It is widely recognized that the optimum computational efficiency for a specific problem is obtained only by tailoring the program to its requirements. It follows from this that a general purpose routine will need to compromise between the sometimes conflicting needs of different problems. An obvious example is that if large amounts of data are to be handled, computer storage may be at a premium and it may be preferable to make the FFT routine small at the expense of reducing its speed. For a “real-time” problem, the opposite approach of using more storage to increase the computational speed would probably be necessary.

Figure 3.44   (a) Eight-point fast Fourier transform decimation-in-frequency branch transmissions in power of W 8, out-put shuffled and (b) eight-point fast Fourier transform decimation-in-tune branch transmissions in powers of W 8, input shuffled.

The first compromise is to use a base 2 FFT algorithm. The effect of this is to get good computational efficiency and very little “housekeeping” so that the actual FFT program may be both fast and physically small. To offset this, the data must be equal to 2 m points, where m is a positive integer.

Having decided on the base 2 FFT, the actual form of the derivation used is of little consequence. For the further discussion here the approach described earlier will be used, that is that based on Equations 3.317 and 3.318.

Some general considerations will now be given to the practical realization. In the first place the general situation, which involves complex data, will be considered. This will be followed by some details of how these can be recovered in the face of the less restrictive data often found in surface metrology.

#### 3.8.3  General Considerations of Properties

Remembering that all sequences in the DFT are complex, and that, therefore, each data point will occupy more than one store address, three distinct sources of potential inefficiency can be identified. These are the complex multiply and add operations at the heart of each calculation, the addressing of the data points at each stage of the iteration and the generation of the complex exponents.

There is little than can be done about the actual operation of multiplication without the use of special hardware, particularly if floating point operation is used. In fixed point operations, where scaling between operations is used, some optimization is possible since the nature of the multiplier (the exponential term) is known in advance. Prechecking each operand for zero and unity value can save time by eliminating unnecessary multiplications, but in general the overall time of the complete evaluation may not be reduced since extra time is required before every multiplication in order to perform this test.

Each iteration of the FFT requires at least one access to each point in the data array for reading and one for writing. The order in which the points are accessed will change for each iteration, since the point pairs move their relative positions, so that the data array must be treated as random access rather than cyclic. The accessing of array elements can introduce significant overheads in terms of the speed of operation and so should be kept to a minimum. While at least two accesses are needed to each pair of locations, these always occur as a group before the locations are addressed. Thus the addresses need only be calculated once per iteration, being stored for use during accessing, without significantly increasing the total store requirements. If the exponentials are stored in a look-up table, the amount of addressing to that can be minimized by ordering the calculations so that those using a particular exponent occur consecutively during each iteration.

The generation of complex exponentials is almost universally performed using de Moivre’s theorem:

3.320 $exp ( j θ ) = cos θ + j sin θ .$

By this method it is only necessary to calculate either the sine or cosine, since they are simply related and the calculation time is relatively small. However, compared with the other operations within the FFT the time to calculate sinθ is large. Since the calculation would also require quite a lot of programing, it is more usual to use some form of look-up table for the values. Only the first two quadrants need to be stored. The form which a look-up table takes depends largely on the compromise between speed and storage. The table may be made progressively shorter by increasing the amount of calculation required to access a particular value. The fastest access is by simply storing the N/2 complex exponent values. The store may be reduced with little overhead by having two separate arrays of N/2 points for sine and cosine, which may, of course, be overlapped by one quadrant, and combining them into a complex number at access time. An interesting method of retaining fairly fast access with only small storage is to use the approximate relationship that, for small B,

3.321

Here only a short table containing widely spaced A values and another short table of B values are needed.

Also, during the second iteration W may take only the values +1, –1, j, –j. By programing the first two iterations separately outside the iterative loop, no multiplications are needed during them.

Because of the ready availability of FFT subroutines, they will not be described here.

#### 3.8.3.1  Fourier Series of Real Data

By taking into account symmetries and identities readily identifiable it is possible to take great advantage of the FFT routine. Furthermore, in the case of evaluating the autocorrelation function from the former spectral density and vice versa, additional benefits can be achieved because the data is not only real but also even (i.e., symmetrical about the origin). Under these circumstances use can be made of the cosine transform rather than the full Fourier transform.

Instead of grouping the data in such a way as to appear complex it is grouped to appear as a conjugate, even series. Thus z(n) may be described as

3.322 $f 1 ( n ) = z ( 2 n ) f 2 ( n ) = z ( 2 n + 1 ) - z ( 2 n - 1 ) f ( n ) = f 1 ( n ) + j f 2 ( n ) for n = 0 , 1 , ... , N − 1.$

Since z(n) is real and even and f(n) is conjugate even, the sequence f(n) need only be calculated for n = 0,..., N/2; only the z(n) values of n = 0,1,..., N are needed for this. Therefore only N/2 + 1 complex storage locations are needed for processing 2N real, even data.

Using this technique only N + 1 real or N/2 + 1 complex numbers need to be stored, giving a saving of four times in store over the 2N complex data series storage requirement.

Similar statements can be made about the sine transform. It is obvious that the FFT routine has many uses. Some will be outlined here.

#### 3.8.4  Applications of Fourier Transforms in Surface Metrology

Main areas of use are:

1. Digital filtering
2. Power spectral analysis
3. Correlation
4. Other convolutions
5. Interpolation
6. Other analysis, for example Cepstrum
7. Roundness and other shapes—least-squares references

#### 3.8.4.1  Fourier Transform for Non-Recursive Filtering

This is basically a convolution filter using the impulse response of the filter ω as a window for the data. Thus the output g of a filter with input f is gi = w*fi . Using the transform it is preferable to work in the frequency domain; thus

3.323 $G j = W × F 1$

since convolution becomes multiplication in the transform domain. So filtering becomes:

1. Transform data
2. Multiply transformed data by frequency weighting function
3. Invert transformed and modified data

Note that it sometimes beneficial to change the spectrum obtained from 1 above to an analytical function. This involves making ω < 0 = 0. This operation involves taking the Hilbert transform (see Section 3.10.4.2).

Whether this method is quicker to compute will depend on the lengths of the data sequence and weighting function. Recently this technique has become progressively more attractive as the data lengths have increased. Because of the reciprocal nature of the transform domains, an added advantage of the transform technique may occur: a long, spatial weighting function will produce a short-frequency weighting function so that even less operations are required.

In the frequency domain, the amplitude and phase characteristics of the filter are separated into the real and imaginary parts of the weighting function. Thus it is easy to construct unusual filter characteristics. In particular, the phase-corrected filter is constructed by having real terms describing the required attenuation and by having zeros for the imaginary terms so that no phase modification takes place. Also, the defined amplitude characteristic of the phase-corrected filter is very short in the frequency domain. In the Whitehouse phase-corrected filter it is even simpler because the frequency characteristic is mostly either zero or one and is efficient to compute.

#### 3.8.4.2  Power Spectral Analysis

The measurement of power spectral density is the most obvious application of the Fourier transform. Put simply, a data sequence from the surface is Fourier transformed and a power spectrum obtained by squaring each term independently. This gives the periodogram, which is an estimate of the power spectrum.

The periodogram is a rather unstable estimate and various smoothing methods may be applied.

#### 3.8.4.3  Correlation

The calculation of auto and cross-correlation functions may be undertaken by transform methods since the correlation function and power spectrum form a transform pair.

Autocorrelation is performed by transforming the surface data, squaring term by term and then retransforming back to the spatial domain. For cross-correlation both data sequences must be transformed and a cross-power spectrum obtained so that more computational effort is needed. However, for real data, the system of performing two transforms at once using the FFT can reduce the total time taken.

The transform technique of obtaining correlation functions always gives all possible lag values up to the total data points, whereas, particularly with autocorrelation, often only lags up to a small proportion of the data sequence length are needed. Under these circumstances it may be computationally quicker to use traditional lag calculation for the autocorrelation function, especially if the total data length is fairly small.

The DFT operates by imposing a periodicity of wavelength equal to the data sequence length in that the Fourier integral is defined for an infinite waveform made up of repeats of the actual data. In forming the correlation function it is these constructed infinite sequences that are, in effect, lagged so that points lagged beyond the end of the sequence do not “drop-off” but match up with points on the start of the first repeat. This looks like a circular correlation where the data is placed in a complete, closed circle and the lag produced by rotating this.

This effect can be useful in roundness measurement, for instance when comparing two randomly oriented profiles. However, it can lead to erroneous values in linear situations. To overcome this, a sequence of zero-value points of length up to at least the maximum desired lag should be appended to the data sequence. The action of these is made clear in Figure 3.45.

Figure 3.45   Circular correlation.

#### 3.8.4.4  Other Convolutions

The main use of convolution is covered under filter and correlation. The heading here just stresses that any convolution in space may be handled as multiplication in frequency. Beware of the circular convolution of the DFT “described” above.

#### 3.8.4.5  Interpolation

It is easily possible to produce an interpolation of a data sequence in which the harmonic quality of the original data is retained in the expanded sequence.

The discrete transform pair always has the same number of points in the sequence in either domain. Thus a short data sequence may be transferred and the frequency domain sequence expanded by adding zero-value elements at the high-frequency position (the center, since the transform is symmetrical). Inverse transformation will produce a longer data sequence with harmonic interpolation, since no nonzero-frequency components have been added.

#### 3.8.4.6  Other Analysis in Roughness

Various other signal-processing techniques use Fourier transforms. Discrete transforms can be used to produce a form of discrete Laplace transform analysis.

Another possibility is Cepstrum analysis. The Cepstrum is defined as the transform of the logarithm of the power spectrum. As such it highlights periodicities of the power spectrum, which are, of course, harmonic sequences in the original data. Should the power spectrum of the data consist of two or more harmonic sequences multiplied together in the Cepstrum (log spectrum) they will be added. The power spectrum of this will clearly separate these different added harmonic sequences.

#### 3.8.4.7  Roundness Analysis

The Fourier transform will yield roundness reference lines from full circles since the first term of the Fourier series is the amplitude of the least-squares limaçon. Similarly approximation to reference ellipses, and so on, can be obtained. Because of the computation involved, it is unlikely that the DFT would be used just to determine the reference circle.

Direct evaluation of the harmonic series (since the circle is periodic) also seems useful for assessing the performance of bearings.

Perhaps more could be done using the DFT to produce circular cross-correlations, for instance to allow the comparison of cam-forms with stored masters without the need for accurate orientation. Specific applications of Fourier transforms will be dealt with in many places within this book.

#### 3.9.1  General

The application of random process analysis has had a profound influence on the understanding of surface texture from the point of view of generation and of function. Random process analysis is concerned with the Fourier transform and its inverse. However, the success of the Fourier transform has been so great that it has brought on an appetite for more and better transforms. Do they exist?

Are there transforms which have simpler or faster properties? Does another transform tell more about the surface?

These questions have been raised many times in recent years and have produced a plethora of transforms in the literature almost equivalent to the “parameter rash” reported earlier [45].

While there is an undenied excitement in introducing a new transform into the field, care has to be exercised that it does not produce an over-reaction. What usually happens is that a transform may give a benefit in one area, but not overall.

The transforms introduced fall into two main categories:

1. Faster and simpler ways of producing comparable results

In the former class there are the Walsh, Hartley, Hadamard, BIFORE, and Haar transforms (all orthogonal transforms) and in the latter case the Wigner, ambiguity, wavelet, and Gabor transforms (these being space–frequency transforms).

Another factor that enters into the question of the use of transforms different from the Fourier is that the old criteria of calculation are no longer as pressing as they once were, for example storage size and speed of implementation. Nowadays storage is no problem and the speed of modern, even small, computers is such that real-time operations are possible, so the benefit obtained by using other transforms has to be significant to warrant their use. Also there is the point that the Fourier transform is very well known and understood and is standard in many existing instruments. To displace it would pose many educational and usage problems.

The basic question in surface metrology is this: how relevant to surface behavior are new transforms that have usually been devised because of new demands in the subjects of communications and coding? Fundamentally the new transforms are devised for temporal properties. They all need to be transformed in one way or another to be useful in surface metrology. A case in point could be the curvature of an areal summit. It is hard to see how this could be important in communications yet it is very important in tribology. Unless or until transforms are developed which deal with the vertical (or top-down) properties of spatial features directly and not via the Fourier transform and the multinormal distribution, it seems that it is probably best to stick to the Fourier approach. However, because of their existence in the literature, mention of them will be made here. Some transforms are very close to the Fourier transform, for example the Hartley transform [46].

#### 3.9.2  Hartley Transform

The Hartley transform H(u, v) is the difference of the real and imaginary parts of the Fourier transform F(u, v). Thus if

3.324 $F ( u , v ) = F real ( u , v ) + F imag ( u , v )$

then

3.325 $H ( u , v ) = F rea 1 ( u , v ) - F imag ( u , v ) j F imag ( u , v ) = 1 2 [ F ( u , v ) - F ( - u , - v ) ] .$

A coherently illuminated object in the front focal plane of an optical system produces the Fourier transform in the back focal plane. Unfortunately this is not the case in the Hartley transform, but there are some features of the Fourier transform that are inconvenient. For example, the phase of the field contains substantial information but available optical sensing elements respond to intensity and not to phase. A photographic record of a Fourier transform plane abandons phase information which can be very important. On the other hand, the direct recording of the squared modulus of the Hartley transform would provide much more information. In cases where the transform does not go negative |H(u,v)|2 suffices to recover f(x, y) in full. In other cases, where the Hartley transform does go negative, knowledge of |H(u,v)| 2 does not by itself determine the sign; however, sign ambiguity is a much less serious defect than the complete absence of phase knowledge and sign knowledge can often be inferred.

The Hartley transform does not inherently use complex notation:

3.326 $H ( u ) ∫ 0 L f ( x ) cas ( k u x ) d x ,$

where the symbol “cas” means cosine and sine:

3.327 $cas θ = cos θ + sin θ .$

The discrete 1D Hartley transform offers speed gains over the FFT for numerical spectral analysis and therefore it has great potential in communications, but there has been difficulty in carrying the advantage over to more than one dimension. This is because, whereas exp[—jk(ux + vy)] is easily separable into two factors, the function cas[k(ux + vy)] is not. Until this becomes straightforward it seems that the Hartley transform will be of use only in profile evaluation [35].

#### 3.9.3  Walsh Functions–Square Wave Functions–Hadamard

The most popular of the square wave functions is the Walsh function. It is similar to the Fourier transform except that the sinusoidal function is replaced by an on-off signal, that is a pseudo-square-wave signal.

Because it is not a sinusoidal form the frequency axis has been renamed “sequency”. This is a measure of “half the number of zero crossings per unit time or length”, so any signal can be represented by a Walsh spectrum as opposed to a Fourier spectrum. Thus

3.328 $f ( θ ) = c 0 + c 1 Wal ( 1 , θ ) + c 2 Wal ( 2 , θ ) = a 0 + a 1 Cal ( 1 , θ ) + a 2 Cal ( 2 , θ ) + b 1 Sal ( 1 , θ ) + b 2 Sal ( 2 , θ )$

where Cal = cos Walsh and Sal = sin Walsh.

The Walsh functions, originally defined in 1923, take only the values 1, 0,–1 and consequently they are very fast to compute.

If the Walsh function is factorized using the Cooley–Tukey FFT routine then the speed advantage over the FFT is considerable because the usual routine of complex multiplication is replaced by one of multiplying simply by ±1 (similar to the Stieltjes technique for correlation).

Which of the two methods, Fourier or Walsh, is best? The answer is that it depends on the use. Common-sense metrology would say if the surface geometry is angular or discontinuous in any way, use the Walsh; if it is continuous and smooth use the Fourier, the argument being that the metrology should follow the function wherever possible. This is illustrated by the fact that a square wave has a Fourier transform of an infinite number of sinusoidal components, and one Walsh component. A sine wave has infinite Walsh coefficients, yet only one Fourier coefficient [47].

It is also beneficial that the surface can be reconstituted from the Walsh spectrum albeit with many coefficients.

The whole of the Walsh philosophy is discontinuous rather than continuous and as such its operation and properties have been examined by Smith and Walmsley [47] using dyatic calculus. In this Nayak’s results have been shown to have an equivalent derived via the Walsh spectrum rather than the Fourier. Because the Walsh spectra Pw are most suited to discrete operation they appear in the literature in this form:

$F w = ( θ ) - 1 N ∑ n = 0 N - 1 f ( n ) W a 1 ( n θ ) θ = 0 , 1 , 2 , ... , N - 1$

and the Walsh power spectrum by

3.329 $P w ( 0 ) = F w 2 ( 0 ) P w ( θ ) = F w 2 ( 0 ) ( 2 θ - 1 ) + F w 2 ( 2 θ ) ... θ = 1 , ... , N / 2 - 1 P w N / 2 = F w 2 ( N - 1 )$

Unlike the power spectra of the Fourier transform the Walsh transform is not invariant to circular time shifts. This has led to the development of phase-invariant square wave transforms, in particular the Walsh phase-shift-invariant transform which is obtained from the autocorrelation function by means of a series of translation matrices. This has the effect of summing and averaging all the possible Walsh transforms of the time-shifted versions of the data—not very elegant but effective!

Another possibility is the BIFORE transformation Pb, which is obtained from the Walsh transforms (BIFORE = binary Fourier representation):

3.330 $P b ( 0 ) = P w ( 0 ) P b ( θ ) = N 2 ( θ + 1 ) ∑ N = 1 N / 2 P w ( 2 θ n - 2 ( θ - 1 ) ) θ = 1 , 2 , ... , log N .$

This has only 1 + log2 N spectral values spaced logarithmically in sequence.

Mulvaney [48] compared some of these transforms and came to the general conclusion that, after all, the Fourier transform is best for surface characterization. Although the orthogonal binary transforms such as the Walsh, Haar, etc., were faster they did not give spectral estimates which converged rapidly as a function of record length. The phase-invariant ones, Fourier and BIFORE, were slower to evaluate yet converged quickly.

A fundamental problem keeps on emerging when considering the use of binary-type transforms for use in surface analysis and this is the difficulty in identifying strongly periodic continuous characteristics such as may well result from a surface analysis. Transforms like the Walsh do not respond well to such features; other ones are the Haar. Consequently many features that reflect machine tool misbehavior, such as chatter, would be missed at worst or degenerated at best. This seriously limits the use of such transforms. Other orthogonal transforms such as the Hadamard, like the Hartley, have been used in optics and in general signal processing. The Hadamard, which is another binary-type transform based on the Walsh functions, gives an order of magnitude increase in speed over the Fourier transform yet inherits the difficulties of application of the Walsh for basically continuous functions. In the past few years another variant of the Walsh-like transforms has emerged. This is called the wavelet transform [49]. This, like the others above, is binary and is especially useful for the compression of data containing many edges, such as fingerprints and in some cases TV images. It is not essentially suitable for surface analysis, which is basically continuous, but it may be that with the growing importance of fractal-like characteristics the wavelet transform may find a place.

It should be noted that these classes of transform based on the binary signal were originally devised to reduce the bandwidth of a signal being transmitted from one place to another, so they can hardly be expected to provide substantial advantages in their application to surface metrology.

#### 3.10.1  General

There is little doubt that the most important tool of characterization has probably been that of random process analysis in terms of spectral analysis and the correlation functions. Random process analysis is a very powerful tool for extracting the most significant information from a very noisy background. So far it has been shown that it can be used for characterizing both deterministic and random surfaces. It will be shown in Chapters 6 and 7 how it can be used with great effect to give information on manufacture and performance. Some examples of this have already been hinted at.

However, there is scope for improvement. Random process analysis, as used here, is best fitted to examine stationary random surfaces and their statistical properties. It tends to bring out average statistics. But in many cases the very thing that causes failure of a component is the non-uniformity of a part, not its uniformity—corrosion patches and fatigue cracks are just two functional cases. Tool wear is one example in manufacture, neither correlation nor spectra are well placed to look for non-stationarity because they are basically simple averages in space or frequency and, as a result, tend to integrate out changes from position to position or frequency to frequency. Ideally a function should be used that has the equal capability of characterizing random and periodic signals and their non-stationary embodiments. This implies using a function which has two arguments one spatial and the other in frequency. There is a class of function which fits this description called space–frequency functions. Note, not time-frequency functions because surface metrology is in space not time. Among these there are two obvious ones: one is called the Wigner distribution, which derives from quantum mechanics [50] and the other is the ambiguity function used in radar [51]. Both have been mentioned earlier as adjuncts to the Fourier transform.

The ambiguity function A(χ, ϖ) and the Wigner distribution W(x, ω) are defined as follows

3.331 $A ( χ , ϖ ) = ∫ - ∞ ∞ f ( x + χ 2 ) f * ( x - χ 2 ) exp ( - j ϖ x ) d x = 1 2 π ∫ 0 ∞ F ( + ϖ 2 ) F * ( - ϖ 2 ) exp ( + j χ ) d ω$

and

3.332 $W ( x , ω ) = ∫ - ∞ ∞ f ( x + χ 2 ) f * ( x - χ 2 ) exp ( - j ω χ ) d χ = 1 2 π ∫ - ∞ ∞ F ( + ϖ 2 ) F * ( - ϖ 2 ) exp ( j ϖ x ) d ϖ .$

Notice that there are two formulae for each function one spatial and the other in frequency.

They are related by the following formulae

3.333 $A ( χ , ϖ ) = 1 2 π ∫ − ∞ ∞ ∫ − ∞ ∞ W ( x , ω ) exp ( - j ( ϖ x - ϖ χ ) ) d x d ω$
3.334 $W ( x , ω ) = 1 2 π ∫ - ∞ - ∞ ∫ ∞ ∞ A ( χ , ϖ ) exp ( j ( ϖ x - ω χ ) d χ d ϖ .$

Both the ambiguity function and the Wigner function are space–frequency functions occupying neither space nor frequency but both. They can be thought of as being in between the two. This makes them suitable for easy access to either domain.

The basic difference between the two is that the ambiguity function is basically a correlation (because it integrates over the variable x leaving shift χ constant) and the Wigner distribution is basically a convolution (because it retains x and integrates over χ which is in effect the dummy argument in the convolution). It is this latter property—the retention of the value of x, the position in space or position in frequency—which makes the Wigner distribution the most useful of the two in engineering, although the ambiguity function has uses in optics.

Some properties will be given for comparison with the Fourier transform without proof.

#### 3.10.2  Ambiguity Function

This is

3.335 $A f ( χ , ϖ ) = ∫ - ∞ ∞ f ( x + χ 2 ) f * ( x - χ 2 ) exp ( - j ϖ x ) d x$

Figure 3.46   Dual relationship between space and frequency of the ambiguity function.

and

$A f ( χ , ϖ ) = 1 2 π ∫ 0 ∞ F ( + ϖ 2 ) F * ( - ϖ 2 ) exp ( + j ω χ ) d ω .$

Figure 3.46 summarizes the relationship.

Spatial shift

3.336 $f ( x - x 0 ) ↔ A A f ( χ , ϖ ) exp ( - j ϖ x 0 )$

where $↔ A$

denotes “ambiguity function of”.

Frequency shift

3.337 $f ( x ) exp ( j ω 0 x ) ↔ A A f ( χ , ϖ ) exp ( j ω 0 x ) .$

Convolution

If g(x) = f(x)*h(x), then

3.338 $A g ( ϖ , χ ) = ∫ - ∞ ∞ A f ( χ , ϖ ) A h ( x - χ , ϖ ) d x .$

Modulation

If g(x) = f(x)h(x), then

3.339 $A g ( χ , ϖ ) = 1 2 π ∫ - ∞ ∞ A f ( χ , ϖ ) A h ( χ , ϖ - ω ) d ω .$

#### 3.10.3  Discrete Ambiguity Function (DAF)

For the discrete form of a complex signal f(n), = 2, 1, 0, 1, the discrete ambiguity function (DAF) Af (v,η) is

3.340 $A f ( η , v ) = ∑ n = - ∞ ∞ 2 f [ n + η ] f * [ n - η ] exp ( - j 2 v η )$

or

3.341 $A f ( η , v ) = 1 π ∫ - n n F ( θ + v ) F * ( θ - v ) exp ( j 2 θ η ) d θ$

Note that the function is discrete in the spatial domain and continuous in the frequency domain. Note also the frequency doubling.

As in the continuous case

3.342 $f [ n ] ↔ A A f [ n , v ]$

where A is the ambiguity transformation.

#### 3.10.3.1  Discrete Ambiguity Function Computation

Computation (where n is restricted to 0, N—1, and N is even)

1. Compute r[n, η] = 2f[n + η]f*[n–η] for n = 0, 1,..., N–1, η = –M,..., 0,..., M where M = –N/2–
2. Using the DFT compute
3.343 $∑ n = 0 N - 1 r [ n , η ] exp ( - j k π n N )$
for η = –M,..., 0,..., M
3. Af (η, ϖ) is then obtained for ϖ = kπN where k = 0, 1,..., N–1 and η = –M,..., 0,..., M. Because Af (η, ϖ) is restricted in η and periodic in ϖ, so Af [m, fk] = Af [η, kπN] is obtained for k, η = –2, –1, 0, 1, 2; for odd N it is similar provided that it can be represented by N = N 1,..., N 2,..., Nk Δ.

The point to notice about the discrete ambiguity function is that the variable is the spatial x and spectral ω shown in Figure 3.47. It covers an area in x, ω space of area χϖ; for this reason features of the order χϖ will be highlighted in the space–frequency domain—hence the comment that the ambiguity function is good for identifying the presence or absence of features (χ, ω) However, it is not good for positioning absolutely in space or frequency; in other words, it is not very useful for finding the non-stationarity of a signal but good for detecting it!.

The box χ, ϖ scanning the space–frequency domain acts as a correlation with χ, ϖ being fixed for all x, and ω and acting as space–frequency lags. The instantaneous values extend by x’±χ/2 and ω'±ϖ/2, where x’, ω are the positions at the center of the domain in x’,ω of extent χ and ϖ, as seen in Figure 3.32.

Figure 3.47   Movement of variables in ambiguity space.

#### 3.10.4  Wigner Distribution Function W(x, ω)

3.344 $W f ( x , ω ) = ∫ - ∞ ∞ f ( x + χ 2 ) f * ( x - χ 2 ) exp ( - j ω χ ) d χ = 1 2 π ∫ - ∞ ∞ F ( ( ω + ϖ 2 ) F * ( ( ω - ϖ 2 ) exp ( j ϖ x ) d ϖ .$

#### 3.10.4.1  Properties

3.345 $f ( x ) ↔ W W f ( x , ω ) .$

W is Wigner transformation.

Symmetry

3.346 $W f ( x , ω ) = W f ( x , - ω ) .$

Realness

For any complex-valued signal the Wigner distribution is always real:

3.347 $W f ( x , ω ) = ( W f ( x , ω ) ) * .$

Spatial shift

3.348 $f ( x + x 0 ) ↔ W W f ( x , + x 0 , ω ) .$

Frequency shift

3.349 $f ( x ) exp ( j ω 0 , x ) ↔ W W f ( x , ω - ω 0 ) .$

Spatial energy

3.350 $1 2 π ∫ - ∞ ∞ W ( x , ω ) d ω = | f ( x ) | 2$

Frequency energy

3.351 $∫ - ∞ ∞ W f ( x , ω ) d x = | F ( ω ) | 2$

Total energy

3.352 $1 2 π ∫ - ∞ ∞ ∫ - ∞ ∞ W f ( x , ω ) d X d ω = ‖ f ( x ) ‖ 2$

The integral of the Wigner distribution over the whole plane (xω) is the total energy of the signal f(x).

Convolution

If g(x) = f(x)*h(x), then

3.353 $W g ( x , ω ) = ∫ - ∞ ∞ W f ( χ , ω ) W h ( x - χ , ω ) d χ .$

Modulation

If g(x) = f(x)*h(x) then

3.354 $W g ( x , ω ) = 1 2 π ∫ - ∞ ∞ W f ( x , ϖ ) W m ( x , ω - ϖ ) d ϖ$

#### 3.10.4.2  Analytic Signals

Note:

For real-valued signals mainly found in surface texture, in order for the Wigner distribution to be applied the signal should be converted to the analytic form (whose frequency content = 0, ω < 0). Thus, taking the Hilbert transform in order to get the analytic signal yields

3.355 $f ^ ( x ) = ∫ - ∞ ∞ f ( χ ) x - χ d χ$

so that fa = f(x) + jf(x). Rendering the signal analytic reduces the risk of aliasing because it reduces the artifact of frequencies less than zero.

It is possible that both the Wigner and the ambiguity functions have many useful properties in engineering (see Refs. [52, 53]). In this context it will suffice to derive the digital form and the moments of the Wigner function and then to consider surface data z(n).

In discrete form

3.356 $W f ( n , θ ) = ∑ η = - ∞ ∞ 2 f ( n + η ) f * ( n - η ) exp ( - j 2 θ η )$

where F(v) is the DFT of f(n) and

$f a ( n ) = f ( n ) + j f ^ ( n )$

where $f ^ ( n )$

is the discrete Hilbert transform of f[n], Hd f(n):

3.357 $discrete f ( n ) = ( H d f ) [ n ] = ∑ η = - ∞ ∞ f [ η ] 2 π s in 2 ( π ( n - η ) / 2 ) ( n - η ) .$

#### 3.10.4.3  Moments

The discrete moments are important in surface analysis: the zeroth in frequency is

3.358 $P f ( n ) = 1 2 π ∫ - π / 2 π / 2 W f ( n , θ ) d θ .$

Because the argument in the Wigner distribution has two variables, moments exist in both space and frequency as well as global moments involving both.

The first order, Γ f (n), is

3.359 $Γ f ( n ) = 1 2 arg ( ∫ - π / 2 π / 2 exp ( j 2 θ ) W f ( n , θ ) d θ ) .$

Then

3.360 $Γ f ( n ) = 1 2 arg ( f ( n + 1 ) f * ( n - 1 ) ) .$

If f(n) = v[n] exp(jϕ(n)), then

3.361 $Γ f ( n ) = φ ( n + 1 ) - φ ( n - 1 ) 2 mod π$

that is, the first-order moment in frequency represents the instantaneous frequency, being the differential of the phase ϕ over the region.

The second-order moment in frequency, M f (n), is

3.362 $M f [ n ] = ( P f ( n ) - | 1 2 π ∫ - π / 2 π / 2 exp ( j 2 θ ) W f ( n , θ ) d θ | / ( P f ( n ) - | 1 2 π ∫ - π / 2 π / 2 exp ( j 2 θ ) W f ( n , θ ) d θ | ) )$

Mf (n) can be expressed as

3.363 $M f ( n ) = | f ( n ) | 2 - | f ( n + 1 ) f * ( n - 1 ) | | f ( n ) | 2 + | f ( n + 1 ) f * ( n - 1 ) | .$

The zeroth-order moment in space, Pf (θ) is

3.364 $P f ( θ ) = ∑ n = - ∞ ∞ W f ( n , θ ) = | F ( θ ) | 2 + | F ( θ + π ) | 2 .$

The first-order moment Xf (θ) is

3.365 $X f ( θ ) = ∑ n = - ∞ ∞ W f ( n , θ ) n / P f ( θ ) .$

For analytic signals Xf can be expressed as

3.366 $X f ( θ ) = - Im ( d ln F ( θ ) ) d θ = - Im ( F ′ ( θ ) F ( θ ) )$

and similarly the second order as

3.367 $M f ( θ ) = - 1 2 Re d d θ ( F ′ ( θ ) F ( θ ) )$

(see Figure 3.48).

The Wigner distribution (Figure 3.48) centers on specific x and ω and scans via χ and ϖ over all the area. It is in effect a 2D convolution with χ and ϖ as dummy variables. The 2D integral gives complete information about the spot x’, ω'.

Figure 3.48   Movement of variables in Wigner space.

It was shown in Chapter 2 how the moments can be used to extract information about amplitude modulation, frequency modulation and phase modulation effects as well as chirp. Also, the spatial moments pick out the position and width of pulses. The dual character of this function enables the extraction of the most relevant information from the correlation technique and the spectral technique with the benefit of having to use only one function.

#### 3.10.4.4  Digital Wigner Distribution Applied to Surfaces

For practical applications the digital form of the Wigner distribution is used. For this the signal is f(n) where n = –2, –1, 0, 1, 2, and W(x, ω) becomes the discrete Wigner distribution (DWD) W(n,θ) where

3.368 $W ( n , θ ) = ∑ ζ = - ∞ ∞ 2 z [ n + ζ ] z * [ n - ζ ] exp ( - 2 j θ ζ )$

or equivalently

$W ( n θ ) = 1 n ∫ F ( θ + v ) F * ( θ - v ) exp ( 2 j v n ) d v$

where F(v) is the DFT of z[n]. The digital form of the moments become zeroth order Zeroth order P n

3.369 $Zeroth order P ( n ) = 1 2 π ∫ - ∞ ∞ W ( n , θ ) d θ = | z ( n ) | 2 .$

The first-order moment Γ(n) is

3.370 $Γ ( n ) = 1 2 arg ( ∫ - π / 2 π / 2 exp ( 2 j θ ) W ( n , θ ) ) d θ = 1 2 arg { z ( n + 1 ) Z * ( n - 1 ) }$

This is the instantaneous frequency; for example, for z(n) = v(n)exp(jϕn))

3.371 $Γ [ n ] = φ ( n + 1 ) - φ ( n - 1 ) 2 mod π = φ ′ ( n )$

and the second-order moment is or

or

3.372 $m ( n ) = p ( n ) − | ( 1 / 2 π ) ∫ - π 2 π 2 exp ( 2 j θ ) W ( n , θ ) d θ | p ( n ) + | ( 1 / 2 π ) ∫ - π 2 π 2 exp ( 2 j θ ) W ( n , θ ) d θ |$
3.373 $m [ n ] = | f ( n ) | 2 - | f ( n + 1 ) f * ( n - 1 ) | | f ( n ) | 2 + | f ( n + 1 ) f * ( n - 1 ) | .$

Because the Wigner function has been applied usually to roundness the data points are restricted to (0→N–1) where N is a binary number (e.g., 256):

3.374 $W ( n , θ ) = ∑ m = - ∞ ∞ 2 z ( n + m ) z * ( n - m ) exp ( 2 j θ m ) .$

Letting M = N/2–1

$W ( n , θ ) = ∑ m = 0 N - 1 2 z ( n + m - N ) z * ( n - m + M ) exp [ - 2 j ( m - M ) θ ] .$

Letting θ = kπ/N,

3.376 $W ( n , k π N ) = 1 N ∑ m = 0 N - 1 z ( n + m - M ) z * ( n - m + M ) exp ( - 2 j m k π N ) exp ( 2 j M k π N ) N .$

Therefore, to compute the WD it is only necessary to compute:

1. 2z[n + m M] z*[nm + M] where n, m = 0, 1, 2,..., N–1.
2.     (2z[n + mM]z*[n̶m + M])expj2mkπ/N) where n, k = 0, 1, 2,..., N−1.
3. exp(j2mkπ/N) for n, k = 0, 1, 2,..., N−1.

Equation 3.385 represents the WD computed for n, k = 0, 1, 2,...,N–1. The local moments in frequency become

3.377
3.378

How these moments can be applied is shown in an example in Chapter 6, in the detectionof waviness in the form of chatter.

#### 3.10.4.5  Examples of Wigner Distribution: Application to Signals—Waviness *

It was suggested earlier that the Wigner function might be useful in characterizing waviness. This is because the moments of the function clearly reveal the type of modulation that might be causing the waviness even if it is non-linear, where ordinary spectral decomposition breaks down.

According to the theory given earlier, the local moments in frequency are

3.379

where the right-hand sides of the equations are for a signal z(x) = a(x)exp(jϕ(x)).

For a frequency-modulated type of surface or a non-linear chirp signal it seems evident that the first moment in frequency would reveal the most information because this moment is sensitive to the phase, whereas the second moment responds to the envelope. Hence the amplitude of the envelope is most likely to be seen here, so the figures in Figure 3.49 would be easily separated. As an example, consider a signal with a quadrate change in frequency with distance—commonly called a chirp signal, z(x) = a exp(j(α/2)x 2)

3.380

For a frequency-modulated signal z(x) = a exp[j(ω0 x + ϕ0 + bsinω mx + ϕ m ))] the moments are

3.381

which are obviously very distinguishable from each other.

For a signal with amplitude modulation z(x) = a(x) exp(jφ(x)x)

3.382

Figure 3.49   The zeroth and first moments of the Wigner distribution for a chirp signal and a frequency-modulated signal.

Hence it can be seen that use of the local frequency moments can isolate the various forms that envelopes can take. This is most likely to be important in functional cases.

#### 3.10.5  Comparison of the Fourier Transform, the Ambiguity Function, and the Wigner Distribution

A comparison of the properties of three transforms has been given in Table 3.6. All three are based on the Fourier kernel and so can be seen to have similar properties. However, whereas the Fourier transform is in either the frequency domain or the space domain the ambiguity and Wigner functions are in both.

It is useful to think of the space–frequency domain as a “music score”. The only difference is that the space–frequency “score” shows the intensity as well as the pitch and spacing of the notes.

The ambiguity function as seen in Figure 3.47 could be considered to be a way of scanning across the whole score for a spectral pattern or a time/space pattern within a given size space–frequency frame. The Wigner function on the other hand tends to give a space–frequency “feel” for the content of the score centered on a specific x and ω as seen in Figure 3.48. The common feature of both is the “score” itself. The Fourier transform only gives the axes! The Wigner is easily represented as a real function because it can be decomposed into the sum of two integrals on the real axis (and so conserves energy as seen below) rather than be represented as a complex function as is the ambiguity function.

There are some interesting properties of these space frequency functions. For example in principle it should be possible to reconstitute the original profile signal from the ambiguity function.

Given

$A f ( χ , ϖ ) = ∫ - ∞ ∞ f ( x + χ 2 ) f * ( x - χ 2 ) exp ( - j ϖ x ) d x$

then

3.383 $1 2 π ∫ - ∞ ∞ A f ( χ , ϖ ) exp ( j ϖ x ) d ϖ = f ( x + χ 2 ) f * ( x - χ 2 )$

and letting x = χ/2

So,

3.384

which is the profile, apart from a constant which is not relevant for the roughness value. The question is how to measure the ambiguity function in the first place?

The ambiguity function is not very useful for differentiating between stationary profiles having high frequencies from those having low because the energy is always concentrated at the origin, but are effective in spotting instantaneous frequency changes but not where they occur, i.e.,

3.385 $| A f ( χ , ϖ ) | ≤ ∫ - ∞ ∞ [ f ( x ) ] 2 d x = A f ( 0 , 0 )$

from the Schwarz inequality

3.386

This is completely different from the energy picture given by the Wigner distribution which can be regarded as a measure of how the energy is distributed over the space–frequency domain, i.e.,

3.387

This indicates that the integral of the Wigner distribution over x at a given frequency gives the energy spectrum at that frequency and that the integral over frequency for a give position gives the energy at that position. Integrating over both gives the total energy i.e., the total value of the global power spectrum. From these comments it seems that the Wigner on balance may be the most useful function overall but the Fourier transform has the advantage that it can be measured optically by diffraction.

#### 3.10.6  Gabor Transform

The key to the Gabor transform is its discrete form. It is defined as

3.388 $z ( i ) = ∑ m = 0 ∞ ∑ n = 0 N - 1 C m , n h m , n ( i )$

where

3.389 $C m n = ∑ i = 0 ∞ z ( i ) γ m , n * ( i )$

where z(i) is the discrete space or surface signal, Cm,n is a Gabor coefficient, hmn (i) is the basis Gaussian function used in Gabor transformations, γ mn ,(i) is called the biorthogonal auxiliary function, γ * is the conjugate and where

3.390 $h m n ( i ) = h ( i - m Δ M ) W L n Δ N i γ m , n ( i ) = γ ( i - m Δ M ) W L n Δ N i$

Where ΔM, ΔN are sample intervals in the space and the frequency domain.

In general a Gabor transform is not unique. Depending on the selections of h(i) and γ(i) other transform pairs are possible.

The orthogonal-like Gabor transform maps a space (surface) signal which is composed of Gaussian functions separated in space into a joint space–frequency function with complex coefficients. These coefficients represent the amplitude of a space-shifted frequency-modulated Gaussian function centered at x 1 f 1, x 2 f 2, etc. for this space waveform. The Gabor coefficients can be used to reconstruct the original space series or to calculate power spectral density using the pseudo Wigner–Ville distribution. This is given by

3.391 $Winger − Ville ( i , k ) = 2 exp [ - ( i - m Δ M o 2 ) 2 + ( - 2 π 0 L ) 2 ( k - n Δ N ) ] .$

This equation can be precomputed and stored in a look-up table [54].

After the Gabor coefficients are obtained the Wigner–Ville distribution is applied to each of the time-shifted and frequency-modulated Gaussian functions.

One of the most promising applications of the Gabor transformation is to non-stationary surface signals, in much the same way as the Wigner function. However, it may be that it is better because it is virtually optimized in both space and frequency.

There is another possible benefit, which relates back to the discussions on waviness and filtering, and this concerns the Gaussian weighting function used to separate waviness. It may be that because the Gabor transform is also Gaussian, some practical benefit may be obtained.

So far the Gabor transform has not been used in surface analysis but it seems an obvious candidate for use

#### 3.10.7  Wavelets in Surface Metrology

The wavelet function is arbitrary but in order to be usable as a general operator the various scales of size have to be orthogonal—much as a sine wave and its harmonics are in the Fourier Series [58].

Wavelets can be very different in shape. Usually the actual shape used depends on the application. This causes two problems. One is that it is beneficial to have some knowledge of the basis of the data being operated on. Another problem is that there is usually no fall back shape to act as a contingency plan should the wavelet be unsuitable.

The general form is given by the wavelet transform.

3.392 $W ( a , b ) = 1 a ∫ - ∞ ∞ h * ( t - b a ) y ( t ) d t$

where h* is the complex conjugate of the wavelet equation h(-).

This is in a general form for profile data. Notice that the form of h is arbitrary but the arguments a and b are not. The wavelet transform has a positional parameter b and a scale parameter a. It does not contain frequency as such. The argument tb/a or xb/a could be more general still if raised to index n, which more precisely indicates multiscale properties than just a, especially in terms of fractal powers. This index is used for flexibility in other distributions such as the Weibull distribution although it has to be said that the kernel is exponential. However, h(tb/a)n has fractal and wavelet echoes.

The digital equivalent of Equation 3.392 is

3.393 $W ( i T , a ) = T s 1 a ∑ n N h * [ ( n - i ) T s a ] y ( n T s )$

where N is the number of samples and Ts is the sample interval. This representation is the discretized continuous wavelet transform (DCGT). This can be represented as a finite impulse response filter (see Figure 3.50).

Here the Zs refer to the Z transform and not height.

Figure 3.50   Digitized continuous wavelet transform.

The methods of calculation of these transforms can be graphical. (See Handbook of Surface Metrology, 1st Edition [55]).

#### 3.11.1  Profile Generation

The generation of surfaces in a computer for use in functional experiments requires a knowledge of digital characterization. This is essential in order to match the discrete behavior of surfaces as measured with those generated artificially. The third link is to relate the digital representation to the theory. How to do this has been described earlier in the surface characterization section (e.g., Ref. [99] in Chapter 2). Unless the digital generation is tied up properly, a comparison between simulation experiments will be meaningless. Consequently, a small review of the issues is needed to tie up the possible approaches to the generations that have been carried out in the past. To do this, consider first a profile. In its simplest form three running ordinates would be needed (to enable peaks as well as slopes to be described).

The joint probability density of three random variables is p(z 1, z 2, z 3). This can be rewritten in terms of conditional densities to give

3.394 $p ( z 1 , z 2 , z 3 ) = p ( z 1 ) p ( z 2 / z 1 ) p ( z 3 / z 2 , z 1 ) .$

The first approach to generation would be to select one ordinate from a random number generator. This first number could be generated from any random number generator. This usually means that there is a uniform chance of any height occurring. There are very many possibilities for this starting number and it is not within the remit of this book to consider every possibility. However, an example might be useful.

If the set of generated numbers is to have zero mean value and zero correlation between the ordinates, then methods such as congruent and mid-square techniques could be used. Both are based on elementary number theory.

As an example, if the set of numbers are denoted by [zn ], n = 0, 1, 2, 3,…, the congruent method of generation is such that adjacent ordinates are related by the recursive relationship

• z n + 1 = az n + b | modulo T |

where b and T are prime numbers and T is matched to the word length of the computer, a and b can be chosen to give certain statistical properties; for example by selecting a, b and T the correlation between ordinates can be made equal to ρ s , where

3.395 $ρ s = 1 - 6 b s ( 1 - b s / T ) a s + e$

and where as = as (mod T), bs = (1 + a + a 2 + … + as –1) b(mod T), and |e| < as/T. If a–T 1/2 then ρ s~T –1/2. Correlated data can also be generated for the linear correlation case by the recursive relationship

3.396 $z n + 1 = ρ z n - 1 + 1 2 ( 1 - ρ 2 ) ( R n - 0 . 5 )$

where Rn is obtained from a set of independent random numbers, for example obtained by squaring the two center digits of a six-figure number—not elegant, but effective.

In the method above one ordinate is correlated to the next. This is tantamount to a first-order Markov process. Most surfaces are more complicated and are produced by multiple processes. As a result the generator has to be at least second order, which means that an ordinate is related to the next ordinate and the one next to that. Furthermore the distribution of heights is not uniform: at best it is Gaussian, at worst it is not symmetrical and may well be skewed. These issues will be considered shortly.

The most important point in surface generation is to decide upon the correlation between ordinates. This is not arbitrary but can be at least estimated from the tribology prediction routines described in Ref. [56] and anticipated from the discrete analysis of random surfaces. For example, if it is required that a surface model needs to have curvature and slope variances of σ2 c and σ 2 m to test a contact theory— then the required correlations between ordinates can be worked out from the previously determined formulae:

3.397

where h is the spacing between ordinates taken here to be a profile; ρ1 is the correlation between ordinates and ρ 2 is the correlation between alternate ordinates.

The described values of ρ1 and ρ2 from Equation 3.397 are given by

3.398

Other parameters can be demonstrated in a similar way.

Given these values of ρ1, ρ2 and h, the generation can take place. Assume for example that the distribution is Gaussian. This can be arranged by shaping the uniform distribution of height frequency according to the Gaussian function. Thus the first ordinate can be chosen from the distribution

3.399 $p ( z 1 ) = 1 2 π exp ( - z 1 2 2 ) .$

The second (z 2|z 1) represents the second ordinate given the first at height z 1:

3.400 $p ( z 2 | z 1 ) = 1 2 π ( 1 - ρ 1 2 ) exp ( ( - z 2 - ρ 1 z 1 ) 2 2 ( 1 - ρ 1 2 ) )$

and the third is the probability of getting a height z 3, given that the second took a value z 2 and the first z 1, that is p(z 3| z 2, z 1):

3.401 $p ( z 3 | z 2 , z 1 ) = 1 - ρ 1 2 2 π ( 1 - ρ 2 ) ( 1 + ρ 2 - 2 ρ 1 2 ) × exp { [ ( z 3 - ρ 1 ( 1 - ρ 2 ) z 2 ( 1 - ρ 1 2 ) + ( ρ 1 2 - ρ 2 ) ( 1 - ρ 1 2 ) z 1 ) / 2 ( 1 - ρ 2 ) ( 1 + ρ 2 ) ( 1 + ρ 2 - 2 ρ 1 2 ) ( 1 - ρ 1 2 ) ] 2 } .$

All of these equations follow directly from the multinormal Equation 2.65.

The set of equations above read as follows. The value of z 2 given a previous z 1 has a standard deviation of $1 − ρ 1 2$

and mean of ρ1 z 1 from Equation 3.392. The following z 3 distribution (Equation 3.401) has a mean value of

3.402 $ρ 1 ( 1 - ρ 2 ) ( 1 - ρ 1 2 ) z 2 - ρ 1 2 - ρ 2 ( 1 - ρ 1 2 ) z 1$

and a standard deviation of

$( 1 - ρ 2 ) ( 1 + ρ 2 - 2 ρ 1 2 ) ( 1 + ρ 2 ) ( 1 - ρ 1 2 ) .$

The mean values and standard deviations are found simply by factorizing out first z 1, then z 2 and finally z 3 from the multinormal distribution. So, in sequence, z 1 is picked at random from a distribution which is shaped to be Gaussian of mean level zero and standard deviation unity (actually R 2 q . This value of z 1 determines the mean from which the distribution of z is centered, z 2 is picked from such a Gaussian distribution at this mean having a standard deviation which is not unity but $1 − ρ 1 2$

; similarly for z 3. Having found these three ordinates, z 2 replaces z 1, z 3 replaces z 2, and a new z 3 is found. This repeated exercise generates the profile.

Alternatively, but equally, a best-fit least-squares routine can be used from the linear relationship:

3.403 $z i + A z i - 1 + B z i - 2 + ...+ ε 1$

where ε1 is the residual or error term. Minimizing ε2 1 and making $∑ ε i 2 = S$

$∂ S ∂ A , ∂ S ∂ B .. ... = 0$