Risk and risk assessment are part of our daily life. Both professionally and privately we make many decisions, and risk analysis is an integral part of this. Risk is often defined as the probability that an event will occur, multiplied by the consequences of the event. This definition may be rather simplistic because it undervalues the risk of events with very small probabilities but very large consequences (for example, pandemics or nuclear disasters); however, it does correctly identify the two main components of any risk evaluation: the probability of occurrence and the magnitude of the consequences.
In this chapter the concept of probability is briefly explored. There are many excellent books on probability and the reader is referred to the reference list at the end of the chapter for further study. The book by Benjamin and Cornell (1970) is particularly recommended for its clarity and the way it eases the reader into the greater complexity of the subject. Although the book mainly uses examples from the field of civil engineering, it is accessible to readers from other disciplines as well. Another highly respected reference for probability theory, also from the engineering perspective, is the book by Ang and Tang (2007).
In the following paragraphs the number of mathematical equations and specialized notations has been limited, but many basic equations are unavoidable for a clear and correct description of the concepts of probability. Many equations are introduced step by step to ensure that mathematical difficulty does not inhibit any readers from understanding the basic concepts. Please note that a balance has been sought between mathematical rigor and ease of understanding. At the start of the chapter the concept of probability is introduced together with its basic axioms, which are the fundamental building blocks of probabilistic calculations. Subsequently, the frequentist interpretation of probability is elaborated, which allows us to estimate unknown probabilities from data, typically historical records. Further basic concepts including conditional probability are discussed next, ending with the presentation of Bayes’ rule for updating probabilities when new information becomes available. As Bayes’ rule moves the discussion to the limits of the traditional frequentist interpretation, the more general Bayesian interpretation of probability is presented at the end of the chapter.
Many things are uncertain, such as the number of rainy days in London next year, the battery life of your cell phone, the oil price in two months’ time and the outcome of the next general election. The fact that these things are uncertain means that you are not able to give a definitive statement about their outcome. It does not mean, however, that you cannot have the impression that one outcome is more likely than another one. Knowing that there are about 160 rainy days in London every year, the statement that next year London will encounter only 30 rainy days seems very unlikely, while 156 rainy days intuitively feels much more reasonable. We consider the second statement to be more ‘probable’ than the first. These qualitative statements on the likelihood of events are often adequate for daily communication; however, as soon as we have to make important decisions we need something more tangible and specific to act upon – we need to measure it. The measurement of the likelihood of an event is called its probability.
Probability is often introduced through imaginary experiments: the rolling of a fair die or the flipping of an ideal coin. The concept is as follows: prior to rolling a die, the outcome χ of the throw can be either 1, 2, 3, 4, 5 or 6, without there being any preference between these possibilities. When we make a prior statement about the outcome of the throw – for example, ‘the die shows 5 dots’ – the probability of this statement being correct is intuitively equal to 1/6. This result can easily be explained considering that there are 6 possible outcomes (values 1 to 6), which are each equally likely, while we consider only 1 outcome (χ = 5). The probability of the event χ = 5 is written mathematically as Equation 2.1, which reads as ‘the probability of the die showing 5 is 1/6’. Often the χ is omitted, and Equation 2.1 is written as P[5] = 1/6.
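Written out in symbols, Equation 2.1 is simply:

```latex
P[\chi = 5] = \tfrac{1}{6} \tag{2.1}
```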
The example of the rolling of a die is intuitively clear to many; however, for other situations the calculation of the probability of an event may not be so straightforward and we should introduce the three basic ‘axioms’ or ‘rules’ of probability theory. These basic axioms are listed in Table 2.1 in text format and in mathematical form (Faber, 2012). The mathematical symbol ‘∪’ is used to indicate the union of two events, which is equivalent to the concept ‘OR’ in logic. The phrase ‘mutually exclusive events’ refers to events that cannot occur simultaneously; for example, your body temperature cannot be both 37°C and 38°C at the same time.
Table 2.1 The basic axioms of probability theory

Axiom I: 0 ≤ P[A] ≤ 1
The probability of an event A is a number larger than or equal to 0, but less than or equal to 1.

Axiom II: P[B] = 1 when B is certain; P[B] = 0 when B is never true
The probability of an event B which is certain is equal to 1; the probability of an event B which is never true is equal to 0.

Axiom III: P[C ∪ D] = P[C] + P[D] when C and D are mutually exclusive
The probability of the union of two events C and D which cannot happen simultaneously (i.e. C and D are mutually exclusive events) is the sum of their probabilities.
Note that our intuitive understanding of the probabilities associated with rolling a die fulfils these basic axioms.
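For the die, with P[χ = i] = 1/6 for each face i, the axioms can be checked directly:

```latex
0 \le P[\chi = i] = \tfrac{1}{6} \le 1 \quad \text{(Axiom I)}
P[\chi \in \{1, 2, 3, 4, 5, 6\}] = 1 \quad \text{(Axiom II)}
P[(\chi = 1) \cup (\chi = 2)] = \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{3} \quad \text{(Axiom III)}
```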
The smooth intuitive acceptance of the basic axioms for our imaginary experiment of throwing a die is related to what is known as the frequentist interpretation of probability. In a large number of independent repetitions of an experiment (i.e. independent ‘trials’) the ratio of the number of ‘successes’ N_{A} for an event A to the total number of trials N converges to the probability P[A], when N is adequately large. In other words, N_{A}/N can be interpreted as an observed frequency of the event A, and the probability P[A] can be interpreted as the theoretical frequency of the event A which would be observed for large N (i.e. mathematically when N tends to infinity). Considering the event χ = 5 for the die, the number of throws where the die shows 5 will tend to get close to 1/6th of the total number of throws, provided we repeat the experiment a sufficient number of times.
The concept is demonstrated for the die in Figure 2.1: the die is rolled 1,000 times, and after every trial the number of eyes is observed, the value of N_{A} is updated, and the ratio N_{A}/N is calculated. The observed ratios N_{1}/N, N_{2}/N, … , N_{6}/N are visualized in Figure 2.1. Note that for a small number of trials, large differences exist amongst the observed frequencies themselves and also between the observed frequencies and the theoretical frequency of 1/6. However, when the number of trials is much larger, the observed frequencies converge to the theoretical frequency.
Figure 2.1 Results of an experiment where a fair die is rolled 1,000 times: observed frequencies N_{A}/N and (theoretical) long-term frequency of 1/6. Note how the observed frequencies tend to approach the long-term frequency as the number of trials N increases
In Figure 2.1, the effect of rolling the die one additional time is relatively large up to 100 trials, after which the observed frequencies change only slowly. The data are therefore better visualized using a logarithmic scale for the number of trials N (see Figure 2.2). Now the effect of the number of trials on the observed frequencies can be observed more clearly for small N values.
Figure 2.2 Results of an experiment where a fair die is rolled 1,000 times: observed frequencies N_{A}/N and (theoretical) long-term frequency of 1/6. The same data as in Figure 2.1, but with a logarithmic scale for the number of trials N
The slower change of the observed frequencies in Figure 2.1 and Figure 2.2 for larger trial numbers is logical, as the effect of one additional ‘4’ on the observed frequency becomes less and less pronounced when there have already been many trials. Another way of putting this is to say that the information gained by an additional roll of the die decreases as the number of trials increases. This is similar to you obtaining more ‘information’ about the travel time to your new job on your first 5 trips than on the 100th trip.
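The convergence illustrated in Figures 2.1 and 2.2 is easy to reproduce. A minimal sketch in Python (the function name and the fixed random seed are our own choices for illustration):

```python
import random

random.seed(1)  # fixed seed so repeated runs give the same sequence of rolls

def observed_frequency(n_trials, face=5):
    """Roll a fair die n_trials times and return N_A / N for the event 'die shows face'."""
    n_a = sum(1 for _ in range(n_trials) if random.randint(1, 6) == face)
    return n_a / n_trials

# The observed frequency fluctuates strongly for small N and settles
# near the theoretical frequency 1/6 ≈ 0.167 as N grows.
for n in (10, 100, 1_000, 100_000):
    print(n, round(observed_frequency(n), 4))
```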
In the example of the fair die illustrated earlier we already knew the theoretical frequency or probability beforehand (i.e. 1/6). The experiment with the 1,000 rolls illustrated how the observed frequency tends towards the theoretical probability if the number of trials is large; however, for many practical situations related to risk we do not have a similar intuitively known theoretical probability.
Suppose you are working for a logistics company and, as part of a new marketing strategy, delivery within 24 hours will be guaranteed and for cases of late delivery, the delivery fee will be reimbursed twofold. During the feasibility study, you are asked to calculate the probability of late delivery. Clearly, if the probability of late delivery is high, the company is exposed to large possible losses and the marketing strategy may not be feasible. Unfortunately, you do not know the probability directly as in the case of the die discussed earlier. What you can do, however, is base your assessment on the frequentist interpretation of probability. Considering the log files of the deliveries made in the previous year, you obtain the delivery time T_{i} for every delivery made in the previous year, and for each of these deliveries you can determine whether T_{i} is larger than 24 hours. Denoting with N_{T>24} the number of deliveries where the delivery time exceeded 24 hours, and with N_{total} the total number of deliveries in the previous year, the observed frequency for late delivery is given by Equation 2.2. The symbol ‘≈’ has been used to denote that the probability is estimated.
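Equation 2.2 amounts to a simple counting exercise over the log files. A sketch in Python, with invented delivery times standing in for last year's logs:

```python
# Hypothetical delivery times in hours, standing in for last year's log files
delivery_times = [18.5, 30.2, 22.0, 23.9, 25.1, 12.4, 24.5, 19.9, 21.3, 16.8]

n_late = sum(1 for t in delivery_times if t > 24)  # N_{T > 24}
n_total = len(delivery_times)                      # N_total
p_late = n_late / n_total                          # Equation 2.2: P[T > 24] ≈ N_{T > 24} / N_total
print(p_late)  # 0.3 for this toy data set
```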
As a second example consider the presence of smoke alarms in dwellings. The smoke alarm is expected to result in an early warning to the occupants in case of fire. It is worth noting that the presence of smoke detectors is sometimes used as an argument in favour of reducing other legal requirements for fire safety, such as the number of escape routes. However, before accepting such conclusions we should investigate if we can assume that smoke detectors work perfectly, or whether we should take into account a probability of their failure. Consider the data in Table 2.2, listing the number of fires in Scotland in buildings where a smoke alarm was present, as reported by the Scottish fire brigades for the years 2003–4 to 2008–9. Note that many different causes may result in the malfunctioning of the smoke alarm – the removal of the batteries by the occupants being one of the most common causes.
Table 2.2 Recorded dwelling fires in Scotland with a smoke alarm present, 2003–4 to 2008–9

Year           Dwelling fires with smoke alarm present [–]   Alarm raised [–]   Alarm did not raise [–]   Success rate [–]
2003–4         4,463                                         2,803              1,660                     0.63
2004–5         4,141                                         2,685              1,456                     0.65
2005–6         4,331                                         2,840              1,491                     0.66
2006–7         4,296                                         2,957              1,339                     0.69
2007–8         4,230                                         2,892              1,338                     0.68
2008–9         4,325                                         3,003              1,322                     0.69
Total 2003–9   25,786                                        17,180             8,606                     0.67
Based on the data listed in Table 2.2, 8,606 failures have been recorded out of a total of 25,786 recorded dwelling fires where a smoke alarm was present. Based on the frequentist interpretation of probability, our estimate of the probability of failure for a smoke alarm is 0.33. In accordance with axioms II and III, this probability of failure is the complement of the probability of success, i.e. 0.33 = 1 − 0.67. This can be explained in detail by considering that the smoke alarm will either give an alarm or fail to give an alarm in case of fire, and thus the statement that the smoke alarm will or will not work is a certain event with probability 1 (in accordance with axiom II), i.e. P[success ∪ fail] = 1. Furthermore, the events of failure and success are clearly mutually exclusive and thus P[success ∪ fail] = P[success] + P[fail] = 1, and therefore P[fail] = 1 − P[success].
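With the totals from Table 2.2, the estimate and the complement rule look as follows in Python:

```python
n_total = 25_786    # recorded dwelling fires with a smoke alarm present (2003-9)
n_success = 17_180  # fires where the alarm raised the alarm
n_fail = 8_606      # fires where the alarm did not raise the alarm

p_success = n_success / n_total  # observed success frequency, about 0.67
p_fail = n_fail / n_total        # observed failure frequency, about 0.33

# Success and failure are mutually exclusive and collectively exhaustive,
# so by axioms II and III their probabilities sum to 1.
print(round(p_success, 2), round(p_fail, 2), round(p_success + p_fail, 2))
```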
When estimating probabilities we are slowly moving outside of the realm of probability theory, into the realm of statistics. By applying statistical theory, confidence intervals can be determined for these estimated probabilities, indicating the uncertainty related to our estimation. This is not discussed further.
In the previous sections the concept of probability has been briefly introduced. Using historical data we have even made an estimation of the probabilities of late delivery for a logistics company and of the probability of the malfunctioning of smoke detectors. When studying risk, however, there are some other probability concepts that are very valuable and should be introduced.
When considering risk, we are often interested in probabilities given an initiating event, such as the probability of severe injury given the occurrence of a car accident, the probability of flooding given the occurrence of a storm, or the probability of being infected by a disease given the occurrence of positive test results. These probabilities, which apply conditionally given the occurrence of the initiating event, are known as conditional probabilities. More specifically, the conditional probability P[A|B] is defined as the probability that A has occurred given the knowledge that B has occurred.
This conditional probability P[A|B] is generally not equal to the probability P[A ∩ B] that events A and B occur simultaneously. Note that the symbol ∩ is used to indicate the ‘intersection’ of events and is equivalent to ‘AND’ in logic. The important difference between P[A|B] and P[A ∩ B] is easily understood using the following example.
A particular airport in the south of France has an excellent track record concerning on-time departures. This means that the probability P[A] of a delayed departure is low. Furthermore, in the south of France it is exceptional to have severe winter conditions. This means that the probability P[B] of severe winter conditions at the airport is low as well. As a consequence the probability P[A ∩ B] of having both a delayed departure and winter conditions occurring simultaneously is also low; however, in cases of severe winter conditions the impact on departures can be significant. Almost all departures are delayed because the airfield has only one de-icing machine to remove ice from the wings of airplanes before departure. This means that the conditional probability P[A|B] of delayed departure given the occurrence of severe winter conditions is high.
The mathematical relationship between the conditional probability P[A|B], the joint probability P[A ∩ B] and the probability P[B] can be derived in a formal way; however, for the sake of this introductory text, we will use a less formal and simple approach to determine the relationship (and get a feeling for its background) that applies the frequentist interpretation of probability.
Consider an imaginary experiment with a large number of trials. The total number of trials is N. The number of trials where event B occurs is denoted by N_{B} and the number of trials in which both events A and B occur simultaneously is denoted by N_{A ∩ B}. Now only consider those trials where event B occurs, i.e. disregard situations where event B has not occurred and only consider situations conditional on the occurrence of event B. For this conditional situation count all trials where event A is observed and denote this number by N_{A|B}. Based on this counting procedure it is clear that:
Starting from Equation 2.3 we can divide both sides of the equation by N and multiply the right hand side by the factor N_{B} /N_{B} , which is equal to 1. This results in:
Note that N_{A ∩ B}/N is the observed frequency for the joint occurrence of events A and B, N_{A|B}/N_{B} is the observed frequency of the event A when we consider only trials where event B occurs (i.e. the frequency of the event A conditional on the occurrence of event B) and N_{B}/N is the observed frequency for event B. For a very large number of trials both N and N_{B} will tend to infinity and, in accordance with the frequentist interpretation of probability, Equation 2.4 can be written with probabilities:
Equation 2.5 can be rewritten as Equation 2.6, which is the classical and well-known formula for the conditional probability P[A|B].
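Written out, the counting argument gives the following chain (our reconstruction of Equations 2.3–2.6 from the derivation above):

```latex
N_{A \cap B} = N_{A|B}                                           \tag{2.3}
\frac{N_{A \cap B}}{N} = \frac{N_{A|B}}{N_B} \cdot \frac{N_B}{N} \tag{2.4}
P[A \cap B] = P[A|B] \cdot P[B]                                  \tag{2.5}
P[A|B] = \frac{P[A \cap B]}{P[B]}                                \tag{2.6}
```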
These formulae are especially important for risk calculations because we are often not interested in the probability of a single event (for example, ‘fire ignition’) but of combined events (‘fire ignition’ and ‘failure to extinguish by the fire brigade’).
Conditional probabilities are also used to characterize the performance of tests, for example medical screenings or terrorist profiling. In the case of the medical test, the probability that the test result is positive given that the patient is indeed infected and the probability that the test result is positive given that the patient is not infected govern the extent to which conclusions can be drawn from a single test result. Similarly, for terrorist profiling the conditional probabilities (known as the operating characteristics of the test) determine whether there is indeed a considerable probability that a person has malicious intentions given that he fits the terrorist profile. We come back to this at the end of the chapter but it is important to stress that our intuitive sensing of probabilities tends to betray us in these situations.
A seemingly more complex but very important practical concept is the theorem of total probability. Consider a set of events B_{1}, B_{2}, … , B_{n} which are mutually exclusive and collectively exhaustive. This means that always one (collectively exhaustive) and only one (mutually exclusive) of the events B_{i} will occur. This is mathematically equivalent to Equations 2.7 and 2.8, where the symbol Σ is the mathematical representation of the concept of ‘summation’. Equation 2.7 reads as ‘the sum of the probabilities of all the different events B_{i} is equal to 1’, and Equation 2.8 reads as ‘the probability of the simultaneous occurrence of any two different events B_{i} and B_{j} is zero’.
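In symbols (reconstructing Equations 2.7 and 2.8 from the description above):

```latex
\sum_{i=1}^{n} P[B_i] = 1            \tag{2.7}
P[B_i \cap B_j] = 0 \quad (i \neq j) \tag{2.8}
```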
Given this set of events B_{i} , the event A will always by definition coincide with one of the mutually exclusive and collectively exhaustive events B_{i} . Consequently, the event A can be split into a list of mutually exclusive subevents ‘A ∩ B_{i} ’, as indicated by Equation 2.9. Applying axiom III, the probability P[A] can be written as Equation 2.10, known as the theorem of total probability.
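That is (reconstructing Equations 2.9 and 2.10):

```latex
A = \bigcup_{i=1}^{n} (A \cap B_i)  \tag{2.9}
P[A] = \sum_{i=1}^{n} P[A \cap B_i] \tag{2.10}
```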
Considering Equation 2.5 derived for the joint occurrence of two events, the theorem of total probability can be written in the very practical form:
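That is (reconstructing Equation 2.11):

```latex
P[A] = \sum_{i=1}^{n} P[A|B_i] \, P[B_i] \tag{2.11}
```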
This Equation 2.11 has a lot of practical significance because one is, in general, interested in the total probability of an event A, although it is often much easier to determine probabilities P[AB_{i} ]. Consider, for example, the probability of casualties in the case of a fire in a concert hall. Most of the time, very few people are present in the concert hall (e.g. only staff), whilst during concerts the building can be packed with people. Clearly these are two very distinct scenarios to assess with respect to the evacuation of people. As the situations are mutually exclusive and collectively exhaustive, the overall probability of any casualties in the case of a fire can be determined by combining the conditional probabilities of any casualties when the fire occurs at the time of a concert with any casualties when the concert hall is only open to staff.
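A sketch of this calculation in Python; the scenario probabilities below are invented purely to illustrate Equation 2.11:

```python
# Hypothetical scenario probabilities and conditional probabilities of casualties
scenarios = {
    # scenario: (P[B_i], P[A | B_i])
    "concert":    (0.05, 0.20),  # hall packed with people, evacuation difficult
    "staff_only": (0.95, 0.01),  # only a few staff members present
}

# Theorem of total probability: P[A] = sum over i of P[A|B_i] * P[B_i]
p_casualties = sum(p_b * p_a_given_b for p_b, p_a_given_b in scenarios.values())
print(round(p_casualties, 4))  # 0.05*0.20 + 0.95*0.01 ≈ 0.0195
```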
In many situations of risk management, backup systems will be present in order to avoid an adverse event that may result in a disaster. For example, in hospitals, backup power supply systems are present to avoid the scenario where a power shortage on the public grid results in a deactivation of life-saving machines. Naturally even the backup system may potentially fail and consequently a non-zero probability is associated with the scenario that the emergency system fails to activate. Most importantly, the primary system and the backup system should not be vulnerable to a common-cause failure; in other words, the events ‘failure of the primary system’ and ‘failure of the backup system’ should be ‘independent’.
Two events are colloquially understood to be ‘independent’ when the occurrence of one event does not influence the occurrence of the other. Consequently, in probability theory two events A and B are independent if, and only if, the occurrence of one of these events does not affect the probability of occurrence of the other event. This is described mathematically by Equation 2.12. Combining Equation 2.12 with Equation 2.5 it can be shown that when event A and B are independent, the probability of their joint occurrence P[A ∩ B] is given by Equation 2.13.
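In symbols (reconstructing Equations 2.12 and 2.13):

```latex
P[A|B] = P[A]              \tag{2.12}
P[A \cap B] = P[A] \, P[B] \tag{2.13}
```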
The assumption of independence and the multiplication of simple probabilities to derive the probability of joint occurrence are very common in risk calculations. Importantly, the simple rule of Equation 2.13 can be generalized to situations where events A, B, …, K are considered. When the events can be assumed independent, the probability of joint occurrence is given by Equation 2.14, known as the ‘multiplication rule’.
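That is (reconstructing Equation 2.14):

```latex
P[A \cap B \cap \dots \cap K] = P[A] \, P[B] \cdots P[K] \tag{2.14}
```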
The multiplication rule is often applied in risk calculations to derive the overall probability of failure of a system, considering the probability of failure of the main system and the failure probabilities of different independent safety systems (or ‘safety barriers’). Similarly, the multiplication rule is applied when calculating the probability of an event requiring the joint occurrence of multiple independent conditions. For example, the occurrence of a dust cloud explosion in a chemical facility requires the presence of a combustible atmosphere, an ignition source and the failure of suppression systems that may be installed (for example, a pressure venting system that releases the overpressure). In general the occurrence of these necessary conditions can be considered independent and consequently the probability P[explosion] of a dust cloud explosion is calculated as the probability P[cloud] of a combustible dust cloud, multiplied by the probability P[ignition] of an ignition source, multiplied by the probability that the suppression system is not active P[no suppression].
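A sketch of the dust cloud example in Python, with invented values for the three probabilities:

```python
# Hypothetical probabilities, invented for illustration
p_cloud = 0.01           # P[cloud]: combustible dust cloud present
p_ignition = 0.1         # P[ignition]: ignition source present
p_no_suppression = 0.05  # P[no suppression]: suppression system fails to act

# Multiplication rule (Equation 2.14), assuming the three conditions are independent
p_explosion = p_cloud * p_ignition * p_no_suppression
print(p_explosion)
```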
Bayes’ rule is a very powerful tool that allows updating probability estimates as new information or evidence becomes available. The widespread use of Bayes’ rule and the important results obtained by its application have contributed to it acquiring an almost mythic status. To introduce the concept with a simple example, consider a person going to the doctor for a medical checkup. During the medical checkup the doctor observes a number of symptoms of a serious but rare illness A. Can he conclude that the patient has actually contracted the disease? Now consider the following information. Based on statistical data, the disease is expected to be present in about 1 in 20,000 persons. Furthermore, for people who have the disease, the symptoms observed by our doctor are present in 99% of the cases, whilst the same symptoms would be observed in only 0.1% of people who do not have the disease. Clearly, the symptoms are a very strong indication of the illness A and intuitively many people are inclined to conclude that the patient is infected when the symptoms are observed and medical treatment should therefore be started as soon as possible. However, is this a reasonable conclusion or should further tests be performed before starting medical treatment with possibly negative sideeffects?
Now consider what we actually know. The initial or ‘prior’ probability P[A] of a random person having the disease A is 1/20,000, whilst the probability P[B|A] of the symptoms given infection is 0.99 and the probability P[B|$\overline{A}$] of symptoms given no infection is 0.001 (note the notation $\overline{A}$ used to indicate the event ‘no infection’). Can we calculate a new updated or ‘posterior’ probability P[A|B] for the person being infected, conditional on the fact that symptoms have been observed?
The answer is yes, we can calculate the conditional probability P[A|B] with this data and we do not need to introduce any difficult new concepts. Just consider Equation 2.5 given earlier and write it down both for P[A|B] and P[B|A] (without worrying too much at this point about the meaning of both equations). Now note that both equations give a formula for calculating P[A ∩ B], so we can combine them in a single line, as in Equation 2.15, which can be easily adapted to Equation 2.16.
Equation 2.16 is the basic formulation of Bayes’ rule. However, although we have direct data for the probability P[B|A] of symptoms given infection and for the prior probability of infection P[A], the probability P[B] of observing symptoms in a person picked at random was not explicitly given in the earlier description. Now consider that the events A and $\overline{A}$ are mutually exclusive and collectively exhaustive. We can calculate P[B] by Equation 2.17 (i.e. applying the theorem of total probability), with P[$\overline{A}$] = 1 − P[A] = 1 − 1/20,000 = 19,999/20,000.
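In symbols (our reconstruction of Equations 2.15–2.17):

```latex
P[A|B] \, P[B] = P[A \cap B] = P[B|A] \, P[A]                \tag{2.15}
P[A|B] = \frac{P[B|A] \, P[A]}{P[B]}                         \tag{2.16}
P[B] = P[B|A] \, P[A] + P[B|\overline{A}] \, P[\overline{A}] \tag{2.17}
```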
Using Equation 2.16, we find the probability of infection given the presence of symptoms, i.e. the posterior probability for infection, as P[A|B] = P[B|A] P[A]/P[B] = (0.99 × 1/20,000)/0.00105 ≈ 0.047.
Although the symptoms are a strong indication, the posterior probability of infection is only 4.7%, much smaller than what most people would intuitively expect. This is due to the very low rate of occurrence of the disease, which means that there are more noninfected persons who display the symptoms of the disease than infected persons. Illustrating this with respect to the reference size of 20,000 people: if 20,000 people are selected at random, on average 20 noninfected people will show symptoms and there will be (on average) only 1 person who is actually infected (and who will generally also show the symptoms). Even when clear symptoms are observed the disease remains rare and one should take this into consideration before jumping to conclusions.
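The full calculation fits in a few lines of Python:

```python
p_a = 1 / 20_000         # prior probability of infection, P[A]
p_b_given_a = 0.99       # P[B|A]: symptoms given infection
p_b_given_not_a = 0.001  # P[B|not A]: symptoms given no infection

# Theorem of total probability (Equation 2.17): probability of observing symptoms
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule (Equation 2.16): posterior probability of infection given symptoms
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # 0.047, i.e. about 4.7%
```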
The probabilities P[A] and P[A|B] have been referred to as prior and posterior probabilities. This is because prior to the observation made by the doctor the probability that the patient has contracted the disease would be assessed as 1/20,000; however, after the observation that symptoms are present, the initial assessment has been updated to obtain an improved assessment of the probability that the patient has contracted the disease. This improved assessment posterior to the incorporation of additional information is called the posterior probability. Generally stated, Bayes’ theorem allows the updating of probabilities as additional information becomes available.
In the examples discussed earlier we have slowly moved away from a clear frequentist interpretation of probability. The situations conforming with the frequentist interpretation of probability are closely related to what is generally referred to as the ‘variability inherent in nature’ and this is also the type of probability that is often presented by experts testifying in legal cases about the occurrence rate of, for example, a type of cancer in the general population or in people who have been exposed to a specific chemical substance (Finkelstein, 2009). In the last example related to whether a particular patient was infected by the disease, however, the patient is either infected or not and a traditional frequentist interpretation becomes more difficult to maintain.
Consider, as a second example, a statement on the probability of guilt of a suspect. Clearly, the suspect is either guilty or not and the probability assigned to his guilt rather represents ‘a degree of belief’. This interpretation of probability is called the Bayesian interpretation of probability. This probability is subjective because it allows for the incorporation of personal experience, expertise and preferences. Although it is true that historically the Bayesian and frequentist approaches have been counterposed, a Bayesian interpretation is not necessarily at odds with the more traditional frequentist interpretation because it is perfectly acceptable to base one’s degree of belief on a frequency observed from experiments. In risk calculations, most probabilities are Bayesian but this does not mean that they are chosen freely by the analyst. Clearly, the assigned degrees of belief should be consistent with physical constraints and experimental data.
For many practical situations this difference between a frequentist interpretation and a Bayesian interpretation is only of secondary importance. It is important to note that probability theory is concerned with the calculation of probabilities and not with the interpretation the user gives to the obtained results. Consequently, the basic axioms, concepts and calculation rules described earlier apply to both a frequentist interpretation of probability and a Bayesian interpretation of probability. In other words, two people do not have to agree on the interpretation of a calculated probability for them to agree with the obtained mathematical result.
Most people have a very good intuitive grasp of probabilities for simple situations, such as the rolling of a die, but for more complex situations our intuition tends to betray us. In order to overcome this problem the basic axioms of probability theory have been introduced, as well as the fundamental concepts of conditional probability, the theorem of total probability, independence and Bayes’ rule. The competing interpretations for probability, i.e. the frequentist interpretation and the Bayesian interpretation, have also been introduced but because probability theory indiscriminately applies to both interpretations it is noted that this difference is, in general, of secondary practical importance. From a practical perspective the estimation of probabilities from historical data has been briefly introduced and the application of Bayes’ rule has been illustrated for updating probabilities when additional information is available. In our modern world, the possibilities for applying probability theory go far beyond the original domains of gambling and mathematics, and a basic understanding of probability is increasingly turning into a necessity for decision makers in all fields. Particularly for decisions related to risk and life safety, probability theory is always just around the corner.