
# Regression Approaches for ABC

Authored by: Michael G.B. Blum

# Handbook of Approximate Bayesian Computation

Print publication date:  August  2018
Online publication date:  August  2018

Print ISBN: 9781439881507
eBook ISBN: 9781315117195

10.1201/9781315117195-3

#### Abstract

In this chapter, we present regression approaches for approximate Bayesian computation (ABC). As for most methodological developments related to ABC, regression approaches originate with coalescent modeling in population genetics [3]. After performing rejection sampling by accepting parameters that generate summary statistics close enough to those observed, parameters are adjusted to account for the discrepancy between simulated and observed summary statistics. Because adjustment is based on a regression model, such approaches are coined as regression adjustment in the following.

#### 3.1  Introduction

In this chapter, we present regression approaches for approximate Bayesian computation (ABC). As for most methodological developments related to ABC, regression approaches originate with coalescent modeling in population genetics [3]. After performing rejection sampling by accepting parameters that generate summary statistics close enough to those observed, parameters are adjusted to account for the discrepancy between simulated and observed summary statistics. Because adjustment is based on a regression model, such approaches are coined as regression adjustment in the following.

Regression adjustment is a peculiar approach in the landscape of Bayesian approaches where sampling techniques are usually proposed to account for mismatches between simulations and observations [4, 5]. We suggest various reasons explaining why regression adjustment is now a common step in practical applications of ABC. First, it is convenient and generic because the simulation mechanism is used to generate simulated summary statistics as a first step, and it is not used afterwards. For instance, the software ms is used to generate DNA sequences or genotypes when performing ABC inference in population genetics [6]. Statistical routines, which account for mismatches, are completely separated from the simulation mechanism and are used in a second step. Regression adjustment can therefore be readily applied in a wide range of contexts without implementation efforts. By contrast, when considering sampling techniques, statistical operations and simulations are embedded within a single algorithm [5, 7], which may require new algorithmic development for each specific statistical problem. Second, regression approaches have been shown to produce reduced statistical errors compared to rejection algorithms in a quite diverse range of statistical problems [3, 8, 9]. Last, regression approaches are implemented in different ABC software including DIYABC [10] and the R abc package [11].

In this chapter, I introduce regression adjustment using a comprehensive framework that includes linear adjustment [3], as well as more flexible adjustments, such as non-linear models [8]. The first section presents the main concepts underlying regression adjustment. The second section presents a theorem that compares theoretical properties of posterior distributions obtained with and without regression adjustment. The third section presents a practical application of regression adjustment in ABC. It shows that regression adjustment shrinks posterior distributions when compared to a standard rejection approach. The fourth section presents recent regression approaches for ABC that are not based on regression adjustment.

#### 3.2  Principle of Regression Adjustment

#### 3.2.1  Partial posterior distribution

Bayesian inference is based on the posterior distribution defined as:

3.1 $\pi(\theta | y_{\mathrm{obs}}) \propto p(y_{\mathrm{obs}} | \theta) \, \pi(\theta),$

where $\theta \in \mathbb{R}^p$ is the vector of parameters, and $y_{\mathrm{obs}}$ are the data. Up to a renormalising constant, the posterior distribution depends on the prior $\pi(\theta)$ and on the likelihood function $p(y_{\mathrm{obs}} | \theta)$. In the context of ABC, inference is no longer based on the posterior distribution $\pi(\theta | y_{\mathrm{obs}})$, but on the partial posterior distribution $\pi(\theta | s_{\mathrm{obs}})$, where $s_{\mathrm{obs}}$ is a $q$-dimensional vector of descriptive statistics. The partial posterior distribution is defined as follows:

3.2 $\pi(\theta | s_{\mathrm{obs}}) \propto p(s_{\mathrm{obs}} | \theta) \, \pi(\theta).$

Obviously, the partial posterior is equal to the posterior if the descriptive statistics $s_{\mathrm{obs}}$ are sufficient for the parameter $\theta$.

#### 3.2.2  Rejection algorithm followed by adjustment

To simulate a sample from the partial posterior $\pi(\theta | s_{\mathrm{obs}})$, the rejection algorithm followed by adjustment works as follows:

1. Simulate $n$ values $\theta^{(i)}$, $i = 1, \ldots, n$, according to the prior distribution $\pi$.
2. Simulate descriptive statistics $s^{(i)}$ using the generative model $p(s^{(i)} | \theta^{(i)})$.
3. Associate with each pair $(\theta^{(i)}, s^{(i)})$ a weight $w^{(i)} = K_h(\|s^{(i)} - s_{\mathrm{obs}}\|)$, where $\|\cdot - \cdot\|$ is a distance function, $h > 0$ is the bandwidth parameter, and $K$ is a univariate statistical kernel with $K_h(\|\cdot\|) = K(\|\cdot\|/h)$.
4. Fit a regression model where the response is $\theta$ and the predictive variables are the summary statistics $s$ [equations (3.3) or (3.5)]. Use the regression model to adjust the $\theta^{(i)}$ in order to produce a weighted sample of adjusted values. Homoscedastic adjustment [equation (3.4)] or heteroscedastic adjustment [equation (3.6)] can be used to produce a weighted sample $(\theta_c^{(i)}, w^{(i)})$, $i = 1, \ldots, n$, which approximates the posterior distribution.

To run the rejection algorithm followed by adjustment, there are several choices to make. The first choice concerns the kernel K. Usual choices for K encompass uniform kernels that give a weight of 1 to all accepted simulations and zero otherwise [12], or the Epanechnikov kernel for a smoother version of the rejection algorithm [3]. However, as for traditional density estimation, the choice of statistical kernel has a weak impact on the estimated distribution [13]. The second choice concerns the threshold parameter h. For kernels with a finite support, the threshold h corresponds to (half) the window size within which simulations are accepted. For the theorem presented in Section 3.3, I assume that h is chosen without taking into account the simulations s (1), …, s (n). This technical assumption does not hold in practice, where we generally choose to accept a given percentage p, typically 1% or 0.1%, of the simulations. This practice amounts to setting h to the first p-quantile of the distances ||s (i) − s obs||. A theorem where the threshold depends on simulations has been provided [14]. The choice of the threshold h corresponds to a bias–variance tradeoff. When choosing small values of h, the number of accepted simulations is small and estimators might have a large variance. By contrast, when choosing large values of h, the number of accepted simulations is large and estimators might be biased [1].
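To make steps 1–3 and the quantile-based choice of h concrete, here is a minimal sketch in Python with NumPy. The Gaussian toy model, the 1% acceptance rate, and all names are illustrative assumptions, not from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model (hypothetical): theta ~ Uniform(0, 10) prior,
# summary statistic s = mean of 50 draws from N(theta, 1).
n = 100_000
theta = rng.uniform(0.0, 10.0, size=n)
s = rng.normal(theta, 1.0 / np.sqrt(50.0))
s_obs = 4.0

# Distances between simulated and observed summary statistics.
dist = np.abs(s - s_obs)

# Set h to the first p-quantile of the distances (accept p = 1%).
p = 0.01
h = np.quantile(dist, p)

# Epanechnikov kernel weights: K(u) = 0.75 * (1 - u^2) for |u| <= 1.
u = dist / h
w = np.where(u <= 1.0, 0.75 * (1.0 - u**2), 0.0)

# The weighted sample (theta^(i), w^(i)) with w^(i) > 0 is the accepted set.
accepted = theta[w > 0]
```

With a uniform kernel the accepted draws would simply receive weight 1; the Epanechnikov choice only down-weights simulations near the edge of the window.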

The principle of regression adjustment is to adjust simulated parameters $\theta^{(i)}$ with non-zero weights $w^{(i)} > 0$ in order to account for the difference between the simulated statistics $s^{(i)}$ and the observed one $s_{\mathrm{obs}}$. To adjust parameter values, a regression model is fitted in the neighbourhood of $s_{\mathrm{obs}}$:

3.3 $\theta^{(i)} = m(s^{(i)}) + \varepsilon^{(i)}, \quad i = 1, \ldots, n,$

where $m(s)$ is the conditional expectation of $\theta$ given $s$, and $\varepsilon$ is the residual. The regression model of equation (3.3) assumes homoscedasticity, that is, it assumes that the variance of the residuals does not depend on $s$. To produce samples from the partial posterior distribution, the $\theta^{(i)}$'s are adjusted as follows:

3.4 $\theta_c^{(i)} = \hat{m}(s_{\mathrm{obs}}) + \hat{\varepsilon}^{(i)} = \hat{m}(s_{\mathrm{obs}}) + \left( \theta^{(i)} - \hat{m}(s^{(i)}) \right),$

where $\hat{m}$ represents an estimator of the conditional expectation of $\theta$ given $s$, and $\hat{\varepsilon}^{(i)} = \theta^{(i)} - \hat{m}(s^{(i)})$ is the $i$th empirical residual. In its original formulation, regression adjustment assumes that $m$ is a linear function [3], and it was later extended to non-linear adjustments [8]. Other regression adjustment techniques have also been proposed when parameters and summary statistics are real-valued functions [15].
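A minimal sketch of the homoscedastic adjustment (3.4) with a linear $m$, in Python with NumPy; the toy accepted sample, the weights, and the weighted least squares fit standing in for $\hat{m}$ are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical accepted simulations: theta depends linearly on s.
n = 2000
s = rng.normal(0.0, 1.0, size=n)                    # accepted summary statistics
theta = 2.0 + 0.5 * s + rng.normal(0.0, 0.1, size=n)
w = rng.uniform(0.5, 1.0, size=n)                   # kernel weights of the draws
s_obs = 0.3

# Fit m(s) = alpha + beta * s by weighted least squares.
X = np.column_stack([np.ones(n), s])
A = X.T @ (w[:, None] * X)
alpha, beta = np.linalg.solve(A, X.T @ (w * theta))

# Homoscedastic adjustment, equation (3.4):
# theta_c = m_hat(s_obs) + (theta - m_hat(s)).
theta_c = (alpha + beta * s_obs) + (theta - (alpha + beta * s))
```

The adjusted values concentrate around $\hat{m}(s_{\mathrm{obs}})$ while keeping the empirical residual of each accepted draw.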

#### 3.2.3  Heteroscedastic adjustment

The homoscedasticity assumption of equation (3.3) is not always valid. When the number of simulations is not very large because of computational constraints, local approximations, such as the homoscedasticity assumption, are no longer valid because the neighbourhood corresponding to simulations for which $w^{(i)} \neq 0$ is too large. Regression adjustment can account for heteroscedasticity, which occurs when the variance of the residuals depends on the summary statistics. When accounting for heteroscedasticity, the regression equation can be written as follows [8]:

3.5 $\theta^{(i)} = m(s^{(i)}) + \sigma(s^{(i)}) \, \zeta^{(i)}, \quad i = 1, \ldots, n,$

where $\sigma(s)$ is the square root of the conditional variance of $\theta$ given $s$, and $\zeta$ is the residual. Heteroscedastic adjustment involves a scaling step in addition to homoscedastic adjustment (3.4) (Figure 3.1):

3.6 $\theta_c'^{(i)} = \hat{m}(s_{\mathrm{obs}}) + \hat{\sigma}(s_{\mathrm{obs}}) \, \hat{\zeta}^{(i)} = \hat{m}(s_{\mathrm{obs}}) + \frac{\hat{\sigma}(s_{\mathrm{obs}})}{\hat{\sigma}(s^{(i)})} \left( \theta^{(i)} - \hat{m}(s^{(i)}) \right),$

where $\hat{m}$ and $\hat{\sigma}$ are estimators of the conditional mean and of the conditional standard deviation.

Figure 3.1   Posterior variances of the histograms obtained after adjustment are reduced compared to the posterior variance obtained without adjustment.

#### 3.2.4  Fitting regression models

Equations (3.4) and (3.6) of regression adjustment depend on the estimator $\hat{m}$ of the conditional mean and possibly on the estimator $\hat{\sigma}^2$ of the conditional variance. Model fitting is performed using weighted least squares. The conditional mean is learned by minimising the following weighted least squares criterion:

3.7 $E(m) = \sum_{i=1}^{n} \left( \theta^{(i)} - m(s^{(i)}) \right)^2 w^{(i)}.$

For linear adjustment, we assume that $m(s) = \alpha + \beta s$ [3]. The parameters $\alpha$ and $\beta$ are inferred by minimising the weighted least squares criterion given in equation (3.7).

For heteroscedastic adjustment [equation (3.6)], the conditional variance should also be inferred. The conditional variance is learned by minimising a least squares criterion, obtained by fitting a regression model where the response is the logarithm of the squared residuals. The weighted least squares criterion is given as follows:

$E(\log \sigma^2) = \sum_{i=1}^{n} \left( \log\left( (\hat{\varepsilon}^{(i)})^2 \right) - \log \sigma^2(s^{(i)}) \right)^2 w^{(i)}.$
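The two-stage fit (conditional mean via criterion (3.7), then conditional variance via the log squared residuals) can be sketched as follows in Python with NumPy; the toy data with residual spread growing in $s$ is a hypothetical example, and linear models stand in for the more flexible regressions discussed below:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical accepted simulations where the residual spread grows with s,
# so the homoscedastic model (3.3) is misspecified.
n = 5000
s = rng.uniform(1.0, 3.0, size=n)
theta = 1.0 + s + 0.2 * s * rng.normal(size=n)   # residual sd = 0.2 * s
w = np.ones(n)                                   # uniform kernel weights
s_obs = 1.5
X = np.column_stack([np.ones(n), s])

def wls(X, y, w):
    """Minimise sum_i w_i * (y_i - X_i beta)^2."""
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# Step 1: conditional mean m(s) = alpha + beta * s, criterion (3.7).
coef_m = wls(X, theta, w)
resid = theta - X @ coef_m

# Step 2: regress log squared residuals on s to learn log sigma^2(s).
coef_v = wls(X, np.log(resid**2 + 1e-12), w)
sigma = np.exp(0.5 * (X @ coef_v))               # sigma_hat(s^(i))

# Step 3: heteroscedastic adjustment, equation (3.6).
m_obs = coef_m[0] + coef_m[1] * s_obs
sigma_obs = np.exp(0.5 * (coef_v[0] + coef_v[1] * s_obs))
theta_c = m_obs + (sigma_obs / sigma) * resid
```

Note that only the ratio $\hat{\sigma}(s_{\mathrm{obs}})/\hat{\sigma}(s^{(i)})$ enters equation (3.6), so any constant bias in the log-residual regression cancels out.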

Neural networks have been proposed to estimate m and σ 2 [8]. This choice was motivated by the possibility offered by neural networks to reduce the dimension of descriptive statistics via an internal projection on a space of lower dimension [16].

In general, the assumptions of homoscedasticity and linearity [equation (3.3)] are violated when the percentage of accepted simulations is large. By contrast, heteroscedastic and non-linear regression models [equation (3.5)] are more flexible. Because of this additional flexibility, the estimated posterior distributions obtained after heteroscedastic and non-linear adjustment are less sensitive to the percentage of accepted simulations [8]. In a coalescent model, where the objective was to estimate the mutation rate, heteroscedastic adjustment with neural networks was found to be less sensitive to the percentage of accepted simulations than linear and homoscedastic adjustment [8]. In a model of phylodynamics, it was found again that the statistical error obtained with neural networks decreases at first – because the regression method requires a large enough training dataset – and then reaches a plateau [9]. However, for a larger phylodynamics dataset, the statistical error obtained with neural networks increases for higher tolerance values. Poor regularisation or the limited size of the neural networks were advanced as putative explanations [9].

In principle, estimation of the conditional mean m and of the conditional variance σ 2 can be performed with different regression approaches. For instance, the R abc package implements different regression models for regression adjustment, including linear regression, ridge regression, and neural networks [11]. Lasso regression is another regression approach that can be considered. Regression adjustment based on the lasso (least absolute shrinkage and selection operator) was shown to provide smaller errors than neural networks in a phylodynamic model [9]. An advantage of lasso, ridge regression, and neural networks compared to standard multiple regression is that they account for the large dimension of the summary statistics using different regularisation techniques. Instead of considering regularised regression, there is an alternative where the initial summary statistics are replaced by a reduced set of summary statistics or by combinations of the initial summary statistics [17, 18]. The key practical advantage of regression approaches with regularisation is that they implicitly account for the large number of summary statistics, so that the additional step of variable selection can be avoided.

#### 3.2.5  Parameter transformations

When the parameters are bounded or positive, parameters can be transformed before regression adjustment. Transformations guarantee that the adjusted parameter values lie in the range of the prior distribution [3]. An additional advantage of the log and logit transformations is that they stabilise the variance of the regression model and make regression model (3.3) more homoscedastic [1].

Positive parameters are regressed on a logarithmic scale, $\phi = \log(\theta)$:

$\phi = m(s) + \varepsilon.$

Parameters are then adjusted on the logarithmic scale:

$\phi_c^{(i)} = \hat{m}(s_{\mathrm{obs}}) + \left( \phi^{(i)} - \hat{m}(s^{(i)}) \right).$

The final adjusted values are obtained by exponentiating the adjusted parameter values:

$\theta_c^{(i)} = \exp(\phi_c^{(i)}).$

Instead of using a logarithm transformation, bounded parameters are adjusted using a logit transformation. Heteroscedastic adjustment can also be performed after log or logit transformations.
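The log-scale adjustment can be sketched as follows in Python with NumPy; the toy positive parameter (think of a rate) and the unweighted least squares fit are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical positive parameter: adjusting on the log scale guarantees
# that the adjusted values remain positive.
n = 2000
s = rng.normal(size=n)
theta = np.exp(0.8 + 0.3 * s + 0.1 * rng.normal(size=n))  # positive by design
s_obs = 0.5

phi = np.log(theta)                                  # phi = log(theta)
X = np.column_stack([np.ones(n), s])
coef = np.linalg.lstsq(X, phi, rcond=None)[0]        # plain LS (weights omitted)

# Adjust on the log scale, then exponentiate back.
phi_c = (coef[0] + coef[1] * s_obs) + (phi - X @ coef)
theta_c = np.exp(phi_c)
```

Adjusting $\theta$ directly could produce negative values for draws with large negative residuals; the back-transformed $\theta_c^{(i)}$ cannot leave the support of the prior.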

#### 3.2.6  Shrinkage

An important property of regression adjustment concerns posterior shrinkage. When considering linear regression, the empirical variance of the residuals is smaller than the total variance. In addition, residuals are centred for linear regression. These two properties imply that for linear adjustment, the empirical variance of the $\theta_c^{(i)}$ is smaller than the empirical variance of the non-adjusted values $\theta^{(i)}$ obtained with the rejection algorithm. Following homoscedastic and linear adjustment, the posterior variance is consequently reduced. For non-linear adjustment, the shrinkage property has also been reported, and the additional scaling step of heteroscedastic adjustment does not necessarily involve additional shrinkage when comparing $\theta_c'^{(i)}$ to $\theta_c^{(i)}$ [1]. However, shrinkage obtained with regression adjustments can be excessive, especially for small tolerance rates, and it can affect posterior calibration. In a model of admixture, 95% credibility intervals obtained with regression adjustments were found to contain only 84% of the true values in less favourable situations [19].
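The shrinkage property for linear adjustment can be checked numerically; this toy example in Python with NumPy (an assumption, not from the chapter) verifies that the variance of the adjusted sample never exceeds that of the accepted sample:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy check of shrinkage: adjusted values are the least squares residuals
# shifted by a constant, and residual variance never exceeds total variance.
n = 3000
s = rng.normal(size=n)
theta = 0.4 * s + rng.normal(0.0, 0.5, size=n)
s_obs = 0.0

X = np.column_stack([np.ones(n), s])
coef = np.linalg.lstsq(X, theta, rcond=None)[0]
theta_c = (coef[0] + coef[1] * s_obs) + (theta - X @ coef)
```

Since adding a constant does not change variance, the variance of $\theta_c^{(i)}$ equals the residual variance of the fit, which is at most the total variance of $\theta^{(i)}$.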

#### 3.3  Theoretical Results about Regression Adjustment

The following theoretical section is technical and can be skipped by readers not interested in mathematical results about ABC estimators based on regression adjustment. In this section, we give the main theorem that describes the statistical properties of posterior distributions obtained with or without regression adjustment. To this end, the estimators of the posterior distribution are defined as follows:

3.8 $\hat{\pi}_j(\theta | s_{\mathrm{obs}}) = \sum_{i=1}^{n} \tilde{K}_{h'}\left( \theta_j^{(i)} - \theta \right) w^{(i)}, \quad j = 0, 1, 2,$

where $\theta_0^{(i)} = \theta^{(i)}$ (no adjustment), $\theta_j^{(i)} = \theta_c^{(i)}$ for $j = 1, 2$ (homoscedastic adjustment), $\tilde{K}$ is a univariate kernel, and $\tilde{K}_{h'}(\cdot) = \tilde{K}(\cdot / h') / h'$. Linear adjustment corresponds to $j = 1$ and quadratic adjustment corresponds to $j = 2$. In non-parametric statistics, estimators of the conditional density with adjustment have already been proposed [20, 21].

To present the main theorem, we introduce the following notations: if Xn is a sequence of random variables and an is a deterministic sequence, the notation Xn = oP (an ) means that Xn /an converges to zero in probability, and Xn = OP (an ) means that the ratio Xn /an is bounded in probability when n goes to infinity. The technical assumptions of the theorem are given in the appendix of [1].

Theorem 3.1. We assume that conditions (A1)–(A5) given in the appendix of [1] hold. The bias and variance of the estimators $\hat{\pi}_j(\theta | s_{\mathrm{obs}})$, $j = 0, 1, 2$, are given by:

3.9 $E\left[ \hat{\pi}_j(\theta | s_{\mathrm{obs}}) - \pi(\theta | s_{\mathrm{obs}}) \right] = C_1 h'^2 + C_{2,j} h^2 + O_P\left( (h^2 + h'^2)^2 \right) + O_P\left( \frac{1}{n h^q} \right),$

3.10 $\mathrm{Var}\left[ \hat{\pi}_j(\theta | s_{\mathrm{obs}}) \right] = \frac{C_3}{n h^q h'} \left( 1 + o_P(1) \right),$

where $q$ is the dimension of the vector of summary statistics, and the constants $C_1$, $C_{2,j}$, and $C_3$ are given in [1].

Proof: See [1].

There are other theorems that provide asymptotic biases and variances of ABC estimators, but they do not study the properties of estimators arising after regression adjustment. Considering posterior expectations (e.g. posterior moments) instead of the posterior density, Barber et al. [22] provide the asymptotic bias and variance of an estimator obtained with the rejection algorithm. Biau et al. [14] studied asymptotic properties when the window size h depends on the data instead of being fixed in advance.

Remark 1. Curse of dimensionality. The mean square error of the estimators is the sum of the squared bias and of the variance. With elementary algebra, we can show that for the three estimators $\hat{\pi}_j(\theta | s_{\mathrm{obs}})$, $j = 0, 1, 2$, the mean square error is of the order of $n^{-4/(q+5)}$ for an optimal choice of $h$. The speed with which the error approaches 0 therefore decreases drastically when the dimension of the descriptive statistics increases. This theorem highlights (in an admittedly complicated manner) the importance of reducing the dimension of the statistics. However, the findings from these asymptotic theorems, which are classic in non-parametric statistics, are often much more pessimistic than the results observed in practice. This is especially true because asymptotic theorems in the vein of Theorem 3.1 do not take into account correlations between summary statistics [23].

Remark 2. Comparing biases of estimators with and without adjustment. It is not possible to compare the biases (i.e. the constants $C_{2,j}$, $j = 0, 1, 2$) for an arbitrary statistical model. However, if we assume that the residual distribution of $\varepsilon$ in equation (3.3) does not depend on $s$, then the constant $C_{2,2}$ is 0. When assuming homoscedasticity, the estimator that asymptotically achieves the smallest mean square error is therefore the estimator with quadratic adjustment, $\hat{\pi}_2(\theta | s_{\mathrm{obs}})$. Assuming additionally that the conditional expectation $m$ is linear in $s$, then both $\hat{\pi}_1(\theta | s_{\mathrm{obs}})$ and $\hat{\pi}_2(\theta | s_{\mathrm{obs}})$ have a mean square error lower than the error obtained without adjustment.

#### 3.4  Application of Regression Adjustment to Estimate Admixture Proportions Using Polymorphism Data

To illustrate regression adjustment, I consider an example of parameter inference in population genetics. A description of coalescent modelling in population genetics is out of the scope of this chapter, and we refer interested readers to dedicated reviews [24, 25]. This example illustrates that ABC can be used to infer evolutionary events, such as admixture between sister species. I assume that two populations (A and B) diverged in the past and admixed with admixture proportions p and 1 − p to form a new hybrid species C that subsequently split to form two sister species $C_1$ and $C_2$ (Figure 3.2). Simulations are performed using the software DIYABC (Do It Yourself Approximate Bayesian Computation) [10]. The model of Figure 3.2 corresponds to a model of divergence and admixture between species of a complex of species from the butterfly genus Coenonympha. We assume that two populations of the Darwin's Heath (Coenonympha darwiniana) originated through hybridisation between the Pearly Heath (Coenonympha arcania) and the Alpine Heath (Coenonympha gardetta) [26]. A total of 16 summary statistics based on Single Nucleotide Polymorphisms (SNPs) are used for parameter inference [26]. A total of $10^6$ simulations are performed, and the percentage of accepted simulations is 0.5%.

I consider four different forms of regression adjustment: linear and homoscedastic adjustment, non-linear (neural networks) and homoscedastic adjustment, linear and heteroscedastic adjustment, and non-linear and heteroscedastic adjustment. All adjustments were performed with the R package abc [11, 27]. I evaluate parameter inference using a cross-validation criterion [11]. The cross-validation error decreases when considering linear adjustment (Figure 3.3). However, considering heteroscedastic instead of homoscedastic adjustment does not provide an additional decrease of the cross-validation error (Figure 3.3).

Figure 3.2   Graphical description of the model of admixture between sister species. Two populations (A and B) diverge in the past and admixed with admixture proportions p and 1 − p to form a new hybrid species C that subsequently diverges to form two sister species (C 1 and C 2).

Figure 3.3   Errors obtained when estimating the admixture proportion p with different ABC estimators. The errors are obtained with cross-validation and error bars (two standard deviations) are estimated with bootstrap. adj. stands for adjustment.

Figure 3.4   Posterior distribution of the admixture coefficient p obtained using real data from a butterfly species complex to compute observed summary statistics. This example shows that regression adjustment not only shrinks credibility intervals, but can also shift posterior estimates. The admixture coefficient p measures the relative contribution of Coenonympha arcania to the ancestral C. darwiniana population.

#### 3.5  Regression Methods Besides Regression Adjustment

There are other regression methods besides regression adjustment that have been proposed to estimate $E[\theta | s_{\mathrm{obs}}]$ and $\pi(\theta | s_{\mathrm{obs}})$ using the simulations as a training set. A first set of methods considers kernel methods to perform regression [28]. The principle is to define a kernel function $K$ to compare observed statistics to simulated summary statistics. Because of the so-called kernel trick, regression with kernel methods amounts to regressing $\theta$ on $\Phi(s)$, where $\langle \Phi(s), \Phi(s') \rangle = K(s, s')$ for two vectors of summary statistics $s$ and $s'$. Then, an estimate of the posterior mean is obtained as follows:

3.11 $E[\theta | s_{\mathrm{obs}}] = \sum_{i=1}^{n} w_i \, \theta^{(i)},$

where $w_i$ depends on the inverse of the Gram matrix containing the values $K(s^{(i)}, s^{(j)})$ for $i, j = 1, \ldots, n$. A formula to estimate the posterior density can also be obtained along the same lines as formula (3.11). Simulations suggest that kernel ABC gives better performance than regression adjustment when high-dimensional summary statistics are used. For a given statistical error, it was reported that fewer simulations are needed when using kernel ABC instead of regression adjustment [28]. Other kernel approaches have been proposed for ABC, where simulated and observed samples or summary statistics are directly compared through a distance measure between empirical probability distributions [29, 30].
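A toy sketch of Gram-matrix weights in Python with NumPy may help fix ideas. This is a kernel-ridge-style version, not necessarily the exact estimator of [28]; the Gaussian kernel, the regularisation constant `lam`, and the toy model are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def gram(a, b, scale=0.2):
    """Gaussian kernel K(s, s') = exp(-||s - s'||^2 / (2 * scale^2))."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * scale**2))

# Hypothetical simulations: theta ~ Uniform(0, 2), one noisy summary statistic.
n = 300
theta = rng.uniform(0.0, 2.0, size=n)
s = (theta + 0.05 * rng.normal(size=n))[:, None]
s_obs = np.array([[1.5]])

# Weights from a regularised inverse of the Gram matrix:
# w = (G + n * lam * I)^{-1} k(s_obs).
lam = 1e-4
G = gram(s, s)
k_obs = gram(s, s_obs).ravel()
w = np.linalg.solve(G + n * lam * np.eye(n), k_obs)

# Posterior mean estimate in the form of equation (3.11).
post_mean = w @ theta
```

The regularisation constant plays a role analogous to the tolerance: it trades off fidelity to the simulations against the stability of the Gram-matrix inversion.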

Another regression method in ABC that does not use regression adjustment considers quantile regression forests [19]. Generally used to estimate the conditional mean, random forests also provide information about the full conditional distribution of the response variable [31]. By inverting the estimated conditional cumulative distribution function of the response variable, quantiles can be inferred [31]. The principle of quantile regression forests is to use random forests in order to give a weight wi to each simulation (θ (i), s (i)). These weights are then used to estimate the conditional cumulative posterior distribution function F(θ|s obs) and to provide posterior quantiles by inversion. An advantage of quantile regression forests is that a tolerance rate need not be specified; standard parameters of random forests can be used instead. A simulation study of coalescent models shows that regression adjustment can shrink the posterior excessively, by contrast to quantile regression forests [19].
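The inversion step can be illustrated independently of the forest itself. Given any weights $w_i$ (here uniform, for a hypothetical toy sample; in quantile regression forests they would come from the forest), posterior quantiles follow by inverting the weighted empirical CDF. A sketch in Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)

def weighted_quantile(values, weights, q):
    """Invert the weighted empirical CDF
    F(theta) = sum of normalised weights of simulations with theta^(i) <= theta."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return v[np.searchsorted(cdf, q)]

# Hypothetical weighted ABC sample; the weights could come from a random
# forest or from a kernel (uniform here for simplicity).
n = 10_000
theta = rng.normal(5.0, 1.0, size=n)
w = np.ones(n)

median = weighted_quantile(theta, w, 0.5)
ci = (weighted_quantile(theta, w, 0.025), weighted_quantile(theta, w, 0.975))
```

With uniform weights this reduces to ordinary sample quantiles; non-uniform weights simply tilt the empirical CDF before inversion.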

#### 3.6  Conclusion

With the rapid development of complex machine learning approaches, we envision that regression approaches for approximate Bayesian computation can be further improved to provide more reliable inference for complex models in biology and ecology.

#### References

1. M G B Blum. Approximate Bayesian computation: A nonparametric perspective. Journal of the American Statistical Association, 105(491):1178–1187, 2010.
2. K Csilléry, M G B Blum, O E Gaggiotti, and O François. Approximate Bayesian computation (ABC) in practice. Trends in Ecology & Evolution, 25(7):410–418, 2010.
3. M A Beaumont, W Zhang, and D J Balding. Approximate Bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.
4. P Marjoram, J Molitor, V Plagnol, and S Tavaré. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 100(26):15324–15328, 2003.
5. S A Sisson, Y Fan, and M M Tanaka. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences, 104(6):1760–1765, 2007.
6. R R Hudson. Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics, 18(2):337–338, 2002.
7. M A Beaumont, J-M Cornuet, J-M Marin, and C P Robert. Adaptive approximate Bayesian computation. Biometrika, 96(4):983–990, 2009.
8. M G B Blum and O François. Non-linear regression models for approximate Bayesian computation. Statistics and Computing, 20:63–73, 2010.
9. E Saulnier, O Gascuel, and S Alizon. Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study. PLoS Computational Biology, 13(3):e1005416, 2017.
10. J-M Cornuet, P Pudlo, J Veyssier, A Dehne-Garcia, M Gautier, R Leblois, J-M Marin, and A Estoup. DIYABC v2.0: A software to make approximate Bayesian computation inferences about population history using single nucleotide polymorphism, DNA sequence and microsatellite data. Bioinformatics, 30(8):1187–1189, 2014.
11. K Csilléry, O François, and M G B Blum. abc: An R package for approximate Bayesian computation (ABC). Methods in Ecology and Evolution, 3(3):475–479, 2012.
12. J K Pritchard, M T Seielstad, A Perez-Lezaun, and M W Feldman. Population growth of human Y chromosomes: A study of Y chromosome microsatellites. Molecular Biology and Evolution, 16(12):1791–1798, 1999.
13. B W Silverman. Density Estimation for Statistics and Data Analysis, Volume 26. CRC Press, London, UK, 1986.
14. G Biau, F Cérou, A Guyader, et al. New insights into approximate Bayesian computation. In Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, volume 51, pp. 376–403. Institut Henri Poincaré, Paris, 2015.
15. G S Rodrigues, D J Nott, and S A Sisson. Functional regression approximate Bayesian computation for Gaussian process density estimation. Computational Statistics & Data Analysis, 103:229–241, 2016.
16. B D Ripley. Neural networks and related methods for classification. Journal of the Royal Statistical Society: Series B (Methodological), 56(3):409–456, 1994.
17. P Fearnhead and D Prangle. Constructing summary statistics for approximate Bayesian computation: Semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3):419–474, 2012.
18. M G B Blum, M A Nunes, D Prangle, S A Sisson, et al. A comparative review of dimension reduction methods in approximate Bayesian computation. Statistical Science, 28(2):189–208, 2013.
19. L Raynal, J-M Marin, P Pudlo, M Ribatet, C P Robert, and A Estoup. ABC random forests for Bayesian parameter inference. arXiv preprint arXiv:1605.05537, 2016.
20. R J Hyndman, D M Bashtannyk, and G K Grunwald. Estimating and visualizing conditional densities. Journal of Computational and Graphical Statistics, 5:315–336, 1996.
21. B E Hansen. Nonparametric conditional density estimation. Working paper available at http://www.ssc.wisc.edu/bhansen/papers/ncde.pdf, 2004.
22. S Barber, J Voss, and M Webster. The rate of convergence for approximate Bayesian computation. Electronic Journal of Statistics, 9:80–105, 2015.
23. D W Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, New York, 2009.
24. R R Hudson. Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology, 7(1):44, 1990.
25. N A Rosenberg and M Nordborg. Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nature Reviews Genetics, 3(5):380–390, 2002.
26. T Capblancq, L Després, D Rioux, and J Mavárez. Hybridization promotes speciation in Coenonympha butterflies. Molecular Ecology, 24(24):6209–6222, 2015.
27. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2016.
28. S Nakagome, K Fukumizu, and S Mano. Kernel approximate Bayesian computation in population genetic inferences. Statistical Applications in Genetics and Molecular Biology, 12(6):667–678, 2013.
29. J Mitrovic, D Sejdinovic, and Y Teh. DR-ABC: Approximate Bayesian computation with kernel-based distribution regression. In Proceedings of ICML 2016, volume 48, pp. 1482–1491. JMLR W&CP, New York, 2016.
30. M Park, W Jitkrittum, and D Sejdinovic. K2-ABC: Approximate Bayesian computation with infinite dimensional summary statistics via kernel embeddings. In Proceedings of AISTATS, pp. 398–407, 2016.
31. N Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7:983–999, 2006.
