Ruhr Economic Papers

Ruhr Economic Papers #660

Fast, Approximate MCMC for Bayesian Analysis of Large Data Sets: A Design Based Approach

by Matthias Kaeding

UDE, RWI, 10/2016, 23 pp., 8 Euro, ISBN 978-3-86788-766-3, DOI: 10.4419/86788766



We propose a fast approximate Metropolis-Hastings algorithm for large data sets, embedded in a design-based approach. Here, the log-likelihood ratios involved in the Metropolis-Hastings acceptance step are treated as data. The building block is a single subsample from the complete data set, so that the need to store the complete data set is bypassed. The subsample is taken via the cube method, a balanced sampling design, which is defined by the property that the sample means of some auxiliary variables are close to their means in the complete data set. We develop several computationally and statistically efficient estimators for the Metropolis-Hastings acceptance probability. Our simulation studies show that the approach works well and can lead to results close to those obtained with the complete data set, while being much faster. The methods are applied to a large data set consisting of all German diesel prices for the first quarter of 2015.
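
The general idea of estimating the acceptance step from a single fixed subsample can be illustrated with a minimal sketch. It is not the paper's method: it assumes a toy normal model with known unit variance and a flat prior, replaces the cube method with a simple random sample, and uses a plain expansion estimator (population size times the subsample mean of the log-likelihood differences) in place of the paper's estimators. All function and parameter names are hypothetical.

```python
import math
import random


def log_lik(theta, y):
    """Log-likelihood of one observation under N(theta, 1), up to a constant."""
    return -0.5 * (y - theta) ** 2


def estimated_log_ratio(theta_new, theta_old, subsample, population_size):
    """Expansion estimator of the full-data log-likelihood ratio.

    The per-observation log-likelihood differences are treated as the
    survey variable; their subsample mean is scaled up by N.
    """
    diffs = [log_lik(theta_new, y) - log_lik(theta_old, y) for y in subsample]
    return population_size * sum(diffs) / len(diffs)


def approximate_mh(subsample, population_size, n_iter=5000, step=0.02, seed=0):
    """Random-walk Metropolis-Hastings using only the fixed subsample."""
    rng = random.Random(seed)
    theta = sum(subsample) / len(subsample)  # start at the subsample mean
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        # Flat prior and symmetric proposal (assumptions of this sketch),
        # so only the estimated likelihood ratio enters the acceptance step.
        log_alpha = estimated_log_ratio(proposal, theta, subsample, population_size)
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        if math.log(u) < log_alpha:
            theta = proposal
        draws.append(theta)
    return draws


if __name__ == "__main__":
    rng = random.Random(1)
    full_data = [rng.gauss(2.0, 1.0) for _ in range(20000)]
    # Simple random sample as a stand-in; the paper uses the cube method.
    subsample = rng.sample(full_data, 500)
    draws = approximate_mh(subsample, len(full_data))
    print(sum(draws) / len(draws))
```

Once the subsample is drawn, the sampler no longer touches the complete data set, so each iteration costs a pass over n rather than N observations. In this toy model the chain targets a normal distribution centered at the subsample mean, so the quality of the approximation depends entirely on how representative the subsample is, which is the motivation for a balanced design.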

JEL-Classification: C11, C55, C83

Keywords: Bayesian inference; big data; approximate MCMC; survey sampling