next up previous
Next: Example 8: AttritionLikelihood Up: Advanced topics and examples Previous: FiniteUpdate, and Example 6:

ReversibleJumpUpdate, MixtureBond, and Example 7: BinomialHypothesisTest

One of the most challenging sitations whenimplementing an MCMC is if a parameter has a mixture distribution and is most naturally thought of as living in one of two or more different spaces. A simple example arises in variable subset selection in regression: a slope parameter may be given a prior distribution that is a mixture of a normal distribution and a point mass at zero. In some cases it will be possible to update such a parameter using its full conditional distribution, but in other cases this distribution is not tractable, so that we want to use some method more like Metropolis-Hastings. Reversible jump MCMC, as presented in [8] demonstrates how to do this. The algorithm attempts to jump between spaces, and the Metropolis-Hastings acceptance probability is a function of the methods of proposing new parameter values so that the stationary distribution of the algorithm is as desired.

The ReversibleJumpUpdate can be thought of as a transition matrix between models, and an array of functions specifying how to propose a new value of the parameters in a new model, as a function of which model was the old model. For this to work, it is necessary to adjust acceptance probabilities in such a way that the algorithm's limiting distribution is the desired posterior distribution. The arguments to the constructor are

  • an array of parameters whose values might be changed by the reversible jump update;
  • an integer that determines how many models we are mixing over (denote this integer by $M$);
  • an integer that indicates which of these models is the initial state for the MCMC (this should be consistent with the initial values of the parameters, and its possible values are $0, 1, \ldots, M-1$);
  • a one-dimensional array of transition probabilities between the models. This array will be interpreted as a square $M \times M$ matrix whose $(i,j)$ coordinate is the probability that if the current model is model $i$, the next model proposed will be model $j$;
  • an array of JumpPerturbers. A JumpPerturber is much like a Perturber. Both are interfaces, and JumpPerturber extends Perturber by adding a density() method. This method computes the probability density of the proposed new value of the parameters.
  • a String indicating a directory where output specific to the reversible jump update should go. This output includes acceptance probabilities for the purpose of Rao-Blackwellization, but has not been used enough to be standard.

To be precise, suppose the current model is model $i$ and the current value of the parameter is $\theta$. To generate a new value of the parameter, we first choose a new model according to the probabilities in the transition matrix: suppose model $j$ is chosen; the probability that this happens is (say) $A_{ij}$. The proposed new value $\theta^\prime$ of the parameter is chosen from density $T_{ij}(\theta, \theta^\prime)$. Denote the unnormalized posterior by $f$. The acceptance probability for the reversible jump move is

\frac{A_{ji}T_{ji}(\theta^\prime, \theta)}{A_{ij}T_{ij}(\theta, \theta^\prime)}.\end{displaymath}

This is ``just'' the Metropolis-Hastings rule, but specialized to include the possibility of jumping across models. (The JumpPerturbers evaluate the densities $T_{ij}(\theta, \theta^\prime)$.)

In the following example, we perform a Bayesian hypothesis test for equality of two probabilities $p_1$ and $p_2$. We have binomial samples $x_i \sim$ Binomial$(n_i, p_i)$ for $i=1,2$. We make it into a hypothesis testing problem by putting a mixture prior on $(p_1, p_2)$: with probability $\lambda$, $p_1 = p_2$ and their common value has a beta prior distribution, while with probability $1-\lambda$, the $p_i$'s are exchangeable, each with a (possibly different) beta prior distribution. Click on these buttons to bring up new windows displaying the code and input files for this example.

To follow this example, it is necessary to understand the MixtureBond class. A MixtureBond is essentially an array of BasicMCMCBonds, one bond for each model that we are mixing over. While a BasicMCMCBond requires an array of parameters, an array of ArgumentMakers and a Likelihood, a MixtureBond requires two-dimensional arrays of parameters and ArgumentMakers and an array of Likelihoods. In addition, a MixtureBond requires a single ArgumentMaker that computes the mixing probabilities for each of the models. This is implemented in a slightly strange way: when this ArgumentMaker is applied to the parameters for the $m$th model, it should generate an array whose $m$th coordinate is the probability of the $m$th model. The simplest way to do this is with a ConstantArgument, but we wanted to enable the case where the mixing probabilities are themselves being estimated. We emphasize that a mixture bond is used for computing weighted averages of bonds. Suppose $x_1, x_2, \ldots x_n$ may come from several densities $f_m$, and each of these distributions has probability $\pi_m$. A single MixtureBond can be used to compute

\begin{displaymath}\sum_m \pi_m \prod_i f_m(x_i),\end{displaymath}

but not to compute

\begin{displaymath}\prod_i \sum_m \pi_m f_m(x_i).\end{displaymath}

The latter can be computed using $n$ MixtureBonds. In other words, a single MixtureBond mixes the distributions in such a way that all of the data points come from the same bond. You may want the other interpretation, and I intend to produce such a class as soon as possible. In short, one should expect some transience in the code for mixture distributions in YADAS.

To get (finally) to the example, the single unknown parameter $p$ contains both the probabilities $p_1$ and $p_2$. The MixtureBond contains the mixture distribution and is used in a somewhat nonstandard way here. If $\beta_{ab}$ denotes the beta density with parameters $a$ and $b$, the mixture bond is intended to compute the quantity

\begin{displaymath}(1-\lambda) \beta_{ab}(p_1)\beta_{ab}(p_2)I_{\{p_1 \neq p_2\}} +
\lambda \beta_{ab}(p_1)I_{\{p_1 = p_2\}}.\end{displaymath}

The AreTheyEqualArgument does the work here: if $p_1 = p_2$, this argument generates the array $\{0, \lambda\}$, while if $p_1 \neq p_2$, the output is the array $\{1-\lambda, 0\}$, since the first model is the $p_1 \neq p_2$ model. This is a nonintuitive trick but it will presumably be common in mixture models that require ReversibleJumpUpdates (depending on the parameter, one of the models may have zero probability).

Finally, the parameter $p$ is updated using reversible jump. The matrix $(a_{00} = 0.75, a_{01} = 0.25, a_{10} = 0.25, a_{11} = 0.75)$ indicates that if $p_1 \neq p_2$, the probability is 0.25 that the proposed new value of $p$ will feature $p_1 = p_2$. Also, if $p_1 = p_2$, the next proposed value will set them unequal with probability 0.25. The four JumpPerturbers operate as follows.

  • If the $p$'s are unequal and the proposed new values are also to be unequal, we add a different Gaussian random walk step to each on the logit scale.
  • If the $p$'s are unequal and the proposal is to equalize them, the proposal is deterministic and equal to a weighted average of the two $p$'s. The weights are set in the input file.
  • If the $p$'s are equal and are to be made unequal in the proposal, we again add different independent Gaussians to the $p$'s on the logit scale.
  • If the $p$'s are equal and they should be kept equal in the proposal, we update their common value by adding a Gaussian on the logit scale.

The standard output from a ReversibleJumpUpdate is a natural extension of the usual acceptance probability output; this time it shows the number of acceptances and trials from each model to each model. I obtained the following output:

Update 0: 0 -> 0: 1073 / 1551; 976 / 1551;
0 -> 1: 443 / 496;
1 -> 0: 442 / 2200;
1 -> 1: 4761 / 6753;
This means that when the algorithm moved from model 0 to model 0, the first $p$ was changed successfully 1073 out of 1551 times, while the second $p$ was changed 976 out of 1551 times. The acceptance probability was quite high when we proposed a move from $p_1 \neq p_2$ to $p_1 = p_2$, but low in the opposite direction.

Theoretical results about ideal acceptance rates, values of the transition matrix and choice of the JumpPerturbers would be of great interest and probably not easy to achieve.




next up previous
Next: Example 8: AttritionLikelihood Up: Advanced topics and examples Previous: FiniteUpdate, and Example 6:
Todd Graves 2008-09-24