Congratulations to Woody Burchett!

Congratulations to Woody for successfully defending his PhD titled “Improving the Computational Efficiency in Bayesian Fitting of Cormack-Jolly-Seber Models with Individual, Continuous, Time-Varying Covariates” on May 26. His thesis is now available online through the University of Kentucky archives. Woody is now working as a statistician with Pfizer pharmaceuticals in New London, Connecticut, where he’s been since last summer.

Posted in News | Comments Off on Congratulations to Woody Burchett!

LaTeX smartdiagram

Amanda Ellis just gave a practice presentation for her qualifying exam tomorrow. The use of the smartdiagram lists and figures in her slides were particularly eye catching. I know what I’m doing for my next talk!

Posted in LaTeX | Comments Off on LaTeX smartdiagram

Polymode for ESS

I wrote my first vignette for an R package the other day and was amazed at the ease of writing Rmarkdown. Of course, I am using emacs and was looking for a mode that would allow me to edit these files smoothly with syntax highlighting and the ability to send code chunks to R automatically. Et voila — Polymode. I followed the John Stanton-Geddes instructions for enabling markdown-mode and polymode, but ran up against a few minor issues. The first was that copying and pasting the LISP code into my .emacs file introduced some errors — single quotes turned into backticks and doublequotes turned into smart quotes. In the end, I had to reformat these/type them in myself.

I also found that I needed to add the line:

(add-to-list 'auto-mode-alist '("\\.Rmd" . poly-markdown+r-mode))

to my .emacs file for polymode to start automatically when I opened a .Rmd file. This was mentioned in the polymode instructions which I should probably have read first. The full entries in my .emacs file for markdown-mode and polymode are:

;;; Markdown mode
(autoload 'markdown-mode "markdown-mode" "Major mode for editing Markdown files" t)
(setq auto-mode-alist (cons '("\\.markdown" . markdown-mode) auto-mode-alist))
(setq auto-mode-alist (cons '("\\.md" . markdown-mode) auto-mode-alist))
(setq auto-mode-alist (cons '("\\.ronn?" . markdown-mode) auto-mode-alist))

;;; Polymode
(setq load-path (append '("/home/sbonner/.emacs.d/polymode/" "/home/sbonner/.emacs.d/polymode/modes") load-path))

(require 'poly-R)
(require 'poly-markdown)

(add-to-list 'auto-mode-alist '("\\.Rmd" . poly-markdown+r-mode))

Looking great!

Posted in R | Tagged , , , , , | Comments Off on Polymode for ESS

Shy Crabs Make the Most Sperm

Yesterday my paper with Danielle Bridger and Mark Briffa got coverage in Discover Magazine. I have to admit that it was Danielle and Mark who “peeked inside the crabs’ gonads”. I just helped out with the statistics.

Posted in Uncategorized | Comments Off on Shy Crabs Make the Most Sperm

Finding Articles in Ecological Statistics (and beyond)

As part of our lab meetings we discuss recent articles of interest. These articles are generally in ecological statistics, though they may be in fields relating to our projects (e.g., Bayesian inference). My students asked how to find articles, and so I thought I’d list some of the sources I keep an eye on.

I (try to) stay up to date with the articles published in the following journals:

  • Biometrics
  • Methods in Ecology and Evolution
  • Journal of Agricultural, Biological, and Environmental Statistics
  • Environmental and Ecological Statistics
  • Biometrika (to a lesser degree)
  • The Auk (mostly because I like birds)

Other journals that I should be watching include:

  • Ecology
  • Journal of Wildlife Management

But, there is only so much time in the day. The easiest way to keep up with what’s published in these journals is to subscribe to the table of contents alerts. For example, alerts for JABES can be requested by providing your e-mail address on the journal’s webpage.
The other good way to find recent articles is to search for specific authors or to look for new articles that cite old articles of interest. There are several tools that do this including the Web of Science and Google Scholar. These tools have different features and both have advantages and disadvantages. For example, the Web of Science allows you to limit searches by the field of the publication. On the other hand, Google Scholar provides tools that alert you when a specific author publishes new papers or is cited by other authors. For example, I use Google Scholar to follow the articles and citations of Bill Link from the USGS one of the leaders in Bayesian inference for the analysis of ecological data.

Several organizations also provide unified portals for journal alerts and allow for alerts using keyword searches as well (e.g. Zetoc). I have never used these myself. Just another thing to do!

Please feel free to comment if you think I’ve missed something important. It would be great to get more ideas.

Posted in Uncategorized | Comments Off on Finding Articles in Ecological Statistics (and beyond)

Using LaTeX in Word

I use LaTeX when writing my own papers. However, when collaborating with other authors, particularly those in biology, I have to used Microsoft Word. In the past I have converted LaTeX equations into images to embed into the Word document, but this is cumbersome and makes it difficult to edit the equations later.

I recently found Mr. Sparkle wonderful LaTeX in Word project which makes this a lot easier! The heart of this is Word document that contains embedded macros which use a remote server to process LaTeX code into embedded images. To use the package you simply download and extract the tarball, save it to a new file, and start typing away. The Alt-L keystroke will open a new window that you can type LaTeX commands into, and pressing the convert button will turn these commands into images. The best thing is that the code stays embedded in the document. With the same keystroke you can reopen the command window and edited the existing code. Fantastic!

Posted in LaTeX | Tagged , | Comments Off on Using LaTeX in Word

Simulating zero-truncated Poisson

I have been working on simulating data from Bob Dorazio’s version of the Royle N-mixture model for repeated count data. This version of the model assumes that the abundance at site  N_i is generated from a Hurdle model such that  P(N_i=0)=(1-\psi) and  P(N_i=n) = \psi e^{-\lambda} \lambda^{n}/(n! (1-e^{-\lambda}) ,  n=1,2,3, . This can be interpreted as specifying an occupancy rate of  \psi with abundance given by a zero-truncated Poisson with mean  \lambda at the occupied sites.

I was originally generating the zero-truncated Poisson values using a while loop to simulate from a Poisson with mean  \lambda until a non-zero value was generated. This is inefficient and runs into problems if  \lambda is sufficiently small that the probability of a non-zero value is below computer precision.

To remedy this I found this nice solution from Peter Daalgard in response to a question from Spencer Graves on the R-help list. This solution essentially uses the inverse probability transform but truncates the CDF so that the values generated are guaranteed to be greater than 1. The same could be done for any distribution (that has a distribution function implemented in R). My code below implements this trick but adds a tolerance that avoids problems if the value of lambda is too small.

rtpois <- function(n, lambda,tol=1e-10){
## Simulate from zero-truncated Poisson

## Initialize output
x <- rep(NA,n)

## Identify lambda values below tolerance
low <- which(lambda < tol)
nlow <- length(low)

if(nlow > 0){
x[low] <- 1

if(nlow < n)
x[-low] <- qpois(runif(n-nlow, dpois(0, lambda[-low]), 1), lambda[-low])
}
else
x <- qpois(runif(n-nlow, dpois(0, lambda), 1), lambda)

return(x)
}
Posted in Uncategorized | Comments Off on Simulating zero-truncated Poisson

MC(MC)MC: Exploring Monte Carlo integration within MCMC for mark-recapture models with individual covariates.

The paper from Simon’s plenary at the 2013 EURING Technical Meeting has finally appeared “in print”. The quotation marks are because MEE is an on-line journal, but the paper final has a volume, issue, and page numbers. The full citation is:

Bonner, S.J. and Schofield, M. (2014). MC(MC)MC: Exploring Monte Carlo integration within MCMC for mark-recapture models with individual covariates. Methods in Ecology and Evolution, 5(12):1305–1315.

Posted in Publications and Presentations | Comments Off on MC(MC)MC: Exploring Monte Carlo integration within MCMC for mark-recapture models with individual covariates.

First Attempts with the parallel Package

The last time I worked with parallel processing in R I used the package Rmpi. This ws very flexible, but took a lot of setup to handle passing jobs, code, and data between the master and the slave. I have used the package snow for some simple projects, but I have new need for parallel computing and recently found out about the parallel package (something I should have known about earlier). This is my first attempt at using the mcparallel function from the parallel package.

As a simple test, I have used mcparallel to implement multiple chains for a Metropolis-Hastings sampler of the mean parameter in a multivariate log-normal distribution. That is, to sample from the posterior distribution of  \mu given a an iid sample of random p-vectors X_1,\ldots,X_n such that  \log(X_i) \sim N_p(\mu,\Sigma) . In the simulation I have set  p=2 ,  \mu=(1,1)^T , and  \Sigma=.5 + .5 I_2.

The sampler I have implemented generates proposals using a symmetric multivariate normal proposal distribution with variance Sigma.prop. I have manually tuned the proposal variance to achieve acceptance rates just above .5 and visually pleasing traceplots.

The use of the parallel package comes in three steps:

1. Initialize the random number stream:

This part of the code selects and initializes the L’Ecuyer random number generator so that each chain uses a different RNG.


## Initialize random number generator to obtain different chains from
## each parallel run
RNGkind("L'Ecuyer-CMRG")
set.seed(7777)
mc.reset.stream()

2. Start the parallel processes:

The second part of the code starts the parallels processes running. I have used the lapply function here so that the number of parallel processes is not hardwired and can be changed easily.

## Start parallel processes
niter <- 1000
Sigma.prop <- .03*Sigma</p>
MHprocesses <- lapply(1:nchains,function(i){
mcparallel(MHalg(mu.init[i,],x,Sigma,n.iter,Sigma.prop))
})

3. Collect results:

The final chunk uses the mccollect function to wait for the parallel jobs to finish and collect the output.

## Wait and collect output
MHout <- mccollect(MHprocesses)

In this case, the chains could have been run in parallel using the mcapply function. This would avoid the need to set the RNG and to collect results manually. Both functions use forks and share memory until objects are modified, so they should not differ in speed or memory use. However, mcparallel seems to be more flexible and I will need the flexibility in some of projects.

Here is the full code. Any comments are welcome!

############################################################
#
# parallel_1.R
#
# First test with the parallel package.
#
# In this example I run parallel chains for a Metroposlis-
# Hastings sampler for the multivariate log-normal
# distribution using the mcparallel() function to
# parallelize the code.
#
############################################################

##### Preliminaries #####
## Load packages
library(parallel)
library(mvtnorm)

## Define functions
dmvlnorm <- function(x,mu,Sigma,log=FALSE){
 ## Multivariate log-normal density
 dmvnorm(log(x),mu,Sigma,log=log)
}

rmvlnorm <- function(n,mu,Sigma){
 ## Multivariate log-normal random generator

## Generate normal random variates
 logx <- rmvnorm(n,mu,Sigma)

## Exponentiate to get log normal random variates
 x <- exp(logx)

return(x)
}

MHstep <- function(mu.curr,x,Sigma,Sigma.prop){
 ## Metropolis-Hastings step

## Generate proposal
 mu.prop <- mu.curr + rmvnorm(1,rep(0,length(mu.curr)),Sigma.prop)

## Compute Hastings ratio (on log scale)
 log.alpha <- sum(dmvlnorm(x,mu.prop,Sigma,log=TRUE)) -
 sum(dmvlnorm(x,mu.curr,Sigma,log=TRUE))

## Accept or reject proposal
 if(-rexp(1) < log.alpha)
 return(mu.prop)
 else
 return(mu.curr)
}

MHalg <- function(mu,x,Sigma,niter,Sigma.prop){
 ## Initialize storage
 mu.store <- matrix(nrow=niter + 1,ncol=length(mu))
 mu.store[1,] <- mu

## Initialize acceptance monitor
 accept <- rep(0,niter)

## Run MH algorithm
 for(k in 1:niter){
 ## Perform one MH step
 mu <- MHstep(mu,x,Sigma,Sigma.prop)

## Store values
 mu.store[k+1,] <- mu

## Track acceptance
 if(all(mu.store[k+1,] != mu.store[k,]))
 accept[k] <- 1
 }

 ## Return sampled values
 return(list(mu=mu.store,accept=accept))
}

##### Run Samplers #####

## Generate some random data
mu <- c(1,1)
Sigma <- .5 + .5 * diag(2)

n <- 25

x <- rmvlnorm(n,mu,Sigma)

## Parameters
n <- 25
mu <- c(1,.5)
Sigma <- .5 + .5*diag(2)

## Contour plot of true density
xgrid <- seq(0,30,length=100)
Z <- sapply(xgrid,function(x1){
 dmvlnorm(cbind(x1,xgrid),mu,Sigma)
})

contour(xgrid,xgrid,Z)

## Generate data
x <- rmvlnorm(n,mu,Sigma)

## Generate initial values
nchains <- 4
mu.init <- rmvnorm(nchains,apply(log(x),2,mean),2*Sigma)

## Initialize random number generator to obtain different chains from
## each parallel run
RNGkind("L'Ecuyer-CMRG")
set.seed(7777)
mc.reset.stream()

## Start parallel processes
niter <- 1000
Sigma.prop <- .03*Sigma

MHprocesses <- lapply(1:nchains,function(i){
 mcparallel(MHalg(mu.init[i,],x,Sigma,n.iter,Sigma.prop))
})

## Wait and collect output
MHout <- mccollect(MHprocesses)

##### Examine Output #####

## One-dimensional traceplots
col <- rainbow(nchains)
par(mfrow=c(2,1))

trace1 <- sapply(1:nchains,function(i) MHout[[i]]$mu[,1])
matplot(trace1,type="l",lty=1,col=col)
abline(h=mu[1],lty=2,col="grey")

trace2 <- sapply(1:nchains,function(i) MHout[[i]]$mu[,2])
matplot(trace2,type="l",lty=1,col=col)
abline(h=mu[2],lty=2,col="grey")

## Two-dimensional traceplot
par(mfrow=c(1,1))
lim <- range(sapply(MHout,function(out) range(out$mu)))
plot(NA,NA,xlim=lim,ylim=lim)
for(i in 1:nchains)
 lines(MHout[[i]]$mu[,1],MHout[[i]]$mu[,2],col=col[i])

abline(v=mu[1],lty=2,col="grey")
abline(h=mu[2],lty=2,col="grey")

## Acceptance rates
sapply(1:nchains,function(i) mean(MHout[[i]]$accept))

## Density plots
par(mfrow=c(2,1))
dens1 <- lapply(1:nchains,function(i){
 density(MHout[[i]]$mu[,1],from=lim[1],to=lim[2])
})
ylim1 <- c(0,max(sapply(1:nchains,function(i) max(dens1[[i]]$y))))

plot(NA,NA,xlim=lim,ylim=ylim1)
for(i in 1:nchains)
 lines(dens1[[i]],col=col[i])

abline(v=mu[1],lty=2,col="grey")

dens2 <- lapply(1:nchains,function(i){
 density(MHout[[i]]$mu[,2],from=lim[1],to=lim[2])
})
ylim2 <- c(0,max(sapply(1:nchains,function(i) max(dens2[[i]]$y))))

plot(NA,NA,xlim=lim,ylim=ylim2)
for(i in 1:nchains)
 lines(dens2[[i]],col=col[i])

abline(v=mu[2],lty=2,col="grey")

Posted in Programming, R | Tagged , | Comments Off on First Attempts with the parallel Package

The Charleston Gazette: Mountaintop removal: The scientific evidence of environmental and health damage continues

Brenee Muncy’s work also got mention in an article by Keith Ward Jr. published on the Charleston Gazette’s blog on October 23.

Posted in Uncategorized | Comments Off on The Charleston Gazette: Mountaintop removal: The scientific evidence of environmental and health damage continues

Takepart: Sacrificing Wildlife for Big Coal in Appalachia

On October 31 covered Brenee Muncy’s salamander work in an article published on Takepart — “a digital news & lifestyle magazine and social action platform for the conscious consumer”.

 

Posted in Uncategorized | Comments Off on Takepart: Sacrificing Wildlife for Big Coal in Appalachia

Mountaintop Removal Mining Reduces Stream Salamander Occupancy and Richness in Southeastern Kentucky (USA) Biological Conservation

Back at the end of September Brenee Muncy’s paper on the effects of mountain top mining on salamander occupancy was accepted for publication in Biological Conservation. The full text is now available online. This paper uses Bayesian occupancy models to compare communities of salamanders in pristine streams and streams impacted by mountain top removal mining in southeastern Kentucky. The results show consist decrease in the probability of occupancy at the sites impacted by mountain top removal mining for all species. The overall species richness for salamanders was 2.27 (±1.27 SD) in the impacted streams versus 4.67 (±0.65 SD) for control streams, showing a clear negative effect of mountain top mining practices on these delicate environments.

Posted in Publications and Presentations | Tagged , , , | Comments Off on Mountaintop Removal Mining Reduces Stream Salamander Occupancy and Richness in Southeastern Kentucky (USA) Biological Conservation

Ben Augustine Presents at TWS

On October 27, Ben Augustine presented his work on mark-recapture models with behavioral effects and subsampling at the 2014 Annual Conference of The Wildlife Society in Pittsburgh, PA. This work has also been accepted for publication in Methods in Ecology and Evolution and is available online. You can read more about this work in our previous post.

Posted in Publications and Presentations, Uncategorized | Tagged , , | Comments Off on Ben Augustine Presents at TWS

Accounting for Behavioral Response to Capture when Estimating Population Size from Hair Snare Studies with Missing Data

Ben Augustine’s paper on behavioral effects and subsampling has been accepted for publication in Methods in Ecology and Evolution and is now available online. The work is particularly applicable to studies which use hair snares to capture animals. Snares are often baited causing a positive behavioral effect. Moreover, researchers often collect more samples than can be analyzed and so they identify individuals from only a fraction of the samples. This makes it possible to miss the first time that an individual is captured and the standard closed model with behavioral effects, M_{tb}, produces biased estimates of abundance when the subsampling is ignored. Ben’s approach treats the number of hair samples collected on each occasion for each individual as missing data and samples from the joint posterior distribution using MCMC sampling. Work is currently underway to extend the model to allow for multiple traps per occasion and different subsampling strategies.

Posted in Publications and Presentations, Uncategorized | Comments Off on Accounting for Behavioral Response to Capture when Estimating Population Size from Hair Snare Studies with Missing Data

A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models

We continued our discussion of prior distributions for mark-recapture models including covariates in lab today by looking at the paper A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models published by Andrew Gelman et al in the Annals of Applied Statistics in 2008. Surprise, surprise, the paper proposes a new, weakly informative prior distribution that the authors argue can be used as a default when fitting logistic regression models. Specifically, the authors recommend using independent, scaled Cauchy priors with mean 0 and scale 10 for the intercept, 2.5 all predictors where it is assumed that continuous predictors are standardized to have mean 0 and standard deviation .5 and binary predictors are standardized to have mean 0 and a difference of 1 between the upper and lower values.

Gelman’s argument for this prior is that we should expect that, in most cases, a difference of one standard deviation a predictor should change the linear predictor by at most 5 units “which would move the probability from 0.01 to 0.50 or from 0.50 to 0.99”. The paper also provides a very nice extension of the Iteratively Weighted Least Squares algorithm that gives a fast approximation of the posterior without the need for MCMC or other sampling methods.

One thing that did perplex me is the appliation to the biassay data from Racine et al. (1986). The plot in Figure 3 does not convince me that the new prior is doing any better. The fitted curve from glm just looks like a better fit! Also, it seems that the posterior is very sensitive to the prior posterior mean is almost halved. I think that I am thrown off by the size of the data — n=20 — and it would have been helpful to me to see credible envelopes for the two curves to judge the magnitude of uncertainty. I’d be more convinced if I saw that the fitted curve from glm lie inside the envelope from bayesglm and vice versa.

 

Posted in Uncategorized | Leave a comment