Congratulations to Alex Draghici who also passed his PhD proposal defense today. Two in 24 hours. It’s a privilege to work with such excellent students!

Recent Posts
Archives
Categories
Tags
Meta
Congratulations to Alex Draghici who also passed his PhD proposal defense today. Two in 24 hours. It’s a privilege to work with such excellent students!
Congratulations to Hanna Kim who just passed her PhD proposal defence. Great job all around. I’m looking forward to seeing the work progress over the rest of your degree!
In the past I have always used is.null()
in R to test whether a list contains a specified element. E.g., if I wanted to know whether a list, mylist
contained the element x
I would have used
is.null(mytest$x)
This failed me today when I tried to test if a list contained an element M
when it also contained an element Mtot
. Because R completes names of elements in lists, the call mylist$M
will return the first element in the list whose name begins with M. This means that if the list contains Mtot
(or any other elements whose name starts with M then mylist$M
will be TRUE. Here is an example:
> mylist < list(Mtot = 5)
> is.null(mylist$M)
[1] FALSE
> mylist$M
[1] 5
The (much better) way around this is the function exists
. For example:
> exists("M", mylist)
[1] FALSE
> exists("Mtot", mylist)
[1] TRUE
Just stumbled across this article about former student Gabrielle Miles on the University of Kentucky’s website. Gabby is now a Senior Biostatistician with Roche Professional Diagnostics in Indianapolis.
Congratulations to Woody for successfully defending his PhD titled “Improving the Computational Efficiency in Bayesian Fitting of CormackJollySeber Models with Individual, Continuous, TimeVarying Covariates” on May 26. His thesis is now available online through the University of Kentucky archives. Woody is now working as a statistician with Pfizer pharmaceuticals in New London, Connecticut, where he’s been since last summer.
Amanda Ellis just gave a practice presentation for her qualifying exam tomorrow. The use of the smartdiagram lists and figures in her slides were particularly eye catching. I know what I’m doing for my next talk!
I wrote my first vignette for an R package the other day and was amazed at the ease of writing Rmarkdown. Of course, I am using emacs and was looking for a mode that would allow me to edit these files smoothly with syntax highlighting and the ability to send code chunks to R automatically. Et voila — Polymode. I followed the John StantonGeddes instructions for enabling markdownmode and polymode, but ran up against a few minor issues. The first was that copying and pasting the LISP code into my .emacs file introduced some errors — single quotes turned into backticks and doublequotes turned into smart quotes. In the end, I had to reformat these/type them in myself.
I also found that I needed to add the line:
(addtolist 'automodealist '("\\.Rmd" . polymarkdown+rmode))
to my .emacs file for polymode to start automatically when I opened a .Rmd file. This was mentioned in the polymode instructions which I should probably have read first. The full entries in my .emacs file for markdownmode and polymode are:
;;; Markdown mode
(autoload 'markdownmode "markdownmode" "Major mode for editing Markdown files" t)
(setq automodealist (cons '("\\.markdown" . markdownmode) automodealist))
(setq automodealist (cons '("\\.md" . markdownmode) automodealist))
(setq automodealist (cons '("\\.ronn?" . markdownmode) automodealist))
;;; Polymode
(setq loadpath (append '("/home/sbonner/.emacs.d/polymode/" "/home/sbonner/.emacs.d/polymode/modes") loadpath))
(require 'polyR)
(require 'polymarkdown)
(addtolist 'automodealist '("\\.Rmd" . polymarkdown+rmode))
Looking great!
Yesterday my paper with Danielle Bridger and Mark Briffa got coverage in Discover Magazine. I have to admit that it was Danielle and Mark who “peeked inside the crabs’ gonads”. I just helped out with the statistics.
As part of our lab meetings we discuss recent articles of interest. These articles are generally in ecological statistics, though they may be in fields relating to our projects (e.g., Bayesian inference). My students asked how to find articles, and so I thought I’d list some of the sources I keep an eye on.
I (try to) stay up to date with the articles published in the following journals:
Other journals that I should be watching include:
But, there is only so much time in the day. The easiest way to keep up with what’s published in these journals is to subscribe to the table of contents alerts. For example, alerts for JABES can be requested by providing your email address on the journal’s webpage.
The other good way to find recent articles is to search for specific authors or to look for new articles that cite old articles of interest. There are several tools that do this including the Web of Science and Google Scholar. These tools have different features and both have advantages and disadvantages. For example, the Web of Science allows you to limit searches by the field of the publication. On the other hand, Google Scholar provides tools that alert you when a specific author publishes new papers or is cited by other authors. For example, I use Google Scholar to follow the articles and citations of Bill Link from the USGS one of the leaders in Bayesian inference for the analysis of ecological data.
Several organizations also provide unified portals for journal alerts and allow for alerts using keyword searches as well (e.g. Zetoc). I have never used these myself. Just another thing to do!
Please feel free to comment if you think I’ve missed something important. It would be great to get more ideas.
I use LaTeX when writing my own papers. However, when collaborating with other authors, particularly those in biology, I have to used Microsoft Word. In the past I have converted LaTeX equations into images to embed into the Word document, but this is cumbersome and makes it difficult to edit the equations later.
I recently found Mr. Sparkle wonderful LaTeX in Word project which makes this a lot easier! The heart of this is Word document that contains embedded macros which use a remote server to process LaTeX code into embedded images. To use the package you simply download and extract the tarball, save it to a new file, and start typing away. The AltL keystroke will open a new window that you can type LaTeX commands into, and pressing the convert button will turn these commands into images. The best thing is that the code stays embedded in the document. With the same keystroke you can reopen the command window and edited the existing code. Fantastic!
I have been working on simulating data from Bob Dorazio’s version of the Royle Nmixture model for repeated count data. This version of the model assumes that the abundance at site is generated from a Hurdle model such that and , . This can be interpreted as specifying an occupancy rate of with abundance given by a zerotruncated Poisson with mean at the occupied sites.
I was originally generating the zerotruncated Poisson values using a while loop to simulate from a Poisson with mean until a nonzero value was generated. This is inefficient and runs into problems if is sufficiently small that the probability of a nonzero value is below computer precision.
To remedy this I found this nice solution from Peter Daalgard in response to a question from Spencer Graves on the Rhelp list. This solution essentially uses the inverse probability transform but truncates the CDF so that the values generated are guaranteed to be greater than 1. The same could be done for any distribution (that has a distribution function implemented in R). My code below implements this trick but adds a tolerance that avoids problems if the value of lambda is too small.
rtpois < function(n, lambda,tol=1e10){ ## Simulate from zerotruncated Poisson ## Initialize output x < rep(NA,n) ## Identify lambda values below tolerance low < which(lambda < tol) nlow < length(low) if(nlow > 0){ x[low] < 1 if(nlow < n) x[low] < qpois(runif(nnlow, dpois(0, lambda[low]), 1), lambda[low]) } else x < qpois(runif(nnlow, dpois(0, lambda), 1), lambda) return(x) }
The paper from Simon’s plenary at the 2013 EURING Technical Meeting has finally appeared “in print”. The quotation marks are because MEE is an online journal, but the paper final has a volume, issue, and page numbers. The full citation is:
Bonner, S.J. and Schofield, M. (2014). MC(MC)MC: Exploring Monte Carlo integration within MCMC for markrecapture models with individual covariates. Methods in Ecology and Evolution, 5(12):1305–1315.
The last time I worked with parallel processing in R I used the package Rmpi. This ws very flexible, but took a lot of setup to handle passing jobs, code, and data between the master and the slave. I have used the package snow for some simple projects, but I have new need for parallel computing and recently found out about the parallel package (something I should have known about earlier). This is my first attempt at using the mcparallel function from the parallel package.
As a simple test, I have used mcparallel to implement multiple chains for a MetropolisHastings sampler of the mean parameter in a multivariate lognormal distribution. That is, to sample from the posterior distribution of given a an iid sample of random pvectors such that . In the simulation I have set , , and .
The sampler I have implemented generates proposals using a symmetric multivariate normal proposal distribution with variance Sigma.prop. I have manually tuned the proposal variance to achieve acceptance rates just above .5 and visually pleasing traceplots.
The use of the parallel package comes in three steps:
1. Initialize the random number stream:
This part of the code selects and initializes the L’Ecuyer random number generator so that each chain uses a different RNG.
## Initialize random number generator to obtain different chains from ## each parallel run RNGkind("L'EcuyerCMRG") set.seed(7777) mc.reset.stream()
2. Start the parallel processes:
The second part of the code starts the parallels processes running. I have used the lapply function here so that the number of parallel processes is not hardwired and can be changed easily.
## Start parallel processes niter < 1000 Sigma.prop < .03*Sigma</p> MHprocesses < lapply(1:nchains,function(i){ mcparallel(MHalg(mu.init[i,],x,Sigma,n.iter,Sigma.prop)) })
3. Collect results:
The final chunk uses the mccollect function to wait for the parallel jobs to finish and collect the output.
## Wait and collect output MHout < mccollect(MHprocesses)
In this case, the chains could have been run in parallel using the mcapply function. This would avoid the need to set the RNG and to collect results manually. Both functions use forks and share memory until objects are modified, so they should not differ in speed or memory use. However, mcparallel seems to be more flexible and I will need the flexibility in some of projects.
Here is the full code. Any comments are welcome!
############################################################ # # parallel_1.R # # First test with the parallel package. # # In this example I run parallel chains for a Metroposlis # Hastings sampler for the multivariate lognormal # distribution using the mcparallel() function to # parallelize the code. # ############################################################ ##### Preliminaries ##### ## Load packages library(parallel) library(mvtnorm) ## Define functions dmvlnorm < function(x,mu,Sigma,log=FALSE){ ## Multivariate lognormal density dmvnorm(log(x),mu,Sigma,log=log) } rmvlnorm < function(n,mu,Sigma){ ## Multivariate lognormal random generator ## Generate normal random variates logx < rmvnorm(n,mu,Sigma) ## Exponentiate to get log normal random variates x < exp(logx) return(x) } MHstep < function(mu.curr,x,Sigma,Sigma.prop){ ## MetropolisHastings step ## Generate proposal mu.prop < mu.curr + rmvnorm(1,rep(0,length(mu.curr)),Sigma.prop) ## Compute Hastings ratio (on log scale) log.alpha < sum(dmvlnorm(x,mu.prop,Sigma,log=TRUE))  sum(dmvlnorm(x,mu.curr,Sigma,log=TRUE)) ## Accept or reject proposal if(rexp(1) < log.alpha) return(mu.prop) else return(mu.curr) } MHalg < function(mu,x,Sigma,niter,Sigma.prop){ ## Initialize storage mu.store < matrix(nrow=niter + 1,ncol=length(mu)) mu.store[1,] < mu ## Initialize acceptance monitor accept < rep(0,niter) ## Run MH algorithm for(k in 1:niter){ ## Perform one MH step mu < MHstep(mu,x,Sigma,Sigma.prop) ## Store values mu.store[k+1,] < mu ## Track acceptance if(all(mu.store[k+1,] != mu.store[k,])) accept[k] < 1 } ## Return sampled values return(list(mu=mu.store,accept=accept)) } ##### Run Samplers ##### ## Generate some random data mu < c(1,1) Sigma < .5 + .5 * diag(2) n < 25 x < rmvlnorm(n,mu,Sigma) ## Parameters n < 25 mu < c(1,.5) Sigma < .5 + .5*diag(2) ## Contour plot of true density xgrid < seq(0,30,length=100) Z < sapply(xgrid,function(x1){ dmvlnorm(cbind(x1,xgrid),mu,Sigma) }) contour(xgrid,xgrid,Z) ## Generate data x < rmvlnorm(n,mu,Sigma) ## Generate initial values nchains < 4 mu.init < rmvnorm(nchains,apply(log(x),2,mean),2*Sigma) ## Initialize random number generator to obtain different chains from ## each parallel run RNGkind("L'EcuyerCMRG") set.seed(7777) mc.reset.stream() ## Start parallel processes niter < 1000 Sigma.prop < .03*Sigma MHprocesses < lapply(1:nchains,function(i){ mcparallel(MHalg(mu.init[i,],x,Sigma,n.iter,Sigma.prop)) }) ## Wait and collect output MHout < mccollect(MHprocesses) ##### Examine Output ##### ## Onedimensional traceplots col < rainbow(nchains) par(mfrow=c(2,1)) trace1 < sapply(1:nchains,function(i) MHout[[i]]$mu[,1]) matplot(trace1,type="l",lty=1,col=col) abline(h=mu[1],lty=2,col="grey") trace2 < sapply(1:nchains,function(i) MHout[[i]]$mu[,2]) matplot(trace2,type="l",lty=1,col=col) abline(h=mu[2],lty=2,col="grey") ## Twodimensional traceplot par(mfrow=c(1,1)) lim < range(sapply(MHout,function(out) range(out$mu))) plot(NA,NA,xlim=lim,ylim=lim) for(i in 1:nchains) lines(MHout[[i]]$mu[,1],MHout[[i]]$mu[,2],col=col[i]) abline(v=mu[1],lty=2,col="grey") abline(h=mu[2],lty=2,col="grey") ## Acceptance rates sapply(1:nchains,function(i) mean(MHout[[i]]$accept)) ## Density plots par(mfrow=c(2,1)) dens1 < lapply(1:nchains,function(i){ density(MHout[[i]]$mu[,1],from=lim[1],to=lim[2]) }) ylim1 < c(0,max(sapply(1:nchains,function(i) max(dens1[[i]]$y)))) plot(NA,NA,xlim=lim,ylim=ylim1) for(i in 1:nchains) lines(dens1[[i]],col=col[i]) abline(v=mu[1],lty=2,col="grey") dens2 < lapply(1:nchains,function(i){ density(MHout[[i]]$mu[,2],from=lim[1],to=lim[2]) }) ylim2 < c(0,max(sapply(1:nchains,function(i) max(dens2[[i]]$y)))) plot(NA,NA,xlim=lim,ylim=ylim2) for(i in 1:nchains) lines(dens2[[i]],col=col[i]) abline(v=mu[2],lty=2,col="grey")
Brenee Muncy’s work also got mention in an article by Keith Ward Jr. published on the Charleston Gazette’s blog on October 23.
On October 31 Brenee Muncy’s salamander work in an article published on Takepart — “a digital news & lifestyle magazine and social action platform for the conscious consumer”.
covered
Back at the end of September Brenee Muncy’s paper on the effects of mountain top mining on salamander occupancy was accepted for publication in Biological Conservation. The full text is now available online. This paper uses Bayesian occupancy models to compare communities of salamanders in pristine streams and streams impacted by mountain top removal mining in southeastern Kentucky. The results show consist decrease in the probability of occupancy at the sites impacted by mountain top removal mining for all species. The overall species richness for salamanders was 2.27 (±1.27 SD) in the impacted streams versus 4.67 (±0.65 SD) for control streams, showing a clear negative effect of mountain top mining practices on these delicate environments.
On October 27, Ben Augustine presented his work on markrecapture models with behavioral effects and subsampling at the 2014 Annual Conference of The Wildlife Society in Pittsburgh, PA. This work has also been accepted for publication in Methods in Ecology and Evolution and is available online. You can read more about this work in our previous post.
Ben Augustine’s paper on behavioral effects and subsampling has been accepted for publication in Methods in Ecology and Evolution and is now available online. The work is particularly applicable to studies which use hair snares to capture animals. Snares are often baited causing a positive behavioral effect. Moreover, researchers often collect more samples than can be analyzed and so they identify individuals from only a fraction of the samples. This makes it possible to miss the first time that an individual is captured and the standard closed model with behavioral effects, , produces biased estimates of abundance when the subsampling is ignored. Ben’s approach treats the number of hair samples collected on each occasion for each individual as missing data and samples from the joint posterior distribution using MCMC sampling. Work is currently underway to extend the model to allow for multiple traps per occasion and different subsampling strategies.
We continued our discussion of prior distributions for markrecapture models including covariates in lab today by looking at the paper A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models published by Andrew Gelman et al in the Annals of Applied Statistics in 2008. Surprise, surprise, the paper proposes a new, weakly informative prior distribution that the authors argue can be used as a default when fitting logistic regression models. Specifically, the authors recommend using independent, scaled Cauchy priors with mean 0 and scale 10 for the intercept, 2.5 all predictors where it is assumed that continuous predictors are standardized to have mean 0 and standard deviation .5 and binary predictors are standardized to have mean 0 and a difference of 1 between the upper and lower values.
Gelman’s argument for this prior is that we should expect that, in most cases, a difference of one standard deviation a predictor should change the linear predictor by at most 5 units “which would move the probability from 0.01 to 0.50 or from 0.50 to 0.99”. The paper also provides a very nice extension of the Iteratively Weighted Least Squares algorithm that gives a fast approximation of the posterior without the need for MCMC or other sampling methods.
One thing that did perplex me is the appliation to the biassay data from Racine et al. (1986). The plot in Figure 3 does not convince me that the new prior is doing any better. The fitted curve from glm just looks like a better fit! Also, it seems that the posterior is very sensitive to the prior posterior mean is almost halved. I think that I am thrown off by the size of the data — n=20 — and it would have been helpful to me to see credible envelopes for the two curves to judge the magnitude of uncertainty. I’d be more convinced if I saw that the fitted curve from glm lie inside the envelope from bayesglm and vice versa.