Sunday – April 28th
7:30 – 8:30 am Registration and Check-in for Workshop Participants,
Pre-Convene Continental Breakfast in Foyer ABCD
Workshop 8:30 am – 5:00 pm ABC-McDowell/Tuttle/Alcove
Applied Statistics in Agriculture Short Course
Statistical Graphics in Agriculture
Kevin Wright – Research Scientist at DuPont Pioneer
This course will illustrate the use of statistical graphics for agricultural
data. We will start with the basic building blocks of graphics and how people
perceive those elements. With that background, we will cover specific
graphical techniques for simple data and then move on to more complex
genotype-by-environment interaction data, using biplots, partial least
squares, and stability measures. We will look at data from field experiments
and consider aspects of data quality and graphics for visualizing the results
of mixed models. We will touch briefly on semi-graphical techniques, dynamic
graphics, and a gallery of graphics. Finally, we will consider how to get from
a basic graphic to a polished product ready for presentation or publication.
Graphics in R will be discussed briefly along the way.
Break: 10:00 – 10:15 am
Lunch: Noon – 1:00 pm in Big Basin Ballroom D
Break: 2:30 – 2:45 pm
Please note: Break times are approximate
Monday – April 29th
8:00 – 10:00 am Registration for Conference Participants,
Pre-Convene Continental Breakfast in Foyer ABCD
8:30 – 8:45 am Welcome ABC-McDowell/Tuttle/Alcove
Session #1A, 8:45 – 9:15 am ABC-McDowell/Tuttle/Alcove
Non-Normal Data in Agricultural Experiments
Walt Stroup – University of Nebraska-Lincoln
Once there were two ways to deal with non-normal data from designed experiments. The first relied on the robustness of the Central Limit Theorem: ANOVA tests means; sample means are approximately normal even if the data aren’t; if the experiment is well designed, all will be well. Some derided this as the “maybe if we don’t acknowledge it, it won’t really exist” approach. The other approach was to transform the data. Arguably there was a third way, nonparametric methods, but nonparametrics are not well suited to complex experiments, and thus their use in agriculture has been limited. Advances in computing and modeling over the past couple of decades have greatly expanded our options. In theory, we can apply generalized and mixed models to experiments of nearly arbitrary complexity with data from a wide variety of distributions. With expanded options come dilemmas. We have software choices: R and SAS, among many others. Models have conditional and marginal formulations. There are GLMMs and GEEs, among a host of other acronyms. There are different estimation methods: linearization (e.g., pseudo-likelihood), integral approximation (e.g., quadrature), and Bayesian methods. How do we decide what to use? How much, if anything, do we lose if we ignore the new and trendy stuff and revert to transformations? I am tempted to call this talk “When does the CLT CYA and when is it a TLA that fails to CYA?” In 2011, I introduced a design-to-model thought process I called WWFD (What Would Fisher Do), inspired by Fisher’s comments in a 1935 JRSS publication. In this talk, I’ll show how we can use this process to clarify our thinking about the probability processes we conceptualize as giving rise to data in designed experiments, and how we can use the results to understand what the various options for non-normal data actually do, how to evaluate their small-sample behavior, and how to make informed choices based on these insights.
Session #1B, 9:15 – 9:45 am ABC-McDowell/Tuttle/Alcove
On the Small Sample Behavior of Generalized Linear Mixed Models with Complex Experiments
J. Couton and W. W. Stroup – University of Nebraska-Lincoln
Generalized linear mixed models (GLMMs), regardless of the software used to implement them (R, SAS, etc.), can be formulated as conditional or marginal models and can be computed using pseudo-likelihood, penalized quasi-likelihood, or integral approximation methods. While information exists about the small-sample behavior of GLMMs for some cases, notably RCBDs with binomial or count data, little is known about GLMMs for continuous proportions (e.g., beta), time-to-event (e.g., gamma) data, or more complex designs such as the split-plot. In this presentation we review the major model formulation and estimation options and compare their small-sample performance for the cases listed above.
Session #1C, 9:45 – 10:15 am ABC-McDowell/Tuttle/Alcove
Estimation of Dose Requirements for Extreme Levels of Efficacy
Mark West and Guy Hallman – USDA Agricultural Research Service
The objective of this paper is to explore the extent to which dose-response models may be used to estimate extreme levels of efficacy for controlling insect pests, and possibly for other uses. Probit-9 mortality (99.9968% mortality) is a standard for treatment effectiveness in tephritid fruit fly research and has been adopted by the United States Department of Agriculture for fruit flies and other pests. Data taken from the phytosanitary treatment (PT) literature are analyzed. These data are used to fit dose-response models with logit, probit, and complementary log-log links. The effectiveness of these models for predicting extreme levels of efficacy is compared using large (~100,000+ individuals) confirmatory trials that are also reported in the PT literature. We examine the role of model goodness-of-fit as a requirement for obtaining reliable dose requirements.
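As a rough illustration of why the choice of link matters at this extreme, the sketch below (Python standard library only) converts the Probit-9 target into the linear-predictor value each link would have to reach. It assumes the classical convention that a probit is the standard normal deviate plus 5, so Probit-9 corresponds to z = 4; no fitted parameters from the talk are used, and the function name is illustrative.

```python
from math import log
from statistics import NormalDist

# Classical probits add 5 to the standard normal deviate, so Probit-9 means z = 4.
PROBIT9_MORTALITY = NormalDist().cdf(9 - 5)  # about 0.999968

def link_targets(p):
    """Linear-predictor value each link must reach to give mortality p."""
    return {
        "probit": NormalDist().inv_cdf(p),
        "logit": log(p / (1 - p)),
        "cloglog": log(-log(1 - p)),  # complementary log-log
    }

if __name__ == "__main__":
    print(f"Probit-9 mortality: {PROBIT9_MORTALITY:.6%}")
    for link, eta in link_targets(PROBIT9_MORTALITY).items():
        print(f"{link:8s} eta = {eta:.3f}")
```

The three links place the same mortality target at very different points on their linear-predictor scales, so a model fitted mostly to mid-range mortalities can imply quite different doses once it is extrapolated this far into the tail.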
10:15 am Break & Poster Session Big Basin Ballroom D
Session #2, KEYNOTE ADDRESS, 10:45 am – 12:00 pm ABC-McDowell/Tuttle/Alcove
Issues in Statistical and Graphical Literacy
Kevin Wright – Research Scientist at DuPont Pioneer
One of the defining attributes of statistics is the study of variability. Yet the results of statistical analyses are often presented as point estimates, sometimes without context, without conveying the variability, and without answering the specific question being asked.
Examples:
+ Does salt contribute to high blood pressure? Have you *seen* the result, or only *heard* the result?
+ Are you troubled to hear that exposure to a carcinogen doubles your risk of cancer? What if this risk were visualized in comparison to other risks?
+ HIV tests report positive/negative. How accurate is this test? How can this accuracy be *visualized* for consumers?
+ What can be learned from a single p-value? What can be learned by looking at the distribution of p-values? How does this relate to the number of published papers that are not reproducible?
+ Monthly changes in the unemployment rate are often explained. How often is the explanation: "no change within the bounds of sampling error"? Would a graphic such as a sparkline help?
+ Forensic experts can provide a probability that two DNA samples come from the same person. But the real question is different: "What is the probability that *a particular sample* of DNA matches the defendant?" Answering it requires considering additional factors, such as the probability of laboratory error.
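For the HIV-testing example, "how accurate is this test?" depends on more than sensitivity and specificity. The sketch below applies Bayes' theorem and also expresses the result as natural frequencies (expected counts out of 100,000 people), a format that is easy to draw as an icon array or bar chart. The sensitivity, specificity, and prevalence values are hypothetical placeholders, not figures from the talk.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(infected | positive test) by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def natural_frequencies(sensitivity, specificity, prevalence, population=100_000):
    """Expected counts out of `population` people, rounded to whole people."""
    infected = prevalence * population
    healthy = population - infected
    return {
        "true positives": round(sensitivity * infected),
        "false negatives": round((1 - sensitivity) * infected),
        "false positives": round((1 - specificity) * healthy),
        "true negatives": round(specificity * healthy),
    }

if __name__ == "__main__":
    # Hypothetical test characteristics and prevalence.
    sens, spec, prev = 0.99, 0.98, 0.002
    print(f"PPV = {positive_predictive_value(sens, spec, prev):.3f}")
    print(natural_frequencies(sens, spec, prev))
```

With these assumed values only about 9% of positive results come from infected people (198 true positives against 1,996 false positives out of 100,000), a gap that the counts make visible in a way a single accuracy figure does not.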
12:00 pm Lunch Big Basin Ballroom D
Session #3A, 1:30 – 2:00 pm ABC-McDowell/Tuttle/Alcove
Five things I wish my Mother had told me, about Statistics that is
Philip Dixon – Iowa State University
This talk is a collection of data analysis stories that illustrate some general points that I wish I had learnt a lot earlier in my career. These include: 1) Simpson’s paradox is everywhere. 2) A numerical optimization routine may report that it has converged when it has not really converged. 3) You can’t always trust the Satterthwaite approximation. Be especially careful if you need to estimate a linear combination of variance components that has one or more negative coefficients. 4) BLUPs are wonderful things. 5) It’s good to know Reverend Bayes. He can be a great help in many problems.
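Point 3 can be made concrete with the Satterthwaite formula for a linear combination L = sum(c_i * MS_i) of mean squares, whose approximate degrees of freedom are nu = L^2 / sum((c_i * MS_i)^2 / df_i). The sketch below assumes a hypothetical one-way random-effects ANOVA with n = 5 observations per group; the mean squares and degrees of freedom are made up.

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Estimate and approximate df for L = sum(c_i * MS_i)."""
    terms = [c * ms for c, ms in zip(coefs, mean_squares)]
    estimate = sum(terms)
    denom = sum(t ** 2 / df for t, df in zip(terms, dfs))
    return estimate, estimate ** 2 / denom

if __name__ == "__main__":
    # Hypothetical: MS_A = 12 on 4 df, MS_E = 10 on 20 df, n = 5 per group.
    ms, dfs, n = [12.0, 10.0], [4, 20], 5
    # Variance component sigma2_A = (MS_A - MS_E) / n: one negative coefficient.
    print(satterthwaite_df([1 / n, -1 / n], ms, dfs))
    # All-positive combination of the same mean squares, for contrast.
    print(satterthwaite_df([1 / n, 1 / n], ms, dfs))
```

With these numbers, subtracting MS_E leaves an estimate of 0.4 with well under one approximate degree of freedom, whereas adding the same mean squares gives roughly 12; any interval or test built on the negative-coefficient version rests on a very weak approximation.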
Session #3B, 2:00 – 2:30 pm ABC-McDowell/Tuttle/Alcove
Thou shalt not brush your teeth while eating breakfast – a 7-step program for researchers previously hurt in data analysis
Edzard van Santen – Auburn University
After years of providing statistical advice to fellow faculty members and graduate students, I have come to realize that it is not necessarily the big issues but a lack of knowledge of basic data analysis principles that gets my clients into trouble. My claim is that if researchers and students internalized two basic definitions, they would not have any problems analyzing most of their experiments. The definitions of Experimental Unit (EU), the smallest physical unit to which a treatment may be applied, and Experimental Error (Exp. Err.), the variation among EUs treated alike, are the basis for successful data analysis of experiments. I follow a seven-step data analysis program with my graduate student and faculty clients: (1) Understanding the experiment; (2) Checking the data; (3) Getting a feel for the data; (4) Checking underlying assumptions; (5) Testing; (6) Estimating; and (7) Interpreting results. Clients who have adhered to the program have generally had fewer problems than clients who, for one reason or another, did not get on board with the program. I will also touch on the implications for teaching experimental design and data analysis to non-statistics majors.
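The two definitions can be illustrated with a small, hypothetical pot experiment in which treatments are applied to pots and two plants are measured in each pot. The sketch below (plain Python) averages the plants within each pot first, so the error term is the variation among pots treated alike; the data and the function name are invented for illustration.

```python
from statistics import mean

def eu_level_anova(data):
    """One-way ANOVA on experimental-unit means.

    data maps each treatment to a list of pots (the experimental units);
    each pot is a list of plant measurements (subsamples).
    """
    pot_means = {trt: [mean(pot) for pot in pots] for trt, pots in data.items()}
    all_means = [m for ms in pot_means.values() for m in ms]
    grand = mean(all_means)
    # Treatment SS: each treatment mean weighted by its number of pots.
    ss_trt = sum(len(ms) * (mean(ms) - grand) ** 2 for ms in pot_means.values())
    # Experimental error: variation among pots treated alike.
    ss_err = sum((m - mean(ms)) ** 2 for ms in pot_means.values() for m in ms)
    df_trt = len(pot_means) - 1
    df_err = len(all_means) - len(pot_means)
    f_stat = (ss_trt / df_trt) / (ss_err / df_err)
    return f_stat, df_trt, df_err

if __name__ == "__main__":
    # Hypothetical: 3 treatments x 4 pots x 2 plants per pot.
    data = {
        "A": [[10, 11], [12, 12], [9, 10], [11, 13]],
        "B": [[14, 15], [13, 13], [15, 17], [14, 14]],
        "C": [[12, 12], [11, 13], [13, 14], [12, 11]],
    }
    f_stat, df_trt, df_err = eu_level_anova(data)
    n_plants = sum(len(pot) for pots in data.values() for pot in pots)
    print(f"F = {f_stat:.2f} on {df_trt} and {df_err} df")
    print(f"Treating plants as replicates would claim {n_plants - len(data)} error df")
```

The pot-level analysis has 9 error degrees of freedom; treating the 24 plants as replicates would claim 21, overstating the information about experimental error.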
Session #3C, 2:30 – 3:00 pm ABC-McDowell/Tuttle/Alcove
Use and Misuse of Multiple Comparisons Procedures of Means in Factorial Experiments
Siraj Omer – Agricultural Research Corporation
Multiple comparison procedures (MCPs) for means are frequently misused, and such misuse may result in incorrect scientific conclusions. A review of papers published in the Sudan Journal of Agricultural Research (SJAR) from 2005 to 2010 found 150 papers in which some procedure was used to compare means from factorial experiments. The objectives of this study were to identify the most common errors made in the use of multiple comparison procedures for means in factorial experiments and to present correct methods. Pairwise tests were used incorrectly in 30% of these papers and multiple comparison tests (MCTs) in 20%, and only 20% could be considered entirely correct. Misuses of MCPs occurred in comparisons among levels of a quantitative factor, comparisons of treatment means in a factorial arrangement, and planned contrasts. In some cases, Duncan's multiple range test was applied entirely incorrectly. In conclusion, sounder statistical reasoning is needed to develop appropriate and convenient comparisons of means in factorial experiments with qualitative and quantitative factor levels, and thereby to identify the real statistical differences. With adequate statistical analysis of an experimental design, its results can be judged for validity and can serve as a basis for the design of future experiments.
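For a quantitative factor such as fertilizer rate, trend contrasts usually answer the research question more directly than pairwise tests between rates. The sketch below uses the standard orthogonal polynomial coefficients for four equally spaced levels with equal replication; the rates, means, and replication are hypothetical, and the error mean square needed for F-tests is omitted.

```python
# Orthogonal polynomial coefficients for 4 equally spaced levels.
CONTRASTS = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}

def contrast_ss(means, coefs, reps):
    """Sum of squares for a contrast among means, each based on `reps` replicates."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    return reps * estimate ** 2 / sum(c ** 2 for c in coefs)

def treatment_ss(means, reps):
    """Treatment sum of squares for equally replicated means."""
    grand = sum(means) / len(means)
    return reps * sum((m - grand) ** 2 for m in means)

if __name__ == "__main__":
    # Hypothetical yields (t/ha) at 0, 40, 80, 120 kg N/ha, 4 replicates each.
    means, reps = [3.1, 4.2, 4.9, 5.2], 4
    for name, coefs in CONTRASTS.items():
        print(f"{name:9s} SS = {contrast_ss(means, coefs, reps):.4f}")
    # The three orthogonal contrasts partition the treatment SS exactly.
    print(f"treatment SS = {treatment_ss(means, reps):.4f}")
```

Because the three contrasts are orthogonal, their sums of squares add up to the treatment sum of squares, so each trend component can be tested on one degree of freedom instead of making all six pairwise comparisons.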
3:00 pm Break & Poster Session Big Basin Ballroom D
Session #4A, 3:30 – 4:00 pm ABC-McDowell/Tuttle/Alcove
Characterizing Benthic Macroinvertebrate Community Responses to Nutrient Addition Using NMDS and BACI Analyses
Bahman Shafii and William Price – University of Idaho
Wayne Minshall – Idaho State University
Charlie Holderman – Kootenai Tribe of Idaho
Paul Anders – Cramer Fish Sciences
Gary Lester and Pat Barrett – EcoAnalysts, Inc.
Nonmetric multidimensional scaling (NMDS) is an ordination technique that is often used for information visualization and for exploring similarities or dissimilarities in ecological data. In principle, NMDS maximizes the rank-order correlation between distance measures and distances in the ordination space. Ordination points are adjusted in a manner that minimizes stress, where stress is defined as a measure of the discordance between the two kinds of distances. Before-After Control-Impact (BACI) is a classical analysis of variance method for measuring the potential influence of an environmental disturbance. Such effects can be assessed by comparing conditions before and after a planned activity. In certain ecological applications, the extent of the impact is also expressed relative to conditions in a control area after a particular anthropogenic activity has occurred. In this paper, two statistical techniques are employed to investigate the effect of stream nutrient addition on a benthic macroinvertebrate community. The clustering of sampling units, based on multiple macroinvertebrate metrics across predetermined river zones, is explored using NMDS. BACI is subsequently used to test for the potential impact of nutrient addition on the specified macroinvertebrate response metrics. The combination of the two approaches provides a powerful and sensitive tool for detecting complex second-order effects in river food chains. The techniques are demonstrated using eight years of benthic macroinvertebrate survey data collected on an ultra-oligotrophic reach of the Kootenai River in Northern Idaho and Western Montana, downstream from a hydroelectric dam.
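Two quantities in this abstract can be sketched directly. Kruskal's stress-1 compares ordination distances with disparities, the closest values that are monotone in the original dissimilarities (found here with the pool-adjacent-violators algorithm); stress is 0 when the ordination preserves the rank order perfectly. The BACI effect is the interaction contrast (impact after minus impact before) minus (control after minus control before). This is a minimal sketch, not the authors' implementation: it assumes untied dissimilarities, only evaluates stress for a given configuration rather than optimizing it, and uses hypothetical numbers.

```python
import math

def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def kruskal_stress1(dissim, coords):
    """Kruskal stress-1 of an ordination.

    dissim: {(i, j): dissimilarity} for each pair i < j (assumed untied)
    coords: ordination coordinates, one tuple per sampling unit
    """
    pairs = sorted(dissim, key=dissim.get)  # rank pairs by dissimilarity
    d = [math.dist(coords[i], coords[j]) for i, j in pairs]
    d_hat = pava(d)  # disparities: monotone in the dissimilarity ranks
    num = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    den = sum(a ** 2 for a in d)
    return math.sqrt(num / den)

def baci_effect(control_before, control_after, impact_before, impact_after):
    """BACI interaction: change at the impact site minus change at the control."""
    return (impact_after - impact_before) - (control_after - control_before)

if __name__ == "__main__":
    coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    dissim = {(0, 1): 0.2, (0, 2): 0.9, (1, 2): 0.5}
    print(f"stress-1 = {kruskal_stress1(dissim, coords):.3f}")
    # Hypothetical mean taxa richness: control 10 -> 11, impact 9 -> 14.
    print("BACI effect:", baci_effect(10, 11, 9, 14))
```

In a full NMDS, an optimizer would move the coordinates to reduce this stress; in a full BACI analysis, the contrast would be tested against variation among sampling units and years.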
Session #4B, 4:00 – 4:30 pm ABC-McDowell/Tuttle/Alcove
Fitting Population Models When Detection Is Imperfect
Trevor Hefley, Andrew Tyre and Erin Blankenship – University of Nebraska-Lincoln
Population time series data from field studies are complex and statistical analysis requires models that describe nonlinear population dynamics and observational errors. State-space formulations of stochastic population growth models have been used to account for measurement error caused by the data collection process. Parameter estimation, inference, and prediction are all sensitive to measurement error assumptions. In particular, the observational process may also result in incomplete detection of individuals. We developed an N-mixture state-space modeling framework to estimate and correct for errors in detection while estimating population model parameters. We tested our methods using simulated data sets and compared the results to those obtained with state-space models when detection is perfect and when detection is ignored. Our N-mixture state-space model yielded parameter estimates of similar quality to a state-space model when detection is perfect. Our results show that ignoring detection errors in population time series analysis can lead to disastrously wrong estimates. We recommend that researchers consider the possibility of detection errors when analyzing population time series data.
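The core idea of an N-mixture observation model is that repeated counts at a site are binomial draws from an unobserved abundance N with detection probability p, and N is summed out of the likelihood. The sketch below shows only that observation piece for a single site, with a Poisson distribution on N truncated at a hypothetical `n_max`; the state-space population dynamics described in the talk are not reproduced, and all names and numbers are illustrative.

```python
import math

def poisson_pmf(n, lam):
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def binom_pmf(y, n, p):
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def site_likelihood(counts, lam, p, n_max=500):
    """P(counts | lam, p) with latent abundance N ~ Poisson(lam) summed out.

    Each count is Binomial(N, p): individuals present are missed with prob 1 - p.
    n_max truncates the infinite sum (an assumption; choose it well above lam).
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        term = poisson_pmf(n, lam)
        for y in counts:
            term *= binom_pmf(y, n, p)
        total += term
    return total

if __name__ == "__main__":
    counts = [12, 9, 14]  # hypothetical repeated counts at one site
    # Treating counts as true abundance amounts to p = 1, under which
    # counts that disagree are impossible (likelihood 0).
    print("p = 1.0:", site_likelihood(counts, lam=20.0, p=1.0))
    for p in (0.4, 0.6, 0.8):
        print(f"p = {p}:", site_likelihood(counts, lam=20.0, p=p))
```

Setting p = 1 shows why ignoring detection is problematic: such a model cannot even accommodate ordinary count-to-count variation at a site, let alone the expected undercount of lam * p.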
5:00 – 7:00 pm Flint Hills Discovery Center, 25th Annual Conference Celebration
8:30 – 10:30 pm Kansas Country Dance at the Hilton Garden Inn and Convention
Center in Big Basin Ballroom D
Tuesday – April 30th
Continental Breakfast is Available in Foyer ABCD
Session #5A, 8:30 – 9:00 am ABC-McDowell/Tuttle/Alcove
Accounting for Heterogeneous Pleiotropy in Whole Genome Selection Models
N. M. Bello – Kansas State University
J. P. Steibel and R. J. Tempelman – Michigan State University
The additive genetic correlation between economically relevant traits is generally considered a critical factor determining the relative advantage of multi-trait models over single-trait models for whole genome prediction of genetic merit. Yet the additive genetic correlation between traits may be considered an aggregate summary of between-trait correlations at the individual QTL level, thereby defining pleiotropic mechanisms by which individual genes have simultaneous effects on multiple phenotypic traits. Pleiotropic effects, in turn, may be gene specific and heterogeneous across the genome. In this study, we present a hierarchical Bayesian extension to bivariate genomic prediction models that accounts for heterogeneous pleiotropic effects across SNP markers. More specifically, we model a function of the SNP marker-specific correlation between traits as heterogeneous across markers, using a square-root Cholesky reparameterization of the marker-specific covariance matrix that enforces the necessary positive semidefinite constraints. We use simulation studies to demonstrate the properties of the proposed methods. We assess the relative performance of the proposed method by comparing prediction accuracy for genomic breeding values and for SNP marker effects for each of two traits across putative scenarios of homogeneous and heterogeneous pleiotropic genetic mechanisms. We also consider extensive model comparisons for cases of null and non-null additive genetic correlations under conditions of high and low heritability of the traits of interest. Overall, the relative advantage of bivariate genomic prediction models that account for heterogeneous pleiotropy over their univariate counterparts was small and seemed to depend on trait heritability and the genetic architecture of the pleiotropic mechanisms. The trade-off between methodological and computational complexity and the net gain in prediction accuracy is also discussed.
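The point of a Cholesky reparameterization is that any real values of the unconstrained parameters map to a valid covariance matrix. The sketch below shows this for a single 2 × 2 (two-trait) covariance matrix; it is a generic lower-triangular Cholesky parameterization with log-scale diagonal elements, assumed for illustration, and not necessarily the exact square-root form used in the talk.

```python
import math

def cov_from_cholesky(log_l11, l21, log_l22):
    """2x2 covariance Sigma = L L^T from unconstrained parameters.

    L = [[l11, 0], [l21, l22]] with l11 = exp(log_l11) > 0 and
    l22 = exp(log_l22) > 0, so Sigma is positive definite for any real input.
    """
    l11, l22 = math.exp(log_l11), math.exp(log_l22)
    s11 = l11 ** 2
    s12 = l11 * l21
    s22 = l21 ** 2 + l22 ** 2
    return s11, s12, s22

def correlation(s11, s12, s22):
    """Between-trait correlation implied by the covariance matrix."""
    return s12 / math.sqrt(s11 * s22)

if __name__ == "__main__":
    # Hypothetical marker-specific parameter values, e.g. MCMC draws.
    for params in [(0.0, 0.5, -1.0), (-2.0, -3.0, 0.3), (1.0, 10.0, -2.0)]:
        s11, s12, s22 = cov_from_cholesky(*params)
        det = s11 * s22 - s12 ** 2  # equals (l11 * l22) ** 2 > 0
        print(params, f"det = {det:.4g}", f"r = {correlation(s11, s12, s22):.3f}")
```

Because the determinant reduces to (l11 · l22)², a sampler can update each marker's parameters freely without ever proposing an invalid covariance matrix; the implied correlation, l21 / sqrt(l21² + l22²), always stays strictly between -1 and 1.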
Session #5B, 9:00 – 9:30 am ABC-McDowell/Tuttle/Alcove
Comparing Functional Data Analysis and Hysteresis Loops when Testing Treatments for Reducing Heat Stress in Dairy Cows
Spencer Maynes, A. M. Parkhurst, and T. L. Mader – University of Nebraska-Lincoln
J. B. Gaughan – The University of Queensland, Gatton, Australia
Average yearly monetary losses due to heat stress in dairy cattle have been estimated at $897 million in the US alone. Various techniques are commonly used to reduce heat stress, including sprayers and misters, shade, and changes in feed. Studies are often performed in which researchers do not control when animals use shade or other available means of reducing heat stress, making it hard to test differences between treatments. Two methods are applied to data from a study in which Holstein cows were given free access to weight-activated “cow showers.” Functional data analysis (FDA) can be used to model body temperature as a function of time and environmental variables such as the Heat Load Index. Differences between treatment groups can then be tested using functional analysis of variance. Alternatively, hysteresis loops, such as the ellipse formed by plotting air temperature or the Heat Load Index against body temperature over the course of a day, can be estimated, and their parameters used to test differences between cows with and without access to showers. An R package developed at UNL, hysteresis, which can estimate these loops and their parameters, is demonstrated. Functional data analysis allows looser assumptions about the shape of the body temperature curve and the ability to look for differences between groups at specific time points, while hysteresis loops allow heat stress over the course of a day to be assessed holistically in terms of parameters such as amplitude, lag, central values, and area.
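When both the input and the output follow a sinusoidal daily cycle and the output lags the input, the loop they trace is an ellipse whose area follows from the amplitudes and the phase shift. The sketch below simulates one hypothetical 24-hour cycle (air temperature against body temperature, values invented), computes the loop area with the shoelace formula, and checks it against the closed form pi · a_x · a_y · sin(phi). It does not use or reproduce the hysteresis package's API.

```python
import math

def simulate_loop(cx, ax, cy, ay, lag_hours, period=24.0, n=1000):
    """Points (x, y) over one cycle; y follows x with a phase lag."""
    phi = 2 * math.pi * lag_hours / period
    pts = []
    for k in range(n):
        theta = 2 * math.pi * k / n
        pts.append((cx + ax * math.cos(theta), cy + ay * math.cos(theta - phi)))
    return pts

def shoelace_area(pts):
    """Area enclosed by a closed polygon with vertices given in order."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2

if __name__ == "__main__":
    # Hypothetical: air temperature 25 +/- 8 C, body temperature 38.8 +/- 0.5 C,
    # with body temperature peaking 1.5 h after air temperature.
    pts = simulate_loop(25.0, 8.0, 38.8, 0.5, lag_hours=1.5)
    phi = 2 * math.pi * 1.5 / 24.0
    print(f"shoelace area = {shoelace_area(pts):.4f}")
    print(f"closed form   = {math.pi * 8.0 * 0.5 * math.sin(phi):.4f}")
```

A longer lag widens the loop; with zero lag it collapses to a line of zero area, which is why area and lag are natural summaries of how strongly body temperature trails the thermal load.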
Session #5C, 9:30 – 10:00 am ABC-McDowell/Tuttle/Alcove
Using Functional Data Analysis to Evaluate Effect of Shade on Body Temperature of Feedlot Heifers During Environmental Heat Stress
Fan Yan, A. M. Parkhurst – University of Nebraska-Lincoln
C. N. Lee – University of Hawaii-Manoa
Heat stress can be a serious problem for cattle. Body temperature (Tb) is a good measure of an animal’s thermoregulatory response to an environmental thermal challenge. Previous studies performed in controlled chambers found that Tb increases in response to increasing ambient temperature. However, when animals are in an uncontrolled environment, Tb is subject to many uncontrolled environmental factors, such as radiation, wind, and humidity, that increase variation in the data. Hence, functional data analysis (FDA) was applied to model Tb as curves over the period from July 27 to August 5 for animals exposed to uncontrolled environmental factors. Breed (Angus, MARC-III, MARC-I, Charolais) and availability of shade (access versus no access to sun shade) were incorporated as treatment factors in the statistical model. This study illustrates the potential of FDA to retain all the information in the curves. The specific objectives of this study are to use FDA to smooth highly variable Tb data, to detect treatment effects on Tb, and to assess the interactions between breed and availability of shade with functional regression coefficients. The results show that FDA can be used to detect significant treatment interactions that may otherwise remain undetected using regular linear or nonlinear models. The significant interactions indicate that access to sun shade influences the way animals respond to a thermal challenge. Overall, breeds of cattle with dark hides were more affected by temperature changes and peak temperatures than breeds with light hides. Angus cattle (black) had the highest body temperatures in both shade and no-shade areas, while Charolais (white) had the lowest body temperatures in the no-shade area. However, the interaction showed that MARC-III (dark red) experienced the largest temperature differential between shade and no shade.
Therefore, breed and availability of shade interactions are important considerations when making predictions to aid in management decisions involving feedlot cattle.
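FDA typically begins by representing each animal's noisy Tb record as a smooth function built from basis functions. For daily cycles a Fourier basis is a natural choice; the sketch below fits a truncated Fourier series by least squares to equally spaced readings that cover whole periods, where the discrete orthogonality of sines and cosines reduces the fit to simple sums. The basis, the number of harmonics, and the simulated readings are assumptions for illustration, not the model used in the study.

```python
import math
import random

def fourier_smooth(y, n_harmonics, samples_per_period):
    """Least-squares fit of a truncated Fourier series to equally spaced data.

    Assumes len(y) is a whole number of periods and that
    n_harmonics < samples_per_period / 2, so the basis is orthogonal on the
    sample points and each coefficient is a simple sum.
    """
    n = len(y)
    fitted = [sum(y) / n] * n  # constant term: the overall mean
    for k in range(1, n_harmonics + 1):
        w = 2 * math.pi * k / samples_per_period
        a = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(y))
        b = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(y))
        for t in range(n):
            fitted[t] += a * math.cos(w * t) + b * math.sin(w * t)
    return fitted

if __name__ == "__main__":
    random.seed(1)
    per_day = 96  # hypothetical: one reading every 15 minutes
    tb = [38.6 + 0.5 * math.sin(2 * math.pi * (t - 30) / per_day) + random.gauss(0, 0.15)
          for t in range(2 * per_day)]
    smooth = fourier_smooth(tb, n_harmonics=3, samples_per_period=per_day)
    print([round(v, 2) for v in smooth[:8]])
```

The smoothed curves, rather than the raw readings, would then enter a functional model with breed and shade as factors; choosing the number of harmonics (or a roughness penalty) controls how much of the short-term variation is treated as noise.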