ST 524 NCSU - Fall 2008
Due: 12/08/08
Final TAKE HOME EXAM – FALL 08
Analysis - Disease Progress Curve: Controlling papaya ring spot virus
Data kindly provided by Pedro Torres, graduate student in Statistics and TA for our course. Data was used for a study that is summarized below.
- Title: Nonlinear models for analyzing disease progress
- Authors: Raúl Macchiavelli, Wilfredo Robles, Edwin Abreu and Alberto Pantoja. College of Agricultural Sciences, Univ. of Puerto Rico – Mayagüez
Monitoring plant diseases
Diseases are normally monitored over time, assessing the amount of disease present in a population of plants:
“Disease Progress Curve”
Represents an interpretation of all host, pathogen and environmental effects occurring during an epidemic (Campbell and Madden, 1990)
Disease Progress Curve - Models
Y: amount of disease
Proportion of diseased trees (out of 20)
t: time
dY/dt: absolute rate of disease increase (or decrease)
Quantitative description of epidemics:
dY/dt vs. Y, dY/dt vs. t
Logistic Disease Progress Curve
Gompertz Disease Progress Curve
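For reference, the two curves are commonly written as below. This is a sketch in a usual textbook parameterization; the symbols $r$ (rate), $a$ and $B$ (position parameters) are assumptions and may differ from the notation used in the course slides.

```latex
% Logistic: rate form and integrated form
\frac{dY}{dt} = r\,Y\,(1 - Y), \qquad Y(t) = \frac{1}{1 + \exp\{-(a + r t)\}}

% Gompertz: rate form and integrated form
\frac{dY}{dt} = r\,Y\,(-\ln Y), \qquad Y(t) = \exp\{-B \exp(-r t)\}
```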
Controlling papaya ring spot virus
Twenty plots were planted, each with 20 papaya plants
There were 4 different treatments for the control of certain insects (aphids) which are vectors of the virus
- Control (no weeds): T
- Plastic (black): PC
- Plastic (silver): PP
- Weeds: M
Each treatment was randomly assigned to 5 plots (CRD)
The experiment was monitored weekly (8 weeks)
Each week, every plant was checked to see whether it showed symptoms
Once the plant showed symptoms, it was classified as diseased for the rest of the experiment
Analyzing the Data
The variable of interest is the disease index for treatment i, time j and plot k:
Traditional analysis
- Fit separate curves for each plot using linear regression
- Compare slopes for different treatments
Problems with traditional approach
- Non-normal distribution
- Nonlinear models
- Non-constant variances
- Observations of the same tree in different weeks are dependent
- Observations on the same plot may not be independent (contagion)
Generalized linear models
- The linear component is defined as in traditional linear models: $\eta = x'\beta$.
- A monotonic differentiable link function $g$ describes how the expected value of $y$, $\mu$, is related to the linear predictor: $g(\mu) = \eta$.
- The response variables $y$ are independent and have a probability distribution from an exponential family. This implies that the variance of the response depends on the mean through a variance function $V$: $\mathrm{Var}(y) = \phi V(\mu)$.
- The dispersion parameter $\phi$ is either assumed known (for example, for the binomial distribution, $\phi = 1$) or must be estimated to account for overdispersion.
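The three ingredients listed above (link function, its inverse, and the variance function) can be sketched for the binomial case as follows. This is a minimal illustration that assumes a logit link; the function names and the default of 20 plants per plot are choices made for this example only.

```python
import math

def logit(mu):
    """Logit link g(mu) = log(mu / (1 - mu)), defined for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse link: maps the linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def proportion_variance(mu, n=20, phi=1.0):
    """Variance of an observed proportion y/n: phi * V(mu) / n, V(mu) = mu(1 - mu)."""
    return phi * mu * (1.0 - mu) / n
```

With `phi = 1` this is the plain binomial variance; a value of `phi` above 1 inflates it, which is how overdispersion enters the model.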
a) Nonlinear fitting of the observed proportion of diseased plants (out of 20 plants per plot) using PROC NLIN in SAS
Need to set up dummy variables for treatments
Needs initial values of parameters
- Logistic Fit
- Gompertz Fit
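Since a nonlinear fit needs initial parameter values, one common way to obtain them is to linearize each curve and run an ordinary regression. The sketch below does this in plain Python; it is not what PROC NLIN does internally, the parameterizations Y = 1/(1 + exp(-(a + r t))) and Y = exp(-B exp(-r t)) are assumed, and the example proportions are generated from a known curve rather than taken from the papaya data.

```python
import math

def linear_fit(x, z):
    """Ordinary least squares for z = intercept + slope * x."""
    n = len(x)
    xbar = sum(x) / n
    zbar = sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxz = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return zbar - slope * xbar, slope

def logistic_start(t, y):
    """Regress logit(y) on t; returns (a, r) for Y = 1/(1 + exp(-(a + r t)))."""
    z = [math.log(yi / (1.0 - yi)) for yi in y]
    return linear_fit(t, z)

def gompertz_start(t, y):
    """Regress -log(-log(y)) on t; returns (B, r) for Y = exp(-B exp(-r t))."""
    z = [-math.log(-math.log(yi)) for yi in y]
    intercept, slope = linear_fit(t, z)
    return math.exp(-intercept), slope

# Hypothetical illustration: proportions generated from a known logistic curve
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
props = [1.0 / (1.0 + math.exp(-(-3.0 + 0.6 * w))) for w in weeks]
a0, r0 = logistic_start(weeks, props)
```

Both transformations are undefined for observed proportions of exactly 0 or 1, so such values would need to be adjusted or dropped before computing starting values.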
b) Estimation of parameters and treatment effect of Generalized Linear Model with PROC GENMOD in SAS and
- No need for initial estimates
- Use of the CLASS statement sets up dummy variables (treatment effect) directly
- Use of the CONTRAST statement allows comparing treatment effects, equality of slopes, etc.
- Use of the ESTIMATE statement allows predictions and confidence intervals.
i = 1, 2, 3, 4 treatments
j = 1, 2, 3, 4, 5, 6, 7, 8 timepoints
k = 1, 2, 3, 4, 5 blocks
c) Note
Nonlinear models with normal residuals (NLIN) do not take into account the actual distribution or the longitudinal nature of the data.
- Because of contagion, the number of diseased trees (out of 20) is not a binomial random variable, so its variance may not match the binomial variance.
Nonlinear models fitting a binomial distribution with possible overdispersion do not take into account the longitudinal nature of the data.
The overdispersion (scale) parameter may be estimated as the square root of the deviance divided by its degrees of freedom. A ratio (deviance/d.f.) greater than 1 indicates that overdispersion is present. Use the SCALE=DEVIANCE option in the MODEL statement of PROC GENMOD to fit a binomial model with overdispersion. Standard errors and tests are then adjusted to account for the extra variation.
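The deviance-based check described above can be sketched as follows. This is a hand computation for illustration, assuming fitted probabilities are already available from some model; the function names are hypothetical and the code is not a substitute for the PROC GENMOD output.

```python
import math

def binomial_deviance(y, n, mu):
    """Deviance for binomial counts y out of n with fitted probabilities mu."""
    def term(obs, fitted):
        # Convention: 0 * log(0) = 0
        return obs * math.log(obs / fitted) if obs > 0 else 0.0

    total = 0.0
    for yi, ni, mi in zip(y, n, mu):
        total += term(yi, ni * mi) + term(ni - yi, ni * (1.0 - mi))
    return 2.0 * total

def deviance_scale(deviance, df):
    """Return (deviance/df, sqrt(deviance/df)); a ratio above 1 suggests overdispersion."""
    ratio = deviance / df
    return ratio, math.sqrt(ratio)
```

Here `df` is the residual degrees of freedom (number of observations minus number of estimated parameters), which the caller must supply.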
d) Estimation of the parameters of Generalized Linear Model with PROC NLMIXED in SAS
Repeated observations from the same plot are correlated because they share the same random plot effect.
i = 1, 2, 3, 4 treatments
j = 1, 2, 3, 4, 5, 6, 7, 8 timepoints
k = 1, 2, 3, 4, 5 blocks
The model does not account for correlation between measurements on the same tree.
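One way to write a model of the kind described in this part is sketched below. It assumes a logit link and a normally distributed random plot effect; the symbols $\alpha_i$, $\beta_i$, $u_{ik}$ and $\sigma_u^2$ are introduced here for illustration, and the link actually used in the course output may differ.

```latex
y_{ijk} \mid u_{ik} \sim \mathrm{Binomial}(20,\; p_{ijk}), \qquad
\mathrm{logit}(p_{ijk}) = \alpha_i + \beta_i\, t_j + u_{ik}, \qquad
u_{ik} \sim N(0, \sigma_u^2)
```

Here $y_{ijk}$ is the number of diseased plants in plot $k$ of treatment $i$ at time $t_j$; dropping $u_{ik}$ gives the corresponding fixed-effects model without a plot effect.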
Questions
PROC NLIN is used to fit two disease curves.
Q1.a. Write down both estimated equations.
Q1.b. Calculate R², a measure of goodness of fit.
Q1.c. Which model, Gompertz or logistic, shows the better fit?
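For a nonlinear fit, a commonly used goodness-of-fit summary is 1 minus the ratio of the residual sum of squares to the corrected total sum of squares. The sketch below computes it from observed and predicted values; the function name is hypothetical, and the same number can be obtained directly from the sums of squares in the PROC NLIN output.

```python
def pseudo_r2(observed, predicted):
    """R^2 = 1 - SS(residual) / SS(corrected total)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total
```

Because the models are nonlinear, this quantity does not have all the properties of R² in linear regression, so it is best read as a descriptive comparison between the two curves.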
PROC GENMOD is used to fit the number of diseased trees within each plot as a binomial random variable with n = 20 (trees), with the probability of a tree being diseased modeled as a function of Treatment and Day. The full model fits four slopes (linear time effect), one for each treatment, and four separate intercepts (treatment effects).
Q1.d. Would you recommend adjusting for overdispersion?
Contrasts test:
- whether the slopes for the two plastic covers are the same
- whether the slopes for the control and weedy conditions are the same
- whether the average slope for “plastic” is equal to the average slope for “nonplastic”
- whether the effects of the two plastic covers are the same
- whether the effects of the control and weedy treatments are the same
- whether the average effect for “plastic” treatments is equal to the average effect for “nonplastic” treatments
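The contrasts listed above are linear combinations of the treatment parameters with coefficients that sum to zero. The sketch below shows the coefficient patterns using the treatment codes T, PC, PP and M; the numeric slopes in the usage note are made up for illustration, and a real CONTRAST statement must list coefficients in the level order that SAS assigns.

```python
def contrast_value(estimates, coefficients):
    """Linear combination sum(c_i * b_i) over the treatment labels in coefficients."""
    if abs(sum(coefficients.values())) > 1e-12:
        raise ValueError("contrast coefficients should sum to zero")
    return sum(coefficients[k] * estimates[k] for k in coefficients)

# Coefficient patterns for the three comparisons (slopes or intercepts alike)
PLASTIC_VS_PLASTIC = {"PC": 1.0, "PP": -1.0}
CONTROL_VS_WEEDS = {"T": 1.0, "M": -1.0}
PLASTIC_VS_NONPLASTIC = {"PC": 0.5, "PP": 0.5, "T": -0.5, "M": -0.5}
```

For example, with hypothetical slopes `{"T": 0.10, "PC": 0.04, "PP": 0.06, "M": 0.12}`, `contrast_value(slopes, PLASTIC_VS_NONPLASTIC)` gives the difference between the two average slopes. This is only the point estimate; testing it also requires the covariance matrix of the estimates, which the procedure supplies.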
Q1.e. Which model do you select, based on the above results (PROC GENMOD)? Make reference to contrasts. Write down conclusions. Indicate limitations.
PROC NLMIXED is used to fit a model in which the number of diseased trees within a plot is a binomial random variable whose parameter depends on the treatment and the time of measurement. Random block effects are also included in the model. The full model fits separate slopes and intercepts for each treatment group, while the second model fits two curves: a common intercept and slope for treatments T and M, and a separate common intercept and slope for treatments PP and PC.
Q1.f. Which model do you select? Make reference to contrasts. Indicate limitations.
Q1.g. Write down the model for the proportion of diseased trees in a plot receiving a silver plastic cover at day t.
Q1.h. Interpret the coefficients of the model.
Q1.i. Write down the equation for the prediction of the response in a plot with a silver plastic cover at day 85, and similarly for a weedy plot at day 85.
Question 2
Scientific paper:
Effect of Plant Species and Environmental Conditions on Epiphytic Population Sizes of Pseudomonas syringae and other Bacteria. R. D. O’Brien and S. E. Lindow. 1989. The American Phytopathological Society. Vol. 79, No. 5.
Question 2 refers only to Experiment 1 in the above paper.
Please answer the following
Description
Q2.1. Objective
Q2.2. Response Variable
Q2.3. Indicate What Are The Different Experimental Units:
- Main-Unit
- Sub-Unit
- Sub-Sub-Unit
Q2.4. How Is A Block Defined?
Q2.5. What Are The Factors And Their Type: Random or Fixed?
Counting
Q2.6. Number Of Blocks
Q2.7. Total Number Of Main-Units
Q2.8. Total Number Of Sub-Units
Q2.9. Total Number Of Sub-Sub-Units
Q2.10. How Many Main-Units Within Each Block
Q2.11. How Many Sub-Units Within Each Main-Unit
Q2.12. How Many Sub-Sub-Units Within Each Sub-Unit
Statistical Analysis
Q2.13. Linear Model, based on the above information.
Q2.14. Present The ANOVA Table: Sources Of Variation, Df, Ms If Possible.
Q2.15. The Sub-Sub-Plot Factor And Interactions Were Analyzed As A Randomized Complete Block Design. Indicate The Number Of Blocks That Should Be Considered.
Q2.16. Compare Your ANOVA Table With Table 2, Experiment 1. If any discrepancies are observed, please explain them.
Q2.17. Describe An Alternative Plan Of Statistical Analysis For The Described Model.
Q3. This question asks you to write down a description of your research project, indicating:
Q3.1. Objective
Q3.2. Response Variable
Q3.3. Experimental design. Detailed description
- Indicate What Are The Different Experimental Units,
- Main-Unit
- Sub-Unit (if any)
- Sub-Sub Unit (if any)
- How Is A Block Defined?
- What Are The Factors And Their Type: Random or Fixed?
Q3.4.Present the Analysis of Variance table
- Sources of Variation (SOV)
- Degrees of Freedom
- Expected Mean Squares
- F test for each SOV
Q3.5. What type of statistical tests do you plan to carry out on the results to answer your research questions: pairwise mean comparisons, contrasts, orthogonal polynomial contrasts, curve fitting, etc.?
Q3.6. Do you have repeated measures? If so, how do you plan to analyze them?
References
From North Dakota Agricultural Exp Station-Research Project Guidelines
Procedures:
This section is to provide a general design of the project. To begin, re-state each of the objective statements followed by a description of the procedures/methods for that objective. The procedure statements should show that the research needs and plans have been considered carefully and the proposed work has the potential to provide data and information which will permit accomplishing the objectives.
While the details of the experimental design do not need to be specified, provide sufficient information to indicate that an appropriate design is planned.
Effect of Plant Species and Environmental Conditions on Epiphytic Population Sizes of Pseudomonas syringae and other Bacteria. R. D. O’Brien and S. E. Lindow. 1989. The American Phytopathological Society. Vol. 79, No. 5.