ST 524, NCSU - Fall 2008

Due: 12/08/08

FINAL TAKE-HOME EXAM – FALL 08

Analysis - Disease Progress Curve: Controlling papaya ring spot virus

Data kindly provided by Pedro Torres, graduate student in Statistics and TA for our course. The data were used in the study summarized below.

  • Title: Nonlinear models for analyzing disease progress
  • Authors: Raúl Macchiavelli, Wilfredo Robles, Edwin Abreu and Alberto Pantoja. College of Agricultural Sciences, Univ. of Puerto Rico – Mayagüez

Monitoring plant diseases

Diseases are normally monitored over time by assessing the amount of disease present in a population of plants. The resulting "disease progress curve" represents an interpretation of all host, pathogen and environmental effects occurring during an epidemic (Campbell and Madden, 1990).

Disease Progress Curve - Models

  • Y: amount of disease (here, the proportion of diseased trees out of 20)
  • t: time
  • dY/dt: absolute rate of disease increase (or decrease)

Quantitative description of epidemics: dY/dt vs. Y, and dY/dt vs. t.

[Figures: Logistic Disease Progress Curve; Gompertz Disease Progress Curve]
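The two curve shapes can be written as rate equations in the notation above (Y, t, dY/dt). The block below is a sketch of the standard textbook forms with an upper asymptote of 1, since Y is a proportion; the rate parameter r and the integration constant B are symbols introduced here for illustration, not parameters taken from the study.

```latex
\begin{aligned}
\text{Logistic:}\quad & \frac{dY}{dt} = r\,Y\,(1 - Y), &
  Y(t) &= \frac{1}{1 + B\,e^{-r t}}, \\
\text{Gompertz:}\quad & \frac{dY}{dt} = r\,Y\,(-\ln Y), &
  Y(t) &= \exp\!\left(-B\,e^{-r t}\right).
\end{aligned}
```

Plotting dY/dt against Y separates the two shapes: the logistic rate is symmetric and peaks at Y = 0.5, whereas the Gompertz rate peaks earlier, at Y = 1/e (about 0.37).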

Controlling papaya ring spot virus

Twenty plots were planted, each with 20 papaya plants

There were 4 treatments for controlling aphids, the insects that act as vectors of the virus:

  • Control (no weeds): T
  • Plastic (black): PC
  • Plastic (silver): PP
  • Weeds: M

Each treatment was randomly assigned to 5 plots (completely randomized design, CRD).


The experiment was monitored weekly for 8 weeks.

Each week, every plant was checked for symptoms.

Once a plant showed symptoms, it was classified as diseased for the rest of the experiment.

Analyzing the Data

The variable of interest is the disease index for treatment i, time j and plot k: $Y_{ijk}$, the proportion of the 20 plants in plot k of treatment i that were diseased at time j.

Traditional analysis

  • Fit a separate curve for each plot using linear regression
  • Compare the slopes for the different treatments
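The traditional two-step approach reduces each plot to a single slope and then compares slopes across treatments. A minimal Python sketch of the first step (ordinary least-squares slope of the response on time, one fit per plot) follows; the data layout, a list of (treatment, times, responses) tuples, is an assumption, and whether the response is the raw or a transformed proportion is left open.

```python
from statistics import mean


def ols_slope(t, y):
    """Ordinary least-squares slope of y on t (simple linear regression)."""
    t_bar, y_bar = mean(t), mean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    return sxy / sxx


def slopes_by_treatment(plots):
    """plots: list of (treatment, times, responses) tuples, one per plot.

    Returns {treatment: [slope of each plot in that treatment]}. These
    per-plot slopes are what the traditional analysis compares across
    treatments (for example, in a one-way ANOVA on the slopes).
    """
    slopes = {}
    for trt, t, y in plots:
        slopes.setdefault(trt, []).append(ols_slope(t, y))
    return slopes
```

Because each plot contributes only one number, this approach sidesteps the within-plot correlation but discards the binomial nature of the counts, which is the motivation for the problems listed next.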


Problems with traditional approach

  • Non-normal distribution
  • Nonlinear models
  • Non-constant variances
  • Observations of the same tree in different weeks are dependent
  • Observations on the same plot may not be independent (contagion)

Generalized linear models

  • The linear component is defined as in traditional linear models: $\eta = X\beta$.
  • A monotonic, differentiable link function $g$ describes how the expected value of $y$, $\mu = E(y)$, is related to the linear predictor: $g(\mu) = \eta = X\beta$.
  • The responses $y$ are independent and have a probability distribution from an exponential family. This implies that the variance of the response depends on the mean through a variance function $V$: $\operatorname{Var}(y) = \phi\,V(\mu)$.
  • The dispersion parameter $\phi$ is either assumed known (for example, $\phi = 1$ for the binomial distribution) or must be estimated to account for overdispersion.

a) Nonlinear fitting of the observed proportion of diseased plants (out of 20 plants per plot) using PROC NLIN in SAS

Requires dummy variables for the treatments

Requires initial values for the parameters

  • Logistic fit
  • Gompertz fit
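The dummy-variable setup lets a single nonlinear model carry a separate intercept and rate for each treatment: each treatment-specific parameter is multiplied by a 0/1 indicator, so only the parameters of the observation's own treatment are active. The Python sketch below shows the two mean functions under that coding. The parametrization a_i + r_i t (with upper asymptote 1), the parameter names and the treatment ordering are assumptions for illustration, not the study's SAS code.

```python
import math

# Treatment codes as used in this exam; this ordering is an assumption.
TREATMENTS = ("T", "PC", "PP", "M")


def dummies(trt):
    """0/1 indicators (d_T, d_PC, d_PP, d_M) for a single observation."""
    return [1.0 if trt == code else 0.0 for code in TREATMENTS]


def _pick(trt, params):
    """Treatment-specific parameter: sum_k d_k * params_k (one d_k is 1)."""
    return sum(d * p for d, p in zip(dummies(trt), params))


def logistic(t, trt, a, r):
    """Logistic curve Y = 1 / (1 + exp(-(a_i + r_i * t)))."""
    return 1.0 / (1.0 + math.exp(-(_pick(trt, a) + _pick(trt, r) * t)))


def gompertz(t, trt, a, r):
    """Gompertz curve Y = exp(-exp(-(a_i + r_i * t)))."""
    return math.exp(-math.exp(-(_pick(trt, a) + _pick(trt, r) * t)))


def sse(data, curve, a, r):
    """Residual sum of squares minimized by a least-squares fit (as in
    PROC NLIN); data is a list of (t, treatment, observed proportion)."""
    return sum((y - curve(t, trt, a, r)) ** 2 for t, trt, y in data)
```

A nonlinear fit searches iteratively over the 8 parameters (four a_i, four r_i) to minimize `sse`; the search has to start somewhere, which is why NLIN needs initial values.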

b) Estimation of the parameters and treatment effects of a generalized linear model with PROC GENMOD in SAS

  • No need for initial estimates
  • The CLASS statement sets up the dummy variables (treatment effects) directly
  • The CONTRAST statement allows comparing treatment effects, equality of slopes, etc.
  • The ESTIMATE statement provides predictions and confidence intervals

i = 1, 2, 3, 4 treatments

j = 1, 2, 3, 4, 5, 6, 7, 8 time points

k = 1, 2, 3, 4, 5 blocks
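The full GENMOD model described here can be sketched as follows, assuming the logit link (the GENMOD default for a binomial response). In this sketch $y_{ijk}$ denotes the number of diseased trees (out of 20) in plot k of treatment i at time $t_j$, and $\alpha_i$, $\beta_i$ are the treatment-specific intercepts and slopes; these symbols are introduced for illustration.

```latex
\begin{aligned}
y_{ijk} &\sim \text{Binomial}(20,\ \pi_{ij}), \\
\log\frac{\pi_{ij}}{1 - \pi_{ij}} &= \alpha_i + \beta_i\, t_j, \\
\operatorname{Var}(y_{ijk}) &= \phi\, 20\, \pi_{ij}\,(1 - \pi_{ij}), \qquad \phi = 1 \text{ for a pure binomial.}
\end{aligned}
```

Contrasts on the $\beta_i$ compare slopes; contrasts on the $\alpha_i$ compare treatment effects. All observations are treated as independent, which is the limitation noted next.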

c) Note

Nonlinear models with normal residuals (NLIN) take into account neither the actual distribution of the response nor the longitudinal nature of the data.

  • Because of contagion, the number of diseased trees (out of 20) is not a binomial random variable; its variance may not correspond to that of a binomial random variable.

Nonlinear models fitting a binomial distribution, possibly with overdispersion, do not take into account the longitudinal nature of the data.

The overdispersion (scale) parameter may be estimated as the square root of the deviance divided by its degrees of freedom. A ratio deviance/df clearly greater than 1 indicates that overdispersion is present. Use the SCALE=DEVIANCE option in the MODEL statement of PROC GENMOD to fit a binomial model with overdispersion; standard errors and tests are then adjusted to account for the extra variation.
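The deviance/df check can be reproduced by hand from the observed counts and the fitted probabilities. The Python sketch below assumes grouped binomial data (y diseased trees out of n = 20 per plot and week) and fitted probabilities taken from the model output; the function names are hypothetical. GENMOD already prints Deviance, DF and Value/DF in its goodness-of-fit table, so this only shows the arithmetic behind the ratio and the SCALE=DEVIANCE estimate.

```python
import math


def binomial_deviance(y, n, p_hat):
    """Binomial deviance: 2 * sum[ y*log(y/(n p)) + (n-y)*log((n-y)/(n(1-p))) ],
    using the convention 0 * log(0) = 0."""
    def term(obs, expected):
        return 0.0 if obs == 0 else obs * math.log(obs / expected)

    return 2.0 * sum(
        term(yi, ni * pi) + term(ni - yi, ni * (1.0 - pi))
        for yi, ni, pi in zip(y, n, p_hat)
    )


def overdispersion_summary(y, n, p_hat, n_params):
    """Deviance, its df, the ratio deviance/df and the scale estimate
    sqrt(deviance/df) used for SCALE=DEVIANCE."""
    dev = binomial_deviance(y, n, p_hat)
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    ratio = dev / df
    return {"deviance": dev, "df": df, "ratio": ratio, "scale": math.sqrt(ratio)}
```

For the full model here, n_params would be 8 (four intercepts and four slopes). A ratio near 1 is consistent with binomial variation; a ratio well above 1 suggests fitting with SCALE=DEVIANCE.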

d) Estimation of the parameters of a generalized linear mixed model with PROC NLMIXED in SAS

Repeated observations from the same plot are correlated; this is modeled with a random plot effect shared by all observations from that plot.

i = 1, 2, 3, 4 treatments

j = 1, 2, 3, 4, 5, 6, 7, 8 time points

k = 1, 2, 3, 4, 5 blocks

Correlation between measurements on the same tree is not accounted for.
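One way to write the random-effects version is sketched below, assuming a logit link and normally distributed random effects. Here $y_{ijk}$ is the number of diseased trees (out of 20) in plot k of treatment i at time $t_j$, and $u_{ik}$ is a random effect shared by all weekly observations from that plot (the "random plot" or "block" effect in the text); these symbols are introduced for illustration.

```latex
\begin{aligned}
y_{ijk} \mid u_{ik} &\sim \text{Binomial}(20,\ \pi_{ijk}), \\
\log\frac{\pi_{ijk}}{1 - \pi_{ijk}} &= \alpha_i + \beta_i\, t_j + u_{ik}, \\
u_{ik} &\sim N(0,\ \sigma_u^2) \quad \text{independently.}
\end{aligned}
```

The reduced model described before Q1.f corresponds to the constraints $\alpha_T = \alpha_M$, $\beta_T = \beta_M$, $\alpha_{PC} = \alpha_{PP}$ and $\beta_{PC} = \beta_{PP}$.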

Questions

PROC NLIN is used to fit two disease progress curves.

Q1.a. Write down both estimated equations.

Q1.b. Calculate R², a measure of goodness of fit.

Q1.c. Which model, Gompertz or logistic, shows the better fit?
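For a nonlinear least-squares fit, one common definition of R² (assumed here) compares the residual sum of squares of the fitted curve with the corrected total sum of squares of the observed proportions. Because both curves are fit to the same data, the curve with the smaller residual sum of squares has the larger R² under this definition. A minimal Python sketch, with a hypothetical function name:

```python
def r_squared(y_obs, y_fit):
    """R^2 = 1 - SSE / SST, where SST is the corrected total sum of squares
    (deviations of the observations from their mean)."""
    y_bar = sum(y_obs) / len(y_obs)
    sse = sum((yo - yf) ** 2 for yo, yf in zip(y_obs, y_fit))
    sst = sum((yo - y_bar) ** 2 for yo in y_obs)
    return 1.0 - sse / sst
```

Unlike R² in linear regression with an intercept, this quantity can be negative for a poor nonlinear fit, so it is best read as a descriptive summary.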

PROC GENMOD is used to fit the number of diseased trees within each plot as a binomial random variable with n = 20 (trees), where the probability of a tree being diseased is a function of treatment and day. The full model fits four slopes for the linear time effect (one for each treatment) and four separate intercepts (treatment effects).

Q1.d. Would you recommend adjusting for overdispersion?

The contrasts test:

☼ whether the slopes for the two plastic covers are the same

☼ whether the slopes for the control and weedy conditions are the same

☼ whether the average slope for "plastic" is equal to the average slope for "nonplastic"

☼ whether the effects of the two plastic covers are the same

☼ whether the effects of the control and weedy treatments are the same

☼ whether the average effect for "plastic" treatments is equal to the average effect for "nonplastic" treatments
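The three comparisons above (applied either to the slopes or to the treatment effects) can be written as coefficient vectors over the four treatments. The Python sketch below assumes the order T, PC, PP, M and checks that each vector is a contrast (coefficients sum to zero) and that the three are mutually orthogonal. In SAS the coefficients must follow the order of the CLASS levels, which by default are sorted by formatted value (M, PC, PP, T for these codes), so a `reorder` helper is included; all names are hypothetical.

```python
from fractions import Fraction

# Treatment order assumed for the coefficient vectors below.
ORDER = ("T", "PC", "PP", "M")
HALF = Fraction(1, 2)

# The same vectors apply to slopes (treatment-by-day terms) or to
# intercepts (treatment effects), depending on which parameters are compared.
CONTRASTS = {
    "PC vs PP": (0, 1, -1, 0),
    "T vs M": (1, 0, 0, -1),
    "plastic vs nonplastic": (-HALF, HALF, HALF, -HALF),
}


def is_contrast(c):
    """A contrast's coefficients sum to zero."""
    return sum(c) == 0


def orthogonal(c1, c2):
    """Dot product of zero: the balanced-design notion of orthogonality."""
    return sum(x * y for x, y in zip(c1, c2)) == 0


def reorder(c, new_order, old_order=ORDER):
    """Rearrange coefficients to match another ordering of the levels."""
    lookup = dict(zip(old_order, c))
    return tuple(lookup[level] for level in new_order)
```

The "plastic vs nonplastic" vector could equally be scaled to (-1, 1, 1, -1); a single-degree-of-freedom test does not depend on that scaling, only the estimate does.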

Q1.e. Based on the above results (PROC GENMOD), which model do you select? Refer to the contrasts. Write down your conclusions and indicate limitations.

PROC NLMIXED is used to fit a model that treats the number of diseased trees within a plot as a binomial random variable whose probability parameter depends on the treatment and the time of measurement. Random block effects are also included in the model. The full model fits separate slopes and intercepts for each treatment group, while the reduced model fits a common intercept and slope for treatments T and M, and a separate common intercept and slope for treatments PP and PC.

Q1.f. Which model do you select? Refer to the contrasts and indicate limitations.

Q1.g. Write down the model for the proportion of diseased trees in a plot receiving a silver plastic cover at day t.

Q1.h. Interpret the coefficients of the model.

Q1.i. Write down the equation for predicting the response in a plot with a silver plastic cover at day 85, and similarly for a weedy plot at day 85.
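Assuming a logit link, a prediction at a given day is obtained by evaluating the linear predictor for the chosen treatment and back-transforming it with the inverse logit. The Python sketch below shows only this back-transformation: `alpha` and `beta` stand for the treatment-specific intercept and slope from the fitted model (values not given here), and `u` for a plot random effect, set to 0 for a plot whose random effect is at its mean. All names are hypothetical.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_proportion(alpha, beta, day, u=0.0):
    """Predicted proportion of diseased trees, assuming
    logit(pi) = alpha + beta * day + u."""
    return inv_logit(alpha + beta * day + u)
```

For example, `predicted_proportion(alpha_PP, beta_PP, 85)`, with hypothetical variables holding the silver-plastic estimates. Note that setting u = 0 gives a plot-specific prediction for a "typical" plot, which in a mixed logistic model differs somewhat from the population-averaged proportion.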

Question 2

Scientific paper:

Effect of Plant Species and Environmental Conditions on Epiphytic Population Sizes of Pseudomonas syringae and Other Bacteria. R. D. O’Brien and S. E. Lindow. 1989. The American Phytopathological Society, Vol. 79, No. 5.

Question 2 refers only to Experiment 1 in the above paper.

Please answer the following:

Description

Q2.1. Objective

Q2.2. Response variable

Q2.3. Indicate the different experimental units:

  1. Main unit
  2. Sub-unit
  3. Sub-sub-unit

Q2.4. How is a block defined?

Q2.5. What are the factors, and what is their type (random or fixed)?

Counting

Q2.6. Number of blocks

Q2.7. Total number of main units

Q2.8. Total number of sub-units

Q2.9. Total number of sub-sub-units

Q2.10. How many main units are within each block?

Q2.11. How many sub-units are within each main unit?

Q2.12. How many sub-sub-units are within each sub-unit?

Statistical Analysis

Q2.13. Linear model, based on the above information.

Q2.14. Present the ANOVA table: sources of variation, df, and MS if possible.

Q2.15. The sub-sub-plot factor and its interactions were analyzed as a randomized complete block design. Indicate the number of blocks that should be considered.

Q2.16. Compare your ANOVA table with Table 2 (Experiment 1). If any discrepancies are observed, please explain them.

Q2.17. Describe an alternative plan of statistical analysis for the described model.
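Questions Q2.13 to Q2.15 (and Q3.4) rely on degrees-of-freedom bookkeeping for a split-split-plot layout. Assuming the main-plot factor A is arranged in r randomized complete blocks, with sub-plot factor B, sub-sub-plot factor C and no missing cells, the standard partition can be sketched in Python as below. The letters r, a, b, c and the function name are placeholders; the actual numbers of blocks and factor levels for Experiment 1 must be taken from the paper and are not filled in here.

```python
def split_split_plot_df(r, a, b, c):
    """Degrees of freedom for a split-split-plot design: main-plot factor A
    (a levels) in r randomized complete blocks, sub-plot factor B (b levels),
    sub-sub-plot factor C (c levels), all cells present."""
    return {
        "Blocks": r - 1,
        "A": a - 1,
        "Error(a)": (r - 1) * (a - 1),          # Blocks x A
        "B": b - 1,
        "A x B": (a - 1) * (b - 1),
        "Error(b)": a * (r - 1) * (b - 1),
        "C": c - 1,
        "A x C": (a - 1) * (c - 1),
        "B x C": (b - 1) * (c - 1),
        "A x B x C": (a - 1) * (b - 1) * (c - 1),
        "Error(c)": a * b * (r - 1) * (c - 1),
        "Total": r * a * b * c - 1,
    }
```

A useful check when building the ANOVA table is that all rows except "Total" add up to the total df, r·a·b·c − 1.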

Q3. This question asks you to write a description of your research project, indicating:

Q3.1. Objective

Q3.2. Response variable

Q3.3. Experimental design: detailed description

  1. Indicate the different experimental units
  2. Main unit
  3. Sub-unit (if any)
  4. Sub-sub-unit (if any)
  5. How is a block defined?
  6. What are the factors, and what is their type (random or fixed)?

Q3.4. Present the analysis of variance table:

  1. Sources of variation (SOV)
  2. Degrees of freedom
  3. Expected mean squares
  4. F test for each SOV

Q3.5. What types of statistical tests do you plan to carry out on the results to answer your research questions (pairwise mean comparisons, contrasts, orthogonal polynomial contrasts, curve fitting, etc.)?

Q3.6. Do you have repeated measures? If so, how do you plan to analyze them?

References

From the North Dakota Agricultural Experiment Station Research Project Guidelines:

Procedures:
This section is to provide a general design of the project. To begin, re-state each of the objective statements followed by a description of the procedures/methods for that objective. The procedure statements should show that the research needs and plans have been considered carefully and the proposed work has the potential to provide data and information which will permit accomplishing the objectives.
While the details of the experimental design do not need to be specified, provide sufficient information to indicate that an appropriate design is planned.
