Polynomial fitting

When the functional form behind a data set is unknown, a typical approach is to fit a polynomial. This strategy is motivated by the fact that any sufficiently smooth function can be approximated locally by a polynomial of some degree (Taylor expansion). The question remains, however, which degree to choose. In this exercise you will fit polynomials of successive degrees to the given data and check the goodness of each fit.

MATLAB functions:

polyfit(x,y,n)

polyval(p,x)

subplot(n,m,k)
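
As a quick illustration of how the two fitting functions work together, here is a minimal sketch. The data are hypothetical (points lying exactly on y = 2x + 1) and are not part of the exercise.

```matlab
% Illustrative data (hypothetical): points lying exactly on y = 2x + 1
x = 0:4;
y = 2*x + 1;

% polyfit returns the coefficients with the highest power first
p = polyfit(x, y, 1);        % approximately [2 1]

% polyval evaluates the polynomial with coefficients p at the points x
yfit = polyval(p, x);

% Largest deviation between data and fit (close to zero here)
maxdev = max(abs(y - yfit));
```

Note that MATLAB is case-sensitive, so the function names must be written in lowercase.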

Problem 1

Given the data set

x / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9
y / 5 / 6 / 10 / 20 / 28 / 33 / 34 / 36 / 42

write a script that:

  1. Fits polynomials of 1st to 4th degree.
  2. Calculates SSE and R2 for each fit.
  3. Chooses the best fit according to the maximum-R2 rule.
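
The three steps could be sketched as follows. This is one possible layout, not a reference solution; the variable names (degrees, SSE, SST, R2, best) are chosen here for illustration, and R2 is taken as 1 - SSE/SST, where SST is the total sum of squares about the mean of y.

```matlab
x = 1:9;
y = [5 6 10 20 28 33 34 36 42];

degrees = 1:4;
SSE = zeros(size(degrees));
R2  = zeros(size(degrees));
SST = sum((y - mean(y)).^2);          % total sum of squares

for k = 1:numel(degrees)
    p = polyfit(x, y, degrees(k));    % least-squares fit of degree degrees(k)
    r = y - polyval(p, x);            % residuals of this fit
    SSE(k) = sum(r.^2);               % sum of squared errors
    R2(k)  = 1 - SSE(k)/SST;          % coefficient of determination
end

[~, best] = max(R2);                  % index of the largest R2
fprintf('Best degree by maximum R2: %d\n', degrees(best));
```

Because each polynomial contains all lower-degree polynomials as special cases, SSE cannot increase with the degree, so the maximum-R2 rule tends to favour the highest degree tried. It is worth keeping this in mind when interpreting the result.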

Problem 2

The following data give the number of vehicles (in millions) crossing a bridge in each of 10 years:

Year / 2000 / 2001 / 2002 / 2003 / 2004 / 2005 / 2006 / 2007 / 2008 / 2009
Vehicles (millions) / 2.1 / 3.4 / 4.5 / 5.3 / 6.2 / 6.6 / 6.8 / 7 / 7.4 / 7.8

This is an example of data with large values of the independent or dependent variable. In such a case the matrix of powers of x used in the least-squares fit contains very large entries and becomes badly conditioned, so the computed coefficients may carry a large numerical error. A common practice is to center and/or scale the data to reduce this error. An example of scaling is presented below:
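
One common choice, shown here only as a sketch, is to subtract the mean of the years and divide by their standard deviation (z-score). Whether this is one of the methods intended by the exercise is an assumption. The variable names are illustrative. The second half uses the optional third output of polyfit, which performs the same centering and scaling internally.

```matlab
year = 2000:2009;
veh  = [2.1 3.4 4.5 5.3 6.2 6.6 6.8 7 7.4 7.8];

% Manual centering and scaling (z-score), one possible choice
mu    = mean(year);
sigma = std(year);
z     = (year - mu) / sigma;          % values now lie roughly in [-1.5, 1.5]

pz   = polyfit(z, veh, 3);            % fit in the scaled variable
fitz = polyval(pz, (year - mu) / sigma);

% Built-in route: the third output of polyfit holds [mean(x); std(x)]
[pb, ~, mub] = polyfit(year, veh, 3);
fitb = polyval(pb, year, [], mub);    % polyval applies the same scaling

SSEz = sum((veh - fitz).^2);          % sum of squared errors of the scaled fit
```

The coefficients pz refer to the scaled variable z, not to the year itself, so new points must be transformed with the same mu and sigma before the polynomial is evaluated.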

Write a script that performs the following tasks:

  1. Fit a third-degree polynomial to the given data using the polyfit function. Determine the reliability of your fit by the methods you already know (you can evaluate your polynomial using the polyval function).
  2. Scale the data using three methods from the set above and perform the fit again. Has the accuracy improved? Does each model reproduce your data correctly?
  3. Plot the original data and your three chosen models on one plot.

Problem 3

The following table gives data on the growth of a bacteria population over time. Your task is to fit an equation to these data and find an appropriate functional form. You will try several polynomials and an exponential function and decide which is the best model.

Time (min) / Bacteria (ppm) / Time (min) / Bacteria (ppm)
0 / 6 / 10 / 350
1 / 13 / 11 / 440
2 / 23 / 12 / 557
3 / 33 / 13 / 685
4 / 54 / 14 / 815
5 / 83 / 15 / 990
6 / 118 / 16 / 1170
7 / 156 / 17 / 1350
8 / 210 / 18 / 1575
9 / 282 / 19 / 1830

Write a script that does the following:

  1. Fit polynomials of 1st, 2nd, 3rd and 4th degree and an exponential function.
  2. Write a function that calculates R2 for a given set of data and a given model.
  3. Use your function to assess the quality of each fit.
  4. Write a function that calculates the residuals of a given model for a given set of data.
  5. In one figure (using the subplot function), plot the residuals and a histogram of the residuals for each model (10 plots).
  6. Which of the residual plots exhibit a pattern? Which of the distributions is closest to a normal distribution, and why?
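
A possible starting point for the two helper functions and the exponential fit is sketched below. Several things here are assumptions rather than part of the exercise: a "model" is represented as a function handle, the helper names residuals and rsquared are hypothetical, and the exponential is taken in the form b = A*exp(k*t) and fitted by a straight-line fit to log(b). Only the 2nd-degree polynomial is shown; the other degrees follow the same pattern.

```matlab
t = 0:19;
b = [6 13 23 33 54 83 118 156 210 282 ...
     350 440 557 685 815 990 1170 1350 1575 1830];

% Hypothetical helpers: a model is any function handle yhat = model(t)
residuals = @(t, y, model) y - model(t);
rsquared  = @(t, y, model) 1 - sum((y - model(t)).^2) / sum((y - mean(y)).^2);

% Polynomial model of degree 2
p2    = polyfit(t, b, 2);
poly2 = @(tt) polyval(p2, tt);

% Exponential model, assumed form b = A*exp(k*t):
% log(b) = log(A) + k*t is linear in t, so a 1st-degree polyfit gives k and log(A)
c        = polyfit(t, log(b), 1);
expmodel = @(tt) exp(c(2)) * exp(c(1) * tt);

R2poly = rsquared(t, b, poly2);       % quality of the quadratic fit
R2exp  = rsquared(t, b, expmodel);    % quality of the exponential fit
res    = residuals(t, b, poly2);      % residuals of the quadratic fit
```

Fitting the straight line to log(b) minimizes the squared errors of the logarithms, not of the original values, so the R2 computed on the original scale can be noticeably lower than the log-scale fit suggests.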

Using the information gathered in your analysis, decide which functional form best describes the data.