Due Date: 25 April 2011
This homework involves some basic analysis of time-series data. The data consist of arrival times (in seconds, measured from a fixed point in time) and the corresponding inter-arrival times.
The data are in the Excel spreadsheet file time project.xls, which contains two labeled worksheets. The first sheet holds 20 days of arrival times (in seconds). Note that the total number of arrivals per day is not constant. The arrival times are pre-sorted from the beginning to the end of the time period. The second sheet holds the inter-arrival times associated with the first sheet.
There are several tasks you must do with this data, though none of them are particularly difficult (except possibly for the requested explanations). Here are the tasks/questions you must answer.
1. Poisson Distribution. Find the arrival rate λ for each day in two ways: (a) using all the data, and (b) using the data referenced to the first point (subtract the value of the first point from all the data, then find λ using all points except the first).
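As a minimal sketch of the two Task 1 estimates, the Python below assumes λ is computed as "number of arrivals divided by elapsed time" and that each day's arrival times have already been read from the spreadsheet into a sorted list. The function names and the sample day are hypothetical, made up for illustration only.

```python
# Sketch: two estimates of a day's arrival rate λ (arrivals per second).
# Assumption: λ = (number of arrivals) / (elapsed time).


def rate_all_points(arrivals):
    """λ using all points: n / t_n, times measured from the fixed reference (t = 0)."""
    return len(arrivals) / arrivals[-1]


def rate_referenced_to_first(arrivals):
    """λ referenced to the first point: subtract t_1 from every time, drop the
    first point, and return (n - 1) / (t_n - t_1). Needs at least two arrivals."""
    t1 = arrivals[0]
    shifted = [t - t1 for t in arrivals[1:]]
    return len(shifted) / shifted[-1]


# Made-up, already sorted arrival times (seconds) for one hypothetical day.
day = [12.0, 30.0, 45.0, 80.0, 100.0]
print(rate_all_points(day))           # = 5 / 100
print(rate_referenced_to_first(day))  # = 4 / 88
```

Applying both functions to every day gives the two sets of 20 λ values used in Task 2.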
Question: Why might these values for λ be different?
2. Sampling Distribution of Poisson Rate Parameter. Calculate the overall average λ for both sets of λ data from Task 1 above. Generate a histogram for each of the two sets of λ data from Task 1 above. Bin size is up to you; just make it reasonable. Plot the average as a vertical line on the relevant histogram.
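One possible way to get the Task 2 numbers is sketched below in plain Python: the average of a set of daily λ values and equal-width histogram counts. The helper names, the number of bins, and the sample values are assumptions for illustration, not part of the assignment.

```python
# Sketch: overall average λ and equal-width histogram counts for daily λ values.


def mean(values):
    return sum(values) / len(values)


def histogram(values, num_bins):
    """Split [min, max] into num_bins equal-width bins; return (edges, counts).
    The maximum value is counted in the last bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: put everything in a single bin of width 1.
        return [lo, lo + 1.0], [len(values)]
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for v in values:
        k = min(int((v - lo) / width), num_bins - 1)
        counts[k] += 1
    return edges, counts


# Made-up daily λ values, not taken from the spreadsheet.
daily_rates = [1.0, 2.0, 2.0, 3.0, 4.0]
print(mean(daily_rates))
print(histogram(daily_rates, 3))
```

If matplotlib is available, `plt.hist` can draw the histogram directly and `plt.axvline` can mark the average.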
Question: Is an assumption of a Gaussian distribution for the daily λ as a sampling distribution of an overall (average) λ reasonable? Why or why not?
3. Exponential Distribution. Plot histograms of the inter-arrival times for any 2 days (your choice). On the same graph, plot the exponential distribution corresponding to that day's inter-arrival times. To put the two on the same scale, normalize your histogram counts by dividing by the number of arrivals (and also by the bin width if you plot the probability density λ·exp(-λt)).
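The normalization in Task 3 can be sketched as follows. Dividing counts by n gives the fraction per bin, which compares with the exponential probability of landing in that bin; dividing further by the bin width gives a density, which compares with λ·exp(-λt). This sketch assumes λ is estimated from the gaps as n / (sum of gaps); using that day's λ from Task 1 instead is equally valid. The helper names and sample gaps are hypothetical.

```python
import math

# Sketch: compare one day's inter-arrival histogram with an exponential model.
# Assumption: λ is estimated from the gaps as n / sum(gaps) (= 1 / mean gap).


def rate_from_gaps(gaps):
    return len(gaps) / sum(gaps)


def normalized_histogram(gaps, bin_width, num_bins):
    """Bins [k*w, (k+1)*w) starting at 0. Returns (centers, fractions, densities):
    fractions = count / n, densities = count / (n * w). Gaps past the last bin are
    left out of the counts but still included in n."""
    counts = [0] * num_bins
    for g in gaps:
        k = int(g // bin_width)
        if k < num_bins:
            counts[k] += 1
    n = len(gaps)
    fractions = [c / n for c in counts]
    densities = [f / bin_width for f in fractions]
    centers = [(k + 0.5) * bin_width for k in range(num_bins)]
    return centers, fractions, densities


def exponential_pdf(t, lam):
    """Density λ·exp(-λt); compare with `densities`."""
    return lam * math.exp(-lam * t)


def exponential_bin_probability(k, bin_width, lam):
    """P(k*w <= T < (k+1)*w) for T ~ Exp(λ); compare with `fractions`."""
    a, b = k * bin_width, (k + 1) * bin_width
    return math.exp(-lam * a) - math.exp(-lam * b)


# Made-up gaps (seconds) for one hypothetical day.
gaps = [0.5, 1.5, 1.2, 3.0, 0.1]
lam = rate_from_gaps(gaps)
centers, fractions, densities = normalized_histogram(gaps, 1.0, 3)
for k, c in enumerate(centers):
    print(c, fractions[k], exponential_bin_probability(k, 1.0, lam),
          densities[k], exponential_pdf(c, lam))
```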
Question: Do the histograms and distributions match reasonably well? Why or why not?
4. Poisson Process. Take all 20 days of data and combine it into a single set of arrival times. Sort them, and calculate the inter-arrival times. Find the arrival rate λ for this concatenated set. Compare this value with (a) the sum of the individual λ values using all data, and (b) the sum of the individual λ values using all data minus the 1st point.
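The Task 4 bookkeeping might look like the sketch below: pool and sort all arrival times, take consecutive differences, and compare the pooled λ with the two sums of daily λ values. It assumes the same "arrivals divided by elapsed time" estimator as Task 1 and (as a labeled assumption) measures the first pooled gap from t = 0. The two-day sample data and helper names are made up.

```python
# Sketch for Task 4: pool all days, then compare the pooled λ with the sums of
# daily λ values. Assumption: λ = (number of arrivals) / (elapsed time).


def rate_all_points(arrivals):
    return len(arrivals) / arrivals[-1]


def rate_referenced_to_first(arrivals):
    t1 = arrivals[0]
    shifted = [t - t1 for t in arrivals[1:]]
    return len(shifted) / shifted[-1]


def concatenate(days):
    """Pool every day's arrival times into one sorted list."""
    return sorted(t for day in days for t in day)


def inter_arrival_times(arrivals):
    """Gaps between consecutive arrivals; the first gap is measured from t = 0
    (an assumption), so the gaps add up to the last arrival time."""
    return [arrivals[0]] + [b - a for a, b in zip(arrivals, arrivals[1:])]


# Two made-up days (seconds), standing in for the 20 days in the spreadsheet.
days = [[10.0, 40.0, 100.0], [5.0, 50.0, 80.0, 90.0]]
merged = concatenate(days)
gaps = inter_arrival_times(merged)
lam_concat = rate_all_points(merged)
sum_all = sum(rate_all_points(d) for d in days)
sum_ref = sum(rate_referenced_to_first(d) for d in days)
print(lam_concat, sum_all, sum_ref)
```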
Question: Which sum better approximates the concatenated set? Explain why this might be so.
5. Poisson Process. Using the concatenated set and its calculated λ, find the expected number of arrivals for T = 300, 600, 900, 1200, 1500, 1800 seconds. Compare these with the “actual” number of arrivals in the concatenated set up to each T.
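A possible layout for the Task 5 comparison is sketched below. For a Poisson count over [0, T] the mean is λT and the variance is also λT, so the table includes sqrt(λT) as a rough yardstick for the size of a discrepancy. The sample data are made up and end at 100 s, so the usage line uses short horizons; with the real pooled set you would pass the `HORIZONS` list instead.

```python
import bisect
import math

# Sketch for Task 5: expected Poisson counts λT versus observed counts.

HORIZONS = [300, 600, 900, 1200, 1500, 1800]  # T values from the assignment (s)


def actual_arrivals(sorted_arrivals, T):
    """Number of arrivals at or before time T."""
    return bisect.bisect_right(sorted_arrivals, T)


def comparison_table(sorted_arrivals, lam, horizons):
    """Rows of (T, expected count λT, Poisson std dev sqrt(λT), actual count)."""
    rows = []
    for T in horizons:
        expected = lam * T
        rows.append((T, expected, math.sqrt(expected),
                     actual_arrivals(sorted_arrivals, T)))
    return rows


# Made-up pooled arrivals (seconds); λ = arrivals / last arrival time.
merged = [5.0, 10.0, 40.0, 50.0, 80.0, 90.0, 100.0]
lam = len(merged) / merged[-1]
for row in comparison_table(merged, lam, [25, 50, 75, 100]):
    print(row)
```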
Question: Does the Poisson distribution reasonably describe the number of arrivals for the concatenated set?
6. Poisson Process. Generate a histogram of the inter-arrival times for the concatenated set. Bin sizes are up to you; just make them reasonable. On the same graph, plot the exponential distribution using the calculated value of λ. To put the two on the same scale, normalize your histogram counts by dividing by the number of arrivals (and also by the bin width if you plot the density λ·exp(-λt)).
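For Task 6, the sketch below pairs each bin's observed density, count / (n × bin width), with the exponential pdf at the bin center. The number of bins comes from the square-root rule, which is only one common rule of thumb; the first gap is again measured from t = 0 as an assumption, and the sample data are made up.

```python
import math

# Sketch for Task 6: observed density of pooled inter-arrival times next to
# the exponential pdf λ·exp(-λt) at each bin center.


def inter_arrival_times(arrivals):
    """First gap measured from t = 0 (assumption); the rest are consecutive differences."""
    return [arrivals[0]] + [b - a for a, b in zip(arrivals, arrivals[1:])]


def sqrt_rule_bins(n):
    """Square-root rule of thumb for the number of bins."""
    return max(1, math.ceil(math.sqrt(n)))


def overlay_rows(gaps, lam, num_bins):
    """Equal-width bins from 0 to the largest gap. Each row is
    (bin center, count / (n * width), exponential pdf at the center)."""
    width = max(gaps) / num_bins
    counts = [0] * num_bins
    for g in gaps:
        counts[min(int(g / width), num_bins - 1)] += 1
    n = len(gaps)
    rows = []
    for k, c in enumerate(counts):
        center = (k + 0.5) * width
        rows.append((center, c / (n * width), lam * math.exp(-lam * center)))
    return rows


# Made-up pooled arrivals (seconds).
merged = [5.0, 10.0, 40.0, 50.0, 80.0, 90.0, 100.0]
gaps = inter_arrival_times(merged)
lam = len(gaps) / sum(gaps)
rows = overlay_rows(gaps, lam, sqrt_rule_bins(len(gaps)))
for row in rows:
    print(row)
```

Because each observed density is divided by the bin width, the bar areas sum to 1, which is what makes them comparable with the pdf curve.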
Question: Does the exponential distribution reasonably describe the inter-arrival times for the concatenated set?
Because of all the plots and tables, not to mention the requested explanations, I strongly recommend that you type this up. That will let me grade it much more quickly and in a better mood (and therefore make me more likely to accept “interesting” explanations), and get it back to you sooner.
The explanations do not necessarily have a set right or wrong answer. This is “real” data, with fairly small numbers of arrivals per time stream. It may not look like a “textbook” output. There may be many possible reasons for any deviations from “expected”.
Use your brains! Engineers are called upon to analyze data and answer questions. If you have a conclusion, defend it. If you do a good enough job, even if your answer is “wrong”, I’ll give you either full credit or very high partial credit. This is a thought exercise.
Good luck.