IGR Report on GR/M10229/01

Novel Wavelet Methods in Statistics

Investigators: Guy Nason and Bernard Silverman, FRS

Background/Context

This report describes the theoretical, methodological and applied contributions to and advances in wavelet methods in statistics. The grant supported the research of Guy Nason and Bernard Silverman and employed four post-doctoral research workers: Dr Hee-Seok Oh, Dr Stuart Barber, Dr Maarten Jansen and Dr Kostas Triantafyllopolous.

Wavelets are a fairly recent mathematical development and the purpose of this grant was to explore their role in statistics. Informally, wavelets are short oscillating functions that can form efficient expansions of functions from a variety of interesting function spaces. The most popular wavelets are the Daubechies family, which form orthonormal bases of compactly supported functions. As well as providing sparse representations of possibly inhomogeneous functions, wavelets are associated with rapid and efficient computational algorithms. Another advantage of wavelets is their multiscale nature. In the early 90s, wavelets were first proposed for use in statistical curve estimation: the idea was to wavelet transform signal plus noise data, keep the large wavelet coefficients and kill the small ones (thresholding) and then invert to obtain a curve estimate. Wavelets are also playing a role in areas such as time series analysis, survival function estimation, and statistical image analysis.
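The transform, threshold and invert recipe described above can be illustrated with a minimal sketch. It is an illustration only and not the WaveThresh implementation: it assumes Haar wavelets, a signal whose length is a power of two, hard thresholding, and a threshold of sigma * sqrt(2 log n) with known noise level sigma. The function names are hypothetical.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a signal of length 2^J.

    Returns the final scaling coefficient and a list of detail
    coefficient arrays ordered from finest to coarsest scale.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_inverse(approx, details):
    """Invert haar_forward."""
    approx = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * d.size)
        out[0::2] = (approx + d) / np.sqrt(2.0)
        out[1::2] = (approx - d) / np.sqrt(2.0)
        approx = out
    return approx

def hard_threshold_estimate(y, threshold):
    """Keep detail coefficients larger than threshold, kill the rest, invert."""
    approx, details = haar_forward(y)
    kept = [np.where(np.abs(d) > threshold, d, 0.0) for d in details]
    return haar_inverse(approx, kept)

# Illustrative data (assumed, not from the report): a step function plus noise.
rng = np.random.default_rng(0)
n, sigma = 256, 0.5
truth = np.concatenate([np.zeros(n // 2), np.full(n // 2, 4.0)])
noisy = truth + sigma * rng.normal(size=n)
estimate = hard_threshold_estimate(noisy, sigma * np.sqrt(2.0 * np.log(n)))
```

Each level of the transform halves the amount of data processed, so the whole procedure takes order n operations.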

Achievements summary

In all, the grant directly supported the production of 20 peer-reviewed publications with about 10 subsidiary works (such as contributions to books, conference proceedings and technical reports). One or other of us has given invited presentations, many of them plenaries or keynotes, at over 20 major international meetings (including the International Conference on Current Advances and Trends in Nonparametric Statistics, Crete, 2002; the Joint Statistical Meetings, 2000; the IMS annual meeting each year; the joint London/Belgian Mathematical Society Meeting, 1999; the keynote speech for the IEE Colloquium on “Applied Statistical Pattern Recognition”; the keynote for the British Society of Soil Scientists annual meeting, 2002; and the National Astronomy Meeting, 2002), as well as organising or co-organising several others, including a prestigious Royal Society Meeting detailed below. In addition, a considerable number of contributed papers have been presented at various conferences (mainly by our research students and RAs), numerous university seminars have been presented worldwide, and many invitations declined. We have presented several wavelet short courses both in the US and the UK to both academic and industrial communities. Silverman’s research in many areas, including wavelet methods, was recognized by his election as a member of Academia Europaea in 2001, and also by an invitation to present an IMS Medallion Lecture in 1999. Nason was awarded an EPSRC Advanced Research Fellowship in “Wavelets in statistics and probability” in 2000 and was awarded the 2001 Guy Medal in Bronze by the Royal Statistical Society for research in wavelets and statistics.

Notes

In the following document the reference numbers correspond to the IGR eForms ordering (e.g. [1] is “Dr S Barber – Posterior probability intervals for wavelet thresholding”). Nearly all the papers are joint papers with one or other of us as co-authors (eForms permits only lead authors to be entered).

Key Advances and Supporting Methodology

The key advances of the report follow. The objectives of the grant were:

1. to develop specific new wavelet-based methodologies for a variety of statistical problems. We shall describe this in depth next.

2. by both theoretical investigation and practical application, to gain a much fuller appreciation of the statistical contexts in which wavelets will be genuinely useful. Our work has comprised a mixture of theoretical (e.g. [5,7,9,12,14,22,30, part of 31]), methodological (e.g. [1,2,3,13,15,17,18,21,23,26,27, part of 31]) and applied work [4,6,10,27,28]. In some areas (e.g. curve estimation [1]) wavelets work just as well as, if not better than, the competition in regular situations and significantly outperform it in irregular situations; in time series analysis wavelets provide a very useful extra tool for time-scale analysis and modelling of non-stationary series.

3. to make the wavelet-based statistical methodology we develop widely available to users by producing relevant software. WaveThresh3.0 [20] has been downloaded and is used by several hundred verifiable users worldwide. Most key methodological developments are available as software implementations written by us and/or our students. Examples include ewspec for evolutionary spectral estimation, the LS2W suite for two-dimensional wavelet processes, WaveBand for wavelet confidence intervals and, more recently, eBayesThresh [32]. All of this is available free from the web.

4. to produce methodology that is efficient in terms of speed and memory. This objective is constantly in our minds. Some of our work is highly focussed on efficient algorithms, e.g. [2,3,9,13].

The following sections describe our key topic advances.

Curve estimation

A very wide range of topics in curve estimation was addressed both in the original objectives (Bayesian, evolutionary, confidence intervals) and extra ones (choice of wavelet, techniques for density estimation, block thresholding, Poisson data).

Previously, virtually no work on wavelet confidence intervals had been performed. We addressed this problem and developed posterior probability intervals for wavelet shrinkage estimates (published in [1]). The basic idea was that in Bayesian wavelet shrinkage the posterior distribution of each wavelet coefficient is known (with estimated parameters). However, the distributional form of the posterior distribution of the function estimates themselves cannot be obtained analytically. Simulation methods are one possibility for ascertaining this information, but they are highly computationally intensive. Our method obtains the cumulants of the estimates from the cumulants of the posterior wavelet coefficients using an extremely efficient computational algorithm (still order n). The algorithm uses the fact that integer powers of wavelets can be represented as relatively simple linear combinations of wavelets at finer scales. The resulting posterior probability intervals are shown to have excellent coverage properties and the methodology works well on smooth signals as well as more inhomogeneous ones. The idea of utilizing powers of wavelets originated in methods for computing the variance of wavelet coefficients and was published in [5] as part of a collection of new ideas for density estimation.

An important problem in statistics arises since most data do not arrive on an equally spaced grid. Work in [13] addresses this problem by interpolating irregularly spaced data to a regular grid and then processing the regularly spaced data using a wavelet shrinkage method that takes account of the modified variance of the wavelet coefficients. The variance of the wavelet coefficients was computed by a fast wavelet transform akin to the two-dimensional wavelet transform. Publication [2] built upon this work and developed fast algorithms to perform cross-validational choice of wavelet smoothness, primary resolution and threshold. More recently, we have begun to seriously explore “evolutionary” curve estimation techniques based on the very recent multiscale methods of lifting. A preliminary paper based on a Bayesian wavelet shrinkage method applied to a lifting algorithm in two dimensions has been published in [18]. Related work by Dr Jansen (and co-authors), investigating the numerical stability of 1D lifting transforms and the implications for wavelet shrinkage, appears in [27]. These revolutionary methods have great potential for curve estimation in several dimensions and modelling of stochastic processes and are the subject of a submitted EPSRC research grant proposal.
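The way interpolation modifies the variance of the wavelet coefficients can be sketched as follows. This is not the fast algorithm of [13]. It assumes linear interpolation, Haar wavelets, independent noise with common standard deviation sigma, and a regular grid whose length is a power of two, and it uses dense matrices for clarity rather than speed. All function names are hypothetical. Linear interpolation is a linear map R, so the gridded data have covariance sigma^2 R R^T and the wavelet coefficients have variances sigma^2 diag(W R R^T W^T), where W is the wavelet transform matrix.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix for n = 2^J (rows are basis vectors)."""
    approx = np.eye(n)
    rows = []
    while approx.shape[0] > 1:
        even, odd = approx[0::2], approx[1::2]
        rows.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    rows.append(approx)  # final scaling-function row
    return np.vstack(rows)

def interpolation_matrix(t_obs, t_grid):
    """Matrix R such that R @ y equals np.interp(t_grid, t_obs, y).

    t_obs must be increasing, as np.interp requires.
    """
    m = len(t_obs)
    basis = np.eye(m)
    return np.column_stack(
        [np.interp(t_grid, t_obs, basis[i]) for i in range(m)]
    )

def coefficient_variances(t_obs, t_grid, sigma):
    """Variance of each Haar coefficient of the interpolated data."""
    R = interpolation_matrix(t_obs, t_grid)
    W = haar_matrix(len(t_grid))
    M = W @ R
    # diag(M M^T) is the vector of row sums of squares of M
    return sigma ** 2 * np.sum(M ** 2, axis=1)

# Illustrative design (assumed): 20 irregular points mapped to a grid of 16.
rng = np.random.default_rng(0)
t_obs = np.sort(rng.uniform(0.0, 1.0, size=20))
t_grid = np.linspace(0.0, 1.0, 16)
variances = coefficient_variances(t_obs, t_grid, sigma=1.0)
```

When the observation times already coincide with the grid, R is the identity and every coefficient has variance sigma^2, which recovers the usual equally spaced case. In the irregular case the variances differ from coefficient to coefficient, so a shrinkage rule would need a coefficient-specific threshold.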

Considerable attention has been given, in joint work with I.M. Johnstone, to an empirical Bayes approach for choosing the threshold, both in wavelet thresholding and more widely. The method has been shown to have excellent theoretical and practical properties, and will provide a basis for a great deal of further development. The theoretical work establishes the adaptivity of the method for a wider range of function behaviour than any previous method, and this is supported by detailed practical investigation. So far, three very substantial reports [30, 31, 32] have been written on this method. One of these includes a publicly available software implementation.

Dr Oh helped develop an interesting method for combining wavelet shrinkage with polynomials to improve the performance of wavelet shrinkage near the boundaries. Essentially, a low-order polynomial, still orthogonal to the wavelet basis, is fitted and subtracted from the data. Periodic wavelet shrinkage can then better handle the function near the boundaries which is “nearer” periodic. This work, in [21], also demonstrates that the combined estimator retains its optimal “near-adaptivity” and also works well in practice. Another piece of work carried out jointly with Dr Oh has been the use of wavelet packet methods to fit parametric models of time-frequency dependence, working particularly on data on signals given out by bats. This work is still in progress.

We have applied some of the above techniques to problems arising from real data. For example, in [10] we applied Haar wavelet shrinkage to data collected in an itch response experiment devised by colleagues at Unilever Research Ltd. Here the human sensation of itch was recorded in the presence of noise: Haar wavelets were used to quantify the degree of itch.

Time series analysis

A great deal of progress was made on the analysis and modelling of time series using wavelets. At the start of the grant, some developments in the understanding and theory of representation of locally stationary (LS) processes using non-decimated wavelets were proposed. This work culminated during the period of the current grant in [7] which introduced LS wavelet (LSW) processes, the evolutionary wavelet spectrum, localized wavelet autocovariances and statistical techniques to obtain reasonable estimators. Our work was set in context in a more general overview of wavelets in time series analysis in [8]. Nason’s PhD student Piotr Fryzlewicz, with von Sachs (co-author of [7]) and one of his students, further developed the time series strand of this work to produce a fascinating technique for forecasting LS series (submitted for publication). In separate, but related work, Li and Oh [22] show that the wavelet spectrum can characterize the second-order statistical properties of stationary and long-memory non-stationary series.
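As a rough illustration of the raw ingredient behind a time-scale spectrum, the sketch below computes squared non-decimated Haar wavelet coefficients, often called a raw wavelet periodogram. It is not the estimator of [7] and not the ewspec software: it assumes Haar wavelets and periodic boundary handling, and it omits the smoothing and bias correction that a usable spectral estimate would need. The function name is hypothetical.

```python
import numpy as np

def raw_wavelet_periodogram(x, n_levels):
    """Squared non-decimated Haar coefficients.

    Row j-1 holds scale j (j = 1 is the finest); column t is time.
    Boundaries are treated periodically.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty((n_levels, x.size))
    for j in range(1, n_levels + 1):
        half = 2 ** (j - 1)
        # Unit-norm Haar filter at scale j: +1 on the first half, -1 on the second
        filt = np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(2.0 ** j)
        coef = np.zeros(x.size)
        for k, c in enumerate(filt):
            coef += c * np.roll(x, -k)  # np.roll(x, -k)[t] == x[(t + k) % n]
        out[j - 1] = coef ** 2
    return out

# Illustrative series (assumed): white noise whose variance jumps halfway.
rng = np.random.default_rng(0)
scale = np.concatenate([np.ones(512), 3.0 * np.ones(512)])
series = scale * rng.normal(size=1024)
periodogram = raw_wavelet_periodogram(series, n_levels=4)
```

Because the transform is not decimated, every scale has a value at every time point, so a change in variance shows up as a change in the level of the periodogram along the time axis.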

Overall this body of work has created a new model for LS time series that enables estimation of a well-defined time-scale spectrum and also a new method for forecasting. Currently, Gaussian but non-stationary LSW processes are being investigated as an alternative model for financial time series, in contrast to certain stationary but non-Gaussian techniques already popular.

The other area that we have investigated is the modelling of transfer function relationships between mainly bivariate time series. The key idea is to relate a time series of predictive interest, Y, to the non-decimated wavelet packet transform of an explanatory series, X. The transform needs to be non-decimated so that the number of observations in Y and the wavelet packets of X are the same. Then a wide variety of classical statistical techniques can be used to model Y in terms of X and also provide predictions of Y when future values of X, but not Y, are known. The methodological and computational aspects of this work are detailed in [3]. Collaboration with Andrew Sawczenko, Institute of Child Health, demonstrated the utility of our methodology for relating sleep state data to heart rate in neonates [6]. In her PhD thesis Katherine Hunt further developed the methodology to mitigate the effects of high correlations between packets by using principal components methods: these ideas were applied to wind speed modelling and prediction problems [4]. The ideas from this body of work were applied to electromyographical data supplied by Unilever, who were interested in the effect of mass on toothbrush manipulation [28].
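The idea of regressing Y on a non-decimated transform of X can be sketched as below. This is a simplified stand-in for the method of [3]: it uses non-decimated Haar wavelet coefficients rather than wavelet packets, periodic boundary handling, and ordinary least squares as the classical technique. All function names are hypothetical.

```python
import numpy as np

def nondecimated_haar(x, n_levels):
    """Non-decimated Haar coefficients, one column per scale.

    The result has the same number of rows as x has observations,
    which is what allows it to be used as a regression design.
    """
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(1, n_levels + 1):
        half = 2 ** (j - 1)
        filt = np.concatenate([np.ones(half), -np.ones(half)]) / np.sqrt(2.0 ** j)
        coef = np.zeros(x.size)
        for k, c in enumerate(filt):
            coef += c * np.roll(x, -k)  # periodic boundary
        cols.append(coef)
    return np.column_stack(cols)

def fit_transfer_model(x, y, n_levels):
    """Least-squares fit of y on an intercept plus the coefficients of x."""
    features = nondecimated_haar(x, n_levels)
    design = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta

def predict_transfer_model(beta, x_new):
    """Predict y from new values of x using fitted coefficients beta."""
    features = nondecimated_haar(x_new, len(beta) - 1)
    design = np.column_stack([np.ones(len(features)), features])
    return design @ beta

# Illustrative data (assumed): y depends on the scale-2 coefficients of x.
rng = np.random.default_rng(0)
x = rng.normal(size=512)
y = 1.0 + 2.0 * nondecimated_haar(x, 3)[:, 1] + 0.1 * rng.normal(size=512)
beta = fit_transfer_model(x, y, n_levels=3)
fitted = predict_transfer_model(beta, x)
```

Coefficients at neighbouring scales are correlated, which is one reason a dimension reduction step such as principal components can help before fitting.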

Spatial data

In joint work with graduate student Idris Eckley we have extended LSW processes into 2D for the modelling and analysis of lattice processes (LS2W). Here analogues of the 1D quantities: directional multiscale wavelet spectrum and localized spatial autocovariance were created and associated algorithms and software developed. Some theoretical results were also non-trivially extended. The LS2W work fed into interactions with Unilever Research for the analysis of texture data. Indeed, Eckley’s software was adopted by Unilever’s hair products’ research division for the analysis of hair images. Our methodology enables the quantification and classification of different kinds of texture using time-scale wavelet quantities. This work forms the basis of Eckley’s thesis and a paper is in preparation. The image left shows a montage of 4 LS spatial fields constructed using diagonal Haar wavelets at 4 different scales ranging from finest to coarsest anti-clockwise from top-left. One can see the different kinds of textures that could be suitable for modelling fabrics (for example, another area that Unilever is interested in). Other wavelets & directions give rise to other effects.

PhD Theses

The following students worked on their theses during the period of the grant. Although not explicitly mentioned as being supported by the grant they are, of course, advised by the investigators who are. As such they are indicative of the research quality and teamwork of the wavelets group as a whole.

Name / Thesis Title / Year / Destination
Herrick (GPN) / Wavelet methods for curve and surface estimation / 2000 / Institute of Child Health, Bristol
Subbarao (BWS) / Wavelets and adaptive filters / 2001 / Postdoc, Rainer Dahlhaus, Heidelberg
Eckley (GPN) / Wavelet methods for time series and spatial data / 2001 / Shell Global Solutions
Botchkina (BWS) / Wavelets for non-parametric regression and a test of significance / 2002 / Oxford Glycosciences, UK
Ambler (BWS) / Dominated coupling from the past and some extensions of the area-interaction process / 2002 / Postdoc, Peter Green, University of Bristol
Hunt (GPN) / Time-scale transfer function models for non-stationary time series (submitted) / 2002 / Analyst for online insurer eSure.com

Project Plan Review

There were no major changes to the original plan. Naturally, the priorities between areas evolved over the course of the project. Two areas (survival/ratio and information engineering) perhaps received less emphasis than originally proposed. However, much more was achieved in all of the other areas, resulting in significant publications and collaborations, which are spurring further research initiatives.

Research Impact and Benefits to Society

At this point in time it is not easy to assess the academic impact of the research. Some of the key results of the research have only been published during 2002 and it takes time for the impact to be felt.

Having mentioned this, we do have other evidence of the significant impact of this research grant. The investigators have organised or co-organised sessions at major international conferences (e.g. ISI Helsinki 1999) including a prestigious Royal Society Discussion Meeting on Wavelets. The meeting resulted in a special issue of Philosophical Transactions of the Royal Society, Series A (also published as a Wiley book) in 1999. The special issue has already attracted more than 50 citations in a variety of fields.

Our proposal named Unilever and Shell Research as beneficiaries. Our close collaboration with Unilever has been particularly fruitful. As mentioned above our wavelet texture analysis software has been incorporated into scientific software used by the hair products division; projects were also undertaken in multivariate time series (analysis of toothbrush motion data) and itch response. Shell Global Solutions now employ Idris Eckley and we are currently planning a future CASE studentship with Shell.

As for benefits to society, some of our work is directly related to medical problems. For example, we discovered new ways to analyse neonate heart rate data and also developed new modelling strategies that enable (expensive) sleep state quantification to be obtained from (cheap) heart rate monitoring, in collaboration with Andrew Sawczenko of the Institute of Child Health in Bristol. Similar methodology was also involved in the prediction of wind speeds at new wind farm sites using data from existing Met Office stations. This work was inspired by consultancies with M&N Wind Power, a successful SME in renewable energy.

Explanation of expenditure

There was no significant variance in the original spending plans.

Further Research or Dissemination Activities

A great deal of further research has sprung from this project. This includes indirect research, for example, associated with the Royal Society Discussion Meeting and other meetings. More directly, Nason was awarded an Advanced Research Fellowship by the EPSRC (£231k) and a research grant on “Vector-valued locally stationary wavelet processes” for £117k by QinetiQ.

Looking ahead, Nason is part of a General Dynamics-led consortium which was recently announced as the preferred bidder for the MOD Defence Technology Centre on “Data and Information Fusion”, which includes work on wavelets and statistics for defence technologies in collaboration with the Centre of Communications Research at Bristol. Further into the future, Nason is proposing to collaborate with Prof. Leo Brady in using multiscale methods to understand macromolecular structure determination and ligand docking: this is the subject of an MRC Discipline Bridging Initiative proposal. Both current investigators have submitted a joint proposal to EPSRC for investigating the many new and exciting areas that have arisen from the current project.

Dissemination has been achieved largely in the manner described in the original proposal: through journal publication, presentations at seminars and conferences, development and distribution of free user-friendly software and genuine collaboration with those outside of mathematics and statistics.

Submitted publications & other reports

24. GPN (2000) Application of the characteristic function of Student’s t distribution to Haar wavelet shrinkage with heavy-tailed noise and projection index design. Technical Report 00:07.

25. Eckley, I.A. and GPN (2001) The inner product matrix of discrete autocorrelation wavelets: efficient computation and application. Technical Report 01:09.