EEMD: ensemble empirical mode decomposition of 72-hour continuous glucose monitoring (CGM) data
(1) Add a white noise series to the raw CGM series to obtain a new, noise-added data series.
(2) Decompose the noise-added series into intrinsic mode functions (IMFs) by empirical mode decomposition (EMD). The upper and lower envelopes of the series are obtained by connecting the local maxima and the local minima, respectively, with cubic splines. A sifting process then extracts each IMF by repeatedly subtracting the mean of the two envelopes from the series; sifting continues until the resulting component satisfies two conditions:
Over the entire series, the number of extrema (local maxima plus local minima) and the number of zero crossings must be equal or differ by at most one;
At any data point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.
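A component that satisfies both conditions is called an IMF. Each extracted IMF is subtracted from the data, and sifting is repeated on the remainder to obtain the next IMF. At the end of the procedure, the original data can be reconstructed as the sum of the n IMFs $c_i(t)$ and the final residue $r_n(t)$, i.e., $x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$.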
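The sifting procedure of step (2) can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: all function names are hypothetical, natural cubic splines are used for the envelopes, the two endpoints of the series are added as knots of both envelopes, and the "mean envelope is zero" condition is approximated by a hypothetical energy-ratio threshold `tol` together with an iteration cap `max_iter`. As in the text, each IMF is subtracted before the next one is sifted.

```python
import bisect
import math
from typing import List, Tuple


def natural_cubic_spline(xs: List[float], ys: List[float], xq: List[float]) -> List[float]:
    """Evaluate the natural cubic spline through the knots (xs, ys) at each point in xq."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives; natural boundary: m[0] = m[-1] = 0
    k = n - 2      # number of interior knots
    if k > 0:
        # Tridiagonal system for the interior second derivatives (Thomas algorithm).
        a = [h[i] for i in range(k)]
        b = [2.0 * (h[i] + h[i + 1]) for i in range(k)]
        c = [h[i + 1] for i in range(k)]
        d = [6.0 * ((ys[i + 2] - ys[i + 1]) / h[i + 1] - (ys[i + 1] - ys[i]) / h[i])
             for i in range(k)]
        for i in range(1, k):
            w = a[i] / b[i - 1]
            b[i] -= w * c[i - 1]
            d[i] -= w * d[i - 1]
        sol = [0.0] * k
        sol[-1] = d[-1] / b[-1]
        for i in range(k - 2, -1, -1):
            sol[i] = (d[i] - c[i] * sol[i + 1]) / b[i]
        m[1:-1] = sol
    out = []
    for x in xq:
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)  # segment containing x
        hj, u, v = h[j], xs[j + 1] - x, x - xs[j]
        out.append(m[j] * u ** 3 / (6 * hj) + m[j + 1] * v ** 3 / (6 * hj)
                   + (ys[j] / hj - m[j] * hj / 6) * u
                   + (ys[j + 1] / hj - m[j + 1] * hj / 6) * v)
    return out


def extrema(x: List[float]) -> Tuple[List[int], List[int]]:
    """Indices of the interior local maxima and local minima of x."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima


def zero_crossings(x: List[float]) -> int:
    """Number of sign changes between consecutive samples."""
    return sum(1 for i in range(len(x) - 1) if x[i] * x[i + 1] < 0)


def sift(x: List[float], max_iter: int = 50, tol: float = 0.05) -> List[float]:
    """Extract one IMF by repeatedly subtracting the mean of the upper and lower envelopes."""
    comp, grid, last = list(x), list(range(len(x))), len(x) - 1
    for _ in range(max_iter):
        maxima, minima = extrema(comp)
        # Assumption: the two endpoints are used as knots of both envelopes.
        kmax, kmin = [0] + maxima + [last], [0] + minima + [last]
        upper = natural_cubic_spline(kmax, [comp[i] for i in kmax], grid)
        lower = natural_cubic_spline(kmin, [comp[i] for i in kmin], grid)
        mean = [(u + l) / 2 for u, l in zip(upper, lower)]
        comp = [c - mu for c, mu in zip(comp, mean)]
        maxima, minima = extrema(comp)
        # Condition 1: extrema and zero crossings equal or differ by at most one.
        cond1 = abs(len(maxima) + len(minima) - zero_crossings(comp)) <= 1
        # Condition 2 (approximation): the envelope mean just removed is small
        # relative to the energy of the component.
        ratio = sum(mu * mu for mu in mean) / (sum(c * c for c in comp) or 1.0)
        if cond1 and ratio < tol:
            break
    return comp


def emd(x: List[float], max_imfs: int = 8) -> Tuple[List[List[float]], List[float]]:
    """Split x into IMFs and a residue such that x = sum(IMFs) + residue."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = extrema(residue)
        if len(maxima) + len(minima) < 3:  # too few extrema left: keep it as the residue
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]  # sift the remainder next
    return imfs, residue


signal = [math.sin(2 * math.pi * t / 12) + 0.5 * math.sin(2 * math.pi * t / 72)
          for t in range(288)]
imfs, residue = emd(signal)
print(len(imfs), "IMFs extracted")
```

Because each IMF is subtracted before the next sift, the IMFs and the residue add back to the input up to floating-point rounding, which is the reconstruction property stated above.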
(3) Repeat steps 1 and 2 using different white noise series each time.
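(4) Take the (ensemble) mean of the corresponding IMFs across all decompositions as the final result. Because the added white-noise series are independent, their contribution to the ensemble mean shrinks as the ensemble number grows (its standard deviation decreases in proportion to the inverse square root of the ensemble number), so the effect of the added noise can be reduced to a negligibly small level by increasing the ensemble number. The ensemble number was 100 in this study.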
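Steps (1), (3) and (4) amount to averaging the decompositions of many noise-added copies of the data. The sketch below is a hedged illustration: it assumes a hypothetical `decompose` callable that returns the same number of components on every trial (in practice, the EMD sifting of step (2)), and the noise amplitude `noise_std` (a fraction of the data's standard deviation) and `seed` are illustrative choices, not values reported in the study. A trivial one-component stand-in decomposition is used to show how the averaged noise shrinks.

```python
import math
import random
import statistics
from typing import Callable, List

Decompose = Callable[[List[float]], List[List[float]]]


def eemd(x: List[float], decompose: Decompose, n_ensemble: int = 100,
         noise_std: float = 0.2, seed: int = 0) -> List[List[float]]:
    """Ensemble mean of the components of n_ensemble noise-added copies of x.

    noise_std is a fraction of the standard deviation of x (illustrative default).
    """
    if n_ensemble < 1:
        raise ValueError("n_ensemble must be at least 1")
    rng = random.Random(seed)
    sigma = noise_std * statistics.pstdev(x)
    totals: List[List[float]] = []
    for trial in range(n_ensemble):
        # Steps (1) and (3): a fresh white-noise series for every trial.
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        comps = decompose(noisy)  # step (2), e.g. EMD sifting
        if trial == 0:
            totals = [[0.0] * len(x) for _ in comps]
        elif len(comps) != len(totals):
            raise ValueError("decompose must return the same number of components every trial")
        for acc, comp in zip(totals, comps):
            for i, v in enumerate(comp):
                acc[i] += v
    # Step (4): the ensemble mean of corresponding components.
    return [[v / n_ensemble for v in acc] for acc in totals]


def identity(s: List[float]) -> List[List[float]]:
    """Hypothetical stand-in decomposition: the whole series as a single component."""
    return [s]


x = [100.0 + 20.0 * math.sin(2 * math.pi * t / 288) for t in range(288)]
averaged = eemd(x, identity, n_ensemble=100)[0]
residual_std = statistics.pstdev([a - b for a, b in zip(averaged, x)])
expected_std = 0.2 * statistics.pstdev(x) / math.sqrt(100)  # sigma / sqrt(ensemble number)
print(residual_std, expected_std)  # the two values should be close
```

With 100 trials, the noise that survives averaging has roughly one tenth of the added noise's standard deviation, consistent with the inverse-square-root reduction described in step (4).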
EEMD alleviates the “mode mixing” problem of EMD, which occurs when oscillations of dramatically disparate scales appear within one IMF.
The average period of each IMF was calculated as follows:
1. First, we applied the Hilbert transform to the extracted IMFs (each IMF is denoted as S(t)) to calculate their instantaneous frequencies. S(t) can be expressed as
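$$S(t) = A(t)\cos\theta(t) \tag{1}$$
where A(t) is the instantaneous amplitude and θ(t) is the instantaneous phase.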
The Hilbert transform of S(t) is defined as
$$H[S(t)] = \frac{1}{\pi}\,P\!\int_{-\infty}^{\infty} \frac{S(\tau)}{t-\tau}\,d\tau \tag{2}$$
where P denotes the Cauchy principal value.
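2. The instantaneous amplitude A(t) and instantaneous phase θ(t) can then be calculated from S(t) and its Hilbert transform H[S(t)]:
$$A(t) = \sqrt{S(t)^2 + H[S(t)]^2} \tag{3}$$
$$\theta(t) = \arctan\frac{H[S(t)]}{S(t)} \tag{4}$$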
3. Thus, the instantaneous frequency can be defined as
$$\omega(t) = \frac{d\theta(t)}{dt} \tag{5}$$
4. Therefore, the average cycle time is
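$$\bar{T} = \frac{2\pi a}{\sum_{i=1}^{a/\Delta t} \omega(t_i)\,\Delta t} \tag{6}$$
Here, a is the total time period and Δt is the sampling interval of the input data. In this study, a = 72 hours and Δt = 5 min, so each series contains a/Δt = 864 data points.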
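The four steps can be sketched in pure Python as a hedged illustration. The helper names are hypothetical; the analytic signal S(t) + iH[S(t)] is formed with a direct O(N²) DFT by zeroing the negative frequencies and doubling the positive ones (an FFT would be used in practice, especially for 864-point series); and ω is estimated from the phase increment between consecutive samples. Because that finite-difference estimate yields one value fewer than the number of samples, the code takes a as the time span covered by the ω values.

```python
import cmath
import math
from typing import List


def analytic_signal(s: List[float]) -> List[complex]:
    """S(t) + iH[S(t)] via a direct DFT: zero the negative frequencies, double the positive ones."""
    n = len(s)
    spec = [sum(s[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]
    weights = [0.0] * n
    weights[0] = 1.0                      # keep the DC term
    half = (n + 1) // 2                   # positive frequencies are 1 .. half - 1
    for k in range(1, half):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term for even n
    return [sum(spec[k] * weights[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n) if weights[k]) / n
            for t in range(n)]


def average_period(imf: List[float], dt: float) -> float:
    """Average cycle time 2*pi*a / sum(omega_i * dt), with omega_i = d(theta)/dt."""
    z = analytic_signal(imf)
    # Phase increment between consecutive samples, taken in (-pi, pi], so no unwrapping is needed.
    omega = [cmath.phase(z[i + 1] * z[i].conjugate()) / dt for i in range(len(z) - 1)]
    a = len(omega) * dt  # time span covered by the omega estimates
    return 2 * math.pi * a / sum(w * dt for w in omega)


# Example: a 12-sample oscillation sampled every 5 min over 24 h (288 points).
imf = [math.cos(2 * math.pi * t / 12) for t in range(288)]
print(average_period(imf, dt=5.0))  # approximately 60 (minutes)
```

For a pure oscillation the phase advances by the same amount at every step, so the estimated average period equals the true period (here 12 samples × 5 min = 60 min).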
The glycemic variability of each IMF was quantified as the standard deviation of the IMF series.
Sample entropy
Let $x = \{x_1, x_2, \ldots, x_N\}$ represent a time series of length N. The sample entropy (SampEn) algorithm can be summarized as follows.
(1) Construct template vectors of dimension m: $X_i^m = \{x_i, x_{i+1}, \ldots, x_{i+m-1}\}$, $1 \le i \le N - m$.
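(2) A match occurs when the distance between two template vectors does not exceed a predefined tolerance r. The distance between two vectors is calculated using the infinity (Chebyshev) norm: $d_{ij}^m = \lVert X_i^m - X_j^m \rVert_\infty = \max_{0 \le k \le m-1} |x_{i+k} - x_{j+k}|$.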
(3) A pair $(X_i^m, X_j^m)$ with $i \ne j$ is defined as an m-dimensional matched vector pair if $d_{ij}^m \le r$. Let $n_m$ represent the total number of m-dimensional matched vector pairs.
(4) Steps 1-3 are repeated for dimension m + 1 to obtain $n_{m+1}$, the total number of (m + 1)-dimensional matched vector pairs.
(5) SampEn is defined as the negative natural logarithm of the ratio of $n_{m+1}$ to $n_m$; that is, $\mathrm{SampEn}(x, m, r) = -\ln\left(n_{m+1}/n_m\right)$.
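Steps (1)-(5) can be sketched directly in Python. This is a minimal O(N²) illustration with hypothetical function names; r is treated as an absolute tolerance (if r is specified as a fraction of the series' standard deviation, scale it first), and the same N − m templates are used for dimensions m and m + 1 so that the two match counts are comparable. The values of m and r used in the study are not stated in this section.

```python
import math
from typing import List


def count_matches(x: List[float], m: int, r: float, n_templates: int) -> int:
    """Pairs i < j of m-dimensional templates whose Chebyshev distance is <= r (m >= 1)."""
    count = 0
    for i in range(n_templates):
        for j in range(i + 1, n_templates):
            if max(abs(x[i + k] - x[j + k]) for k in range(m)) <= r:
                count += 1
    return count


def sample_entropy(x: List[float], m: int, r: float) -> float:
    """SampEn(x, m, r) = -ln(n_{m+1} / n_m); returns inf when there are no (m+1)-matches."""
    n_templates = len(x) - m  # the same N - m templates for dimensions m and m + 1
    n_m = count_matches(x, m, r, n_templates)
    n_m1 = count_matches(x, m + 1, r, n_templates)
    if n_m1 == 0:
        return math.inf  # undefined: no (m+1)-dimensional matched pairs
    return -math.log(n_m1 / n_m)


print(sample_entropy([1, 2, 1, 3, 1, 2], m=1, r=0.5))  # ln 3, about 1.0986
```

In this example the five 1-dimensional templates contain three matched pairs (the three 1s), while the 2-dimensional templates contain only one ([1, 2] at the start and end), so SampEn = −ln(1/3) = ln 3.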
RCMSE (refined composite multiscale entropy) algorithm: quantifying the complexity of the original 72-hour glucose time series
To obtain the coarse-grained time series at a scale factor τ, the original time series is divided into non-overlapping windows of length τ and the data points inside each window are averaged.
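(1) For a scale factor τ, τ coarse-grained time series are derived from the original time series, one for each starting offset k. The k-th coarse-grained time series $y_k^{(\tau)} = \{y_{k,1}^{(\tau)}, y_{k,2}^{(\tau)}, \ldots\}$ is defined as follows:
$$y_{k,j}^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+k}^{j\tau+k-1} x_i, \qquad 1 \le j \le \left\lfloor \frac{N-k+1}{\tau} \right\rfloor, \quad 1 \le k \le \tau$$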
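A short sketch of the coarse-graining in step (1), using the 1-based offset k of the definition; the function name is hypothetical. Windows that would run past the end of the series are dropped, so the k-th series has ⌊(N − k + 1)/τ⌋ points.

```python
from typing import List


def coarse_grain(x: List[float], tau: int, k: int) -> List[float]:
    """k-th (1-based) coarse-grained series: means of non-overlapping windows of length tau."""
    if not 1 <= k <= tau:
        raise ValueError("offset k must satisfy 1 <= k <= tau")
    start = k - 1                        # convert the 1-based offset to a 0-based index
    n_windows = (len(x) - start) // tau  # floor((N - k + 1) / tau)
    return [sum(x[start + j * tau: start + (j + 1) * tau]) / tau for j in range(n_windows)]


x = [1, 2, 3, 4, 5, 6, 7]
print(coarse_grain(x, tau=2, k=1))  # [1.5, 3.5, 5.5]
print(coarse_grain(x, tau=2, k=2))  # [2.5, 4.5, 6.5]
```

At τ = 2 the two offsets shift the window boundaries by one sample, so every pair of adjacent points is averaged in exactly one of the coarse-grained series.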
(2) At each scale factor τ, the numbers of matched vector pairs, $n_{k,\tau}^{m+1}$ and $n_{k,\tau}^{m}$, are calculated for all τ coarse-grained series.
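(3) Let $\bar{n}_{\tau}^{m+1}$ and $\bar{n}_{\tau}^{m}$ represent the mean values of $n_{k,\tau}^{m+1}$ and $n_{k,\tau}^{m}$ over $1 \le k \le \tau$. The RCMSE value at a scale factor τ is defined as the negative natural logarithm of the ratio of $\bar{n}_{\tau}^{m+1}$ to $\bar{n}_{\tau}^{m}$:
$$\mathrm{RCMSE}(x, \tau, m, r) = -\ln\left(\frac{\bar{n}_{\tau}^{m+1}}{\bar{n}_{\tau}^{m}}\right), \quad \text{where} \quad \bar{n}_{\tau}^{m} = \frac{1}{\tau}\sum_{k=1}^{\tau} n_{k,\tau}^{m} \ \text{ and } \ \bar{n}_{\tau}^{m+1} = \frac{1}{\tau}\sum_{k=1}^{\tau} n_{k,\tau}^{m+1}$$
Because the factor 1/τ cancels, the RCMSE equation can be simplified as
$$\mathrm{RCMSE}(x, \tau, m, r) = -\ln\left(\frac{\sum_{k=1}^{\tau} n_{k,\tau}^{m+1}}{\sum_{k=1}^{\tau} n_{k,\tau}^{m}}\right)$$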
(4) The blood glucose homeostasis system is regulated by multiple subsystems that interact nonlinearly with one another over multiple time scales, so a complexity analysis across scales is more appropriate for analyzing CGM data. The complexity index (CI) is defined here as the sum of the RCMSE values over a predefined range of scales:
$$\mathrm{CI} = \sum_{\tau=1}^{\tau_{\max}} \mathrm{RCMSE}(x, \tau, m, r)$$
Mathematically, the complexity index quantifies the degree of irregularity of the CGM time series across those scales.
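Steps (1)-(4) can be combined into a hedged Python sketch. Function names are hypothetical; the same tolerance r is used at every scale, and the `max_scale` argument stands in for the study's predefined range of scales, which is not given here. Summing the match counts over all τ offsets before taking the logarithm is what distinguishes RCMSE: the value at a scale stays defined as long as any one of the coarse-grained series contains an (m + 1)-dimensional match.

```python
import math
from typing import List


def coarse_grain(x: List[float], tau: int, k: int) -> List[float]:
    """k-th (1-based) coarse-grained series: means of non-overlapping windows of length tau."""
    start = k - 1
    return [sum(x[start + j * tau: start + (j + 1) * tau]) / tau
            for j in range((len(x) - start) // tau)]


def count_matches(y: List[float], m: int, r: float, n_templates: int) -> int:
    """Pairs i < j of m-dimensional templates whose Chebyshev distance is <= r (m >= 1)."""
    return sum(1 for i in range(n_templates) for j in range(i + 1, n_templates)
               if max(abs(y[i + q] - y[j + q]) for q in range(m)) <= r)


def rcmse(x: List[float], tau: int, m: int, r: float) -> float:
    """-ln(sum_k n_{k,tau}^{m+1} / sum_k n_{k,tau}^m) over the tau coarse-grained series."""
    total_m = total_m1 = 0
    for k in range(1, tau + 1):
        y = coarse_grain(x, tau, k)
        n_templates = len(y) - m  # same templates for dimensions m and m + 1
        total_m += count_matches(y, m, r, n_templates)
        total_m1 += count_matches(y, m + 1, r, n_templates)
    return math.inf if total_m1 == 0 else -math.log(total_m1 / total_m)


def complexity_index(x: List[float], m: int, r: float, max_scale: int) -> float:
    """Sum of RCMSE values over scales 1..max_scale, with r fixed across scales."""
    return sum(rcmse(x, tau, m, r) for tau in range(1, max_scale + 1))


print(rcmse([1, 2, 1, 3, 1, 2], tau=1, m=1, r=0.5))  # equals SampEn at scale 1: ln 3
print(complexity_index([1, 2, 1, 2, 1, 2, 1, 2], m=1, r=0.5, max_scale=2))  # 0 for this periodic series
```

At τ = 1 there is a single coarse-grained series (the original data), so RCMSE reduces to sample entropy; a perfectly periodic series gives zero entropy at every scale and hence a complexity index of zero.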
All results were computed in MATLAB R2012b.