A test for the absence of aliasing or local white noise in locally stationary wavelet time series

.


Introduction
Typically a data analyst is presented with a time series sampled at a fixed rate.However, the series might have been sampled at a higher rate, or the analyst could request future samples at a higher rate.In practice, it is important to question whether time series are sampled often enough to successfully capture their second-order structure.Improper sampling can lead to aliasing, which this article proposes to detect and locate via a new test.
For a time series sampled at intervals of length , the range of angular frequencies in the spectrum that can be observed undistorted is [0, π/ ), where π/ is the Nyquist frequency (Chatfield, 2003, p. 109).If the highest frequencies in a series exceed the Nyquist frequency, then aliasing occurs and distorts the spectrum estimable by any method.As the spectrum and autocovariance are Fourier duals, distortion of the spectrum implies distortion of the autocovariance, and techniques that rely on either can be affected.Consequently, aliasing can have a critical impact.
Without knowledge of the data-generation process, the analyst will, a priori, be unaware of whether aliasing has occurred, and could misguidedly analyse the series assuming it is free from distortion.For example, if {X t } is a real-valued stationary process with spectrum f X (ω) for 2 I. A. Eckley AND G. P. Nason ω ∈ [0,π), and Y t = X 2t is observed, then the spectrum of Y t is f Y (ω) = f X (ω) + f X (π − ω) for ω ∈ [0, π/2).Generally, f X cannot be identified from f Y , so estimation of f X is impossible from an estimate of f Y .Thus, there is a need for principled approaches to testing for the presence of aliasing.Hannan (1960), Priestley (1983), Hamilton (1994), Bloomfield (2000), Brillinger (2001) and Chatfield (2003) all describe aliasing, but provide no advice on how to detect or locate it.Instead, they suggest steps that can be taken to guard against it.One possibility is to apply a low-pass anti-aliasing filter prior to sampling, to ensure that the highest frequency in the filtered series is below the Nyquist rate.Although anti-alias filtering can be useful for analogue or very high sample-rate signal acquisitions, it is not useful when data are acquired at slow rates, such as in economics.Moreover, even if anti-alias filtering is used, valuable high-frequency information could be lost.Another possibility is that one might know, a priori, the highest frequency contained within a time series, and can choose the sample rate high enough to prevent aliasing.Hinich & Wolinsky (1988) and Hinich & Messer (1995) introduced a hypothesis test for aliasing in stationary time series based on the bispectrum, a third-order quantity.Our wavelet test is based on simpler second-order quantities, and is designed for nonstationary time series.Computing the bispectrum typically takes O(T 2 ) operations.Our test is faster, requiring only O(T log T ) operations, which can be important for long time series.
Previous work has considered alias detection at a fixed location for a nonstationary time series, where the true signal or spectrum is known beforehand.Wunsch & Gunn (2003) used ice core time series and induced aliasing to demonstrate how it can lead to misleading scientific conclusions.Our test does not require knowledge of the underlying true spectrum or the higher-rate time series.
In a single realization, a nonstationary series can sometimes be aliased and sometimes not, depending on its spectral content relative to the Nyquist frequency at a given point.Hence, with a nonstationary series one may ask not only whether the series is aliased, but also where.Our test is designed to help answer both questions for locally stationary wavelet time series.That wavelets bring something genuinely new to the aliasing problem can be seen by replicating our method using Fourier-based quantities.Unlike the wavelet equations, which have a solution as given below, the equivalent Fourier-based equations are underdetermined, which is precisely the usual aliasing problem.
If one has prior knowledge that the time series has been properly sampled and there is no possibility of aliasing, then our test becomes a test for local white noise.Although global white noise tests are popular and useful, at the time of writing we are unaware of any local method tailored to locally stationary series.Local tests can be used for similar purposes to global ones, such as assisting with model selection, understanding time-varying forecasting performance or, as suggested by a referee, detecting measurement error in some circumstances.

Review of locally stationary wavelet processes
Locally stationary wavelet processes are time series models, constructed from wavelets, that change their statistical properties slowly over time.They are particularly useful for their ability to model time series operating at dominant scales in areas such as finance (Fryzlewicz, 2005), economics (Winkelmann, 2016), ocean engineering (Killick et al., 2013), structural engineering (Spanos & Kougioumtzoglou, 2012), energy (Nowotarski et al., 2013;de Menezes et al., 2016) and business (Michis, 2009).Dahlhaus (2012) provides a comprehensive review of locally stationary time series.We now briefly review essential definitions from Nason et al. (2000).Definition 1 (Discrete wavelets).Let {h k } and {g k } be the low-and high-pass quadrature mirror filters underlying the Daubechies (1992) compactly supported orthogonal continuoustime wavelets.The discrete wavelets ψ j = (ψ j,0 , ψ j,1 , . . ., ψ j, N j −1 ) are vectors of length N j for scales j ∈ N obtained using the formulae (1) where N j = (2 j − 1)(N h − 1) + 1, with N h being the number of non-zero elements of {h k }, and δ 0, k is the Kronecker delta.The number of vanishing moments of the associated continuous-time Daubechies compactly supported wavelet is N = N h /2, where x m ψ(x) dx = 0 for m ∈ N such that 0 m < N .Such wavelets are commonly referred to as Daubechies DN wavelets.
Definition 2. A locally stationary wavelet process is a sequence of doubly indexed stochastic processes {X t, T } t=0,...,T −1 (T = 2 J , J ∈ N) having the following representation in the meansquare sense: where {ξ j, k } j∈N, k∈Z is a collection of uncorrelated random variables with zero mean and unit variance, {ψ j, k } j∈N, k∈Z is a set of discrete wavelets, and {w j, k;T } j∈N, k∈Z is a set of amplitudes satisfying the following conditions.For each j ∈ N, there exists a Lipschitz-continuous function W j : (0, 1) → R such that: (i) ∞ j=1 |W j (z)| 2 < ∞ uniformly in z ∈ (0, 1); (ii) the Lipschitz constants, L j , are uniformly bounded in j and ∞ j=1 2 j L j < ∞; (iii) there exists {C j } j∈N such that for each T , sup k |w j, k;T − W j (k/T )| C j /T , where for each j the supremum is over k = 0, . . ., T − 1 and Spectral power for a locally stationary wavelet time series is quantified by the evolutionary wavelet spectrum, the time-scale analogue of the usual stationary spectrum, f (ω).
Hence, the {w j, k } are a collection of amplitudes such that w 2 j, k ≈ S j (k/T ).Evolution of the second-order properties of X t,T is controlled by smoothness constraints on S j (z) as a function of z via those imposed on W j (z) in Definition 2(i)-(iii).For brevity, we henceforth drop the second subscript, T , in X t,T .
The spectrum, S j (z), governs the contribution to variance in X t at different scales at time z.Informally, S j (z) corresponds to the process variance integrated over the approximate frequency band [2 −j π, 2 1−j π].For example, the approximate band for S 1 (z) is [π/2, π], that for S 2 (z) is [π/4, π/2], and so on.The precise frequency bands and their degree of overlap depend on the particular wavelet ψ j, k (t) used in (3).We next recall several key quantities associated with locally stationary wavelet theory.
Definition 4. The raw wavelet periodogram of Y t is defined to be I , m = d 2 , m for ∈ N and m ∈ Z, where {d , m } are the nondecimated wavelet coefficients of Y t given by d , m = t Y t ψ , m (t) for ∈ N and m ∈ Z.The autocorrelation wavelet is given by j (τ ) = k ψ j, k ψ j, k−τ for j ∈ N and τ ∈ Z, and the inner product operator of the autocorrelation wavelets is A j, = j , = τ j (τ ) (τ ) for j, ∈ N.

Locally stationary wavelet models under dyadic sampling
3•1.Dyadic subsampling of locally stationary wavelet processes In this article, aliasing is induced by starting with a locally stationary wavelet process, X t , and then forming Y t = X 2 r t (t ∈ Z; r = 1, . . ., J −1) by dyadic subsampling of X t .Since Y t is sampled at a slower rate than X t , high-frequency power in X t could reappear as aliased power in Y t .Our first result shows that Y t can be represented by the sum of a locally stationary wavelet process, constructed using the same wavelets as the original, and another process, which is asymptotically white noise.
Theorem 1.Let {X t } t∈Z be a locally stationary wavelet process with evolutionary wavelet spectrum , where L t is a locally stationary wavelet process with the same underlying wavelets as X t and having raw wavelet periodogram expectation given by the right-hand side of (4), and F t is a process with mean zero and autocovariance cov(F t , F t+τ ) = S 1 (2t/T )δ 0,τ + O(T −1 ).Hence, F t is asymptotically white noise with variance S 1 (z).Further, if S 1 (z) is constant for z ∈ (0, 1), then F t is stationary white noise with variance S 1 .
The proof is given in the Appendix.Theorem 1 can be extended to repeated dyadic sampling as follows.
Corollary 1.Let {X t } t∈Z be as in Theorem 1, and let Y t = X 2 r t (r ∈ N).Then, asymptotically, {Y t } admits the decomposition Y t = L t + F t , where L t is a locally stationary wavelet process with the same underlying wavelets as X t and having raw wavelet periodogram expectation as in the right-hand side of (4), and F t is a process with mean zero and autocovariance cov(F t , F t+τ ) = δ 0,τ r j=1 S j (2 r t/T )+O(T −1 ).Further, if S 1 (z), . . ., S r (z) are all constant functions of z ∈ (0, 1), then F t is stationary white noise with variance r j=1 S j .Nason et al. (2000) developed an estimator for the wavelet spectrum by exploiting the raw  where ∈ N, m ∈ Z and A is the inner-product operator from Definition 4. The result also holds for Shannon wavelets if the evolutionary wavelet spectrum, S j (z), has continuous first derivative for each j > 0.
Theorem 2 generalizes Proposition 4 of Nason et al. (2000), in which r = 0, and additionally establishes the result for Shannon wavelets.The Shannon wavelet can be thought of as the limiting case of Daubechies wavelets with an infinite number of vanishing moments.A consequence is that A j, = 2 j δ j, for Shannon wavelets; see Nason et al. (2000) for further details.
Aliasing results in power redistribution across scales.For example, with r = 1 and subsampling dyadically once, formula (4) for Y t becomes (5) Compare ( 5) with the usual formula for the asymptotic expectation of the raw wavelet periodogram of X t without subsampling, due to Nason et al. (2000): There are three key differences between ( 5) and ( 6).First, E(d 2 , m ) in ( 5) is contaminated by S 1 (2m/T ) at every analysis scale, i.e., for all 1.This contamination is the manifestation of aliasing in the locally stationary wavelet domain and could be used to detect aliasing, as described in §3•2.The second difference is that the mixing matrix on the right-hand side of ( 5) is A j−1, , not A j, .The third difference is that the dyadically subsampled periodogram exists on the grid 2m/T rather than on m/T .

3•2. White noise confounding and hypothesis specification
If a locally stationary wavelet process X t is white noise with variance σ 2 , then the unsampled process is such that E(d 2 , m ) = σ 2 for ∈ N and m ∈ Z; see the proof of Lemma B.3 in Fryzlewicz et al. (2003).The same quantity appears at every scale as in the first term of (4) or (5) when subsampling.Hence, the effects of white noise and aliasing are confounded in our set-up.
Such confounding is not a surprise, as white noise can be seen as the ultimate aliased signal.For example, suppose that X t is a stationary process with variance σ 2 < ∞ and autocovariance In other words, repeated subsampling of X t leads to white noise.Taking a broader view, in a practical situation, what appears to be white noise may be the result of repeatedly subsampling a time series.Similar confounding can be found in the test for aliasing for stationary series in Hinich & Messer (1995); rejection of the null hypothesis there can mean that the series is not random, not stationary, aliased, not mixing, or any combination of these.
However, even with confounding, we can still test the null hypothesis H 0 of there being no white noise component and no aliased component at z 0 ∈ (0, 1) against the hypothesis H A that a white noise or aliased component exists at z 0 .So although we cannot separate the two components, our hypothesis test can test locally whether they are both absent.
4. Aliasing/white noise test 4•1.The test procedure Let t 0 ∈ {1, . . ., T } and let z 0 = t 0 /T be the rescaled-time equivalent of t 0 .If we could observe the evolutionary wavelet spectrum {S (Y ) j (z 0 )} j∈N directly, it would be possible to test the ideal null hypothesis H (I) 0 that there exists j * ∈ N such that S j * (z 0 ) = 0, versus the alternative H (I) A : S j (z 0 ) > 0 for all j ∈ N. In view of §3•2, the null hypothesis means that {Y t } cannot have any additive white noise component; and, in view of Theorem 2, it would also mean that no aliasing has occurred from any underlying subsampled {X t } process.
However, the spectrum {S (Y ) j (z)} j∈N cannot be observed directly but can only be estimated from a realization {Y t } T t=1 on a finite set of scales j J † < J , where J † is chosen to avoid boundary effects.Hence, we can only gain information on S j (z 0 ) for j ∈ {1, . . ., J † }, and therefore only test the null hypothesis H 0 that there exists j * ∈ {1, . . ., J † } such that S * j (z 0 ) = 0, versus the alternative A could be true, as S j (z 0 ) might be zero for scales j > J † that cannot be discerned from a finite-length series.
We use Student's t-test or, in the case of significant violations of normality, a nonparametric test such as the Wilcoxon signed rank or the signmedian test.Student's t-test does not work well in the presence of autocorrelation, which is known to exist in the Ŝj, t 0 +r sequence as a function of r; see Jones (1975), for example.To improve the performance of the t-test, we employ the equivalent sample size method from Zwiers & von Storch (1995), which uses estimated autocorrelation to alter the effective sample size.The t-test is robust against violations of the normality assumption; see the Supplementary Material.
We use Holm's method (Holm, 1979) to control the overall size of our test over the multiple J † hypotheses.This is a reasonable choice, as no assumption is made about correlations between the Ŝj (z 0 ) at different scales, but as a result our test is somewhat conservative.Downloaded from https://academic.oup.com/biomet/advance-article-abstract/doi/10.1093/biomet/asy040/5106591 by University of Bristol Library user on 26 September 2018 4•2.Practical considerations Our test is robust with respect to mismatch of the synthesis wavelet in (3) and the analysis wavelet given in Definition 4, because a white noise component and aliasing caused by subsampling both cause a constant to be added to each level of the wavelet periodogram, regardless of the type of wavelet used to synthesize the process.Our test uses a wavelet method to detect whether a constant has been added to each level, and it works irrespective of the analysis wavelet.
All Daubechies compactly supported wavelet spectral estimates suffer from spectral leakage, where power from one scale can leak to adjacent scales.Such leakage makes it harder to determine whether power in a given scale is zero, and this influences the statistical size and power of our test.In practice, there is a trade-off between mitigating spectral leakage by using longer, smoother wavelets (Nason et al., 2000, § 4.1) and achieving good time localization by using shorter, rougher wavelets such as Haar wavelets.Therefore, we recommend using Daubechies D5, D6 or D7 mid-range wavelets, which achieve a good compromise.
Similarly, the size and power of our test will depend on the window width, b, which should be large enough to enable detection of modest departures from H 0 , but small enough to ensure that the estimates Ŝj, t 0 +r are representative of the behaviour of the spectrum at z 0 .An appropriate upper bound for b could be obtained by computing an estimator of the spectrum (Nason et al., 2000;Fryzlewicz & Nason, 2006) and then choosing b such that the window of spectral estimates on R b is approximately constant for each scale.For fixed T our test operates over a small interval (z 0 − b/T , z 0 + b/T ), although, conceptually, as T increases this interval will shrink to z 0 .
Torrence & Compo (1998) introduced the wavelet cone of influence as the region of the wavelet spectrum degraded by edge effects.The extent of the cone depends on the length T of the time series and the length N h of the wavelet filter used to compute the wavelet spectrum from Definition 1.The coarsest non-cone scale is the largest J NC ∈ N such that J NC log 2 {(T /2 + N h − 2)/(N h − 1)}.Our test uses scales 1, . . ., J † and, to avoid edge effects, we choose J † J NC .For a functioning practical test, with reasonable power, we recommend T 512, and for the range of sample sizes considered we choose J † = 4.This provides us with enough scales to furnish a nontrivial test, but not too many scales, which would reduce power and expose the test to large autocorrelations at the coarser scales, as mentioned earlier.For practical use, we recommend that J † grow linearly with log T .
We already employ multiple hypothesis testing methods to control the overall size of our test by combining the results over J † scales using Holm's method.Similar methods could be used if we repeated our test at multiple time locations, not just at a single time-point z 0 .

4•3. Post hoc investigation if the test is rejected
If the null hypothesis is rejected, then evidence exists that aliasing or a white noise component is present, and we describe three post hoc strategies to distinguish them.
One strategy might be to perform stationarity tests on a range of times around z 0 and, if stationary, use the Hinich & Wolinsky (1988) or Hinich & Messer (1995) aliasing tests.Another strategy could be to reanalyse the series at a faster rate, if possible.If significant power exists at finer scales, then aliasing probably has occurred, and the new rate should be considered in future.
A third approach, illustrated in §5•2, investigates the local spectral properties of the series in clear patches, where there is no aliasing or white noise, to discover whether power moves from low to higher frequencies just before an aliasing/white noise event or vice versa.This exploratory approach has the advantage of not requiring faster-sampled data required by the second approach or the stationarity tests of the first approach, which could have poor power for moderate sample sizes.
Table 1.Empirical rejection rate (%), per scale and overall, at z 0 = 1/2 with no subsampling For the specific case of Shannon locally stationary wavelet processes, Eckley & Nason (2014) showed that aliased power can be detected and the aliasing effects removed.

Simulations and a wind energy example
5•1.Simulation study Our simulation study uses the test-bed evolutionary wavelet spectrum {S TB1 j (z)} shown in Fig. 1 to produce realizations without subsampling, r = 0.The test-bed spectrum satisfies the null hypothesis for J † = 4 at z 0 = 1/2, and the alternative at z 0 = 1/4.
We drew 1000 realizations of length T = 4096 from the locally stationary process model of (3) using the test-bed spectrum and Daubechies D5 wavelets with normally distributed innovations.Tables 1-4 show the empirical rejection rates; in each table, the rightmost column gives the rejection rate obtained by combining the per-scale p-values via Holm's method, whereas the rates for individual scales j = 1, . . ., 4 are reported in the preceding columns.Using the method described in §4•1, we test H 0 : S j (z 0 ) = 0 versus H A : S j (z 0 ) > 0, for each scale j = 1, . . ., J † separately, at a nominal size of 5%, using the equivalent sample size version of Student's t-test.
Table 1 illustrates our test working as a local white noise test, with no subsampling.Under the null hypothesis, at z 0 = 1/2, the test shows increasing empirical power at scales j = 2, 3, 4 as the window width b increases, and has empirical size ranging from 1•8% to 3•2% for scale j = 1.These empirical sizes are somewhat conservative, and we attribute this to negative correlations present in the { Ŝj, t 0 +r } r∈R b sample.The equivalent sample size method mentioned in §4•1 attempts to improve the size calibration, but the outcome is not perfect.
The overall empirical size of our test is 1-2%, lower than the nominal rate.Under the alternative hypothesis, at z 0 = 1/4, Table 2 shows the increasing empirical power for all scales.The next simulation set illustrates application of our absence-of-aliasing test using a second test-bed spectrum, S TB2 j (z), which is identical to S TB1 j (z) except that the scale j = 3 has zero power.This set subsamples using r = 1, which reduces the realization length from 4096 to 2048 according to §3•1, and induces aliasing where power existed at scale j = 1.The subsampling causes scale j = 1 to disappear, and so we test the individual scale hypotheses at j = 2, 3, 4. Table 3 shows the results for z 0 = 1/2, where previously there was no power at the finest scale and therefore no occurrence of aliasing.The empirical power for the j = 2, 4 columns is high, due to the spectral power present at those scales.However, for scale j = 3, for reasonable sample sizes, the rejection rate becomes consistent with the nominal size.The overall rejection rate at z 0 = 1/2 is well controlled, but conservative, with respect to the nominal rate.
Table 4 shows the results for z 0 = 1/4, where the power at the finest scale, j = 1, is redistributed into coarser scales according to Theorem 2. As before, the null hypothesis is routinely rejected at scales j = 2, 4. In contrast to Table 3, Table 4 shows increasing, and eventually high, power at scale j = 3.Hence, we would often reject our null hypothesis in favour of the alternative H A : S j (z 0 ) > 0 for all j = 1, . . ., J † , and therefore the overall empirical power of the test increases and reaches 65•8% for b = 128.In practice we would, for the time being, accept H 0 at z 0 = 1/2 and conclude that no aliasing occurred and nor were white noise components present.However, at z 0 = 1/4, although we can reject H 0 , we cannot conclude that there was definitely aliasing or that white noise components were present, because we have no information on scales for j > 4.
A wind speed example Wind power forecasting has received much attention in recent years, motivated by the need to develop reliable forecasting tools to enable effective integration of wind farm output into power grids (Landberg et al., 2003;Genton & Hering, 2007).Several authors have used autoregressive integrated moving average models that implicitly assume the absence of aliasing; see Huang & Chalabi (1995) or Sfetsos (2002), for example.We seek to identify whether any such corruption might occur in data provided by an industrial collaborator.High-resolution wind speed data, sampled at 1 Hz, were acquired from a proposed wind farm in the midwestern U.S.A. during March 2011 and differenced to remove trend.Figure 2 displays the differenced series, which exhibits nonstationarity.For example, the sample variances of the first and last 100 observations are 0•063 and 0•093, respectively, and statistically significant variance differences have been confirmed using methods from Nason (2013).Q-Q plots and goodness-of-fit tests strongly suggest that the series' marginal distribution has heavy tails.
We test H 0 at the two times t = 245, 330, and the Holm-adjusted p-values are displayed in Table 5. Results are presented for the three wavelets recommended in §4•2 for the J † = 4 noncone scales.At time t = 330, for each wavelet H 0 is not rejected, so we have no evidence of a white noise component or aliasing at this location.At time t = 245, H 0 is rejected at the 5% level for wavelets D6 and D8 but not for D7.However, H 0 would have been rejected at the 6•5% level for the D7 wavelet.Hence there is evidence to reject H 0 in favour of H A .
We undertake a post hoc investigation as described in §4•3.We did not reject H 0 at t = 330, are suspicious that aliasing or white noise components could exist at t = 245, and suspect that there might be a transition between the two time-points, possibly around t = 290.If prior to t = 290 the series was truly subject to aliasing, and after that time it was not, then power will have moved from higher to lower frequencies over time or, at the very least, have disappeared from frequencies above the Nyquist frequency.Since we suspect aliasing prior to t = 300, we do not apply classical spectral analysis here.
However, as we have no evidence of aliasing after the suspected transition, we subject the series there to post hoc classical stationary periodogram analyses on three rolling windows, using methods described by Fryzlewicz et al. (2008) and displayed in Fig. 3.The results indicate that the peak frequency associated with each window decreases.Hence, there is supporting evidence for the conjecture that, after t = 300, the high-frequency content in the series reduces over time.
Using the centre time, t, for each rolling window and its associated peak frequency, f , we obtain (t, f ) ∈ {(292, 0•438), (302, 0•406), (312, 0•395)}, which can be modelled approximately by the linear relationship f = 1•06 − 0•0022t.Assuming that the peak frequency reduces linearly according to this model, and extrapolating backwards, we hypothesize that the series was aliased before t = 261, when f = 0•5 in the model.This is merely a hypothesis, as the series could have been contaminated by white noise before the transition rather than by aliased highfrequency components.In addition, the transition time itself, as well as this analysis, is subject to uncertainty.

TimeFig. 2 .
Fig. 2. First differences of wind speed (m s −1 ) for a proposed wind farm site in the midwestern U.S.A., with T = 512.The dashed vertical lines indicate the time-points t = 245, 330.

Table 2 .
Empirical power (%) of test, per scale and overall, at z 0 = 1/4 with no subsampling

Table 5 .
The p-values and H 0 rejection status for the wind series with our test at two reference times, using the signmedian test, three different Daubechies Dv wavelets with v vanishing moments, and b = 25