The plot below shows the frequency stability of a pretty good 10 MHz oscillator. For this Allan deviation demonstration it doesn't really matter what kind of oscillator it is (but if you're curious, it's a recently warmed-up miniature X.72 rubidium). The goal is to manipulate the raw data and see what effect that has on the plot.

A total of 60,000 time interval measurements were taken at a 1 second rate. That is, the time (aka phase) of the oscillator was compared against the time (aka phase) of a reference standard once a second for about 16 hours.

The reference standard (active H-maser) is much more stable than the rubidium oscillator being tested. And the precision of the time interval analyzer (TSC 5110A) is under 1 ps, which is also well below the stability of the rubidium oscillator.

The stability (or instability, depending on how you look at it) shown by any Allan deviation plot is always the sum of all noise in the experiment; that is, the reference standard, the time interval analyzer measurement system itself, and, of course, the oscillator being tested.

In this textbook case, because both the reference and the analyzer are significantly better than the oscillator, what the above plot shows is purely the performance of the oscillator.

It's interesting to see what happens to an Allan deviation plot if the resolution of the measurement data is limited.

In the plot below, the raw data was edited and digits in the far right decimal places were successively deleted to simulate a less-precise measurement system. The black line shows ADEV based on data truncated at 12 decimal places (1 ps). The blue line shows ADEV based on data truncated at 11 decimal places (10 ps). The green line shows ADEV based on data truncated at 10 decimal places (100 ps). And so on.
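The truncation itself is easy to reproduce. Since the raw rubidium data isn't included here, the sketch below uses a synthetic stand-in (white frequency noise at roughly the rubidium's 1×10^{-11} level) and a simple overlapping Allan deviation estimator; the `adev` and `truncate` helpers are illustrative, not the actual tools used for the plots.

```python
import numpy as np

def adev(x, tau0, m):
    """Overlapping Allan deviation of phase data x (seconds) at tau = m * tau0."""
    x = np.asarray(x, dtype=float)
    d = x[2 * m:] - 2 * x[m:-m] + x[:-2 * m]       # second differences at lag m
    return np.sqrt(np.mean(d ** 2)) / (np.sqrt(2) * m * tau0)

def truncate(x, places):
    """Drop all digits beyond the given number of decimal places."""
    scale = 10.0 ** places
    return np.trunc(np.asarray(x) * scale) / scale

rng = np.random.default_rng(42)
y = rng.normal(0.0, 1e-11, 20_000)    # stand-in: white FM at 1e-11 (tau = 1 s)
x = np.cumsum(y)                      # phase in seconds, one point per second

full = adev(x, 1.0, 1)                # full-resolution ADEV at tau = 1 s
ns1  = adev(truncate(x, 9), 1.0, 1)   # data truncated to 9 places (1 ns)
print(f"full resolution: {full:.2e}   1 ns resolution: {ns1:.2e}")
```

With this synthetic data the 1 ns quantization raises the tau = 1 s ADEV well above the oscillator's own 1×10^{-11} noise, just as the coarser lines in the plot do.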

For this particular oscillator there is no visible difference between the full resolution plot (above) and the 1 ps resolution data (black line, below). In fact, there is almost no visible difference between the 1 ps resolution data (black line) and 10 ps resolution data (blue line). However, as the raw data gets less and less precise the effect on the ADEV plot is pronounced.

The implication of this plot is that if you were comparing two such oscillators with a measurement system, for example a time interval counter, having only 1 ns resolution, then the ADEV plot you would see is the red line. That would be the only view you'd have of the oscillator.

Deleting decimal places in a data file is easy to do deliberately, and may even happen accidentally if one records raw data with too small a fixed or floating point precision. But a more appropriate way to simulate measurement noise would be to add different levels of noise to the original high-resolution phase data.

In the plot below, the full precision of the raw data was maintained, but increasing levels of random white phase noise were added to each of the 60,000 phase measurements.

This plot is a cleaner example of measurement noise. It nicely shows the fidelity with which an ADEV plot reveals the level of measurement noise. For example, we know the red line contains 1 ns of noise and, sure enough, it starts exactly at 1×10^{-9} at tau 1 s and drops down with a slope of exactly -1 until it finally meets the actual performance of the oscillator.
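The slope is easy to verify numerically. As before, the sketch uses a synthetic stand-in for the raw data (white FM noise for the oscillator) with 1 ns of white phase noise added; wherever the measurement noise dominates, the ADEV should fall by a factor of 10 for each decade of tau.

```python
import numpy as np

def adev(x, tau0, m):
    """Overlapping Allan deviation of phase data x (seconds) at tau = m * tau0."""
    x = np.asarray(x, dtype=float)
    d = x[2 * m:] - 2 * x[m:-m] + x[:-2 * m]      # second differences at lag m
    return np.sqrt(np.mean(d ** 2)) / (np.sqrt(2) * m * tau0)

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1e-11, 20_000)            # stand-in oscillator: white FM
x = np.cumsum(y)                              # phase in seconds, tau0 = 1 s
noisy = x + rng.normal(0.0, 1e-9, x.size)     # add 1 ns rms white phase noise

a1, a10 = adev(noisy, 1.0, 1), adev(noisy, 1.0, 10)
print(f"tau=1s: {a1:.2e}   tau=10s: {a10:.2e}   ratio: {a1 / a10:.1f}")
```

The ratio comes out close to 10, the signature of white phase noise (slope -1 on a log-log ADEV plot).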

There are occasions when one wants to average the raw data before making ADEV plots. This can be done in several ways and each has a different effect on the appearance and validity of an ADEV plot.

The first method is simply making averages of frequency measurements. Since frequency is computed as the difference of time measurements, averaging N frequency measurements is identical to removing N-1 of every N phase measurements. For example, if F1=t1-t0, F2=t2-t1, F3=t3-t2, F4=t4-t3, F5=t5-t4, etc. and you wanted to compute the average of 5 frequency measurements, then Favg = (F1+F2+F3+F4+F5)/5 = (t1-t0 + t2-t1 + t3-t2 + t4-t3 + t5-t4)/5 = (t5-t0)/5.
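The telescoping can be checked with a few lines of code (the phase values here are made up for illustration):

```python
# Six phase readings t0..t5, in seconds (arbitrary illustrative values)
t = [0.0, 1.0000000001, 2.0000000003, 3.0000000002,
     4.0000000006, 5.0000000004]

# Five consecutive frequency measurements from first differences of phase
F = [t[i + 1] - t[i] for i in range(5)]

favg = sum(F) / 5                  # average of the five frequency measurements
endpoints = (t[5] - t[0]) / 5      # telescoped form: only the endpoints survive
print(favg, endpoints)
```

All the interior phase readings cancel; only the first and last survive.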

Let's do this with our rubidium measurements, averaging by a factor of 10. Note the 60,000 phase measurements at 1 Hz become 6,000 phase measurements at 0.1 Hz (one every 10 s). The plot below shows the ADEV of this averaged data set. It can be seen that this plot is identical to the first one above, with the exception that the points for tau less than 10 s are missing.
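This equivalence can be checked numerically. Using the same kind of synthetic stand-in data as above (white FM noise, since the original data set isn't included here), the ADEV at tau = 10 s computed from the full 1 s data should match the ADEV computed from the decimated 10 s data:

```python
import numpy as np

def adev(x, tau0, m):
    """Overlapping Allan deviation of phase data x (seconds) at tau = m * tau0."""
    x = np.asarray(x, dtype=float)
    d = x[2 * m:] - 2 * x[m:-m] + x[:-2 * m]      # second differences at lag m
    return np.sqrt(np.mean(d ** 2)) / (np.sqrt(2) * m * tau0)

rng = np.random.default_rng(7)
y = rng.normal(0.0, 1e-11, 20_000)
x = np.cumsum(y)                     # stand-in phase data, one point per second

a_full = adev(x, 1.0, 10)            # tau = 10 s from the full 1 s data
a_dec  = adev(x[::10], 10.0, 1)      # tau = 10 s from every 10th phase point
print(f"full: {a_full:.3e}   decimated: {a_dec:.3e}")
```

The two estimates agree to within statistical scatter; decimation only removes the tau < 10 s points from the plot.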

Computing frequency with a longer gate time typically results in greater accuracy. Our 10 second frequency averages have greater accuracy and precision than the 1 second frequency averages: the ADEV of the averaged data (5×10^{-12}) is lower than the ADEV of the original data (1×10^{-11}), as expected. But by averaging we have also changed the tau of that lower ADEV value. What used to be 60,000 1-second averages are now 6,000 10-second averages. And that's why the two plots are identical.

Allan deviation plots are already a measure of frequency stability across a wide range of averaging times, so it should come as no surprise that pre-averaging the data does not change the points on the plot at all.

Another way to average might be to compute boxcar averages of the raw phase data prior to computing ADEV. For example, each phase point could be averaged with 10 of its neighbors. This would convert 60,000 1-second data points into 59,990 slightly smoothed 1-second data points. Unlike the example above, the tau would remain at 1 second.

The ADEV plots below illustrate this. With n=1 a boxcar average is just the raw data, shown as the black line. This is the actual stability of the oscillator. With a boxcar average only 2 samples wide some smoothing is already evident as a slight improvement in ADEV. As the amount of averaging increases there is a correspondingly large drop in the ADEV, that is, a large apparent improvement in stability. But is this greatly improved stability real? Does it represent a more precise measurement of oscillator performance?
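A sketch of the effect, again on synthetic stand-in data (white FM noise at 1×10^{-11}, since the raw data isn't included here); for this noise type a 10-point boxcar cuts the reported tau = 1 s ADEV by roughly a factor of 10, stability that is entirely an artifact of the smoothing:

```python
import numpy as np

def adev(x, tau0, m):
    """Overlapping Allan deviation of phase data x (seconds) at tau = m * tau0."""
    x = np.asarray(x, dtype=float)
    d = x[2 * m:] - 2 * x[m:-m] + x[:-2 * m]      # second differences at lag m
    return np.sqrt(np.mean(d ** 2)) / (np.sqrt(2) * m * tau0)

def boxcar(x, n):
    """n-point sliding (boxcar) average of the phase data."""
    return np.convolve(x, np.ones(n) / n, mode="valid")

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1e-11, 20_000)
x = np.cumsum(y)                          # stand-in 1 s phase data

raw      = adev(x, 1.0, 1)
smoothed = adev(boxcar(x, 10), 1.0, 1)    # tau stays 1 s, but the noise is gone
print(f"raw: {raw:.2e}   10-point boxcar: {smoothed:.2e}")
```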

In general this style of data averaging is not at all appropriate for the stability analysis of oscillators. The reason is that legitimate oscillator noise is being smoothed by neighboring samples. The whole intent of an ADEV plot is to show the level of instability of a time/frequency source. Clearly if any variations in measured phase are reduced through slight or heavy running averaging, the data will have the appearance of an oscillator that is inherently more stable than it really is.

This effect will also be visible with a phase comparator that performs some sort of running average internally. Either way, this method of averaging creates artificial stability for short tau. The plots above still exhibit valid mid-term stability and long-term drift, but only because the boxcar averaging didn't go beyond 100 s.

Of course, the oscillator is still the same oscillator. It's just that the raw data that is intended to measure the instability of the oscillator is being smoothed. The Allan deviation calculation (which is essentially just the first difference of frequency error, or equivalently the second difference of time error) mistakes this externally averaged data for a virtual oscillator of much greater stability.

Another style of averaging is the running average. This is typically easier to compute than the boxcar method since it requires no buffer of past samples and can be done on the fly, as in A(n) = A(n-1) + (x(n) - A(n-1)) / n.

The plot below shows the effect of applying a running average to the raw phase data prior to computing ADEV. A running average with n=1 is just the raw data (the same black plot as seen above). Again, as in the above case but even more effectively, it creates false stability by smoothing phase variations.
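The same synthetic stand-in (white FM noise, since the raw data isn't included here) makes the point; a cumulative running mean applied to the phase data before computing ADEV smooths far more aggressively than a 10-point boxcar:

```python
import numpy as np

def adev(x, tau0, m):
    """Overlapping Allan deviation of phase data x (seconds) at tau = m * tau0."""
    x = np.asarray(x, dtype=float)
    d = x[2 * m:] - 2 * x[m:-m] + x[:-2 * m]      # second differences at lag m
    return np.sqrt(np.mean(d ** 2)) / (np.sqrt(2) * m * tau0)

rng = np.random.default_rng(5)
y = rng.normal(0.0, 1e-11, 20_000)
x = np.cumsum(y)                                  # stand-in 1 s phase data

# Running mean A(n) = A(n-1) + (x(n) - A(n-1)) / n, computed vectorized here
run = np.cumsum(x) / np.arange(1, x.size + 1)

raw      = adev(x, 1.0, 1)
smoothed = adev(run, 1.0, 1)
print(f"raw: {raw:.2e}   running average: {smoothed:.2e}")
```

The reported short-tau ADEV collapses by orders of magnitude, none of it reflecting the actual oscillator.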

In fact, one could artificially drive the ADEV as low as desired by averaging the data heavily enough. In the extreme, a running average (or repeated averages of averages) could smooth the data so completely that every point is identical, giving the appearance of a perfectly stable clock: an ADEV of zero.