Two-sample KS test for time series data - portion vs. whole

Oct 8, 2018

    I have time series data and want to know whether keeping only a limited portion (say, the first 300 seconds) and discarding the rest would still give me a good estimate of the shape and spread of the data's distribution, or whether I would lose too much information.

    To check this, I am using a two-sample KS test. Sample 1 is all the data points from the first 300 seconds, and sample 2 is the whole time series (lengths vary from 300 to 6000 seconds). So sample 2 contains sample 1 plus additional data. Is this approach correct?
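
    For concreteness, the comparison described above could be sketched roughly as follows. This is only an illustration, assuming Python with NumPy and SciPy: the normally distributed placeholder series, the 1 kHz sampling rate, the 600-second total length, and the names `series` and `first_window` are all assumptions made for the example, not the actual simulation.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

SAMPLE_RATE_HZ = 1000  # assumption: one observation per millisecond
WINDOW_S = 300         # portion that would be kept
TOTAL_S = 600          # hypothetical total length (the real series run 300 to 6000 s)

# Placeholder for the simulated time series; the real data would go here.
series = rng.normal(size=TOTAL_S * SAMPLE_RATE_HZ)

# Sample 1: every point from the first 300 seconds.
first_window = series[: WINDOW_S * SAMPLE_RATE_HZ]

# Sample 2: the whole series, which therefore includes sample 1.
result = ks_2samp(first_window, series)

print(f"KS statistic: {result.statistic:.5f}")
print(f"p-value:      {result.pvalue:.5f}")
```

    Note that in this sketch sample 2 contains sample 1, exactly as in the setup described; whether that overlap is acceptable is the question being asked.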

    Note that I have millisecond-level data, so the sample size is quite large. I also understand that the KS test does not account for dependence between observations within the time series, but my data is simulated, so I am working under the assumption that if the shape and spread are preserved, that should be enough.

