Preparing streaming data for ML with the Window Module
Many machine learning models are not designed to work directly with time-series (streaming) data. Instead, they operate on short sequences of data and predict future values, or derive other insights, from each input sequence (sample).
Turning a continuous sequence of data points into a number of shorter, individual sequences is usually called ‘windowing’. The sequences can be non-overlapping, i.e. each data point belongs to exactly one sequence. These are sometimes called ‘tumbling windows’. Alternatively, the sequences can overlap, i.e. a data point can belong to more than one sequence. These are often called ‘sliding windows’.
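The difference between the two window types can be illustrated with a minimal Python sketch. The function name make_windows and its size and step parameters are hypothetical, chosen only for this illustration; they are not part of the Window module.

```python
from typing import List, Sequence


def make_windows(points: Sequence[float], size: int, step: int) -> List[List[float]]:
    """Split points into windows of `size` items, advancing `step` items each time.

    step == size gives tumbling windows; step < size gives sliding windows.
    Trailing points that do not fill a whole window are dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [list(points[i:i + size]) for i in range(0, len(points) - size + 1, step)]


data = [1, 2, 3, 4, 5, 6]
tumbling = make_windows(data, size=3, step=3)  # [[1, 2, 3], [4, 5, 6]]
sliding = make_windows(data, size=3, step=1)   # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

In the tumbling case every data point appears in exactly one window. In the sliding case the points in the middle of the series appear in several windows.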
The Window module is designed to do exactly this. It can create ‘windows’, or sequences of data points, from any number of sources. The size of the windows can be specified either as a length of time or as a number of samples. The Shift setting determines whether the windows are tumbling or sliding and, for sliding windows, how much they overlap.
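A time-based window with a shift could look roughly like the sketch below. This is not the Window module's implementation. The function time_windows and the parameters size_s and shift_s are hypothetical, and the sketch assumes that the shift is the distance between the start times of consecutive windows, which may differ from how the Shift setting is actually defined.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def time_windows(samples: List[Sample], size_s: float, shift_s: float) -> List[List[float]]:
    """Group time-ordered samples into windows covering [start, start + size_s).

    A new window starts every shift_s seconds, beginning at the first timestamp.
    shift_s == size_s gives tumbling windows; shift_s < size_s gives overlap.
    The last window may be partial if the data ends before the window does.
    """
    if size_s <= 0 or shift_s <= 0:
        raise ValueError("size_s and shift_s must be positive")
    if not samples:
        return []
    first_ts, last_ts = samples[0][0], samples[-1][0]
    windows = []
    k = 0
    while first_ts + k * shift_s <= last_ts:
        start = first_ts + k * shift_s
        windows.append([v for t, v in samples if start <= t < start + size_s])
        k += 1
    return windows


samples = [(0.0, 10.0), (1.0, 11.0), (2.0, 12.0), (3.0, 13.0)]
tumbling = time_windows(samples, size_s=2.0, shift_s=2.0)     # shift equals size: no overlap
overlapping = time_windows(samples, size_s=2.0, shift_s=1.0)  # shift is half the size: 50% overlap
```

With a shift equal to the window size, each sample lands in one window. With a smaller shift, consecutive windows share samples, and the amount of overlap grows as the shift shrinks.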
Preparing data for machine learning models is only one use case where windowing is useful. There are other use cases where you may want to operate on subsets of time-series data, for example to check if the statistical properties of a data source change over time. The Window module can then be used as the first step, to produce data that can be analyzed by other modules.
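As an illustration of this kind of analysis, the sketch below splits a series into tumbling windows and compares the mean of each window with the previous one. The readings, the window size and the threshold are made-up values for this example, and the comparison is only one simple way to detect a change. It does not represent any particular module.

```python
from statistics import mean, pstdev

# Made-up readings: the level shifts from about 10 to about 15 halfway through.
readings = [10.0, 10.0, 11.0, 9.0, 14.0, 16.0, 15.0, 15.0]
size = 4          # window size in samples (illustrative value)
threshold = 2.0   # hypothetical limit for an acceptable change in mean

# Step 1: windowing (tumbling windows of `size` samples).
windows = [readings[i:i + size] for i in range(0, len(readings) - size + 1, size)]

# Step 2: per-window statistics, computed by a downstream step.
means = [mean(w) for w in windows]
spreads = [pstdev(w) for w in windows]

# Step 3: flag each window whose mean differs from the previous one by more than the threshold.
shifts = [abs(b - a) > threshold for a, b in zip(means, means[1:])]
```

Here the second window has a clearly higher mean than the first, so the single comparison in shifts is flagged as a change.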