
Preparing streaming data for ML with the Window Module

Many machine learning models are not designed to work directly on a continuous stream of time-series data. Instead, they operate on short sequences of data, and predict future values or derive other insights from each input sequence (sample).

Turning a continuous sequence of data points into a number of shorter sequences is usually called ‘windowing’. The sequences can be non-overlapping, so that each data point exists in only one sequence; these are often called ‘tumbling windows’. Alternatively, the sequences can overlap, so that each data point exists in more than one sequence; these are often called ‘sliding windows’.
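To make the difference between the two window types concrete, here is a minimal Python sketch of count-based windowing. It is a generic illustration, not the Window module's implementation; the function name `make_windows` and its `size` and `shift` parameters are assumptions chosen for this example.

```python
from typing import List, Sequence


def make_windows(values: Sequence[float], size: int, shift: int) -> List[List[float]]:
    """Split `values` into windows of `size` points.

    A new window starts every `shift` points (hypothetical parameter names):
    shift == size gives tumbling windows, shift < size gives sliding windows.
    Trailing points that do not fill a complete window are dropped.
    """
    if size <= 0 or shift <= 0:
        raise ValueError("size and shift must be positive")
    return [
        list(values[start:start + size])
        for start in range(0, len(values) - size + 1, shift)
    ]


readings = [1, 2, 3, 4, 5, 6]

# Tumbling: each point appears in exactly one window.
tumbling = make_windows(readings, size=3, shift=3)
# [[1, 2, 3], [4, 5, 6]]

# Sliding: consecutive windows share size - shift points.
sliding = make_windows(readings, size=3, shift=1)
# [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

With a shift of 1 every point except those at the edges of the data ends up in three windows, which is the overlap that distinguishes sliding from tumbling windows.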

 

Crosser Modules | Window Module Explained

The Window module is designed to do exactly this. It can create ‘windows’, or sequences of data points, from any number of sources. The size of the windows can be specified either as a time span or as a number of samples. The Shift setting controls how far each new window is moved relative to the previous one, so both tumbling windows and sliding windows with different amounts of overlap can be created.
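The following Python sketch shows one way time-based windows with a shift could work on streaming data. It is an illustration under stated assumptions, not the Window module's code: the class `TimeWindower`, its `size_s` and `shift_s` parameters, and the interpretation of the shift as "seconds between window starts" are all hypothetical. It also assumes samples arrive in timestamp order.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


class TimeWindower:
    """Emit windows covering [start, start + size_s), advancing by shift_s."""

    def __init__(self, size_s: float, shift_s: float) -> None:
        if size_s <= 0 or shift_s <= 0:
            raise ValueError("size_s and shift_s must be positive")
        self.size_s = size_s
        self.shift_s = shift_s
        self.buffer: Deque[Sample] = deque()
        self.start: Optional[float] = None  # start time of the oldest open window

    def push(self, ts: float, value: float) -> List[List[Sample]]:
        """Add one sample and return every window that this sample closes."""
        if self.start is None:
            self.start = ts
        closed: List[List[Sample]] = []
        # A window is complete once a sample at or past its end time arrives.
        while ts >= self.start + self.size_s:
            end = self.start + self.size_s
            # May be an empty list if the stream had a gap longer than the window.
            closed.append([s for s in self.buffer if self.start <= s[0] < end])
            self.start += self.shift_s
            # Drop samples that no future window can contain.
            while self.buffer and self.buffer[0][0] < self.start:
                self.buffer.popleft()
        self.buffer.append((ts, value))
        return closed


def run(size_s: float, shift_s: float, timestamps: List[float]) -> List[List[float]]:
    """Feed timestamps through a windower and collect the timestamps per window."""
    windower = TimeWindower(size_s, shift_s)
    result = []
    for ts in timestamps:
        for window in windower.push(ts, value=0.0):
            result.append([t for t, _ in window])
    return result


tumbling = run(3, 3, [0, 1, 2, 3, 4, 5, 6])  # shift equals size: no overlap
sliding = run(3, 1, [0, 1, 2, 3, 4])         # shift smaller than size: overlap
```

In this sketch a window is only emitted when a later sample shows that its time span has passed, so the most recent, still open window is never returned. A real implementation might instead close windows on a timer.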

Preparing data for machine learning models is only one use case where windowing is useful. There are other use cases where you may want to operate on subsets of time-series data, for example to check if the statistical properties of a data source change over time. The Window module can then be used as the first step, to produce data that can be analyzed by other modules.
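As an example of such a downstream analysis, the sketch below computes the mean and standard deviation of each window and flags windows whose mean has moved away from the first window. The function names, the choice of the first window as baseline, and the threshold of three standard deviations are assumptions made for illustration; they are not taken from any Crosser module.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def window_stats(windows: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return (mean, population standard deviation) for each window."""
    return [(mean(w), pstdev(w)) for w in windows]


def drifted_windows(windows: Sequence[Sequence[float]], threshold: float = 3.0) -> List[int]:
    """Return indices of windows whose mean differs from the first window's mean
    by more than `threshold` times the first window's standard deviation.

    If the baseline window has zero spread, any change in mean is flagged.
    """
    stats = window_stats(windows)
    base_mean, base_std = stats[0]
    return [
        i
        for i, (m, _) in enumerate(stats)
        if i > 0 and abs(m - base_mean) > threshold * base_std
    ]


# Made-up windows, e.g. as produced by a windowing step upstream.
windows = [
    [10, 11, 9, 10],   # baseline
    [10, 10, 11, 9],   # same level as the baseline
    [15, 16, 14, 15],  # level has shifted upwards
]
flagged = drifted_windows(windows)  # [2]
```

Comparing each window against a fixed baseline is only one option; comparing consecutive windows, or using a proper statistical test, may suit other data better.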

About the author

Goran Appelquist

Göran has 20 years of experience leading technology teams. He’s the lead architect of our end-to-end solution and is strongly focused on securing the lowest possible Total Cost of Ownership for our customers.

"Hidden lifecycle (employee) costs can account for 5-10 times the purchase price of software. Our goal is to offer a solution that automates and removes most of the tasks that are costly over the lifecycle.

My career started in the academic world, where I got a PhD in physics by researching large-scale data acquisition systems for physics experiments, such as the LHC at CERN. Since leaving academia, I have worked in several tech startups in different management positions over the last 20 years.

In most of these positions I have stood with one foot in the R&D team and the other in the product/business teams. My passion is learning new technologies, using them to develop innovative products and explaining the solutions to end users, technical or non-technical."