When does it make sense to use Crosser as an alternative to Kafka?

Kafka has become almost synonymous with real-time streaming applications today, often being the starting point when building a new streaming application.

However, is it always the best approach? In this article, we will show that there are many applications where Kafka may not be the best choice, and that these applications can be implemented more easily, faster, and at lower cost using Crosser's Streaming Analytics solution.

Illustration: Use cases that Crosser and Kafka have in common.

Although the underlying technology differs between Crosser and Kafka, there is considerable overlap in the use cases the platforms can address.

Comparing the Technology

At its core, Kafka is a message broker with persistence that can be scaled to handle huge data volumes. A message broker is a useful tool when building large applications where producers and consumers of data need to be decoupled. Persistence is also helpful when data needs to be correlated over long time spans or across multiple out-of-sync streams. However, all of these benefits come at a significant cost: Kafka requires multiple servers, is complex to configure and operate, and each cluster must be managed locally, one by one.

Crosser is a low-code, in-memory streaming analytics solution optimized to be lightweight, modular, flexible, and fast. It has a built-in message broker and persistent storage, intended mainly for buffering and for guaranteeing “at-least-once” delivery of messages. The Crosser Node (the processing runtime) requires only a single server (two for redundancy) and is centrally managed through the Crosser Control Center.

As Table 1 in the appendix shows, the minimum requirements for running a redundant installation differ significantly. The infrastructure complexity adds operational hurdles and costs on top of the cost of the hardware itself.

Illustration: Kafka and Crosser Technical Requirements Comparison. Reference article here →

Complexity and barriers vs Simplicity and Agility

Kafka is built on concepts and components such as events, topics, partitions, ZooKeeper, Kafka Connect, and stream APIs, engineered to be highly scalable, elastic, fault-tolerant, and secure. It is optimized for advanced requirements, such as processing ultra-high volumes of social media events, securely processing bank transactions, and other needs of the largest companies.

This comes at the price of complexity and a high demand for development skills. Kafka clusters need to be deployed and configured by highly trained IT teams, and developing and implementing use cases requires access to developers. These teams are typically in high demand and often become bottlenecks, which slows down implementation and increases cost.

Crosser is built around event streaming pipelines that we call Flows. In a low-code, drag-and-drop visual design studio, non-developers can easily connect existing building blocks, which we call modules, into processing Flows. Simplicity and agility are guiding principles for the Crosser Platform.

Illustration: Kafka and Crosser Organisational Comparison. Reference article here →

Use cases ideal for non-Kafka solutions

There are many applications where the high-end Kafka features are unnecessary, and you can get up and running much faster and at a lower cost using alternative tools like Crosser. Let's take a look at some examples of such applications:

Event-driven synchronization of changes between applications/data integration

Synchronizing changes between two systems, either in one direction or both ways, is a common use case. For example, updating order information from a CRM system to an ERP system, or keeping contact information synchronized between a CRM system and a marketing tool. In such cases, we deal with streams of changes that need to be transformed to apply them to the destination system.

These change events can either be pushed from the source system whenever there is a change or pulled from the source system at regular intervals. If events are pulled, persistence is typically unnecessary; we only need to keep track of the timestamp of the last successfully delivered event, and the events can be pulled again from the source system.
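The pull approach can be sketched in a few lines of Python. This is a minimal illustration, not Crosser's implementation: `ChangeEvent`, `sync_once`, `fetch_since`, and `deliver` are hypothetical names, and the sketch assumes that event timestamps in the source system are unique and increasing.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ChangeEvent:
    timestamp: float  # time of the change in the source system (assumed unique)
    payload: dict


def sync_once(
    fetch_since: Callable[[float], Iterable[ChangeEvent]],
    deliver: Callable[[ChangeEvent], None],
    checkpoint: float,
) -> float:
    """Pull events newer than `checkpoint`, deliver them in timestamp order,
    and return the timestamp of the last successfully delivered event."""
    for event in sorted(fetch_since(checkpoint), key=lambda e: e.timestamp):
        try:
            deliver(event)
        except Exception:
            # Stop at the first failure. The checkpoint is not advanced, so
            # this event and all later ones are pulled again next time.
            break
        checkpoint = event.timestamp
    return checkpoint
```

Only the returned checkpoint needs to be stored between runs. If delivery fails halfway, the next call simply pulls the remaining events from the source again.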

If events are pushed, however, we will need persistence, since we will receive the events only once and must ensure that they are delivered to the destination. Still, if the volume of events is reasonable, we can implement persistence without having to set up a full-blown Kafka system.
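As a rough sketch of what lightweight persistence for pushed events could look like, the following Python example stores incoming events in SQLite and deletes each one only after the destination has accepted it. The class `DurableBuffer` and its methods are hypothetical and do not represent Crosser's built-in storage. The default `":memory:"` path is only for trying the example out; real persistence requires a file path.

```python
import json
import sqlite3
from typing import Callable


class DurableBuffer:
    """Keeps pushed events in SQLite until the destination confirms them."""

    def __init__(self, path: str = ":memory:") -> None:
        # Use a file path instead of ":memory:" to survive restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
        )
        self.db.commit()

    def receive(self, event: dict) -> None:
        """Store the event before acknowledging it to the source system."""
        self.db.execute("INSERT INTO outbox (body) VALUES (?)", (json.dumps(event),))
        self.db.commit()

    def flush(self, deliver: Callable[[dict], None]) -> int:
        """Deliver pending events in arrival order and return how many were sent.
        An event is deleted only after `deliver` returns without error."""
        rows = self.db.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
        sent = 0
        for row_id, body in rows:
            try:
                deliver(json.loads(body))
            except Exception:
                break  # keep this event and all later ones for the next attempt
            self.db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of events still waiting for delivery."""
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
```

Because an event is removed only after delivery succeeds, a crash between delivery and deletion leads to a repeated delivery rather than a lost event. This is at-least-once behavior, so the destination should tolerate duplicates.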

Illustration: Event-driven synchronization of changes between applications/data integration.

Analyze and take actions on real-time data/automation flows

Analyzing streams of data to trigger actions is another common use case for streaming, with anomaly detection being the most frequent scenario. This may involve simple range checks or the use of machine learning models to analyze the data.

The message broker functionality could be useful if multiple analysis applications are used to analyze the same streams. Persistence might be necessary to ensure that the analysis results are delivered correctly, but it is generally not required to store the input streams, as analysis is typically performed over short time periods. These requirements can be met without needing to set up a full-blown Kafka system.
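The simplest form of anomaly detection mentioned above, a range check that triggers an action, can be sketched as follows. The names `out_of_range` and `run`, as well as the reading format, are assumptions made for this illustration. A machine learning model would replace the range condition but leave the overall structure unchanged.

```python
from typing import Callable, Iterable, Iterator, Tuple

Reading = Tuple[str, float]  # (sensor id, measured value)


def out_of_range(
    readings: Iterable[Reading], low: float, high: float
) -> Iterator[Reading]:
    """Yield every reading whose value lies outside [low, high].
    NaN values fail the comparison and are therefore reported as well."""
    for sensor, value in readings:
        if not (low <= value <= high):
            yield sensor, value


def run(
    readings: Iterable[Reading],
    low: float,
    high: float,
    act: Callable[[str, float], None],
) -> int:
    """Call `act` for each anomaly as it appears and return the anomaly count."""
    count = 0
    for sensor, value in out_of_range(readings, low, high):
        act(sensor, value)
        count += 1
    return count
```

The readings are processed one at a time as they arrive, so no input data has to be stored. Only the triggered action, for example an alert or a machine command, may need guaranteed delivery.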

Illustration: Analyze and take actions on real-time data/automation flows.

Send Machine data to a Datalake/Warehouse

Machines, whether on the factory floor or in the field, can generate massive amounts of data. To facilitate large-scale analysis and machine learning model training, it is often desirable to collect this data in a central data lake or data warehouse, typically hosted in the cloud.

The data sources, such as sensors and counters, produce streams of data that need to be sent to the central storage as soon as it becomes available. Apart from basic cleaning, filtering, and transformations, and possibly aggregations, no further processing is needed. Since the data is sent from each source to a single destination, a message broker adds no value. The basic processing applied requires no correlation over long time spans, hence persistent queues are unnecessary.

The data may need to be temporarily buffered to ensure successful delivery, but that's all. As the data sources are naturally distributed (machines, process areas, sites, etc.), the same ingest application can be replicated in each location to achieve the necessary scaling.
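A minimal sketch of such an ingest application is shown below, assuming samples arrive as `(timestamp, value)` pairs where missing values are `None`. The function `aggregate` and the class `UploadBuffer` are hypothetical names. The buffer here is in memory only and bounded, which matches the "temporary buffering" described above but would not survive a restart.

```python
from collections import deque
from statistics import mean
from typing import Callable, Deque, Dict, Iterable, List, Optional, Tuple

Sample = Tuple[float, Optional[float]]  # (timestamp in seconds, value or None)


def aggregate(samples: Iterable[Sample], window_s: float) -> List[Dict[str, float]]:
    """Drop missing values and average the rest per fixed time window."""
    buckets: Dict[int, List[float]] = {}
    for ts, value in samples:
        if value is None:
            continue  # basic cleaning: skip samples without a value
        buckets.setdefault(int(ts // window_s), []).append(value)
    return [
        {"window_start": idx * window_s, "mean": mean(vals), "count": len(vals)}
        for idx, vals in sorted(buckets.items())
    ]


class UploadBuffer:
    """Holds records in memory until the central storage accepts them."""

    def __init__(self, max_items: int = 1000) -> None:
        # When the buffer is full, the oldest records are discarded.
        self.queue: Deque[Dict[str, float]] = deque(maxlen=max_items)

    def add(self, records: Iterable[Dict[str, float]]) -> None:
        self.queue.extend(records)

    def flush(self, upload: Callable[[Dict[str, float]], None]) -> int:
        """Upload records oldest first; stop at the first failure."""
        sent = 0
        while self.queue:
            try:
                upload(self.queue[0])
            except Exception:
                break  # destination unavailable, retry on the next flush
            self.queue.popleft()
            sent += 1
        return sent
```

Since each location runs its own copy of this logic against its local machines, scaling out means deploying the same small application to more sites rather than growing a central cluster.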

Illustration: Send Machine data to a Datalake/Warehouse.

Other use cases

Examples of other use cases where a lightweight, low-code solution can be considered instead of Kafka:

Event & Stream processing with lower volume requirements
Non-mission-critical applications
Use cases where an exactly-once delivery guarantee is not needed
Event-driven application integration
Streaming ETL and ELT pipelines for data warehouses and data lakes

Non-Kafka use cases

Industrial Edge: Requires connectivity to new and legacy industrial protocols together with data transformation and harmonization before data can be used.
Closed-loop edge optimization with ML: Analyzing machine data with ML models to derive optimized settings or detect anomalies typically requires low latency.
Distributed Edge: Mass roll-out of the same application to many distributed edge locations.

Use cases where Crosser is not a good fit

Fraud detection: Requires access to long sequences of data to detect fraudulent patterns.
Web site analytics: Several independent and out-of-sync streams of data must be joined in order to derive insights.

Complementing Kafka

Besides being an alternative to Kafka, Crosser can also complement Kafka by either ingesting data from Kafka or sending data to Kafka. Crosser has pre-built Kafka connectors that allow our customers to combine the two.

Example scenarios:

Run Crosser distributed in remote edge locations for data pre-processing and send the data to a central Kafka cluster, for instance for industrial IoT data
Run Kafka as a central cluster where IT resources are available and data volumes are significant, and run Crosser in branch offices where skill levels are lower
Use Crosser as a low-code streaming analytics solution on top of the Kafka message broker for easier and faster development of more advanced processing
Run trained machine learning and AI models on data coming from Kafka

Illustration: How Crosser and Kafka complement each other.

Read the full article: Why using Crosser and Kafka together makes sense →

Crosser Streaming Analytics

Smart. Lightweight. Easy-to-use.

Crosser’s solution can be used to implement many streaming use cases, like the ones above, without requiring an external platform like Kafka. This has several advantages:

All-in-one platform - Connectivity and processing in one place. An extensive library of pre-built connectors and processing modules, which you can complement with your own custom modules.

Low-code design - Build ‘visual’ pipelines by combining modules. Interactive testing from within the design tool.

Centralized management, distributed processing - Manage all use cases centrally from our Control Center (hosted by Crosser or by you). Deploy a single Docker container for on-premise processing or host your pipelines on Crosser.

Low complexity - Crosser’s simple architecture translates into reduced cost for both infrastructure and operations.

As outlined above, Crosser is not a one-to-one replacement for Kafka. Even though there is some overlap, the two systems address different problems. Kafka is a message broker with persistence, while Crosser’s focus is on stream processing and connectivity. You can do some stream processing with Kafka using KSQL, but for many use cases separate streaming applications must be used. In the same way, Crosser has a built-in message broker and some persistence, but it is not designed to address the type of use cases that Kafka is designed for.

What we have tried to show in this article is that there are use cases that can be implemented much more easily and cheaply by choosing another technology. For some use cases it may also make sense to use Crosser in combination with Kafka, see here.

Next Step

Learn more about the Crosser Platform for Intelligent Data Pipelines & Automations here →

Look into the rich library of connectors here →

Appendix

Table 1 - Resource Type Kafka and Crosser

Resource Type	Kafka Min. Requirement	Crosser Min. Requirement
CPU Cores	130	2
RAM Memory (GB)	245	0.6
Storage	43.536	optional
Processing	read/write to storage	in-memory
Hardware cost	High	Very Low
Infrastructure complexity	High	Very Low

Table: Comparing the system requirements for a redundant installation. Reference article here →

Table 2 - Capabilities Kafka and Crosser

Capabilities	Kafka	Crosser
Management	Local	Crosser Control Center
Development	Commands & Code	Low Code Flow Studio w. Drag-n-drop
Skills barriers	High (coding/complexity)	Low (low-code configuration)
IT team burden	High	Very Low

Table: Comparing capabilities for management & development. Reference article here →

20 Jun 2023

About the author

Göran Appelquist (Ph.D.) | CTO

Göran has 20 years of experience in leading technology teams. He’s the lead architect of our end-to-end solution and is strongly focused on securing the lowest possible Total Cost of Ownership for our customers.

"Hidden lifecycle (employee) costs can account for 5-10 times the purchase price of software. Our goal is to offer a solution that automates and removes most of the tasks that are costly over the lifecycle.

My career started in the academic world, where I got a PhD in physics by researching large-scale data acquisition systems for physics experiments, such as the LHC at CERN. Since leaving academia, I have worked in several tech startups in different management positions over the last 20 years.

In most of these positions I have stood with one foot in the R&D team and another in the product/business teams. My passion is learning new technologies, using them to develop innovative products, and explaining the solutions to end users, technical or non-technical."