NovaTec took part in this year’s W-JAX conference in Munich, and I was glad to follow some of the interesting talks about Apache Kafka. Judging by the evolution of its ecosystem and the active participation at the conference, Kafka has become a mainstream asynchronous messaging platform in many companies. While it can replace RabbitMQ or Apache ActiveMQ, it is far more than that.
Several talks emphasized the important role that Kafka Streams (and the new KSQL feature) can play in the Microservice world for implementing event sourcing and the CQRS pattern, and how it can help in Big Data environments to get machine learning models into production.
Camunda’s new Microservice orchestration engine, zeebe.io, takes a promising approach to using BPM techniques in high-throughput scenarios. Please read on to see how it relates to the rest of this post. 😉
The presentation by Mike Wiesner (MHP) about data- and event-driven Microservices showed what a typical event-driven Microservice can look like with Kafka. On the one hand, its basic event log serves as the single source of truth in an event-driven architecture. On the other hand, you can build materialized views on top of this event stream so that each service has its own view of the shared kernel (in Domain-Driven Design terms).
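To make the idea of an event log with per-service materialized views more concrete, here is a minimal sketch in plain Python. It does not use Kafka at all and is not taken from the talk. The order events, the field names and the two view functions are hypothetical and only illustrate how different services can derive their own view from the same log.

```python
from collections import defaultdict

# Append-only event log: the single source of truth.
# The event types and fields are hypothetical examples.
event_log = [
    {"type": "OrderPlaced", "order_id": 1, "customer": "alice", "amount": 30},
    {"type": "OrderPlaced", "order_id": 2, "customer": "bob", "amount": 20},
    {"type": "OrderCancelled", "order_id": 1},
    {"type": "OrderPlaced", "order_id": 3, "customer": "alice", "amount": 15},
]


def open_orders_view(events):
    """View for a shipping service: orders that were placed and not cancelled."""
    orders = {}
    for event in events:
        if event["type"] == "OrderPlaced":
            orders[event["order_id"]] = event
        elif event["type"] == "OrderCancelled":
            orders.pop(event["order_id"], None)
    return orders


def revenue_per_customer_view(events):
    """View for a reporting service: total amount of open orders per customer."""
    totals = defaultdict(int)
    for order in open_orders_view(events).values():
        totals[order["customer"]] += order["amount"]
    return dict(totals)


# Both views are derived from the same log and can be rebuilt at any time
# by replaying the events from the beginning.
shipping_view = open_orders_view(event_log)
reporting_view = revenue_per_customer_view(event_log)
print(sorted(shipping_view))
print(reporting_view)
```

Because the views are pure functions of the log, a service can throw its view away and rebuild it by replaying the events. That is the property that makes the log the source of truth.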
The talk by Kai Wähner (Confluent), KSQL – An Open Source Streaming SQL Engine for Apache Kafka, gave an outlook on how these materialized views can be built more easily in the future with KSQL. It lets you create the first-class citizens of Kafka Streams, tables and streams, in an SQL-like manner, so that non-developers in the project can define them as well. The following example finds card numbers with more than 3 authorization attempts within a tumbling (fixed-size, non-overlapping) 5-second window.
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
The resulting KSQL query operates on tumbling time windows over the authorization_attempts events and returns a continuously updated result. With this “Continuous Query” concept, a whole range of different views on the event stream can be created. Building both writing and reading services on top of Kafka and Kafka Streams can eliminate the need for additional (RDBMS/NoSQL) storage. Writes go with high performance into the distributed commit log, while reads can be served from the “consumer-friendly” streams and tables defined using KSQL.
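For readers who have not worked with windowed aggregations, the following plain Python sketch shows roughly what the query computes. It is a batch simulation under simplifying assumptions, not how KSQL is implemented. Timestamps are whole seconds, the sample attempts are made up, and the result is calculated once at the end instead of being updated continuously as events arrive.

```python
from collections import Counter

WINDOW_SIZE_SECONDS = 5
THRESHOLD = 3


def possible_fraud(attempts, window_size=WINDOW_SIZE_SECONDS, threshold=THRESHOLD):
    """Count attempts per (card_number, window) and keep counts above the threshold.

    `attempts` is an iterable of (timestamp_seconds, card_number) tuples.
    """
    counts = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows have a fixed size and do not overlap, so every
        # event belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_size) * window_size
        counts[(card_number, window_start)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# Made-up sample data: (timestamp in seconds, card number).
attempts = [
    (0, "card-A"), (1, "card-A"), (2, "card-A"), (4, "card-A"),  # 4 in window [0, 5)
    (3, "card-B"), (4, "card-B"), (5, "card-B"), (6, "card-B"),  # 2 + 2 across two windows
    (7, "card-A"),
]
result = possible_fraud(attempts)
print(result)
```

Note that card-B also has four attempts within a few seconds, but they fall into two different tumbling windows and therefore stay below the threshold. A sliding or hopping window would treat that case differently.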
Kai Wähner also gave a presentation about Deep Learning in Mission-critical and scalable Real Time Applications with Open Source Frameworks. In machine learning, data scientists often develop and train their models in languages that do not scale well, for example R or Python. Consequently, it is a common challenge to bring such a model into a (Java) production environment. Kafka Streams can help here too, as demonstrated in an example available on GitHub. It shows how a deep learning model developed with H2O is exported as a Java source file and integrated into a Kafka Streams application: https://github.com/kaiwaehner/kafka-streams-machine-learning-examples.
Last but not least, there was an interesting talk by Bernd Rücker (Camunda) about Workflow and State Machines at scale. It covered Camunda’s new Microservice orchestration engine zeebe.io, which is also a bit “Kafkaesque”: to build the core of the high-throughput BPM engine, they did not use Kafka itself, but essentially the same design approach as its distributed commit log.
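The commit log design mentioned here can be sketched in a few lines of Python. This is a single-process, in-memory illustration of the general idea (append-only records, offsets, and consumers that track their own position). It is neither Kafka’s nor Zeebe’s implementation. The class names and the workflow step names are hypothetical, and replication, partitioning and persistence are left out entirely.

```python
class CommitLog:
    """Append-only log: records are only ever appended, never modified."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer keeps its own offset, so readers do not affect each other."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for step in ["start", "task-a-done", "task-b-done"]:
    log.append(step)

fast = Consumer(log)
slow = Consumer(log)
fast_batch = fast.poll()              # reads everything available
slow_batch = slow.poll(max_records=1)  # reads only the first record
print(fast_batch, slow_batch, slow.offset)
```

Since reading never removes anything from the log, a slow consumer can catch up later and a new consumer can start from offset 0 and replay the full history.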
For those interested, the Camunda guys are still on tour through meetups and user groups. For instance, they will present zeebe.io again in Munich at the Microservices Meetup. As a preview, the slides from a similar presentation in Vienna are available on Slideshare (a similar deck to the one shown at W-JAX): Introducing Zeebe.io at Camunda Meetup Vienna 10/2017 from Daniel Meyer.
I recommend digging deeper into the different use cases, because we will be seeing Kafka in the Microservice and Big Data streaming area for a while.