Towards A Unified Streaming & Lakehouse Architecture

Luo Yuxia
Fluss Committer

Unifying Lakehouse and streaming storage is a major trend in the evolution of modern data lakes and streaming storage systems. Designed specifically for real-time analytics, Fluss has embraced a unified Streaming and Lakehouse architecture from its inception, enabling seamless integration into existing Lakehouse architectures.

Fluss is designed to address the demands of real-time analytics with the following key capabilities:

  • Real-Time Stream Reading and Writing: Supports millisecond-level end-to-end latency.
  • Columnar Stream: Optimizes storage and query efficiency.
  • Streaming Updates: Enables low-latency updates to data streams.
  • Changelog Generation: Supports changelog generation and consumption.
  • Real-Time Lookup Queries: Facilitates instant lookup queries on primary keys.
  • Streaming & Lakehouse Unification: Seamlessly integrates streaming and lakehouse storage for unified data processing.

Introducing Fluss: Streaming Storage for Real-Time Analytics

Jark Wu
Creator of Fluss project

We discussed the challenges of using Kafka for real-time analytics in our previous blog post. Today, we are excited to introduce Fluss, a cutting-edge streaming storage system designed to power real-time analytics. In this post, we explore Fluss's architecture, design principles, and key features, and show how it addresses those challenges.

Why Fluss? Top 4 Challenges of Using Kafka for Real-Time Analytics

Jark Wu
Creator of Fluss project

The industry is undergoing a clear and significant shift as big data computing transitions from offline to real-time processing. This transition is transforming sectors such as e-commerce, automotive networking, and finance, where real-time data applications are becoming integral to operations. It enables organizations to unlock greater value by using real-time insights to drive business impact and improve decision-making.

Fluss is Now Open Source

Jark Wu
Creator of Fluss project
Giannis Polyzos
Fluss Contributor

Earlier this year, at Flink Forward 2024 Berlin, we announced Fluss, and today we are thrilled to announce that the project is open source. Fluss is a streaming storage system designed to power real-time analytics. It aspires to change how organizations approach real-time data by acting as the real-time data layer for the Lakehouse. Its design enables businesses to achieve sub-second latency, high throughput, and cost efficiency for data analytics, making it well suited to modern data-driven applications.

We have long invested in advancing the data streaming ecosystem as major contributors to Apache Flink®, Apache Flink CDC, and Apache Paimon. As part of that commitment, Fluss is now open source under the Apache 2.0 license and available on GitHub, and we invite users to build the next generation of real-time architectures with it.