Pravega: Storage for streams (ApacheCon @Home 2020)

This is the talk I gave at ApacheCon @Home 2020 in the Streaming track. I start by motivating stream data and the need for storage for streams. Traditional storage systems on one side of the abstraction spectrum and messaging systems on the other have not really solved the problem of storage for streaming data. Pravega is a system that we have built from the ground up to solve this problem, and I introduce Pravega in this presentation, covering important aspects of its architecture:

– The scalability of metadata using table segments
– The separation of IOs for the read and write paths

One of the core features of Pravega is stream scaling, which enables the number of segments in a stream to change dynamically, adapting to workload variations. But what does scaling imply for reading? I discuss the implications with an example and show two ways to read from such a stream: ordered (Event Stream API) and unordered (Batch API).
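To make the reading question concrete, here is a minimal sketch of a toy model in Python. It is not the Pravega client API: the `Segment` and `Stream` classes, the `split` method, and both read functions are hypothetical names invented for illustration. The model assumes that each segment owns a range of routing-key hashes, that scaling seals a segment and replaces it with two successors, and that an ordered reader finishes a sealed segment before touching its successors, while an unordered reader may take segments in any order.

```python
import hashlib


class Segment:
    """Toy segment: an append-only list owning a hash range [low, high)."""

    def __init__(self, seg_id, low, high, predecessors=()):
        self.id = seg_id
        self.low, self.high = low, high
        self.predecessors = tuple(predecessors)
        self.events = []
        self.sealed = False


class Stream:
    """Toy stream (hypothetical, not Pravega's API) that starts with one segment."""

    def __init__(self):
        self.segments = [Segment(0, 0.0, 1.0)]

    @staticmethod
    def _position(key):
        # Map a routing key to a point in [0, 1).
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") / 2**32

    def write(self, key, value):
        pos = self._position(key)
        for seg in self.segments:
            if not seg.sealed and seg.low <= pos < seg.high:
                seg.events.append((key, value))
                return seg.id
        raise RuntimeError("no open segment covers this key")

    def split(self, seg_id):
        # Scale up: seal the segment and create two successors for its range.
        old = self.segments[seg_id]
        old.sealed = True
        mid = (old.low + old.high) / 2
        for low, high in ((old.low, mid), (mid, old.high)):
            self.segments.append(Segment(len(self.segments), low, high, [seg_id]))


def read_ordered(stream):
    """Read a segment only after all of its predecessors have been read."""
    done, out, pending = set(), [], list(stream.segments)
    while pending:
        for seg in pending:
            if all(p in done for p in seg.predecessors):
                out.extend(seg.events)
                done.add(seg.id)
                pending.remove(seg)
                break
    return out


def read_unordered(stream):
    """Read segments in arbitrary order (here: newest first)."""
    out = []
    for seg in reversed(stream.segments):
        out.extend(seg.events)
    return out


def values_for(events, key):
    return [value for k, value in events if k == key]


stream = Stream()
keys = ("sensor-a", "sensor-b")
for i in range(3):
    for key in keys:
        stream.write(key, i)
stream.split(0)  # the stream scales from one segment to two
for i in range(3, 6):
    for key in keys:
        stream.write(key, i)

ordered = read_ordered(stream)
unordered = read_unordered(stream)
for key in keys:
    print(key, values_for(ordered, key), values_for(unordered, key))
```

In this model both reads return the same set of events, but only the ordered read preserves per-key order across the scaling event. That is the trade-off between the two ways of reading: the unordered read can process segments independently and in parallel, at the cost of ordering.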

The stream segments that form streams in Pravega can be used for a number of features, like scaling and transactions, and even to offer different abstractions. Two that we have implemented in Pravega are the state synchronizer and key-value tables. Key-value tables were introduced in the latest release, 0.8.0.

Performance matters for streaming because applications often need to tail streams (low latency) and sustain processing in the presence of large volumes of data (high throughput). The presentation includes a couple of graphs showing latency and throughput both for writes only and end-to-end (reading what is being written). Pravega is able to sustain low latency with high throughput in both cases. I also show the result of an experiment in which we make a reader group catch up with a backlog of data while more data is being written, which Pravega handled successfully. Stay tuned for a blog post with a lot more detail.

The presentation ends with a few words on Pravega and a summary of the key points of the presentation.
