Tyler Akidau is a senior staff software engineer at Google, where he is the technical lead for the Data Processing Languages & Systems group, responsible for Google's Apache Beam efforts, Google Cloud Dataflow, and internal data processing tools like Google Flume, MapReduce, and MillWheel. His also a founding member of the Apache Beam PMC. Though deeply passionate and vocal about the capabilities and importance of stream processing, he is also a firm believer in batch and streaming as two sides of the same coin, with the real endgame for data processing systems the seamless merging between the two. He is the author of the 2015 Dataflow Model paper and the Streaming 101 and Streaming 102 articles on the O’Reilly website. His preferred mode of transportation is by cargo bike, with his two young daughters in tow.
Slava Chernyak is a senior software engineer at Google Seattle. Slava spent over five years working on Google’s internal massive-scale streaming data processing systems and has since become involved with designing and building Windmill, Google Cloud Dataflow's next-generation streaming backend, from the ground up. Slava is passionate about making massive-scale stream processing available and useful to a broader audience. When he is not working on streaming systems, Slava is out enjoying the natural beauty of the Pacific Northwest.
Reuven Lax is a senior staff software engineer at Google Seattle, and has spent the past nine years helping to shape Google's data processing and analysis strategy. For much of that time he has focused on Google's low-latency, streaming data processing efforts, first as a long-time member and lead of the MillWheel team, and more recently founding and leading the team responsible for Windmill, the next-generation stream processing engine powering Google Cloud Dataflow. He's very excited to bring Google's data-processing experience to the world at large, and proud to have been a part of publishing both the MillWheel paper in 2013 and the Dataflow Model paper in 2015. When not at work, Reuven enjoys swing dancing, rock climbing, and exploring new parts of the world.
发表于2024-11-24
Streaming Systems 2024 pdf epub mobi 电子书
Streaming SQL没有仔细读,回头再来研究; 关于流式计算,这本书讲得非常透彻,从数据(bounded data VS unbounded data,stream vs table)到计算(batch vs streaming, window/trigger/accumulation)娓娓道来(有时候甚至觉得啰嗦,哈哈),看完之后会对学习流式计算框架很...
评分Streaming SQL没有仔细读,回头再来研究; 关于流式计算,这本书讲得非常透彻,从数据(bounded data VS unbounded data,stream vs table)到计算(batch vs streaming, window/trigger/accumulation)娓娓道来(有时候甚至觉得啰嗦,哈哈),看完之后会对学习流式计算框架很...
评分Streaming SQL没有仔细读,回头再来研究; 关于流式计算,这本书讲得非常透彻,从数据(bounded data VS unbounded data,stream vs table)到计算(batch vs streaming, window/trigger/accumulation)娓娓道来(有时候甚至觉得啰嗦,哈哈),看完之后会对学习流式计算框架很...
评分Streaming SQL没有仔细读,回头再来研究; 关于流式计算,这本书讲得非常透彻,从数据(bounded data VS unbounded data,stream vs table)到计算(batch vs streaming, window/trigger/accumulation)娓娓道来(有时候甚至觉得啰嗦,哈哈),看完之后会对学习流式计算框架很...
评分Streaming SQL没有仔细读,回头再来研究; 关于流式计算,这本书讲得非常透彻,从数据(bounded data VS unbounded data,stream vs table)到计算(batch vs streaming, window/trigger/accumulation)娓娓道来(有时候甚至觉得啰嗦,哈哈),看完之后会对学习流式计算框架很...
图书标签: 流式计算 大数据 分布式 流计算 计算机 数据库 软件工程 数据挖掘
Streaming data is a big deal in big data these days. As more and more businesses seek to tame the massive unbounded data sets that pervade our world, streaming systems have finally reached a level of maturity sufficient for mainstream adoption. With this practical guide, data engineers, data scientists, and developers will learn how to work with streaming data in a conceptual and platform-agnostic way.
Expanded from Tyler Akidau’s popular blog posts "Streaming 101" and "Streaming 102", this book takes you from an introductory level to a nuanced understanding of the what, where, when, and how of processing real-time data streams. You’ll also dive deep into watermarks and exactly-once processing with co-authors Slava Chernyak and Reuven Lax.
You’ll explore:
How streaming and batch data processing patterns compare
The core principles and concepts behind robust out-of-order data processing
How watermarks track progress and completeness in infinite datasets
How exactly-once data processing techniques ensure correctness
How the concepts of streams and tables form the foundations of both batch and streaming data processing
The practical motivations behind a powerful persistent state mechanism, driven by a real-world example
How time-varying relations provide a link between stream processing and the world of SQL and relational algebra
前两章就已经定义出一个流式计算的几个要素了,后面的一些概念都是用这几个要素组合出来的。窗口那章感觉挺好懂的,有点啰嗦。effectly once 那章讲了一下google dataflow的内部实现方式。sql语句和后面的join那章将最开始定义的几个要素划到sql的世界当中,整个流式sql的理论讲述的非常清晰了。 如果读过flink的文档,这本书只要看上面提到的章节即可。
评分看不太懂,但总算看完了。对streaming有更多了解后会读第二遍
评分前两章就已经定义出一个流式计算的几个要素了,后面的一些概念都是用这几个要素组合出来的。窗口那章感觉挺好懂的,有点啰嗦。effectly once 那章讲了一下google dataflow的内部实现方式。sql语句和后面的join那章将最开始定义的几个要素划到sql的世界当中,整个流式sql的理论讲述的非常清晰了。 如果读过flink的文档,这本书只要看上面提到的章节即可。
评分看不太懂,但总算看完了。对streaming有更多了解后会读第二遍
评分一般,比较无聊
Streaming Systems 2024 pdf epub mobi 电子书