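Holden Karau is a software development engineer at Databricks and is active in the open source community. She is also the author of Fast Data Processing with Spark.
Andy Konwinski is a co-founder of Databricks, a technical expert on the Apache Spark project, and a co-creator of the Apache Mesos project.
Patrick Wendell is a co-founder of Databricks and a technical expert on the Apache Spark project. He also maintains several subsystems of the Spark core engine.
Matei Zaharia is the CTO of Databricks, the creator of the Apache Spark project, and a vice president at the Apache Software Foundation.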
Data is getting bigger, arriving faster, and coming in varied formats—and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark.
Updated to emphasize new features in Spark 2.x, this second edition shows data engineers and scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine-learning algorithms. Through discourse, code snippets, and notebooks, you’ll be able to:
Learn Python, SQL, Scala, or Java high-level APIs: DataFrames and Datasets
Peek under the hood of the Spark SQL engine to understand Spark transformations and performance
Inspect, tune, and debug your Spark operations with Spark configurations and Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow
Use open source Pandas framework Koalas and Spark for data transformation and feature engineering
Posted on 2025-03-05
Learning Spark, 2nd Edition 2025 pdf epub mobi ebook download
I spent a day reading this book. It seems best suited to beginners: the content is fairly basic and not hard to read. Giving it a good rating...
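A good introductory book. It covers the basics of Spark, Spark Core, and its commonly used internal components. One minor shortcoming is that the Spark version used in the book is fairly old, so some of the content no longer applies to newer versions. The book explains RDDs in great detail but says little about Spark Streaming, Spark SQL, MLlib, and similar topics. Overall, it is enough for getting started, and this...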
I have read it...
Book tags: Spark, Computer Science, Distributed Systems, Software Engineering, Data Analysis, Big Data, BigData