Ready to unleash the power of your massive dataset? With the latest edition of this comprehensive resource, you'll learn how to use Apache Hadoop to build and maintain reliable, scalable, distributed systems. It's ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. This third edition covers recent changes to Hadoop, including new material on the new MapReduce API, as well as version 2 of the MapReduce runtime (YARN) and its more flexible execution model. You'll also find illuminating case studies that demonstrate how Hadoop is used to solve specific problems.
* Store large datasets with the Hadoop Distributed File System (HDFS), then run distributed computations with MapReduce
* Use Hadoop's data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
* Discover common pitfalls and advanced features for writing real-world MapReduce programs
* Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
* Use Pig, a high-level query language for large-scale data processing
* Analyze datasets with Hive, Hadoop's data warehousing system
* Load data from relational databases into HDFS, using Sqoop
* Take advantage of HBase, the database for structured and semi-structured data
* Use ZooKeeper, the toolkit for building distributed systems
Published on 2024-12-22
Hadoop 2024 PDF, EPUB, MOBI e-book download
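A very good Hadoop tutorial, far more detailed than the Apache and Yahoo! web guides. Many Hadoop implementation details that I could not work out on my own are explained in this book.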
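I read a few chapters of the Chinese edition. It is full of errors so elementary that I could not keep reading. I recommend reading the original instead. The translators have some nerve: failing to understand the English is one thing, but even the Chinese is not coherent. Are they not embarrassed? What is "but, ......, but" supposed to mean? Is this some kind of "but-style" prose?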
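I have not actually finished the whole book. I read it mainly for technology selection, since we were considering upgrading our persistence-layer architecture and improving system scalability. I studied the first few chapters closely and gained some understanding of the models, mechanisms, and use cases of Hadoop, MapReduce, and HDFS. I skimmed the later chapters and the other projects in the ecosystem just to get a general idea. Overall it seems decent, at least from the chapters I read...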
Book tags: Hadoop, distributed systems, parallel computing, data mining, big data, computers, O'Reilly, programming
I read the first two parts; the explanations are fairly clear.
It can serve as an overview.
Big data systems all focus on scalability, fault tolerance, scheduling, and shuffle.
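By the later chapters my mind was wandering. I do not have an environment to try this in, so I will set it aside for now.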