Ready to unleash the power of your massive dataset? With the latest edition of this comprehensive resource, you'll learn how to use Apache Hadoop to build and maintain reliable, scalable, distributed systems. It's ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

This third edition covers recent changes to Hadoop, including new material on the new MapReduce API, as well as version 2 of the MapReduce runtime (YARN) and its more flexible execution model. You'll also find illuminating case studies that demonstrate how Hadoop is used to solve specific problems.

* Store large datasets with the Hadoop Distributed File System (HDFS), then run distributed computations with MapReduce
* Use Hadoop's data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
* Discover common pitfalls and advanced features for writing real-world MapReduce programs
* Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
* Use Pig, a high-level query language for large-scale data processing
* Analyze datasets with Hive, Hadoop's data warehousing system
* Load data from relational databases into HDFS, using Sqoop
* Take advantage of HBase, the database for structured and semi-structured data
* Use ZooKeeper, the toolkit for building distributed systems
Published on 2025-01-22
Hadoop 2025 pdf epub mobi e-book download
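Chinese edition, page 412: "So in theory, anything can be represented in binary form and then converted into a long-integer string, or a data structure can be serialized directly, to serve as the key." Original, page 460: "..., so theoretically anything can serve as row key, from strings to binary representations of long or even serialized ..."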
The official site of Cobub Razor, an app analytics tool, has an article on choosing and using Hadoop YARN schedulers. I think it is well written and recommend it: http://www.cobub.com/the-selection-and-use-of-hadoop-yarn-scheduler/
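The translation is poor in many places, and you need to check it against the English to understand it. For learning quickly, though, it is still a good choice. I suggest the translators weigh the importance of each part: the unimportant parts can be translated roughly, but the important parts deserve real effort, so the priorities don't end up reversed. For example, the data flow section of Chapter 3, such a classic passage, was translated into a complete mess. I don't know whether the translators will...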
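I entered the Douban China-pub prize draw and was lucky enough to win this second Chinese edition of Hadoop: The Definitive Guide. Comparing it with the first edition, I found that chapters on Hive and Sqoop have been added, the translation quality has improved considerably, and the English index has been kept. The book's coverage of Hadoop is fairly comprehensive; readers eager to try things out can more or less take the book, lean on Google and Baidu, and make it happen right away. My personal feeling is "...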
Book tags: Hadoop, distributed systems, parallel computing, data mining, big data, computing, O'Reilly, programming
By the later chapters my mind was wandering. I don't have the environment for this, so I'll set it aside for now.
I went through it once and only know the general structure. I still don't really understand the details.
It seems decent; the explanations are fairly detailed.
Finished the Hadoop part and read some of the chapters on optional modules. It really is written quite carefully. March 2017. 171216-17: finished Part III, Hadoop Operations.