Alex Gorelik is CTO and founder of Waterline Data and the founder of three startups. At Informatica, he served as GM of the Data Quality Business Unit and, as SVP of R&D for Core Technology, managed a team of 400 engineers and product managers developing Informatica's platform and data integration technology. Alex was an IBM Distinguished Engineer and co-founder, CTO, and VP of Engineering at Exeros. Previously, he was co-founder, CTO, and VP of Engineering at Acta Technology (acquired by Business Objects and now marketed as SAP Business Objects Data Services). Prior to founding Acta, Alex managed development of Replication Server at Sybase and worked on Sybase's strategy for enterprise application integration (EAI). Earlier, he developed the database kernel for Amdahl's Design Automation group. Alex holds a B.S. in Computer Science from Columbia University School of Engineering and an M.S. in Computer Science from Stanford University.
The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook, to governments and traditional corporate enterprises. You’ll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book.
Alex Gorelik, CTO and founder of Waterline Data, explains why old systems and processes can no longer support data needs in the enterprise. Then, in a collection of essays about data lake implementation, you’ll examine data lake initiatives, analytic projects, experiences, and best practices from data experts working in various industries.
Get a succinct introduction to data warehousing, big data, and data science
Learn various paths enterprises take to build a data lake
Explore how to build a self-service model and best practices for providing analysts access to the data
Use different methods for architecting your data lake
Discover ways to implement a data lake from experts in different industries
Posted 2024-11-23
The Enterprise Big Data Lake 2024 pdf epub mobi e-book download
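This book is mediocre: it offers too little practice and too few case studies, so I don't recommend it. But because data lakes are rarely written about in China (even though they are widely used in practice), I'll briefly set down my understanding. 1. What is a data lake? An architecture diagram explains it quickly; take Alibaba's data architecture diagram as an example: - ODS (operational data store, staging area) stores the raw ... from each business system (production system)...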
Book tags: Computers, Data, Big Data, bigdata, Hadoop
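It offers too little practice and too few case studies, says little about how to build a data warehouse, and the later chapters drift off topic. But the core is good: the rise of data science and internet companies gave birth to the data lake approach to data management. People are able to, and prefer to, do their own analysis rather than ask a technical team to pull the data for them; moreover, machine learning needs data that the dimensional modeling of a traditional data warehouse cannot provide. Self-service is the real core of the data lake, which is no longer limited to producing BI reports from pre-processed data. This more or less explains why I could never make sense of data warehouses: I have always worked with data lakes. I'm very curious what the actual practice at large companies abroad looks like...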