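The Most Complete 2019 Reading List for Data Warehousing, BI and Data Science
A couple of days ago I was searching online for data warehousing books and found that someone had written a very detailed reading list. It may well be the most complete data warehousing, BI and data science reading list of 2019. Rather than keep it to myself, I am reposting it here so everyone can learn from it.
Because this article is hosted on http://wordpress.com, not everyone may be able to reach the original, for reasons you can probably guess. So I have reposted it here word for word, including the introductory data warehousing book the author wrote himself.
Author: Vincent
Original: https://dwbi1.wordpress.com/data-warehousing-books/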
Disappointed with the Google search results for “data warehousing books”, I have tried to put all the data warehousing books I know of into this page. It is totally understandable why Google’s search results don’t include books on ETL or Dimensional Modeling, for example. The same goes for Amazon; see Note 1 below. Even a data warehouse book as important as Inmon’s DW 2.0 was missed because its title doesn’t contain the word “Warehouse”.
For data modelling my all-time favorite is Kimball’s Toolkit (#1 in the list). Inmon’s, Imhoff’s and Devlin’s classics (#3, #4 and #5 in the list) have broadened my horizon on the basic principles of DW design. For ODS design it’s #17, and the newest model is in #6. If you are building a DW on the SQL Server platform, Mundy’s Toolkit (#2) is a treasure. On Oracle it’s Hobbs (#54), and on Teradata it’s Coffing’s series (#58 to #63). #7 to #11 explain Kimball’s theory in more detail. Some of them are about dimensional modelling (Adamson’s #8 is excellent) and some are about ETL (Kimball’s #7 is a jewel). For methodology/project management, #11 is the classic, #27 is a proven treasure and #83 covers the iterative approach.
1. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling by Ralph Kimball and Margy Ross
2. Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset by Joy Mundy, Warren Thornthwaite, and Ralph Kimball
3. Building the Data Warehouse by W. H. Inmon
4. Mastering Data Warehouse Design: Relational and Dimensional Techniques by Claudia Imhoff, Nicholas Galemmo, and Jonathan G. Geiger
5. Data Warehouse: From Architecture to Implementation by Barry Devlin
6. DW 2.0: The Architecture for the Next Generation of Data Warehousing by William H. Inmon
7. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data by Ralph Kimball and Joe Caserta
8. The Star Schema Handbook: The Complete Reference to Dimensional Data Warehouse Design by Christopher Adamson
9. The Data Webhouse Toolkit: Building the Web-enabled Data Warehouse by Ralph Kimball and Richard Merz
10. Data Warehouse Design Solutions by Christopher Adamson and Michael Venerable
11. The Data Warehouse Lifecycle Toolkit by Ralph Kimball, Margy Ross, Warren Thornthwaite, and Joy Mundy
12. Building a Data Warehouse: with Examples on SQL Server by Vincent Rainardi
13. Oracle Data Warehousing and Business Intelligence Solutions: With Business Intelligence Solutions by Robert Stackowiak, Joseph Rayman, and Rick Greenwald
14. Impossible Data Warehouse Situations: Solutions from the Experts (Information Technology) by Sid Adelman, Joyce Bischoff, Jill Dyché, and Douglas Hackney
15. Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance by Christopher Adamson
16. Data Warehouse Performance by W. H. Inmon, Ken Rudin, Christopher K. Buss, and Ryan Sousa
17. Building the Operational Data Store by W. H. Inmon, Claudia Imhoff, and Greg Battas
18. Rapid Data Warehouse Design: User-Focused Techniques for Designing Dimensional Data Warehouses by Lawrence Corr
19. Data Warehouse Design: Modern Principles and Methodologies by Matteo Golfarelli and Stefano Rizzi
20. Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications (Data-centric Systems and Applications) by Elzbieta Malinowski and Esteban Zimányi
21. Designing a Data Warehouse – Supporting Customer Relationship Management by Chris Todman
22. Data Warehouses and OLAP: Concepts, Architectures and Solutions by Robert Wrembel and Christian Koncilia
23. Implementing a Data Warehouse: A Methodology That Worked by Bruce Russel Ullrey
24. Data Warehousing for Dummies by Thomas C. Hammergren
25. Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits by Larry P. English
26. Data Warehouse 100 Success Secrets – 100 Most Asked Questions on Data Warehouse Design, Projects, Business Intelligence, Architecture, Software and Models by Richard Martin
27. Data Warehouse Project Management by Sid Adelman and Larissa T. Moss
28. Data Warehouse Management Handbook by Kachur
29. Data Warehouse: Extract, Transform, Load, Metadata, Data Integration, Data Mining, Data Warehouse Appliance, Database Management System, Decision Support System by Frederic P. Miller, Agnes F. Vandome, and John McBrewster
30. Oracle Data Warehouse Tuning for 10g by Gavin JT Powell
31. Using the Data Warehouse by W. H. Inmon and Richard D. Hackathorn
32. Entity-attribute-value model: Data model, Data warehouse, Denormalization, Attribute-value system, Linked Data, Resource Description Framework, Semantic Web, Inner-platform effect by Frederic P. Miller, Agnes F. Vandome, and John McBrewster
33. Index Structures for Data Warehouses: v. 1859 (Lecture Notes in Computer Science) by Marcus Jürgens
34. Tivoli Data Warehouse Version 1.3: Planning And Implementation by IBM Redbooks and Vasfi Gucer
35. Data Warehouse Implementations: Critical Implementation Factors Study by Joe Ganczarski
36. The Enterprise Data Warehouse: Planning, Building and Implementation v. 1 by Eric Sperley and Hewlett-Packard
37. Data Warehousing in the Real World: A Step-by-step Guide for Building Decision Support Data Warehouses by S. Anahory and D. Murray
38. Filtering the Web to Feed Data Warehouses by Witold Abramowicz, Pawel J. Kalczynski, and Krzysztof Wecel
39. Data Warehouse: Practical Advice from the Experts by Joyce Bischoff and Ted Alexander
40. Leveraging DB2 Data Warehouse Edition for Business Intelligence by IBM Redbooks
41. Fundamentals of Data Warehouses by Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and P. Vassiliadis
42. Web-enabled Data Warehouse by William A. Giovinazzo
43. Decision Support and Data Warehouse Systems by Efrem G. Mallach
44. Planning and Designing the Data Warehouse (The Data Warehousing Institute series) by Ramon Barquin and Herb Edelstein
45. Data Warehouse Design by William A. Giovinazzo
46. Building, Using and Managing the Data Warehouse (Data Warehousing Institute) by Ramon Barquin and Herb Edelstein
47. Building a Data Warehouse for Decision Support by Vidette Poe and Laura L. Reeves
48. Parallel Systems in the Data Warehouse (Data Warehousing Institute) by Steve Morse and David Isaac
49. Decision Support in the Data Warehouse (The Data Warehousing Institute series) by Hugh J. Watson and Paul Gray
50. Building a Better Data Warehouse by Don Meyer and Casey E. Cannon
51. The Data Model Resource Book: A Library of Logical Data and Data Warehouse Models by Len Silverston, W. H. Inmon, and Kent Graziano
52. Managing the Data Warehouse: Practical Techniques for Monitoring Operations and Performance, Administering Data and Tools by W. H. Inmon, J. D. Welch, and Katherine L. Glassey
53. The Intranet Data Warehouse: Tools and Techniques for Building Intranet-enabled Data Warehouse by Richard Tanler
54. Oracle 10g Data Warehousing by Lilian Hobbs, Susan Hillson, Shilpa Lawande, and Pete Smith
55. Oracle9iR2 Data Warehousing by Lilian Hobbs, Susan Hillson, and Shilpa Lawande
56. Oracle8i Data Warehousing by Lilian Hobbs and Susan Hillson
57. Oracle8i Data Warehousing by Michael J. Corey, Michael Abbey, Ben Taub, and Ian Abramson
58. Tera-Tom on Teradata Basics by Tom Coffing and Gareth Walter
59. Tera-Tom on Teradata Physical Implementation by W. Coffing and Mark Ferguson
60. Tera-Tom on Teradata SQL by Tom Coffing and Robert Hines
61. Tera-Tom on Teradata Database Administrator by Tom Coffing and Steve Wilmes
62. Tera-Tom on Teradata Designer by Tom Coffing and Todd Wilson
63. Tera-Tom on Teradata Application Development by Tom Coffing and Scott Smith
64. Tera-Tom on Teradata E-Business by Randy Volters and Tom Coffing
65. Teradata SQL Unleash the Power V2R6 by Thomas L. Coffing and Michael Larkins
66. Teradata Utilities – Breaking the Barriers by Tom Coffing, Morgan Jones, Mike Larkins, Steve Wilmes, and Randy Volters
67. Netezza SQL – Harness the Power by Mike Larkins and Tom Coffing
68. Netezza Underground: The unauthorized tales of derring-do and adventures in resilient data warehousing solutions by David Birmingham
69. Teradata Users Guide: The Ultimate Companion by Tom Coffing, Leona Coffing, Chris Coffing, and Robert Hines
70. Teradata SQL Quick Reference Guide – Simplicity By Design by Tom Coffing, Todd Carroll, Robert Hines, and Mike Larkins
71. Secrets of Best Data Warehouses in the World by Rob Armstrong, Tom Coffing, and Rolf Hanusa
72. Common Warehouse Metamodel: An Introduction to the Standard for Data Warehouse Integration (OMG) by John Poole, Dan Chang, Douglas Tolbert, and David Mellor
73. 50 TB Data Warehouse Benchmark on IBM System z by IBM Redbooks
74. E-Business Intelligence Front-End Tool Access to OS/390 Data Warehouse by IBM Redbooks
75. Rdb/VMS: Developing a Data Warehouse by William H. Inmon and Chuck Kelley
76. Data Warehouses: More Than Just Mining by Barbara J. Bashein and M. Lynne Markus
77. Corporate Information with SAP-EIS: Building a Data Warehouse and MIS-Application (Efficient business-computing) by Bernd-Ulrich Kaiser
78. Dimensional Data Warehousing with MySQL: A Tutorial by Djoni Darmawikarta
79. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals by Paulraj Ponniah
80. Data Warehousing, Data Mining, and OLAP (Data Warehousing/Data Management) by Alex Berson and Stephen J. Smith
81. Data Warehousing: Architecture and Implementation by Mark W. Humphries, Michael W. Hawkins, and Michelle C. Dy
82. Data Warehousing 101: Concepts and Implementation by Arshad Khan
83. Agile Data Warehousing: Delivering World-Class Business Intelligence Systems Using Scrum and XP by Ralph Hughes
84. e-Data: Turning Data Into Information With Data Warehousing by Jill Dyché
85. Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL by Roland Bouman and Jos van Dongen
86. A Manager’s Guide to Data Warehousing by Laura Reeves
87. Data Warehousing with SAP BW 7 BI in SAP NetWeaver 2004s: Architecture, Concepts, and Implementation by Christian Mehrwald and Sabine Morlock
88. Data Warehousing: Using the Wal-Mart Model (The Morgan Kaufmann Series in Data Management Systems) by Paul Westerman
89. Oracle DBA Guide to Data Warehousing and Star Schemas by Bert Scalzo
90. Building and Maintaining a Data Warehouse by Fon Silvers
91. Evolving Application Domains of Data Warehousing and Mining: Trends and Solutions by Pedro Nuno San-Bento Furtado
92. Data Warehousing And Business Intelligence For e-Commerce (The Morgan Kaufmann Series in Data Management Systems) by Alan R. Simon and Steven L. Shaffer
93. Data Warehousing with Informix: Best Practices by Angela Sanchez
94. Data Warehousing: Concepts, Technologies, Implementations, and Management by Harry Singh
95. Data Warehousing in Action by Sean Kelly
96. High Performance Oracle Data Warehousing: All You Need to Master Professional Database Development Using Oracle by Donald K. Burleson
97. Implementing Enterprise Data Warehousing: A Guide for Executives by Alan Schlukbier
98. Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development: Innovative Methods and Applications (Advances in Data Warehousing and Mining (ADWM) Book Series) by Tho Manh Nguyen
99. New Trends in Data Warehousing and Data Analysis (Annals of Information Systems) by Stanislaw Kozielski and Robert Wrembel
100. Data Warehousing with Service-oriented Architecture: Designing and Implementing Prototype Models For an Integration of Near-Real-Time Data Warehousing Architecture with Service-oriented Architecture by Ronnie Abrahiem
101. Encyclopedia of Data Warehousing and Mining, Second Edition by John Wang
102. IBM Data Warehousing: With IBM Business Intelligence Tools by Michael L. Gonzales
103. Clickstream Data Warehousing by Mark Sweiger, Mark R. Madsen, Jimmy Langston, and Howard Lombard
104. Intelligent Data Warehousing: From Data Preparation to Data Mining by Zhengxin Chen
105. Data Stores, Data Warehousing, and the Zachman Framework: Managing Enterprise Knowledge (McGraw-Hill Series on Data Warehousing and Data Management) by William H. Inmon, John A. Zachman, and Jonathan G. Geiger
106. Progressive Methods in Data Warehousing and Business Intelligence: Concepts and Competitive Analytics (Advances in Data Warehousing and Mining) by David Taniar
107. Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas
108. AS/400 Data Warehousing: The Complete Guide to Implementation by Brian W. Kelly
109. Data Warehousing and Data Mining for Telecommunications (Artech House Computer Science Library) by Rob Mattison
110. Data Warehousing: Design, Development and Best Practices by Soumendra Mohanty
111. Exploration Warehousing: Turning Business Information into Business Opportunity by William H. Inmon, R. H. Terdeman, and Claudia Imhoff
112. The Data Model Resource Book: A Library of Logical Data and Data Warehouse Designs by Len Silverston, William H. Inmon, and Kent Graziano
113. Data Warehousing in the Real World (A Practical Guide for Building Decision Support Systems) by Dennis Murray and Sam Anahory
114. Parallel Processing Techniques for Data Warehousing and Mining: Application and Challenges by Satchidananda Dehuri
115. Essential Oracle8i Data Warehousing: Designing, Building, and Managing Oracle Data Warehouses by Gary Dodge and Tim Gorman
116. The Essential Guide to Data Warehousing by Lou Agosta
117. Data Warehousing OLAP and Data Mining by S. Nagabhushana
118. Building the Customer-Centric Enterprise: Data Warehousing Techniques for Supporting Customer Relationship Management by Claudia Imhoff, Lisa Loftis, and Jonathan G. Geiger
119. Data Warehousing: The Ultimate Guide to Building Corporate Business Intelligence (HOTT Guide) by SCN Education B.V.
120. Data Warehousing and Knowledge Discovery: 9th International Conference, DaWaK 2007, Regensburg, Germany, September 3-7, 2007, Proceedings (Lecture Notes … Applications, incl. Internet/Web, and HCI) by Il Yeol Song, Johann Eder, and Tho Manh Nguyen
121. Clinical Data Mining and Warehousing, An Issue of Clinics in Laboratory Medicine (The Clinics: Internal Medicine) by James Harrison Jr. MD PhD
122. Using data warehousing to deliver integrated management information: Case studies of customer data integration using sales and marketing data marts by Shana Ponelis
123. Data Warehousing and Knowledge Discovery: 6th International Conference, DaWaK 2004, Zaragoza, Spain, September 1-3, 2004, Proceedings (Lecture Notes in Computer Science) by Yahiko Kambayashi, Mukesh Mohania, and Wolfram Wöß
124. Strategic Data Warehousing: Achieving Alignment with Business by Neera Bhansali
125. Strategic Data Warehousing Principles Using SAS Software by Peter R. Welbrock
126. Data Warehousing: The Route to Mass Communication by Sean Kelly
127. Data Warehousing for E-Business by R. H. Terdeman, Joyce Norris-Montanari, Dan Meers, and William H. Inmon
128. Data Warehousing and Knowledge Discovery: 10th International Conference, DaWaK 2008, Turin, Italy, September 1-5, 2008, Proceedings (Lecture Notes in Computer … Applications, incl. Internet/Web, and HCI) by Il-Yeol Song, Johann Eder, and Tho Manh Nguyen
129. Data Warehousing and Data Mining Techniques for Cyber Security (Advances in Information Security) by Anoop Singhal
130. Data Warehousing and Decision Support: The State of the Art, Volume 1 by Pam Roth (Volume 2 is also available)
131. Advances in Database Technologies: ER ’98 Workshops on Data Warehousing and Data Mining, Mobile Data Access, and Collaborative Work Support and Spatio-Temporal … (Lecture Notes in Computer Science) by Yahiko Kambayashi, Dik Lun Lee, Ee-Peng Lim, and Mukesh Kumar Mohania
132. Data Warehousing and Web Engineering by Shirley A. Becker
133. ERP and Data Warehousing in Organizations: Issues and Challenges by Gerald G. Grant
134. Data Warehousing and Knowledge Discovery: 8th International Conference, DaWaK 2006, Krakow, Poland, September 4-8, 2006, Proceedings (Lecture Notes in … Applications, incl. Internet/Web, and HCI) by A Min Tjoa and Juan Trujillo
135. Data Warehousing Advice for Managers by Patricia L. Ferdinandi
136. Data Warehousing and the Management Accountant (CIMA Research) by Ian Cobb
137. Data Warehousing and Knowledge Discovery: 4th International Conference, DaWaK 2002, Aix-en-Provence, France, September 4-6, 2002, Proceedings (Lecture Notes in Computer Science) by Yahiko Kambayashi, Werner Winiwarter, and Masatoshi Arikawa
138. Oracle Data Warehousing Unleashed by Michael Schrader, John Dakin, Kieron Hardy, and Matthew Townsend
139. Journal of Healthcare Information Management, E-Healthcare Data Warehousing Journal of Healthcare Information Management, No. 2: Journal of Healthcare … Health Care Information Mgmt) by Julie Foreman
140. Worldwide Data Warehousing Tools 2004 Vendor Shares by Dan Vesset
141. Constructing Data Warehouses with Metadata-driven Generic Operators by Bin Jiang
142. Testing the Data Warehouse Practicum by Doug Vucevic and Wayne Yaddow
Notes:
- You may think that a “data warehouse” search on Amazon would also include “data warehousing”. That was what I was thinking. But sadly no. Nor do I expect Amazon search to be smart enough to recognise that the terms “ETL” and “Dimensional Model” have a lot to do with data warehousing, hence my motive for creating this list. The same goes for the terms “ODS” and “data mart”.
- A data warehouse book as important as Inmon’s DW 2.0 was missed because its title doesn’t contain “Warehous*”. Sad. And Data Warehousing 101: Concepts and Implementation by Arshad Khan was missed when we searched for “Data Warehouse” on Amazon.
- I don’t limit myself to SQL Server. As you can see, I also include Oracle books. We can learn a lot about data warehousing from other platforms, particularly about ETL. In fact I learnt a lot from a book called “Oracle 8i Data Warehousing” (Corey et al., not Hobbs & Hillson). Informix, DB2, MySQL, AS/400 and SAS are all in there now.
- I don’t include a data modelling book in the list if it’s a general one. I only include it if it’s about dimensional modelling.
- I don’t include “bundles”, i.e. several books packaged and sold as one. An example of a bundle is Kimball’s Toolkit bundle. This is because I have already included the components individually.
- I don’t include a data mining book if it’s only about data mining. But if it covers data warehousing as well, then I include it; see Alex Berson’s, for example. Ditto for MDM, BI, OLAP, DQ and Text Analytics. I do include Decision Support though (well, of course).
- Can you believe there are 123 books on data warehousing! That’s a lot of books for one area of study and work. And that excludes the things I mentioned above.
- If there are many editions of a book (like Inmon’s classic) I only include the latest one. The first edition is sometimes an absolute treasure, like Kimball’s 1996 one, but there you go. When it’s a rewrite using a different version of the software, I include each of them. For example: Oracle 8i, 9i and 10g Data Warehousing.
- I do include conference proceedings and lecture notes, even though some people say they are not ‘real books’. I don’t care about the physical form (thin, thick, non-paper, etc.), as long as the content is about warehousing.
- Apologies: there are many DW books in German which I don’t include here, primarily because this is an English blog and I can’t write in German. Perhaps somebody else could make a list of these German DW books (there really are a lot of them; check on Amazon).
- I know there is a Data Warehousing book on MySQL. I know it exists because I know the author, Djoni Darmawikarta, who is also from Indonesia like me but now lives in Canada. So I’ll find it and put it here too.
- I own Barry Devlin’s warehousing book. It is very old and the binding is almost coming off, but the content is illuminating, primarily because it was written free from Inmon and Kimball influence and hence defines its own principles of design. I’ll add it here.
- Intelligent Solutions compiled a comprehensive list of data warehousing articles from 1993 to 2006.
My Book
People who want to learn data warehousing sometimes ask me to recommend a book. Some of them are database administrators or data architects (on various platforms) and some are developers (application developers and database developers). They know how to write SQL, create tables and query data. They are looking for a basic data warehousing book that is practical and aimed at beginners: a book that new starters can use to build their first data warehouse and the BI on top of it; a book that covers all the essential topics, such as methodology, architecture, data modelling, ETL, data quality, reports, cubes and BI; and a book with easy-to-understand examples and illustrations from real projects. For this reason I wrote a data warehousing book: Building a Data Warehouse: with Examples on SQL Server (#12).
It has 17 chapters:
- Chapter 1 is about what a data warehouse is
- Chapter 2 is about data warehouse architecture
- Chapter 3 is about methodology / project management
- Chapter 4 is about gathering requirements
- Chapter 5 is about designing the data model, both dimensional and normalised
- Chapter 6 is about the system architecture/servers and configuring the databases
- Chapter 7 is about ETL (extracting data from source systems)
- Chapter 8 is also about ETL (loading data into the warehouse)
- Chapter 9 is about data quality
- Chapter 10 is about metadata
- Chapter 11 is about reports
- Chapter 12 is about OLAP cubes
- Chapter 13 is about BI (Business Intelligence)
- Chapter 14 is about using a data warehouse for CRM
- Chapter 15 is about unstructured data and data warehousing search
- Chapter 16 is about testing
- Chapter 17 is about operation and administration
It covers all the essential topics in data warehousing. So that readers can use this book to build their first data warehouse and the BI on top of it, I needed to provide a case study with examples spanning all of those chapters, from designing the architecture to building the cubes and reports. For this I had to choose a platform, and I chose SQL Server. Not only does it have an excellent database engine, it also comes with built-in ETL, reporting, OLAP cube and data mining tools. SQL Server 2005/2008 is a complete end-to-end data warehousing solution. So in chapter 6 I use the SQL Server database engine to create the databases. In chapters 7 and 8 I use SSIS for data extraction and data loading (ETL). In chapter 10 I use a SQL Server database for metadata. In chapter 11 I use SSRS for reports. In chapter 12 I use SSAS for OLAP cubes. And in chapter 13 I use SSAS for data mining. I hope this book serves its purpose as a basic data warehousing book that is practical and aimed at beginners.