Data Mining and Predictive Analytics

Larose, Daniel T. / Larose, Chantal D.

Wiley Series on Methods and Applications

2nd edition, April 2015
824 pages, hardcover
Practitioner's book

ISBN: 978-1-118-11619-7

John Wiley & Sons

Learn methods of data analysis and their application to real-world data sets

This updated second edition serves as an introduction to data mining methods and models, including association rules, clustering, neural networks, logistic regression, and multivariate analysis. The authors apply a unified "white box" approach: they walk readers through the operations and nuances of each method on small data sets, so that readers gain insight into the inner workings of the method under review. Chapters also provide hands-on analysis problems, giving readers an opportunity to apply their newly acquired data mining expertise to real problems using large, real-world data sets.

Data Mining and Predictive Analytics, Second Edition:

* Offers comprehensive coverage of association rules, clustering, neural networks, logistic regression, multivariate analysis, and the R statistical programming language
* Features over 750 chapter exercises, allowing readers to assess their understanding of the new material
* Provides a detailed case study that brings together the lessons learned in the book
* Includes access to the companion website, www.dataminingconsultant, with exclusive password-protected instructor content

Data Mining and Predictive Analytics, Second Edition will appeal to computer science and statistics students, as well as to students in MBA programs and chief executives.

PREFACE xxi

ACKNOWLEDGMENTS xxix

PART I DATA PREPARATION 1

CHAPTER 1 AN INTRODUCTION TO DATA MINING AND PREDICTIVE ANALYTICS 3

CHAPTER 2 DATA PREPROCESSING 20

CHAPTER 3 EXPLORATORY DATA ANALYSIS 54

CHAPTER 4 DIMENSION-REDUCTION METHODS 92

PART II STATISTICAL ANALYSIS 129

CHAPTER 5 UNIVARIATE STATISTICAL ANALYSIS 131

CHAPTER 6 MULTIVARIATE STATISTICS 148

CHAPTER 7 PREPARING TO MODEL THE DATA 160

CHAPTER 8 SIMPLE LINEAR REGRESSION 171

CHAPTER 9 MULTIPLE REGRESSION AND MODEL BUILDING 236

PART III CLASSIFICATION 299

CHAPTER 10 k-NEAREST NEIGHBOR ALGORITHM 301

CHAPTER 11 DECISION TREES 317

CHAPTER 12 NEURAL NETWORKS 339

CHAPTER 13 LOGISTIC REGRESSION 359

CHAPTER 14 NAÏVE BAYES AND BAYESIAN NETWORKS 414

CHAPTER 15 MODEL EVALUATION TECHNIQUES 451

CHAPTER 16 COST-BENEFIT ANALYSIS USING DATA-DRIVEN COSTS 471

CHAPTER 17 COST-BENEFIT ANALYSIS FOR TRINARY AND k-NARY CLASSIFICATION MODELS 491

CHAPTER 18 GRAPHICAL EVALUATION OF CLASSIFICATION MODELS 510

PART IV CLUSTERING 521

CHAPTER 19 HIERARCHICAL AND k-MEANS CLUSTERING 523

CHAPTER 20 KOHONEN NETWORKS 542

CHAPTER 21 BIRCH CLUSTERING 560

CHAPTER 22 MEASURING CLUSTER GOODNESS 582

PART V ASSOCIATION RULES 601

CHAPTER 23 ASSOCIATION RULES 603

PART VI ENHANCING MODEL PERFORMANCE 623

CHAPTER 24 SEGMENTATION MODELS 625

CHAPTER 25 ENSEMBLE METHODS: BAGGING AND BOOSTING 637

CHAPTER 26 MODEL VOTING AND PROPENSITY AVERAGING 653

PART VII FURTHER TOPICS 669

CHAPTER 27 GENETIC ALGORITHMS 671

CHAPTER 28 IMPUTATION OF MISSING DATA 695

PART VIII CASE STUDY: PREDICTING RESPONSE TO DIRECT-MAIL MARKETING 705

CHAPTER 29 CASE STUDY, PART 1: BUSINESS UNDERSTANDING, DATA PREPARATION, AND EDA 707

CHAPTER 30 CASE STUDY, PART 2: CLUSTERING AND PRINCIPAL COMPONENTS ANALYSIS 732

CHAPTER 31 CASE STUDY, PART 3: MODELING AND EVALUATION FOR PERFORMANCE AND INTERPRETABILITY 749

CHAPTER 32 CASE STUDY, PART 4: MODELING AND EVALUATION FOR HIGH PERFORMANCE ONLY 762

APPENDIX A DATA SUMMARIZATION AND VISUALIZATION 768

Part 1: Summarization 1: Building Blocks of Data Analysis 768

Part 2: Visualization: Graphs and Tables for Summarizing and Organizing Data 770

Part 3: Summarization 2: Measures of Center, Variability, and Position 774

Part 4: Summarization and Visualization of Bivariate Relationships 777

INDEX 781

Daniel T. Larose is Professor of Mathematical Sciences and Director of the Data Mining programs at Central Connecticut State University. He has published several books, including Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage (Wiley, 2007) and Discovering Knowledge in Data: An Introduction to Data Mining (Wiley, 2005). In addition to his scholarly work, Dr. Larose is a consultant in data mining and statistical analysis who works with many high-profile clients, including Microsoft, Forbes Magazine, the CIT Group, KPMG International, Computer Associates, and Deloitte, Inc.

Chantal D. Larose is a Ph.D. candidate in Statistics at the University of Connecticut. Her research focuses on the imputation of missing data and model-based clustering. She has taught undergraduate statistics since 2011, and is a statistical consultant for DataMiningConsultant.com, LLC.

D. T. Larose, Department of Mathematical Sciences and Director of Data Mining@CCSU, Central Connecticut State University