Data Mining and Predictive Analytics
Wiley Series on Methods and Applications

2nd edition, April 2015
824 pages, hardcover
Practitioner's book
Learn methods of data analysis and their application to real-world data sets
This updated second edition serves as an introduction to data mining methods and models, including association rules, clustering, neural networks, logistic regression, and multivariate analysis. The authors apply a unified "white box" approach to data mining methods and models. This approach walks readers through the operations and nuances of the various methods using small data sets, so that readers gain insight into the inner workings of the method under review. Chapters provide hands-on analysis problems that give readers an opportunity to apply their newly acquired data mining expertise to real problems using large, real-world data sets.
Data Mining and Predictive Analytics, Second Edition:
* Offers comprehensive coverage of association rules, clustering, neural networks, logistic regression, multivariate analysis, and the R statistical programming language
* Features over 750 chapter exercises, allowing readers to assess their understanding of the new material
* Provides a detailed case study that brings together the lessons learned in the book
* Includes access to the companion website, www.dataminingconsultant, with exclusive password-protected instructor content
Data Mining and Predictive Analytics, Second Edition will appeal to computer science and statistics students, as well as students in MBA programs and chief executives.
ACKNOWLEDGMENTS
PART I DATA PREPARATION
CHAPTER 1 AN INTRODUCTION TO DATA MINING AND PREDICTIVE ANALYTICS
CHAPTER 2 DATA PREPROCESSING
CHAPTER 3 EXPLORATORY DATA ANALYSIS
CHAPTER 4 DIMENSION-REDUCTION METHODS
PART II STATISTICAL ANALYSIS
CHAPTER 5 UNIVARIATE STATISTICAL ANALYSIS
CHAPTER 6 MULTIVARIATE STATISTICS
CHAPTER 7 PREPARING TO MODEL THE DATA
CHAPTER 8 SIMPLE LINEAR REGRESSION
CHAPTER 9 MULTIPLE REGRESSION AND MODEL BUILDING
PART III CLASSIFICATION
CHAPTER 10 k-NEAREST NEIGHBOR ALGORITHM
CHAPTER 11 DECISION TREES
CHAPTER 12 NEURAL NETWORKS
CHAPTER 13 LOGISTIC REGRESSION
CHAPTER 14 NAÏVE BAYES AND BAYESIAN NETWORKS
CHAPTER 15 MODEL EVALUATION TECHNIQUES
CHAPTER 16 COST-BENEFIT ANALYSIS USING DATA-DRIVEN COSTS
CHAPTER 17 COST-BENEFIT ANALYSIS FOR TRINARY AND k-NARY CLASSIFICATION MODELS
CHAPTER 18 GRAPHICAL EVALUATION OF CLASSIFICATION MODELS
PART IV CLUSTERING
CHAPTER 19 HIERARCHICAL AND k-MEANS CLUSTERING
CHAPTER 20 KOHONEN NETWORKS
CHAPTER 21 BIRCH CLUSTERING
CHAPTER 22 MEASURING CLUSTER GOODNESS
PART V ASSOCIATION RULES
CHAPTER 23 ASSOCIATION RULES
PART VI ENHANCING MODEL PERFORMANCE
CHAPTER 24 SEGMENTATION MODELS
CHAPTER 25 ENSEMBLE METHODS: BAGGING AND BOOSTING
CHAPTER 26 MODEL VOTING AND PROPENSITY AVERAGING
PART VII FURTHER TOPICS
CHAPTER 27 GENETIC ALGORITHMS
CHAPTER 28 IMPUTATION OF MISSING DATA
PART VIII CASE STUDY: PREDICTING RESPONSE TO DIRECT-MAIL MARKETING
CHAPTER 29 CASE STUDY, PART 1: BUSINESS UNDERSTANDING, DATA PREPARATION, AND EDA
CHAPTER 30 CASE STUDY, PART 2: CLUSTERING AND PRINCIPAL COMPONENTS ANALYSIS
CHAPTER 31 CASE STUDY, PART 3: MODELING AND EVALUATION FOR PERFORMANCE AND INTERPRETABILITY
CHAPTER 32 CASE STUDY, PART 4: MODELING AND EVALUATION FOR HIGH PERFORMANCE ONLY
APPENDIX A DATA SUMMARIZATION AND VISUALIZATION
Part 1: Summarization 1: Building Blocks of Data Analysis
Part 2: Visualization: Graphs and Tables for Summarizing and Organizing Data
Part 3: Summarization 2: Measures of Center, Variability, and Position
Part 4: Summarization and Visualization of Bivariate Relationships
INDEX
Chantal D. Larose is a Ph.D. candidate in Statistics at the University of Connecticut. Her research focuses on the imputation of missing data and model-based clustering. She has taught undergraduate statistics since 2011 and is a statistical consultant for DataMiningConsultant.com, LLC.