Multivariate Analysis

Mardia, Kanti V. / Kent, John T. / Taylor, Charles C.

Wiley Series in Probability and Statistics

2nd Edition, July 2024
592 Pages, Hardcover
Textbook

ISBN: 978-1-118-73802-3

John Wiley & Sons

Comprehensive Reference Work on Multivariate Analysis and its Applications

The first edition of this book, by Mardia, Kent and Bibby, has been used globally for over 40 years. This second edition brings many topics up to date, with a special emphasis on recent developments.

A wide range of material in multivariate analysis is covered, including the classical themes of multivariate normal theory, multivariate regression, inference, multidimensional scaling, factor analysis, cluster analysis and principal component analysis. The book now also covers modern developments such as graphical models, robust estimation, statistical learning, and high-dimensional methods. The book expertly blends theory and application, providing numerous worked examples and exercises at the end of each chapter.

The reader is assumed to have a basic knowledge of mathematical statistics at an undergraduate level, together with an elementary understanding of linear algebra. Appendices provide a background in matrix algebra, a summary of univariate statistics, a collection of statistical tables and a discussion of computational aspects.

The work includes coverage of:
* Basic properties of random vectors, copulas, normal distribution theory, and estimation
* Hypothesis testing, multivariate regression, and analysis of variance
* Principal component analysis, factor analysis, and canonical correlation analysis
* Discriminant analysis, cluster analysis, and multidimensional scaling
* New advances and techniques, including supervised and unsupervised statistical learning, graphical models and regularization methods for high-dimensional data

Although primarily designed as a textbook for final year undergraduates and postgraduate students in mathematics and statistics, the book will also be of interest to research workers and applied scientists.

Epigraph

Preface xiii

Preface to the Second Edition xiii

Preface to the First Edition xv

Acknowledgements from First Edition xviii

Notation, abbreviations and key ideas xix

1 Introduction 1

1.1 Objects and Variables 1

1.2 Some Multivariate Problems and Techniques 2

1.2.1 Generalizations of univariate techniques 2

1.2.2 Dependence and regression 2

1.2.3 Linear combinations 2

1.2.4 Assignment and dissection 5

1.2.5 Building configurations 6

1.3 The Data Matrix 6

1.4 Summary Statistics 7

1.4.1 The mean vector and covariance matrix 8

1.4.2 Measures of multivariate scatter 11

1.5 Linear Combinations 11

1.5.1 The scaling transformation 12

1.5.2 Mahalanobis transformation 12

1.5.3 Principal component transformation 12

1.6 Geometrical Ideas 13

1.7 Graphical Representation 14

1.7.1 Univariate scatters 14

1.7.2 Bivariate scatters 16

1.7.3 Harmonic curves 16

1.7.4 Parallel coordinates plot 19

1.8 Measures of Multivariate Skewness and Kurtosis 19

2 Basic Properties of Random Vectors 27

2.1 Cumulative Distribution Functions and Probability Density Functions 27

2.2 Population Moments 29

2.2.1 Expectation and correlation 29

2.2.2 Population mean vector and covariance matrix 29

2.2.3 Mahalanobis space 31

2.2.4 Higher moments 31

2.2.5 Conditional moments 33

2.3 Characteristic Functions 33

2.4 Transformations 35

2.5 The Multivariate Normal Distribution 36

2.5.1 Definition 36

2.5.2 Geometry 38

2.5.3 Properties 38

2.5.4 Singular multivariate normal distribution 43

2.5.5 The matrix normal distribution 44

2.6 Random Samples 45

2.7 Limit Theorems 47

3 Non-normal Distributions 53

3.1 Introduction 53

3.2 Some Multivariate Generalizations of Univariate Distributions 53

3.2.1 Direct generalizations 53

3.2.2 Common components 54

3.2.3 Stochastic generalizations 55

3.3 Families of Distributions 56

3.3.1 The exponential family 56

3.3.2 The spherical family 57

3.3.3 Elliptical distributions 60

3.3.4 Stable distributions 62

3.4 Insights into skewness and kurtosis 62

3.5 Copulas 64

3.5.1 The Gaussian Copula 66

3.5.2 The Clayton-Mardia copula 67

3.5.3 Archimedean Copulas 68

3.5.4 Fréchet-Höffding Bounds 69

4 Normal Distribution Theory 77

4.1 Characterization and Properties 77

4.1.1 The central role of multivariate normal theory 77

4.1.2 A definition by characterization 78

4.2 Linear Forms 79

4.3 Transformations of Normal Data Matrices 81

4.4 The Wishart Distribution 83

4.4.1 Introduction 83

4.4.2 Properties of Wishart matrices 83

4.4.3 Partitioned Wishart matrices 86

4.5 The Hotelling T² Distribution 89

4.6 Mahalanobis Distance 92

4.6.1 The two-sample Hotelling T² statistic 92

4.6.2 A decomposition of Mahalanobis distance 93

4.7 Statistics Based on the Wishart Distribution 95

4.8 Other Distributions Related to the Multivariate Normal 99

5 Estimation 111

5.1 Likelihood and Sufficiency 111

5.1.1 The likelihood function 111

5.1.2 Efficient scores and Fisher's information 112

5.1.3 The Cramér-Rao lower bound 114

5.1.4 Sufficiency 115

5.2 Maximum Likelihood Estimation 116

5.2.1 General case 116

5.2.2 Multivariate normal case 117

5.2.3 Matrix normal distribution 122

5.3 Robust Estimation of Location and Dispersion for Multivariate Distributions 123

5.3.1 M-Estimates of location 123

5.3.2 Minimum covariance determinant 124

5.3.3 Multivariate trimming 124

5.3.4 Stahel-Donoho estimator 125

5.3.5 Minimum volume estimator 125

5.3.6 Tyler's estimate of scatter 127

5.4 Bayesian inference 127

6 Hypothesis Testing 137

6.1 Introduction 137

6.2 The Techniques Introduced 139

6.2.1 The likelihood ratio test (LRT) 139

6.2.2 The union intersection test (UIT) 143

6.3 The Techniques Further Illustrated 146

6.3.1 One-sample hypotheses on μ 146

6.3.2 One-sample hypotheses on Σ 148

6.3.3 Multi-sample hypotheses 152

6.4 Simultaneous Confidence Intervals 156

6.4.1 The one-sample Hotelling T² case 156

6.4.2 The two-sample Hotelling T² case 157

6.4.3 Other examples 157

6.5 The Behrens-Fisher Problem 157

6.6 Multivariate Hypothesis Testing: Some General Points 158

6.7 Non-normal Data 159

6.8 Mardia's Non-parametric Test for the Bivariate Two-sample Problem 162

7 Multivariate Regression Analysis 169

7.1 Introduction 169

7.2 Maximum Likelihood Estimation 170

7.2.1 Maximum likelihood estimators for B and Σ 170

7.2.2 The distribution of B̂ and Σ̂ 172

7.3 The General Linear Hypothesis 173

7.3.1 The likelihood ratio test (LRT) 173

7.3.2 The union intersection test (UIT) 175

7.3.3 Simultaneous confidence intervals 175

7.4 Design Matrices of Degenerate Rank 176

7.5 Multiple Correlation 178

7.5.1 The effect of the mean 178

7.5.2 Multiple correlation coefficient 178

7.5.3 Partial correlation coefficient 180

7.5.4 Measures of correlation between vectors 181

7.6 Least Squares Estimation 182

7.6.1 Ordinary least squares (OLS) estimation 182

7.6.2 Generalized least squares 183

7.6.3 Application to multivariate regression 183

7.6.4 Asymptotic consistency of least squares estimators 184

7.7 Discarding of Variables 184

7.7.1 Dependence analysis 184

7.7.2 Interdependence analysis 186

8 Graphical Models 195

8.1 Introduction 195

8.2 Graphs and Conditional independence 196

8.3 Gaussian Graphical Models 201

8.3.1 Estimation 202

8.3.2 Model selection 207

8.4 Log-linear Graphical Models 208

8.4.1 Notation 209

8.4.2 Log-linear models 210

8.4.3 Log-linear models with a graphical interpretation 213

8.5 Directed and Mixed Graphs 215

9 Principal Component Analysis 221

9.1 Introduction 221

9.2 Definition and Properties of Principal Components 221

9.2.1 Population principal components 221

9.2.2 Sample principal components 224

9.2.3 Further properties of principal components 225

9.2.4 Correlation structure 229

9.2.5 The effect of ignoring some components 229

9.2.6 Graphical representation of principal components 232

9.2.7 Biplots 232

9.3 Sampling Properties of Principal Components 236

9.3.1 Maximum likelihood estimation for normal data 236

9.3.2 Asymptotic distributions for normal data 239

9.4 Testing Hypotheses about Principal Components 242

9.4.1 Introduction 242

9.4.2 The hypothesis that (λ₁ + ⋯ + λₖ)/(λ₁ + ⋯ + λₚ) = ψ 244

9.4.3 The hypothesis that (p − k) eigenvalues of Σ are equal 245

9.4.4 Hypotheses concerning correlation matrices 246

9.5 Correspondence Analysis 247

9.5.1 Contingency tables 247

9.5.2 Gradient analysis 253

9.6 Allometry: the Measurement of Size and Shape 255

9.7 Discarding of variables 258

9.8 Principal Component Regression 259

9.9 Projection Pursuit and Independent Component Analysis 261

9.9.1 Projection pursuit 261

9.9.2 Independent component analysis 263

9.10 PCA in high dimensions 266

10 Factor Analysis 277

10.1 Introduction 277

10.2 The Factor Model 278

10.2.1 Definition 278

10.2.2 Scale invariance 279

10.2.3 Non-uniqueness of factor loadings 279

10.2.4 Estimation of the parameters in factor analysis 280

10.2.5 Use of the correlation matrix R in estimation 281

10.3 Principal Factor Analysis 282

10.4 Maximum Likelihood Factor Analysis 284

10.5 Goodness of Fit Test 287

10.6 Rotation of Factors 288

10.6.1 Interpretation of factors 288

10.6.2 Varimax rotation 289

10.7 Factor Scores 293

10.8 Relationships Between Factor Analysis and Principal Component Analysis 294

10.9 Analysis of Covariance Structures 295

11 Canonical Correlation Analysis 299

11.1 Introduction 299

11.2 Mathematical Development 300

11.2.1 Population canonical correlation analysis 300

11.2.2 Sample canonical correlation analysis 304

11.2.3 Sampling properties and tests 305

11.2.4 Scoring and prediction 306

11.3 Qualitative Data and Dummy Variables 307

11.4 Qualitative and Quantitative Data 309

12 Discriminant Analysis and Statistical Learning 317

12.1 Introduction 317

12.2 Bayes' Discriminant Rule 319

12.3 The error rate 320

12.3.1 Probabilities of misclassification 320

12.3.2 Estimation of error rate 323

12.3.3 Confusion matrix 324

12.4 Discrimination Using the Normal Distribution 324

12.4.1 Population discriminant rules 324

12.4.2 The sample discriminant rules 326

12.4.3 Is discrimination worthwhile? 334

12.5 Discarding of Variables 334

12.6 Fisher's Linear Discriminant Function 336

12.7 Nonparametric Distance-based Methods 339

12.7.1 Nearest neighbor classifier 339

12.7.2 Large sample behavior of the Nearest Neighbor Classifier 341

12.7.3 Kernel classifiers 344

12.8 Classification Trees 346

12.8.1 Splitting criteria 348

12.8.2 Pruning 351

12.9 Logistic Discrimination 354

12.9.1 Logistic regression model 354

12.9.2 Estimation and inference 356

12.9.3 Interpretation of the parameter estimates 356

12.9.4 Extensions 360

12.10 Neural Networks 360

12.10.1 Motivation 360

12.10.2 Multi-layer perceptron 361

12.10.3 Radial basis functions 363

12.10.4 Support Vector Machines 366

13 Multivariate Analysis of Variance 379

13.1 Introduction 379

13.2 Formulation of Multivariate One-way Classification 379

13.3 The Likelihood Ratio Principle 380

13.4 Testing Fixed Contrasts 382

13.5 Canonical Variables and a Test of Dimensionality 383

13.5.1 The problem 383

13.5.2 The LR test (Σ known) 383

13.5.3 Asymptotic distribution of the likelihood ratio criterion 385

13.5.4 The estimated plane 386

13.5.5 The LR test (unknown Σ) 387

13.5.6 The estimated plane (unknown Σ) 387

13.5.7 Profile analysis 393

13.6 The union intersection approach 394

13.7 Two-way Classification 395

14 Cluster Analysis and Unsupervised Learning 405

14.1 Introduction 405

14.2 Probabilistic membership models 406

14.3 Parametric mixture models 410

14.4 Partitioning Methods 412

14.5 Hierarchical Methods 418

14.5.1 Agglomerative algorithms 418

14.5.2 Minimum spanning tree and single linkage 421

14.5.3 Properties of different agglomerative algorithms 423

14.6 Distances and Similarities 425

14.6.1 Distances 425

14.6.2 Similarity coefficients 430

14.7 Grouped Data 432

14.8 Mode Seeking 434

14.9 Measures of agreement 436

15 Multidimensional Scaling 449

15.1 Introduction 449

15.2 Classical solution 451

15.2.1 Some theoretical results 451

15.2.2 An algorithm for the classic MDS solution 454

15.2.3 Similarities 456

15.3 Duality Between Principal Coordinate Analysis and Principal Component Analysis 459

15.4 Optimal Properties of the Classical Solution and Goodness of Fit 460

15.5 Seriation 467

15.5.1 Description 467

15.5.2 Horseshoe effect 468

15.6 Non-metric methods 469

15.7 Goodness of Fit Measure: Procrustes Rotation 472

15.8 Multi-sample Problem and Canonical Variates 475

16 High-dimensional Data 481

16.1 Introduction 481

16.2 Shrinkage Methods in Regression 483

16.2.1 The multiple linear regression model 483

16.2.2 Ridge regression 484

16.2.3 Least absolute selection and shrinkage operator (LASSO) 486

16.3 Principal Component Regression 488

16.4 Partial Least Squares Regression 490

16.4.1 Overview 490

16.4.2 The PLS1 algorithm to construct the PLS loading matrix for p = 1 response variable 490

16.4.3 The PLS2 algorithm to construct the PLS loading matrix for p > 1 response variables 493

16.4.4 The predictor envelope model 494

16.4.5 PLS regression 494

16.4.6 Joint envelope models 496

16.5 Functional Data 498

16.5.1 Functional principal component analysis 499

16.5.2 Functional linear regression models 503

A Matrix Algebra 509

A.1 Introduction 509

A.2 Matrix Operations 512

A.2.1 Transpose 512

A.2.2 Trace 513

A.2.3 Determinants and cofactors 513

A.2.4 Inverse 515

A.2.5 Kronecker products 516

A.3 Further Particular Matrices and Types of Matrix 517

A.3.1 Orthogonal matrices 517

A.3.2 Equicorrelation matrix 518

A.3.3 Centering matrix 519

A.4 Vector Spaces, Rank, and Linear Equations 519

A.4.1 Vector spaces 519

A.4.2 Rank 521

A.4.3 Linear equations 522

A.5 Linear Transformations 523

A.6 Eigenvalues and Eigenvectors 523

A.6.1 General results 523

A.6.2 Symmetric matrices 525

A.7 Quadratic Forms and Definiteness 531

A.8 Generalized Inverse 533

A.9 Matrix Differentiation and Maximization Problems 535

A.10 Geometrical Ideas 538

A.10.1 n-dimensional geometry 538

A.10.2 Orthogonal transformations 538

A.10.3 Projections 539

A.10.4 Ellipsoids 539

B Univariate Statistics 543

B.1 Introduction 543

B.2 Normal Distribution 543

B.3 Chi-squared Distribution 544

B.4 F and Beta Variables 544

B.5 t distribution 545

B.6 Poisson distribution 546

C R commands and data 547

C.1 Basic R Commands Related to Matrices 547

C.2 R Libraries and Commands Used in Exercises and Figures 548

C.3 Data Availability 549

D Tables 551

References 560

Index

Kanti V. Mardia OBE is a Senior Research Professor in the Department of Statistics at the University of Leeds, Leverhulme Emeritus Fellow, and Visiting Professor in the Department of Statistics, University of Oxford.

John T. Kent and Charles C. Taylor are both Professors in the Department of Statistics, University of Leeds.

K. V. Mardia, University of Leeds, UK; J. T. Kent, University of Leeds, UK; C. C. Taylor, University of Leeds, UK