Preface
1 Introduction Overview of Supervised Learning
2.1 Introduction
2.2 Variable Types and Terminology
2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
2.3.1 Linear Models and Least Squares
2.3.2 Nearest-Neighbor Methods
2.3.3 From Least Squares to Nearest Neighbors
2.4 Statistical Decision Theory
2.5 Local Methods in High Dimensions
2.6 Statistical Models, Supervised Learning and Function Approximation
2.6.1 A Statistical Model for the Joint Distribution Pr(X,Y)
2.6.2 Supervised Learning
2.6.3 Function Approximation
2.7 Structured Regression Models
2.7.1 Difficulty of the Problem
2.8 Classes of Restricted Estimators
2.8.1 Roughness Penalty and Bayesian Methods
2.8.2 Kernel Methods and Local Regression
2.8.3 Basis Functions and Dictionary Methods
2.9 Model Selection and the Bias-Variance Tradeoff
Bibliographic Notes
Exercises
3 Linear Methods for Regression
3.1 Introduction
3.2 Linear Regression Models and Least Squares
3.2.1 Example:Prostate Cancer
3.2.2 The Ganss-Markov Theorem
3.3 Multiple Regression from Simple Univariate Regression
3.3.1 Multiple Outputs
3.4 Subset Selection and Coefficient Shrinkage
3.4.1 Subset Selection
3.4.2 Prostate Cancer Data Example fContinued)
3.4.3 Shrinkage Methods
3.4.4 Methods Using Derived Input Directions
3.4.5 Discussion:A Comparison of the Selection and Shrinkage Methods
3.4.6 Multiple Outcome Shrinkage and Selection
3.5 Compntational Considerations
Bibliographic Notes
Exercises
4 Linear Methods for Classification
4.1 Introduction
4.2 Linear Regression of an Indicator Matrix
4.3 Linear Discriminant Analysis
4.3.1 Regularized Discriminant Analysis
4.3.2 Computations for LDA
4.3.3 Reduced-Rank Linear Discriminant Analysis
4.4 Logistic Regression
4.4.1 Fitting Logistic Regression Models
4.4.2 Example:South African Heart Disease
4.4.3 Quadratic Approximations and Inference
4.4.4 Logistic Regression or LDA7
4.5 Separating Hyper planes
4.5.1 Rosenblatts Perceptron Learning Algorithm
4.5.2 Optimal Separating Hyper planes
Bibliographic Notes
Exercises
5 Basis Expansions and Regularizatlon
5.1 Introduction
5.2 Piecewise Polynomials and Splines
5.2.1 Natural Cubic Splines
5.2.2 Example: South African Heart Disease (Continued)
5.2.3 Example: Phoneme Recognition
5.3 Filtering and Feature Extraction
5.4 Smoothing Splines
5.4.1 Degrees of Freedom and Smoother Matrices
5.5 Automatic Selection of the Smoothing Parameters
5.5.1 Fixing the Degrees of Freedom
5.5.2 The Bias-Variance Tradeoff
5.6 Nonparametric Logistic Regression
5.7 Multidimensional Splines
5.8 Regularization and Reproducing Kernel Hilbert Spaces . .
5.8.1 Spaces of Phnctions Generated by Kernels
5.8.2 Examples of RKHS
5.9 Wavelet Smoothing
5.9.1 Wavelet Bases and the Wavelet Transform
5.9.2 Adaptive Wavelet Filtering
Bibliographic Notes
Exercises
Appendix: Computational Considerations for Splines
Appendix: B-splines
Appendix: Computations for Smoothing Splines
6 Kernel Methods
6.1 One-Dimensional Kernel Smoothers
6.1.1 Local Linear Regression
6.1.2 Local Polynomial Regression
6.2 Selecting the Width of the Kernel
6.3 Local Regression in Jap
6.4 Structured Local Regression Models in ]ap
6.4.1 Structured Kernels
6.4.2 Structured Regression Functions
6.5 Local Likelihood and Other Models
6.6 Kernel Density Estimation and Classification
6.6.1 Kernel Density Estimation
6.6.2 Kernel Density Classification
6.6.3 The Naive Bayes Classifier
6.7 Radial Basis Functions and Kernels
6.8 Mixture Models for Density Estimation and Classification
6.9 Computational Considerations
Bibliographic Notes
Exercises
7 Model Assessment and Selection
7.1 Introduction
7.2 Bias, Variance and Model Complexity
7.3 The Bias-Variance Decomposition
7.3.1 Example: Bias-Variance Tradeoff
7.4 Optimism of the Training Error Rate
7.5 Estimates of In-Sample Prediction Error
7.6 The Effective Number of Parameters
7.7 The Bayesian Approach and BIC
7.8 Minimum Description Length
7.9 Vapnik Chernovenkis Dimension
7.9.1 Example (Continued)
7.10 Cross-Validation
7.11 Bootstrap Methods
7.11.1 Example (Continued)
Bibliographic Notes
Exercises
8 Model Inference and Averaging
8.1 Introduction
8.2 The Bootstrap and Maximum Likelihood Methods
8.2.1 A Smoothing Example
8.2.2 Maximum Likelihood Inference
8.2.3 Bootstrap versus Maximum Likelihood
8.3 Bayesian Methods
8.4 Relationship Between the Bootstrap and Bayesian Inference
8.5 The EM Algorithm
8.5.1 Two-Component Mixture Model
8.5.2 The EM Algorithm in General
8.5.3 EM as a Maximization-Maximization Procedure
8.6 MCMC for Sampling from the Posterior
8.7 Bagging
8.7.1 Example: Trees with Simulated Data
8.8 Model Averaging and Stacking
8.9 Stochastic Search: Bumping
Bibliographic Notes
Exercises
9 Additive Models, Trees, and Related Methods
9.1 Generalized Additive Models
9.1.1 Fitting Additive Models
9.1.2 Example: Additive Logistic Regression
9.1.3 Summary
9.2 Tree Based Methods
10 Boosting and Additive Trees
11 Neural Networks
12 Support Vector Machines and Flexible Discriminants
13 Prototype Methods and Nearest-Neighbors
14 Unsupervised Learning
References
Author Index
Index
· · · · · · (
收起)