作者:《The Elements of Statistical Learning》书籍
出版社:Springer
出版年:December 2008
评分:9.4
ISBN:9780387848570
所属分类:网络科技
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting---the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
1 Introduction
2 Overview of Supervised Learning
2.1 Introduction
2.2 Variable Types and Terminology
2.3 Two Simple Approaches to Prediction:
Least Squares and Nearest Neighbors
2.3.1 Linear Models and Least Squares
2.3.2 Nearest-Neighbor Methods
2.3.3 From Least Squares to Nearest Neighbors
2.4 Statistical Decision Theory
2.5 Local Methods in High Dimensions
2.6 Statistical Models, Supervised Learning
and Function Approximation
2.6.1 A Statistical Model
for the Joint Distribution Pr(X, Y )
2.6.2 Supervised Learning
2.6.3 Function Approximation
2.7 Structured Regression Models
2.7.1 Difficulty of the Problem
2.8 Classes of Restricted Estimators
2.8.1 Roughness Penalty and Bayesian Methods
2.8.2 Kernel Methods and Local Regression
2.8.3 Basis Functions and Dictionary Methods
2.9 Model Selection and the Bias–Variance Tradeoff
Bibliographic Notes
Exercises
3 Linear Methods for Regression
3.1 Introduction
3.2 Linear Regression Models and Least Squares
3.2.1 Example: Prostate Cancer
3.2.2 The Gauss–Markov Theorem
3.2.3 Multiple Regression
from Simple Univariate Regression
3.2.4 Multiple Outputs
3.3 Subset Selection
3.3.1 Best-Subset Selection
3.3.2 Forward- and Backward-Stepwise Selection
3.3.3 Forward-Stagewise Regression
3.3.4 Prostate Cancer Data Example (Continued)
3.4 Shrinkage Methods
3.4.1 Ridge Regression
3.4.2 The Lasso
3.4.3 Discussion: Subset Selection, Ridge Regression
and the Lasso
3.4.4 Least Angle Regression
3.5 Methods Using Derived Input Directions
3.5.1 Principal Components Regression
3.5.2 Partial Least Squares
3.6 Discussion: A Comparison of the Selection
and Shrinkage Methods
3.7 Multiple Outcome Shrinkage and Selection
3.8 More on the Lasso and Related Path Algorithms
3.8.1 Incremental Forward Stagewise Regression
3.8.2 Piecewise-Linear Path Algorithms
3.8.3 The Dantzig Selector
3.8.4 The Grouped Lasso
3.8.5 Further Properties of the Lasso
3.8.6 Pathwise Coordinate Optimization
3.9 Computational Considerations
Bibliographic Notes
Exercises
4 Linear Methods for Classification
4.1 Introduction
4.2 Linear Regression of an Indicator Matrix
4.3 Linear Discriminant Analysis
4.3.1 Regularized Discriminant Analysis
4.3.2 Computations for LDA
4.3.3 Reduced-Rank Linear Discriminant Analysis
4.4 Logistic Regression
4.4.1 Fitting Logistic Regression Models
4.4.2 Example: South African Heart Disease
4.4.3 Quadratic Approximations and Inference
4.4.4 L1 Regularized Logistic Regression
4.4.5 Logistic Regression or LDA?
4.5 Separating Hyperplanes
4.5.1 Rosenblatt’s Perceptron Learning Algorithm .
4.5.2 Optimal Separating Hyperplanes
Bibliographic Notes
Exercises
5 Basis Expansions and Regularization
5.1 Introduction
5.2 Piecewise Polynomials and Splines
5.2.1 Natural Cubic Splines
5.2.2 Example: South African Heart Disease (Continued)
5.2.3 Example: Phoneme Recognition
5.3 Filtering and Feature Extraction
5.4 Smoothing Splines
5.4.1 Degrees of Freedom and Smoother Matrices
5.5 Automatic Selection of the Smoothing Parameters
5.5.1 Fixing the Degrees of Freedom
5.5.2 The Bias–Variance Tradeoff
5.6 Nonparametric Logistic Regression
5.7 Multidimensional Splines
5.8 Regularization and Reproducing Kernel Hilbert Spaces
5.8.1 Spaces of Functions Generated by Kernels
5.8.2 Examples of RKHS
5.9 Wavelet Smoothing
5.9.1 Wavelet Bases and the Wavelet Transform
5.9.2 Adaptive Wavelet Filtering
Bibliographic Notes
Exercises
Appendix: Computational Considerations for Splines
Appendix: B-splines
Appendix: Computations for Smoothing Splines
6 Kernel Smoothing Methods
6.1 One-Dimensional Kernel Smoothers
6.1.1 Local Linear Regression
6.1.2 Local Polynomial Regression
6.2 Selecting the Width of the Kernel
6.3 Local Regression in IRp
6.4 Structured Local Regression Models in IRp
6.4.1 Structured Kernels
6.4.2 Structured Regression Functions
6.5 Local Likelihood and Other Models
6.6 Kernel Density Estimation and Classification
6.6.1 Kernel Density Estimation
6.6.2 Kernel Density Classification
6.6.3 The Naive Bayes Classifier
6.7 Radial Basis Functions and Kernels
6.8 Mixture Models for Density Estimation and Classification
6.9 Computational Considerations
Bibliographic Notes
Exercises
7 Model Assessment and Selection
7.1 Introduction
7.2 Bias, Variance and Model Complexity
7.3 The Bias–Variance Decomposition 223
7.3.1 Example: Bias–Variance Tradeoff
7.4 Optimism of the Training Error Rate
7.5 Estimates of In-Sample Prediction Error
7.6 The Effective Number of Parameters
7.7 The Bayesian Approach and BIC
7.8 Minimum Description Length
7.9 Vapnik–Chervonenkis Dimension
7.9.1 Example (Continued)
7.10 Cross-Validation
7.10.1 K-Fold Cross-Validation
7.10.2 The Wrong and Right Way
to Do Cross-validation
7.10.3 Does Cross-Validation Really Work?
7.11 Bootstrap Methods
7.11.1 Example (Continued)
7.12 Conditional or Expected Test Error?
Bibliographic Notes
Exercises
8 Model Inference and Averaging
8.1 Introduction
8.2 The Bootstrap and Maximum Likelihood Methods
8.2.1 A Smoothing Example
8.2.2 Maximum Likelihood Inference
8.2.3 Bootstrap versus Maximum Likelihood
8.3 Bayesian Methods
8.4 Relationship Between the Bootstrap
and Bayesian Inference
8.5 The EM Algorithm
8.5.1 Two-Component Mixture Model
8.5.2 The EM Algorithm in General
8.5.3 EM as a Maximization–Maximization Procedure
8.6 MCMC for Sampling from the Posterior
8.7 Bagging
8.7.1 Example: Trees with Simulated Data
8.8 Model Averaging and Stacking
8.9 Stochastic Search: Bumping
Bibliographic Notes
Exercises
9 Additive Models, Trees, and Related Methods
9.1 Generalized Additive Models
9.1.1 Fitting Additive Models
9.1.2 Example: Additive Logistic Regression
9.1.3 Summary
9.2 Tree-Based Methods
9.2.1 Background
9.2.2 Regression Trees
9.2.3 Classification Trees
9.2.4 Other Issues
9.2.5 Spam Example (Continued)
9.3 PRIM: Bump Hunting
9.3.1 Spam Example (Continued)
9.4 MARS: Multivariate Adaptive Regression Splines
9.4.1 Spam Example (Continued)
9.4.2 Example (Simulated Data)
9.4.3 Other Issues
9.5 Hierarchical Mixtures of Experts
9.6 Missing Data
9.7 Computational Considerations
Bibliographic Notes
Exercises
10 Boosting and Additive Trees
10.1 Boosting Methods
10.1.1 Outline of This Chapter
10.2 Boosting Fits an Additive Model
10.3 Forward Stagewise Additive Modeling
10.4 Exponential Loss and AdaBoost
10.5 Why Exponential Loss?
10.6 Loss Functions and Robustness
10.7 “Off-the-Shelf” Procedures for Data Mining
10.8 Example: Spam Data
10.9 Boosting Trees
10.10 Numerical Optimization via Gradient Boosting
10.10.1 Steepest Descent
10.10.2 Gradient Boosting
10.10.3 Implementations of Gradient Boosting
10.11 Right-Sized Trees for Boosting
10.12 Regularization
10.12.1 Shrinkage
10.12.2 Subsampling
10.13 Interpretation
10.13.1 Relative Importance of Predictor Variables
10.13.2 Partial Dependence Plots
10.14 Illustrations
10.14.1 California Housing
10.14.2 New Zealand Fish
10.14.3 Demographics Data
Bibliographic Notes
Exercises
11 Neural Networks
11.1 Introduction
11.2 Projection Pursuit Regression
11.3 Neural Networks
11.4 Fitting Neural Networks
11.5 Some Issues in Training Neural Networks
11.5.1 Starting Values
11.5.2 Overfitting
11.5.3 Scaling of the Inputs
11.5.4 Number of Hidden Units and Layers
11.5.5 Multiple Minima
11.6 Example: Simulated Data
11.7 Example: ZIP Code Data
11.8 Discussion
11.9 Bayesian Neural Nets and the NIPS 2003 Challenge
11.9.1 Bayes, Boosting and Bagging
11.9.2 Performance Comparisons
11.10 Computational Considerations
Bibliographic Notes
Exercises
12 Support Vector Machines and
Flexible Discriminants
12.1 Introduction
12.2 The Support Vector Classifier
12.2.1 Computing the Support Vector Classifier
12.2.2 Mixture Example (Continued)
12.3 Support Vector Machines and Kernels
12.3.1 Computing the SVM for Classification
12.3.2 The SVM as a Penalization Method
12.3.3 Function Estimation and Reproducing Kernels
12.3.4 SVMs and the Curse of Dimensionality
12.3.5 A Path Algorithm for the SVM Classifier
12.3.6 Support Vector Machines for Regression
12.3.7 Regression and Kernels
12.3.8 Discussion
12.4 Generalizing Linear Discriminant Analysis
12.5 Flexible Discriminant Analysis
12.5.1 Computing the FDA Estimates
12.6 Penalized Discriminant Analysis
12.7 Mixture Discriminant Analysis
12.7.1 Example: Waveform Data
Bibliographic Notes
Exercises
13 Prototype Methods and Nearest-Neighbors
13.1 Introduction
13.2 Prototype Methods
13.2.1 K-means Clustering
13.2.2 Learning Vector Quantization
13.2.3 Gaussian Mixtures
13.3 k-Nearest-Neighbor Classifiers
13.3.1 Example: A Comparative Study
13.3.2 Example: k-Nearest-Neighbors
and Image Scene Classification
13.3.3 Invariant Metrics and Tangent Distance
13.4 Adaptive Nearest-Neighbor Methods
13.4.1 Example
13.4.2 Global Dimension Reduction
for Nearest-Neighbors
13.5 Computational Considerations
Bibliographic Notes
Exercises
14 Unsupervised Learning
14.1 Introduction
14.2 Association Rules
14.2.1 Market Basket Analysis
14.2.2 The Apriori Algorithm
14.2.3 Example: Market Basket Analysis
14.2.4 Unsupervised as Supervised Learning
14.2.5 Generalized Association Rules
14.2.6 Choice of Supervised Learning Method
14.2.7 Example: Market Basket Analysis (Continued)
14.3 Cluster Analysis
14.3.1 Proximity Matrices
14.3.2 Dissimilarities Based on Attributes
14.3.3 Object Dissimilarity
14.3.4 Clustering Algorithms
14.3.5 Combinatorial Algorithms
14.3.6 K-means
14.3.7 Gaussian Mixtures as Soft K-means Clustering
14.3.8 Example: Human Tumor Microarray Data
14.3.9 Vector Quantization
14.3.10 K-medoids
14.3.11 Practical Issues
14.3.12 Hierarchical Clustering
14.4 Self-Organizing Maps
14.5 Principal Components, Curves and Surfaces
14.5.1 Principal Components
14.5.2 Principal Curves and Surfaces
14.5.3 Spectral Clustering
14.5.4 Kernel Principal Components
14.5.5 Sparse Principal Components
14.6 Non-negative Matrix Factorization
14.6.1 Archetypal Analysis
14.7 Independent Component Analysis
and Exploratory Projection Pursuit
14.7.1 Latent Variables and Factor Analysis
14.7.2 Independent Component Analysis
14.7.3 Exploratory Projection Pursuit
14.7.4 A Direct Approach to ICA
14.8 Multidimensional Scaling
14.9 Nonlinear Dimension Reduction
and Local Multidimensional Scaling
14.10 The Google PageRank Algorithm
Bibliographic Notes
Exercises
15 Random Forests
15.1 Introduction
15.2 Definition of Random Forests
15.3 Details of Random Forests
15.3.1 Out of Bag Samples
15.3.2 Variable Importance
15.3.3 Proximity Plots
15.3.4 Random Forests and Overfitting
15.4 Analysis of Random Forests
15.4.1 Variance and the De-Correlation Effect
15.4.2 Bias
15.4.3 Adaptive Nearest Neighbors
Bibliographic Notes
Exercises
16 Ensemble Learning
16.1 Introduction
16.2 Boosting and Regularization Paths
16.2.1 Penalized Regression
16.2.2 The “Bet on Sparsity” Principle
16.2.3 Regularization Paths, Over-fitting and Margins
16.3 Learning Ensembles
16.3.1 Learning a Good Ensemble
16.3.2 Rule Ensembles
Bibliographic Notes
Exercises
17 Undirected Graphical Models
17.1 Introduction
17.2 Markov Graphs and Their Properties
17.3 Undirected Graphical Models for Continuous Variables
17.3.1 Estimation of the Parameters
when the Graph Structure is Known
17.3.2 Estimation of the Graph Structure
17.4 Undirected Graphical Models for Discrete Variables
17.4.1 Estimation of the Parameters
when the Graph Structure is Known
17.4.2 Hidden Nodes
17.4.3 Estimation of the Graph Structure
17.4.4 Restricted Boltzmann Machines
Exercises
18 High-Dimensional Problems: p ≫ N
18.1 When p is Much Bigger than N
18.2 Diagonal Linear Discriminant Analysis
and Nearest Shrunken Centroids
18.3 Linear Classifiers with Quadratic Regularization
18.3.1 Regularized Discriminant Analysis
18.3.2 Logistic Regression
with Quadratic Regularization
18.3.3 The Support Vector Classifier
18.3.4 Feature Selection
18.3.5 Computational Shortcuts When p ≫ N
18.4 Linear Classifiers with L1 Regularization
18.4.1 Application of Lasso
to Protein Mass Spectroscopy
18.4.2 The Fused Lasso for Functional Data
18.5 Classification When Features are Unavailable
18.5.1 Example: String Kernels
and Protein Classification
18.5.2 Classification and Other Models Using
Inner-Product Kernels and Pairwise Distances .
18.5.3 Example: Abstracts Classification
18.6 High-Dimensional Regression: Supervised Principal Components
18.6.1 Connection to Latent-Variable Modeling
18.6.2 Relationship with Partial Least Squares
18.6.3 Pre-Conditioning for Feature Selection
18.7 Feature Assessment and the Multiple-Testing Problem
18.7.1 The False Discovery Rate
18.7.2 Asymmetric Cutpoints and the SAM Procedure
18.7.3 A Bayesian Interpretation of the FDR
18.8 Bibliographic Notes
Exercises
《斯坦福社会创新评论09》内容简介:区块链、人工智能、3D打印等在给社会创新领域带来新的发展,是否也触发了科技的“黑暗面”?技
《顺利升入一年级》内容简介:幼小衔接是教育的基础工程,但很多家长对幼小衔接认识不清,在错误的方向上努力,以至问题丛生。本书
《妞妞:一个父亲的札记》内容简介:父爱如山,周国平感动万千读者的经典之作。二十多年前,45岁的周国平喜得一女妞妞,如同拥有了
VEX机器人设计 本书特色 vex是国际机器人比赛内容之一,每年全世界都 有很多学生参与这一活动。这一比赛所涉及的机器人 不仅可用于比赛,同时也是学生学习机器人...
书中内容基于C++全书共分10章。第0章讲解了算法的概念及体例说明。第1~7章分别就计数问题、信息查找问题、组合优化问题、图中搜
《看得懂的金融投资课》内容简介:什么是货币?什么是金融?你要怎么做才能获得更高的、更稳健的投资收益,更好地分析投资的风险,
《京胡伴奏京剧经典唱段选》内容简介:这本《陈平一京胡伴奏京剧经典唱段》包含青衣、花脸、老旦、老生的唱腔名段,包括《西施》、
《辣味江湖》内容简介:这不仅是口腹之欲的记录,更是一次美食文化之旅。在这本《辣味江湖——一个食客的寻味笔记》中,要云先生以
《从零开始学K线(实战操练图解版)》内容简介:在股市如此繁荣的今天,你是否想进入股市大展身手?作为股市新手,你是否因为无法读
《网络至死:如何在喧嚣的互联网时代重获我们的创造力和思维力》内容简介:我们也许已经晓悟,但也许并未察觉,我们正陷入空前的“
《成人教育服务乡村振兴的实践研究》内容简介:本书旨在对国家乡村振兴战略文件深度解读的基础上,调研并提炼近年来宁波市成人教育
《元代辽阳行省女真人研究》内容简介:本书对元代辽阳行省女真人进行全景式系列研究。书中将元代辽阳行省女真人分成北部、东部、南
Fiddler是一种流行的Web调试代理。它功能强大,界面友好,简单易用,无论对开发人员或者测试人员来说,都是非常有用的工具。《Fi
AhoandUllmanhavecreatedaCversionoftheirgroundbreakingtext.Asinthattext,thisbookc...
《我将前往的远方》内容简介:联合报文学大奖得主郭强生,《断代》后又一力作 “人生私散文”获奖作,献给单身初老族的一首情歌 难
《极简父母法则:教出快乐、自信、独立的孩子》内容简介:爱默生曾经说过,我们为孩子的美丽和幸福感到极大的欢乐,这欢乐使我们的
《大学的改革(第五卷·学子篇·研究生)》内容简介:1. 经管学院院长倾力打造,与国际接轨的本土教育圣典;所有家长之枕边书,所以
《江苏历代文化名人传·丁文江》内容简介:本书主要从“科学”与“近代化”两方面展现了地质学家、社会活动家丁文江波澜壮阔的一生
中文版Windows7 从入门到精通 本书特色 《从入门到精通系列:中文版windows 7从入门到精通》特点一本图书 三本价值1本书=入门十提高十精通=3本书...
《许崇德论基本法文集》内容简介:本书为许崇德教授在制定香港、澳门基本法过程中,发表的论文合集。许崇德教授2018年获评改革开放