Book: The Elements of Statistical Learning
Publisher: Springer
Publication date: December 2008
Rating: 9.4
ISBN: 9780387848570
Category: Internet Technology
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry.
The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting, the first comprehensive treatment of this topic in any book.
This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
1 Introduction
2 Overview of Supervised Learning
2.1 Introduction
2.2 Variable Types and Terminology
2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
2.3.1 Linear Models and Least Squares
2.3.2 Nearest-Neighbor Methods
2.3.3 From Least Squares to Nearest Neighbors
2.4 Statistical Decision Theory
2.5 Local Methods in High Dimensions
2.6 Statistical Models, Supervised Learning and Function Approximation
2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
2.6.2 Supervised Learning
2.6.3 Function Approximation
2.7 Structured Regression Models
2.7.1 Difficulty of the Problem
2.8 Classes of Restricted Estimators
2.8.1 Roughness Penalty and Bayesian Methods
2.8.2 Kernel Methods and Local Regression
2.8.3 Basis Functions and Dictionary Methods
2.9 Model Selection and the Bias–Variance Tradeoff
Bibliographic Notes
Exercises
3 Linear Methods for Regression
3.1 Introduction
3.2 Linear Regression Models and Least Squares
3.2.1 Example: Prostate Cancer
3.2.2 The Gauss–Markov Theorem
3.2.3 Multiple Regression from Simple Univariate Regression
3.2.4 Multiple Outputs
3.3 Subset Selection
3.3.1 Best-Subset Selection
3.3.2 Forward- and Backward-Stepwise Selection
3.3.3 Forward-Stagewise Regression
3.3.4 Prostate Cancer Data Example (Continued)
3.4 Shrinkage Methods
3.4.1 Ridge Regression
3.4.2 The Lasso
3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso
3.4.4 Least Angle Regression
3.5 Methods Using Derived Input Directions
3.5.1 Principal Components Regression
3.5.2 Partial Least Squares
3.6 Discussion: A Comparison of the Selection and Shrinkage Methods
3.7 Multiple Outcome Shrinkage and Selection
3.8 More on the Lasso and Related Path Algorithms
3.8.1 Incremental Forward Stagewise Regression
3.8.2 Piecewise-Linear Path Algorithms
3.8.3 The Dantzig Selector
3.8.4 The Grouped Lasso
3.8.5 Further Properties of the Lasso
3.8.6 Pathwise Coordinate Optimization
3.9 Computational Considerations
Bibliographic Notes
Exercises
4 Linear Methods for Classification
4.1 Introduction
4.2 Linear Regression of an Indicator Matrix
4.3 Linear Discriminant Analysis
4.3.1 Regularized Discriminant Analysis
4.3.2 Computations for LDA
4.3.3 Reduced-Rank Linear Discriminant Analysis
4.4 Logistic Regression
4.4.1 Fitting Logistic Regression Models
4.4.2 Example: South African Heart Disease
4.4.3 Quadratic Approximations and Inference
4.4.4 L1 Regularized Logistic Regression
4.4.5 Logistic Regression or LDA?
4.5 Separating Hyperplanes
4.5.1 Rosenblatt’s Perceptron Learning Algorithm
4.5.2 Optimal Separating Hyperplanes
Bibliographic Notes
Exercises
5 Basis Expansions and Regularization
5.1 Introduction
5.2 Piecewise Polynomials and Splines
5.2.1 Natural Cubic Splines
5.2.2 Example: South African Heart Disease (Continued)
5.2.3 Example: Phoneme Recognition
5.3 Filtering and Feature Extraction
5.4 Smoothing Splines
5.4.1 Degrees of Freedom and Smoother Matrices
5.5 Automatic Selection of the Smoothing Parameters
5.5.1 Fixing the Degrees of Freedom
5.5.2 The Bias–Variance Tradeoff
5.6 Nonparametric Logistic Regression
5.7 Multidimensional Splines
5.8 Regularization and Reproducing Kernel Hilbert Spaces
5.8.1 Spaces of Functions Generated by Kernels
5.8.2 Examples of RKHS
5.9 Wavelet Smoothing
5.9.1 Wavelet Bases and the Wavelet Transform
5.9.2 Adaptive Wavelet Filtering
Bibliographic Notes
Exercises
Appendix: Computational Considerations for Splines
Appendix: B-splines
Appendix: Computations for Smoothing Splines
6 Kernel Smoothing Methods
6.1 One-Dimensional Kernel Smoothers
6.1.1 Local Linear Regression
6.1.2 Local Polynomial Regression
6.2 Selecting the Width of the Kernel
6.3 Local Regression in ℝ^p
6.4 Structured Local Regression Models in ℝ^p
6.4.1 Structured Kernels
6.4.2 Structured Regression Functions
6.5 Local Likelihood and Other Models
6.6 Kernel Density Estimation and Classification
6.6.1 Kernel Density Estimation
6.6.2 Kernel Density Classification
6.6.3 The Naive Bayes Classifier
6.7 Radial Basis Functions and Kernels
6.8 Mixture Models for Density Estimation and Classification
6.9 Computational Considerations
Bibliographic Notes
Exercises
7 Model Assessment and Selection
7.1 Introduction
7.2 Bias, Variance and Model Complexity
7.3 The Bias–Variance Decomposition
7.3.1 Example: Bias–Variance Tradeoff
7.4 Optimism of the Training Error Rate
7.5 Estimates of In-Sample Prediction Error
7.6 The Effective Number of Parameters
7.7 The Bayesian Approach and BIC
7.8 Minimum Description Length
7.9 Vapnik–Chervonenkis Dimension
7.9.1 Example (Continued)
7.10 Cross-Validation
7.10.1 K-Fold Cross-Validation
7.10.2 The Wrong and Right Way to Do Cross-validation
7.10.3 Does Cross-Validation Really Work?
7.11 Bootstrap Methods
7.11.1 Example (Continued)
7.12 Conditional or Expected Test Error?
Bibliographic Notes
Exercises
8 Model Inference and Averaging
8.1 Introduction
8.2 The Bootstrap and Maximum Likelihood Methods
8.2.1 A Smoothing Example
8.2.2 Maximum Likelihood Inference
8.2.3 Bootstrap versus Maximum Likelihood
8.3 Bayesian Methods
8.4 Relationship Between the Bootstrap and Bayesian Inference
8.5 The EM Algorithm
8.5.1 Two-Component Mixture Model
8.5.2 The EM Algorithm in General
8.5.3 EM as a Maximization–Maximization Procedure
8.6 MCMC for Sampling from the Posterior
8.7 Bagging
8.7.1 Example: Trees with Simulated Data
8.8 Model Averaging and Stacking
8.9 Stochastic Search: Bumping
Bibliographic Notes
Exercises
9 Additive Models, Trees, and Related Methods
9.1 Generalized Additive Models
9.1.1 Fitting Additive Models
9.1.2 Example: Additive Logistic Regression
9.1.3 Summary
9.2 Tree-Based Methods
9.2.1 Background
9.2.2 Regression Trees
9.2.3 Classification Trees
9.2.4 Other Issues
9.2.5 Spam Example (Continued)
9.3 PRIM: Bump Hunting
9.3.1 Spam Example (Continued)
9.4 MARS: Multivariate Adaptive Regression Splines
9.4.1 Spam Example (Continued)
9.4.2 Example (Simulated Data)
9.4.3 Other Issues
9.5 Hierarchical Mixtures of Experts
9.6 Missing Data
9.7 Computational Considerations
Bibliographic Notes
Exercises
10 Boosting and Additive Trees
10.1 Boosting Methods
10.1.1 Outline of This Chapter
10.2 Boosting Fits an Additive Model
10.3 Forward Stagewise Additive Modeling
10.4 Exponential Loss and AdaBoost
10.5 Why Exponential Loss?
10.6 Loss Functions and Robustness
10.7 “Off-the-Shelf” Procedures for Data Mining
10.8 Example: Spam Data
10.9 Boosting Trees
10.10 Numerical Optimization via Gradient Boosting
10.10.1 Steepest Descent
10.10.2 Gradient Boosting
10.10.3 Implementations of Gradient Boosting
10.11 Right-Sized Trees for Boosting
10.12 Regularization
10.12.1 Shrinkage
10.12.2 Subsampling
10.13 Interpretation
10.13.1 Relative Importance of Predictor Variables
10.13.2 Partial Dependence Plots
10.14 Illustrations
10.14.1 California Housing
10.14.2 New Zealand Fish
10.14.3 Demographics Data
Bibliographic Notes
Exercises
11 Neural Networks
11.1 Introduction
11.2 Projection Pursuit Regression
11.3 Neural Networks
11.4 Fitting Neural Networks
11.5 Some Issues in Training Neural Networks
11.5.1 Starting Values
11.5.2 Overfitting
11.5.3 Scaling of the Inputs
11.5.4 Number of Hidden Units and Layers
11.5.5 Multiple Minima
11.6 Example: Simulated Data
11.7 Example: ZIP Code Data
11.8 Discussion
11.9 Bayesian Neural Nets and the NIPS 2003 Challenge
11.9.1 Bayes, Boosting and Bagging
11.9.2 Performance Comparisons
11.10 Computational Considerations
Bibliographic Notes
Exercises
12 Support Vector Machines and Flexible Discriminants
12.1 Introduction
12.2 The Support Vector Classifier
12.2.1 Computing the Support Vector Classifier
12.2.2 Mixture Example (Continued)
12.3 Support Vector Machines and Kernels
12.3.1 Computing the SVM for Classification
12.3.2 The SVM as a Penalization Method
12.3.3 Function Estimation and Reproducing Kernels
12.3.4 SVMs and the Curse of Dimensionality
12.3.5 A Path Algorithm for the SVM Classifier
12.3.6 Support Vector Machines for Regression
12.3.7 Regression and Kernels
12.3.8 Discussion
12.4 Generalizing Linear Discriminant Analysis
12.5 Flexible Discriminant Analysis
12.5.1 Computing the FDA Estimates
12.6 Penalized Discriminant Analysis
12.7 Mixture Discriminant Analysis
12.7.1 Example: Waveform Data
Bibliographic Notes
Exercises
13 Prototype Methods and Nearest-Neighbors
13.1 Introduction
13.2 Prototype Methods
13.2.1 K-means Clustering
13.2.2 Learning Vector Quantization
13.2.3 Gaussian Mixtures
13.3 k-Nearest-Neighbor Classifiers
13.3.1 Example: A Comparative Study
13.3.2 Example: k-Nearest-Neighbors and Image Scene Classification
13.3.3 Invariant Metrics and Tangent Distance
13.4 Adaptive Nearest-Neighbor Methods
13.4.1 Example
13.4.2 Global Dimension Reduction for Nearest-Neighbors
13.5 Computational Considerations
Bibliographic Notes
Exercises
14 Unsupervised Learning
14.1 Introduction
14.2 Association Rules
14.2.1 Market Basket Analysis
14.2.2 The Apriori Algorithm
14.2.3 Example: Market Basket Analysis
14.2.4 Unsupervised as Supervised Learning
14.2.5 Generalized Association Rules
14.2.6 Choice of Supervised Learning Method
14.2.7 Example: Market Basket Analysis (Continued)
14.3 Cluster Analysis
14.3.1 Proximity Matrices
14.3.2 Dissimilarities Based on Attributes
14.3.3 Object Dissimilarity
14.3.4 Clustering Algorithms
14.3.5 Combinatorial Algorithms
14.3.6 K-means
14.3.7 Gaussian Mixtures as Soft K-means Clustering
14.3.8 Example: Human Tumor Microarray Data
14.3.9 Vector Quantization
14.3.10 K-medoids
14.3.11 Practical Issues
14.3.12 Hierarchical Clustering
14.4 Self-Organizing Maps
14.5 Principal Components, Curves and Surfaces
14.5.1 Principal Components
14.5.2 Principal Curves and Surfaces
14.5.3 Spectral Clustering
14.5.4 Kernel Principal Components
14.5.5 Sparse Principal Components
14.6 Non-negative Matrix Factorization
14.6.1 Archetypal Analysis
14.7 Independent Component Analysis and Exploratory Projection Pursuit
14.7.1 Latent Variables and Factor Analysis
14.7.2 Independent Component Analysis
14.7.3 Exploratory Projection Pursuit
14.7.4 A Direct Approach to ICA
14.8 Multidimensional Scaling
14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling
14.10 The Google PageRank Algorithm
Bibliographic Notes
Exercises
15 Random Forests
15.1 Introduction
15.2 Definition of Random Forests
15.3 Details of Random Forests
15.3.1 Out of Bag Samples
15.3.2 Variable Importance
15.3.3 Proximity Plots
15.3.4 Random Forests and Overfitting
15.4 Analysis of Random Forests
15.4.1 Variance and the De-Correlation Effect
15.4.2 Bias
15.4.3 Adaptive Nearest Neighbors
Bibliographic Notes
Exercises
16 Ensemble Learning
16.1 Introduction
16.2 Boosting and Regularization Paths
16.2.1 Penalized Regression
16.2.2 The “Bet on Sparsity” Principle
16.2.3 Regularization Paths, Over-fitting and Margins
16.3 Learning Ensembles
16.3.1 Learning a Good Ensemble
16.3.2 Rule Ensembles
Bibliographic Notes
Exercises
17 Undirected Graphical Models
17.1 Introduction
17.2 Markov Graphs and Their Properties
17.3 Undirected Graphical Models for Continuous Variables
17.3.1 Estimation of the Parameters when the Graph Structure is Known
17.3.2 Estimation of the Graph Structure
17.4 Undirected Graphical Models for Discrete Variables
17.4.1 Estimation of the Parameters when the Graph Structure is Known
17.4.2 Hidden Nodes
17.4.3 Estimation of the Graph Structure
17.4.4 Restricted Boltzmann Machines
Exercises
18 High-Dimensional Problems: p ≫ N
18.1 When p is Much Bigger than N
18.2 Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids
18.3 Linear Classifiers with Quadratic Regularization
18.3.1 Regularized Discriminant Analysis
18.3.2 Logistic Regression with Quadratic Regularization
18.3.3 The Support Vector Classifier
18.3.4 Feature Selection
18.3.5 Computational Shortcuts When p ≫ N
18.4 Linear Classifiers with L1 Regularization
18.4.1 Application of Lasso to Protein Mass Spectroscopy
18.4.2 The Fused Lasso for Functional Data
18.5 Classification When Features are Unavailable
18.5.1 Example: String Kernels and Protein Classification
18.5.2 Classification and Other Models Using Inner-Product Kernels and Pairwise Distances
18.5.3 Example: Abstracts Classification
18.6 High-Dimensional Regression: Supervised Principal Components
18.6.1 Connection to Latent-Variable Modeling
18.6.2 Relationship with Partial Least Squares
18.6.3 Pre-Conditioning for Feature Selection
18.7 Feature Assessment and the Multiple-Testing Problem
18.7.1 The False Discovery Rate
18.7.2 Asymmetric Cutpoints and the SAM Procedure
18.7.3 A Bayesian Interpretation of the FDR
18.8 Bibliographic Notes
Exercises