Author: The Elements of Statistical Learning (book)
Publisher: Springer
Publication date: December 2008
Rating: 9.4
ISBN: 9780387848570
Category: Internet Technology
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework.

While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book.

This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
1 Introduction
2 Overview of Supervised Learning
2.1 Introduction
2.2 Variable Types and Terminology
2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
2.3.1 Linear Models and Least Squares
2.3.2 Nearest-Neighbor Methods
2.3.3 From Least Squares to Nearest Neighbors
2.4 Statistical Decision Theory
2.5 Local Methods in High Dimensions
2.6 Statistical Models, Supervised Learning and Function Approximation
2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
2.6.2 Supervised Learning
2.6.3 Function Approximation
2.7 Structured Regression Models
2.7.1 Difficulty of the Problem
2.8 Classes of Restricted Estimators
2.8.1 Roughness Penalty and Bayesian Methods
2.8.2 Kernel Methods and Local Regression
2.8.3 Basis Functions and Dictionary Methods
2.9 Model Selection and the Bias–Variance Tradeoff
Bibliographic Notes
Exercises
3 Linear Methods for Regression
3.1 Introduction
3.2 Linear Regression Models and Least Squares
3.2.1 Example: Prostate Cancer
3.2.2 The Gauss–Markov Theorem
3.2.3 Multiple Regression from Simple Univariate Regression
3.2.4 Multiple Outputs
3.3 Subset Selection
3.3.1 Best-Subset Selection
3.3.2 Forward- and Backward-Stepwise Selection
3.3.3 Forward-Stagewise Regression
3.3.4 Prostate Cancer Data Example (Continued)
3.4 Shrinkage Methods
3.4.1 Ridge Regression
3.4.2 The Lasso
3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso
3.4.4 Least Angle Regression
3.5 Methods Using Derived Input Directions
3.5.1 Principal Components Regression
3.5.2 Partial Least Squares
3.6 Discussion: A Comparison of the Selection and Shrinkage Methods
3.7 Multiple Outcome Shrinkage and Selection
3.8 More on the Lasso and Related Path Algorithms
3.8.1 Incremental Forward Stagewise Regression
3.8.2 Piecewise-Linear Path Algorithms
3.8.3 The Dantzig Selector
3.8.4 The Grouped Lasso
3.8.5 Further Properties of the Lasso
3.8.6 Pathwise Coordinate Optimization
3.9 Computational Considerations
Bibliographic Notes
Exercises
4 Linear Methods for Classification
4.1 Introduction
4.2 Linear Regression of an Indicator Matrix
4.3 Linear Discriminant Analysis
4.3.1 Regularized Discriminant Analysis
4.3.2 Computations for LDA
4.3.3 Reduced-Rank Linear Discriminant Analysis
4.4 Logistic Regression
4.4.1 Fitting Logistic Regression Models
4.4.2 Example: South African Heart Disease
4.4.3 Quadratic Approximations and Inference
4.4.4 L1 Regularized Logistic Regression
4.4.5 Logistic Regression or LDA?
4.5 Separating Hyperplanes
4.5.1 Rosenblatt’s Perceptron Learning Algorithm
4.5.2 Optimal Separating Hyperplanes
Bibliographic Notes
Exercises
5 Basis Expansions and Regularization
5.1 Introduction
5.2 Piecewise Polynomials and Splines
5.2.1 Natural Cubic Splines
5.2.2 Example: South African Heart Disease (Continued)
5.2.3 Example: Phoneme Recognition
5.3 Filtering and Feature Extraction
5.4 Smoothing Splines
5.4.1 Degrees of Freedom and Smoother Matrices
5.5 Automatic Selection of the Smoothing Parameters
5.5.1 Fixing the Degrees of Freedom
5.5.2 The Bias–Variance Tradeoff
5.6 Nonparametric Logistic Regression
5.7 Multidimensional Splines
5.8 Regularization and Reproducing Kernel Hilbert Spaces
5.8.1 Spaces of Functions Generated by Kernels
5.8.2 Examples of RKHS
5.9 Wavelet Smoothing
5.9.1 Wavelet Bases and the Wavelet Transform
5.9.2 Adaptive Wavelet Filtering
Bibliographic Notes
Exercises
Appendix: Computational Considerations for Splines
Appendix: B-splines
Appendix: Computations for Smoothing Splines
6 Kernel Smoothing Methods
6.1 One-Dimensional Kernel Smoothers
6.1.1 Local Linear Regression
6.1.2 Local Polynomial Regression
6.2 Selecting the Width of the Kernel
6.3 Local Regression in ℝ^p
6.4 Structured Local Regression Models in ℝ^p
6.4.1 Structured Kernels
6.4.2 Structured Regression Functions
6.5 Local Likelihood and Other Models
6.6 Kernel Density Estimation and Classification
6.6.1 Kernel Density Estimation
6.6.2 Kernel Density Classification
6.6.3 The Naive Bayes Classifier
6.7 Radial Basis Functions and Kernels
6.8 Mixture Models for Density Estimation and Classification
6.9 Computational Considerations
Bibliographic Notes
Exercises
7 Model Assessment and Selection
7.1 Introduction
7.2 Bias, Variance and Model Complexity
7.3 The Bias–Variance Decomposition
7.3.1 Example: Bias–Variance Tradeoff
7.4 Optimism of the Training Error Rate
7.5 Estimates of In-Sample Prediction Error
7.6 The Effective Number of Parameters
7.7 The Bayesian Approach and BIC
7.8 Minimum Description Length
7.9 Vapnik–Chervonenkis Dimension
7.9.1 Example (Continued)
7.10 Cross-Validation
7.10.1 K-Fold Cross-Validation
7.10.2 The Wrong and Right Way to Do Cross-validation
7.10.3 Does Cross-Validation Really Work?
7.11 Bootstrap Methods
7.11.1 Example (Continued)
7.12 Conditional or Expected Test Error?
Bibliographic Notes
Exercises
8 Model Inference and Averaging
8.1 Introduction
8.2 The Bootstrap and Maximum Likelihood Methods
8.2.1 A Smoothing Example
8.2.2 Maximum Likelihood Inference
8.2.3 Bootstrap versus Maximum Likelihood
8.3 Bayesian Methods
8.4 Relationship Between the Bootstrap and Bayesian Inference
8.5 The EM Algorithm
8.5.1 Two-Component Mixture Model
8.5.2 The EM Algorithm in General
8.5.3 EM as a Maximization–Maximization Procedure
8.6 MCMC for Sampling from the Posterior
8.7 Bagging
8.7.1 Example: Trees with Simulated Data
8.8 Model Averaging and Stacking
8.9 Stochastic Search: Bumping
Bibliographic Notes
Exercises
9 Additive Models, Trees, and Related Methods
9.1 Generalized Additive Models
9.1.1 Fitting Additive Models
9.1.2 Example: Additive Logistic Regression
9.1.3 Summary
9.2 Tree-Based Methods
9.2.1 Background
9.2.2 Regression Trees
9.2.3 Classification Trees
9.2.4 Other Issues
9.2.5 Spam Example (Continued)
9.3 PRIM: Bump Hunting
9.3.1 Spam Example (Continued)
9.4 MARS: Multivariate Adaptive Regression Splines
9.4.1 Spam Example (Continued)
9.4.2 Example (Simulated Data)
9.4.3 Other Issues
9.5 Hierarchical Mixtures of Experts
9.6 Missing Data
9.7 Computational Considerations
Bibliographic Notes
Exercises
10 Boosting and Additive Trees
10.1 Boosting Methods
10.1.1 Outline of This Chapter
10.2 Boosting Fits an Additive Model
10.3 Forward Stagewise Additive Modeling
10.4 Exponential Loss and AdaBoost
10.5 Why Exponential Loss?
10.6 Loss Functions and Robustness
10.7 “Off-the-Shelf” Procedures for Data Mining
10.8 Example: Spam Data
10.9 Boosting Trees
10.10 Numerical Optimization via Gradient Boosting
10.10.1 Steepest Descent
10.10.2 Gradient Boosting
10.10.3 Implementations of Gradient Boosting
10.11 Right-Sized Trees for Boosting
10.12 Regularization
10.12.1 Shrinkage
10.12.2 Subsampling
10.13 Interpretation
10.13.1 Relative Importance of Predictor Variables
10.13.2 Partial Dependence Plots
10.14 Illustrations
10.14.1 California Housing
10.14.2 New Zealand Fish
10.14.3 Demographics Data
Bibliographic Notes
Exercises
11 Neural Networks
11.1 Introduction
11.2 Projection Pursuit Regression
11.3 Neural Networks
11.4 Fitting Neural Networks
11.5 Some Issues in Training Neural Networks
11.5.1 Starting Values
11.5.2 Overfitting
11.5.3 Scaling of the Inputs
11.5.4 Number of Hidden Units and Layers
11.5.5 Multiple Minima
11.6 Example: Simulated Data
11.7 Example: ZIP Code Data
11.8 Discussion
11.9 Bayesian Neural Nets and the NIPS 2003 Challenge
11.9.1 Bayes, Boosting and Bagging
11.9.2 Performance Comparisons
11.10 Computational Considerations
Bibliographic Notes
Exercises
12 Support Vector Machines and Flexible Discriminants
12.1 Introduction
12.2 The Support Vector Classifier
12.2.1 Computing the Support Vector Classifier
12.2.2 Mixture Example (Continued)
12.3 Support Vector Machines and Kernels
12.3.1 Computing the SVM for Classification
12.3.2 The SVM as a Penalization Method
12.3.3 Function Estimation and Reproducing Kernels
12.3.4 SVMs and the Curse of Dimensionality
12.3.5 A Path Algorithm for the SVM Classifier
12.3.6 Support Vector Machines for Regression
12.3.7 Regression and Kernels
12.3.8 Discussion
12.4 Generalizing Linear Discriminant Analysis
12.5 Flexible Discriminant Analysis
12.5.1 Computing the FDA Estimates
12.6 Penalized Discriminant Analysis
12.7 Mixture Discriminant Analysis
12.7.1 Example: Waveform Data
Bibliographic Notes
Exercises
13 Prototype Methods and Nearest-Neighbors
13.1 Introduction
13.2 Prototype Methods
13.2.1 K-means Clustering
13.2.2 Learning Vector Quantization
13.2.3 Gaussian Mixtures
13.3 k-Nearest-Neighbor Classifiers
13.3.1 Example: A Comparative Study
13.3.2 Example: k-Nearest-Neighbors and Image Scene Classification
13.3.3 Invariant Metrics and Tangent Distance
13.4 Adaptive Nearest-Neighbor Methods
13.4.1 Example
13.4.2 Global Dimension Reduction for Nearest-Neighbors
13.5 Computational Considerations
Bibliographic Notes
Exercises
14 Unsupervised Learning
14.1 Introduction
14.2 Association Rules
14.2.1 Market Basket Analysis
14.2.2 The Apriori Algorithm
14.2.3 Example: Market Basket Analysis
14.2.4 Unsupervised as Supervised Learning
14.2.5 Generalized Association Rules
14.2.6 Choice of Supervised Learning Method
14.2.7 Example: Market Basket Analysis (Continued)
14.3 Cluster Analysis
14.3.1 Proximity Matrices
14.3.2 Dissimilarities Based on Attributes
14.3.3 Object Dissimilarity
14.3.4 Clustering Algorithms
14.3.5 Combinatorial Algorithms
14.3.6 K-means
14.3.7 Gaussian Mixtures as Soft K-means Clustering
14.3.8 Example: Human Tumor Microarray Data
14.3.9 Vector Quantization
14.3.10 K-medoids
14.3.11 Practical Issues
14.3.12 Hierarchical Clustering
14.4 Self-Organizing Maps
14.5 Principal Components, Curves and Surfaces
14.5.1 Principal Components
14.5.2 Principal Curves and Surfaces
14.5.3 Spectral Clustering
14.5.4 Kernel Principal Components
14.5.5 Sparse Principal Components
14.6 Non-negative Matrix Factorization
14.6.1 Archetypal Analysis
14.7 Independent Component Analysis and Exploratory Projection Pursuit
14.7.1 Latent Variables and Factor Analysis
14.7.2 Independent Component Analysis
14.7.3 Exploratory Projection Pursuit
14.7.4 A Direct Approach to ICA
14.8 Multidimensional Scaling
14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling
14.10 The Google PageRank Algorithm
Bibliographic Notes
Exercises
15 Random Forests
15.1 Introduction
15.2 Definition of Random Forests
15.3 Details of Random Forests
15.3.1 Out of Bag Samples
15.3.2 Variable Importance
15.3.3 Proximity Plots
15.3.4 Random Forests and Overfitting
15.4 Analysis of Random Forests
15.4.1 Variance and the De-Correlation Effect
15.4.2 Bias
15.4.3 Adaptive Nearest Neighbors
Bibliographic Notes
Exercises
16 Ensemble Learning
16.1 Introduction
16.2 Boosting and Regularization Paths
16.2.1 Penalized Regression
16.2.2 The “Bet on Sparsity” Principle
16.2.3 Regularization Paths, Over-fitting and Margins
16.3 Learning Ensembles
16.3.1 Learning a Good Ensemble
16.3.2 Rule Ensembles
Bibliographic Notes
Exercises
17 Undirected Graphical Models
17.1 Introduction
17.2 Markov Graphs and Their Properties
17.3 Undirected Graphical Models for Continuous Variables
17.3.1 Estimation of the Parameters when the Graph Structure is Known
17.3.2 Estimation of the Graph Structure
17.4 Undirected Graphical Models for Discrete Variables
17.4.1 Estimation of the Parameters when the Graph Structure is Known
17.4.2 Hidden Nodes
17.4.3 Estimation of the Graph Structure
17.4.4 Restricted Boltzmann Machines
Exercises
18 High-Dimensional Problems: p ≫ N
18.1 When p is Much Bigger than N
18.2 Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids
18.3 Linear Classifiers with Quadratic Regularization
18.3.1 Regularized Discriminant Analysis
18.3.2 Logistic Regression with Quadratic Regularization
18.3.3 The Support Vector Classifier
18.3.4 Feature Selection
18.3.5 Computational Shortcuts When p ≫ N
18.4 Linear Classifiers with L1 Regularization
18.4.1 Application of Lasso to Protein Mass Spectroscopy
18.4.2 The Fused Lasso for Functional Data
18.5 Classification When Features are Unavailable
18.5.1 Example: String Kernels and Protein Classification
18.5.2 Classification and Other Models Using Inner-Product Kernels and Pairwise Distances
18.5.3 Example: Abstracts Classification
18.6 High-Dimensional Regression: Supervised Principal Components
18.6.1 Connection to Latent-Variable Modeling
18.6.2 Relationship with Partial Least Squares
18.6.3 Pre-Conditioning for Feature Selection
18.7 Feature Assessment and the Multiple-Testing Problem
18.7.1 The False Discovery Rate
18.7.2 Asymmetric Cutpoints and the SAM Procedure
18.7.3 A Bayesian Interpretation of the FDR
18.8 Bibliographic Notes
Exercises