Author: The Elements of Statistical Learning (book)
Publisher: Springer
Publication date: December 2008
Rating: 9.4
ISBN: 9780387848570
Category: Internet Technology
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
1 Introduction
2 Overview of Supervised Learning
2.1 Introduction
2.2 Variable Types and Terminology
2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
2.3.1 Linear Models and Least Squares
2.3.2 Nearest-Neighbor Methods
2.3.3 From Least Squares to Nearest Neighbors
2.4 Statistical Decision Theory
2.5 Local Methods in High Dimensions
2.6 Statistical Models, Supervised Learning and Function Approximation
2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
2.6.2 Supervised Learning
2.6.3 Function Approximation
2.7 Structured Regression Models
2.7.1 Difficulty of the Problem
2.8 Classes of Restricted Estimators
2.8.1 Roughness Penalty and Bayesian Methods
2.8.2 Kernel Methods and Local Regression
2.8.3 Basis Functions and Dictionary Methods
2.9 Model Selection and the Bias–Variance Tradeoff
Bibliographic Notes
Exercises
3 Linear Methods for Regression
3.1 Introduction
3.2 Linear Regression Models and Least Squares
3.2.1 Example: Prostate Cancer
3.2.2 The Gauss–Markov Theorem
3.2.3 Multiple Regression from Simple Univariate Regression
3.2.4 Multiple Outputs
3.3 Subset Selection
3.3.1 Best-Subset Selection
3.3.2 Forward- and Backward-Stepwise Selection
3.3.3 Forward-Stagewise Regression
3.3.4 Prostate Cancer Data Example (Continued)
3.4 Shrinkage Methods
3.4.1 Ridge Regression
3.4.2 The Lasso
3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso
3.4.4 Least Angle Regression
3.5 Methods Using Derived Input Directions
3.5.1 Principal Components Regression
3.5.2 Partial Least Squares
3.6 Discussion: A Comparison of the Selection and Shrinkage Methods
3.7 Multiple Outcome Shrinkage and Selection
3.8 More on the Lasso and Related Path Algorithms
3.8.1 Incremental Forward Stagewise Regression
3.8.2 Piecewise-Linear Path Algorithms
3.8.3 The Dantzig Selector
3.8.4 The Grouped Lasso
3.8.5 Further Properties of the Lasso
3.8.6 Pathwise Coordinate Optimization
3.9 Computational Considerations
Bibliographic Notes
Exercises
4 Linear Methods for Classification
4.1 Introduction
4.2 Linear Regression of an Indicator Matrix
4.3 Linear Discriminant Analysis
4.3.1 Regularized Discriminant Analysis
4.3.2 Computations for LDA
4.3.3 Reduced-Rank Linear Discriminant Analysis
4.4 Logistic Regression
4.4.1 Fitting Logistic Regression Models
4.4.2 Example: South African Heart Disease
4.4.3 Quadratic Approximations and Inference
4.4.4 L1 Regularized Logistic Regression
4.4.5 Logistic Regression or LDA?
4.5 Separating Hyperplanes
4.5.1 Rosenblatt’s Perceptron Learning Algorithm
4.5.2 Optimal Separating Hyperplanes
Bibliographic Notes
Exercises
5 Basis Expansions and Regularization
5.1 Introduction
5.2 Piecewise Polynomials and Splines
5.2.1 Natural Cubic Splines
5.2.2 Example: South African Heart Disease (Continued)
5.2.3 Example: Phoneme Recognition
5.3 Filtering and Feature Extraction
5.4 Smoothing Splines
5.4.1 Degrees of Freedom and Smoother Matrices
5.5 Automatic Selection of the Smoothing Parameters
5.5.1 Fixing the Degrees of Freedom
5.5.2 The Bias–Variance Tradeoff
5.6 Nonparametric Logistic Regression
5.7 Multidimensional Splines
5.8 Regularization and Reproducing Kernel Hilbert Spaces
5.8.1 Spaces of Functions Generated by Kernels
5.8.2 Examples of RKHS
5.9 Wavelet Smoothing
5.9.1 Wavelet Bases and the Wavelet Transform
5.9.2 Adaptive Wavelet Filtering
Bibliographic Notes
Exercises
Appendix: Computational Considerations for Splines
Appendix: B-splines
Appendix: Computations for Smoothing Splines
6 Kernel Smoothing Methods
6.1 One-Dimensional Kernel Smoothers
6.1.1 Local Linear Regression
6.1.2 Local Polynomial Regression
6.2 Selecting the Width of the Kernel
6.3 Local Regression in ℝ^p
6.4 Structured Local Regression Models in ℝ^p
6.4.1 Structured Kernels
6.4.2 Structured Regression Functions
6.5 Local Likelihood and Other Models
6.6 Kernel Density Estimation and Classification
6.6.1 Kernel Density Estimation
6.6.2 Kernel Density Classification
6.6.3 The Naive Bayes Classifier
6.7 Radial Basis Functions and Kernels
6.8 Mixture Models for Density Estimation and Classification
6.9 Computational Considerations
Bibliographic Notes
Exercises
7 Model Assessment and Selection
7.1 Introduction
7.2 Bias, Variance and Model Complexity
7.3 The Bias–Variance Decomposition
7.3.1 Example: Bias–Variance Tradeoff
7.4 Optimism of the Training Error Rate
7.5 Estimates of In-Sample Prediction Error
7.6 The Effective Number of Parameters
7.7 The Bayesian Approach and BIC
7.8 Minimum Description Length
7.9 Vapnik–Chervonenkis Dimension
7.9.1 Example (Continued)
7.10 Cross-Validation
7.10.1 K-Fold Cross-Validation
7.10.2 The Wrong and Right Way to Do Cross-validation
7.10.3 Does Cross-Validation Really Work?
7.11 Bootstrap Methods
7.11.1 Example (Continued)
7.12 Conditional or Expected Test Error?
Bibliographic Notes
Exercises
8 Model Inference and Averaging
8.1 Introduction
8.2 The Bootstrap and Maximum Likelihood Methods
8.2.1 A Smoothing Example
8.2.2 Maximum Likelihood Inference
8.2.3 Bootstrap versus Maximum Likelihood
8.3 Bayesian Methods
8.4 Relationship Between the Bootstrap and Bayesian Inference
8.5 The EM Algorithm
8.5.1 Two-Component Mixture Model
8.5.2 The EM Algorithm in General
8.5.3 EM as a Maximization–Maximization Procedure
8.6 MCMC for Sampling from the Posterior
8.7 Bagging
8.7.1 Example: Trees with Simulated Data
8.8 Model Averaging and Stacking
8.9 Stochastic Search: Bumping
Bibliographic Notes
Exercises
9 Additive Models, Trees, and Related Methods
9.1 Generalized Additive Models
9.1.1 Fitting Additive Models
9.1.2 Example: Additive Logistic Regression
9.1.3 Summary
9.2 Tree-Based Methods
9.2.1 Background
9.2.2 Regression Trees
9.2.3 Classification Trees
9.2.4 Other Issues
9.2.5 Spam Example (Continued)
9.3 PRIM: Bump Hunting
9.3.1 Spam Example (Continued)
9.4 MARS: Multivariate Adaptive Regression Splines
9.4.1 Spam Example (Continued)
9.4.2 Example (Simulated Data)
9.4.3 Other Issues
9.5 Hierarchical Mixtures of Experts
9.6 Missing Data
9.7 Computational Considerations
Bibliographic Notes
Exercises
10 Boosting and Additive Trees
10.1 Boosting Methods
10.1.1 Outline of This Chapter
10.2 Boosting Fits an Additive Model
10.3 Forward Stagewise Additive Modeling
10.4 Exponential Loss and AdaBoost
10.5 Why Exponential Loss?
10.6 Loss Functions and Robustness
10.7 “Off-the-Shelf” Procedures for Data Mining
10.8 Example: Spam Data
10.9 Boosting Trees
10.10 Numerical Optimization via Gradient Boosting
10.10.1 Steepest Descent
10.10.2 Gradient Boosting
10.10.3 Implementations of Gradient Boosting
10.11 Right-Sized Trees for Boosting
10.12 Regularization
10.12.1 Shrinkage
10.12.2 Subsampling
10.13 Interpretation
10.13.1 Relative Importance of Predictor Variables
10.13.2 Partial Dependence Plots
10.14 Illustrations
10.14.1 California Housing
10.14.2 New Zealand Fish
10.14.3 Demographics Data
Bibliographic Notes
Exercises
11 Neural Networks
11.1 Introduction
11.2 Projection Pursuit Regression
11.3 Neural Networks
11.4 Fitting Neural Networks
11.5 Some Issues in Training Neural Networks
11.5.1 Starting Values
11.5.2 Overfitting
11.5.3 Scaling of the Inputs
11.5.4 Number of Hidden Units and Layers
11.5.5 Multiple Minima
11.6 Example: Simulated Data
11.7 Example: ZIP Code Data
11.8 Discussion
11.9 Bayesian Neural Nets and the NIPS 2003 Challenge
11.9.1 Bayes, Boosting and Bagging
11.9.2 Performance Comparisons
11.10 Computational Considerations
Bibliographic Notes
Exercises
12 Support Vector Machines and Flexible Discriminants
12.1 Introduction
12.2 The Support Vector Classifier
12.2.1 Computing the Support Vector Classifier
12.2.2 Mixture Example (Continued)
12.3 Support Vector Machines and Kernels
12.3.1 Computing the SVM for Classification
12.3.2 The SVM as a Penalization Method
12.3.3 Function Estimation and Reproducing Kernels
12.3.4 SVMs and the Curse of Dimensionality
12.3.5 A Path Algorithm for the SVM Classifier
12.3.6 Support Vector Machines for Regression
12.3.7 Regression and Kernels
12.3.8 Discussion
12.4 Generalizing Linear Discriminant Analysis
12.5 Flexible Discriminant Analysis
12.5.1 Computing the FDA Estimates
12.6 Penalized Discriminant Analysis
12.7 Mixture Discriminant Analysis
12.7.1 Example: Waveform Data
Bibliographic Notes
Exercises
13 Prototype Methods and Nearest-Neighbors
13.1 Introduction
13.2 Prototype Methods
13.2.1 K-means Clustering
13.2.2 Learning Vector Quantization
13.2.3 Gaussian Mixtures
13.3 k-Nearest-Neighbor Classifiers
13.3.1 Example: A Comparative Study
13.3.2 Example: k-Nearest-Neighbors and Image Scene Classification
13.3.3 Invariant Metrics and Tangent Distance
13.4 Adaptive Nearest-Neighbor Methods
13.4.1 Example
13.4.2 Global Dimension Reduction for Nearest-Neighbors
13.5 Computational Considerations
Bibliographic Notes
Exercises
14 Unsupervised Learning
14.1 Introduction
14.2 Association Rules
14.2.1 Market Basket Analysis
14.2.2 The Apriori Algorithm
14.2.3 Example: Market Basket Analysis
14.2.4 Unsupervised as Supervised Learning
14.2.5 Generalized Association Rules
14.2.6 Choice of Supervised Learning Method
14.2.7 Example: Market Basket Analysis (Continued)
14.3 Cluster Analysis
14.3.1 Proximity Matrices
14.3.2 Dissimilarities Based on Attributes
14.3.3 Object Dissimilarity
14.3.4 Clustering Algorithms
14.3.5 Combinatorial Algorithms
14.3.6 K-means
14.3.7 Gaussian Mixtures as Soft K-means Clustering
14.3.8 Example: Human Tumor Microarray Data
14.3.9 Vector Quantization
14.3.10 K-medoids
14.3.11 Practical Issues
14.3.12 Hierarchical Clustering
14.4 Self-Organizing Maps
14.5 Principal Components, Curves and Surfaces
14.5.1 Principal Components
14.5.2 Principal Curves and Surfaces
14.5.3 Spectral Clustering
14.5.4 Kernel Principal Components
14.5.5 Sparse Principal Components
14.6 Non-negative Matrix Factorization
14.6.1 Archetypal Analysis
14.7 Independent Component Analysis and Exploratory Projection Pursuit
14.7.1 Latent Variables and Factor Analysis
14.7.2 Independent Component Analysis
14.7.3 Exploratory Projection Pursuit
14.7.4 A Direct Approach to ICA
14.8 Multidimensional Scaling
14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling
14.10 The Google PageRank Algorithm
Bibliographic Notes
Exercises
15 Random Forests
15.1 Introduction
15.2 Definition of Random Forests
15.3 Details of Random Forests
15.3.1 Out of Bag Samples
15.3.2 Variable Importance
15.3.3 Proximity Plots
15.3.4 Random Forests and Overfitting
15.4 Analysis of Random Forests
15.4.1 Variance and the De-Correlation Effect
15.4.2 Bias
15.4.3 Adaptive Nearest Neighbors
Bibliographic Notes
Exercises
16 Ensemble Learning
16.1 Introduction
16.2 Boosting and Regularization Paths
16.2.1 Penalized Regression
16.2.2 The “Bet on Sparsity” Principle
16.2.3 Regularization Paths, Over-fitting and Margins
16.3 Learning Ensembles
16.3.1 Learning a Good Ensemble
16.3.2 Rule Ensembles
Bibliographic Notes
Exercises
17 Undirected Graphical Models
17.1 Introduction
17.2 Markov Graphs and Their Properties
17.3 Undirected Graphical Models for Continuous Variables
17.3.1 Estimation of the Parameters when the Graph Structure is Known
17.3.2 Estimation of the Graph Structure
17.4 Undirected Graphical Models for Discrete Variables
17.4.1 Estimation of the Parameters when the Graph Structure is Known
17.4.2 Hidden Nodes
17.4.3 Estimation of the Graph Structure
17.4.4 Restricted Boltzmann Machines
Exercises
18 High-Dimensional Problems: p ≫ N
18.1 When p is Much Bigger than N
18.2 Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids
18.3 Linear Classifiers with Quadratic Regularization
18.3.1 Regularized Discriminant Analysis
18.3.2 Logistic Regression with Quadratic Regularization
18.3.3 The Support Vector Classifier
18.3.4 Feature Selection
18.3.5 Computational Shortcuts When p ≫ N
18.4 Linear Classifiers with L1 Regularization
18.4.1 Application of Lasso to Protein Mass Spectroscopy
18.4.2 The Fused Lasso for Functional Data
18.5 Classification When Features are Unavailable
18.5.1 Example: String Kernels and Protein Classification
18.5.2 Classification and Other Models Using Inner-Product Kernels and Pairwise Distances
18.5.3 Example: Abstracts Classification
18.6 High-Dimensional Regression: Supervised Principal Components
18.6.1 Connection to Latent-Variable Modeling
18.6.2 Relationship with Partial Least Squares
18.6.3 Pre-Conditioning for Feature Selection
18.7 Feature Assessment and the Multiple-Testing Problem
18.7.1 The False Discovery Rate
18.7.2 Asymmetric Cutpoints and the SAM Procedure
18.7.3 A Bayesian Interpretation of the FDR
18.8 Bibliographic Notes
Exercises