This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.
To get you started—whether you’re on Windows, OS X, or Linux—author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.
Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.
● Obtain data from websites, APIs, databases, and spreadsheets
● Perform scrub operations on plain text, CSV, HTML/XML, and JSON
● Explore data, compute descriptive statistics, and create visualizations
● Manage your data science workflow using Drake
● Create reusable tools from one-liners and existing Python or R code
● Parallelize and distribute data-intensive pipelines using GNU Parallel
● Model data with dimensionality reduction, clustering, regression, and classification algorithms
Chapter 1 Introduction
Overview
Data Science Is OSEMN
Intermezzo Chapters
What Is the Command Line?
Why Data Science at the Command Line?
A Real-World Use Case
Further Reading
Chapter 2 Getting Started
Overview
Setting Up Your Data Science Toolbox
Essential Concepts and Tools
Further Reading
Chapter 3 Obtaining Data
Overview
Copying Local Files to the Data Science Toolbox
Decompressing Files
Converting Microsoft Excel Spreadsheets
Querying Relational Databases
Downloading from the Internet
Calling Web APIs
Further Reading
Chapter 4 Creating Reusable Command-Line Tools
Overview
Converting One-Liners into Shell Scripts
Creating Command-Line Tools with Python and R
Further Reading
Chapter 5 Scrubbing Data
Overview
Common Scrub Operations for Plain Text
Working with CSV
Working with HTML/XML and JSON
Common Scrub Operations for CSV
Further Reading
Chapter 6 Managing Your Data Workflow
Overview
Introducing Drake
Installing Drake
Obtain Top Ebooks from Project Gutenberg
Every Workflow Starts with a Single Step
Well, That Depends
Rebuilding Specific Targets
Discussion
Further Reading
Chapter 7 Exploring Data
Overview
Inspecting Data and Its Properties
Computing Descriptive Statistics
Creating Visualizations
Further Reading
Chapter 8 Parallel Pipelines
Overview
Serial Processing
Parallel Processing
Distributed Processing
Discussion
Further Reading
Chapter 9 Modeling Data
Overview
More Wine, Please!
Dimensionality Reduction with Tapkee
Clustering with Weka
Regression with SciKit-Learn Laboratory
Classification with BigML
Further Reading
Chapter 10 Conclusion
Let’s Recap
Three Pieces of Advice
Where to Go from Here?
Getting in Touch
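Logotype design is the single most important factor in how finished any design looks. Master the essentials of Chinese, European, and Japanese logotype design in one go! The first typeface MOOK in the Chinese-speaking world.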
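Built around the fictional Pep/8 computer, this book gives a clear, detailed, step-by-step introduction to the core ideas of computer organization, assembly language, and computer architecture, organized around seven levels of abstraction...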
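《典型半导体团簇及组装材料的结构和电子特性》 (Structures and Electronic Properties of Typical Semiconductor Clusters and Assembled Materials). Synopsis: The study of the structures and electronic properties of typical semiconductor clusters and their cluster-assembled materials is, in current cluster science...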
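《信息存储与IT管理》 (Information Storage and IT Management). Synopsis: This book was written jointly by Huawei Technologies Co., Ltd. and the Department of Computer Science and Engineering at Shanghai Jiao Tong University, and brings together Shanghai Jiao Tong University's, in the computing field...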
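《逍遥游(绘本版)》 (Free and Easy Wandering, Picture-Book Edition). Synopsis: "Free and Easy Wandering" is the representative work of Zhuangzi, the philosopher and writer of the Warring States period. Both artistically and intellectually, it can be regarded within the book Zhuangzi as...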
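《智能的本质》 (The Nature of Intelligence). Synopsis: Can the intelligence of robots surpass that of humans? When will the artificial intelligence singularity actually arrive? Will humans achieve immortality with the help of artificial intelligence? On these questions...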
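《新概念编程C语言篇习题解答》 (New Concept Programming, C Language Volume: Exercise Solutions) provides a systematic and comprehensive analysis of, and solutions to, the exercises in the textbook 《新概念编程C语言篇》. The exercises are selected from typical problem types in C programming...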
Node gives JavaScript developers incredible server-side power, but transitioning from fron...
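《Head First Go语言程序设计》 (Head First Go). Synopsis: Go was designed for high-performance networking and multiprocessing, but like Python and JavaScript, the language is easy to read and use.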
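《TCP/IP最佳入门:因特网原理与应用(原书第6版)》 (The Best Introduction to TCP/IP: Internet Principles and Applications, translated from the 6th edition). Book features: explains in detail the basic operating principles of TCP/IP; includes protocol analysis-Et...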
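《和秋叶一起学Word(第3版)》 (Learn Word with Qiuye, 3rd Edition). Synopsis: Word, PPT, Excel: which one deserves the effort of learning? I think it is Word, because Word is used so frequently...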
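《认知设计意味着商机》 (original English title: Realize: Design Means Business, hereafter "Realize") was published by the Industrial Designers Society of America and collects a total of...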
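《沙漠之城》 (City of the Desert). Synopsis: Egypt, a desert realm steeped in an alluring and frenzied atmosphere. The travelling adventurer 本尼西 had hoped to experience its wondrous exotic charm here and to seek out the legendary...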
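《搜索引擎优化》 (Search Engine Optimization): For beginners in do-it-yourself search engine marketing, this is a very good introductory read. The book not only covers the basic elements of SEO, but also, in depth...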
A basic problem in computer vision is to understand the structure of a real world scene given sev...
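《编程卓越之道第二卷:运用底层语言思想编写高级语言代码》 (Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level) is the second volume of the Write Great Code series. It explores how to use a high-level language (rather than assembly langu...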
In the tradition of international bestsellers, "Future Shock" and "Megatrends," Michael J. S...
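《唤醒自己》 (Awaken Yourself). Synopsis: This is a book of wisdom that can help you make the right decisions. It helps you see the true nature of the world, so that in choosing a career, in wealth, in life choices, in ways of thinking, and so on...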
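《李国文散文》 (Essays of Li Guowen). Synopsis: Li Guowen's essays, whether they express reflections on present-day life or attend to the living conditions of literati in history, reveal a genuine temperament between the lines and have a penetrating...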
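Pro/ENGINEER Advanced Application Tutorial (2001 Chinese Edition) (includes 1 CD). Book features: Through a large number of examples, this book gives a detailed ... of some abstract concepts in the advanced features of Pro/ENGINEER...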