Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it: restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on it. Text processing is arguably what most programmers spend most of their time doing, and as the amount of data everywhere continues to increase, it becomes more and more of a challenge. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and why what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
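
As a rough illustration of two of the questions above (finding a URL or an email address in text, and finding codepoints in Unicode), here is a minimal sketch using only the Python standard library. It is not taken from the book: the patterns are deliberately loose, the helper names `find_urls`, `find_emails`, and `codepoints` are hypothetical, and the address `reader@example.com` is made up for the example.

```python
import re
import unicodedata

# Deliberately loose patterns, assumed for illustration only; robust
# URL and email matching needs considerably more care than this.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_urls(text):
    """Return every substring of text that looks like an http(s) URL."""
    return URL_RE.findall(text)


def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_RE.findall(text)


def codepoints(text):
    """Return a (codepoint, Unicode name) pair for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Hypothetical sample text; only the URL comes from the description above.
sample = "Source code lives at http://gnosis.cx/TPiP (write to reader@example.com)."

print(find_urls(sample))
print(find_emails(sample))
print(codepoints("\u00e9"))
```

Compiling the patterns once at module level keeps the helpers cheap to call repeatedly, and `unicodedata.name` is given a default so that characters without a name do not raise an exception.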