Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might mean restructuring or reformatting it, extracting smaller pieces of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing, and as the amount of data everywhere continues to grow, it becomes more and more of a challenge. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and why what doesn't work doesn't. David Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text-processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate Internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465