Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for doing text processing, even better than Perl. As the amount of data everywhere continues to increase, text processing is more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently; and giving the reader an understanding, both theoretical and conceptual, of why what works works and what doesn't work doesn't work. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465