Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for doing text processing, even better than Perl. As the amount of data everywhere continues to increase, text processing becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why some approaches work and others do not. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
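
To give a flavor of two of the questions above (finding a URL or an email address in text, and looking up Unicode codepoints), here is a minimal sketch using only the Python standard library. It is not taken from the book: the patterns `URL_RE` and `EMAIL_RE`, the helper names, and the address `reader@example.com` are assumptions made for illustration, and the regular expressions are deliberately simple rather than standards-complete.

```python
import re
import unicodedata

# Deliberately simple, illustrative patterns (assumptions, not the book's own).
# Real-world URLs and email addresses allow many more forms than these match.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_urls(text):
    """Return all substrings of text that look like http(s) URLs."""
    return URL_RE.findall(text)


def find_emails(text):
    """Return all substrings of text that look like email addresses."""
    return EMAIL_RE.findall(text)


def describe_codepoints(text):
    """Return (codepoint, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # The email address below is a made-up placeholder.
    sample = "Source code is at http://gnosis.cx/TPiP (questions: reader@example.com)."
    print(find_urls(sample))
    print(find_emails(sample))
    print(describe_codepoints("A\u00e9"))
```

Patterns this loose are usually adequate for scanning free text, but they will miss unusual addresses and can include stray trailing punctuation in a URL, so treat the matches as candidates to validate rather than as definitive results.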