Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for doing text processing, even better than Perl. As the amount of data everywhere continues to increase, processing it becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently; and giving the reader an understanding, both theoretical and conceptual, of why what works works and why what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465