Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. As the amount of data everywhere continues to increase, text processing becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and what doesn't work doesn't work. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate Internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465