Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it: restructuring or reformatting it, extracting smaller pieces of information from it, or performing calculations that depend on it. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is an excellent language for text processing, even better than Perl. As the amount of data everywhere continues to grow, text processing becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader both a theoretical and a conceptual understanding of why what works works and why what doesn't work doesn't. David Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text-processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate Internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
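To give a flavor of the URL and email question above, here is a minimal sketch using Python's standard `re` module. It is not taken from the book: the patterns and the function name `find_urls_and_emails` are illustrative assumptions, and the patterns deliberately match only common forms rather than everything the URL and email specifications permit.

```python
import re

# Illustrative patterns (assumptions, not the book's code): they catch
# common http(s) URLs and plain user@domain addresses only.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def find_urls_and_emails(text):
    """Return (urls, emails) found in text, in order of appearance."""
    # Strip punctuation that usually ends a sentence rather than a URL.
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    emails = EMAIL_RE.findall(text)
    return urls, emails

sample = "See http://gnosis.cx/TPiP. Questions go to someone@example.com."
urls, emails = find_urls_and_emails(sample)
print(urls)    # ['http://gnosis.cx/TPiP']
print(emails)  # ['someone@example.com']
```

Trimming trailing punctuation from URL matches is a pragmatic choice: without it, the greedy character class would swallow a sentence-ending period.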