Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for doing text processing, even better than Perl. As the amount of data everywhere continues to increase, text processing becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and what doesn't work doesn't work. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465