Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might mean restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing, and as the amount of data everywhere continues to grow, it becomes an ever greater challenge. Because Python is clear, expressive, and object-oriented, it is a perfect language for text processing, even better than Perl. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader a theoretical and conceptual understanding of why what works works, and why what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text-processing tasks that working programmers face daily.
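The paragraph above names three kinds of text processing: reformatting, extracting, and calculating. The minimal sketch below applies all three to the same small input in plain Python. The sample records and variable names are hypothetical and were invented for illustration. This is not code from the book.

```python
# Hypothetical sample data: one "name,count" record per line.
text = "apple,3\nbanana,  5\ncherry,2\n"

# Split the text into [name, count] pairs, skipping blank lines.
records = [line.split(",") for line in text.splitlines() if line.strip()]

# 1. Restructuring/reformatting: render the records as aligned columns.
reformatted = "\n".join(f"{name.strip():<8}{int(count):>3}" for name, count in records)

# 2. Extracting smaller bits of information: only the names.
names = [name.strip() for name, _ in records]

# 3. Calculating something that depends on the text: the total count.
#    int() tolerates surrounding whitespace such as "  5".
total = sum(int(count) for _, count in records)

print(reformatted)
print(names)   # ['apple', 'banana', 'cherry']
print(total)   # 10
```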
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate Internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find code points in Unicode? Page 465
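Two of the questions above, finding URLs and email addresses and finding Unicode code points, can be sketched with the standard library alone. The following is a minimal sketch that makes two assumptions:

* The regular expressions are deliberately simplified. They are not the patterns developed in the book, and they will miss or over-match some real-world addresses.
* Code points are looked up with `ord` and the standard `unicodedata` module.

The helper names `find_urls_and_emails` and `code_points` are hypothetical.

```python
import re
import unicodedata

# Simplified patterns (an assumption for illustration, not the book's own).
URL_RE = re.compile(r"https?://[^\s<>\"']+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_urls_and_emails(text):
    """Return (urls, emails) found in text, in order of appearance."""
    # Trim punctuation that usually belongs to the sentence, not the URL.
    urls = [u.rstrip(".,;:!?)") for u in URL_RE.findall(text)]
    emails = EMAIL_RE.findall(text)
    return urls, emails


def code_points(s):
    """Map each character to (char, 'U+XXXX' code point, Unicode name)."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in s]


sample = "Source code: http://gnosis.cx/TPiP. Questions? Mail someone@example.com"
urls, emails = find_urls_and_emails(sample)
print(urls)    # ['http://gnosis.cx/TPiP']
print(emails)  # ['someone@example.com']
print(code_points("\u00e9\u20ac"))
```

Stripping trailing punctuation from URL matches is a simple heuristic for the common case where a URL ends a sentence. A URL that legitimately ends in one of those characters would be truncated.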