Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for doing text processing, even better than Perl. As the amount of data everywhere continues to increase, text processing becomes more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently, and giving the reader an understanding, both theoretical and conceptual, of why what works works and why what doesn't work doesn't. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text processing tasks that working programmers face daily.
From the Back Cover:
Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.
Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.
Here is some of what you will find in this book:
* When do I use formal parsers to process structured and semi-structured data? Page 257
* How do I work with full text indexing? Page 199
* What patterns in text can be expressed using regular expressions? Page 204
* How do I find a URL or an email address in text? Page 228
* How do I process a report with a concrete state machine? Page 274
* How do I parse, create, and manipulate internet formats? Page 345
* How do I handle lossless and lossy compression? Page 454
* How do I find codepoints in Unicode? Page 465
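
To give a flavor of one of the questions above (finding a URL or an email address in text), here is a minimal sketch using Python's standard `re` module. It is not taken from the book: the patterns are deliberately loose approximations, the names `find_urls` and `find_emails` are hypothetical, and the address `reader@example.com` is made up for the illustration.

```python
import re

# Illustrative patterns only (an assumption, not the book's own code).
# Real URL and email grammars are far more permissive than these.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_urls(text):
    """Return every substring of text that looks like an http(s) URL."""
    return URL_PATTERN.findall(text)


def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


# Hypothetical sample sentence; the email address is invented.
sample = "Source code is at http://gnosis.cx/TPiP (questions to reader@example.com)."

print(find_urls(sample))
print(find_emails(sample))
```

Patterns like these trade precision for readability: they will miss unusual but valid addresses and may accept some invalid ones, which is usually acceptable for quick extraction from free text.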