Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.

Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:

* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game

Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.

"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google

"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect

Toby Segaran works as a Data Magnate at Metaweb Technologies. Before joining Metaweb, he started a biotech software company called Incellico, which was later acquired by Genstruct. His book, "Programming Collective Intelligence," has been the best-selling AI book on Amazon for several months. He is the recipient of a National Interest Waiver for "People of Exceptional Abilit...

Next, get a list of random people to make up the dataset. Fortunately, Hot or Not provides an API call that returns a list of people with specified criteria. In this example, the only criteria will be that the people have “meet me” profiles, since only from these profiles can you get other information like location and interests. Add this function to hotornot.py:
Quoted from page 162

What Does This Have to Do with the Articles Matrix?

So far, what you have is a matrix of articles with word counts. The goal is to factorize this matrix, which means finding two smaller matrices that can be multiplied together to reconstruct this one. The two smaller matrices are:

The features matrix: This matrix has a row for each feature and a column for each word. The values indicate how important a word is to a feature. Each feature should represent a theme that emerged from a set of articles, so you might expect an article about a new TV show to have a high weight for the word “television.”

The weights matrix: This matrix maps the features to the articles matrix. Each row is an article and each column is a feature. The values state how much each feature applies to each articl...
Quoted from page 234
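
The factorization described in the excerpt above can be illustrated with a small sketch. It is not the book's code: the function name `factorize_sketch`, the helper functions, the toy word counts, and the use of multiplicative update rules are all assumptions chosen for illustration. It only shows that a weights matrix (articles by features) multiplied by a features matrix (features by words) can approximately reconstruct an articles matrix of word counts.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def factorize_sketch(articles, n_features, iterations=200, seed=0):
    """Hypothetical helper: approximate articles as weights x features.

    articles: n_articles x n_words matrix of non-negative word counts.
    Returns (weights, features) with shapes
    n_articles x n_features and n_features x n_words.
    """
    rng = random.Random(seed)
    n_articles, n_words = len(articles), len(articles[0])
    weights = [[rng.random() for _ in range(n_features)]
               for _ in range(n_articles)]
    features = [[rng.random() for _ in range(n_words)]
                for _ in range(n_features)]
    eps = 1e-9  # guards against division by zero

    for _ in range(iterations):
        # Assumed update rule: features *= (W^T A) / (W^T W F).
        # Multiplying by non-negative ratios keeps entries non-negative.
        wt = transpose(weights)
        num = matmul(wt, articles)
        den = matmul(matmul(wt, weights), features)
        features = [[f * n / (d + eps) for f, n, d in zip(fr, nr, dr)]
                    for fr, nr, dr in zip(features, num, den)]

        # Assumed update rule: weights *= (A F^T) / (W F F^T).
        ft = transpose(features)
        num = matmul(articles, ft)
        den = matmul(matmul(weights, features), ft)
        weights = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
                   for wr, nr, dr in zip(weights, num, den)]

    return weights, features


# Toy data (made up): 4 articles (rows) by 5 words (columns).
articles = [
    [3.0, 2.0, 0.0, 0.0, 1.0],
    [4.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 2.0, 1.0],
    [0.0, 0.0, 4.0, 3.0, 0.0],
]

weights, features = factorize_sketch(articles, n_features=2)
reconstruction = matmul(weights, features)

# Sum of squared differences between the original and the reconstruction.
error = sum((a - r) ** 2
            for arow, rrow in zip(articles, reconstruction)
            for a, r in zip(arow, rrow))
```

Each row of `features` can be read as a theme expressed in word importances, and each row of `weights` says how strongly each theme applies to one article. The number of features is a choice made by the caller; two is used here only because the toy data has two obvious groups of articles.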