Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.
Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, wiki, or specialized application. This book explains:
* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.
"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google
"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect
Toby Segaran works as a Data Magnate at Metaweb Technologies. Prior to working at Metaweb, he started a biotech software company called Incellico, which was later acquired by Genstruct. His book, "Programming Collective Intelligence", has been the best-selling AI book on Amazon for several months. He is the recipient of a National Interest Waiver for "People of Exceptional Abilit...
Next, get a list of random people to make up the dataset. Fortunately, Hot or Not provides an API call that returns a list of people with specified criteria. In this example, the only criterion will be that the people have "meet me" profiles, since only from these profiles can you get other information like location and interests. Add this function to hotornot.py:
-- Quoted from page 162
What Does This Have to Do with the Articles Matrix?
So far, what you have is a matrix of articles with word counts. The goal is to factorize this matrix, which means finding two smaller matrices that can be multiplied together to reconstruct this one. The two smaller matrices are:
* The features matrix: This matrix has a row for each feature and a column for each word. The values indicate how important a word is to a feature. Each feature should represent a theme that emerged from a set of articles, so you might expect an article about a new TV show to have a high weight for the word "television."
* The weights matrix: This matrix maps the features to the articles matrix. Each row is an article and each column is a feature. The values state how much each feature applies to each articl...
-- Quoted from page 234
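The factorization described in the excerpt above can be sketched in a few lines. This is a minimal pure-Python sketch, not the book's own code. It assumes the multiplicative update rules of Lee and Seung for non-negative matrix factorization, which minimize the squared reconstruction error. The helper names `matmul`, `transpose`, `cost`, and `factorize`, the toy `articles` matrix, and the parameters `pc` (number of features), `iterations`, and `seed` are all illustrative assumptions. Following the excerpt, the articles matrix (articles x words) is approximated as the weights matrix (articles x features) times the features matrix (features x words).

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def cost(a, b):
    """Sum of squared differences between two same-shaped matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def factorize(v, pc=2, iterations=200, seed=0, eps=1e-9):
    """Approximate v (articles x words) as weights (articles x pc) @ features (pc x words).

    Illustrative sketch using multiplicative updates; all entries stay non-negative.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(pc)] for _ in range(n)]  # weights matrix
    h = [[rng.random() for _ in range(m)] for _ in range(pc)]  # features matrix
    for _ in range(iterations):
        if cost(v, matmul(w, h)) < 1e-6:  # close enough: stop early
            break
        # Update the features matrix: h *= (w^T v) / (w^T w h)
        wt = transpose(w)
        hn = matmul(wt, v)
        hd = matmul(matmul(wt, w), h)
        h = [[h[i][j] * hn[i][j] / (hd[i][j] + eps) for j in range(m)] for i in range(pc)]
        # Update the weights matrix: w *= (v h^T) / (w h h^T)
        ht = transpose(h)
        wn = matmul(v, ht)
        wd = matmul(matmul(w, h), ht)
        w = [[w[i][j] * wn[i][j] / (wd[i][j] + eps) for j in range(pc)] for i in range(n)]
    return w, h


# Toy articles matrix (hypothetical data): 4 articles (rows) x 4 words (columns) of word counts.
articles = [[2, 1, 0, 0],
            [3, 1, 0, 1],
            [0, 0, 4, 2],
            [0, 1, 3, 2]]
weights, features = factorize(articles, pc=2)
print("reconstruction error:", round(cost(articles, matmul(weights, features)), 4))
```

Each row of `weights` indicates how strongly each feature applies to one article. Each row of `features` indicates how important each word is to that feature. Multiplicative updates are used here because they keep every entry non-negative without an explicit clipping step.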