Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.

Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:

* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game

Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.

"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google

"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect
Toby Segaran works as a Data Magnate at Metaweb Technologies. Prior to working at Metaweb, he started a biotech software company called Incellico, which was later acquired by Genstruct. His book, "Programming Collective Intelligence," has been the best-selling AI book on Amazon for several months. He is the recipient of a National Interest Waiver for "People of Exceptional Abilit...
Next, get a list of random people to make up the dataset. Fortunately, Hot or Not provides an API call that returns a list of people with specified criteria. In this example, the only criteria will be that the people have “meet me” profiles, since only from these profiles can you get other information like location and interests. Add this function to hotornot.py:
-- quoted from page 162
What Does This Have to Do with the Articles Matrix?

So far, what you have is a matrix of articles with word counts. The goal is to factorize this matrix, which means finding two smaller matrices that can be multiplied together to reconstruct this one. The two smaller matrices are:

The features matrix
This matrix has a row for each feature and a column for each word. The values indicate how important a word is to a feature. Each feature should represent a theme that emerged from a set of articles, so you might expect an article about a new TV show to have a high weight for the word “television.”

The weights matrix
This matrix maps the features to the articles matrix. Each row is an article and each column is a feature. The values state how much each feature applies to each articl...
-- quoted from page 234
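
The factorization described in the excerpt above can be illustrated with a small sketch. This is not the book's code: it assumes the commonly used multiplicative update rules for non-negative matrix factorization, and the names `factorize`, `matmul`, `transpose`, `squared_error`, and the toy `articles` matrix are all hypothetical, chosen only for illustration. It uses plain Python lists so it runs without extra libraries.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]

def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]

def squared_error(a, b):
    """Sum of squared differences between two same-shaped matrices."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

def factorize(v, n_features, iterations=200, seed=0):
    """Approximate v (articles x words) as weights * features.

    weights:  one row per article, one column per feature
    features: one row per feature, one column per word
    Assumption: multiplicative updates, which keep every entry non-negative.
    """
    rng = random.Random(seed)
    n_articles, n_words = len(v), len(v[0])
    eps = 1e-9  # avoids division by zero
    weights = [[rng.random() for _ in range(n_features)]
               for _ in range(n_articles)]
    features = [[rng.random() for _ in range(n_words)]
                for _ in range(n_features)]

    for _ in range(iterations):
        # Update the features matrix: H <- H * (W^T V) / (W^T W H)
        wt = transpose(weights)
        num = matmul(wt, v)
        den = matmul(matmul(wt, weights), features)
        features = [[f * n / (d + eps) for f, n, d in zip(fr, nr, dr)]
                    for fr, nr, dr in zip(features, num, den)]

        # Update the weights matrix: W <- W * (V H^T) / (W H H^T)
        ht = transpose(features)
        num = matmul(v, ht)
        den = matmul(matmul(weights, features), ht)
        weights = [[w * n / (d + eps) for w, n, d in zip(wr, nr, dr)]
                   for wr, nr, dr in zip(weights, num, den)]

    return weights, features

# Toy word-count matrix: 4 articles (rows) x 4 words (columns).
articles = [
    [3, 0, 1, 0],
    [2, 0, 1, 0],
    [0, 4, 0, 2],
    [0, 3, 0, 1],
]

weights, features = factorize(articles, n_features=2)
reconstructed = matmul(weights, features)
print("reconstruction error:", squared_error(articles, reconstructed))
```

Multiplying `weights` by `features` gives a matrix of the same shape as `articles`; the smaller the printed error, the closer the two smaller matrices come to reconstructing the original. The number of features is a choice made up front, and different random seeds can lead to different factorizations.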