Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.

Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:

* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game

Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.

"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google

"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect
Toby Segaran works as a Data Magnate at Metaweb Technologies. Prior to working at Metaweb, he started a biotech software company called Incellico, which was later acquired by Genstruct. His book, "Programming Collective Intelligence," has been the best-selling AI book on Amazon for several months. He is the recipient of a National Interest Waiver for "People of Exceptional Abilit...
Next, get a list of random people to make up the dataset. Fortunately, Hot or Not provides an API call that returns a list of people with specified criteria. In this example, the only criteria will be that the people have "meet me" profiles, since only from these profiles can you get other information like location and interests. Add this function to hotornot.py:
-- Quoted from page 162
What Does This Have to Do with the Articles Matrix?

So far, what you have is a matrix of articles with word counts. The goal is to factorize this matrix, which means finding two smaller matrices that can be multiplied together to reconstruct this one. The two smaller matrices are:

The features matrix
This matrix has a row for each feature and a column for each word. The values indicate how important a word is to a feature. Each feature should represent a theme that emerged from a set of articles, so you might expect an article about a new TV show to have a high weight for the word "television."

The weights matrix
This matrix maps the features to the articles matrix. Each row is an article and each column is a feature. The values state how much each feature applies to each articl...
-- Quoted from page 234
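The shapes described in the excerpt on the articles matrix can be made concrete with a small sketch. The code below is not taken from the book: it is a pure-Python illustration that assumes the widely used multiplicative update rules for non-negative matrix factorization, and the names `factorize`, `matmul`, `transpose`, `cost`, and the toy `articles` data are all hypothetical. It approximates an articles-by-words matrix as the product of a weights matrix (articles by features) and a features matrix (features by words).

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def cost(a, b):
    """Sum of squared differences between two same-shaped matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def factorize(v, n_features, iterations=200, seed=0):
    """Approximate v (articles x words) as weights * features.

    weights:  one row per article, one column per feature
    features: one row per feature, one column per word
    Uses multiplicative updates (an assumption of this sketch), which
    keep every entry non-negative.
    """
    rng = random.Random(seed)
    n_articles, n_words = len(v), len(v[0])
    w = [[rng.random() for _ in range(n_features)] for _ in range(n_articles)]
    h = [[rng.random() for _ in range(n_words)] for _ in range(n_features)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iterations):
        # Update the features matrix: h *= (w^T v) / (w^T w h)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_words)] for i in range(n_features)]
        # Update the weights matrix: w *= (v h^T) / (w h h^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(matmul(w, h), ht)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_features)] for i in range(n_articles)]
    return w, h

# Toy word counts: 4 articles (rows) by 5 words (columns), made up for illustration.
articles = [
    [3, 2, 0, 0, 1],
    [4, 3, 0, 0, 0],
    [0, 0, 5, 4, 0],
    [0, 1, 3, 3, 0],
]
weights, features = factorize(articles, n_features=2)
reconstructed = matmul(weights, features)
```

Multiplying `weights` by `features` gives a matrix of the same shape as `articles`; the closer `cost(articles, reconstructed)` is to zero, the better the two smaller matrices reconstruct the original.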