Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.

Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general -- all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. This book explains:

* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game

Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.

"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." -- Dan Russell, Google

"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." -- Tim Wolters, CTO, Collective Intellect
Toby Segaran works as a Data Magnate at Metaweb Technologies. Prior to working at Metaweb, he started a biotech software company called Incellico, which was later acquired by Genstruct. His book, "Programming Collective Intelligence," has been the best-selling AI book on Amazon for several months. He is the recipient of a National Interest Waiver for "People of Exceptional Abilit...
Next, get a list of random people to make up the dataset. Fortunately, Hot or Not provides an API call that returns a list of people with specified criteria. In this example, the only criteria will be that the people have “meet me” profiles, since only from these profiles can you get other information like location and interests. Add this function to hotornot.py:
(Quoted from page 162)
What Does This Have to Do with the Articles Matrix?

So far, what you have is a matrix of articles with word counts. The goal is to factorize this matrix, which means finding two smaller matrices that can be multiplied together to reconstruct this one. The two smaller matrices are:

The features matrix
This matrix has a row for each feature and a column for each word. The values indicate how important a word is to a feature. Each feature should represent a theme that emerged from a set of articles, so you might expect an article about a new TV show to have a high weight for the word “television.”

The weights matrix
This matrix maps the features to the articles matrix. Each row is an article and each column is a feature. The values state how much each feature applies to each articl...
(Quoted from page 234)
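The factorization described in the excerpt above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's own code: it assumes the multiplicative update rules commonly used for non-negative matrix factorization, and the names `factorize`, `matmul`, `transpose`, and `cost`, along with the toy word-count matrix, are hypothetical. Rows of the input are articles and columns are words, so the weights matrix has one row per article and the features matrix has one row per feature.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def cost(a, b):
    """Sum of squared differences between two same-shaped matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def factorize(v, n_features, iterations=200, seed=0):
    """Approximate v (articles x words) as weights * features.

    Assumption: multiplicative updates, one common way to keep every
    entry non-negative; other solvers exist.
    """
    rng = random.Random(seed)
    n_articles, n_words = len(v), len(v[0])
    # Weights matrix: one row per article, one column per feature.
    w = [[rng.random() for _ in range(n_features)] for _ in range(n_articles)]
    # Features matrix: one row per feature, one column per word.
    h = [[rng.random() for _ in range(n_words)] for _ in range(n_features)]
    eps = 1e-9  # guards against division by zero

    for _ in range(iterations):
        # Update features: h <- h * (w^T v) / (w^T w h), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_words)] for i in range(n_features)]

        # Update weights: w <- w * (v h^T) / (w h h^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(matmul(w, h), ht)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_features)] for i in range(n_articles)]
    return w, h


if __name__ == "__main__":
    # Toy word counts: 4 articles x 3 words (made-up numbers).
    articles = [[2.0, 0.0, 1.0],
                [4.0, 0.0, 2.0],
                [0.0, 3.0, 1.0],
                [0.0, 6.0, 2.0]]
    weights, features = factorize(articles, n_features=2)
    print("reconstruction error:", cost(articles, matmul(weights, features)))
```

Multiplying `weights` by `features` gives a matrix of the same shape as the original, and the reconstruction error shows how closely the two smaller matrices reproduce it. The number of features is a choice the caller makes; two is used here only because the toy data has two obvious themes.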
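Bhante Henepola Gunaratana (德宝法师) is a contemporary master of insight meditation and the most senior Sri Lankan Theravada elder in North America. He was ordained in Sri Lanka at age 12; later, at American University (Ameri...
Financial Options (《金融期权》). About this book: Financial Options is a volume in the Financial Derivatives Investor Education Series (《金融衍生品投资者教育丛书》). It has ten chapters, covering option fundamentals, financial options markets and trading, option contracts and their basic elements, option prices and the factors that influence them...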
In Making a Metaverse That Matters: From Snow Crash & Second Life to A Virtual W...
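Imaging of Rheumatic Diseases (《风湿病影像学》). Summary: This book was compiled from years of research on the imaging diagnosis of rheumatic diseases by the Department of Radiology at Renji Hospital, affiliated with Shanghai Jiao Tong University. It briefly introduces the imaging examinations commonly used for various rheumatic diseases...
Jimmy Liao (幾米), born Liao Fu-bin (廖福彬), is a well-known Taiwanese picture-book artist whose pen name comes from his English name, Jimmy. He graduated from the Department of Fine Arts at Chinese Culture University, worked at an advertising agency for twelve years, and later drew illustrations for newspapers, magazines, and other publications. 19...
Zuozuo (佐佐) is a contracted author on the 「有读故事」 app and a graduate of Ningbo University who loves travelogues and folklore. Half a year working at a travel agency in Paris sparked the idea of writing a book of fantasy and adventure. Enjoys studying worldbuilding...
San Tian Liang Jiao (三天两觉): a top-tier author on Qidian (起点中文网) and a leading name in holographic virtual-reality game fiction. The writing is witty and wide-ranging. Major works include 《鬼喊抓鬼》, 《贩罪》, and 《惊悚乐园》. When not writing, often livestreams video games online, with an ultra-calm...
★ Keigo Higashino's farewell to youth, a mystery that captures the confusion and growth of being young. ★ The killer has been caught, but the story is far from over: 《学生街的日子》 has the distinctive appeal of a Keigo Higashino novel, with a suspenseful plot, constant twists...
Noriko Morishita (森下典子, b. 1956) is a Japanese essayist. A former columnist for Weekly Asahi (《朝日周刊》), she is known for a plain, unadorned, light, and lively writing style. In 2002 she published her tea-ceremony essay collection Every Day a Good Day: 15 Kinds of Happiness Tea Taught Me (《日日是好日》),...
[Professional, comprehensive, and bias-breaking] The author, a star academic with an impressive publication record and very high citation counts, introduces and analyzes nearly every dietary nutrient as well as the diet plans popular in cities today. With this one book in hand, readers can get to know every aspect of diet, and...
This book offers a distinctive understanding of programming. Its central thesis is that practical programming, like other areas of science and engineering, should rest on a solid mathematical foundation. The book shows how algorithms implemented in a real programming language (such as C++)...
Marent (马伦特) was born in 1965. As a child he showed great interest in the birds and butterflies of the mountains around his hometown. He later kept honing his photography skills, using his lens to record the changes in nature. His interest in rainforests began...
★ A warm and adorable collection by the hugely popular writer @患者阿离, raised on Cornetto ice cream (可爱多): 76 wildly imaginative little sweet treats, gentle fairy tales written for everyone. ★ A book that brings a knowing smile whenever and wherever you open it, a sweet, heartwarming, healing...
Contents: Chapter 1 Overview of Game Design; 1.1 What Is Game Design; 1.2 The Tasks of Game Design; 1.3 Qualities a Game Designer Needs; 1.3.1 A Love of Playing Games; 1.3.2 A Rich Imagination
General preface to 《汪曾祺别集》. "别集" (bieji, a separate collection) was originally the name Wang Zengqi came up with for a set of books by his teacher Shen Congwen; now it is being used for his own collected works. The old man probably never expected this in his lifetime. Shen's wife, Zhang Zhaohe, in the general preface to 《沈从文别集》...
The "Four Dreams of Linchuan" (临川四梦), also called the "Four Dreams of the Yuming Hall" (玉茗堂四梦), is the collective name for four plays by the Ming dynasty dramatist Tang Xianzu: 《牡丹亭》 (The Peony Pavilion), 《紫钗记》, 《邯郸记》, and 《南柯记》. Because all four plays involve dreams, they came to be known as the Four Dreams of Linchuan. As works of drama, the Four Dreams of Linchuan...
Selected Clinical Case Records of Yan Zhenghua (《颜正华临证验案精选》). Preface: A medical case record is both a record of how a physician diagnoses and treats disease and important material that combines medical theory with practice. It integrates pattern differentiation and disease identification with principles, methods, formulas, and medicinals, and faithfully reflects...
Concise Rheumatology in Integrated Chinese and Western Medicine (《简明中西医结合风湿病学》). Summary: This book was written to be up to date, practical, and concise (few words, much information), and is divided into three parts: general topics, specific topics, and appendices. The general part covers an overview of rheumatic diseases, the pathogenesis of rheumatic diseases,...
Laboratory and Field Practice Guide to Paleontology and Historical Geology (《古生物学与地史学实验实习指导书》). About this book: This title, part of a planned series of petroleum and natural gas textbooks for higher education, is divided into two main parts: paleontology laboratory work and historical geology field practice. Of these, the paleo...
Philosophy of Interior Design (《室内设计哲学》) B905. Summary: This book explores and explains the system of basic principles of interior design. Rather than simply stressing a return to origins, the author highlights what remains enduringly fresh and undiminished in design. For professionals, reading 《室...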