Want to tap the power behind search rankings, product recommendations, social bookmarking, and online matchmaking? This fascinating book demonstrates how you can build Web 2.0 applications to mine the enormous amount of data created by people on the Internet. With the sophisticated algorithms in this book, you can write smart programs to access interesting datasets from other web sites, collect data from users of your own applications, and analyze and understand the data once you've found it.
Programming Collective Intelligence takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general, all from information that you and others collect every day. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, wiki, or specialized application. This book explains:
* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features: crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving: how a computer develops its skill by improving its own code the more it plays a game
Each chapter includes exercises for extending the algorithms to make them more powerful.
Go beyond simple database-backed applications and put the wealth of Internet data to work for you.
"Bravo! I cannot think of a better way for a developer to first learn these algorithms and methods, nor can I think of a better way for me (an old AI dog) to reinvigorate my knowledge of the details." (Dan Russell, Google)
"Toby's book does a great job of breaking down the complex subject matter of machine-learning algorithms into practical, easy-to-understand examples that can be directly applied to analysis of social interaction across the Web today. If I had this book two years ago, it would have saved precious time going down some fruitless paths." (Tim Wolters, CTO, Collective Intellect)
Toby Segaran works as a Data Magnate at Metaweb Technologies. Prior to working at Metaweb, he started a biotech software company called Incellico, which was later acquired by Genstruct. His book, "Programming Collective Intelligence", has been the best-selling AI book on Amazon for several months. He is the recipient of a National Interest Waiver for "People of Exceptional Abilit...
Next, get a list of random people to make up the dataset. Fortunately, Hot or Not provides an API call that returns a list of people with specified criteria. In this example, the only criteria will be that the people have “meet me” profiles, since only from these profiles can you get other information like location and interests. Add this function to hotornot.py:
Quoted from page 162
What Does This Have to Do with the Articles Matrix?
So far, what you have is a matrix of articles with word counts. The goal is to factorize this matrix, which means finding two smaller matrices that can be multiplied together to reconstruct this one. The two smaller matrices are:
* The features matrix: This matrix has a row for each feature and a column for each word. The values indicate how important a word is to a feature. Each feature should represent a theme that emerged from a set of articles, so you might expect an article about a new TV show to have a high weight for the word “television.”
* The weights matrix: This matrix maps the features to the articles matrix. Each row is an article and each column is a feature. The values state how much each feature applies to each articl...
Quoted from page 234
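To make the shapes in the excerpt concrete, here is a minimal sketch that factorizes a small articles-by-words count matrix into a weights matrix (articles x features) and a features matrix (features x words) whose product approximates the original. It uses the Lee-Seung multiplicative update rules, one common way to compute non-negative matrix factorization; it is an illustration under stated assumptions, not necessarily the book's own code. The function names (`matmul`, `transpose`, `cost`, `factorize`), the feature count `pc`, the iteration count, and the toy word counts are all hypothetical. It uses plain lists so it runs without NumPy.

```python
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Swap rows and columns."""
    return [list(row) for row in zip(*a)]

def cost(v, wh):
    """Sum of squared differences between the data and its reconstruction."""
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def factorize(v, pc=2, iterations=200, seed=0, eps=1e-9):
    """Approximate v (articles x words) as w (articles x features, the
    weights matrix) times h (features x words, the features matrix).
    Hypothetical helper using Lee-Seung multiplicative updates, which keep
    every entry non-negative when v and the random start are non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(pc)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(pc)]
    for _ in range(iterations):
        # Features update: h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        hn = matmul(wt, v)
        hd = matmul(matmul(wt, w), h)
        h = [[h[i][j] * hn[i][j] / (hd[i][j] + eps) for j in range(m)]
             for i in range(pc)]
        # Weights update: w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        wn = matmul(v, ht)
        wd = matmul(w, matmul(h, ht))
        w = [[w[i][j] * wn[i][j] / (wd[i][j] + eps) for j in range(pc)]
             for i in range(n)]
    return w, h

# Hypothetical toy data: two articles about TV, two about an election.
words = ["television", "show", "election", "vote"]
articles = [
    [3, 2, 0, 0],
    [4, 3, 0, 1],
    [0, 0, 5, 3],
    [0, 1, 4, 4],
]
weights, features = factorize(articles, pc=2)

# Report the highest-weighted word in each feature (row of the features matrix).
for f, row in enumerate(features):
    top = max(range(len(words)), key=lambda j: row[j])
    print(f"feature {f}: strongest word = {words[top]}")
```

Multiplicative updates are used here because they are short to write and never produce negative values. Gradient-based or alternating least-squares solvers are common alternatives for larger matrices.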