Overview
Want to tap the power behind search rankings, product
recommendations, social bookmarking, and online matchmaking? This
fascinating book demonstrates how you can build Web 2.0
applications to mine the enormous amount of data created by people
on the Internet. With the sophisticated algorithms in this book,
you can write smart programs to access interesting datasets from
other web sites, collect data from users of your own applications,
and analyze and understand the data once you've found it.
Programming Collective Intelligence takes you into the
world of machine learning and statistics, and explains how to draw
conclusions about user experience, marketing, personal tastes, and
human behavior in general -- all from information that you and
others collect every day. Each algorithm is described clearly and
concisely with code that can immediately be used on your web site,
blog, Wiki, or specialized application. This book explains:
* Collaborative filtering techniques that enable online retailers to recommend products or media
* Methods of clustering to detect groups of similar items in a large dataset
* Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
* Optimization algorithms that search millions of possible solutions to a problem and choose the best one
* Bayesian filtering, used in spam filters for classifying documents based on word types and other features
* Using decision trees not only to make predictions, but to model the way decisions are made
* Predicting numerical values rather than classifications to build price models
* Support vector machines to match people in online dating sites
* Non-negative matrix factorization to find the independent features in a dataset
* Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
Each chapter includes exercises for extending the algorithms to
make them more powerful. Go beyond simple database-backed
applications and put the wealth of Internet data to work for
you.
"Bravo! I cannot think of a better way for a developer to first
learn these algorithms and methods, nor can I think of a better way
for me (an old AI dog) to reinvigorate my knowledge of the
details."
-- Dan Russell, Google
"Toby's book does a great job of breaking down the complex subject
matter of machine-learning algorithms into practical,
easy-to-understand examples that can be directly applied to
analysis of social interaction across the Web today. If I had this
book two years ago, it would have saved precious time going down
some fruitless paths."
-- Tim Wolters, CTO, Collective Intellect
Reader Reviews From Amazon (Ranked by 'Helpfulness')
Average Customer Rating: based on 34 reviews.

An Eye-Opening, Inspiring Book, 2008-08-25
I got more from this book than I have from any other book I read in the past couple of years!
It covers, in a streamlined form, a huge array of algorithms powering the contemporary web: from recommendation engines, to a search engine that includes the Google PageRank algorithm as one of its features, to some quite recent AI innovations.
Just about the only area that was not covered was statistical machine translation. I wish he had done that, since that is my favourite subject.
It helps you see the world through the "Collective Intelligence opportunities" prism.

Wow, 2008-07-21
If you are interested in this topic, you should read this book. Disclaimer: I am new to the topic, but I appreciate it when it is done well, and I need to understand how to implement it for my job. I was blown away by both the conceptual coverage and the implementation details. This book will allow you to cover the concepts on a first pass, then come back and actually build the approaches you are most interested in. Even if you ultimately use a vendor product for recommendation, you will understand the algorithms being used, their proper application, and where they are deficient.

Good but not great, 2008-06-12
Most people have shared their thoughts on the good of this book. I'd like to point out some of the bad I noticed as I read through:
- First, too many typos: both the author and O'Reilly should do a better job of proofreading the material. There are so many typos that they can easily wreck otherwise good material.
- Second, arcane solutions and coding style. Often the first step toward solving a machine learning problem is to represent the problem at hand well. The author's brain is apparently wired differently from mine, so this opinion is personal. For example, in chapter 5, on "optimization for preference", he chose to represent a solution in vector form, like [0,0,0,0,0,0,0,0,0,0]. There is no way I can relate this solution to its real meaning (you want to allocate 10 students into 5 rooms, each with two slots); if there is an easy explanation, the book didn't say so.
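One encoding that would make sense of such a vector is sketched below. It is an assumption, since the review does not say how the book decodes it: each number is read as an index into the list of slots still open when that student is placed. Under that reading, a vector of ten zeros means "every student takes the first slot still available". The function name `decode_assignment` and the room names are hypothetical, chosen only for illustration.

```python
def decode_assignment(vec, rooms, slots_per_room=2):
    """Turn a vector of slot indices into one room name per student.

    Assumed encoding (not confirmed by the source): vec[i] is an index
    into the list of slots that are still open when student i is placed.
    """
    # One entry per open slot, labelled with the index of its room,
    # e.g. [0, 0, 1, 1, 2, 2, 3, 3, 4, 4] for 5 rooms with 2 slots each.
    open_slots = [r for r in range(len(rooms)) for _ in range(slots_per_room)]

    assignment = []
    for choice in vec:
        # Taking a slot removes it, so later indices refer to a shorter list.
        room_index = open_slots.pop(choice)
        assignment.append(rooms[room_index])
    return assignment


# Hypothetical room names, for illustration only.
rooms = ["Room A", "Room B", "Room C", "Room D", "Room E"]

# All zeros: each student takes the first remaining slot.
print(decode_assignment([0] * 10, rooms))

# First student picks the third open slot instead (index 2, in Room B).
print(decode_assignment([2] + [0] * 9, rooms))
```

With this encoding every vector whose i-th entry lies between 0 and 9 - i decodes to a valid allocation, which is convenient for optimizers that perturb one number at a time.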
Thus the 3 stars. I believe a second edition is warranted and should be much better.
Just my 2c.

Great, simple presentation of some powerful techniques, 2008-06-10
Programming Collective Intelligence is a book about applying data mining techniques to analyse collections of data. There is submerged information in eBay prices, in Facebook profile networks, in collections of movie reviews, in news sites, in the stock market; this book by Toby Segaran shows ways to extract, visualise, understand, and predict that information.
Each chapter explains and explores a different data mining algorithm, and builds up a working example in Python, while presenting different methods and parameters of the implementation. I hadn't really worked with Python before, but found the code easy to follow, and picked up some interesting Python idioms that I haven't seen in other languages before. Chapters end with a set of exercises to follow that build your understanding.
As you follow the examples you build up a reasonably generic code base that allows you to swap in and out different implementations, and reuse previous code to add to new applications.
The examples draw on live sites from the web, like eBay, Facebook, and Yahoo Finance, and this makes the book more interesting and the results more visceral than some other books on the subject, which use more contrived or obscure examples. Even though the examples have a strong web (or web 2.0) focus, the methods and the understanding are useful for a whole range of applications.
Some of the topics covered:
* Bayesian classifiers to detect spam, or to file news articles into site sections
* Hierarchical and k-means clustering to discover groups of similar items in massive sets
* Euclidean distance, Pearson Correlation Coefficient, Tanimoto Coefficient: ways to measure the distance (or difference) between items
* Neural networks to predict user behaviour and improve search result ordering
* Optimisation methods like hill climbing, simulated annealing, and genetic algorithms
* Non-negative matrix factorization
* Support vector machines and kernel methods to go where linear regression can't
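As a rough illustration of the three measures named in the list above, here is a minimal Python sketch. It is not code from the book. The function names are hypothetical, and the Tanimoto coefficient is shown in its set-based form (shared items divided by all distinct items), which is one common definition.

```python
from math import sqrt


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Correlation in [-1, 1]; returns 0.0 if either vector is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    denominator = sqrt(spread_a * spread_b)
    if denominator == 0:
        return 0.0
    return covariance / denominator


def tanimoto_coefficient(items_a, items_b):
    """Set overlap in [0, 1]: shared items divided by all distinct items."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


print(euclidean_distance([0, 0], [3, 4]))
print(pearson_correlation([1, 2, 3], [2, 4, 6]))
print(tanimoto_coefficient({"a", "b"}, {"b", "c"}))
```

Note that Pearson and Tanimoto are similarity scores (higher means more alike), so a distance is usually derived from them as `1 - score`.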
I found it exciting to read -- it's one of those books that give you a whole bunch of new ideas for things to build as you read it. The presentation is very good: no background is assumed, and it doesn't talk down to those more experienced.
Recommended.

Good intro to machine learning, 2008-06-04
Once I got past the initial shock of finding several glaring grammar and spelling errors in the introduction, I have been pleased with this purchase ever since.
The author gives a good overview of the many different approaches to machine learning (with great examples in Python). However, it's just that: an overview. While the explanations are very clear and the concepts are presented in a very accessible manner, I found myself having to look elsewhere for more detail on the various algorithms. Yes, with the level of understanding presented in this book you should be able to create functional code for your particular data set. However, I felt that to really get the best results from the algorithms I needed to study them a bit further in order to best apply them to my data.
As a recent CS graduate, I would certainly recommend this book to anyone looking for a basic understanding of machine learning and data mining techniques.
Some information above was provided using data from Amazon.com.