Ebook, 812 pages, 6 hours

Collective Intelligence in Action

By Satnam Alag

Rating: 4 out of 5 stars

About this ebook

There's a great deal of wisdom in a crowd, but how do you listen to a thousand people talking at once? Identifying the wants, needs, and knowledge of internet users can be like listening to a mob.

In the Web 2.0 era, leveraging the collective power of user contributions, interactions, and feedback is the key to market dominance. A new category of powerful programming techniques lets you discover the patterns, interrelationships, and individual profiles (the collective intelligence) locked in the data people leave behind as they surf websites, post blogs, and interact with other users.

Collective Intelligence in Action is a hands-on guidebook for implementing collective intelligence concepts using Java. It is the first Java-based book to emphasize the underlying algorithms and technical implementation of vital data gathering and mining techniques like analyzing trends, discovering relationships, and making predictions. It provides a pragmatic approach to personalization by combining content-based analysis with collaborative approaches.

This book is for Java developers implementing Collective Intelligence in real, high-use applications. Following a running example in which you harvest and use information from blogs, you learn to develop software that you can embed in your own applications. The code examples are immediately reusable and give the Java developer a working collective intelligence toolkit.

Along the way, you work with a number of APIs and open-source toolkits, including text analysis and search using Lucene, web crawling using Nutch, and applying machine learning algorithms using WEKA and the Java Data Mining (JDM) standard.

Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.

Language: English

Publisher: Manning

Release date: Sep 30, 2008

ISBN: 9781638355380

Author

Satnam Alag

Satnam Alag, PhD, is currently the Vice President of Engineering at NextBio, a vertical search engine and a Web 2.0 collaboration application for the life sciences community. He is a seasoned software professional with over fifteen years of experience in machine learning and over a decade of experience in commercial software development and management. Dr. Alag worked as a consultant with Johnson & Johnson's BabyCenter, where he helped develop their personalization engine. Prior to that, he was the Chief Software Architect at Rearden Commerce, and he began his career at GE R&D. He is a Sun Certified Enterprise Architect (SCEA) for the Java Platform. Dr. Alag earned his PhD in engineering from UC Berkeley; his dissertation was in the area of probabilistic reasoning and machine learning. He has published numerous peer-reviewed articles.

Reviews for Collective Intelligence in Action

Rating: 4.07 out of 5 stars (7 ratings, 1 review)

Rating: 5 out of 5 stars
5/5
I consider this book a must for those who have also read, or are thinking about reading, Segaran's 'Programming Collective Intelligence'. This book's language of choice is Java, and it presents important aspects of industrial-strength open source tools such as Nutch, Lucene, and Hadoop. The book is very practically oriented, so do not expect to find much theory about data mining or machine learning algorithms (the author is very brief on explanations, gives good examples, and provides enough literature to dive into relevant detailed texts, articles, books, etc.). However, if you want to jump into analysing your application's unstructured data along with user analysis, build intelligence and focused crawling and searching functionality, and finally build a recommendation system, then this book is full of technical treasures. I really liked the final chapters on building recommendation systems, and I was also glad to learn about JDM, the Java Data Mining API. Another strong point of the book is its concern for scalability and its discussions of class and database design for implementing the systems.

Book preview

Collective Intelligence in Action - Satnam Alag

Copyright

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department

Manning Publications Co.

Sound View Court 3B

Greenwich, CT 06830

Fax: (609) 877-8256

Email: orders@manning.com

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15% recycled and processed without the use of elemental chlorine.

Manning Publications Co.

Sound View Court 3B

Greenwich, CT 06830

Development Editor: Jeff Bleiel

Copyeditor: Benjamin Berg

Typesetter: Gordan Salinovic

Cover designer: Leslie Haimes

Printed in the United States of America

1 2 3 4 5 6 7 8 9 10 – MAL – 13 12 11 10 09 08

Dedication

To my dear sons, Ayush and Shray, and my beautiful, loving, and intelligent wife, Alpana

Brief Table of Contents

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this book

1. Gathering data for intelligence

Chapter 1. Understanding collective intelligence

Chapter 2. Learning from user interactions

Chapter 3. Extracting intelligence from tags

Chapter 4. Extracting intelligence from content

Chapter 5. Searching the blogosphere

Chapter 6. Intelligent web crawling

2. Deriving intelligence

Chapter 7. Data mining: process, toolkits, and standards

Chapter 8. Building a text analysis toolkit

Chapter 9. Discovering patterns with clustering

Chapter 10. Making predictions

3. Applying intelligence in your application

Chapter 11. Intelligent search

Chapter 12. Building a recommendation engine

Index

List of Figures

List of Tables

List of Listings

Copyright

Brief Table of Contents

Table of Contents

Foreword

Preface

Acknowledgments

About this book

1. Gathering data for intelligence

Chapter 1. Understanding collective intelligence

1.1. What is collective intelligence?

1.2. CI in web applications

1.2.1. Collective intelligence from the ground up: a sample application

1.2.2. Benefits of collective intelligence

1.2.3. CI is the core component of Web 2.0

1.2.4. Harnessing CI to transform from content-centric to user-centric applications

1.3. Classifying intelligence

1.3.1. Explicit intelligence

1.3.2. Implicit intelligence

1.3.3. Derived intelligence

1.4. Summary

1.5. Resources

Chapter 2. Learning from user interactions

2.1. Architecture for applying intelligence

2.1.1. Synchronous and asynchronous services

2.1.2. Real-time learning in an event-driven system

2.1.3. Polling services for non–event-driven systems

2.1.4. Advantages and disadvantages of event-based and non–event-based architectures

2.2. Basics of algorithms for applying CI

2.2.1. Users and items

2.2.2. Representing user information

2.2.3. Content-based analysis and collaborative filtering

2.2.4. Representing intelligence from unstructured text

2.2.5. Computing similarities

2.2.6. Types of datasets

2.3. Forms of user interaction

2.3.1. Rating and voting

2.3.2. Emailing or forwarding a link

2.3.3. Bookmarking and saving

2.3.4. Purchasing items

2.3.5. Click-stream

2.3.6. Reviews

2.4. Converting user interaction into collective intelligence

2.4.1. Intelligence from ratings via an example

2.4.2. Intelligence from bookmarking, saving, purchasing Items, forwarding, click-stream, and reviews

2.5. Summary

2.6. Resources

Chapter 3. Extracting intelligence from tags

3.1. Introduction to tagging

3.1.1. Tag-related metadata for users and items

3.1.2. Professionally generated tags

3.1.3. User-generated tags

3.1.4. Machine-generated tags

3.1.5. Tips on tagging

3.1.6. Why do users tag?

3.2. How to leverage tags

3.2.1. Building dynamic navigation

3.2.2. Innovative uses of tag clouds

3.2.3. Targeted search

3.2.4. Folksonomies and building a dictionary

3.3. Extracting intelligence from user tagging: an example

3.3.1. Items related to other items

3.3.2. Items of interest for a user

3.3.3. Relevant users for an item

3.4. Scalable persistence architecture for tagging

3.4.1. Reviewing other approaches

3.4.2. Recommended persistence architecture

3.5. Building tag clouds

3.5.1. Persistence design for tag clouds

3.5.2. Algorithm for building a tag cloud

3.5.3. Implementing a tag cloud

3.5.4. Visualizing a tag cloud

3.6. Finding similar tags

3.7. Summary

3.8. Resources

Chapter 4. Extracting intelligence from content

4.1. Content types and integration

4.1.1. Classifying content

4.1.2. Architecture for integrating content

4.2. The main CI-related content types

4.2.1. Blogs

4.2.2. Wikis

4.2.3. Groups and message boards

4.3. Extracting intelligence step by step

4.3.1. Setting up the example

4.3.2. Naïve analysis

4.3.3. Removing common words

4.3.4. Stemming

4.3.5. Detecting phrases

4.4. Simple and composite content types

4.5. Summary

4.6. Resources

Chapter 5. Searching the blogosphere

5.1. Introducing the blogosphere

5.1.1. Leveraging the blogosphere

5.1.2. RSS: the publishing format

5.1.3. Blog-tracking companies

5.2. Building a framework to search the blogosphere

5.2.1. The searcher

5.2.2. The search parameters

5.2.3. The query results

5.2.4. Handling the XML response

5.2.5. Exception handling

5.3. Implementing the base classes

5.3.1. Implementing the search parameters

5.3.2. Implementing the result objects

5.3.3. Implementing the searcher

5.3.4. Parsing XML response

5.3.5. Extending the framework

5.4. Integrating Technorati

5.4.1. Technorati search API overview

5.4.2. Implementing classes for integrating Technorati

5.5. Integrating Bloglines

5.5.1. Bloglines search API overview

5.5.2. Implementing classes for integrating Bloglines

5.6. Integrating providers using RSS

5.6.1. Generalizing the query parameters

5.6.2. Generalizing the blog searcher

5.6.3. Building the RSS 2.0 XML parser

5.7. Summary

5.8. Resources

Chapter 6. Intelligent web crawling

6.1. Introducing web crawling

6.1.1. Why crawl the Web?

6.1.2. The crawling process

6.1.3. Intelligent crawling and focused crawling

6.1.4. Deep crawling

6.1.5. Available crawlers

6.2. Building an intelligent crawler step by step

6.2.1. Implementing the core algorithm

6.2.2. Being polite: following the robots.txt file

6.2.3. Retrieving the content

6.2.4. Extracting URLs

6.2.5. Making the crawler intelligent

6.2.6. Running the crawler

6.2.7. Extending the crawler

6.3. Scalable crawling with Nutch

6.3.1. Setting up Nutch

6.3.2. Running the Nutch crawler

6.3.3. Searching with Nutch

6.3.4. Apache Hadoop, MapReduce, and Dryad

6.4. Summary

6.5. Resources

2. Deriving intelligence

Chapter 7. Data mining: process, toolkits, and standards

7.1. Core concepts of data mining

7.1.1. Attributes

7.1.2. Supervised and unsupervised learning

7.1.3. Key learning algorithms

7.1.4. The mining process

7.2. Using an open source data mining framework: WEKA

7.2.1. Using the WEKA application: a step-by-step tutorial

7.2.2. Understanding the WEKA APIs

7.2.3. Using the WEKA APIs via an example

7.3. Standard data mining API: Java Data Mining (JDM)

7.3.1. JDM architecture

7.3.2. Key JDM objects

7.3.3. Representing the dataset

7.3.4. Learning models

7.3.5. Algorithm settings

7.3.6. JDM tasks

7.3.7. JDM connection

7.3.8. Sample code for accessing DME

7.3.9. JDM models and PMML

7.4. Summary

7.5. Resources

Chapter 8. Building a text analysis toolkit

8.1. Building the text analyzers

8.1.1. Leveraging Lucene

8.1.2. Writing a stemmer analyzer

8.1.3. Writing a TokenFilter to inject synonyms and detect phrases

8.1.4. Writing an analyzer to inject synonyms and detect phrases

8.1.5. Putting our analyzers to work

8.2. Building the text analysis infrastructure

8.2.1. Building the tag infrastructure

8.2.2. Building the term vector infrastructure

8.2.3. Building the Text Analyzer class

8.2.4. Applying the text analysis infrastructure

8.3. Use cases for applying the framework

8.4. Summary

8.5. Resources

Chapter 9. Discovering patterns with clustering

9.1. Clustering blog entries

9.1.1. Defining the text clustering infrastructure

9.1.2. Retrieving blog entries from Technorati

9.1.3. Implementing the k-means algorithms for text processing

9.1.4. Implementing hierarchical clustering algorithms for text processing

9.1.5. Expectation maximization and other examples of clustering high-dimension sparse data

9.2. Leveraging WEKA for clustering

9.2.1. Creating the learning dataset

9.2.2. Creating the clusterer

9.2.3. Evaluating the clustering results

9.3. Clustering using the JDM APIs

9.3.1. Key JDM clustering-related classes

9.3.2. Clustering settings using the JDM APIs

9.3.3. Creating the clustering task using the JDM APIs

9.3.4. Executing the clustering task using the JDM APIs

9.3.5. Retrieving the clustering model using the JDM APIs

9.4. Summary

9.5. Resources

Chapter 10. Making predictions

10.1. Classification fundamentals

10.1.1. Learning decision trees by example

10.1.2. Naïve Bayes’ classifier

10.1.3. Belief networks

10.2. Classifying blog entries using WEKA APIs

10.2.1. Building the dataset for classifying blog entries

10.2.2. Building the classifier class

10.3. Regression fundamentals

10.3.1. Linear regression

10.3.2. Multi-layer perceptron (MLP)

10.3.3. Radial basis functions (RBF)

10.4. Regression using WEKA

10.5. Classification and regression using JDM

10.5.1. Key JDM supervised learning–related classes

10.5.2. Supervised learning settings using the JDM APIs

10.5.3. Creating the classification task using the JDM APIs

10.5.4. Executing the classification task using the JDM APIs

10.5.5. Retrieving the classification model using the JDM APIs

10.5.6. Retrieving the classification model using the JDM APIs

10.6. Summary

10.7. Resources

3. Applying intelligence in your application

Chapter 11. Intelligent search

11.1. Search fundamentals

11.1.1. Search architecture

11.1.2. Core Lucene classes

11.1.3. Basic indexing and searching via example

11.2. Indexing with Lucene

11.2.1. Understanding the index format

11.2.2. Modifying the index

11.2.3. Incremental indexing

11.2.4. Accessing the term frequency vector

11.2.5. Optimizing indexing performance

11.3. Searching with Lucene

11.3.1. Understanding Lucene scoring

11.3.2. Querying Lucene

11.3.3. Sorting search results

11.3.4. Querying on multiple fields

11.3.5. Filtering

11.3.6. Searching multiple indexes

11.3.7. Using a HitCollector

11.3.8. Optimizing search performance

11.4. Useful tools and frameworks

11.4.1. Luke

11.4.2. Solr

11.4.3. Compass

11.4.4. Hibernate search

11.5. Approaches to intelligent search

11.5.1. Augmenting search with classifiers and predictors

11.5.2. Clustering search results

11.5.3. Personalizing results for the user

11.5.4. Community-based search

11.5.5. Linguistic-based search

11.5.6. Data search

11.6. Summary

11.7. Resources

Chapter 12. Building a recommendation engine

12.1. Recommendation engine fundamentals

12.1.1. Introducing the recommendation engine

12.1.2. Item-based and user-based analysis

12.1.3. Computing similarity using content-based and collaborative techniques

12.1.4. Comparison of content-based and collaborative techniques

12.2. Content-based analysis

12.2.1. Finding similar items using a search engine (Lucene)

12.2.2. Building a content-based recommendation engine

12.2.3. Related items for document clusters

12.2.4. Personalizing content for a user

12.3. Collaborative filtering

12.3.1. k-nearest neighbor

12.3.2. Packages for implementing collaborative filtering

12.3.3. Dimensionality reduction with latent semantic indexing

12.3.4. Implementing dimensionality reduction

12.3.5. Probabilistic model–based approach

12.4. Real-world solutions

12.4.1. Amazon item-to-item recommendation

12.4.2. Google News personalization

12.4.3. Netflix and the BellKor Solution for the Netflix Prize

12.5. Summary

12.6. Resources

Index

List of Figures

List of Tables

List of Listings

Foreword

When I founded ReadWriteWeb[¹] back in April 2003, a tech news and analysis blog that is now one of the world’s top 10 blogs,[²] my goal was to explore the current era of the web. The year 2003 was a time when the effects of the dot-com meltdown were still being felt, yet there was something new stirring on the web, too. I christened my new blog Read/Write Web (the slash and space have since been dropped) because this new era of the web seemed to embody the notion that Tim Berners-Lee had when he invented the web—that it ought to be editable by anyone and that everyone contributes in some way to the web’s data.

¹http://www.readwriteweb.com/

² According to Technorati http://www.technorati.com/pop/blogs/

As Satnam Alag writes in this book, collective intelligence as a research field actually predates the web. But it was after the dot-com era had ended that we began to see evidence of collective intelligence applied to the web. In 2003 we regularly saw it in sites like Amazon, with its user reviews and recommendations, eBay with its user-driven auctions, Wikipedia with its editable encyclopedia, and Google with its mysterious PageRank algorithm for ranking the popularity of web pages.

Sometime in 2004, O’Reilly & Associates coined the term Web 2.0, which eventually gained mainstream acceptance as the term for this era of the web (just as dot-com described the previous one). A central part of the new definition was the notion of harnessing collective intelligence, in which user contributions could be valuable in aggregate if mined and utilized in some way in your web site or application.

For all the popularity of Web 2.0, it remains difficult to implement many of its principles. This is where this book comes in, because it applies mathematical formulas and examples to the notion of collective intelligence (from now on simply known as CI). After explaining how to gather data and extract intelligence on the web, in part 2 of the book Satnam instructs you on specific CI techniques such as data mining, text analysis, clustering, and predictive technology.

And, pssst, do you want to know how to build a recommendation engine? This is an area of web technology that we at ReadWriteWeb have been covering with great interest in 2008. Recommendation engines, as Satnam notes, aim to show items of interest to a user. But in our reviews of the current wave of recommendation engines, we have seen that it’s hard—very hard—to get recommendations right. Satnam shows how the leading practitioners, such as Amazon, Google News, and Netflix, build their recommendation engines. He also explains the different approaches you can take, with examples that developers can use and deploy in their own applications.

The Read/Write Web, or Web 2.0, or the Social Web, whatever you want to call it, relies on and builds value from user participation. If you’re a web developer, you’ll want to know how to use CI techniques to ensure that your web application can extract valuable data from its usage—and most importantly deliver that value right back to the users, where it belongs. This book goes a long way towards explaining how to do this.

RICHARD MACMANUS

FOUNDER/EDITOR, READWRITEWEB

Preface

"What is the virality coefficient for your application?"

This is an increasingly common question being asked of young companies as they try to raise money from venture capitalists. New products are being designed that inherently take advantage of virality within the product. Companies such as YouTube, Facebook, Ning, LinkedIn, Skype, and more have grown from zero to millions of users by leveraging the power of virality. With little or no marketing, these types of companies rely on the wisdom of crowds to spread exponentially from one user to two users, then four, then eight, and so on. A simple link in an email, which worked for Hotmail to grow its user base, may no longer be adequate for your application. Facebook and LinkedIn enable users to build their networks by sending an invitation to others to connect as friends or connections; other applications such as Skype and Jaxtr provide free services as long as you’re connecting to someone who’s already a member, thus encouraging users to register.

It wasn’t long ago when things were different. I still remember a few years back when I would ignore requests from others to connect on sites such as LinkedIn. Over a period of time, after repeatedly getting requests to connect from friends and acquaintances, I finally reached a tipping point and joined the network. The critical mass of users on the application, in addition to word-of-mouth recommendations, was good enough for me to see enough value to joining the network. Others had collectively convinced me to change my ways and join the application—this is one aspect of how collective intelligence is born and can manifest itself in your application.

Over the last few years, there’s been a quiet revolution in the way users interact. Time magazine even declared you, as in the collective set of users on the web, as the person of the year for 2006. Users are no longer shy about expressing themselves. This expression may be as simple as forwarding an interesting article to a friend, rating an item, or generating new content—commonly known as user-generated content (UGC). To harness this user revolution, a new breed of applications, commonly known as user-centric applications, are being developed. Putting the user at the center of the application, leveraging social networks, and UGC are the new paradigms, and a high degree of personalization is now becoming the norm.

It’s been almost two years since I first contacted Manning with the idea of writing a book on collective intelligence. Ever since my graduate school days, I’ve been fascinated by how you can discover interesting information by analyzing data. Over the years, I was able to ground a lot of theory in the practical world, especially in the context of large-scale web applications. One thing I knew was that there wasn’t a practical book that could guide a developer through the various aspects of applying intelligence in an application. I could see a typical developer’s eyes roll when delving into the inner workings of an algorithm or applying some of the collective intelligence features. There’s immense value that an application can create by leveraging user-interaction data. As more and more companies joined the Web 2.0 parade, I wanted to write a book that would guide developers to understanding and implementing collective intelligence–related features in their applications.

It took longer to write this book than I had hoped. Most of the book was written while I was working full-time in demanding jobs. But the experience obtained by implementing these concepts in the real world provided good insight into what would be useful to others.

Remember, applications that make use of every user interaction to improve the value of the application for the user and other potential future users, and harness the power of virality, will dominate their markets. This book provides a set of tools that you’ll need to leverage the information provided by the users on your site. Whatever forms of information may be available to you, this book will guide you in harnessing the potential of your information to personalize the site for your users. Focus on the user, and you shall succeed. For collective intelligence begins with a crowd of one.

Acknowledgments

In the late seventeenth century, Sir Isaac Newton said, "If I have seen further, it is by standing on the shoulders of giants." Similarly, if I’ve been able to finish this book, it’s with the help of a great number of people.

First, this book wouldn’t have been possible without Associate Publisher Michael Stephens. Mike’s passion and belief in the topic kept the book going. He’s an excellent mentor and guides you through good times and bad. Just like Mike, my brain now converts all text into lists of lists. It was a real privilege to work with my development editor, Jeff Bleiel. Jeff spent countless hours providing feedback, digging deeper into why things were written in a certain way, and improving the flow of the text. Thanks to Marjan Bace, Manning’s publisher, for helping fine-tune the table of contents, and for his guiding principle of keeping the book focused on new content. Special thanks to Karen Tegtmeyer for setting up and coordinating the peer reviews. And to the production team of Benjamin Berg, Katie Tennant, and Gordan Salinovic for turning my manuscript into the book that you are now holding. They spent countless hours checking and rechecking the manuscript. If you’re thinking of writing a book, you won’t find a better team than the one at Manning!

I’d like to thank all of the reviewers of my manuscript, many of whom spent large amounts of their free time on this task, for sending their excellent comments, suggestions, and criticisms. Some of the reviewers wished to remain anonymous...but here are a few I would like to acknowledge by name: Jérôme Bernard, Ryan Cox, Dave Crane, Roozbeh Daneshvar, Steve Gutz, Clint Howarth, Frank Jania, Gordon Jones, Murali Krishnan, Darren Neimke, Sumit Pal, Muhammad Saleem, Robi Sen, Sopan Shewale, Srikanth Sundararajana, and John Tyler.

Special thanks to Shiva Paranandi, for his help in reviewing the text and the code, and for his technical proofread; Brendan Murray, for his technical proofread of the first half of the book; Sean Handel, for his detailed review of and suggestions on the first four chapters; Gautam Aggarwal, for his insightful comments; Krishna Mayuram, for his review of the third chapter; Mark Hornick, specification lead of JDM, for his suggestions on JDM-related chapters; Mayur Datar of Google, for reviewing the text for the Google News Personalization section in chapter 12; Mark Hall, Lead for Pentaho’s data mining solutions (WEKA), for his comments on WEKA-related content; Shi Hui Liu, Murtaza Sonaseth, Kevin Xiao, Hector Villarreal, and the rest of the NextBio team, for their suggestions; Shahram Seyedin-Noor of NextBio, for his comments on the early chapters, encouragement, and his passionate philosophy on virality; and Ken DeLong and Mike McEvoy of BabyCenter, for their review and suggestions to improve the manuscript.

Special thanks to the awesome team at NextBio, especially the management team: Saeid Akhtari, Shahram Seyedin-Noor, Ilya Kupershmidt, and Mostafa Ronaghi, who introduced me to the field of data search and life sciences. We have a fantastic opportunity in intelligent search and user-centric applications; let’s make it happen!

This book wouldn’t have been possible without the support of a number of people whom I have worked for, including Patrick Grady, the charismatic CEO of Rearden Commerce; Michael McEvoy, CEO of QuickTrac Software; K.J., CEO 123signup.com, whom I thank for his mentorship; and Gordon Jones, SVP at TechWorks.

And finally, thanks to Richard MacManus, founder and editor of ReadWriteWeb, for taking the time to read the manuscript and write the foreword to the book.

This book took longer to finish than I had hoped, while I was working full-time. Consequently, it amounted to working all the time, even when we were on vacation. This book wouldn’t have been possible without the active support of my wife, Alpana, and sons, and also the active encouragement and support provided by our extended families. On Alpana’s side, dad diligently proofread and cheered raw early drafts; mom tried to free up my time; Rohini and Amit Verma provided constant encouragement. On my side, my mom helped in every way she could and kept me going, while my two adoring sisters, Nina and Amrita, made me feel as if I were the best writer in the world. Special thanks to Rajeev, Ankit, and Anish Suri for their encouragement.

Needless to say, this book was a nonstarter without the inspiration and support provided by Alpana, Ayush, and Shray. "Dad, how many chapters did you finish last night?" kept me going, as I didn’t want to see the disappointment in my sons’ eyes. Thank you, Alpana, for supporting me through this venture; it wouldn’t have been possible without your sacrifices. I look forward to some quality time with the family, soon.

About this book

Collective Intelligence in Action is a practical book for applying collective intelligence to real-world web applications. I cover a broad spectrum of topics, from simple illustrative examples that explain the concepts and the math behind them, to the ideal architecture for developing a feature, to the database schema, to code implementation and use of open source toolkits. Regardless of your background and nature of development, I’m sure you’ll find the examples and code samples useful. You should be able to directly use the code developed in this book. This is a practical book and I present a holistic view on what’s required to apply these techniques in the real world. Consequently, the book discusses the architectures for implementing intelligence—you’ll find lots of diagrams, especially UML diagrams, and a number of screenshots from well-known sites, in addition to code listings and even database schema designs.

There are a plethora of examples. Typically, concepts and the underlying math for algorithms are explained via examples with detailed step-by-step analysis. Accompanying the examples is Java code that demonstrates the concepts by implementing them, or by using open source frameworks.

A lot of work has been done by the open source community in Java in the areas of text processing and search (Lucene), data mining (WEKA), web crawling (Nutch), and data mining standards (JDM). This book leverages these frameworks, presenting examples and developing code that you can directly use in your Java application.

The first few chapters don’t assume knowledge of Java. You should be able to follow the concepts and the underlying math using the illustrative examples. For the later chapters, a basic understanding of Java will be helpful. The book uses a number of diagrams and screenshots to illustrate the concepts. The Resources section of each chapter contains links to other useful content.

Roadmap

Chapter 1 provides a basic introduction to the field of collective intelligence (CI). CI is an active area of research, and I’ve kept the focus on applying CI to web applications. Section 1.2.1 is a personal favorite of mine; it provides a roadmap through a hypothetical example of how you can apply CI to your application. This is a must-read, since it helps to translate CI into features in your application and puts the flow of the book in perspective. Chapter 1 should also provide you with a good overview of the three forms of intelligence: explicit, implicit, and derived.

The book is divided into three parts. Part 1 deals with collecting data, both within and outside the application, to be translated into intelligence later. Chapters 2 through 4 deal with gathering information from within one’s application, while chapters 5 and 6 focus on gathering information from outside of one’s application.

Chapter 2 provides an overview of the architecture required to embed CI in your application, along with a quick overview of some of the basic concepts that are needed to apply CI. Please take some time to go through section 2.2 in detail, as a firm understanding of the concepts presented in this section will be useful throughout the book. This chapter also shows how intelligence can be derived by analyzing the actions of the user. It’s worthwhile to go through the example in section 2.4 in detail, as understanding the concepts presented there will also be useful throughout the book.

Chapter 3 continues with the theme of collecting data, this time from the user action of tagging. It provides an overview of the three forms of tags and how tagging can be leveraged. In section 3.3, we work through an example to show how tagging data can be converted into intelligence. This chapter also provides an overview of the ideal persistence architecture required to leverage tagging, and illustrates how to develop tag clouds.
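As a rough illustration of the tag cloud idea mentioned above, the following Java sketch maps tag frequencies to font-size buckets. The class name TagCloudSketch, the method fontSizes, and the linear scaling are assumptions made for this example; it is not the implementation developed in chapter 3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: scales tag counts linearly into font-size buckets. */
class TagCloudSketch {

    /**
     * Maps each tag to a bucket between 1 and numBuckets.
     * The least frequent tag gets bucket 1, the most frequent gets numBuckets.
     */
    static Map<String, Integer> fontSizes(Map<String, Integer> tagCounts, int numBuckets) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int count : tagCounts.values()) {
            min = Math.min(min, count);
            max = Math.max(max, count);
        }
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : tagCounts.entrySet()) {
            int bucket = 1; // used when all tags have the same count
            if (max > min) {
                double scaled = (double) (entry.getValue() - min) / (max - min);
                bucket = 1 + (int) Math.round(scaled * (numBuckets - 1));
            }
            sizes.put(entry.getKey(), bucket);
        }
        return sizes;
    }
}
```

With counts of 10, 5, and 1 for three tags and three buckets, this sketch assigns buckets 3, 2, and 1 respectively. A view layer could then translate each bucket into a CSS class or font size.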

Chapter 4 is focused on the different kinds of content that may be available in your application and how they can be used to derive intelligence. The chapter begins with providing an overview of the different architectures to embed content in your application. I also briefly discuss content that’s typically associated with CI: blogs, wikis, and message boards. Next, we work through a step-by-step example of how intelligence can be extracted from unstructured text. This is a must-read section for those who want to understand text analytics.
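To give a flavor of the first steps in extracting intelligence from unstructured text, here is a minimal Java sketch that tokenizes text, removes common words, and counts term frequencies. The class name, the method name, and the tiny stop-word list are assumptions for illustration only; stemming and phrase detection are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: naive term counting with common-word removal. */
class TermFrequencySketch {

    // A tiny, illustrative stop-word list; real lists are much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    /** Returns term counts in order of first appearance. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Lowercase, then split on anything that is not an ASCII letter or digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }
}
```

For the input "The wisdom of the crowd is the crowd", the sketch keeps only "wisdom" (count 1) and "crowd" (count 2). The resulting counts are the kind of raw material that a term vector can be built from.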

The next two chapters are focused on collecting data from outside of one’s application—first by searching the blogosphere and then by crawling the web.

Chapter 5 deals with building a framework to harvest information from the blogosphere. It begins with developing a generalized framework to retrieve blog entries. Next, it extends the framework to query blog-tracking providers such as Technorati, Blogdigger, Bloglines, and MSN.

Chapter 6 is focused on retrieving information from the web using web crawling. It introduces intelligent web crawling or focused crawling, along with a short discussion on dealing with hidden content. In this chapter, we first develop a simple web crawler. This exercise is useful to understand all the pieces that need to come together to build a web crawler and to understand the issues related to crawling the complete web. Next, for scalable crawling, we look at Nutch, an open source scalable web crawler.
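The pieces of a simple crawler can be sketched as a frontier queue plus a set of URLs already seen. The Java sketch below is a simplified illustration, not the crawler built in chapter 6: an in-memory map of links stands in for fetching pages and extracting URLs, and the isRelevant predicate is a hypothetical stand-in for whatever relevance test a focused crawler applies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: breadth-first crawl that only expands relevant pages. */
class CrawlFrontierSketch {

    static List<String> crawl(String seed, Map<String, List<String>> links,
                              Predicate<String> isRelevant, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(); // URLs waiting to be visited
        Set<String> seen = new HashSet<>();          // URLs already queued
        List<String> visited = new ArrayList<>();    // visit order
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);
            // Focused crawling: do not follow links out of irrelevant pages.
            if (!isRelevant.test(url)) {
                continue;
            }
            for (String next : links.getOrDefault(url, Collections.emptyList())) {
                if (seen.add(next)) { // add() returns false for URLs already queued
                    frontier.add(next);
                }
            }
        }
        return visited;
    }
}
```

A real crawler would also honor robots.txt, throttle requests per host, and persist the frontier, which is part of why a scalable crawler such as Nutch is worth considering.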

Part 2 of the book is focused on deriving intelligence from the information collected. It consists of four chapters—an introduction to the data mining process, standards, and toolkits, and chapters on developing a text-analysis toolkit, finding patterns through clustering, and making predictions.

Chapter 7 provides an introduction to data mining: the mining process and the various kinds of learning algorithms. It introduces WEKA, a widely used open source data mining toolkit, along with the Java Data Mining (JDM) standard.

Chapter 8 develops a text analysis toolkit; this toolkit is used in the remainder of the book to convert unstructured text into a format that’s usable for the mining algorithms. Here we leverage Lucene for text processing. In this chapter, we develop a custom analyzer to inject synonyms and detect phrases.

In chapter 9, we develop clustering algorithms, implementing both k-means and hierarchical clustering. We also look at how we can leverage WEKA and JDM for clustering. Building on the blog harvesting framework developed in chapter 5, we also illustrate how we can cluster blog entries.
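To give a flavor of the k-means algorithm mentioned above, here is a minimal sketch in plain Java. It is not the implementation developed in chapter 9: the class name KMeansSketch, the cluster method, and the choice of seeding the centroids with the first k points are all assumptions made for illustration. The sketch alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its points, stopping when no assignment changes.

```java
import java.util.Arrays;

// Illustrative sketch only; names and seeding strategy are assumptions,
// not the book's implementation.
class KMeansSketch {

    /**
     * Clusters points into k groups and returns the cluster index of each point.
     * Assumes points.length >= k and that all points have the same dimension.
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int n = points.length;
        int dim = points[0].length;

        // Assumption: seed the centroids with the first k points (deterministic, simple).
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[i], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }

            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) {
                    sums[assignment[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep the previous centroid
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Calling KMeansSketch.cluster with two well-separated groups of two-dimensional points and k set to 2 places each group in its own cluster. For text such as blog entries, each point would instead be a term vector produced by a text analysis step.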

In chapter 10, we deal with algorithms for making predictions. We begin with classification algorithms, such as decision trees, the Naïve Bayes classifier, and belief networks. The chapter then covers three regression algorithms: linear regression, multi-layer perceptron, and radial basis function. It builds on the example of harvesting blog entries to illustrate how the WEKA and JDM APIs can be leveraged for both classification and regression.
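As a small taste of the simplest of these prediction algorithms, the following sketch fits a one-variable linear regression by ordinary least squares in plain Java. It does not use WEKA or JDM, and the names LinearRegressionSketch, fit, and predict are hypothetical, chosen only for this illustration. The slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means.

```java
// Illustrative sketch only; names are assumptions, not a WEKA or JDM API.
class LinearRegressionSketch {

    /**
     * Fits y = intercept + slope * x by ordinary least squares.
     * Returns {intercept, slope}. Assumes x and y have equal length
     * and that x contains at least two distinct values.
     */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    /** Predicts y for a new x using a model returned by fit. */
    static double predict(double[] model, double x) {
        return model[0] + model[1] * x;
    }
}
```

Fitting the points (1, 3), (2, 5), (3, 7), (4, 9) recovers an intercept of 1 and a slope of 2, so predict returns 11 for x = 5.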

Part 3 consists of two chapters, which deal with applying intelligence within one’s application.

Chapter 11 deals with intelligent search. It shows how you can leverage Lucene, along with other useful toolkits and frameworks that build on Lucene. It also covers six different approaches being taken in the area of intelligent search.

The last chapter, chapter 12, illustrates how to build a recommendation engine using both content-based and collaborative approaches. It also covers real-world case studies on how recommendation engines have been built at Amazon, Google News, and Netflix.

Code conventions and downloads

All source code in listings or in text is in a fixed-width font like this to separate it from ordinary text. Method and function names, object properties, XML elements, and attributes in text are presented using this same font. Code annotations accompany many of the listings, highlighting important concepts. In some cases, numbered bullets link to explanations that follow the listing.

Source code for all of the working examples in this book is available for download from www.manning.com/CollectiveIntelligenceinAction. Basic setup documentation is provided with the download.

Author Online

The purchase of Collective Intelligence in Action includes free access to a private web forum run by Manning Publications, where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum and subscribe to it, point your web browser to www.manning.com/CollectiveIntelligenceinAction. This page provides information about how to get on the forum once you’re registered, what kind of help is available, and the rules of conduct on the forum.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It isn’t a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The Author Online forum and the archives of previous discussions will be accessible from the publisher’s web site as long as the book is in print.

About the author

SATNAM ALAG, PH.D, is currently the vice president of engineering at NextBio (www.nextbio.com), a vertical search engine and a Web 2.0 user-centric application for the life sciences community. He’s a seasoned software professional with more than 15 years of experience in machine learning and over a decade of experience in commercial software development and management. Dr. Alag worked as a consultant with Johnson & Johnson’s BabyCenter, where he helped develop their personalization engine. Prior to that, he was the chief software architect at Rearden Commerce and began his career at GE R&D. He’s a Sun Certified Enterprise Architect (SCEA) for the Java Platform. Dr. Alag earned his Ph.D in engineering from UC Berkeley, and his dissertation was in the area of probabilistic reasoning and machine learning. He’s published a number of peer-reviewed articles.

About the title

By combining introductions, overviews, and how-to examples, the In Action books are designed to help learning and remembering. According to research in cognitive science, the things people remember are things they discover during self-motivated exploration.

Although no one at Manning is a cognitive scientist, we’re convinced that for learning to become permanent it must pass through stages of exploration, play, and, interestingly, retelling of what is being learned. People understand and remember new things, which is to say they master them, only after actively exploring them. Humans learn in action. An essential part of an In Action book is that it’s example-driven. It encourages the reader to try things out, to play with new code, and explore new ideas.

There is another, more mundane, reason for the title of this book: our readers are busy. They use books to do a job or solve a problem. They need books that allow them to jump in and jump out easily and learn just what they want just when they want it. They need books that aid them in action. The books in this series are designed for such readers.

About the cover illustration

The figure on the cover of Collective Intelligence in Action is captioned Le Champenois, a resident of the Champagne region in northeast France, best known for its sparkling white wine. The illustration is taken from a 19th century edition of Sylvain Maréchal’s four-volume compendium of regional dress customs published in France. Each illustration is finely drawn and colored by hand. The rich variety of Maréchal’s collection reminds us vividly of how culturally apart the world’s towns and regions were just 200 years ago. Isolated from each other, people spoke different dialects and languages. In the streets or in the countryside, it was easy to identify where they lived and what their station in life was just by their dress.

Dress codes have changed since then and the diversity by region, so rich at the time, has faded away. It is now hard to tell apart the inhabitants of different continents, let alone different towns or regions. Perhaps we have traded cultural diversity for a more varied personal life—certainly for a more varied and fast-paced technological life.

At a time when it is hard to tell one computer book from another, Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional life of two centuries ago, brought back to life by Maréchal’s pictures.

Part 1. Gathering data for intelligence

Chapter 1 begins the book with a brief overview of what collective intelligence is and how it manifests itself in your application. Then we move on to focus on how we can gather data from which we can derive intelligence. For this, we look at information both inside the application (chapters 2 through 4) and outside the application (chapters 5 and 6).

Chapter 2 deals with learning from the interactions of users. To get the ball rolling, we look at the architecture for embedding intelligence, and present some of the basic concepts related to collective intelligence (CI). We also cover how we can gather data from various forms of user interaction. We continue with this theme in chapter 3, which deals with tagging. This chapter contains all the information you need to build tagging-related features in your application. In chapter 4, we look at the various forms of content that are typically available in a web application and how to derive collective intelligence from it.

Next, we change our focus to collecting data from outside our application. We first deal with searching the blogosphere in chapter 5. This is followed by chapter 6, which deals with intelligently crawling the web in search of relevant content.

Chapter 1. Understanding collective intelligence

Copyright

Dedication

Brief Table of Contents

Table of Contents

Foreword

Preface