Mining of Massive Datasets

Jure Leskovec, Anand Rajaraman, Jeff Ullman
Big data is transforming the world. Here you will learn data mining and
machine learning techniques to process large datasets and extract
valuable knowledge from them.
The book
The book is based on Stanford Computer Science course CS246: Mining
Massive Datasets (and CS345A: Data Mining).
The book, like the course, is designed at the undergraduate computer
science level with no formal prerequisites. To support deeper
explorations, most of the chapters are supplemented with further
reading references.
The Mining of Massive Datasets book has been published by Cambridge
University Press. You can get a 20% discount by applying the code
MMDS20 at checkout.
By agreement with the publisher, you can download the book for free
from this page. Cambridge University Press does, however, retain
copyright on the work, and we expect that you will obtain their
permission and acknowledge our authorship if you republish parts or all
of it.
We welcome your feedback on the manuscript.
The 3rd edition of the book
The following is the third edition of the book. It contains new material on
Spark, TensorFlow, minhashing, community-finding, simrank, graph
algorithms, and decision trees. There is a new Chapter 13, covering deep
learning.
We also offer a set of lecture slides that we use for teaching the Stanford
CS246: Mining Massive Datasets course. Note that the slides do not
necessarily cover all the material covered in the corresponding
chapters.
Chapter     Title                                  Book  Slides                            Videos
Preface and Table of Contents                      PDF
Chapter 1   Data Mining                            PDF   PDF PPT
Chapter 2   Map-Reduce and the New Software Stack  PDF   PDF PPT                           1 2 3 4 5 6 7 8
Chapter 3   Finding Similar Items                  PDF   PDF PPT                           1 2 3 4 5 6 7 8 9 10 11 12 13
Chapter 4   Mining Data Streams                    PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5
Chapter 5   Link Analysis                          PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5 6 7 8 9 10 11 12 13 14
Chapter 6   Frequent Itemsets                      PDF   PDF PPT                           1 2 3 4
Chapter 7   Clustering                             PDF   PDF PPT                           1 2 3 4 5
Chapter 8   Advertising on the Web                 PDF   PDF PPT                           1 2 3 4
Chapter 9   Recommendation Systems                 PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5
Chapter 10  Mining Social-Network Graphs           PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5 6 7 8 9 10 11 12
Chapter 11  Dimensionality Reduction               PDF   PDF PPT                           1 2 3 4 5 6 7 8 9 10 11 12
Chapter 12  Large-Scale Machine Learning           PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5 6 7 8 9 10 11 12
Chapter 13  Neural Nets and Deep Learning          PDF
Index                                              PDF
Errata                                             HTML
Download the latest version of the book as a single big PDF file (603
pages, 3.6 MB).
The Errata for the third edition of the book: HTML.
Download slides (PPT) in French: Chapter 4, Chapter 5, Chapter 8,
Chapter 9, Chapter 10. Courtesy of Richard Khoury.
Note to the users of provided slides: We would be delighted if you found
our material useful in giving your own lectures. Feel free to use these
slides verbatim, or to modify them to fit your own needs. PowerPoint
originals are available. If you make use of a significant portion of these
slides in your own lecture, please include this message, or a link to our
web site: http://www.mmds.org/.
Comments and corrections are most welcome. Please let us know if you
are using these materials in your course and we will list and link to your
course.
Stanford big data courses
CS246
CS246: Mining Massive Datasets is a graduate-level course that discusses
data mining and machine learning algorithms for analyzing very large
amounts of data. The emphasis is on MapReduce as a tool for creating
parallel algorithms that can process very large amounts of data.
CS341
CS341: Project in Mining Massive Data Sets is an advanced project-based
course. Students work on data mining and machine learning algorithms
for analyzing very large amounts of data. Both interesting big datasets and
computational infrastructure (a large MapReduce cluster) are
provided by course staff. Generally, students first take CS246, followed by
CS341.
CS341 is generously supported by Amazon, which gives us access to its
EC2 platform.
CS224W
CS224W: Social and Information Networks is a graduate-level course that
covers recent research on the structure and analysis of large social
and information networks and on models and algorithms that abstract
their basic properties. The class explores how to practically analyze
large-scale network data and how to reason about it through models for
network structure and evolution.
You can take Stanford courses!
If you are not a Stanford student, you can still take CS246 as well as
CS224W or earn a Stanford Mining Massive Datasets graduate certificate
by completing a sequence of four Stanford Computer Science courses. A
graduate certificate is a great way to keep the skills and knowledge in
your field current. More information is available at the Stanford Center
for Professional Development (SCPD).
Supporting materials
If you are an instructor interested in using the Gradiance Automated
Homework System with this book, start by creating an account for
yourself here. Then, email your chosen login and the request to become
an instructor for the MMDS book to support@gradiance.com. You will
then be able to create a class using these materials. Manuals explaining
the use of the system are available here.
Students who want to use the Gradiance Automated Homework System
for self-study can register here. Then, use the class token 1EDD8A1D to
join the "omnibus class" for the MMDS book. See The Student Guide for
more information.
Previous versions of the book
The 2nd edition of the book (v2.1)

The following is the second edition of the book. There are three new
chapters, on mining large graphs, dimensionality reduction, and
machine learning. There is also a revised Chapter 2 that treats map-
reduce programming in a manner closer to how it is used in practice.
Together with each chapter there is also a set of lecture slides that we use
for teaching the Stanford CS246: Mining Massive Datasets course. Note that
the slides do not necessarily cover all the material covered in the
corresponding chapters.
Chapter     Title                                  Book  Slides                            Videos
Preface and Table of Contents                      PDF
Chapter 1   Data Mining                            PDF   PDF PPT
Chapter 2   Map-Reduce and the New Software Stack  PDF   PDF PPT                           1 2 3 4 5 6 7 8
Chapter 3   Finding Similar Items                  PDF   PDF PPT                           1 2 3 4 5 6 7 8 9 10 11 12 13
Chapter 4   Mining Data Streams                    PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5
Chapter 5   Link Analysis                          PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5 6 7 8 9 10 11 12 13 14
Chapter 6   Frequent Itemsets                      PDF   PDF PPT                           1 2 3 4
Chapter 7   Clustering                             PDF   PDF PPT                           1 2 3 4 5
Chapter 8   Advertising on the Web                 PDF   PDF PPT                           1 2 3 4
Chapter 9   Recommendation Systems                 PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5
Chapter 10  Mining Social-Network Graphs           PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5 6 7 8 9 10 11 12
Chapter 11  Dimensionality Reduction               PDF   PDF PPT                           1 2 3 4 5 6 7 8 9 10 11 12
Chapter 12  Large-Scale Machine Learning           PDF   Part 1: PDF PPT; Part 2: PDF PPT  1 2 3 4 5 6 7 8 9 10 11 12
Index                                              PDF
Errata                                             HTML
Download the latest version of the book as a single big PDF file (511
pages, 3 MB).
Download the full version of the book with a hyper-linked table of
contents that makes it easy to jump around: PDF file (513 pages, 3.69 MB).
The Errata for the second edition of the book: HTML.
Download slides (PPT) in French: Chapter 4, Chapter 5, Chapter 8,
Chapter 9, Chapter 10. Courtesy of Richard Khoury.
Note to the users of provided slides: We would be delighted if you found
our material useful in giving your own lectures. Feel free to use these
slides verbatim, or to modify them to fit your own needs. PowerPoint
originals are available. If you make use of a significant portion of these
slides in your own lecture, please include this message, or a link to our
web site: http://www.mmds.org/.
Version 1.0
The following materials are equivalent to the published book, with errata
corrected to July 4, 2012.
Chapter     Title                                    Book
Preface and Table of Contents                        PDF
Chapter 1   Data Mining                              PDF
Chapter 2   Large-Scale File Systems and Map-Reduce  PDF
Chapter 3   Finding Similar Items                    PDF
Chapter 4   Mining Data Streams                      PDF
Chapter 5   Link Analysis                            PDF
Chapter 6   Frequent Itemsets                        PDF
Chapter 7   Clustering                               PDF
Chapter 8   Advertising on the Web                   PDF
Chapter 9   Recommendation Systems                   PDF
Index                                                PDF
Errata                                               HTML
Download the book as published here (340 pages, 2 MB).