Mining of Massive Datasets - Stanford
The book
The book is based on Stanford Computer Science course CS246: Mining
Massive Datasets (and CS345A: Data Mining).
By agreement with the publisher, you can download the book for free
from this page. Cambridge University Press does, however, retain
copyright on the work, and we expect that you will obtain their
permission and acknowledge our authorship if you republish parts or all
of it.
The 3rd edition of the book
The following is the third edition of the book. It contains new material on
Spark, TensorFlow, minhashing, community-finding, simrank, graph
algorithms, and decision trees. There is a new chapter 13, covering deep
learning.
Download the latest version of the book as a single big PDF file (603
pages, 3.6 MB).
Comments and corrections are most welcome. Please let us know if you
are using these materials in your course and we will list and link to your
course.
CS246
CS224W
If you are not a Stanford student, you can still take CS246 as well as
CS224W or earn a Stanford Mining Massive Datasets graduate certificate
by completing a sequence of four Stanford Computer Science courses. A
graduate certificate is a great way to keep the skills and knowledge in
your field current. More information is available at the Stanford Center
for Professional Development (SCPD).
Supporting materials
If you are an instructor interested in using the Gradiance Automated
Homework System with this book, start by creating an account for
yourself here. Then, email your chosen login and the request to become
an instructor for the MMDS book to support@gradiance.com. You will
then be able to create a class using these materials. Manuals explaining
the use of the system are available here.
Each chapter also comes with a set of lecture slides that we use for
teaching the Stanford CS246: Mining Massive Datasets course. Note that
the slides do not necessarily cover all the material covered in the
corresponding chapters.
Chapter 5   Link Analysis                  PDF   Part 1: PDF PPT; Part 2: PDF PPT   1 2 3 4 5 6 7 8 9 10 11 12 13 14
Chapter 6   Frequent Itemsets              PDF   PDF PPT                            1 2 3 4
Chapter 7   Clustering                     PDF   PDF PPT                            1 2 3 4 5
Chapter 8   Advertising on the Web         PDF   PDF PPT                            1 2 3 4
Chapter 9   Recommendation Systems         PDF   Part 1: PDF PPT; Part 2: PDF PPT   1 2 3 4 5
Chapter 10  Mining Social-Network Graphs   PDF   Part 1: PDF PPT; Part 2: PDF PPT   1 2 3 4 5 6 7 8 9 10 11 12
Chapter 11  Dimensionality Reduction       PDF   PDF PPT                            1 2 3 4 5 6 7 8 9 10 11 12
Chapter 12  Large-Scale Machine Learning   PDF   Part 1: PDF PPT; Part 2: PDF PPT   1 2 3 4 5 6 7 8 9 10 11 12
Index       PDF
Errata      HTML
Download the latest version of the book as a single big PDF file (511
pages, 3 MB).