Python Machine Learning code repository.
What you can expect are 400 pages rich in useful material, covering just about everything you need to know to get started with machine learning, from theory to the actual code that you can directly put into action! This is not just yet another "this is how scikit-learn works" book. I aim to explain all the underlying concepts, tell you everything you need to know in terms of best practices and caveats, and put those concepts into action mainly using NumPy, scikit-learn, and Theano.
Not sure if this book is for you? Please check out the excerpts from the Foreword and Preface, or take a look at the FAQ section for further information.
Paperback: 454 pages
Publisher: Packt Publishing
Language: English
ISBN-10: 1783555130
ISBN-13: 978-1783555130
Kindle ASIN: B00YSILNL0
Sebastian Raschka’s new book, Python Machine Learning, has just been released. I got a chance to read a review copy and it’s just as I expected - really great! It’s well organized, super easy to follow, and it not only offers a good foundation for smart, non-experts, practitioners will get some ideas and learn new tricks here as well.
– Lon Riesberg at Data Elixir
Superb job! Thus far, for me it seems to have hit the right balance of theory and practice…math and code!
– Brian Thomas
I've read (virtually) every Machine Learning title based around Scikit-learn and this is hands-down the best one out there.
– Jason Wolosonovich
- ebook and paperback at Amazon.com, Amazon.co.uk, Amazon.de
- ebook and paperback from Packt (the publisher)
- at other book stores: O'Reilly, Safari, Barnes & Noble, Apple iBooks, ...
- free sample chapter
Simply click on the ipynb/nbviewer links next to the chapter headlines to view the code examples (currently, the internal document links are only supported by the NbViewer version).
Please note that these are just the code examples accompanying the book, which I uploaded for your convenience; be aware that these notebooks may not be useful without the formulae and descriptive text.
Excerpts from the Foreword and Preface
- Machine Learning - Giving Computers the Ability to Learn from Data [dir] [ipynb] [nbviewer]
- Training Machine Learning Algorithms for Classification [dir] [ipynb] [nbviewer]
- A Tour of Machine Learning Classifiers Using Scikit-Learn [dir] [ipynb] [nbviewer]
- Building Good Training Sets – Data Pre-Processing [dir] [ipynb] [nbviewer]
- Compressing Data via Dimensionality Reduction [dir] [ipynb] [nbviewer]
- Learning Best Practices for Model Evaluation and Hyperparameter Optimization [dir] [ipynb] [nbviewer]
- Combining Different Models for Ensemble Learning [dir] [ipynb] [nbviewer]
- Applying Machine Learning to Sentiment Analysis [dir] [ipynb] [nbviewer]
- Embedding a Machine Learning Model into a Web Application [dir] [ipynb] [nbviewer]
- Predicting Continuous Target Variables with Regression Analysis [dir] [ipynb] [nbviewer]
- Working with Unlabeled Data – Clustering Analysis [dir] [ipynb] [nbviewer]
- Training Artificial Neural Networks for Image Recognition [dir] [ipynb] [nbviewer]
- Parallelizing Neural Network Training via Theano [dir] [ipynb] [nbviewer]
- Why do you and other people sometimes implement machine learning algorithms from scratch?
- What learning path/discipline in data science should I focus on?
- At what point should one start contributing to open source?
- How important do you think having a mentor is to the learning process?
- Where are the best online communities centered around data science/machine learning or Python?
- Why are there so many deep learning libraries?
- What is the probabilistic interpretation of regularized logistic regression?
- Can you give a visual explanation for the back propagation algorithm for neural networks?
- How do I evaluate a model?
- Why do we re-use parameters from the training set to standardize the test set and new data?
- What are some of the issues with clustering?
- What is the difference between deep learning and usual machine learning?
- What is the best validation metric for multi-class classification?
- What are the differences in the nature of research between the two fields: machine learning & data mining?
- What is the difference between LDA and PCA for dimensionality reduction?
- How do I know if the problem is solvable through machine learning?
- What factors should I consider when choosing a predictive model technique?
- Does regularization in logistic regression always result in a better fit and better generalization?
- Why did it take so long for deep networks to be invented?
- How was classification, as a learning machine, developed?
- What are some good books/papers for learning deep learning?
- What are the different dimensionality reduction methods in machine learning?
- What is Euclidean distance in terms of machine learning?
- What is the major difference between naive Bayes and logistic regression?
- Can I use paragraphs and images from the book in presentations or my blog?
- How is this different from other machine learning books?
- Which version of Python was used in the code examples?
- Which technologies and libraries are being used?
- Which book version/format would you recommend?
- Why did you choose Python for machine learning?
- Why do you use so many leading and trailing underscores in the code examples?
- Are there any prerequisites and recommended pre-readings?
I am happy to answer questions! Just write me an email or consider asking the question on the Google Groups Email List.
If you are interested in keeping in touch, I have quite a lively Twitter stream (@rasbt) all about data science and machine learning. I also maintain a blog where I post all of the things I am particularly excited about.