Mastering SciPy
()
About this ebook
Related to Mastering SciPy
Related ebooks
Scientific Computing with Python 3 Rating: 0 out of 5 stars0 ratingsInteractive Applications Using Matplotlib Rating: 0 out of 5 stars0 ratingsPractical Data Science Cookbook - Second Edition Rating: 0 out of 5 stars0 ratingsMastering Probabilistic Graphical Models Using Python Rating: 3 out of 5 stars3/5Mastering Python Data Analysis Rating: 0 out of 5 stars0 ratingsLearning Bayesian Models with R Rating: 5 out of 5 stars5/5Advanced Machine Learning with Python Rating: 0 out of 5 stars0 ratingsRegression Analysis with Python Rating: 0 out of 5 stars0 ratingsBayesian Analysis with Python Rating: 5 out of 5 stars5/5Python Data Science Essentials Rating: 0 out of 5 stars0 ratingsLearning Data Mining with Python Rating: 0 out of 5 stars0 ratingsMastering Python for Data Science Rating: 3 out of 5 stars3/5Learning Probabilistic Graphical Models in R Rating: 0 out of 5 stars0 ratingsLearning Data Mining with Python - Second Edition Rating: 0 out of 5 stars0 ratingsR High Performance Programming Rating: 4 out of 5 stars4/5Building a Recommendation System with R Rating: 0 out of 5 stars0 ratingsF# for Machine Learning Essentials Rating: 0 out of 5 stars0 ratingsMastering Scientific Computing with R Rating: 3 out of 5 stars3/5Python Data Science Essentials - Second Edition Rating: 4 out of 5 stars4/5Learning Predictive Analytics with Python Rating: 0 out of 5 stars0 ratingsMastering Python Regular Expressions Rating: 5 out of 5 stars5/5R Object-oriented Programming Rating: 3 out of 5 stars3/5Learning Functional Data Structures and Algorithms Rating: 0 out of 5 stars0 ratingsR for Data Science Rating: 5 out of 5 stars5/5Python Data Analysis Rating: 4 out of 5 stars4/5Mastering Data Analysis with R Rating: 5 out of 5 stars5/5Mastering Python Scientific Computing Rating: 4 out of 5 stars4/5
Applications & Software For You
Blender 3D By Example Rating: 4 out of 5 stars4/5Blender 3D Basics Beginner's Guide Second Edition Rating: 5 out of 5 stars5/5Animation for Beginners: Getting Started with Animation Filmmaking Rating: 3 out of 5 stars3/5Trend Following: Learn to Make a Fortune in Both Bull and Bear Markets Rating: 5 out of 5 stars5/52024 – 2025 Newbies Guide to UI/UX Design Using Figma Rating: 0 out of 5 stars0 ratingsHow to Build and Design a Website using WordPress : A Step-by-Step Guide with Screenshots Rating: 0 out of 5 stars0 ratingsPython: Master the Art of Design Patterns Rating: 4 out of 5 stars4/5Excel : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Excel Programming: 1 Rating: 5 out of 5 stars5/5Visualizing Financial Data Rating: 0 out of 5 stars0 ratingsSound Design for Filmmakers: Film School Sound Rating: 5 out of 5 stars5/5Blender 3D for Jobseekers: Learn professional 3D creation skills using Blender 3D (English Edition) Rating: 0 out of 5 stars0 ratingsPython Projects for Everyone Rating: 0 out of 5 stars0 ratingsAmazon Echo Show (2nd Gen) & Echo Show 5 - The Complete User Guide: Learn to Use Your Echo Show Like A Pro Rating: 5 out of 5 stars5/5Learning Robotics Using Python Rating: 0 out of 5 stars0 ratingsBeginning AutoCAD® 2022 Exercise Workbook: For Windows® Rating: 0 out of 5 stars0 ratingsAdobe Illustrator CC For Dummies Rating: 5 out of 5 stars5/5iPhone 11 User Guide: The Simple Manual to Understand Your iPhone 11 with Tips and Tricks Rating: 0 out of 5 stars0 ratingsCanva Tips and Tricks Beyond The Limits Rating: 3 out of 5 stars3/5Canva For Dummies Rating: 0 out of 5 stars0 ratingsVisio 2016: Up To Speed Rating: 0 out of 5 stars0 ratingsGLSL Essentials Rating: 0 out of 5 stars0 ratingsThe Designer’s Guide to Figma: Master Prototyping, Collaboration, Handoff, and Workflow Rating: 0 out of 5 stars0 ratingsMaster In YouTube - How I Run 12+ Different Profitable YouTube Channels and Make 7 Figures From Them ! Rating: 0 out of 5 stars0 ratingsExperts' Guide to Anki Flashcards Rating: 5 out of 5 stars5/5iPhone Photography: A Ridiculously Simple Guide To Taking Photos With Your iPhone Rating: 0 out of 5 stars0 ratingsiPhone Photography For Dummies Rating: 0 out of 5 stars0 ratingsMachine Learning in Action Rating: 0 out of 5 stars0 ratings
Reviews for Mastering SciPy
0 ratings0 reviews
Book preview
Mastering SciPy - Blanco-Silva Francisco J.
Table of Contents
Mastering SciPy
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Numerical Linear Algebra
Motivation
Creation of matrices and linear operators
Constructing matrices in the ndarray class
Constructing matrices in the matrix class
Constructing sparse matrices
Linear operators
Basic matrix manipulation
Scalar multiplication, matrix addition, and matrix multiplication
Traces and determinants
Transposes and inverses
Norms and condition numbers
Matrix functions
Matrix factorizations related to solving matrix equations
Relevant factorizations
Pivoted LU decomposition
Cholesky decomposition
QR decomposition
Singular value decomposition
Matrix equations
Back and forward substitution
Basic systems: banded matrices
Basic systems: generic square matrices
Least squares
Normal equations
QR factorization
Singular value decomposition
Regularized least squares
Other matrix equation solvers
Matrix factorizations based on eigenvalues
Spectral decomposition
Schur decomposition
Summary
2. Interpolation and Approximation
Motivation
Interpolation
Implementation details
Univariate interpolation
Nearest-neighbors interpolation
Lagrange interpolation
Hermite interpolation
Piecewise polynomial interpolation
Spline interpolation
Multivariate interpolation
Least squares approximation
Linear least squares approximation
Nonlinear least squares approximation
Summary
3. Differentiation and Integration
Motivation
Differentiation
Numerical differentiation
Symbolic differentiation
Automatic differentiation
Integration
Symbolic integration
Numerical integration
Functions without singularities on finite intervals
Functions with singularities on bounded domains
Weighted functions
General functions with singularities
Integration on unbounded domains
Numerical multivariate integration
Summary
4. Nonlinear Equations and Optimization
Motivation
Non-linear equations and systems
Iterative methods for univariate functions
Bracketing methods
Secant methods
Brent method
Systems of nonlinear equations
Simple iterative solvers
The Broyden method
Powell's hybrid solver
Large-scale solvers
Optimization
Unconstrained optimization for univariate functions
Constrained optimization for univariate functions
Unconstrained optimization for multivariate functions
The stochastic methods
Deterministic algorithms that exclusively employ function evaluations
The Broyden-Fletcher-Goldfarb-Shanno quasi-Newton method
The conjugate gradient method
Constrained optimization for multivariate functions
Summary
5. Initial Value Problems for Ordinary Differential Equations
Symbolic solution of differential equations
Analytic approximation methods
Discrete-variable methods
One-step methods
Two-step methods
Summary
6. Computational Geometry
Plane geometry
Combinatorial computational geometry
Static problems
Convex hulls
Voronoi diagrams
Triangulations
Shortest paths
Geometric query problems
Point location
Nearest neighbors
Range searching
Dynamic problems
Numerical computational geometry
Bézier curves
Summary
7. Descriptive Statistics
Motivation
Probability
Symbolic setting
Numerical setting
Data exploration
Picturing distributions with graphs
Bar plots and pie charts
Histograms
Time plots
Describing distributions with numbers and boxplots
Relationship between quantitative variables
Scatterplots and correlation
Regression
Ordinary linear regression for moderate-sized datasets
Ordinary least-squares regression for large datasets
Linear regression beyond ordinary least-squares
Support vector machines
Ensemble methods
Analysis of the time series
Summary
8. Inference and Data Analysis
Statistical inference
Estimation of parameters
Frequentist approach
Bayesian approach
Likelihood approach
Interval estimation
Frequentist approach
Bayesian approach
Likelihood approach
Data mining and machine learning
Classification
Support vector classification
Trees
Naive Bayes
Nearest neighbors
Dimensionality reduction
Principal component analysis
Isometric mappings
Spectral embedding
Locally linear embedding
Clustering
MeanShift
Gaussian mixture models
Kmeans
Spectral clustering
Summary
9. Mathematical Imaging
Digital images
Binary
Gray-scale
Color
Alpha channels
High-level operations on digital images
Object measurements
Mathematical morphology
Smoothing filters
Multivariate calculus
Statistical filters
Fourier analysis
Wavelet decompositions
Image compression
Lossless compression
Lossy compression
Image editing
Transformations of the domain
Rescale and resize
Swirl
Geometric transformations
Intensity adjustment
Histogram equalization
Intensity clipping/resizing
Contrast enhancement
Image restoration
Noise reduction
Sharpening and blurring
Inpainting
Image analysis
Image structure
Object recognition
Edge detection
Line, circle, and ellipse detection
Blob detection
Corner detection
Beyond geometric entities
Summary
Index
Mastering SciPy
Mastering SciPy
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2015
Production reference: 1301015
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-474-9
www.packtpub.com
Credits
Author
Francisco J. Blanco-Silva
Reviewers
Raiyan Kamal
Kristen Thyng
Patrick Varilly
Jonathan Whitmore
Commissioning Editor
Kartikey Pandey
Acquisition Editor
Shaon Basu
Content Development Editor
Nikhil Potdukhe
Technical Editor
Bharat Patil
Copy Editors
Tani Kothari
Merilyn Pereira
Project Coordinator
Judie Jose
Proofreader
Safis Editing
Indexer
Tejal Soni
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
About the Author
Francisco J. Blanco-Silva is the owner of a scientific consulting company called Tizona Scientific Solutions, a faculty member of the Department of Mathematics, and an associate member of the Interdisciplinary Mathematics Institute at the University of South Carolina. He obtained his formal training as an applied mathematician from Purdue University. He enjoys problem solving, learning, and teaching alike. Being an avid programmer and blogger, when it comes to writing, he relishes finding the common denominator among his passions and skills and making it available to everyone.
He wrote the prequel to this book, Learning SciPy for Numerical and Scientific Computing, Packt Publishing, and coauthored Chapter 5 of the book, Modeling Nanoscale Imaging in Electron Microscopy, Springer.
I will always be indebted to Bradley J. Lucier and Rodrigo Bañuelos for being constant sources of inspiration and for their guidance and teachings. Special thanks to my editors, Sriram Neelakantam, Bharat Patil, Nikhil Potdukhe, Mohammad Rizvi, and the many colleagues who have contributed by giving me encouragement and participating in helpful discussions. In particular, I would like to mention Parsa Bakhtary, Aaron Dutle, Edsel Peña, Pablo Sprechmann, Adam Taylor, and Holly Watson.
The most special thanks, without a doubt, goes to my wife and daughter. Grace's love and smiles alone provided all the motivation, enthusiasm, and skills to overcome the difficulties encountered during the writing of this book and everything that life threw at me ever since she was born.
About the Reviewers
Raiyan Kamal is a strong proponent of the open source movement and everything related to Python. He holds a bachelor's degree in computer science from BUET, Dhaka, Bangladesh, and a master's degree from the University of Windsor, Ontario, Canada. He has been working in the software industry for several years, developing software for mobile, web, and desktop platforms. Although he is in his early thirties, Raiyan feels that his boyhood has not ended yet. He often looks for hidden treasures in science, engineering, programming, art, and nature. He is currently working at IOU Concepts, exploring different ways of saying thank you. When he isn't on a computer, he plants trees and composts kitchen scraps.
Kristen Thyng has worked on scientific computing for most of her career. She has a bachelor's degree in physics from Whitman College, master's degree in applied mathematics from the University of Washington, and PhD in mechanical engineering from the University of Washington. She uses Python on a daily basis for analysis and visualization in physical oceanography at Texas A&M University, where she works as an assistant research scientist.
Jonathan Whitmore is a data scientist at Silicon Valley Data Science. He has a diverse range of interests and is excited by the challenges in data science and data engineering. Before moving into the tech industry, he worked as an astrophysicist in Melbourne, Australia, researching whether the fundamental physical constants have changed over the lifespan of the universe. He has a long-standing commitment to the public's understanding of science and technology, and has contributed to FOSS projects. He co-starred in the 3D IMAX film Hidden Universe, which was playing in theaters around the world at the time of writing this book. Jonathan is a sought-after conference speaker on science and technical topics. He received his PhD in physics from the University of California, San Diego, and graduated magna cum laude from Vanderbilt University with a bachelor's degree in science. He is also a triple major in physics (with honors), philosophy, and mathematics.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
The idea of writing Mastering SciPy arose but 2 months after publishing Learning SciPy for Numerical and Scientific Computing. During a presentation of that book at the University of South Carolina, I had the privilege of speaking about its contents to a heterogeneous audience of engineers, scientists, and students, each of them with very different research problems and their own set of preferred computational resources. In the weeks following that presentation, I helped a few professionals transition to a SciPy-based environment. During those sessions, we discussed how SciPy is, under the hood, the same set of algorithms (and often the same code) that they were already using. We experimented with some of their examples and systematically obtained comparable performance. We immediately saw the obvious benefit of a common environment based upon a robust scripting language. Through the SciPy stack, we discovered an easier way to communicate and share our results with colleagues, students, or employers. In all cases, the switch to the SciPy stack provided a faster setup for our groups, where newcomers could get up to speed quickly.
Everybody involved in the process went from novice to advanced user, and finally mastered the SciPy stack in no time. In most cases, the scientific background of the individuals with whom I worked made the transition seamless. The process toward mastering materialized when they were able to contrast the theory behind their research with the solutions offered. The aha moment always happened while replicating some of their experiments with a careful guidance and explanation of the process.
That is precisely the philosophy behind this book. I invite you to participate in similar sessions. Each chapter has been envisioned as a conversation with an individual with certain scientific needs expressed as numerical computations. Together, we discover relevant examples—the different possible ways to solve those problems, the theory behind them, and the pros and cons of each route.
The process of writing followed a similar path to obtain an engaging collection of examples. I entered into conversations with colleagues in several different fields. Each section clearly reflects these exchanges. This was crucial while engaged in the production of the most challenging chapters—the last four. To ensure the same quality throughout the book, always trying to commit to a rigorous set of standards, these chapters took much longer to be completed to satisfaction. Special mentions go to Aaron Dutle at NASA Langley Research Center, who helped shape parts of the chapter on computational geometry, and Parsa Bakhtary, a data analyst at Facebook, who inspired many of the techniques in the chapter on applications of statistical computing to data analysis.
It was an amazing journey that helped deepen my understanding of numerical methods, broadened my perspective in problem solving, and strengthened my scientific maturity. It is my wish that it has the same impact on you.
What this book covers
Chapter 1, Numerical Linear Algebra, presents an overview of the role of matrices to solve problems in scientific computing. It is a crucial chapter for understanding most of the processes and ideas of subsequent chapters. You will learn how to construct and store large matrices effectively in Python. We then proceed to reviewing basic manipulation and operations on them, followed by factorizations, solutions of matrix equations, and the computation of eigenvalues/eigenvectors.
Chapter 2, Interpolation and Approximation, develops advanced techniques to approximate functions, and their applications to scientific computing. This acts as a segway for the next two chapters.
Chapter 3, Differentiation and Integration, explores the different techniques to produce derivatives of functions and, more importantly, how to compute areas and volumes effectively by integration processes. This is the first of two chapters devoted to the core of numerical methods in scientific computing. This second part is also an introduction to Chapter 5, Initial Value Problems for Ordinary Differential Equations that mentions ordinary differential equations.
Chapter 4, Nonlinear Equations and Optimization, is a very technical chapter in which we discuss the best methods of obtaining the roots and extrema of systems of functions depending on the kinds of functions involved.
Chapter 5, Initial Value Problems for Ordinary Differential Equations, is the first of five chapters on applications to real-world problems. We show you, by example, the most popular techniques to solve systems of differential equations, as well as some applications.
Chapter 6, Computational Geometry, takes a tour of the most significant algorithms in this branch of computer science.
Chapter 7, Descriptive Statistics, is the first of two chapters on statistical computing and its applications to Data Analysis. In this chapter, we focus on probability and data exploration.
Chapter 8, Inference and Data Analysis, is the second chapter on Data Analysis. We focus on statistical inference, machine learning, and data mining.
Chapter 9, Mathematical Imaging, is the last chapter of this book. In it, we explore techniques for image compression, edition, restoration, and analysis.
What you need for this book
To work with the examples and try out the code of this book, all you need is a recent version of Python (2.7 or higher) with the SciPy stack: NumPy, the SciPy library, matplotlib, IPython, pandas, and SymPy. Although recipes to install all these independently are provided throughout the book, we recommend that you perform a global installation through a scientific Python distribution such as Anaconda.
Who this book is for
Although this book and technology are ultimately intended for applied mathematicians, engineers, and computer scientists, the material presented here is targeted at a broader audience. All that is needed is proficiency in Python, familiarity with iPython, some knowledge of the numerical methods in scientific computing, and a keen interest in developing serious applications in science, engineering, or data analysis.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We can include other contexts through the use of the include directive.
Any command-line input or output is written as follows:
In [7]: %time eigvals, v = spspla.eigsh(A, 5, which='SM') CPU times: user 19.3 s, sys: 532 ms, total: 19.8 s Wall time: 16.7 s In [8]: print eigvals [ 10.565523 10.663114 10.725135 10.752737 10.774503]
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You can also download the code files from GitHub repository at https://github.com/blancosilva/Mastering-Scipy.
Downloading the color images of this book
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/4749OS_ColorImages.pdf.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any li st of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.
Chapter 1. Numerical Linear Algebra
The term Numerical Linear Algebra refers to the use of matrices to solve computational science problems. In this chapter, we start by learning how to construct these objects effectively in Python. We make an emphasis on importing large sparse matrices from repositories online. We then proceed to reviewing basic manipulation and operations on them. The next step is a study of the different matrix functions implemented in SciPy. We continue on to exploring different factorizations for the solution of matrix equations, and for the computation of eigenvalues and their corresponding eigenvectors.
Motivation
The following image shows a graph that represents a series of web pages (numbered from 1 to 8):
An arrow from a node to another indicates the existence of a link from the web page, represented by the sending node, to the page represented by the receiving node. For example, the arrow from node 2 to node 1 indicates that there is a link in web page 2 pointing to web page 1. Notice how web page 4 has two outer links (to pages 2 and 8), and there are three pages that link to web page 4 (pages 2, 6, and 7). The pages represented by nodes 2, 4, and 8 seem to be the most popular at first sight.
Is there a mathematical way to actually express the popularity of a web page within a network? Researchers at Google came up with the idea of a PageRank to roughly estimate this concept by counting the number and quality of links to a page. It goes like this:
We construct a transition matrix of this graph, T={a[i,j]}, in the following fashion: the entry a[i,j] is 1/k if there is a link from web page i to web page j, and the total number of outer links in web page i amounts to k. Otherwise, the entry is just zero. The size of a transition matrix of N web pages is always N × N. In our case, the matrix has size 8 × 8:
0 1/2 0 0 0 0 0 0 1 0 1/2 1/2 0 0 0 0 0 0 0 0 0 0 1/3 0 0 1/2 0 0 0 1 1/3 0 0 0 1/2 0 0 0 0 0 0 0 0 0 0 0 0 1/2 0 0 0 0 1/2 0 0 1/2 0 0 0 1/2 1/2 0 1/3 0
Let us open an iPython session and load this particular matrix to memory.
Note
Remember that in Python, indices start from zero, not one.
In [1]: import numpy as np, matplotlib.pyplot as plt, \ ...: scipy.linalg as spla, scipy.sparse as spsp, \ ...: scipy.sparse.linalg as spspla In [2]: np.set_printoptions(suppress=True, precision=3) In [3]: cols = np.array([0,1,1,2,2,3,3,4,4,5,6,6,6,7,7]); \ ...: rows = np.array([1,0,3,1,4,1,7,6,7,3,2,3,7,5,6]); \ ...: data = np.array([1., 0.5, 0.5, 0.5, 0.5, \ ...: 0.5, 0.5, 0.5, 0.5, 1., \ ...: 1./3, 1./3, 1./3, 0.5, 0.5]) In [4]: T = np.zeros((8,8)); \ ...: T[rows,cols] = data
From the transition matrix, we create a PageRank matrix G by fixing a positive constant p between 0 and 1, and following the formula G = (1-p)*T + p*B for a suitable damping factor p. Here, B is a matrix with the same size as T, with all its entries equal to 1/N. For example, if we choose p = 0.15, we obtain the following PageRank matrix:
In [5]: G = (1-0.15) * T + 0.15/8; \ ...: print G [[ 0.019 0.444 0.019 0.019 0.019 0.019 0.019 0.019] [ 0.869 0.019 0.444 0.444 0.019 0.019 0.019 0.019] [ 0.019 0.019 0.019 0.019 0.019 0.019 0.302 0.019] [ 0.019 0.444 0.019 0.019 0.019 0.869 0.302 0.019] [ 0.019 0.019 0.444 0.019 0.019 0.019 0.019 0.019] [ 0.019 0.019 0.019 0.019 0.019 0.019 0.019 0.444] [ 0.019 0.019 0.019 0.019 0.444 0.019 0.019 0.444] [ 0.019 0.019 0.019 0.444 0.444 0.019 0.302 0.019]]
PageRank matrices have some interesting properties:
1 is an eigenvalue of multiplicity one.
1 is actually the largest eigenvalue; all the other eigenvalues are in modulus smaller than 1.
The eigenvector corresponding to eigenvalue 1 has all positive entries. In particular, for the eigenvalue 1, there exists a unique eigenvector with the sum of its entries equal to 1. This is what we call the PageRank vector.
A quick computation with scipy.linalg.eig finds that eigenvector for us:
In [6]: eigenvalues, eigenvectors = spla.eig(G); \ ...: print eigenvalues [ 1.000+0.j -0.655+0.j -0.333+0.313j -0.333-0.313j –0.171+0.372j -0.171-0.372j 0.544+0.j 0.268+0.j ] In [7]: PageRank = eigenvectors[:,0]; \ ...: PageRank /= sum(PageRank); \ ...: print PageRank.real [ 0.117 0.232 0.048 0.219 0.039 0.086 0.102 0.157]
Those values correspond to the PageRank of each of the eight web pages depicted on the graph. As expected, the maximum value of those is associated to the second web page (0.232), closely followed by the fourth (0.219) and then the eighth web page (0.157). These values provide us with the information that we were seeking: the second web page is the most popular, followed by the fourth, and then, the eight.
Note
Note how this problem of networks of web pages has been translated into mathematical objects, to an equivalent problem involving matrices, eigenvalues, and eigenvectors, and has been solved with techniques of Linear Algebra.
The transition matrix is sparse: most of its entries are zeros. Sparse matrices with an extremely large size are of special importance in Numerical Linear Algebra, not only because they encode challenging scientific problems but also because it is extremely hard to manipulate them with basic algorithms.
Rather than storing to memory all values in the matrix, it makes sense to collect only the non-zero values instead, and use algorithms which exploit these smart storage schemes. The gain in memory management is obvious. These methods are usually faster for this kind of matrices and give less roundoff errors, since there are usually far less operations involved. This is another advantage of SciPy, since it contains numerous procedures to attack different problems where data is stored in this fashion. Let us observe its power with another example:
The University of Florida Sparse Matrix Collection is the largest database of matrices accessible online. As of January 2014, it contains 157 groups of matrices arising from all sorts of scientific disciplines. The sizes of the matrices range from very small (1 × 2) to insanely large (28 million × 28 million). More matrices are expected to be added constantly, as they arise in different engineering problems.
Tip
More information about this database can be found in ACM Transactions on Mathematical Software, vol. 38, Issue 1, 2011, pp 1:1-1:25, by T.A. Davis and Y.Hu, or online at http://www.cise.ufl.edu/research/sparse/matrices/.
For example, the group with the most matrices in the database is the original Harwell-Boeing Collection, with 292 different sparse matrices. This group can also be accessed online at the Matrix Market: http://math.nist.gov/MatrixMarket/.
Each matrix in the database comes in three formats:
Matrix Market Exchange format [Boisvert et al. 1997]
Rutherford-Boeing Exchangeformat [Duff et al. 1997]
Proprietary Matlab.matformat.
Let us import to our iPython session two matrices in the Matrix Market Exchange format from the collection, meant to be used in a solution of a least squares problem. These matrices are located at www.cise.ufl.edu/research/sparse/matrices/Bydder/mri2.html.The numerical values correspond to phantom data acquired on a Sonata 1.5-T scanner (Siemens, Erlangen, Germany) using a magnetic resonance imaging (MRI) device. The object measured is a simulation of a human head made with several metallic objects. We download the corresponding tar bundle and untar it to get two ASCII files:
mri2.mtx (the main matrix in the least squares problem)
mri2_b.mtx (the right-hand side of the equation)
The first twenty lines of the file mri2.mtx read as follows:
The first sixteen lines are comments, and give us some information about the generation of the matrix.
The computer vision problem where it arose: An MRI reconstruction
Author information: Mark Bydder, UCSD
Procedures to apply to the data: Solve a least squares problem A * x - b, and posterior visualization of the result
The seventeenth line indicates the size of the matrix, 63240 rows × 147456 columns, as well as the number of non-zero entries in the data, 569160.
The rest of the file includes precisely 569160 lines, each