Session5 - Analytics For Programming II - Siryani - 091924
Session 5
Thursday September 19th, 2024
Source: www.gartner.com
Predictive Analytics
What is predictive analytics?
❖ Predictive analytics is an applied field that uses a variety of quantitative methods that make use of data in order to make predictions
❖ An applied field: finance, telecommunications, advertising, insurance, healthcare, education, entertainment, banking, and so on
❖ Uses a variety of quantitative methods: Bayesian inference, machine learning, deep learning, empirical findings, visualization
❖ That makes use of data: data is the raw material out of which predictive analytics models are built
❖ Example: we can build a predictive model that is able to "predict" whether a patient has disease X using the patient's clinical data
▪ when we gather the patient's data, disease X is already either present or absent
▪ we are not "predicting" whether the patient will have disease X in the future
▪ the model is giving an assessment (an educated guess) about the unknown event "the patient has disease X"
▪ sometimes, of course, the prediction will be about the future, but keep in mind that this won't necessarily be the case
Regular Expressions
❖ A regular expression (shortened as regex) is a sequence of characters that specifies a search pattern in text
❖ Such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings
❖ Regular expression techniques are developed in theoretical computer science and formal language theory
❖ The American mathematician Stephen Cole Kleene formalized the concept of a regular language
❖ Most general-purpose programming languages support regex capabilities including Python, Java, C, C++, etc.
❖ Regular expressions are used in search engines and in the search-and-replace dialogs of word processors and text editors
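A minimal sketch of the "find" and "find and replace" operations described above, using Python's standard `re` module. The sample sentence and the date pattern are made up for illustration.

```python
import re

# Made-up sample text for illustration.
text = "Order A17 shipped 2024-09-19; order B42 shipped 2024-09-26."

# Search pattern: four digits, a dash, two digits, a dash, two digits.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

# "Find": collect every substring that matches the pattern.
dates = date_pattern.findall(text)
print(dates)  # ['2024-09-19', '2024-09-26']

# "Find and replace": substitute every match with a placeholder.
redacted = date_pattern.sub("<date>", text)
print(redacted)
```

Compiling the pattern once with `re.compile` is optional here; it mainly helps readability when the same pattern is reused for several operations.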
Regular Expressions
❖ Accessing and processing large amounts of data is not the main problem
❖ Filtering the data is
❖ Regular expressions provide one type of filter that can be used to extract relevant data from large bodies of data
▪ For example, regular expressions make it possible to query XML text and find all movies that were filmed in Budapest in 2016
❖ Many small tasks are also easier with regular expressions. Regular expressions are everywhere, and these skills will come in handy in your IT
engineering career.
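The filtering idea above can be sketched as follows. The XML snippet, its attribute names, and the film titles are hypothetical stand-ins, since the XML from the original slide is not reproduced here.

```python
import re

# Hypothetical XML text; element and attribute names are assumptions.
xml_text = """
<movies>
  <movie title="Sample Film A" location="Budapest" year="2016"/>
  <movie title="Sample Film B" location="Prague" year="2016"/>
  <movie title="Sample Film C" location="Budapest" year="2015"/>
  <movie title="Sample Film D" location="Budapest" year="2016"/>
</movies>
"""

# Capture the title of every movie filmed in Budapest in 2016.
pattern = re.compile(
    r'<movie title="([^"]+)" location="Budapest" year="2016"/>'
)
titles = pattern.findall(xml_text)
print(titles)  # ['Sample Film A', 'Sample Film D']
```

This pattern depends on the attributes appearing in exactly this order and spacing. For XML that varies in layout, a real XML parser is usually the more robust filter, with regular expressions applied to the extracted values.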
Linear Regression
❖ Regression analysis can be used to develop an equation showing how the variables are related:
▪ The variable being predicted is called the dependent variable and is denoted by “y”
▪ The variables being used to predict the value of the dependent variable are called the independent variables
❖ Simple linear regression involves one independent variable and one dependent variable
❖ Regression analysis involving two or more independent variables is called multiple regression
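As a small illustration of simple linear regression, the sketch below fits an equation of the form y = b0 + b1 * x with one independent variable. It assumes ordinary least squares as the fitting method, which the slide does not state, and the data points are made up so that the true relationship is y = 1 + 2x.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)  # 1.0 2.0

# Use the fitted equation to predict y for a new value of x.
predicted = b0 + b1 * 6
```

Multiple regression follows the same idea with several independent variables, but is normally fitted with a linear algebra or statistics library rather than by hand.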
Web Scraping, Basic HTML
Web Scraping
❖ Web scraping (web harvesting or web data extraction) is a technique for extracting information
from websites
❖ Web scraping scripts typically simulate a person viewing a website with a browser. With these
scripts you can connect to a web server and request a page, exactly as a browser would
❖ The web server sends back the page, which you can then manipulate or extract specific
information from.
Document Object Model (DOM)
❖ Document Object Model (DOM) is an application programming interface (API) for valid HTML and
well-formed XML documents
❖ It defines the logical structure of documents and the way a document is accessed and
manipulated
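To make "accessed and manipulated" concrete, here is a minimal sketch using `xml.dom.minidom`, the small DOM implementation in Python's standard library. The XML document and its element names are hypothetical.

```python
from xml.dom.minidom import parseString

# Hypothetical well-formed XML document.
doc = parseString(
    "<library>"
    "<book id='b1'><title>First Title</title></book>"
    "<book id='b2'><title>Second Title</title></book>"
    "</library>"
)

# Access: walk the logical tree from the root element.
root = doc.documentElement
books = root.getElementsByTagName("book")
titles = [b.getElementsByTagName("title")[0].firstChild.data for b in books]
ids = [b.getAttribute("id") for b in books]
print(root.tagName, titles, ids)

# Manipulate: create a new element and attach it to the tree.
new_book = doc.createElement("book")
new_book.setAttribute("id", "b3")
root.appendChild(new_book)
book_count = len(root.getElementsByTagName("book"))
```

Browsers expose the same kind of tree for HTML pages, which is what scraping tools navigate when they select elements by tag, class, or attribute.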
• Requests is an elegant and simple HTTP library for Python, built for human beings
• Getting Started
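import requests

r = requests.get('https://quotes.toscrape.com/')  # send an HTTP GET request
r.status_code  # HTTP status code, e.g. 200 on success
r.encoding     # encoding Requests uses to decode the response body
r.text         # response body decoded to a string (the page's HTML)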
Python Requests Module
• Extract quotes and author names from the website and save them in a CSV file named “quotes.csv”
• https://quotes.toscrape.com/
• e.g.:
• Albert Einstein, Try not to become a man of success. Rather become a man of value
• André Gide, It is better to be hated for what you are than to be loved for what you are not
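One possible way to approach this exercise is sketched below; it is not the only solution. It assumes that each quote on the page sits in a `<span class="text">` element and each author name in a `<small class="author">` element, which should be checked against the live page. The helper names (`QuoteParser`, `parse_quotes`, `save_quotes`, `fetch_html`) are hypothetical, and parsing uses only the standard library's `html.parser`.

```python
import csv
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Collect (author, quote) pairs, assuming span.text / small.author markup."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (author, quote) pairs
        self._field = None   # "text" or "author" while inside such a tag
        self._quote = ""
        self._author = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "text" in classes:
            self._field, self._quote = "text", ""
        elif tag == "small" and "author" in classes:
            self._field, self._author = "author", ""

    def handle_data(self, data):
        if self._field == "text":
            self._quote += data
        elif self._field == "author":
            self._author += data

    def handle_endtag(self, tag):
        if self._field == "text" and tag == "span":
            self._field = None
        elif self._field == "author" and tag == "small":
            # The author closes the pair; strip surrounding quotation marks.
            quote = self._quote.strip('“”" \n')
            self.rows.append((self._author.strip(), quote))
            self._field = None


def parse_quotes(html):
    """Return a list of (author, quote) tuples found in the HTML string."""
    parser = QuoteParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def save_quotes(rows, path="quotes.csv"):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["author", "quote"])
        writer.writerows(rows)


def fetch_html(url="https://quotes.toscrape.com/"):
    """Download the page with Requests (third-party, imported on demand)."""
    import requests
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    return r.text
```

Calling `save_quotes(parse_quotes(fetch_html()))` would download the first page and write "quotes.csv". Using `csv.writer` rather than joining strings by hand matters here because many quotes contain commas.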
Beautiful Soup
• Requests: sequential / synchronous; suitable for small projects and APIs
• Scrapy: parallel / asynchronous; suitable for small and large projects
Scrapy
• Extract the tags along with the Quotes and Author name
Scrapy
• Extract the year from the first page as well and pass it to the next
callback
Selenium
joesiryani@gwu.edu
josephsiryani