Movies Analysis


PANDAS

Pandas is a Python library for data analysis. Started by
Wes McKinney in 2008 out of a need for a powerful and
flexible quantitative analysis tool, pandas has grown
into one of the most popular Python libraries. It has
an extremely active community of contributors.
Pandas is built on top of two core Python libraries—
matplotlib for data visualization and NumPy for
mathematical operations. Pandas acts as a wrapper
over these libraries, allowing you to access many of
matplotlib's and NumPy's methods with less code. For
instance, pandas' .plot() combines multiple matplotlib
methods into a single method, enabling you to plot a
chart in a few lines.
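
As a minimal sketch of the point about .plot(), the snippet below draws a line chart with one pandas call. The column name, weekday labels, and numbers are made-up illustration data, and the Agg backend is selected only as an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed

import pandas as pd

# Hypothetical data, invented for illustration only.
visits = pd.DataFrame(
    {"home_page_visits": [784, 793, 253, 134, 501]},
    index=["Mon", "Tue", "Wed", "Thu", "Fri"],
)

# One pandas call sets up the matplotlib figure, axes, line, and legend.
ax = visits.plot(title="Home page visits")

# The returned object is a regular matplotlib Axes, so it can be refined further.
ax.set_ylabel("visits")
```

Because .plot() returns a matplotlib Axes, anything matplotlib offers stays available after the pandas call.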
Before pandas, most analysts used Python for data
munging and preparation, and then switched to a more
domain specific language like R for the rest of their
workflow. Pandas introduced two new types of objects
for storing data that make analytical tasks easier and
eliminate the need to switch tools: Series, which have
a list-like structure, and DataFrames, which have a
tabular structure.

pandas tutorials
Here are some analysis-focused pandas tutorials that
aren't riddled with technical jargon.
 Pandas cookbook (Julia Evans) - This tutorial uses
real-world data and presents a problem to solve or
question to answer in every example. Great for
putting pandas' capabilities in context of the actual
analytical workflow.
 Practical Data Analysis with Python (Anita
Raichand) - Provides code examples for four specific
analytical tasks: data munging, aggregation,
visualization, and time series analysis.
 [VIDEO SERIES] Easier data analysis in Python with
pandas (Data School) - A series of video tutorials for
pandas newbies who know some Python. Each video
answers a student-posed question using real-world
data.
 An Introduction to Pandas (Michael Hansen) - This
tutorial covers the basics of pandas with a complete
analysis of weather data, from reading in data to
creating charts.
 Modern Pandas (Tom Augspurger) - An
intermediate tutorial for experienced Python users
looking to stay sharp on pandas.

pandas data structures


SERIES
You can think of a series as a single column of data.
Each value in the series has a label, and these labels
are collectively referred to as an index. This is
demonstrated in the output below: 0-4 is the index,
and the column of numbers to its right contains the
values.
0 22
1 27
2 31
3 33
4 34
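
A short sketch of how a series like the one above could be built and inspected. The variable names are illustrative, and the string labels in the second series are hypothetical.

```python
import pandas as pd

# The same five values as in the output above. No labels are given,
# so pandas assigns the default integer index 0-4.
series = pd.Series([22, 27, 31, 33, 34])

print(series.index.tolist())   # the labels: [0, 1, 2, 3, 4]
print(series.values.tolist())  # the values: [22, 27, 31, 33, 34]
print(series.loc[2])           # look up a value by its label: 31

# Labels do not have to be integers (these names are hypothetical).
named = pd.Series([22, 27, 31], index=["ann", "bob", "cho"])
print(named["bob"])            # 27
```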

DATAFRAME
While series are useful, most analysts work with the
majority of their data in DataFrames. DataFrames
store data in the familiar table format of rows and
columns, much like a spreadsheet or database.
DataFrames make a lot of analytical tasks easier,
such as finding the averages per column in a
dataset.
You can also think of DataFrames as a collection of
series—just as multiple columns combined make up
a table, multiple series make up a DataFrame.
   home_page_visits  like_messages  messages  searches
0               784            492       292       102
1               793            500       287       106
2               253            172       110        40
3               134             95        55        33
4               501            331       182       119
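
The sketch below rebuilds the table above as a DataFrame and computes the per-column averages mentioned earlier. The assignment of values to columns is read off the wrapped table, and the variable names are illustrative.

```python
import pandas as pd

# The table above, rebuilt column by column. Each column is a Series.
df = pd.DataFrame(
    {
        "home_page_visits": [784, 793, 253, 134, 501],
        "like_messages": [492, 500, 172, 95, 331],
        "messages": [292, 287, 110, 55, 182],
        "searches": [102, 106, 40, 33, 119],
    }
)

# Average of every column in a single call.
column_means = df.mean()
print(column_means)

# Selecting one column returns a Series that shares the DataFrame's index.
searches = df["searches"]
print(type(searches).__name__)  # Series
```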
Note: In Mode, the results of your SQL queries are
automatically converted into DataFrames and made
available in the list variable "datasets." To describe
or transform the results of Query 1, use datasets[0],
for the results of Query 2, use datasets[1] and so on.
For more on manipulating pandas data structures,
check out Greg Reda's three-part tutorial, which
approaches the topic from a SQL perspective.

PANDAS FEATURES
TIME SERIES ANALYSIS
 Time Series / Date functionality (Official Pandas
Documentation)
 Times series analysis with pandas (EarthPy)
 Timeseries with pandas (Jupyter)
 Complete guide to create a Time Series Forecast
(with Codes in Python) (Analytics Vidhya)


SPLIT-APPLY-COMBINE
Split-apply-combine is a common strategy used
during analysis to summarize data—you split data
into logical subgroups, apply some function to each
subgroup, and stick the results back together again.
In pandas, this is accomplished using
the groupby() function and whatever functions you
want to apply to the subgroups.
 Group By: split-apply-combine (Official Pandas
Documentation)
 Summarizing Data in Python with Pandas (Brian
Connelly)
 Using Pandas: Split-Apply-Combine (Duke
University)
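
The split-apply-combine strategy described in this section can be sketched as follows. The data, column names, and variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
events = pd.DataFrame(
    {
        "device": ["phone", "desktop", "phone", "tablet", "desktop", "phone"],
        "searches": [10, 40, 20, 5, 60, 30],
    }
)

# Split the rows by device, apply mean() to each group, and combine
# the per-group results into one Series indexed by device.
mean_searches = events.groupby("device")["searches"].mean()
print(mean_searches)

# Several functions can be applied to each group at once with agg().
summary = events.groupby("device")["searches"].agg(["count", "sum", "mean"])
print(summary)
```

The result of groupby() alone is a lazy grouping object. Nothing is computed until a function such as mean() or agg() is applied to it.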
DATA VISUALIZATION
 Visualization (Official Pandas Documentation)
 Simple Graphing with IPython and Pandas (Chris
Moffitt)
 Beautiful Plots With Pandas and Matplotlib (The
Data Science Lab)


PIVOT TABLES
 Reshaping and Pivot Tables (Official Pandas
Documentation)
 Pandas Pivot Table Explained (Chris Moffitt)
 Pivot Tables in Python (O'Reilly)
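
For orientation before following the links above, here is a minimal sketch of a pivot table built with pandas.pivot_table. The sales data and all column names are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per sale.
sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south"],
        "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
        "revenue": [100, 150, 80, 20, 120],
    }
)

# One row per region, one column per quarter, revenue summed in each cell.
table = pd.pivot_table(
    sales, index="region", columns="quarter", values="revenue", aggfunc="sum"
)
print(table)
```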


WORKING WITH MISSING DATA
 Working with missing data (Official Pandas
Documentation)
 Handling missing data (O'Reilly)
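
As a small, hedged illustration of the topic the links above cover, the sketch below counts, drops, and fills missing values. The readings are made-up numbers, and filling with the mean is only one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two missing values (NaN).
readings = pd.Series([1.5, np.nan, 3.0, np.nan, 4.5])

missing_count = int(readings.isna().sum())  # how many values are missing
dropped = readings.dropna()                 # discard the missing entries
filled = readings.fillna(readings.mean())   # mean() skips NaN by default

print(missing_count)
print(dropped.tolist())
print(filled.tolist())
```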

Python
Python is a high-level, general-purpose
programming language. Its design philosophy
emphasizes code readability with the use
of significant indentation.[31]
Python is dynamically typed and garbage-collected.
It supports multiple programming paradigms,
including structured (particularly procedural), object-
oriented and functional programming. It is often
described as a "batteries included" language due to
its comprehensive standard library.
Guido van Rossum began working on Python in the
late 1980s as a successor to the ABC programming
language and first released it in 1991 as
Python 0.9.0. Python 2.0 was released in 2000.
Python 3.0, released in 2008, was a major revision
not completely backward-compatible with earlier
versions. Python 2.7.18, released in 2020, was the
last release of Python 2.
Python consistently ranks as one of the most popular
programming languages, and has gained widespread
use in the machine learning community.

History


Python was conceived in the late 1980s by Guido
van Rossum at Centrum Wiskunde &
Informatica (CWI) in the Netherlands as a successor
to the ABC programming language (itself inspired by
SETL), one capable of exception handling and
interfacing with the Amoeba operating system. Its
implementation began in December 1989. Van
Rossum shouldered sole responsibility for the
project, as the lead developer, until 12 July 2018,
when he announced his "permanent vacation" from
his responsibilities as Python's "benevolent dictator
for life", a title the Python community bestowed
upon him to reflect his long-term commitment as the
project's chief decision-maker. In January 2019,
active Python core developers elected a five-
member Steering Council to lead the project.
Python 2.0 was released on 16 October 2000, with
many major new features such as list
comprehensions, cycle-detecting garbage
collection (supplementing reference counting),
and Unicode support. Python 3.0 was released on 3
December 2008, and many of its major
features were backported to Python 2.6.x and 2.7.x.
Releases of Python 3 include the 2to3 utility, which
automates the translation of Python 2 code to
Python 3.
Python 2.7's end-of-life was initially set for 2015,
then postponed to 2020 out of concern that a large
body of existing code could not easily be forward-ported
to Python 3. No further security patches or
other improvements will be released for it. As of 2023,
only Python 3.8 and later were supported; 3.7.17, the
final 3.7.x release, shipped that year with security
fixes.

Design philosophy and features


Python is a multi-paradigm programming
language. Object-oriented
programming and structured programming are fully
supported, and many of its features support
functional programming and aspect-oriented
programming (including metaprogramming and
metaobjects). Many other paradigms are supported via
extensions, including design by contract and logic
programming.
Python uses dynamic typing and a combination
of reference counting and a cycle-detecting garbage
collector for memory management. It uses
dynamic name resolution (late binding), which binds
method and variable names during program
execution.
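
Dynamic typing and late binding can be seen in a few lines. This is an illustrative sketch, and the function names are hypothetical.

```python
# Dynamic typing: a name is not tied to a type, only the object it refers to is.
value = 42
print(type(value).__name__)  # int
value = "forty-two"
print(type(value).__name__)  # str


def describe():
    # Late binding: `helper` is looked up when describe() runs,
    # not when describe() is defined.
    return helper()


def helper():
    return "first"


first_result = describe()


def helper():  # rebinding the name changes what describe() calls
    return "second"


second_result = describe()
print(first_result, second_result)  # first second
```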
Its design offers some support for functional
programming in the Lisp tradition. It
has filter, map, and reduce functions, as well as list
comprehensions, dictionaries, sets,
and generator expressions. The standard library has
two modules (itertools and functools) that
implement functional tools borrowed
from Haskell and Standard ML.
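
The functional tools named above can be sketched as follows. In Python 3, filter and map are built-ins, while reduce is imported from functools. The sample list is arbitrary.

```python
from functools import reduce
from itertools import accumulate, count, islice

numbers = [1, 2, 3, 4, 5, 6]

# filter and map return lazy iterators, so list() materializes them.
evens = list(filter(lambda n: n % 2 == 0, numbers))   # [2, 4, 6]
squares = list(map(lambda n: n * n, numbers))         # [1, 4, 9, 16, 25, 36]

# reduce folds the list into a single value.
total = reduce(lambda acc, n: acc + n, numbers, 0)    # 21

# A list comprehension and a generator expression express the same ideas.
even_squares = [n * n for n in numbers if n % 2 == 0]  # [4, 16, 36]
sum_of_squares = sum(n * n for n in numbers)           # 91

# itertools: running totals, and a slice of an infinite counter.
running_totals = list(accumulate(numbers))             # [1, 3, 6, 10, 15, 21]
first_odds = list(islice(count(1, 2), 4))              # [1, 3, 5, 7]
```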
Its core philosophy is summarized in the
document The Zen of Python (PEP 20), which
includes aphorisms such as:
 Beautiful is better than ugly.
 Explicit is better than implicit.
 Simple is better than complex.
 Complex is better than complicated.
 Readability counts.

Rather than building all of its functionality into its
core, Python was designed to be
highly extensible via modules. This compact
modularity has made it particularly popular as a
means of adding programmable interfaces to
existing applications. Van Rossum's vision of a small
core language with a large standard library and
easily extensible interpreter stemmed from his
frustrations with ABC, which espoused the opposite
approach.

What Is Matplotlib in Python?

Matplotlib is a cross-platform, data visualization and
graphical plotting library (histograms, scatter plots,
bar charts, etc) for Python and its numerical
extension NumPy. As such, it offers a viable open
source alternative to MATLAB. Developers can also
use matplotlib’s APIs (Application Programming
Interfaces) to embed plots in GUI applications.

A Python matplotlib script is structured so that a few
lines of code are all that is required in most
instances to generate a visual data plot. The
matplotlib scripting layer overlays two APIs:
 The pyplot API is a hierarchy of Python code
objects topped by matplotlib.pyplot
 An OO (Object-Oriented) API collection of objects
that can be assembled with greater flexibility than
pyplot. This API provides direct access to Matplotlib’s
backend layers.
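
To make the difference between the two APIs concrete, the sketch below draws the same line twice, once in each style. The data is arbitrary, and the Agg backend is an assumption made so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# pyplot style: every call acts on an implicit "current" figure and axes.
plt.figure()
plt.plot(x, y)
plt.title("pyplot style")
pyplot_axes = plt.gca()  # fetch the implicit axes so it can be inspected

# Object-oriented style: the figure and axes are explicit objects.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("object-oriented style")

plt.close("all")  # release both figures
```

The object-oriented form takes slightly more typing but avoids any ambiguity about which plot a call affects, which matters once a script holds several figures.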

MATPLOTLIB AND PYPLOT IN PYTHON


The pyplot API has a convenient MATLAB-style
stateful interface. In fact, the matplotlib Python
library was originally written as an open source
alternative for MATLAB. The OO API and its interface
are more customizable and powerful than pyplot, but
are considered more difficult to use. As a result, the
pyplot interface is more commonly used, and is
referred to by default in this article.
Understanding matplotlib’s pyplot API is key to
understanding how to work with plots:
 matplotlib.pyplot.figure: Figure is the top-level
container. It includes everything visualized in a plot
including one or more Axes.
 matplotlib.pyplot.axes: Axes contain most of
the elements in a plot (Axis, Tick, Line2D,
Text, etc.) and set the coordinates. An Axes is the area in
which data is plotted. Axes include the X-Axis, Y-Axis,
and possibly a Z-Axis, as well.
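
The containment hierarchy described above can be inspected directly. This is a minimal sketch with arbitrary data and labels, and the Agg backend is an assumption made so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed
import matplotlib.pyplot as plt

fig = plt.figure()  # top-level container
ax = plt.axes()     # plotting area, added to the current figure

(line,) = ax.plot([0, 1, 2], [10, 20, 15], label="demo")
ax.set_xlabel("x")
ax.set_ylabel("y")

print(type(fig).__name__)       # Figure
print(fig.axes == [ax])         # True: the Figure holds the Axes
print(type(line).__name__)      # Line2D
print(line in ax.lines)         # True: the Axes holds the Line2D
print(type(ax.xaxis).__name__)  # XAxis
print(ax.get_xlabel())          # x

plt.close(fig)
```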

INSTALLING MATPLOTLIB
Matplotlib and its dependencies can be downloaded
as a binary (pre-compiled) package from the Python
Package Index (PyPI), and installed with the
following command:
python -m pip install matplotlib
Matplotlib is also available as uncompiled source
files from GitHub. Compiling from source will require
your local system to have the appropriate compiler
for your OS, all dependencies, setup scripts,
configuration files, and patches available. This can
result in a fairly complex installation. Alternatively,
consider using the ActiveState Platform to
automatically build matplotlib from source and
package it for your OS.
