Data Analytics and Reporting - Notes Unit 1 and 2
Reporting: An Introduction
Welcome to the World of Data!
Understanding the Importance of Data
In today's digital age, data is the new oil. It's the raw material that fuels
innovation, decision-making, and business growth. Data Analytics is the process
of examining, cleaning, transforming, and modeling data to discover useful
information, draw conclusions, and support decision-making.
Why Python?
Python has emerged as the language of choice for data scientists and analysts
due to its simplicity, readability, and powerful libraries. It's versatile, making it
suitable for both beginners and experienced programmers.
● History of Python:
a. Conceived by Guido van Rossum in the late 1980s and first released in 1991
b. Named after the British comedy group Monty Python
c. Initially designed for scripting and automation
d. Grew in popularity due to its focus on code readability and efficiency
● Purpose of Python in Data Analytics:
a. Data manipulation and cleaning
b. Exploratory data analysis (EDA)
c. Data visualization
d. Machine learning and model building
e. Statistical analysis
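As a rough illustration of the exploratory and statistical items above, here is a minimal sketch using Pandas. The `sales` table, its column names, and its values are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "revenue": [120.0, 80.0, 100.0, 60.0],
})

# Exploratory data analysis: summary statistics for the numeric column.
summary = sales["revenue"].describe()
print(summary)

# Statistical analysis: average revenue per region.
mean_by_region = sales.groupby("region")["revenue"].mean()
print(mean_by_region)
```

`describe()` gives a quick overview (count, mean, spread), while `groupby` answers a more specific question about each region.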
Basic Python Data Types
● Numeric:
a. int: Integer values (e.g., 42, -10)
b. float: Floating-point numbers (e.g., 3.14, 2.5)
c. complex: Complex numbers (e.g., 2+3j)
● Text:
a. str: Strings (e.g., "Hello", 'World')
● Boolean:
a. bool: Boolean values (True or False)
● Sequence:
a. list: Ordered collection of items (mutable)
b. tuple: Ordered collection of items (immutable)
● Mapping:
a. dict: Collection of key-value pairs (mutable; preserves insertion order since Python 3.7)
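The types listed above can be tried out directly. The sketch below assigns one value of each type and prints its type name; all variable names and values are illustrative assumptions, not part of the original notes.

```python
# Illustrative values only; the variable names are hypothetical.
count = 42                        # int
price = 3.14                      # float
z = 2 + 3j                        # complex
greeting = "Hello"                # str
is_valid = True                   # bool
scores = [90, 85, 70]             # list (mutable)
point = (1.5, 2.5)                # tuple (immutable)
ages = {"Alice": 25, "Bob": 30}   # dict (key-value pairs)

# Print the type name of each value next to the value itself.
for value in (count, price, z, greeting, is_valid, scores, point, ages):
    print(type(value).__name__, value)

# Lists can be changed in place; tuples cannot.
scores.append(100)
try:
    point[0] = 9.9
except TypeError as err:
    tuple_error = str(err)
    print("tuple is immutable:", err)
```

The `try`/`except` at the end shows the practical difference between mutable and immutable sequences: appending to the list works, while assigning to a tuple element raises a `TypeError`.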
Installing Pandas:
1. Open your terminal or command prompt.
2. Type the following command and press Enter:
Bash
pip install pandas
Importing Pandas:
Python
import pandas as pd
Creating a DataFrame:
Python
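import pandas as pd
# Sample data (illustrative values): each key becomes a column
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)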
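Once a DataFrame exists, a few attributes help confirm it was built as intended. This is a minimal sketch with hypothetical sample data; `shape`, `columns`, `dtypes`, and `head` are standard Pandas members.

```python
import pandas as pd

# Hypothetical sample data for illustration.
data = {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
df = pd.DataFrame(data)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # data type of each column
print(df.head(2))        # first two rows
```

Checking the shape and column types early tends to catch loading mistakes before any analysis is done.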
In the next session, we will delve deeper into Pandas, exploring various
data manipulation techniques and visualization capabilities.
Remember: Practice is key to mastering Python and Pandas. Experiment with
different datasets and explore the vast functionalities offered by these libraries.
Unit-01: Introduction to Data Analytics and Reporting
Lecture 1: What is Data Analytics?
Data Analytics is the process of examining large data sets to discover trends
and patterns. It involves collecting, cleaning, transforming, and analyzing data to
extract meaningful insights. These insights can be used to make informed
decisions, identify opportunities, and solve problems.
Data Processing is the conversion of raw data into a more organized format
suitable for analysis. This involves tasks like data cleaning, transformation, and
integration.
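The three processing tasks named above (cleaning, transformation, integration) can be sketched in Pandas as follows. The `customers` and `orders` tables, their columns, and the choice of filling a missing age with the median are all assumptions made for this example, not a prescribed method.

```python
import pandas as pd

# Hypothetical raw data with a missing value and inconsistent text.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": [" alice", "BOB ", "Charlie"],
    "age": [25, None, 35],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 3],
    "amount": [50.0, 20.0, 75.0],
})

# Cleaning: tidy the text and fill the missing age with the median.
customers["name"] = customers["name"].str.strip().str.title()
customers["age"] = customers["age"].fillna(customers["age"].median())

# Transformation: total order amount per customer.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Integration: combine both sources on the shared key.
combined = customers.merge(totals, on="customer_id", how="left")
print(combined)
```

A left merge keeps every customer, so a customer with no orders appears with a missing `amount` rather than being dropped.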
Data Sources:
Python
import pandas as pd
# Create a Series
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print(df)
Python
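import pandas as pd
# Sample DataFrame (illustrative values)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
# Export to CSV; index=False leaves out the row index column
df.to_csv('output.csv', index=False)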
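To check what an export produces without creating a file, `to_csv` can be called with no path, in which case it returns the CSV text. The sketch below, with hypothetical sample data, reads that text back through `io.StringIO` as a simple round-trip check.

```python
import io
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# With no path, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)

# Read the text back to confirm the data survives the round trip.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

The same `read_csv` call accepts a file path, so the check carries over to a file such as `output.csv`.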
Python
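import pandas as pd
# Sample DataFrame with one repeated row (illustrative values)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Bob'], 'Age': [25, 30, 30]})
# Remove duplicate rows, keeping the first occurrence
df = df.drop_duplicates()
print(df)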
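Duplicate removal can be made more selective. The following sketch, using hypothetical data, flags repeated rows with `duplicated()` and uses the `subset` and `keep` parameters of `drop_duplicates` to decide which rows count as duplicates and which copy is kept.

```python
import pandas as pd

# Hypothetical data: the last row repeats the second one exactly.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Bob"],
    "Age": [25, 30, 31, 30],
})

# duplicated() flags rows that repeat an earlier row.
flags = df.duplicated()
print(flags)

# Drop exact duplicates (all columns must match).
exact = df.drop_duplicates()
print(exact)

# Drop rows that repeat a Name, keeping the last occurrence.
by_name = df.drop_duplicates(subset="Name", keep="last")
print(by_name)
```

Inspecting the flags before dropping anything is a cautious habit, since a row that looks like a duplicate may be a legitimate repeat record.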
Python
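import pandas as pd
# Sample DataFrame (illustrative values)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
# Sort by age, oldest first
df = df.sort_values('Age', ascending=False)
print(df)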
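Sorting often needs a tie-breaker. As a small sketch with hypothetical data, `sort_values` accepts a list of columns and a matching list of directions, and `reset_index(drop=True)` renumbers the rows afterwards.

```python
import pandas as pd

# Hypothetical sample data with tied ages.
df = pd.DataFrame({
    "Name": ["Charlie", "Alice", "Bob", "Dana"],
    "Age": [35, 25, 35, 25],
})

# Sort by Age (descending), then by Name (ascending) to break ties.
ordered = df.sort_values(["Age", "Name"], ascending=[False, True])

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting.
ordered = ordered.reset_index(drop=True)
print(ordered)
```

Without the reset, the sorted rows keep their original index labels, which can be confusing when selecting rows by position later.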