jenisha INTERNSHIP REPORT-2.docx (1)
Python is a versatile programming language that is widely used for data science,
machine learning, software development, web development and more. Here are some things
to know about Python for data science.
Python is widely used for data analysis and visualization through libraries such as
pandas, NumPy, Matplotlib, and seaborn. It is also a popular choice for task automation
and scripting.
It is favoured in data science due to its readability, simplicity, and versatility. Its
extensive libraries and frameworks streamline complex tasks, allowing data scientists to
focus on problem-solving rather than coding intricacies.
It is also a popular choice for data science because its mathematical modules and
numerical libraries make it easier to analyze data and solve mathematical problems.
Data science combines statistical analysis, programming skills, and domain expertise
to extract insights and knowledge from data. It has become essential to various industries,
from healthcare to finance, enabling organizations to make data-driven decisions. This report
provides a detailed introduction to data science with Python, covering key concepts and
practical examples.
CHAPTER 1
Quick Analysis takes a range of data and helps you pick the perfect chart with just a
few commands.
Conditional formatting highlights important data points, trends, or outliers, making it
easier to identify key insights in your data.
Data validation in Excel refers to setting specific criteria for accepting data in a cell or
range of cells.
1.6 Formulas and Functions
One of Excel's standout features is its ability to perform calculations and operations
on data using formulas and functions. Users can create complex calculations by combining
mathematical operators, cell references, and built-in functions.
⮚ IF function: A versatile tool that allows users to build logic and decisions into their
spreadsheets.
⮚ Sum of data: =SUMIF adds the values in a range that meet a single criterion. It is a
formula that often comes up in entry-level data analysis interviews.
Formula: =SUMIF(RANGE, CRITERIA, [SUM_RANGE])
⮚ Average of data: =AVERAGEIF works like =SUMIF, and the two are often used
together. It returns the average of the cells that meet a criterion.
Formula: =AVERAGEIF(RANGE, CRITERIA, [AVERAGE_RANGE])
⮚ Connecting data sets: the =VLOOKUP function lets users combine data from two
different sources within the spreadsheet. It looks up a value in the first column of a
table and returns the value from another column of the same row.
Formula: =VLOOKUP(LOOKUP_VALUE, TABLE_ARRAY, COL_INDEX_NUM, [RANGE_LOOKUP])
⮚ DAYS function: Returns the number of days between two dates.
⮚ NETWORKDAYS function: Returns the number of whole workdays between two
dates.
⮚ TODAY function: Returns the serial number of today's date.
⮚ NOW function: Returns the serial number of the current date and time.
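For readers moving from Excel to Python, the conditional and lookup formulas above can be approximated in plain Python. This is only a sketch: the helper names `sumif`, `averageif`, and `vlookup` and all sample data are hypothetical, and `vlookup` covers only the exact-match case, returning None where Excel would show #N/A.

```python
from datetime import date

# Hypothetical sample data: one entry per sale (not taken from the report).
regions = ["North", "South", "North", "East"]
amounts = [100, 250, 300, 150]

def sumif(criteria_range, criterion, sum_range):
    """Sum the values in sum_range whose matching cell equals criterion."""
    return sum(v for c, v in zip(criteria_range, sum_range) if c == criterion)

def averageif(criteria_range, criterion, average_range):
    """Average the values in average_range whose matching cell equals criterion."""
    matches = [v for c, v in zip(criteria_range, average_range) if c == criterion]
    return sum(matches) / len(matches)

def vlookup(lookup_value, table, col_index):
    """Exact-match lookup: return column col_index (1-based) of the first row
    whose first column equals lookup_value, or None if no row matches."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None

north_total = sumif(regions, "North", amounts)        # 100 + 300
north_average = averageif(regions, "North", amounts)  # (100 + 300) / 2

price_table = [("P01", "Pen", 10), ("P02", "Book", 45)]
book_price = vlookup("P02", price_table, 3)

# Like DAYS(end_date, start_date): the number of days between two dates.
days_between = (date(2024, 3, 1) - date(2024, 1, 1)).days

print(north_total, north_average, book_price, days_between)
```

The criteria here are simple equality tests; Excel criteria such as ">100" would need a comparison function in place of `==`.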
Clean text data by removing duplicates, trimming spaces, and standardizing formats.
Pivot Tables are powerful tools for summarizing and analyzing large datasets. Pivot
Charts work hand-in-hand with Pivot Tables to provide visual representations of the
summarized data.
1.9 Dashboard
A dashboard in Excel is a visual representation of key metrics that helps you analyze
data and make quick decisions.
CHAPTER 2
PYTHON ESSENTIALS
Variable: A variable is a name that refers to a value stored in memory.
Keyword: Python keywords are reserved words that define the structure and syntax of the
Python language. They are case sensitive and cannot be used as variable names, function
names, or other identifiers. Examples: and, as, break, except, False, or, not, if, True, etc.
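The rules for keywords can be checked with the standard `keyword` module. The sketch below shows that keywords are case sensitive and that assigning to one is rejected; the one-line program passed to `compile` is only an illustration.

```python
import keyword

# keyword.kwlist holds every reserved word for the running Python version.
print(keyword.kwlist)

# Keywords are case sensitive: "True" is reserved, "true" is an ordinary name.
is_true_reserved = keyword.iskeyword("True")
is_lower_true_reserved = keyword.iskeyword("true")

# Using a keyword as a variable name is a syntax error.
try:
    compile("if = 5", "<example>", "exec")
    assignment_allowed = True
except SyntaxError:
    assignment_allowed = False

print(is_true_reserved, is_lower_true_reserved, assignment_allowed)
```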
2.3 Operators
Operators are special symbols in Python that carry out arithmetical or logical
computation. The value that the operator operates on is called the operand.
2.3.1 Types of Operators
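A short sketch of the most common operator types (arithmetic, comparison, and logical) is given here. The values of `a` and `b` are arbitrary, and Python also has assignment, bitwise, membership, and identity operators that are not shown.

```python
# In the expression a + b, "+" is the operator and a and b are the operands.
a, b = 7, 3

arithmetic = {
    "a + b": a + b,    # addition
    "a - b": a - b,    # subtraction
    "a * b": a * b,    # multiplication
    "a / b": a / b,    # true division (always returns a float)
    "a // b": a // b,  # floor division
    "a % b": a % b,    # remainder
    "a ** b": a ** b,  # exponentiation
}

# Comparison operators return booleans; logical operators combine them.
is_greater = a > b
both_positive = (a > 0) and (b > 0)
either_negative = (a < 0) or (b < 0)
not_equal = not (a == b)

for expression, value in arithmetic.items():
    print(expression, "=", value)
```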
2.5 Functions
A function is a block of code that runs only when it is called. The types are:
o Built-in functions: These functions are pre-defined in Python and can be used
without further declaration. Examples: enumerate(), eval(), exec(), and filter().
o User-defined functions: These are created by the user to perform specific tasks.
o Lambda functions: These are small, unnamed functions defined using the
lambda keyword. They are typically used for short and simple operations.
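The three kinds of functions can be seen side by side in the sketch below. The names and scores are made-up sample data, and `mean` is a hypothetical user-defined function written only for this illustration.

```python
# Built-in functions: available without any import or declaration.
names = ["Asha", "Ravi", "Meena"]  # hypothetical sample data
numbered = list(enumerate(names, start=1))

# User-defined function: created with def and run only when it is called.
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

average_score = mean([70, 80, 90])

# Lambda functions: small unnamed functions, usually passed to another function.
long_names = list(filter(lambda name: len(name) > 4, names))
by_last_letter = sorted(names, key=lambda name: name[-1])

print(numbered, average_score, long_names, by_last_letter)
```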
Data Structures are a way of organizing data so that it can be accessed more
efficiently depending upon the situation.
1. String
2. List
3. Tuple
4. Set
5. Dictionary
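Each of the five structures listed above is created with its own literal syntax. The following is a minimal sketch with made-up values, showing one typical operation per structure.

```python
text = "data science"                     # string: immutable sequence of characters
scores = [88, 92, 75]                     # list: ordered and mutable
point = (12.5, 7.0)                       # tuple: ordered and immutable
tags = {"python", "excel", "python"}      # set: unordered, duplicates are dropped
student = {"name": "Asha", "score": 88}   # dictionary: key-value pairs

title = text.title()       # string methods return a new string
scores.append(60)          # lists can grow in place
x, y = point               # tuples unpack into separate names
has_python = "python" in tags
student["score"] += 5      # dictionary values are updated by key

print(title, scores, x, y, len(tags), has_python, student)
```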
CHAPTER 3
3.1 NumPy
NumPy stands for Numerical Python. It is a Python library used for working with
arrays. NumPy was created in 2005 by Travis Oliphant. It is an open source project and you
can use it freely. It provides a high-performance multidimensional array object and tools for
working with these arrays.
● np.zeros: Creates a new array of the given shape filled with zeros.
● np.ones: Creates a new array of the given shape filled with ones.
● np.full: Creates a new array filled with a single value. Along with the shape and data
type, it takes an additional argument called ‘fill_value’.
● np.eye: Returns a 2-D array with 1’s on the main diagonal and 0’s everywhere else (an
identity matrix when the array is square).
● np.random.random: Returns random floats drawn from a continuous uniform
distribution over the half-open interval [0.0, 1.0).
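The array-creation functions above can be tried as follows. The shapes and the fill value are arbitrary choices made for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))                  # 2x3 array of 0.0 (float by default)
ones = np.ones((2, 3), dtype=int)         # 2x3 array of integer 1
sevens = np.full((2, 2), fill_value=7)    # 2x2 array where every element is 7
identity = np.eye(3)                      # 3x3 array with ones on the main diagonal
samples = np.random.random((2, 2))        # uniform random floats in [0.0, 1.0)

for name, array in [("zeros", zeros), ("ones", ones), ("full", sevens),
                    ("eye", identity), ("random", samples)]:
    print(name, array.shape, array.dtype)
```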
3.2 Pandas
Pandas is a powerful, open-source Python library for working with data sets. It provides
data structures and functions for analyzing, cleaning, exploring, and manipulating data
efficiently.
Data preprocessing is the first step in machine learning: raw data obtained from various
sources is transformed into a usable format so that accurate machine learning models can
be built on it.
A Pandas DataFrame can be created by loading a dataset from existing storage (such as
a SQL database, a CSV file, or an Excel file). It can also be created from lists,
dictionaries, a list of dictionaries, etc.
Pandas provides two main data structures for manipulating data: the Series
(one-dimensional) and the DataFrame (two-dimensional).
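The sketch below builds a Series and a DataFrame, the two core pandas structures, using the creation routes described above. All names and values are made-up sample data, and the CSV file name in the last comment is a placeholder, not a file from this report.

```python
import pandas as pd

# Series: a one-dimensional labelled array.
scores = pd.Series([88, 92, 75], index=["Asha", "Ravi", "Meena"], name="score")

# DataFrame from a dictionary of columns.
df_from_dict = pd.DataFrame({"name": ["Asha", "Ravi"], "score": [88, 92]})

# DataFrame from a list of dictionaries (one dictionary per row).
df_from_rows = pd.DataFrame([
    {"name": "Asha", "score": 88},
    {"name": "Ravi", "score": 92},
])

# Each DataFrame column is itself a Series.
score_column = df_from_dict["score"]
mean_score = score_column.mean()

print(scores)
print(df_from_dict)
print(mean_score)

# Loading from storage follows the same pattern (placeholder file name):
# df_from_csv = pd.read_csv("students.csv")
```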
CHAPTER 4
Python is a popular programming language for data science because of its flexibility,
ease of use, and extensive libraries. It is used in a variety of real-world applications, including:
Python is used to clean, prepare, and analyze data, and to identify patterns,
relationships, and trends.
Python can be used to build and fine-tune models and to support data-driven decisions.
Python is used for web development, including social media monitoring tools and
chatbots.
4.5 In Transport
Data science is also applied in the transport field, for example in driverless cars.
Driverless cars can help reduce the number of accidents.
4.7 In E-Commerce
E-commerce websites like Amazon, Flipkart, etc. use data science to provide a better
user experience with personalized recommendations.
4.8 In Finance
Financial institutions use data science and analytics tools to forecast future outcomes.
These tools allow companies to predict customer lifetime value and stock market movements.
4.10 Gaming
Video and computer games are now being created with the help of data science and
that has taken the gaming experience to the next level.
Identifying patterns is one of the most commonly known applications of data science.
Netflix and Amazon give movie and product recommendations based on what you like
to watch, purchase, or browse on their platforms.
4.13 Logistics
Banking and financial institutions use data science and related algorithms to detect
fraudulent transactions.
Route planning is another application of data science. With data science, the airline
industry can predict flight delays more easily, which supports its growth. It also helps
airlines decide whether to fly directly to the destination or to make a stop along the way,
for example on a flight from Delhi to the United States of America.
CHAPTER 5
DATA VISUALIZATION
Data visualization provides an organized pictorial representation of data, which makes
the data easier to understand, observe, and analyze. Python offers several plotting libraries,
including Matplotlib and Seaborn, each with different features for creating informative,
customized, and appealing plots that present data simply and effectively. Here’s an overview
of some key libraries and techniques for data visualization in Python:
5.1.1 Matplotlib
5.1.2 Seaborn
Pandas offers built-in plotting capabilities that work seamlessly with DataFrames.
5.1.4 Plotly
A library for creating interactive plots that can be easily shared or embedded in web pages.
5.1.5 Bokeh
● Another library for creating interactive visualizations, particularly suited for web
applications.
● It allows for complex visualizations with interactivity.
5.2 Visualization Techniques
Use scatter plots, box plots, and histograms to explore distributions and relationships in
the data.
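The three plot types named above can be drawn with Matplotlib as in this sketch. The data is synthetic and generated only for illustration, the axis labels and output file name are assumptions, and the non-interactive "Agg" backend is selected so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the figure is saved, not shown
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only: marks loosely related to hours studied.
rng = np.random.default_rng(seed=0)
hours = rng.uniform(1, 10, size=50)
marks = 35 + 6 * hours + rng.normal(0, 5, size=50)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Scatter plot: relationship between two variables.
axes[0].scatter(hours, marks)
axes[0].set(title="Scatter plot", xlabel="Hours studied", ylabel="Marks")

# Box plot: median, quartiles, and outliers of one variable.
axes[1].boxplot(marks)
axes[1].set(title="Box plot", ylabel="Marks")

# Histogram: distribution of one variable across 10 bins.
axes[2].hist(marks, bins=10)
axes[2].set(title="Histogram", xlabel="Marks", ylabel="Count")

fig.tight_layout()
fig.savefig("exploration.png")  # hypothetical output file name
```

In an interactive session or notebook, the backend line can be dropped and `plt.show()` used in place of `savefig`.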
● Choose the Right Chart Type: Use the most suitable chart type for your data and
audience.
● Keep it Simple: Avoid clutter and focus on the key insights you want to
communicate.
● Use Color Wisely: Be mindful of color choices for clarity and accessibility.
● Label Clearly: Ensure all axes and legends are clearly labeled for better
understanding.
Mastering data visualization in Python enhances your ability to analyze and present
data effectively. Experiment with different libraries and techniques to find the best fit for your
projects.
CHAPTER 6
1. Data Collection
Gathering data from various sources such as databases, APIs, and web scraping.
2. Data Cleaning
Preparing the data by handling missing values, duplicates, and inconsistencies to
ensure high-quality input for analysis.
3. Data Exploration
Analyzing data through descriptive statistics and visualizations to uncover patterns
and insights.
4. Feature Engineering
5. Modeling
Applying statistical and machine learning techniques to build predictive models.
6. Evaluation
Assessing the performance of models using various metrics to ensure their
reliability.
7. Deployment
Implementing models into production environments for real-time usage.
8. Monitoring and Maintenance
Continuously tracking model performance and updating them as needed.
9. Communication
Effectively presenting findings through visualizations and reports to stakeholders.
10. Machine learning
A core component of data science, machine learning uses algorithms to help
machines learn patterns and trends from data to make predictions.
11. NumPy
A Python library that allows for scientific calculations and multi-dimensional array
objects.
12. Data analysis
An essential part of data science, data analysis helps provide insights about data.
Python libraries like Pandas, Matplotlib, and Seaborn can be used for data analysis.
13. Deep learning
A field of data science that involves understanding deep learning concepts and
neural network architecture.
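Several of the steps listed above (collection, cleaning, exploration, modeling, and evaluation) can be sketched in a few lines with pandas and NumPy. The table below is made-up sample data standing in for a real source, and a straight-line fit with `np.polyfit` is used here only as a simple stand-in for a predictive model.

```python
import numpy as np
import pandas as pd

# 1. Data collection: a small in-memory table stands in for a real data source.
raw = pd.DataFrame({
    "hours": [1.0, 2.0, 2.0, 3.0, None, 5.0],
    "marks": [42.0, 50.0, 50.0, 58.0, 66.0, 74.0],
})

# 2. Data cleaning: drop duplicate rows and rows with missing values.
clean = raw.drop_duplicates().dropna().reset_index(drop=True)

# 3. Data exploration: descriptive statistics for each column.
summary = clean.describe()

# 5. Modeling: fit a straight line, marks = slope * hours + intercept.
slope, intercept = np.polyfit(clean["hours"], clean["marks"], deg=1)

# 6. Evaluation: mean absolute error of the fitted line.
#    It is measured on the same rows used for fitting, purely for illustration.
predicted = slope * clean["hours"] + intercept
mae = (clean["marks"] - predicted).abs().mean()

print(summary)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, MAE={mae:.4f}")
```

In practice the evaluation step would use data held back from fitting, so that the metric reflects performance on unseen data.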
ASSIGNMENTS
Excel Assignment 1
Output:
Output:
Output:
CONCLUSION
Python has established itself as a leading language in the field of data science, owing to its
versatility, ease of use, and extensive libraries. The various components of data science,
ranging from data collection and cleaning to modeling and communication, can be efficiently
implemented using Python's rich ecosystem, including libraries like Pandas, NumPy, and
Matplotlib.
By leveraging these tools, data scientists can transform raw data into meaningful insights,
enabling organizations to make informed decisions. The collaborative nature of the Python
improvement.
increasingly complex challenges and drive impactful solutions across diverse industries.