Presentation
on
“DATA ANALYTICS INTERN”
Guided by
Prof. Apurva Bharat Mandalaywala
PRESENTATION OUTLINE
● Introduction
● Learning Data Science With Python - Libraries
● Methodology
● Machine Learning
● Outline of work
● Future Work
● References
● Conclusion
Introduction
● Background of the Internship and Company
My name is Anuj Vaghani and I am currently interning at Devotee Infotech Private Limited. During
my internship, I have been working with the data analytics and machine learning teams, and have
gained valuable insights into how these technologies can drive business success.
Data analytics is the process of analysing and interpreting large sets of data to extract insights and
make informed decisions. Machine learning, on the other hand, is a subset of artificial intelligence that
enables computer systems to learn from data and improve their performance over time.
Introduction
● Objective of Presentation
The objective of this presentation is to provide an overview of data analytics and machine learning, and
their importance in today's business landscape. I will discuss the key concepts, processes, and
techniques involved in data analytics and machine learning, as well as their applications in various
industries. Additionally, I will share my experiences working with the data analytics and machine
learning teams at Devotee Infotech Private Limited, and provide recommendations for how these
technologies can be leveraged to drive business success.
Learning Data Science With Python - Libraries
NumPy
NumPy is a powerful library for numerical computing in Python. It provides support for multi-dimensional arrays, mathematical functions, and operations on arrays. Some of the key topics that will be covered in this section include:
• Basics of NumPy arrays
• Array operations and calculations
• Mathematical functions and operations
• Random number generation

Pandas
Pandas is a library for data manipulation and analysis. It provides tools for working with structured data, such as data frames and series, and supports a wide range of data formats. Some of the key topics that will be covered in this section include:
• Basics of Pandas data frames and series
• Data manipulation and cleaning
• Data aggregation and summarization
• Merging and joining data frames
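To make these topics concrete, here is a minimal, hedged sketch of the NumPy and Pandas basics listed above; the column names and values are illustrative and not taken from the internship data.

```python
import numpy as np
import pandas as pd

# NumPy: array creation, element-wise operations, and aggregation
prices = np.array([120.0, 95.5, 210.0, 87.25])
discounted = prices * 0.9                      # array operation
print(discounted.mean(), discounted.max())     # mathematical functions

# Random number generation
rng = np.random.default_rng(seed=42)
noise = rng.normal(loc=0.0, scale=1.0, size=4)

# Pandas: data frames, cleaning, aggregation, and joining
bookings = pd.DataFrame({
    "hotel": ["City", "Resort", "City", "Resort"],
    "adr": [100.0, None, 150.0, 80.0],         # a numeric column with a missing value
})
bookings["adr"] = bookings["adr"].fillna(bookings["adr"].median())   # cleaning
summary = bookings.groupby("hotel")["adr"].mean()                    # aggregation

regions = pd.DataFrame({"hotel": ["City", "Resort"], "region": ["North", "South"]})
merged = bookings.merge(regions, on="hotel", how="left")             # joining
print(summary, merged, sep="\n")
```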
Learning Data Science With Python - Libraries
Matplotlib
Matplotlib is a powerful library for data visualization in Python. It provides support for creating a wide range of charts and plots, including line charts, scatter plots, histograms, and heatmaps. Some of the key topics that will be covered in this section include:
• Basics of Matplotlib charts and plots
• Customizing charts and plots
• Adding labels and annotations
• Creating subplots and multiple charts

Seaborn
Seaborn is a library for data visualization that is built on top of Matplotlib. It provides a higher-level interface for creating sophisticated and aesthetically pleasing visualizations. Some of the key topics that will be covered in this section include:
• Basics of Seaborn charts and plots
• Customizing Seaborn visualizations
• Creating complex visualizations, such as heatmaps and violin plots
• Visualizing relationships between variables
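As a hedged illustration of these plotting topics, the sketch below builds a small synthetic dataset and draws a Matplotlib figure with subplots, labels, and a legend, followed by a Seaborn histogram and heatmap; none of the data comes from the internship projects.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.linspace(0, 10, 50)
y = np.sin(x) + np.random.default_rng(0).normal(scale=0.2, size=50)

# Matplotlib: subplots, labels, a title, and a legend
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x, np.sin(x), label="signal")
ax1.scatter(x, y, s=10, label="noisy samples")
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.set_title("Line and scatter plot")
ax1.legend()

# Seaborn: higher-level plots such as histograms and heatmaps
sns.histplot(y, bins=15, ax=ax2)
ax2.set_title("Distribution of y")
plt.tight_layout()

plt.figure()
corr = np.corrcoef(np.vstack([x, y, np.cos(x)]))
sns.heatmap(corr, annot=True,
            xticklabels=["x", "y", "cos x"], yticklabels=["x", "y", "cos x"])
plt.title("Correlation heatmap")
plt.show()
```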
Methodology
The methodology consists of the seven steps shown below.
(1) Introduction to Methodology
In order to effectively utilize data analytics and machine learning during my internship, I followed a
specific methodology to guide my work.
(2) Define Problem Statement
The first step was to clearly define the problem statement or objective that I wanted to achieve. This
involved identifying the business problem or opportunity and specifying the data sources and variables
that were relevant to the problem.
(3) Data Collection and Preparation
The next step was to collect and prepare the data for analysis. This involved identifying the relevant data
sources and extracting the data, cleaning and transforming the data, and ensuring that the data was ready
for analysis.
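A minimal sketch of what this step might look like in Python is shown below; the file names and column names (for example raw_bookings.csv, children, arrival_date) are illustrative assumptions, not the actual internship data sources.

```python
import pandas as pd

# Extract the raw data from an assumed CSV export of the source system
df = pd.read_csv("raw_bookings.csv")

# Basic cleaning and transformation (column names are illustrative assumptions)
df = df.drop_duplicates()
df["children"] = df["children"].fillna(0)                    # fill a missing numeric field
df["arrival_date"] = pd.to_datetime(df["arrival_date"], errors="coerce")
df = df.dropna(subset=["arrival_date"])                      # drop rows with unusable dates

# Store the prepared data in a structured form, ready for analysis
df.to_csv("bookings_clean.csv", index=False)
```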
(4) Exploratory Data Analysis (EDA)
The third step was to conduct exploratory data analysis (EDA) to gain a better understanding of the data
and identify any patterns or anomalies. This involved using various statistical and visualization techniques
to explore the data and gain insights.
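A hedged sketch of this kind of EDA, assuming the cleaned file from the previous step and illustrative column names such as lead_time and is_canceled:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("bookings_clean.csv")

# Structure, summary statistics, and missing values
print(df.shape)
print(df.describe(include="all"))
print(df.isna().sum().sort_values(ascending=False).head())

# Visual checks: distribution of a numeric column and a simple class balance
sns.histplot(df["lead_time"], bins=40)
plt.title("Lead time distribution")
plt.show()
print(df["is_canceled"].value_counts(normalize=True))
```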
Methodology
(5) Feature Selection and Engineering
The next step was to select and engineer the features that would be used for machine learning. This
involved identifying the relevant features and engineering them to improve their predictive power.
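As an illustration of this step, the sketch below engineers one new feature and encodes a categorical column; all column names are assumptions carried over from the earlier sketches.

```python
import pandas as pd

df = pd.read_csv("bookings_clean.csv")

# Engineer a new feature from existing columns (names are assumptions)
df["total_guests"] = df["adults"] + df["children"]

# Encode a categorical column so it can be used by most models
df = pd.get_dummies(df, columns=["hotel"], drop_first=True)

# Keep a small set of plausibly predictive features and separate the target
feature_cols = [c for c in df.columns if c not in ("is_canceled", "arrival_date")]
X, y = df[feature_cols], df["is_canceled"]
```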
(6) Model Selection and Training
The next step was to select the appropriate machine learning model and train it on the prepared data.
This involved selecting the right algorithm, tuning the hyperparameters, and training the model using a
variety of techniques.
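A minimal sketch of model selection with hyperparameter tuning in scikit-learn is shown below; the placeholder arrays X and y stand in for the prepared features and target from the earlier steps.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data standing in for the prepared features X and target y
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Tune a small hyperparameter grid with cross-validation
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_, search.best_score_)
model = search.best_estimator_
```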
(7) Model Evaluation and Deployment
The final step was to evaluate the performance of the machine learning model and deploy it for use in the
real world. This involved evaluating the model's accuracy and performance, testing it on new data, and
deploying it in a way that could be easily integrated into the business process.
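Continuing the previous sketch, evaluation on held-out data and a simple form of deployment (persisting the fitted model for reuse) could look like this; model, X_test, and y_test come from the training block above.

```python
import joblib
from sklearn.metrics import accuracy_score, classification_report

# Evaluate on held-out data the model never saw during training
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Persist the model so an application or API can load and reuse it
joblib.dump(model, "model.joblib")
loaded_model = joblib.load("model.joblib")
```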
Machine Learning
Types of Machine Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
• Deep Learning
Machine Learning
SUPERVISED LEARNING
• Common Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forests,
Support Vector Machines, Naive Bayes, k-Nearest Neighbours.
• Applications in Industry, such as Fraud Detection, Demand Forecasting, and Image Recognition
• Demo: Building a Regression Model to Predict Housing Prices Based on Features such as
Location, Square Footage, and Number of Bedrooms/Bathrooms.
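A hedged sketch of the housing-price regression demo described above, using a small synthetic dataset with square footage, bedroom, and bathroom features rather than real housing data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic housing data (the features and price relationship are assumptions)
rng = np.random.default_rng(1)
n = 300
houses = pd.DataFrame({
    "sqft": rng.uniform(500, 3500, n),
    "bedrooms": rng.integers(1, 6, n),
    "bathrooms": rng.integers(1, 4, n),
})
houses["price"] = (150 * houses["sqft"] + 10_000 * houses["bedrooms"]
                   + 15_000 * houses["bathrooms"] + rng.normal(0, 20_000, n))

X_train, X_test, y_train, y_test = train_test_split(
    houses[["sqft", "bedrooms", "bathrooms"]], houses["price"],
    test_size=0.2, random_state=42)

# Fit a linear regression model and evaluate it on the held-out set
reg = LinearRegression().fit(X_train, y_train)
pred = reg.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("R^2:", r2_score(y_test, pred))
```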
UNSUPERVISED LEARNING
• Common Algorithms: k-Means Clustering, Hierarchical Clustering, Principal Component Analysis,
t-SNE.
• Applications in Industry, such as Customer Segmentation and Anomaly Detection
• Demo: Using k-Means Clustering to Segment Customer Data and Identify Groups with Similar
Behaviours and Characteristics.
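A minimal sketch of the customer-segmentation demo described above; the two features (annual spend and visits per year) and the synthetic customer data are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic customer data: annual spend and visits per year for three rough groups
rng = np.random.default_rng(2)
spend = np.concatenate([rng.normal(500, 100, 100),
                        rng.normal(2000, 300, 100),
                        rng.normal(5000, 500, 100)])
visits = np.concatenate([rng.normal(2, 1, 100),
                         rng.normal(10, 2, 100),
                         rng.normal(25, 4, 100)])
X = np.column_stack([spend, visits])

# Scale the features so both contribute equally, then cluster into 3 segments
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)

print(np.bincount(kmeans.labels_))   # size of each customer segment
print(kmeans.cluster_centers_)       # segment centroids in scaled feature space
```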
Outline of work
During my internship, I worked on two major projects that covered the concepts described above.
(1) HOTEL BOOKING ANALYSIS
The purpose of this project was to analyse hotel booking data and gain insights into the factors that influence
hotel booking cancellations. The data was collected from a publicly available dataset on Kaggle, and the
analysis was performed using Python and its data analytics libraries. The project involved data cleaning and
pre-processing, exploratory data analysis, and data visualization to gain insights into the patterns and trends
in the data. The results of the analysis provide insights into the key factors that contribute to hotel booking
cancellations and offer recommendations to hotel operators to reduce cancellations and optimize revenue.
Outline of work
A. Data Collection and Preparation
• Identify Relevant Data Sources: Hotel Booking Dataset from Kaggle
• Collect Data and Store it in a Structured Manner
• Perform Data Cleaning and Pre-processing
• Exploratory Data Analysis

B. Data Modelling and Analysis
• Determine Relevant Variables and Features: Guest Demographics, Booking Details, Hotel Information, etc.
• Choose Appropriate Modelling Techniques: Linear Regression, Random Forest Regression, etc.
• Evaluation of Models: Mean Squared Error, R-Squared Value, etc.

C. Results and Conclusion
• Insights Gained from Analysis: Key Drivers of Booking Cancellations, Popular Booking Channels, etc.
• Potential Business Applications: Improve Booking Experience, Optimize Hotel Inventory Management, etc.
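As a hedged sketch of the modelling and evaluation steps outlined above, the code below fits a random forest regression and reports mean squared error and R-squared; the target column adr (average daily rate) and the feature columns are assumptions chosen for illustration from the public Kaggle dataset.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load and lightly clean the Kaggle hotel booking data (column names assumed)
df = pd.read_csv("hotel_bookings.csv").drop_duplicates()
df["children"] = df["children"].fillna(0)

# Predict the average daily rate from a few booking details
features = ["lead_time", "adults", "children", "stays_in_week_nights"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["adr"], test_size=0.2, random_state=42)

rf = RandomForestRegressor(n_estimators=200, random_state=42).fit(X_train, y_train)
pred = rf.predict(X_test)
print("MSE:", mean_squared_error(y_test, pred))
print("R^2:", r2_score(y_test, pred))
```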
Outline of work
OBSERVATION
DATA VISUALIZATION: Data visualization is the art of communicating complex data through visual representations. Future work in this area could focus on developing new techniques for visualizing large-scale and high-dimensional data sets, as well as exploring the use of augmented and virtual reality for data visualization.
References
Kaggle: https://www.kaggle.com/competitions