Data Science Internship Task List
Internship prerequisites before starting your tasks:
1. LinkedIn Profile Update: Ensure that your LinkedIn profile reflects
your technical skills, and update the Experience section to include
"ShadowFox Data Science Intern."
After completing the above steps, proceed with your tasks. Note that
all the details and screenshots you submit will be thoroughly
verified.
Task Level (Beginner):
Requirements:
1. Library Overview: Provide a brief introduction to the selected
libraries, highlighting their unique features and typical use cases.
2. Graph Types:
- Document the different types of graphs available in each
library, such as line plots, scatter plots, bar charts, histograms,
pie charts, etc.
- For each graph type, include a brief description, potential use
case, and a simple code example demonstrating how to generate
the graph.
3. Comparison: Offer a comparison section discussing the
strengths and weaknesses of each library regarding ease of use,
customization options, interactivity, and performance with large
datasets.
4. Resources:
- Matplotlib: https://matplotlib.org/stable/users/explain/quick_start.html#quick-start
- Seaborn: https://seaborn.pydata.org/tutorial/introduction.html
- Plotly: https://plotly.com/python/distplot/
- Bokeh: https://docs.bokeh.org/en/latest/docs/user_guide/basic.html
- Pandas: https://pandas.pydata.org/docs/user_guide/index.html
Deliverable:
A PDF or Markdown file containing the compiled guide,
ensuring the content is clear, concise, and informative for new
users.
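As an illustration of the kind of "simple code example" a guide entry might carry, here is a minimal Matplotlib sketch that draws a line plot and a bar chart side by side. The data values and the output filename are made up for illustration; the non-interactive "Agg" backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up sample data, for illustration only
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 128, 150]

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot: shows a trend along an ordered axis
ax_line.plot(months, sales, marker="o")
ax_line.set_title("Line plot")
ax_line.set_xlabel("Month")
ax_line.set_ylabel("Sales")

# Bar chart: compares values across categories
ax_bar.bar(months, sales)
ax_bar.set_title("Bar chart")
ax_bar.set_xlabel("Month")

fig.tight_layout()
fig.savefig("example_plots.png")  # hypothetical output filename
```

The same pattern (sample data, one plotting call, labels, save) can be repeated for each graph type and each library covered in the guide.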
Task Level (Intermediate): Do any 1 of 2
Dataset Features:
- Match No.: Identifier for the match.
- Innings: Which innings the data is being recorded for.
- Team: The team in the field.
- Player Name: The fielder involved in the action.
- Ballcount: Sequence number of the ball in the over.
- Position: Fielding position of the player at the time of the ball.
- Short Description: Brief description of the fielding event.
- Pick: Categorize the pick-up as clean pick, good throw, fumble, bad
throw, catch, or drop catch.
- Throw: Classify the throw as run out, missed stumping, missed run
out, or stumping.
- Runs: Enter the number of runs saved (+) or conceded (-) through the
fielding effort.
- Overcount: The over number in which the event occurred.
- Venue: Location of the match.
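One possible way to hold a single row of this dataset is a small Python dataclass, sketched below. The field types, the use of None for "not applicable", and the sample values are assumptions made for illustration; only the field names and the Pick and Throw categories come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Category values taken from the feature list above
PICK_VALUES = {"clean pick", "good throw", "fumble", "bad throw", "catch", "drop catch"}
THROW_VALUES = {"run out", "missed stumping", "missed run out", "stumping"}


@dataclass
class FieldingEvent:
    match_no: str
    innings: int
    team: str
    player_name: str
    ballcount: int
    position: str
    short_description: str
    pick: Optional[str]   # assumption: None when no pick-up applies
    throw: Optional[str]  # assumption: None when no throw applies
    runs: int             # positive = runs saved, negative = runs conceded
    overcount: int
    venue: str

    def __post_init__(self):
        # Reject values outside the documented categories
        if self.pick is not None and self.pick not in PICK_VALUES:
            raise ValueError(f"unknown pick category: {self.pick!r}")
        if self.throw is not None and self.throw not in THROW_VALUES:
            raise ValueError(f"unknown throw category: {self.throw!r}")


# Hypothetical sample row
example = FieldingEvent(
    match_no="M01", innings=1, team="Team A", player_name="Player X",
    ballcount=3, position="cover", short_description="diving stop",
    pick="clean pick", throw=None, runs=2, overcount=5, venue="Sample Ground",
)
```

Validating the categories at entry time helps keep the spreadsheet consistent when many balls are recorded by hand.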
Task Level (Hard): Continued...
Performance Metrics Formula:
To assess the fielding performance, use the following formula:
PS = (CP × W_CP) + (GT × W_GT) + (C × W_C) + (DC × W_DC) + (ST × W_ST)
+ (RO × W_RO) + (MRO × W_MRO) + (DH × W_DH) + RS
Where:
PS: Performance Score
CP: Clean Picks
GT: Good Throws
C: Catches
DC: Dropped Catches
ST: Stumpings
RO: Run Outs
MRO: Missed Run Outs
DH: Direct Hits
RS: Runs Saved (positive for runs saved, negative for runs conceded)
W_CP, W_GT, ..., W_DH: the weight applied to each corresponding count
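The formula can be sketched in Python as a weighted sum of event counts plus runs saved. The weight values below are placeholders chosen purely for illustration, since none are given here, and the function name and sample counts are likewise hypothetical.

```python
# Hypothetical weights: placeholders only, not values from the task description
WEIGHTS = {"CP": 1, "GT": 1, "C": 3, "DC": -3, "ST": 3, "RO": 3, "MRO": -2, "DH": 2}


def performance_score(counts, runs_saved, weights=WEIGHTS):
    """Return PS = sum(count * weight for each event type) + RS."""
    weighted = sum(counts.get(symbol, 0) * w for symbol, w in weights.items())
    return weighted + runs_saved


# Hypothetical tallies for one fielder in one match
counts = {"CP": 4, "GT": 2, "C": 1, "DC": 1, "ST": 0, "RO": 1, "MRO": 0, "DH": 1}
score = performance_score(counts, runs_saved=3)
print(score)  # 14
```

Keeping the weights in one dictionary makes it easy to swap in whatever values the analysis eventually settles on.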
Task Instructions:
1. Data Collection: For each ball bowled in the match, record the fielding
effort according to the dataset features outlined above. Pay close attention
to the effectiveness of fielding actions and their outcomes.
2. Analysis Preparation: Your collected data will be used for advanced
fielding analysis, identifying key areas of improvement and fielding strengths
within the team.
Deliverable: A well-organized spreadsheet or database containing the
complete fielding data for the match.
This task requires meticulous attention to detail and an understanding of
cricket fielding dynamics. Your analysis will contribute to strategic fielding
placements and improvements in team performance.
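To show how per-ball records might feed the later analysis, the sketch below tallies event counts and net runs for each player. The row layout (dictionaries keyed by the dataset feature names), the function name, and the sample rows are assumptions for illustration. Direct hits are left out because none of the listed Pick or Throw categories identifies them.

```python
from collections import Counter, defaultdict

# Map recorded categories to the symbols used in the performance formula
PICK_TO_SYMBOL = {"clean pick": "CP", "good throw": "GT", "catch": "C", "drop catch": "DC"}
THROW_TO_SYMBOL = {"run out": "RO", "stumping": "ST", "missed run out": "MRO"}


def tally_by_player(rows):
    """Return (event counts per player, net runs saved per player)."""
    counts = defaultdict(Counter)
    runs = defaultdict(int)
    for row in rows:
        player = row["Player Name"]
        player_counts = counts[player]  # ensures every player gets an entry
        pick_symbol = PICK_TO_SYMBOL.get(row.get("Pick", ""))
        throw_symbol = THROW_TO_SYMBOL.get(row.get("Throw", ""))
        if pick_symbol:
            player_counts[pick_symbol] += 1
        if throw_symbol:
            player_counts[throw_symbol] += 1
        runs[player] += int(row.get("Runs", 0))
    return counts, runs


# Hypothetical sample rows
rows = [
    {"Player Name": "A", "Pick": "clean pick", "Throw": "", "Runs": 1},
    {"Player Name": "A", "Pick": "catch", "Throw": "", "Runs": 0},
    {"Player Name": "B", "Pick": "fumble", "Throw": "", "Runs": -2},
    {"Player Name": "A", "Pick": "good throw", "Throw": "run out", "Runs": 0},
]
counts, runs = tally_by_player(rows)
```

Categories with no symbol in the formula, such as fumble or bad throw, are simply not counted here; their effect shows up only through the Runs column.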
Example: Consider a sales dataset whose records include transaction ID,
date, gross sales, net sales, profit/loss, and cost factors such as
cost of goods sold (COGS), manufacturing costs, and freight costs. The
analysis begins with exploratory data analysis (EDA) to uncover
underlying patterns, detect outliers, and address data anomalies, with
particular attention to financial metrics like COGS.
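One common way to flag outliers during EDA is the interquartile-range (IQR) rule, sketched below with the standard library. The choice of the IQR rule, the 1.5 multiplier, and the sample COGS values are assumptions for illustration; the example above does not prescribe a particular method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up COGS figures with one obviously unusual entry
cogs = [410, 395, 402, 420, 388, 415, 1250, 405]
outliers = iqr_outliers(cogs)
print(outliers)  # [1250]
```

Flagged values still need a manual look, since an unusual figure can be a data-entry error or a genuine large transaction.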
Fiscal years, which are financial reporting periods that may differ from
the calendar year, add a further layer of complexity to the analysis.
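A transaction date can be mapped to a fiscal year with a small helper like the one below. The April start month and the convention of labelling a fiscal year by the calendar year in which it ends are assumptions; the actual convention depends on the organisation whose data is being analysed.

```python
from datetime import date


def fiscal_year(d, start_month=4):
    """Label a date with its fiscal year, named by the year in which it ends.

    Assumes the fiscal year begins on the first day of start_month.
    """
    if start_month == 1:
        return d.year  # fiscal year coincides with the calendar year
    return d.year + 1 if d.month >= start_month else d.year


print(fiscal_year(date(2023, 3, 31)))  # 2023
print(fiscal_year(date(2023, 4, 1)))   # 2024
```

Grouping sales by this label rather than by calendar year keeps totals aligned with the reporting period.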