Data Science - FYBCA-Sem-II
Data science is the scientific study of data to gain insights and make predictions. It is a
multidisciplinary field that combines mathematics, statistics, computer science, and business.
Data scientists use tools such as data analytics, machine learning, and artificial
intelligence.
• High demand
Data science is a fast-growing field with many job opportunities, including roles such as data
scientists, data analysts, data architects, and data engineers.
• Career prospects
Data science skills can lead to careers in finance, healthcare, marketing, and technology.
• Problem solving
Data science can help you solve real-world problems by analyzing data to uncover insights.
• Innovation
Data science can help drive innovation and profitability by creating algorithms and data
models to forecast outcomes.
• Freelance opportunities
Freelancers with data science training can choose their clients, working hours, and work
area.
• Statistics
• Data visualization
Helps you make data more accessible and understandable to both technical and non-technical
audiences.
• Programming
Knowledge of programming languages like Python, R, and SQL is important for data science
projects.
Data science career paths include data analyst, data scientist, and machine learning engineer.
Types of Data:
1. Structured data –
Structured data is data whose elements are addressable for effective analysis. It has
been organized into a formatted repository, typically a database. It covers all data that
can be stored in an SQL database as tables with rows and columns. Such data has relational
keys and can easily be mapped into pre-designed fields. Today, structured data is the most
commonly processed type of data and the simplest to manage. Example: Relational data.
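As an illustration, the sketch below stores a few rows in a relational table using Python's built-in `sqlite3` module and queries them with SQL. The `students` table and its columns are hypothetical examples, not taken from any particular system.

```python
import sqlite3

# In-memory SQLite database; the table and column names are hypothetical
# examples of structured (relational) data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO students (roll_no, name, marks) VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 67), (3, "Meena", 91)],
)

# Every row has the same columns, so a query can address fields directly.
rows = conn.execute(
    "SELECT name, marks FROM students WHERE marks > ? ORDER BY marks DESC", (70,)
).fetchall()
print(rows)  # [('Meena', 91), ('Asha', 82)]
```

Because the schema is fixed in advance, filtering and sorting are a single declarative query.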
2. Semi-Structured data –
Semi-structured data is information that does not reside in a relational database but
has some organizational properties that make it easier to analyze. With some processing,
it can be stored in a relational database (this can be very hard for some kinds of
semi-structured data), but the semi-structured form exists to save space. Example: XML data.
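A minimal sketch of working with semi-structured data, using Python's standard `xml.etree.ElementTree` module. The XML document is a hypothetical example: its tags give it structure, but records do not all share the same fields, so missing fields have to be handled when turning it into a table-like form.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tags provide structure, but the second <book> has no <year>.
xml_text = """
<library>
  <book id="b1"><title>Python Basics</title><year>2021</year></book>
  <book id="b2"><title>Statistics Primer</title></book>
</library>
"""

root = ET.fromstring(xml_text.strip())
records = []
for book in root.findall("book"):
    records.append({
        "id": book.get("id"),
        "title": book.findtext("title"),
        # findtext returns None when the tag is absent, so missing fields stay empty
        "year": book.findtext("year"),
    })

print(records)
```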
3. Unstructured data –
Unstructured data is data that is not organized in a predefined manner or does not have a
predefined data model, so it is not a good fit for a mainstream relational database.
Alternative platforms exist for storing and managing unstructured data. It is increasingly
prevalent in IT systems and is used by organizations in a variety of business intelligence
and analytics applications. Example: Word documents, PDFs, text files, and media logs.
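Unstructured data has no fields to query, so some structure must be extracted first. The sketch below, which assumes a hypothetical piece of customer feedback text and an illustrative stop-word list, turns free text into word counts using only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical free-text feedback: no fields or columns.
feedback = """The delivery was late. Late delivery again!
Product quality is good, but delivery needs work."""

# Lower-case the text and split it into words (a very simple tokenizer).
words = re.findall(r"[a-z]+", feedback.lower())

# Drop a few common words that carry little meaning (illustrative list only).
stop_words = {"the", "was", "is", "but", "again"}
counts = Counter(w for w in words if w not in stop_words)

print(counts.most_common(2))  # [('delivery', 3), ('late', 2)]
```

Real text analytics uses far richer tokenization and language models, but the idea is the same: derive structured features from unstructured content.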
Properties | Structured data | Semi-structured data | Unstructured data
Transaction management | Mature transaction management and various concurrency techniques | Transaction management is adapted from DBMS and is not mature | No transaction management and no concurrency
Data Science is the in-depth study of large quantities of data, which involves extracting
meaning from raw, structured, and unstructured data. Extracting meaningful information from
large amounts of data requires processing it with algorithms, and this processing can be done
using statistical techniques, scientific methods, different technologies, etc. Data Science
uses various tools and techniques to extract meaningful information from raw data. It is also
sometimes described as the future of Artificial Intelligence.
For example, Jagroop loves reading books, but every time he wants to buy some, he is confused
about which book to buy because there are so many choices in front of him. This is where Data
Science techniques are useful. When he opens Amazon, he gets product recommendations based on
his previous data. When he chooses one of them, he also gets a recommendation to buy other
books along with it, because that set is frequently bought together. Recommending products and
showing sets of books that are purchased together are both examples of Data Science.
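The "frequently bought together" idea above can be sketched by counting how often pairs of items appear in the same order. This is a simple co-occurrence technique chosen for illustration, not Amazon's actual method; the order history and the function name `bought_together` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each order is the set of books bought together.
orders = [
    {"Book A", "Book B"},
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book C"},
    {"Book B", "Book D"},
    {"Book A", "Book B"},
]

# Count how often each pair of books appears in the same order.
pair_counts = Counter()
for order in orders:
    for pair in combinations(sorted(order), 2):
        pair_counts[pair] += 1

def bought_together(book, top_n=2):
    """Return the books most often bought together with `book` (hypothetical helper)."""
    partners = Counter()
    for (x, y), n in pair_counts.items():
        if x == book:
            partners[y] += n
        elif y == book:
            partners[x] += n
    return [title for title, _ in partners.most_common(top_n)]

print(bought_together("Book A"))  # ['Book B', 'Book C']
```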
1. In Search Engines
One of the most useful applications of Data Science is in search engines. When we want to
search for something on the internet, we mostly use search engines such as Google, Yahoo,
DuckDuckGo, and Bing. Data Science is used to make these searches faster and more relevant.
For example, when we search for “Data Structure and algorithm courses”, the first link we get
may be GeeksforGeeks Courses. This happens because the GeeksforGeeks website is visited most
often by people looking for information about data structure courses and other
computer-related subjects. This analysis is done using Data Science, so the most visited web
links appear at the top.
2. In Transport
Data Science is also used in real-time applications in the transport field, such as driverless
cars, which may help reduce the number of accidents.
For example, in driverless cars, training data is fed into the algorithm, and with the help of
Data Science techniques the data is analyzed to learn things such as the speed limits on
highways, busy streets, and narrow roads, and how to handle different situations while driving.
3. In Finance
Data Science plays a key role in the financial industry. Financial institutions always face
the problems of fraud and risk of losses, so they need to automate risk-of-loss analysis in
order to make strategic decisions. Financial institutions also use Data Science analytics
tools to predict future outcomes, such as customer lifetime value and stock market movements.
For example, Data Science is a major part of stock market analysis. It is used to examine past
behavior in historical data with the goal of estimating future outcomes. The data is analyzed
in a way that makes it possible to forecast stock prices over a set time frame, although such
forecasts are always uncertain.
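The text does not name a forecasting method. As a minimal sketch of "using past data to estimate the next value", the example below uses a simple moving average, a common naive baseline; the prices are hypothetical, and real stock prediction is far harder than this.

```python
def moving_average_forecast(prices, window=3):
    """Forecast the next value as the mean of the last `window` prices."""
    if len(prices) < window:
        raise ValueError("need at least `window` past prices")
    recent = prices[-window:]
    return sum(recent) / window

# Hypothetical closing prices for five trading days.
past_prices = [100.0, 102.0, 101.0, 105.0, 107.0]
print(round(moving_average_forecast(past_prices), 2))  # 104.33
```

A larger `window` smooths out short-term noise but reacts more slowly to new trends.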
4. In E-Commerce
E-commerce websites such as Amazon and Flipkart use Data Science to create a better user
experience with personalized recommendations.
For example, when we search for something on an e-commerce website, we get suggestions similar
to our past choices, and we also get recommendations based on the most bought, highest rated,
and most searched products. This is all done with the help of Data Science.
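The "most bought, highest rated, most searched" recommendations above can be sketched as popularity ranking: sort products by one statistic and show the top few. The product data and the helper `top_by` are hypothetical; real sites combine many signals.

```python
# Hypothetical product statistics an e-commerce site might record.
products = [
    {"name": "Phone X", "units_sold": 1200, "avg_rating": 4.1, "searches": 5000},
    {"name": "Phone Y", "units_sold": 800, "avg_rating": 4.6, "searches": 7000},
    {"name": "Phone Z", "units_sold": 1500, "avg_rating": 3.9, "searches": 3000},
]

def top_by(field, n=2):
    """Return the names of the n products with the highest value of `field`."""
    ranked = sorted(products, key=lambda p: p[field], reverse=True)
    return [p["name"] for p in ranked[:n]]

print("Most bought:  ", top_by("units_sold"))  # ['Phone Z', 'Phone X']
print("Top rated:    ", top_by("avg_rating"))  # ['Phone Y', 'Phone X']
print("Most searched:", top_by("searches"))    # ['Phone Y', 'Phone X']
```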
5. In Health Care
In the healthcare industry, Data Science acts as a boon. Data Science is used for:
• Detecting tumors.
• Drug discovery.
6. Image Recognition
Data Science is also used in image recognition. For example, when we upload an image with our
friends on Facebook, Facebook suggests tagging the people in the picture. This is done with the
help of machine learning and Data Science. When an image is recognized, data analysis is done on
the user's Facebook friends, and if the faces in the picture match someone's profile, Facebook
suggests auto-tagging them.
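The matching step described above can be sketched as comparing numeric "feature vectors" of faces with cosine similarity and suggesting a tag when a friend's vector is similar enough. This is an illustrative assumption, not Facebook's actual system: real systems compute such vectors with trained neural networks, and the vectors, names, and 0.95 threshold below are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical face feature vectors for the user's friends.
friend_faces = {
    "Priya": [0.9, 0.1, 0.3],
    "Karan": [0.2, 0.8, 0.5],
}
detected_face = [0.88, 0.15, 0.28]

def suggest_tag(face, known_faces, threshold=0.95):
    """Suggest the most similar friend, or None if no one is similar enough."""
    best_name, best_score = None, threshold
    for name, vector in known_faces.items():
        score = cosine_similarity(face, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

print(suggest_tag(detected_face, friend_faces))  # Priya
```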
7. Targeted Recommendation
Targeted recommendation is one of the most important applications of Data Science. Whatever a
user searches for on the internet, he or she will then see related posts everywhere. For
example, suppose I want a mobile phone, so I search for it on Google, but then change my mind
and decide to buy it offline. In the real world, Data Science helps the companies that pay for
advertisements for that phone: everywhere on the internet, on social media, on websites, and
in apps, I will see recommendations for the mobile phone I searched for, which encourages me to
buy it online.
8. Airline Route Planning
With the help of Data Science, the airline sector is also growing: it becomes easier to predict
flight delays. Data Science also helps decide whether a flight should fly directly to its
destination or make a stop in between; for example, a flight from Delhi to the U.S.A. can take
a direct route or halt in between before reaching its destination.
9. Gaming
In many games where a user plays against a computer opponent, data science concepts are used
with machine learning so that the computer improves its performance based on past data. Many
games, such as chess and EA Sports titles, use Data Science concepts.
10. Medicine and Drug Development
The process of creating medicine is very difficult and time-consuming and has to be done with
full discipline because people's lives are at stake. Without Data Science, developing a new
medicine or drug takes a lot of time, resources, and money. With the help of Data Science it
becomes easier, because the likely success rate can be estimated from biological data and
factors. Algorithms based on data science can help forecast how a drug may react in the human
body, reducing the number of lab experiments needed.
11. Delivery Logistics
Logistics companies such as DHL and FedEx make use of Data Science. It helps these companies
find the best route for shipping their products, the best time for delivery, the best mode of
transport to reach the destination, etc.
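Finding "the best route" can be sketched, in its simplest form, as a shortest-path problem. The example below uses Dijkstra's algorithm on a hypothetical road network with made-up distances; real logistics planning also weighs time windows, vehicle capacity, and cost, which this sketch ignores.

```python
import heapq

# Hypothetical road network: distances in km between delivery hubs.
roads = {
    "Warehouse": {"A": 4, "B": 2},
    "A": {"Customer": 5},
    "B": {"A": 1, "Customer": 8},
    "Customer": {},
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total distance, list of stops)."""
    queue = [(0, start, [start])]  # (distance so far, current node, path taken)
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []  # goal unreachable

print(shortest_route(roads, "Warehouse", "Customer"))
# (8, ['Warehouse', 'B', 'A', 'Customer'])
```

Note that the cheapest route here goes through two intermediate hubs, even though a more direct road (Warehouse, A, Customer) exists.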
12. Autocomplete
The autocomplete feature is an important application of Data Science: the user types just a few
letters or words, and the rest of the line is completed automatically. For example, when we
write a formal email in Google Mail, the autocomplete feature suggests an efficient way to
complete the whole line. Autocomplete is also widely used in search engines, social media, and
various apps.
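A minimal sketch of autocomplete, assuming a hypothetical history of previously typed phrases: return the past phrases that start with what the user has typed, most frequent first. Production systems use far more sophisticated language models; this only illustrates the prefix-matching idea.

```python
from collections import Counter

# Hypothetical history of phrases users have typed before.
history = [
    "thank you for your email",
    "thank you for your time",
    "thank you for your email",
    "thanks and regards",
    "please find attached",
]
phrase_counts = Counter(history)

def autocomplete(prefix, k=2):
    """Return up to k past phrases starting with `prefix`, most frequent first."""
    matches = [(p, n) for p, n in phrase_counts.items() if p.startswith(prefix.lower())]
    matches.sort(key=lambda item: (-item[1], item[0]))  # by frequency, then alphabetically
    return [p for p, _ in matches[:k]]

print(autocomplete("thank"))  # ['thank you for your email', 'thank you for your time']
```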
The Data Science lifecycle revolves around using machine learning and other analytical
strategies to produce insights and predictions from data in order to achieve a business
objective. The complete process includes a number of steps, such as data cleaning,
preparation, modelling, and model evaluation. It is a lengthy procedure and may take several
months to complete, so it is essential to have a general structure to follow for every problem
at hand. A widely used framework for solving analytical problems is the Cross-Industry Standard
Process for Data Mining, or CRISP-DM.
Earlier, there was much less data, and it was generally available in a well-structured form
that could easily be saved in Excel sheets and processed efficiently with Business Intelligence
tools. Today, however, we deal with huge amounts of data: about 3.0 quintillion bytes of data
are produced every day, resulting in an explosion of data. According to recent research, it is
estimated that 1.9 MB of data is created every second by a single individual.
This is a very big challenge for any organization dealing with such a massive amount of data
generated every second. Handling and evaluating this data requires very powerful, complex
algorithms and technologies, and this is where Data Science comes into the picture.
The following are some of the main reasons for using Data Science technology:
1. It helps to convert large quantities of raw and unstructured data into meaningful
insights.
4. Companies are shifting towards Data Science and adopting this technology. Companies such
as Amazon and Netflix, which handle large quantities of data, use data science algorithms to
provide a better user experience.
1. Business Understanding: The complete cycle revolves around the business goal. What will you
solve if you do not have a specific problem? It is extremely important to understand the
business goal clearly, because it will be the ultimate aim of the analysis. Only after gaining
a proper understanding can we set a precise analysis goal that is in sync with the business
objective. You need to know, for example, whether the customer wants to minimize losses or
predict the price of a commodity.
3. Preparation of Data: Next comes the data preparation stage. This consists of steps such as
selecting the relevant data, integrating it by merging data sets, and cleaning it: treating
missing values by either removing or imputing them, removing inaccurate data, and checking for
outliers using box plots and dealing with them. It also includes constructing new data by
deriving new features from existing ones, formatting the data into the desired structure, and
removing unwanted columns and features. Data preparation is the most time-consuming but
arguably the most important step in the entire life cycle. Your model will only be as accurate
as your data.
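The preparation steps above (merging, imputing missing values, flagging outliers with the box-plot rule) can be sketched with the Python standard library. The customer records are hypothetical, mean imputation is just one possible choice, and the quartile method (`method="inclusive"`) is an assumption: different tools compute quartiles slightly differently.

```python
import statistics

# Two hypothetical data sets sharing a customer_id key.
customers = [
    {"customer_id": 1, "age": 25},
    {"customer_id": 2, "age": None},  # missing value
    {"customer_id": 3, "age": 41},
    {"customer_id": 4, "age": 33},
    {"customer_id": 5, "age": 29},
]
spend = {1: 120.0, 2: 95.0, 3: 130.0, 4: 110.0, 5: 2500.0}  # 2500 looks unusual

# 1. Integrate: merge the two data sets on customer_id.
merged = [dict(row, spend=spend.get(row["customer_id"])) for row in customers]

# 2. Impute missing ages with the mean of the known ages.
known_ages = [r["age"] for r in merged if r["age"] is not None]
mean_age = statistics.mean(known_ages)
for r in merged:
    if r["age"] is None:
        r["age"] = mean_age

# 3. Box-plot rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
values = [r["spend"] for r in merged]
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r["customer_id"] for r in merged if not low <= r["spend"] <= high]

print(mean_age)  # 32
print(outliers)  # [5]
```

Whether a flagged outlier is removed, capped, or kept depends on the business question; the code only identifies it.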
4. Exploratory Data Analysis: This step involves getting an idea of the target variable and the
factors affecting it before building the actual model. The distribution of data within
individual variables is explored graphically using bar graphs, and relations between different
features are captured through graphical representations such as scatter plots and heat maps.
Many data visualization techniques are used extensively to explore each feature individually
and in combination with other features.
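Before plotting, it often helps to look at the numbers behind the charts described above: per-variable summaries (what a bar graph or histogram shows) and pairwise correlations (what a heat map colours). The sketch below uses hypothetical student data and computes the Pearson correlation by hand so it runs on any recent Python version.

```python
import math
import statistics

# Hypothetical data for six students.
data = {
    "study_hours": [2, 4, 6, 8, 10, 12],
    "sleep_hours": [7, 6, 8, 5, 7, 6],
    "marks": [40, 50, 62, 70, 82, 90],
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Summaries of each variable on its own.
for name, values in data.items():
    print(f"{name}: mean={statistics.mean(values):.2f}, stdev={statistics.stdev(values):.2f}")

# Pairwise correlation matrix: the values a heat map would display.
names = list(data)
matrix = {(a, b): pearson(data[a], data[b]) for a in names for b in names}
print(f"study_hours vs marks: {matrix[('study_hours', 'marks')]:.3f}")
```

In this made-up data, study hours are strongly correlated with marks while sleep hours are only weakly related, which would guide which features to keep for modeling.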
5. Data Modeling: Data modeling is the heart of data analysis. A model takes the prepared data
as input and gives the desired output. This step consists of selecting the appropriate type of
model, depending on whether the problem is a classification, regression, or clustering
problem. After deciding on the model family, we need to carefully choose which algorithms
within that family to implement. We need to tune the hyperparameters of each model to achieve
the desired performance. We also need to make sure there is the right balance between
performance and generalizability: we do not want the model to memorize the training data and
perform poorly on new data.
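To make hyperparameter tuning and generalization concrete, the sketch below uses k-nearest neighbours (chosen here only as an example of a classification model) and picks the hyperparameter `k` by measuring accuracy on held-out validation data. The data points are hypothetical and include one deliberately noisy training point.

```python
import math
from collections import Counter

# Hypothetical labelled data: (feature1, feature2) -> class label.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((1.2, 1.8), "A"),
    ((6.0, 6.5), "B"), ((7.0, 6.0), "B"), ((6.5, 7.5), "B"), ((7.2, 7.1), "B"),
    ((4.0, 4.2), "A"),  # a noisy point near the middle
]
validation = [((2.2, 2.0), "A"), ((6.2, 6.8), "B"), ((4.5, 4.6), "B"), ((1.0, 2.2), "A")]

def knn_predict(point, data, k):
    """Classify `point` by majority vote among its k nearest training points."""
    nearest = sorted(data, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(k):
    """Fraction of validation points classified correctly with this k."""
    correct = sum(knn_predict(x, train, k) == y for x, y in validation)
    return correct / len(validation)

# Tune the hyperparameter k using data the model was not trained on.
scores = {k: accuracy(k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, "best k =", best_k)  # {1: 0.75, 3: 1.0, 5: 1.0} best k = 3
```

With `k = 1` the model follows the single noisy training point too closely and misclassifies a validation point, which is the overfitting the paragraph warns about.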
7. Model Deployment: After rigorous evaluation, the model is finally deployed in the desired
format and channel. This is the last step in the data science life cycle. Each step in the life
cycle described above must be worked on carefully. If any step is performed improperly, it
affects the next step, and the whole effort goes to waste. For example, if data is not
collected properly, you will lose information and will not build an ideal model. If the data
is not cleaned properly, the model will not work. If the model is not evaluated properly, it
will fail in the real world. From business understanding to model deployment, every step has to
be given appropriate attention, time, and effort.
In the world of data, the era of Big Data emerged when organizations began dealing with
petabytes and exabytes of data. Until 2010, storing data was very difficult for industries. Now
that popular frameworks such as Hadoop have solved the storage problem, the focus has shifted
to processing the data, and this is where Data Science plays a big role. Data science has grown
in many ways, and one should prepare for the future by learning what data science is and how it
can add value.
Data science means different things to different people, but at its core, data science is
using data to answer questions. This is a deliberately broad definition, because data science
is a broad field!
Data science is the science of analyzing raw data using statistics and machine learning
techniques with the purpose of drawing conclusions about that information.
Now that we know what data science is and its key pillars, we need to discuss who exactly a
data scientist is. An Economist Special Report defines a data scientist as someone:
“who integrates the skills of software programmer, statistician and storyteller slash artist
to extract the nuggets of gold hidden under mountains of data”
In reality, this raises many questions. Some important ones are: What is the role of a data
scientist? What are a data scientist's responsibilities? How are data scientists different from
data analysts and data engineers? Let's discuss these questions to understand in detail who a
data scientist is.
• Analytics: The Data Scientist represents a scientific role where he plans, implements,
and assesses high-level statistical models and strategies for application in the
business’s most complex issues. The Data Scientist develops econometric and
statistical models for various problems including projections, classification, clustering,
pattern analysis, sampling, simulations, and so forth.
• Collaboration: The role of the Data Scientist is not a solitary one. The Data Scientist
collaborates with senior data scientists to communicate obstacles and findings to relevant
stakeholders in an effort to drive business performance and decision-making.
• Knowledge: The Data Scientist also takes the lead in exploring different technologies and
tools with the aim of creating innovative data-driven insights for the business at the most
agile pace feasible. The Data Scientist also shows initiative in assessing and using new and
improved data science methods for the business, which are delivered to senior management for
approval.
• Other Duties: A Data Scientist also performs related tasks as assigned by the Senior Data
Scientist, Head of Data Science, Chief Data Officer, or employer.
Data Scientist, Data Engineer, and Data Analyst are the three most common careers in data
science. Let's understand who a data scientist is by comparing the role with these similar
jobs.
Data Scientist Data Analyst Data Engineer
• Open Data:
• Multimedia:
• Standard Datasets:
• Data Access: Always ensure you have the necessary permissions to access and use
data from any source, especially when dealing with sensitive personal information.
• Data Cleaning: Raw data often requires cleaning and pre-processing before analysis.
• Data Quality: Consider the reliability and accuracy of data sources before using them
in your analysis.