Conservation economics
00:00:33 Your job is to solve real company problems using data, and what kind of tools
you use, we don't care. Now, there's a lot of misconception about data science, especially on
YouTube, and I think the reason for this is that there's a huge misalignment between
what's popular to talk about and what's needed in the industry. So because of that, I want to
make things clear. I am a data scientist working for a GAFA company, and those companies
really emphasize using data to improve their products. So this is my take on what data
science is.
00:01:21 Before data science, the term data mining was popularized in a 1996 article called
"From Data Mining to Knowledge Discovery in Databases," in which it referred to the
overall process of discovering useful information from data. In 2001, William S. Cleveland
wanted to bring data mining to another level. He did that by combining computer science with
data mining. Basically, he made statistics a lot more technical, which he believed would
expand the possibilities of data mining and produce a powerful force for innovation.
00:01:53 Now you can take advantage of compute power for statistics, and he called
this combo data science. Around this time is also when Web 2.0 emerged, where
websites are no longer just a digital pamphlet but a medium for a shared experience
amongst millions and millions of users. These are websites like MySpace in 2003, Facebook
in 2004, and YouTube in 2005. We can now interact with these websites, meaning we can
contribute, post, comment, like, upload, share, leaving our footprint in the digital landscape we
call the Internet, and help create and shape the ecosystem
00:02:30 we now know and love today. And guess what? That's a lot of data. So much
data, it became too much to handle using traditional technologies. So we call this big data.
That opened a world of possibilities in finding insights using data. But it also meant that the
simplest questions require sophisticated data infrastructure just to support the handling of
the data. We needed parallel computing technology like MapReduce, Hadoop, and Spark. So
the rise of big data in 2010 sparked the rise of data science to support the needs of
businesses to draw insights from their massive unstructured data sets.
00:03:07 So then the Journal of Data Science described data science as almost
everything that has something to do with data: collecting, analyzing, modeling. Yet the most
important part is its applications. All sorts of applications. Yes, all sorts of applications, like
machine learning. So in 2010, the new abundance of data made it possible to train
machines with a data-driven approach rather than a knowledge-driven approach. All the
theoretical papers about recurrent neural networks and support vector machines became feasible.
00:03:40 Something that can change the way we live and how we experience things in
the world. Deep learning is no longer an academic concept in these thesis papers. It became a
tangible, useful class of machine learning that would affect our everyday lives. So machine
learning and AI dominated the media, overshadowing every other aspect of data science, like
exploratory analysis, experimentation, ... and skills we traditionally called business
intelligence. So now the general public thinks of data science as researchers focused on
machine learning and AI, but the industry is hiring data scientists as analysts.
00:04:17 So there's a misalignment there. The reason for the misalignment is that, yes,
most of these data scientists can probably work on more technical problems, but big
companies like Google, Facebook, and Netflix have so many low-hanging fruits to improve their
products that they don't require any advanced machine learning or statistical knowledge to
find these impacts in their analysis. Being a good data scientist isn't about how advanced
your models are. It's about how much impact you can have with your work. You're not a data
cruncher. You're a problem solver.
00:04:50 You're a strategist. Companies will give you the most ambiguous and hard
problems, and we expect you to guide the company in the right direction. OK, now I want to
conclude with real-life examples of data science jobs in Silicon Valley. But first I have to print
some charts, so let's go do that. So this is a very useful chart that tells you the needs of data
science. Now, it's pretty obvious, but sometimes we kind of forget about it. At the bottom
of the pyramid we have collect. You obviously have to collect some sort of data to be able to
use that data.
00:05:42 So collecting, storing, transforming: all of this data engineering effort is pretty
important, and it's actually captured pretty well in media because of big data.
We talked about how difficult it is to manage all this data. We talked about parallel computing,
which means things like Hadoop and Spark, stuff like that. We know about this. Now, the thing that's
less known is the stuff in between, which is right here, everything that's here. And surprisingly,
this is actually one of the most important things for companies, because you're trying to tell
the company
00:06:15 what to do with your product. So what do I mean by that? So analytics
tells you, using the data, what kind of insights can tell me what is happening to my
users. And then metrics: this is important because, what's going on with my product? You
know, these metrics will tell you if you're successful or not. And then also, you know, A/B
testing, of course. Experimentation that allows you to know which product versions are the
best. So these things are actually really important, but they're not so covered in media. What's
covered in media
00:06:46 is this part: AI, deep learning. We've heard on and on about it, you know. But
when you think about it, for a company, for the industry, it's actually not the highest priority, or
at least it's not the thing that yields the most result for the lowest amount of effort. That's why
AI and deep learning are on top of the hierarchy of needs, and these things, A/B testing,
analytics, they're actually way more important for industry. So that's why we're hiring a lot of
data scientists who do that. So what do data scientists actually do?
00:07:17 Well, that depends on the company, because of the size. So for a
start-up, you kind of lack resources, so you can only kind of have one DS. So that one data
scientist has to do everything. So you might be seeing all of this being data science.
Maybe you won't be doing AI or deep learning because that's not a priority right now, but you
might be doing all of these. You have to set up the whole data infrastructure. You might even
have to write some software code to add logging, and then you have to do the analytics
00:07:45 yourself, then you have to build the metrics yourself, and you have to do A/B
testing yourself. That's why for startups, if they need a data scientist, this whole thing is data
science, so that means you have to do everything. But let's look at medium-sized
companies. Now, finally, they have a lot more resources. They can separate the data
engineers and the data scientists. So usually in collection, this is probably software
engineering. And then here, you're gonna have data engineers doing this. And then,
depending on whether your medium-sized company does a lot of
00:08:20 recommendation models or stuff that requires AI, then DS will do all these.
Right. So as a data scientist, you have to be a lot more technical. That's why they only hire
people with PhDs or master's degrees, because they want you to be able to do the more complicated
things. So let's talk about large companies now. Because you're getting a lot bigger, you
probably have a lot more money, and then you can spend it more on employees. So you can
have a lot of different employees working on different things. That way the employees do
not need to think about the stuff that they don't want to do, and they can focus on the
things that they're
00:08:52 best at. For example, me at my untitled large company: I would be in
analytics, so I could just focus my work on analytics and metrics and stuff like that. So I don't
need to worry about data engineering or AI and deep learning stuff. So here's how it looks for a
large company. Instrumentation, logging, sensors: this is all handled by software engineers.
Right? And then here, cleaning and building data pipelines: this is for data engineers. Now
here, between these two things, we have data science analytics. That's what it's called.
00:09:29 But then once we go to the AI and deep learning, this is where we have
research scientists, or we call it data science core, and they are backed by ML
engineers, which are machine learning engineers. Yeah. Anyways, so in summary, as you
can see, data science can be all of this, and it depends on what company you are in. And the
definition will vary. So please let me know what you would like to learn more about: AI, deep
learning, or A/B testing, experimentation, ... Depending on what you want to learn about
### 1. **Healthcare**
- **Predictive Analytics:** Predicting disease outbreaks, patient readmissions, and
potential health risks using historical data.
- **Medical Imaging:** Using machine learning to analyze medical images (e.g., X-rays,
MRIs) for early detection of diseases like cancer.
- **Personalized Medicine:** Tailoring treatments based on genetic information and patient
history.
- **Drug Discovery:** Accelerating the development of new drugs by analyzing biological
data.
### 2. **Finance**
- **Fraud Detection:** Identifying fraudulent transactions by analyzing patterns and
anomalies in financial data.
- **Algorithmic Trading:** Using predictive models to make high-frequency trading
decisions.
- **Risk Management:** Assessing and mitigating risks by analyzing market trends and
customer behavior.
- **Credit Scoring:** Evaluating the creditworthiness of individuals and businesses using
historical data.
### 3. **Retail**
- **Customer Segmentation:** Grouping customers based on purchasing behavior to tailor
marketing strategies.
- **Inventory Management:** Optimizing stock levels using demand forecasting models.
- **Recommendation Systems:** Suggesting products to customers based on their
browsing and purchase history (e.g., Amazon, Netflix).
- **Price Optimization:** Dynamically adjusting prices based on demand, competition, and
other factors.
### 4. **Transportation**
- **Route Optimization:** Finding the most efficient routes for delivery and logistics.
- **Autonomous Vehicles:** Using machine learning and computer vision for self-driving
cars.
- **Traffic Management:** Analyzing traffic patterns to reduce congestion and improve
urban planning.
- **Predictive Maintenance:** Monitoring vehicle health to predict and prevent mechanical
failures.
### 5. **Telecommunications**
- **Network Optimization:** Improving network performance by analyzing usage patterns
and predicting congestion.
- **Customer Churn Prediction:** Identifying customers who are likely to switch to
competitors and taking proactive measures to retain them.
- **Sentiment Analysis:** Analyzing customer feedback and social media data to gauge
public sentiment and improve services.
### 6. **Energy**
- **Smart Grids:** Optimizing the distribution and consumption of electricity using real-time
data.
- **Predictive Maintenance:** Monitoring equipment to predict failures and schedule
maintenance.
- **Energy Consumption Forecasting:** Predicting energy demand to optimize production
and reduce waste.
### 7. **Marketing**
- **Campaign Optimization:** Analyzing the effectiveness of marketing campaigns and
adjusting strategies in real-time.
- **Customer Lifetime Value Prediction:** Estimating the long-term value of customers to
prioritize marketing efforts.
- **Sentiment Analysis:** Monitoring social media and customer reviews to understand
public perception of a brand.
### 8. **Sports**
- **Performance Analysis:** Using data to analyze player performance and develop
training programs.
- **Injury Prediction:** Identifying players at risk of injury based on physical data and
playing conditions.
- **Game Strategy:** Analyzing opponent data to develop game strategies and improve
team performance.
### 9. **Government**
- **Public Health:** Tracking and predicting the spread of diseases to implement timely
interventions.
- **Crime Prediction:** Using data to predict crime hotspots and allocate resources
effectively.
- **Urban Planning:** Analyzing data to improve infrastructure, transportation, and public
services.
These applications demonstrate the versatility and impact of data science in solving complex
problems and driving innovation across various sectors.
7. **Basic Statistics**:
Statistics is the foundation of Data Science, covering concepts like probability,
distributions, hypothesis testing, and regression analysis, which are used to interpret data.
8. **Data Munging**:
Data munging (or wrangling) involves cleaning, transforming, and preparing raw data into a
usable format for analysis.
9. **Filtering**:
Filtering is the process of selecting specific subsets of data based on certain conditions or
criteria.
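
To make the idea of filtering concrete, here is a minimal sketch in plain Python. The order records, field names, and the `filter_rows` helper are all made up for illustration; in practice a library such as Pandas would usually do this job, and this is only one way to express the same idea.

```python
# Hypothetical order records; the field names and values are invented for this sketch.
orders = [
    {"order_id": 1, "country": "US", "amount": 120.0},
    {"order_id": 2, "country": "FR", "amount": 35.5},
    {"order_id": 3, "country": "US", "amount": 18.0},
    {"order_id": 4, "country": "DE", "amount": 250.0},
]


def filter_rows(rows, predicate):
    """Return only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


# Condition: orders from the US worth at least 50.
large_us_orders = filter_rows(
    orders, lambda row: row["country"] == "US" and row["amount"] >= 50
)

print([row["order_id"] for row in large_us_orders])
```

Passing the condition as a function keeps the selection logic separate from the data, so the same helper can be reused with other criteria.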
---
### **1. Technical Skills**:
- **Programming Languages**:
  - **Python** or **R** (most commonly used in Data Science).
  - Knowledge of libraries like Pandas, NumPy, Scikit-learn, Matplotlib, and TensorFlow (for
    Python).
- **SQL**:
  - Essential for querying and managing relational databases.
- **Data Wrangling**:
  - Cleaning, transforming, and preparing raw data for analysis using tools like Pandas or
    dplyr (in R).
- **Data Visualization**:
  - Tools like Tableau, Power BI, or libraries like Matplotlib, Seaborn, and Plotly.
- **Big Data Tools**:
  - Familiarity with tools like Hadoop, Spark, or cloud platforms (AWS, Google Cloud, Azure).
---
By mastering these prerequisites, you can build a strong foundation for a career in Data
Science.
---
### **Problem**:
Starbucks wanted to expand its store locations globally while ensuring profitability and
customer satisfaction. They needed to identify the best locations for new stores and
understand customer preferences.
---
2. **Data Analysis**:
   - Using BI tools like **Tableau** and **Microsoft Power BI**, Starbucks analyzed the data
     to:
     - Identify high-traffic areas with potential for new stores.
     - Understand customer preferences (e.g., popular drinks, peak hours).
     - Predict sales performance for new locations.
4. **Decision-Making**:
   - Based on the insights, Starbucks:
     - Opened new stores in high-potential locations.
     - Customized menus to match local preferences.
     - Optimized store layouts and staffing based on peak hours.
---
### **Outcome**:
- Starbucks successfully expanded its global presence while maximizing profitability.
- Improved customer satisfaction by tailoring offerings to local tastes.
- Enhanced operational efficiency through data-driven decisions.
---
This example demonstrates how **Business Intelligence** transforms raw data into
actionable insights, enabling companies like Starbucks to make informed decisions and gain
a competitive edge.
---
2. **Data Transformation**:
   - Converting data types (e.g., strings to numbers, dates to a consistent format).
   - Normalizing or scaling data (e.g., converting values to a standard range).
   - Encoding categorical variables (e.g., one-hot encoding).
3. **Data Integration**:
   - Combining data from multiple sources (e.g., merging datasets).
   - Resolving conflicts or mismatches in data formats.
4. **Data Reduction**:
   - Removing unnecessary columns or rows.
   - Aggregating data (e.g., summarizing sales data by month).
5. **Feature Engineering**:
   - Creating new features from existing data (e.g., calculating age from a birthdate).
   - Extracting useful information (e.g., splitting addresses into city, state, and zip code).
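
The steps listed above can be sketched together on a tiny dataset. This is a minimal illustration in plain Python so it runs without extra libraries; Pandas would be the more usual tool. Every record, field name, and the `age_on` helper is hypothetical, and the reference date is fixed only to keep the example reproducible.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw inputs: every field name and value here is made up.
raw_sales = [
    {"customer_id": "1", "date": "2024-01-15", "amount": "100.50", "channel": "web"},
    {"customer_id": "2", "date": "2024-01-20", "amount": "50.00", "channel": "store"},
    {"customer_id": "1", "date": "2024-02-03", "amount": "25.25", "channel": "web"},
]
customers = {
    1: {"birthdate": date(1990, 6, 1)},
    2: {"birthdate": date(1985, 12, 24)},
}


def age_on(birthdate, today):
    """Feature engineering: whole years elapsed between birthdate and today."""
    before_birthday = (today.month, today.day) < (birthdate.month, birthdate.day)
    return today.year - birthdate.year - before_birthday


today = date(2024, 3, 1)  # fixed reference date (an assumption of this sketch)
channels = sorted({row["channel"] for row in raw_sales})  # categories to encode

clean = []
for row in raw_sales:
    customer_id = int(row["customer_id"])  # transformation: string to integer
    record = {
        "customer_id": customer_id,
        "date": date.fromisoformat(row["date"]),  # transformation: string to date
        "amount": float(row["amount"]),  # transformation: string to float
        # Integration: look up the customer table, then derive the age feature.
        "age": age_on(customers[customer_id]["birthdate"], today),
    }
    for channel in channels:  # one-hot encoding of the categorical column
        record[f"channel_{channel}"] = int(row["channel"] == channel)
    clean.append(record)

# Normalization: min-max scale the amounts into the range [0, 1].
amounts = [record["amount"] for record in clean]
low, high = min(amounts), max(amounts)
for record in clean:
    record["amount_scaled"] = (record["amount"] - low) / (high - low)

# Reduction: aggregate sales by month.
monthly_sales = defaultdict(float)
for record in clean:
    monthly_sales[record["date"].strftime("%Y-%m")] += record["amount"]

print(dict(monthly_sales))
```

Note that the min-max step assumes the amounts are not all identical; real data would need a guard against a zero range.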
---
The **5 V's of Data Science** are key characteristics that define the complexity and
challenges of working with data in the modern world. They are volume, velocity, variety,
veracity, and value.
---