Data Science Notes

The document outlines a 3-month learning plan for Data Science and Machine Learning, detailing weekly topics and resources such as Python, statistics, machine learning algorithms, and deep learning. It includes links to online courses and projects on platforms like EdX, Khan Academy, and Kaggle. Additionally, it emphasizes practical applications, feature engineering, and coding libraries in Python.

Uploaded by

Daniel Wu
Copyright
© All Rights Reserved

Learn Data Science in 3 Months

6/24

Week 1 - Learn Python - EdX https://www.edx.org/course/introducti...


- Siraj Raval https://www.youtube.com/watch?v=T5pRl...

Week 2 - Statistics & Probability - KhanAcademy https://www.khanacademy.org/math/stat...

Week 3 - Data Pre-processing, Data Vis, Exploratory Data Analysis - EdX https://www.edx.org/course/introducti...

Week 4 - Kaggle Project #1

Week 5-6 - Algorithms & Machine Learning - Columbia https://courses.edx.org/courses/cours...

Week 7 - Deep Learning - Part 1 and 2 of DL Book https://www.deeplearningbook.org/


- Siraj Raval https://www.youtube.com/watch?v=vOppz...

Week 8 - Kaggle Project #2

Week 9 - Databases (SQL + NoSQL) - Udacity https://www.udacity.com/course/intro-...

- EdX https://www.edx.org/course/introducti...

Week 10 - Hadoop & Map Reduce + Spark - Udacity https://www.udacity.com/course/intro-...


- Spark Workshop https://stanford.edu/~rezab/sparkclas...

Week 11 - Data Storytelling - EdX https://www.edx.org/course/analytics-...

Week 12 - Kaggle Project #3


Learn Machine Learning in 3 Months
Month 1

Week 1 - Linear Algebra https://ocw.mit.edu/courses/mathemati...

Week 2 - Calculus https://www.youtube.com/playlist?list...

Week 3 - https://www.edx.org/course/introducti...

Week 4 - Algorithms https://www.coursera.org/courses?lang...

Month 2

Week 1 - Learn Python for Data Science https://www.youtube.com/watch?v=T5pRl...

- Math of Intelligence https://www.youtube.com/watch?v=xRJCO...

- Intro to Tensorflow https://www.youtube.com/watch?v=2FmcH...

Week 2 - Intro to ML (Udacity) https://eu.udacity.com/course/intro-t...

Weeks 3-4 - ML Project Ideas https://github.com/NirantK/awesome-pr...

Month 3 (Deep Learning)

Week 1 - Intro to Deep Learning https://www.youtube.com/watch?v=vOppz...

Week 2 - Deep Learning by Fast.AI http://course.fast.ai/

Weeks 3-4 - Re-implement the deep learning projects from my GitHub https://github.com/llSourcell?tab=rep..

- Linear regression
- Logistic regression
- Random forest
- Gradient boosting
- PCA
- k-means clustering
- k nearest neighbors
- Natural language processing (2 sessions)
- Exploratory data analysis
- Python web APIs
- Feature engineering (2 sessions)
- Object-oriented programming
- Forecasting

- Linear regression
- Logistic regression
- SVM
- Random forest
- Gradient boosting
- PCA
- k-means
- Collaborative filtering
- kNN
- ARIMA

Business use case -> Domain expertise

Data gathering from various data sources (balanced vs. unbalanced datasets)
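One quick way to tell a balanced dataset from an unbalanced one is to look at the share of each class in the target column. The sketch below assumes pandas and uses made-up labels; the name `labels` and the 9-to-1 split are hypothetical.

```python
import pandas as pd

# Hypothetical labels: 9 negatives and 1 positive (made-up data for illustration).
labels = pd.Series([0] * 9 + [1], name="target")

# Share of each class; a heavily skewed split signals an unbalanced dataset.
class_share = labels.value_counts(normalize=True)
minority_share = class_share.min()

print(class_share)
print(f"minority class share: {minority_share:.2f}")
```

What counts as "too unbalanced" depends on the problem, so no fixed threshold is suggested here.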

Check whether the data is in the right format: cleansing, wrangling, and exploring (EDA), and decide how to
handle missing values, so the data is better suited to the ML algorithm. (Feature engineering -> also apply
some statistics knowledge to check the mean, median, and mode.)
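As a minimal sketch of this step, assuming pandas and a small made-up table, the mean, median, and mode can be checked first and then used to fill gaps. The column names and the choice of median for numbers and mode for categories are illustrative assumptions, not a rule from these notes.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (values are made up for illustration).
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["NY", "LA", None, "NY"],
})

# Count missing values per column before deciding how to handle them.
missing_per_column = df.isna().sum()

# Summary statistics, computed while ignoring the missing entries.
age_mean = df["age"].mean()
age_median = df["age"].median()
city_mode = df["city"].mode()[0]

# One possible choice: median for the numeric column, mode for the categorical one.
filled = df.copy()
filled["age"] = filled["age"].fillna(age_median)
filled["city"] = filled["city"].fillna(city_mode)

print(missing_per_column)
print(filled)
```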

Feature selection (regression backward elimination, p-value)

Modeling (ML or DL algorithm selection, evaluated by: 1. Accuracy. 2. Confusion matrix. 3. Cross-validation)
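The three evaluation checks can be sketched with scikit-learn. The iris dataset and the logistic regression model are stand-ins chosen for illustration; they are assumptions, not choices made in these notes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 1. Accuracy on the held-out test set.
accuracy = accuracy_score(y_test, y_pred)

# 2. Confusion matrix: rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, y_pred)

# 3. 5-fold cross-validation on the full dataset.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(f"accuracy: {accuracy:.2f}")
print(matrix)
print(f"cross-validation mean accuracy: {cv_scores.mean():.2f}")
```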

Coding library
Python:
The inplace parameter

In pandas, the inplace parameter is commonly used with the following methods:

- dropna()
- drop_duplicates()
- fillna()
- query()
- rename()
- reset_index()
- sort_index()
- sort_values()
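A short sketch of what inplace changes, assuming a small made-up pandas DataFrame: with the default inplace=False a method returns a new object and leaves the original alone, while inplace=True modifies the original and returns None.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"x": [3.0, np.nan, 1.0]})

# Default (inplace=False): a new DataFrame is returned, df is unchanged.
cleaned = df.dropna()

# inplace=True: df itself is reordered and the call returns None.
result = df.sort_values("x", inplace=True)

print(len(cleaned), len(df))
print(result)
print(df)
```

Because the in-place call returns None, assigning its result back to the variable (for example `df = df.sort_values("x", inplace=True)`) would replace the DataFrame with None.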

import itertools
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# Jupyter magic: render plots inline (works only inside a notebook)
%matplotlib inline

# Silence library warnings in the notebook output
import warnings
warnings.filterwarnings("ignore")

plt.style.use('fivethirtyeight')

import statsmodels.api as sm

# Global plot styling
matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

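# Grid search over ARIMA orders (p, d, q).
# p_values, d_values, q_values and the shampoo series are defined elsewhere.
# Uses the current statsmodels ARIMA class; the older arima_model.ARIMA
# took fit(disp=0), which the current class does not accept.
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

for p in p_values:
    for d in d_values:
        for q in q_values:
            order = (p, d, q)
            train, test = list(shampoo[0:25]), list(shampoo[25:36])
            # Assumption: walk-forward validation was intended, so each
            # observed test value is added to the history before refitting.
            history = list(train)
            predictions = list()
            try:
                for i in range(len(test)):
                    model = ARIMA(history, order=order)
                    model_fit = model.fit()
                    pred_y = model_fit.forecast()[0]
                    predictions.append(pred_y)
                    history.append(test[i])
                error = mean_squared_error(test, predictions)
                print('ARIMA%s MSE = %.2f' % (order, error))
            except Exception:
                # Skip orders that fail to fit
                continue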
