Dr Gao's Resources


Kaggle Competitions: https://www.kaggle.com/

The Data Science Handbook: http://www.thedatasciencehandbook.com

Python Data Science Handbook: https://pacificatclassic.pacific.edu/record=b1823201~S1

Source Code: https://github.com/jakevdp/PythonDataScienceHandbook

Data Science Salary Survey: https://www.oreilly.com/ideas/2015-data-science-salary-survey?cmp=pe-data-free-lp-lgen_dsc_email_oct_6

Freakonomics series:
https://www.amazon.com/Steven-D.-Levitt/e/B001IGV3MY/ref=dp_byline_cont_book_1

5 Resources to Inspire Your Next Data Science Project: https://towardsdatascience.com/5-resources-to-inspire-your-next-data-science-project-ea6afbe20319

Platforms:
RapidMiner: https://rapidminer.com/
Orange: https://orange.biolab.si

Public Data Sets:


• Amazon: http://aws.amazon.com/publicdatasets/
• Google: http://www.google.com/publicdata/directory
• Microsoft: https://datamarket.azure.com/
• DataMarket: http://datamarket.com/?1
• DataBib: http://databib.org/
• Open Access Data: http://oad.simmons.edu/oadwiki/Data_repositories
• USA: http://data.gov
• Canada: http://www.data.gc.ca/default.asp?lang=En&n=9BFA2E85-1
• UK: http://data.gov.uk/
For a more complete list, please visit http://www.kdnuggets.com/datasets/
Big data sets available for free: http://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free?utm_content=buffera413f&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer

Suggested Readings:
• Big Data Landscape: http://www.bigdatalandscape.com
• 7 Important Data Science Papers: http://datascience101.wordpress.com/2013/08/26/7-important-data-science-papers/
• 38 Seminal Articles Every Data Scientist Should Read: http://www.datasciencecentral.com/profiles/blogs/30-seminal-articles-every-data-scientist-should-read
• Must Read Before Attending Any Data Science Interview: http://www.datasciencecentral.com/group/resources/forum/topics/must-read-before-attending-any-data-science-interview
• 39 Data Visualization Tools for Big Data: http://blog.profitbricks.com/39-data-visualization-tools-for-big-data/
• Big Data Visualization: Turning Big Data Into Big Insights: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/big-data-visualization-turning-big-data-into-big-insights.pdf

Movies:
 1984 http://www.imdb.com/title/tt0087803/
 Moneyball http://www.imdb.com/title/tt1210166/?ref_=nv_sr_1
 War and Peace http://www.imdb.com/title/tt0049934/?ref_=fn_al_tt_1

Data Science Tools:


• AnalyticTalent
• ArcGIS: a geographic information system (GIS) for working with maps and geographic information. It is used for creating and using maps; compiling geographic data; analyzing mapped information; sharing and discovering geographic information; using maps and geographic information in a range of applications; and managing geographic information in a database. (Wikipedia)
• Asterdata: The SQL-MapReduce database and step-by-step tutorials can be downloaded here: http://www.asterdata.com/downloads/
• Cloudera
• ELK Stack: Logstash + Elasticsearch + Kibana, for easy, customizable log analysis, parsing, etc.
• EMC
• Gephi: open source graph visualization and manipulation software, www.gephi.org
• GingerBrain: an upcoming predictive and statistical analytics platform in the cloud
• Greenplum
• Hadoop
• IBM SPSS Modeler
• IBM SPSS Statistics
• Infinite Insight: KXEN's predictive modeling suite, which focuses on ease of use and rapid deployment of predictive models. Infinite Insight has been deployed mostly in the Communications, Financial Services, e-Business and Retail industries.
• Kaggle
• KNIME: A user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting. The open (source) platform provides a home for over 1000 modules, including those of the KNIME community.
• Lavastorm: Lavastorm's analytic software enables financial and operational teams to tackle high-value, complex financial and operational analytic problems in a new way, delivering actionable insights and results faster and at a significantly lower cost than traditional business intelligence solutions.
• MarkLogic
• R
• RapidMiner
• Revolution Analytics
• Oracle Data Miner
• Oracle R Enterprise
• ParaView
• Pervasive: End-to-end data load, transformation, preparation and/or analytics that executes natively on Hadoop clusters at impossible speeds. Pervasive offers a suite of extensible big data and analytics software solutions that save development and deployment time while conserving hardware dollars.
• PixieDust: https://github.com/pixiedust/pixiedust
• Salford Systems
• SAS
• SAS Enterprise Miner
• Splunk
• SPSS: Product family of market leading data mining, statistical analysis, text analytics, survey analysis and data collection tools. SPSS was acquired by IBM in 2009.
• Statistica: Data mining, text mining and statistical analysis package developed and marketed by StatSoft. Its deep integration with the Wintel platform translates to rapid speeds in analytic data processing.
• Tableau: https://www.tableau.com/
• Tibco
• Teradata
• UVCDAT (Ultrascale Visualization Climate Data Analysis Tools): 3D, 2D, 1D visualizations, customizable filters, Python scripting allowed, ASCII conversion tool.
• Vertica
• VisIt
• Weka
• Zementis
