Statistics and Data Analytics Cheat Sheets

This document provides a cheat sheet on analytics concepts and tools. It defines analytics, describes the analytics lifecycle process (CRISP-DM), and lists common job titles in analytics such as business analyst, data analyst, and data scientist. It also defines big data using the three V's: volume, velocity, and variety. Key software tools mentioned include Microsoft Excel, Tableau, Python, R, SQL, and databases for exploring, visualizing, and building models with data.

Uploaded by

Giova Rossi
Copyright © All Rights Reserved

(Cheat Sheet)

Data Types:
- Attribute Data – Qualitative:
* Text Data – e.g. yes/no, pass/fail, approve/reject…
- Variable Data – Quantitative:
* Discrete – counted numbers – e.g. # of defects (74), # of customer returns (13)
* Continuous – decimal numbers – e.g. time (12:24:59), money ($17.4354), pressure (25.44534 lbs.)
Hypothesis Testing:
- Helps answer: “Is the sample a fair representation of the population?”
Hypotheses:
- Null Hypothesis (Ho) – assumes NO difference (the same); not rejected when p-value > 0.05
- Alternative Hypothesis (Ha) – states there is a difference; supported when p-value < 0.05
Types of Statistics:
- Descriptive Stats – Used to describe and summarize data.
- Inferential Stats – Drawing conclusions about a population, when sample data is used.
* As we gather data, we work with samples.
* We need confidence that our sample represents the population.
Tests for Normal Data (“t-tests”):
- 1-Sample t-Test – study one sample’s mean against a target.
- 2-Sample t-Test – study means from two different samples.
- ANOVA Test – study means from more than two samples.
- Paired t-Test – study paired data (e.g. same part before/after improvement).
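
As a rough illustration of the 1-sample t-test idea (one sample's mean against a target), the sketch below computes only the t statistic using Python's standard library. The function name `one_sample_t` and the fill-weight data are hypothetical. Turning the statistic into a p-value needs the t distribution, which a library routine such as `scipy.stats.ttest_1samp` provides.

```python
import math
import statistics


def one_sample_t(sample, target):
    """Return (t statistic, degrees of freedom) for a 1-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample standard deviation (n - 1 divisor)
    std_error = s / math.sqrt(n)       # standard error of the mean
    t_stat = (mean - target) / std_error
    return t_stat, n - 1


# Hypothetical fill weights (grams) compared against a 500 g target.
weights = [498.2, 501.1, 499.5, 500.8, 497.9, 500.3, 499.0, 501.4]
t_stat, dof = one_sample_t(weights, target=500.0)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

A t statistic close to zero means the sample mean sits close to the target relative to the sample's noise, so there is little evidence against the null hypothesis.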
Measures of Central Tendency:
- Mean – The average.
- Median – The middle value.
- Mode – The most frequently occurring value.
- Trimmed Mean – A compromise between the mean and median; drops a share of the smallest and largest values, then averages.
Normal vs Non-Normal Data:
- Hypothesis tests with NORMAL data use the mean for central tendency.
- Hypothesis tests with NON-NORMAL data use the median for central tendency.
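
The measures of central tendency can be compared on a small data set. This is a minimal sketch using Python's `statistics` module. The `trimmed_mean` helper and the cycle-time data are hypothetical, and the 10% trim per tail is just one possible choice.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Average after dropping `proportion` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)   # number of values cut from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return statistics.mean(kept)


# Hypothetical cycle times (minutes); 60 is an outlier.
times = [12, 14, 14, 15, 16, 17, 18, 19, 21, 60]

print("mean:        ", statistics.mean(times))     # 20.6, pulled up by the outlier
print("median:      ", statistics.median(times))   # 16.5
print("mode:        ", statistics.mode(times))     # 14
print("trimmed mean:", trimmed_mean(times, 0.1))   # 16.75
```

The single outlier moves the mean well away from the median, which is one reason the median is preferred for skewed, non-normal data.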
Measures of Variation:
- Range – Difference between the largest and smallest value.
- Interquartile Range – Difference between the 75th and 25th percentile.
- Standard Deviation – Typical deviation of values from the mean (square root of the variance).
- Variance – Average squared deviation of values from the mean.
Experimental Design:
- Shows the cause and effect relationship between X and Y.
- Helps determine the proper settings (levels) for our inputs (X) in order to optimize our output (Y).
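
The measures of variation (range, interquartile range, standard deviation, variance) can be sketched with Python's `statistics` module. The data values are hypothetical. Quartile values depend on the interpolation method, so the interquartile range may differ slightly between tools.

```python
import statistics

values = [4, 8, 15, 16, 23, 42]   # hypothetical measurements

value_range = max(values) - min(values)

# Quartile cut points; statistics.quantiles defaults to the "exclusive" method.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Population forms: the average squared deviation from the mean.
variance = statistics.pvariance(values)
std_dev = statistics.pstdev(values)          # square root of the variance

# Sample form divides by n - 1 instead of n (usual when data is a sample).
sample_variance = statistics.variance(values)

print("range:", value_range)
print("IQR:", iqr)
print("variance:", variance, "std dev:", std_dev)
print("sample variance:", sample_variance)
```

The population and sample forms differ only in the divisor; with samples, the n - 1 form is the common default in statistics software.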
Basic Graphs:
- Histogram – shows central tendency and variation within a single distribution.
- Dotplot – similar to a histogram, but shows each value as an individual point.
- Boxplot – shows central tendency and variation within several distributions, not just one.
- Time-Series Plot – shows critical quality measurements over time.
- Scatterplot – shows the relationship between two variables.
Data Measurement Scales:
- Nominal – Cannot be ordered; no arithmetic can be performed. e.g. city (Detroit, Cleveland, Seattle).
- Ordinal – Can be ordered; differences between values meaningless. e.g. taste (bad, okay, good).
- Interval – Can be ordered; differences between values meaningful (not ratios). e.g. temp (0°, 10°, 20°).
- Ratio – Can be ordered; ratios meaningful; zero indicates an absence. e.g. weight (0kg, 25kg, 50kg).
Key Terminology:
- Factors (x) – The independent variables being used (e.g. temperature).
- Levels – The various settings for the factors (e.g. 300°, 500°).
- Run – A set of experimental conditions. (Experiments have multiple.)
- Response (y) – The result from an experimental run (e.g. material strength).
- Replication – The repetition of experimental runs. (Challenges the result.)
Common Types of Experiments:
- Full Factorials – use 2-5 input variables with all combinations of levels (or settings).
- Fractional Factorials – use 4-15 input variables and a fraction of combinations.
General Notation for Full Factorial Design (2^k):
- k = # of input variables
- 2 = # of levels used for each factor
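
A 2^k full factorial design lists every combination of two levels across k factors. The sketch below enumerates the runs with `itertools.product`. The function name `full_factorial` and the factor names and level values are hypothetical examples.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of run dicts."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical factors with two levels each, so k = 3.
factors = {
    "temperature": [300, 500],
    "pressure": [20, 40],
    "time": [10, 30],
}

runs = full_factorial(factors)
print(len(runs))          # 2**3 = 8 runs
for run in runs:
    print(run)
```

The run count doubles with each added factor, which is why full factorials are usually kept to a handful of inputs and fractional factorials are used for larger sets.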
Types of Sampling & Measurement Errors:
- Sampling Error – Differences among samples drawn at random (“luck of the draw”).
- Sampling Bias – A lack of random samples (e.g. height of basketball players only).
- Measurement Error – Issues with our measurement systems.
- Measurement Invalidity – Not measuring what is intended (e.g. temperature near a furnace).
Principles of Good Experimental Design:
- Randomization of runs to remove bias and spread noise.
- Replication of the experiment to challenge or strengthen the validity of results.
- Monitoring of noise.
- Holding other factors constant. (Those that are not a focus of the experiment.)
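
Randomization and replication of experimental runs can be sketched in a few lines. This is only an illustration: the function name `randomized_run_order`, the condition labels, and the choice of two replicates are assumptions, and the seed is there only to make the shuffled order reproducible.

```python
import random


def randomized_run_order(conditions, replicates=2, seed=None):
    """Repeat each condition `replicates` times, then shuffle the run order."""
    rng = random.Random(seed)            # private generator; seed is optional
    runs = [cond for cond in conditions for _ in range(replicates)]
    rng.shuffle(runs)                    # randomize order to spread out noise
    return runs


# Hypothetical experimental conditions (temperature / pressure setting).
conditions = ["300/low", "300/high", "500/low", "500/high"]
order = randomized_run_order(conditions, replicates=2, seed=42)

for position, condition in enumerate(order, start=1):
    print(position, condition)
```

Running conditions in shuffled order keeps slow drifts (tool wear, ambient temperature) from lining up with any one factor setting.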
(Cheat Sheet)

What is Analytics?
[Diagram; recoverable labels: data, business, insight]

Microsoft Excel
Allows you to explore/analyze smaller data sets
Tableau Desktop (or Power BI)
Allows you to visualize your data with dashboards
Python Language (or R)
Allows you to build models to make predictions
Structured Query Language (SQL)
Allows you to communicate and interact with databases

Types of Analytics
o Descriptive Analytics: What happened?
o Predictive Analytics: What might happen?
o Prescriptive Analytics: What should we do?

Lifecycle of Analytics (“CRISP-DM”)
o Business Understanding: Define the business problem.
o Data Understanding: Identify available data and gaps in data.
o Data Preparation: Clean and prepare the data.
o Modeling: Build predictive models.
o Evaluation: Evaluate how the models perform.
o Deployment: Start using the chosen model.
Common Job Titles
o Business Analyst
o Business Intel. Analyst
o Analytics Manager
o Data Analyst
o Data Scientist *
o …
* Most people feel this job is more technical.

Big data is so large that “it requires the use of new technical architectures … to enable insights that unlock new sources of business value.” (McKinsey)


3 V’s of Big Data (Defining Characteristics)
o Volume
o Velocity
o Variety

Most job postings ask about software, so:
o Select a tool from above.
o Download a free trial.
o Get a pizza!
o Spend a weekend to learn.
o State you have “Experience with…” on your resume.
