Week 1

The document provides an overview of a course on data visualization, including the instructor's background, definitions and purposes of data visualization, intended goals and audience for the course, topics to be covered in each module, assignments and grading criteria, and an outline of the first class. The course aims to teach students effective methods for visualizing and presenting data using software like R, Tableau, and Python.

Data Visualization

Angela Lu

Week 1
Information Systems, CB
City University of Hong Kong
About me…
• Assistant Professor of Information Systems
• PhD in sociology from Stanford University
• Research areas: sharing economy, IT career, IT innovation,
computational social science, social network analysis

• Prior background in data visualization
• Designed and taught the undergraduate version of this course
(IS4335)

What is data visualization?
• Definition: computer-based visualization systems provide
visual representations of datasets designed to help people
carry out tasks more effectively

• Visualization is suitable when there is a need to augment
human capabilities rather than replace people with
computational decision-making methods

Why visualize data?
• Data visualization helps decision making
• Big data is messy and hard to make sense of
• A graph is worth 1000 words (intuitive to understand)
• Presents details in the data
• Helps formulate interesting questions
• Speeds up the analytical process

Intended goals
• Know ways to visualize and present data, using software
packages such as R, Tableau, and Python
• Know what design is effective, and why
• Extract information from visual presentations; know what
questions to ask

Intended audience
• No prerequisites
• No assumed coding background
• Many areas are helpful but not required: computer programming,
statistics, business analytics
• Open to master’s students who want to pursue business
analytics and information systems as a career

Schedule
• Lectures on Wednesday (S61) and Thursday (S62), 7-9:50 pm
• Semester A 2022/23, 13 weeks
• Materials are posted on Canvas
• Data visualization tools: R, Tableau, Python
• 50 min tutorial, 50 min lab exercises
• No tutorial during the first week, only lab organization and group
sign-up

Modules
• Basics: Design analysis framework (w2); Marks and channels (w3)
• Data structure: Tables (w4-5); Spatial data (w6); Networks (w8-9)
• Advanced topics: Computer vision (w10); Intro to deep learning (w11)

Assignments & grading
• Individual project, 40%
• Lab exercises, 30%
• Submit exercises from week 3 to week 9
• Group project, 20%
• Class participation, 10%
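
As a worked example of how the four weights above combine into a final score, here is a minimal Python sketch. The weights come from the slide; the function name `final_score`, the 0-100 scale, and the student scores are hypothetical and only for illustration.

```python
# Component weights in percent, taken from the "Assignments & grading" slide.
WEIGHTS = {
    "individual_project": 40,
    "lab_exercises": 30,
    "group_project": 20,
    "class_participation": 10,
}


def final_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    # Integer weights are divided by 100 only once, at the end.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student (not real data).
example = {
    "individual_project": 80,
    "lab_exercises": 100,
    "group_project": 70,
    "class_participation": 90,
}
print(final_score(example))  # (40*80 + 30*100 + 20*70 + 10*90) / 100 = 85.0
```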

Individual project (40%)
• Similar to a research paper
• Completed individually
• Apply visualization techniques and concepts to address a
real-world issue
• Research question description
• Data description
• Design visuals and interpret insights
• Research and business implications
• Comprehensive written report (5-10 pp.)

Lab exercise (30%)
• 50 min tutorial, 50 min in-class exercises, 10 min break in
between
• Weekly exercises in R and Tableau (w3-w9), complete individually
and submit on Canvas
• 10-15 min of discussion of the answers at the beginning of the next
tutorial
• Get full credit as long as you submit by the end of the week
• Late submission is only allowed in exceptional circumstances

Group project (20%)
• Group of 5-8 students (self sign-up on Canvas)
• Apply visualization tools to form an analytics report
• Identify a data source (resources on datasets will be provided later)
• Find an interesting problem about the data
• Use R, Python or Tableau for analysis and visualization
• Project milestone report submitted in week 6: 2-3pp on data
source, analytical steps, and initial findings
• Project milestone presentation in week 7: 5 minutes each group
• Final presentation: 15 min including Q&A, in the last 2 weeks

Textbooks
• Visualization Analysis and Design (VAD). Tamara Munzner, 2014
• Lectures will follow this
• R and Tableau references:
• R: Data Analysis and Visualization. Tony Fischetti and Brett Lantz, 2016
• Tableau Your Data!: Fast and Easy Visual Analysis with Tableau
Software. Daniel G. Murray, 2016
• Python reference:
• Deep Learning with Python. François Chollet, 2018

Software resources
• R:
• Download: https://www.r-project.org/
• RStudio download: https://www.rstudio.com/products/rstudio/download/

• Tableau Public
• https://public-pantheon.tableau.com/en-us/s/download

• Python:
• Python: older macOS versions shipped with Python 2.7; recent versions no longer bundle it, so install it separately
• Anaconda distribution: https://www.anaconda.com/products/distribution

Today’s outline
• What is data visualization, and why use it
• Examples

What is data visualization?
• Definition: computer-based visualization systems provide
visual representations of datasets designed to help people
carry out tasks more effectively
• Visualization is suitable when there is a need to augment human
capabilities
• It does not replace human decision making, but makes it better
• Visualization examples

Why visualize data?
• The advantage of visualization
• Large datasets are hard to represent by hand
• Visualization is intuitive; more effective representation
• Present details otherwise hard to see
• Dynamic trends over time

Gene interaction: messy big data

Why show details? Statistical properties can be misleading…
Computer-based visualization systems provide visual representations of datasets
designed to help people carry out tasks more effectively.
• Summaries lose information; details matter
• Confirm expected and find unexpected patterns
• Assess validity of statistical model
Anscombe’s Quartet [Anscombe, 1973]
Identical statistics:
x mean 9
x variance 10
y mean 7.5
y variance 3.75
x/y correlation 0.816
• Same Stats, Different Graphs: https://www.youtube.com/watch?v=DbJyPELmhJc
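
To make the Anscombe's Quartet point concrete, the sketch below recomputes the summary statistics for the four datasets, using Anscombe's published 1973 values and only the Python standard library. It uses the population variance (dividing by n), which gives the 10 and 3.75 quoted above; the sample variance (dividing by n - 1) would give 11 and about 4.13 instead. The helper names `pearson` and `summarize` are illustrative.

```python
from statistics import mean, pvariance

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the five summary statistics listed on the slide."""
    return {
        "x mean": mean(xs),
        "x variance": pvariance(xs),  # population variance (divide by n)
        "y mean": mean(ys),
        "y variance": pvariance(ys),
        "x/y correlation": pearson(xs, ys),
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four rows print nearly the same numbers, yet scatter plots of the four datasets look completely different, which is why the summaries alone are not enough.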
What design is better?
• Main concern: effectiveness of visualization design for
particular tasks

• Why are some designs more effective than others?

Search space metaphor for vis design

Which design is better?

A: Information density is low
B: Information is compact; spatial position hard to read
C: Good tradeoff between density and spatial position

What design is better?
• Main concern: effectiveness of visualization design for
particular tasks

• Why are some designs more effective than others?

• Is there a way to measure design effectiveness? Introducing
an analytical framework next time…

Next …
• Lab organization

Lab Organization
• Provide hands-on experience of data visualization
• Try concepts and materials covered in lectures
• Use software packages, such as R, Tableau, and Python

• Lab components…

Lab Components
• 50 min tutorial
• 50 min exercises
• Submit exercises on Canvas before Sunday midnight of the
same week
• Group discussion of exercises is encouraged
• No tutorial this week, only demos and group sign-up

Lab exercises
• Small exercises in R and Tableau (w3-w9)
• Get full credit as long as you submit before the deadline, typically
Sunday the same week
• Late submission is only allowed in exceptional circumstances;
email us in advance

Visualization software for the lab
• Tableau Public: https://www.tableau.com/products/public/download
• R: https://www.r-project.org/
• RStudio: https://www.rstudio.com/products/rstudio/download/
• Python Software
• Python: older macOS versions shipped with Python 2.7; recent versions no longer bundle it, so install it separately
• Anaconda distribution: https://www.anaconda.com/products/distribution

• Today:
• Software demos
• Software installation
• Group sign-up

What is Tableau?
• Tableau: one of the fastest-growing business analytics and data
visualization tools; easy to share analytics in the cloud
• Good for fast analytics
• Big data; built-in calculations
• Live data update and connection
• Little programming required; drag and drop visualization

• A short demo …
• https://www.youtube.com/watch?v=YfE9jBq002s&ab_channel=TableauSoftware

Tableau Public
• Tableau Public installed in lab, free version, great for sharing
• Download: https://www.tableau.com/products/public/download
• Features of Tableau Public…
• Connectivity: Can connect to Excel and txt files; cannot connect to
databases
• Security: Cannot save your workbook privately
• Anything you save is saved on the Tableau Public server (no confidentiality)
• Data limit: limited to 10 million rows in a single connection
• Account space: 10 GB
• Check out cool viz examples:
• https://public.tableau.com/en-gb/gallery/?tab=viz-of-the-day&type=viz-of-the-day
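
Before loading a text file into Tableau Public, it can help to check its size against the row limit mentioned above. The following is a rough Python sketch, assuming a CSV file with a single header row. The constant `ROW_LIMIT` simply restates the figure from the slide (Tableau's current limits may differ), and `count_data_rows` and the sample data are hypothetical.

```python
import csv
import io

# Row limit as quoted on the slide; an assumption, check Tableau's current documentation.
ROW_LIMIT = 10_000_000


def count_data_rows(handle):
    """Count the rows of a CSV file object, excluding one header row."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header if the file is not empty
    return sum(1 for _ in reader)


# Tiny in-memory stand-in for a real file (made-up values).
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")
rows = count_data_rows(sample)
print(rows, "data rows;", "within" if rows <= ROW_LIMIT else "over", "the limit")
```

For a file on disk, the same function can be called as `count_data_rows(open(path, newline=""))`, where `path` is the location of your own dataset.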


R visualization
• Data table
• Spatial & map
• Network

What is R?
• R is a statistical programming language for data analysis
• Powerful, free, similar to MATLAB
• You can do almost anything in R, in addition to data visualization
• Programming required
• Popular IDE (integrated development environment): RStudio

R download and installation
• Download R first
• R for Windows:
• https://cran.r-project.org/bin/windows/base/
• R for Mac OS:
• https://cran.r-project.org/bin/macosx/

• Download RStudio:
• https://www.rstudio.com/products/rstudio/download/

ggplot2 examples


Python Visualization

Python Installation
• Python: older macOS versions shipped with Python 2.7; recent versions no longer bundle it, so install it separately
• Python for Windows:
• https://www.python.org/downloads/windows/
• Anaconda distribution:
https://www.anaconda.com/products/distribution

• Basic libraries: numpy, matplotlib, ggplot, plotly, seaborn …

Installing Jupyter Notebook: Easy Way
• Prerequisite: Python
• While Jupyter runs code in many programming
languages, Python is a requirement for installing the Jupyter
Notebook (Python 3.3 or greater, or Python 2.7, according to older
Jupyter documentation; current releases require Python 3)
• We recommend using the Anaconda distribution to install Python
and Jupyter
• Download Anaconda:
https://www.anaconda.com/products/distribution
• See example notebook on Canvas
• Youtube demo: https://www.youtube.com/watch?v=a9UrKTVEeZA
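
To see which Python version is actually installed before setting up Jupyter, a quick check like the one below may help. It is a minimal sketch that tests only the version numbers quoted in the prerequisite above; newer Jupyter releases may need a more recent Python 3. The function name `meets_slide_requirement` is hypothetical.

```python
import sys

# Minimum Python 3 version quoted on the slide (Python 2.7 is the other stated option).
MIN_PY3 = (3, 3)


def meets_slide_requirement(version_info=sys.version_info):
    """Return True if the version is Python 2.7, or Python 3.3 or greater."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    return (major, minor) >= MIN_PY3


status = "meets" if meets_slide_requirement() else "does not meet"
print("Python", sys.version.split()[0], status, "the stated requirement")
```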

Alternative: Pip Install
• Open terminal (if using Mac or Linux), or command line in
Windows
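• First install Python (see Python Installation); pip cannot install
Python itself
• Then install pip, if it did not come with your Python installation
• curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
• python get-pip.py
• Then install JupyterLab & Jupyter Notebook
• pip install jupyterlab
• pip install notebook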
• Open Jupyter Notebook
• jupyter notebook
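
After the pip steps above, one way to confirm that the packages landed in the Python environment you are using is to ask Python whether it can locate them. This is a small sketch using the standard library's `importlib.util.find_spec`; the helper name `installed` is illustrative.

```python
import importlib.util

# Packages installed by the pip commands above; for both of these,
# the import name is the same as the name given to pip.
PACKAGES = ["jupyterlab", "notebook"]


def installed(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


for name in PACKAGES:
    if installed(name):
        print(name, "found")
    else:
        print(name, "missing; try: pip install", name)
```

If a package shows as missing even though pip reported success, pip and the `python` command may be pointing at different installations.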

Group Sign-up
• Form groups of 5-8 students for group project
• Sign up on Canvas

Questions?
• Contact:
• Angela Lu: qinglilu@cityu.edu.hk
• Jingwen Pan (TA): jingwepan4-c@my.cityu.edu.hk
• Subject line: include “IS6335”

• Office hours: Monday 3-4pm, Lau 6-270
