Week 1
Angela Lu
Information Systems, CB
City University of Hong Kong
About me…
• Assistant Professor of Information Systems
• PhD in sociology from Stanford University
• Research areas: sharing economy, IT careers, IT innovation,
computational social science, social network analysis
What is data visualization?
• Definition: computer-based visualization systems provide
visual representations of datasets designed to help people
carry out tasks more effectively
Why visualize data?
• Data visualization helps decision-making
• Big data is messy and hard to make sense of
• A graph is worth a thousand words (intuitive to understand)
• Presents details in the data
• Helps formulate interesting questions
• Speeds up the analytical process
Intended goals
• Know ways to visualize and present data, using software
packages such as R, Tableau, and Python
• Know what design is effective, and why
• Extract information from visual presentations; know what
questions to ask
Intended audience
• No prerequisites
• No assumed coding background
• Background in these areas is helpful but not required:
• computer programming, statistics, business analytics
Schedule
• Lectures on Wednesday (S61) and Thursday (S62), 7-9:50 pm
• Semester A 2022/23, 13 weeks
• Materials are posted on Canvas
• Data visualization tools: R, Tableau, Python
• 50 min tutorial, 50 min lab exercises
• No tutorial in the first week; only lab organization and group
sign-up
Modules
• Basics
• Data structure
• Advanced topics
Individual project (40%)
• Similar to a research paper
• Completed individually
• Apply visualization techniques and concepts to address a
real-world issue
• Research question description
• Data description
• Design visuals and interpret insights
• Research and business implications
• Comprehensive written report (5-10 pp.)
Lab exercise (30%)
• 50 min tutorial, 50 min in-class exercises, 10 min break in
between
• Weekly exercises in R and Tableau (w3-w9), completed individually
and submitted on Canvas
• 10-15 min of discussion of the answers at the beginning of the next
tutorial
• Get full credit as long as you submit by the end of the week
• Late submission is only allowed in exceptional circumstances
Group project (20%)
• Group of 5-8 students (self sign-up on Canvas)
• Apply visualization tools to form an analytics report
• Identify a data source (resources on datasets will be provided later)
• Find an interesting problem about the data
• Use R, Python or Tableau for analysis and visualization
• Project milestone report submitted in week 6: 2-3 pp. on data
source, analytical steps, and initial findings
• Project milestone presentation in week 7: 5 minutes per group
• Final presentation: 15 min including Q&A, in the last 2 weeks
Textbooks
• Visualization Analysis and Design (VAD). Tamara Munzner, 2014
• Lectures will follow this
• Python reference:
• François Chollet, 2018. Deep Learning with Python.
Software resources
• R:
• Download: https://www.r-project.org/
• RStudio download: https://www.rstudio.com/products/rstudio/download/
• Tableau Public
• https://public-pantheon.tableau.com/en-us/s/download
• Python:
• Python: pre-installed in macOS
• Anaconda distribution: https://www.anaconda.com/products/distribution
Today’s outline
• What is data visualization, and why use it
• Examples
What is data visualization?
• Definition: computer-based visualization systems provide
visual representations of datasets designed to help people
carry out tasks more effectively
• Visualization is suitable when there is a need to augment human
capabilities
• It does not replace human decision-making, but improves it
• Visualization examples
Why visualize data?
• Advantages of visualization:
• Large datasets are hard to represent by hand
• Visualization is intuitive and represents data more effectively
• Present details otherwise hard to see
• Dynamic trends over time
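The claim that visualization presents details otherwise hard to see can be illustrated with Anscombe's quartet, a classic published dataset that is not part of the original slides. The sketch below is plain Python; the names `QUARTET`, `summarize` and `plot_quartet` are hypothetical. It shows that the four datasets share nearly the same means, variances and correlation, even though their scatter plots look very different. Plotting assumes matplotlib is installed (it is included in the Anaconda distribution).

```python
import math
import statistics

# Anscombe's quartet (F. J. Anscombe, 1973): four small x/y datasets.
# Datasets I-III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize(xs, ys):
    """Summary statistics that hide the differences between the datasets."""
    return {
        "mean_x": statistics.mean(xs),
        "mean_y": statistics.mean(ys),
        "var_x": statistics.variance(xs),  # sample variance
        "var_y": statistics.variance(ys),
        "corr": pearson(xs, ys),
    }


def plot_quartet():
    """Scatter-plot the four datasets side by side (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
    for ax, (name, (xs, ys)) in zip(axes.flat, QUARTET.items()):
        ax.scatter(xs, ys)
        ax.set_title(f"Dataset {name}")
    fig.tight_layout()
    plt.show()


if __name__ == "__main__":
    # Print rounded statistics: all four rows look almost the same.
    for name, (xs, ys) in QUARTET.items():
        stats = summarize(xs, ys)
        print(name, {k: round(v, 2) for k, v in stats.items()})
```

Running the script prints nearly identical statistics for all four datasets; calling `plot_quartet()` reveals what the numbers hide: a roughly linear trend, a curve, a line with one outlier, and a single influential point.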
Gene interaction: messy big data
Computer-based visualization systems provide visual representations of datasets
designed to help people carry out tasks more effectively.
Search space metaphor for vis design
Which design is better?
A: Information density is low
Which design is better?
• Main concern: effectiveness of visualization design for
particular tasks
Next …
• Lab organization
Lab Organization
• Provide hands-on experience of data visualization
• Try concepts and materials covered in lectures
• Use software packages, such as R, Tableau, and Python
• Lab components…
Lab Components
• 50 min tutorial
• 50 min exercises
• Submit exercises on Canvas before midnight on Sunday of the
same week
• Group discussion of exercises is encouraged
• No tutorial this week, only demos and group sign-up
Lab exercises
• Small exercises in R and Tableau (w3-w9)
• Get full credit as long as you submit before the deadline, typically
Sunday of the same week
• Late submission is only allowed in exceptional circumstances;
email us in advance
Visualization software for the lab
• Tableau Public: https://www.tableau.com/products/public/download
• R: https://www.r-project.org/
• RStudio: https://www.rstudio.com/products/rstudio/download/
• Python software:
• Python: pre-installed in macOS
• Anaconda distribution: https://www.anaconda.com/products/distribution
• Today:
• Software demos
• Software installation
• Group sign-up
What is Tableau?
• Tableau: one of the fastest-growing business analytics and data
visualization tools; makes it easy to share analytics in the cloud
• Good for fast analytics
• Handles big data; built-in calculations
• Live data update and connection
• Little programming required; drag and drop visualization
• A short demo …
• https://www.youtube.com/watch?v=YfE9jBq002s&ab_channel=TableauSoftware
Tableau Public
• Tableau Public is installed in the lab; it is the free version and great for sharing
• Download: https://www.tableau.com/products/public/download
• Features of Tableau Public…
• Connectivity: can connect to Excel and text files; cannot connect to
databases
• Security: cannot save workbooks privately
• Anything you save is saved on the Tableau Public server (no confidentiality)
• Data limit: 10 million rows in a single connection
• Account space: 10 GB
• Check out cool viz examples:
• https://public.tableau.com/en-gb/gallery/?tab=viz-of-the-day&type=viz-of-the-day
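Given the 10 million row limit listed above, one might check a large text file before connecting it to Tableau Public. This is a minimal sketch, assuming the data is a comma-separated file with a single header row; the names `count_records` and `csv_fits_tableau_public` are hypothetical, and the limit should be confirmed against Tableau's current documentation.

```python
import csv

# Row limit per connection as stated on the slide (assumption: may change over time)
TABLEAU_PUBLIC_ROW_LIMIT = 10_000_000


def count_records(lines, has_header=True):
    """Count CSV records in an iterable of text lines (e.g. an open file).

    csv.reader is used instead of counting newlines, so quoted fields
    that contain line breaks are counted as a single record.
    """
    n = sum(1 for _ in csv.reader(lines))
    if has_header and n > 0:
        n -= 1  # do not count the header row
    return n


def csv_fits_tableau_public(path, has_header=True, limit=TABLEAU_PUBLIC_ROW_LIMIT):
    """Return True if the CSV file at `path` has at most `limit` data rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return count_records(f, has_header) <= limit
```

For example, `csv_fits_tableau_public("sales.csv")` (a hypothetical file name) streams the file line by line, so even a file larger than memory can be checked.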
R visualization
• Data tables
• Spatial data & maps
• Networks
What is R?
• R is a statistical programming language for data analysis
• Powerful, free, similar to MATLAB
• You can do almost anything in R, not just data visualization
• Programming required
• Popular IDE (integrated development environment): RStudio
R download and installation
• Download R first
• R for Windows:
• https://cran.r-project.org/bin/windows/base/
• R for Mac OS:
• https://cran.r-project.org/bin/macosx/
• Download RStudio:
• https://www.rstudio.com/products/rstudio/download/
ggplot2 examples
Python Visualization
Python Installation
• Python: pre-installed in macOS
• Python for Windows:
• https://www.python.org/downloads/windows/
• Anaconda distribution:
https://www.anaconda.com/products/distribution
Installing Jupyter Notebook: Easy Way
• Prerequisite: Python
• While Jupyter runs code in many programming
languages, Python is a requirement for installing the Jupyter Notebook
(current releases need Python 3; Python 2.7 is no longer supported)
• We recommend using the Anaconda distribution to install Python
and Jupyter
• Download Anaconda:
https://www.anaconda.com/products/distribution
• See example notebook on Canvas
• YouTube demo: https://www.youtube.com/watch?v=a9UrKTVEeZA
Alternative: Pip Install
• Open a terminal (Mac or Linux) or the command prompt (Windows)
• First install pip (if it is not already installed)
• curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
• python get-pip.py
• Then install JupyterLab and Jupyter Notebook (Python itself must already be installed; it cannot be installed with pip)
• pip install jupyterlab
• pip install notebook
• Open Jupyter Notebook
• jupyter notebook
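After installation, a short Python check can confirm the environment before opening a notebook. This is an illustrative sketch, not an official Jupyter tool: the function name `check_setup` is hypothetical, and it uses only the standard library (`sys.version_info` and `importlib.util.find_spec`) to report the running Python version and whether the `notebook` and `jupyterlab` packages can be imported.

```python
import importlib.util
import sys


def check_setup(packages=("notebook", "jupyterlab")):
    """Return the running Python version and whether each package is importable."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_setup().items():
        print(f"{key}: {value}")
```

Save it as a file and run it with the same `python` used for `pip install`; if a package shows `False`, pip most likely installed it into a different Python environment.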
Group Sign-up
• Form groups of 5-8 students for group project
• Sign up on Canvas
Questions?
• Contact:
• Angela Lu: qinglilu@cityu.edu.hk
• Jingwen Pan (TA): jingwepan4-c@my.cityu.edu.hk
• Subject line: include “IS6335”