01 - CM2015 - Introduction To Data Programming (2022-10)


CM2015 – Programming with Data [SIM – UOL]

Topic 1
Introduction to Data Programming

Learning Outcomes
After completing this topic and the recommended reading, you should be able
to:
• Set up and run Jupyter Notebook on a Windows, Mac or Linux operating
system.
• Use Jupyter Notebook to write and edit code.
• Write and explain simple Python programs using variables and
mathematical operators.

Prepared by Koh Chung Haur @ 2022 (version 2022.1) 1



1. Introduction to Data Programming

Data (definition)
• “Facts and statistics collected together for reference or analysis.”
[Oxford English Dictionary]

• “Information, especially facts or numbers, collected to be examined and
considered and used to help decision-making, or information in an
electronic form that can be stored and used by a computer.”
[Cambridge Dictionary]

• “Factual information (such as measurements or statistics) used as a basis
for reasoning, discussion, or calculation.”
[Merriam-Webster]

• “Data are individual facts, statistics, or items of information, often
numeric, that are collected through observation. In a more technical
sense, data are a set of values of qualitative or quantitative variables
about one or more persons or objects.”
[Wikipedia]

Information (definition)
• “Facts provided or learned about something or someone.”
[Oxford English Dictionary]

• “Facts or details about a situation, person, event, etc.”
[Cambridge Dictionary]


• “Knowledge obtained from investigation, study, or instruction.”
[Merriam-Webster]

• “Knowledge communicated or received concerning a particular fact or
circumstance; knowledge gained through study, communication,
research, instruction, etc.”
[Dictionary.com]

Data vs. Information


• Data
o Raw, unorganised facts that need to be processed.
o Unusable until it is organised.
• Information
o Created when data is processed, organised, and structured.
o Needs to be put in an appropriate context in order to become
useful.
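As a minimal sketch of the distinction above, the snippet below turns a few raw readings into a summary that could support a decision. The temperature values and variable names are invented for illustration and do not come from the course material.

```python
# Raw data: unorganised temperature readings in degrees Celsius (invented values).
readings = [31.2, 29.8, 30.5, 32.1, 28.9]

# Processing: organise and summarise the raw values.
average = sum(readings) / len(readings)
highest = max(readings)

# Information: the processed result, placed in context.
print(f"Average temperature: {average:.1f} °C (highest: {highest} °C)")
```

The list on its own says little; the average and maximum, labelled with units, are what a reader can act on.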

Data Science

Data → Processing → Information

Programming and Data


• Tasks to undertake for data programming
o Data collection
o Data processing (wrangling)
o Data visualisation


o Train and apply algorithms from fields such as machine learning,
statistics, data mining, optimisation, image processing, etc.

• Programming
o The process of producing an executable computer program that
performs a specific task.
o The purpose is to find a sequence of instructions that automates the
execution of a task in order to solve a given problem.

• Programming Language
o The source code of a program is written in one or more languages
that are intelligible to humans, rather than machine code, which is
directly executed by the CPU.

o Python
§ https://www.python.org/


2. Introduction to Development Environments

Source-code Editors
• A source-code editor, or programming text editor, is a fundamental
programming tool designed specifically for editing the source code of
computer programs.
• It highlights the syntax elements of your programs, and provides many
features that aid in your program development.
• Examples:
o Visual Studio Code [https://code.visualstudio.com/]
o Notepad++ (Windows only) [https://notepad-plus-plus.org/]
o Vim [https://www.vim.org/]
o Sublime Text (not open source) [https://www.sublimetext.com/]
o Atom [https://atom.io/]
o Emacs [https://www.gnu.org/software/emacs/]
o TextMate (Macs only) [https://macromates.com/]

o Jupyter
§ https://jupyter.org/

Integrated Development Environments (IDEs)


• An integrated development environment (IDE) is a software application that
provides comprehensive facilities to computer programmers for software
development.
• An IDE normally consists of at least a source code editor, build
automation tools and a debugger.
• Examples:
o Spyder [https://www.spyder-ide.org/]


o RStudio [https://rstudio.com]
o Eclipse [https://www.eclipse.org/]
o Microsoft Visual Studio [https://visualstudio.microsoft.com/vs/]
o Wing Python IDE [https://wingware.com]

Markdown / Markup Languages


• Markdown is a markup language that consists of a set of rules for adding
formatting elements to plain text documents.
o Boldface, italics, headers, paragraphs, lists, code blocks, images,
etc.
o https://www.markdownguide.org/
• Invented by John Gruber
o The overriding design goal for Markdown’s formatting syntax is to
make it as readable as possible.
o The idea is that a Markdown-formatted document should be
publishable as-is, as plain text, without looking like it’s been
marked up with tags or formatting instructions.
• Examples of other markup languages
o HTML; XML; LaTeX

Version Control Systems


• Version Control is a class of systems responsible for managing changes
to computer programs, documents, large websites, or other collections of
information.
• Version Control Systems (VCS) are software tools that help software
teams manage changes to source code over time.
o Undertake the tedious task of keeping track of the changes to all of a
project’s files and who made them


o Allows users to recover any previous version at any given time


• Examples:
o Subversion [https://subversion.apache.org]

o Git
§ https://git-scm.com/

o GitHub (a hosting service for Git repositories, rather than a VCS itself)
§ https://github.com/

Package/Environment Manager
• A package manager, or package management system, is a collection of
software tools that automates the process of installing, upgrading,
configuring, and removing computer programs for a computer in a
consistent manner. It works with packages: distributions of software
and data in archive files.
• An environment manager creates and switches between isolated
environments, each with its own set of installed packages, so that
projects with different dependencies do not conflict.
• Example:

o Anaconda
§ https://www.anaconda.com/

Installing Anaconda
• Go to Anaconda, download Anaconda Individual Edition
o https://www.anaconda.com/products/distribution
• Packages include
o conda
§ package management system


o pandas, scikit-learn, nltk
§ packages for data science
o Anaconda Navigator
§ a graphical user interface
o QtConsole
§ an interactive Python environment
o Spyder
§ a standard cross-platform IDE for Python
o Jupyter Notebook
§ an interactive web-browser based application for creating
and sharing code

Package Installer for Python (pip)


• pip is the de facto standard package-management system for Python. It is
written in Python and is used to install and manage software packages.
• It connects to an online repository of public packages, called the Python
Package Index (PyPI).
• We use pip to install packages from the Python Package Index.
• Examples
o pip install beautifulsoup4
§ Install a single package by name
o pip install -r requirements.txt
§ Install every package listed in a requirements file
o pip freeze
§ List all installed packages and their versions
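The pip commands above are typed into a terminal. As a rough Python-side counterpart, the sketch below lists installed packages from inside a program or notebook using the standard-library module importlib.metadata (available from Python 3.8). It only imitates the "name==version" layout of pip freeze; it is not how pip itself is implemented, and the exact list depends on your environment.

```python
from importlib.metadata import distributions

# Collect "name==version" strings for every installed distribution,
# similar in spirit to the output of `pip freeze`.
installed = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in distributions()
    if dist.metadata["Name"]  # skip entries whose metadata has no name
)

for line in installed[:10]:  # show only the first ten entries
    print(line)
```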


3. Introduction to Python

• Open-source, interpreted, high-level, object-oriented, general-purpose,
easy to download, write and read
• Named for the British comedy group Monty Python
• A simpler language that allows us to focus less on the language and more on
problem solving
• Many of the best parts of other languages are included
o Data structures
o Control structures
o Many packages for common tasks

Variables
• A variable is a named piece of memory whose value can change during the
running of the program; a constant is a value which cannot change as the
program runs.
o Python has no built-in constant type; by convention, names written in
ALL_CAPS are treated as constants.
• We use variable names to represent objects (numbers, data structures,
functions, etc.) in our program, to make our program more readable.
o All variable names must be one word; spaces are never allowed.
o Can only contain alpha-numeric characters and underscores.
o Must start with a letter or the underscore character.
o Cannot begin with a number.
o Case-sensitive
o Standard way for most things named in Python is lower_with_under
§ Lower case with separate words joined by an underscore
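The naming rules above can be checked from Python itself. The sketch below uses str.isidentifier() for the character rules and keyword.iskeyword() to flag reserved words; the reserved-word check is an addition not listed above, and the candidate names are made up for illustration.

```python
import keyword

# Candidate names to check against Python's naming rules (illustrative only).
candidates = ["total_sales", "_count", "2nd_value", "my value", "class"]

for name in candidates:
    # str.isidentifier() checks the character rules;
    # keyword.iskeyword() flags reserved words such as "class".
    valid = name.isidentifier() and not keyword.iskeyword(name)
    print(f"{name!r}: {'valid' if valid else 'not valid'}")

# Names are case-sensitive: these are two different variables.
score = 10
Score = 20
print(score, Score)  # prints: 10 20
```

Only the first two candidates pass: "2nd_value" starts with a digit, "my value" contains a space, and "class" is reserved by the language.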

Comments


• Not processed by the computer, but valued by other programmers.


• Header comments
o Appear at beginning of a program or a module
o Provide general information
• Step comments or in-line comments
o Appear throughout program
o Explain the purpose of specific portion of code
• Comments are often delimited by
o // comment goes here (C, C++, Java, JavaScript)
o /* comment goes here */ (C, C++, Java, JavaScript)
o # Python uses this
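A short sketch of how header, step and in-line comments might look in a Python file. The file name, the task and the radius value are hypothetical and chosen only for illustration.

```python
# ------------------------------------------------------------
# circle_area.py (hypothetical file name)
# Header comment: computes the area of a circle.
# ------------------------------------------------------------

import math

radius = 2.5  # in-line comment: radius in centimetres

# Step comment: apply the formula area = pi * r^2.
area = math.pi * radius ** 2

print(round(area, 2))  # prints: 19.63
```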

Python Operations
• Assignment Operator
o “=”
o Example:
§ a = 67890 / 12345
# compute the ratio, store the result in memory, and assign it to a
# the value of a is approximately 5.499392
§ b = a
# b now refers to the same value as a

• Output
o “print()”
o Example:
§ print('Hello World!') # print the string literal
§ print(a) # print the value of a
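One point worth seeing in action: after b = a, assigning a new value to a does not change b. A minimal sketch, reusing the numbers from the example above:

```python
a = 67890 / 12345   # a is bound to the float result of the division
b = a               # b is bound to the same value

a = 100             # rebinding a does not affect b

print(a)            # prints: 100
print(round(b, 6))  # prints: 5.499392
```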


Data Types in Python


• Declaration of variables in Python is not needed
o Use an assignment statement to create a variable

• Float
o Stores real numbers
o a = 4.6
o print(type(a)) # => <class 'float'>

• Integer
o Stores integers
o b = 10
o print(type(b)) # => <class 'int'>

• Conversion
o int(a) # convert float to int => 4
o float(b) # convert int to float => 10.0
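Note that int() drops the fractional part rather than rounding. The sketch below contrasts it with the built-in round() and also converts a numeric string; the extra values are chosen only for illustration.

```python
a = 4.6
b = 10

print(int(a))         # prints: 4    (fraction dropped, not rounded)
print(int(-4.6))      # prints: -4   (truncates toward zero)
print(round(a))       # prints: 5    (rounds to the nearest integer)
print(float(b))       # prints: 10.0
print(int("42") + 1)  # prints: 43   (a numeric string can be converted too)
```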

• Basic arithmetic operators
o 3 + 2 # Addition => 5
o 5 - 2 # Subtraction => 3
o 5 * -2 # Multiplication => -10
o 5 / 2.5 # Division => 2.0
o 2 ** 2 # Exponentiation => 4
o 10 % 3 # Modulus => 1


o 10 // 3 # Floor Division => 3
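A few behaviours of these operators are easy to get wrong, so here is a short sketch covering division results, negative operands and operator precedence. The specific numbers are illustrative.

```python
print(4 / 2)           # prints: 2.0   ("/" always returns a float)
print(10 // 3)         # prints: 3
print(10 % 3)          # prints: 1
print(divmod(10, 3))   # prints: (3, 1)  (quotient and remainder together)

# Floor division rounds down, so results with negative numbers may surprise.
print(-7 // 2)         # prints: -4
print(-7 % 2)          # prints: 1

# Precedence: ** binds tighter than * and /, which bind tighter than + and -.
print(2 + 3 * 4)       # prints: 14
print((2 + 3) * 4)     # prints: 20
print(2 ** 3 ** 2)     # prints: 512  (** groups right to left: 2 ** 9)
```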

• String
o Stores strings
o phrase = 'All models are wrong, but some are useful.'
o phrase[0:3] # slice characters 0 up to 2
=> 'All'
o phrase.find('models') # find the starting index of a word
=> 4
o phrase.find('right') # word not found
=> -1
o phrase.lower() # convert to lower case
=> 'all models are wrong, but some are useful.'
o phrase.upper() # convert to upper case
=> 'ALL MODELS ARE WRONG, BUT SOME ARE USEFUL.'
o phrase.split(',') # split the string into a list, based on a delimiter
=> ['All models are wrong', ' but some are useful.']
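Building on the string methods above, this sketch shows negative slicing, cleaning up the leading space left by split(), and joining the pieces back together. The choice of separator is arbitrary.

```python
phrase = 'All models are wrong, but some are useful.'

print(len(phrase))          # prints: 42
print(phrase[-7:])          # prints: useful.   (last seven characters)
print('wrong' in phrase)    # prints: True      (substring test)

# split(',') leaves a leading space on the second piece; strip() removes it.
parts = [part.strip() for part in phrase.split(',')]
print(parts)                # prints: ['All models are wrong', 'but some are useful.']

# join() is the inverse of split(): it glues list items with a separator.
print(' / '.join(parts))    # prints: All models are wrong / but some are useful.
```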

• Boolean
o Stores the logical (Boolean) values True or False
o k = 1 > 3
o print(k) # => False
o print(type(k)) # => <class 'bool'>


• Logical operators
o Conjunction (AND): “and”
o Disjunction (OR): “or”
o Negation (NOT): “not”

  a    b    a and b    a or b    not a
  T    T       T          T        F
  T    F       F          T        F
  F    T       F          T        T
  F    F       F          F        T
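The truth table can be reproduced directly in Python. This sketch loops over every combination of two Boolean values with itertools.product; the column widths are an arbitrary formatting choice.

```python
from itertools import product

rows = []
print("a      b      a and b  a or b  not a")
for a, b in product([True, False], repeat=2):
    # Evaluate each logical operator for this combination of a and b.
    rows.append((a, b, a and b, a or b, not a))
    print(f"{a!s:<6} {b!s:<6} {(a and b)!s:<8} {(a or b)!s:<7} {(not a)!s}")
```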

Data Structures in Python


• Tuples
o Store an ordered collection of objects
o Immutable: elements cannot be modified, added or deleted
o Written with round brackets “( )”
§ tuple1 = ("apple", "banana", "cherry", "orange", "kiwi",
"melon", "mango")
§ tuple2 = ("Handsome Koh", 4896, 13.14, True)
o Accessing elements by indexing
§ tuple1[0] # first element => 'apple'
§ tuple1[-1] # last element => 'mango'
§ tuple1[2:5] # range of elements => ('cherry', 'orange', 'kiwi')
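To see immutability in practice, the sketch below attempts to modify a tuple and catches the resulting error, then shows tuple unpacking. The replacement value "pear" is illustrative.

```python
tuple1 = ("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")

modified = True
try:
    tuple1[0] = "pear"      # tuples do not support item assignment
except TypeError as error:
    modified = False
    print("TypeError:", error)

# Unpacking assigns each element of a tuple to its own variable.
name, number, ratio, flag = ("Handsome Koh", 4896, 13.14, True)
print(name, number)         # prints: Handsome Koh 4896

print(len(tuple1))          # prints: 7
print("kiwi" in tuple1)     # prints: True
```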

• Lists


o Store an ordered collection of objects; mutable
o Written with square brackets “[ ]”
§ list1 = ["apple", "banana", "cherry"]
§ list2 = ["Handsome Koh", 4896, 13.14, True]
o Changing elements
§ list1.append("orange") # add to the last position
=> ['apple', 'banana', 'cherry', 'orange']
§ list1[2] = "coconut" # modify the element at an index
=> ['apple', 'banana', 'coconut', 'orange']
§ list1.remove("apple") # delete an element by value
=> ['banana', 'coconut', 'orange']
§ list1.insert(2, "durian") # insert an element at a position
=> ['banana', 'coconut', 'durian', 'orange']
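Because lists are mutable, two names can refer to the same list. The sketch below continues from the final list above and contrasts an alias with a copy; pop(), sort() and copy() are standard list methods not covered in the bullets.

```python
list1 = ["banana", "coconut", "durian", "orange"]

last = list1.pop()          # remove and return the last element
print(last)                 # prints: orange

list1.sort(reverse=True)    # sort in place, in descending order
print(list1)                # prints: ['durian', 'coconut', 'banana']

alias = list1               # a second name for the same list object
snapshot = list1.copy()     # an independent (shallow) copy

list1.append("mango")
print(alias)                # prints: ['durian', 'coconut', 'banana', 'mango']
print(snapshot)             # prints: ['durian', 'coconut', 'banana']
```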

• Sets
o Store an unordered, unindexed collection of objects with no duplicates
o Written with curly brackets “{ }”
§ set1 = {"apple", "banana", "cherry"}
§ set2 = {"apple", "samsung"}
o Set operations
§ set1.union(set2) # union of both sets
=> {'apple', 'banana', 'cherry', 'samsung'}
§ set1.intersection(set2) # intersection of both sets
=> {'apple'}
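A common use of sets is removing duplicates from a list. The sketch below shows that, along with the operator forms of union and intersection and a set difference; the fruits list is made up for illustration.

```python
fruits = ["apple", "banana", "apple", "cherry", "banana"]
unique = set(fruits)            # duplicates are discarded
print(len(unique))              # prints: 3

set1 = {"apple", "banana", "cherry"}
set2 = {"apple", "samsung"}

print(set1 | set2 == set1.union(set2))         # prints: True
print(set1 & set2)                             # prints: {'apple'}
print(sorted(set1 - set2))                     # prints: ['banana', 'cherry']
print("banana" in set1)                        # prints: True
```

sorted() is used for the difference because a set has no guaranteed display order.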


• Dictionaries
o Store a collection of key-value pairs (insertion order is preserved
since Python 3.7)
o Written with curly brackets “{ }”, and “key: value” pairs
§ thisdict = {"brand": "Ford", "model": "Mustang",
"year": 1964}
o Accessing/modifying elements by key name
§ thisdict["model"] => 'Mustang'
§ thisdict["year"] = 2018 # modify an existing value
§ thisdict["color"] = "red" # add a new key-value pair
=> {'brand': 'Ford', 'model': 'Mustang', 'year': 2018, 'color': 'red'}
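Looking up a key that does not exist with square brackets raises a KeyError. The sketch below shows the safer get() method, iteration over key-value pairs and deletion; the default value "unknown" is an arbitrary choice.

```python
thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}

# get() returns a default instead of raising KeyError for a missing key.
print(thisdict.get("color", "unknown"))   # prints: unknown

thisdict["color"] = "red"

# items() yields each key together with its value.
for key, value in thisdict.items():
    print(key, "->", value)

del thisdict["year"]                      # remove a key-value pair
print(list(thisdict))                     # prints: ['brand', 'model', 'color']
print("year" in thisdict)                 # prints: False
```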


4. Introduction to Jupyter Notebook

• Jupyter Notebook is a web-based interactive computing platform.
• The name refers to “Julia” + “Python” + “R”
• Integrates code and output into a single document that contains:
o Live code, mathematical equations, visualisations,
explanatory/narrative text, interactive dashboards and other media
• Can be easily shared
o Notebook files have the “.ipynb” extension
o Can be exported to “.html” and “.pdf” formats
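A “.ipynb” file is plain JSON text, which is part of why notebooks are easy to share. The sketch below builds a minimal notebook structure by hand and reads it back with the standard json module. The layout shown (a "cells" list plus "metadata", "nbformat" and "nbformat_minor" keys) is a simplified assumption based on version 4 of the notebook format; real notebooks created by Jupyter carry more metadata.

```python
import json

# A minimal notebook in the version 4 layout (simplified; an assumption for illustration).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My first notebook"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('Hello World!')"],
        },
    ],
}

# Serialise to the text that would be stored in a .ipynb file, then parse it again.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

for cell in loaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```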

• Launch “Jupyter Notebook” from “Anaconda Navigator”
• Create new notebook
o “File” → “New Notebook” → “Python 3”
• Exporting notebook
o “File” → “Download as” → “HTML (.html)”
o “File” → “Print Preview” (for PDF)
• Shutting Down Jupyter
o “File” → “Close and Halt”
o Quit


5. Exercises

1.301 Practice Exercises (Coursera)
• Refer to “1.301 part-1.html”

1.302 A bit more Python – our first downloadable notebook! (Coursera)
• Refer to “1.302 pythonPractice.html”

1.304 World’s Population
• Refer to “1.304 Topic 1 - Lab.html”


6. Practice Quiz
• Work on Practice Quiz 01 posted on Canvas.


Useful Resources

o http://
