Introduction:
algorithms (recapitulation),
bits, strings
Jesper Larsson
Course, teaching
• Me: Jesper Larsson, string and compression algorithms person; teaching at MAU since 2014; research background in string algorithms and data compression
• You?
• Languages? Spoken and for programming
• Lectures, assignments, computer time
• Not based on a single book, but the material can be found in books
• Small and familiar course
Lectures (preliminary)
1. Intro, bits and strings
2. Bucket and radix sorting
3. Trie (digital tree)
4. Suffix tree
5. Inverted file (search engine data structure) + regular expressions
6. Suffix array
7. Suffix data structure algorithms, supplemental
8. Information theory, codes, entropy coding
9. More on codes and their applications, Ziv-Lempel compression
10. The Burrows–Wheeler transform (BWT)
11. Substring search (KMP, BM, Karp–Rabin)
12. Catching up and summary
I expect that you know
• Sorting and searching as taught in basic algorithms courses: binary search trees, hash tables, {selection, insertion, quick-, merge-}sort
• Programming
• Principles of algorithm analysis, O-notation etc.
Assignments (preliminary)
1. Word frequencies + compact file
2. Radix sorting (general interface)
3. Word frequencies 2 (trie) + search engine
4. Suffix sorting
5. Entropy coding
6. BWT
Today
• Brief recapitulation of algorithm time complexity
• We take a step back: digital representation of information
• Bitwise operations, numbers
(Not what this course is supposed to teach, but you are
going to need this in your programming assignments)
• If time: counting/bucket sort (“key-indexed counting”)
Algorithmic research
• Come up with algorithms for specific problems
• Determine the “speed” of algorithms
• Find “faster” algorithms
• Prove that:
- A specific algorithm does what it’s supposed to
- A specific algorithm has a certain “speed”
- There can be no algorithm for the problem “faster” than a certain “speed”
How many times do you have to turn the crank?
Charles Babbage’s analytical engine, “programmed” by Ada Lovelace
Time complexity of algorithm
• T(N) = a measure for the time it takes to run the program
on an input of size N
• Approximate with “at most proportional to”, O-notation
~, Θ, O
Ex 1. ⅙N³ + 20N + 16 ~ ⅙N³, which is Θ(N³)
Ex 2. ⅙N³ + 100N^(4/3) + 56 ~ ⅙N³ = proportional to N³
Ex 3. ⅙N³ − ½N² + ⅓N ~ ⅙N³ = cubic
Most common measure
f(N) is O(g(N)) means: f(N) is at most proportional to g(N)
f(N) is Θ(g(N)) means: f(N) is precisely proportional to g(N)
g(N) is the “best” function such that f(N) is O(g(N))
Formally
f(N) is O(g(N)) means: ∃ constants N0 and c so that if N > N0, then |f(N)| < c·g(N)
f(N) is Ω(g(N)) means: ∃ constants N0 and c so that if N > N0, then |f(N)| > c·g(N)
f(N) is Θ(g(N)) means: f(N) is both O(g(N)) and Ω(g(N))
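To make the definitions concrete, a small Python check (my addition, not from the slides) that the constant in Ex 1 shows up as the limit of f(N)/N³:

```python
# f(N) = (1/6)N^3 + 20N + 16 from Ex 1 above.
# The ratio f(N)/N^3 approaches 1/6, so f(N) is Theta(N^3).
def f(n):
    return n**3 / 6 + 20 * n + 16

for n in (10, 100, 1000, 10000):
    print(n, f(n) / n**3)
```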
Time for sorting with pairwise compares
• Upper bound: ~N lg N compares (given by mergesort; lg = log₂)
• Lower bound: ?
• Optimal algorithm: ?
Start by considering how many possible orderings there are
Decision tree to find, using compares, which possible ordering of 3 elements is correct.
Tree height = number of compares in the worst case.

a<b?
├ yes: b<c?
│  ├ yes: abc
│  └ no:  a<c?
│     ├ yes: acb
│     └ no:  cab
└ no:  a<c?
   ├ yes: bac
   └ no:  b<c?
      ├ yes: bca
      └ no:  cba

One leaf per possible ordering: 3! = 6 leaves
Decision tree for possible orderings of N values
• N values a1 to aN. Assume they are all different (equal values are a case we would need to manage).
• N! = N · (N−1) · (N−2) · … · 3 · 2 · 1 different orderings
• Tree with compares as internal nodes, with orderings as leaves: at least N! leaves
• Binary tree with h levels: at most 2ʰ leaves
• Worst case time: height h, so 2ʰ ≥ N!
• h ≥ lg(N!) ~ N lg N (by Stirling’s approximation)
• Conclusion: any algorithm must use lg(N!) ~ N lg N compares (worst case)
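A quick numerical look (a sketch of mine, not from the slides) at how close lg(N!) is to N lg N:

```python
import math

# lg(N!) computed via the log-gamma function: lgamma(n+1) = ln(n!).
def lg_factorial(n):
    return math.lgamma(n + 1) / math.log(2)

# The ratio lg(N!) / (N lg N) tends to 1 as N grows (Stirling).
for n in (10, 1000, 100000):
    print(n, lg_factorial(n) / (n * math.log2(n)))
```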
Stupid question?
• Well known: computers at machine level represent
everything using only 0 and 1
• So how can computers process and output graphics,
audio, or even text, which don’t look like 0s and 1s?
• What does “only 0 and 1” even mean?
• (How to explain this to someone without comp sci
knowledge?)
Blondinrikard Fröberg, Listen closely https://flic.kr/p/tRbAcU
στρατός (formerly known as Michelangelo_MI), At the end of the track https://flic.kr/p/4wMSNh
Sound
• A sequence of amplitude values in binary representation
• Parameters: frequency, bits per sample…
Pictures
• Bit-mapped: PNG, JPEG, GIF, …
• Drawing instructions (vector): PDF, Postscript, …
(Figure: xkcd color survey results, http://blog.xkcd.com/2010/05/03/color-survey-results/)
Pictures
• Colors: RGB
• 3 one-byte numbers: 256×256×256 = 16777216 different colors
D. B. Gaston, Arabian Nights text cropped https://flic.kr/p/5QvRXv
7-bit ASCII

International (e.g. Scandinavian) characters
• Replace some glyphs: [ ] \ { }
• Use 8 bits: Latin-1 (ISO 8859-1)
• Replace some for new chars (€ etc.): Latin-9 (ISO 8859-15)
• Microsoft variant: Windows-1252
• Unicode multibyte: UTF-8 de-facto standard?
UTF-8
Bits needed   Bytes in UTF-8   Byte 1     Byte 2     Byte 3     Byte 4
7             1                0xxxxxxx
8–11          2                110xxxxx   10xxxxxx
12–16         3                1110xxxx   10xxxxxx   10xxxxxx
17–21         4                11110xxx   10xxxxxx   10xxxxxx   10xxxxxx
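The table can be checked directly in Python (my example; the characters are chosen one per row of the table):

```python
# Code points needing 7, 8, 14, and 17 bits land in 1, 2, 3, and 4
# UTF-8 bytes respectively.
for ch in ("A", "é", "€", "𝄞"):
    b = ch.encode("utf-8")
    print(ch, hex(ord(ch)), len(b), [f"{byte:08b}" for byte in b])
```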
Two approaches (which are
both kind of the same)
• Recursively split the universe in two parts. Denote the
parts 1 and 0. Binary tree with what we want to represent
in the leaves
• Assign numbers to everything we want to represent.
Encode in base 2 (binary numbers)
Game: 20 questions
• How many ”messages” are possible with 20 questions =
bits?
• What if some messages are (much) more likely than
others?
• (How many bits would we need to distinguish any
message ever written by a human?)
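For the first question, the count is simply 2²⁰ (a one-liner of mine, not from the slides):

```python
# 20 yes/no questions distinguish 2^20 different messages.
print(2 ** 20)  # 1048576
```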
Claude Shannon
X has probability p.
Optimal number of bits to represent X: log₂(1/p) bits
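Shannon’s formula in a couple of lines of Python (my sketch):

```python
import math

# Optimal number of bits to represent an outcome with probability p.
def optimal_bits(p):
    return math.log2(1 / p)

print(optimal_bits(1 / 2))    # 1.0  (a fair coin flip)
print(optimal_bits(1 / 256))  # 8.0  (one byte's worth)
```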
Unary number representation
Positional number system
أبو عبد الله محمد بن موسى الخوارزمي
Abū ʿAbdallāh Muḥammad ibn Mūsā al-Khwārizmī
Indian positional system → “Arabic” numbers
782 = 2·10⁰ + 8·10¹ + 7·10²
Binary number
representation
decimal:
782 = 2·10⁰ + 8·10¹ + 7·10²
binary:
1100001110 = 0·2⁰ + 1·2¹ + 1·2² + 1·2³ + 0·2⁴ + 0·2⁵ + 0·2⁶ + 0·2⁷ + 1·2⁸ + 1·2⁹
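The binary expansion above, evaluated in Python (my sanity check, not from the slides):

```python
# Sum the powers of two selected by the bits of 1100001110.
bits = "1100001110"
value = sum(int(b) << i for i, b in enumerate(reversed(bits)))
print(value)      # 782
print(bin(782))   # 0b1100001110
```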
“8 Questions” for unsigned
numbers (octets or “bytes”)
00000000: 0
00000001: 1
00000010: 2
00000011: 3
00000100: 4
⋮
01111110: 126
01111111: 127
10000000: 128
10000001: 129
10000010: 130
⋮
11111110: 254
11111111: 255
Signed 8-bit numbers (octets
or “bytes”): two’s complement
00000000: 0
00000001: 1
00000010: 2
00000011: 3
00000100: 4
⋮
01111110: 126
01111111: 127
10000000: −128
10000001: −127
10000010: −126
⋮
11111110: −2
11111111: −1
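Two’s complement can be mimicked in Python, which has unbounded ints, by subtracting 256 from patterns with the top bit set (my sketch):

```python
# Interpret an 8-bit pattern as a signed (two's complement) value.
def to_signed8(byte):
    return byte - 256 if byte >= 128 else byte

print(to_signed8(0b01111111))  # 127
print(to_signed8(0b10000000))  # -128
print(to_signed8(0b11111111))  # -1
```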
Floating-point representation
0.250244140625:
Single precision, 32 bits: 1 sign, 8 exponent, 23 mantissa
0 01111101 00000000010000000000000
sign 0: positive
exponent: 01111101 = 125; subtract 127 (exponent bias): −2
mantissa: 1·2⁰ (implicit first 1 bit) + 0·2⁻¹ + 0·2⁻² + 0·2⁻³ + ⋯ + 0·2⁻⁹ + 1·2⁻¹⁰ + 0·2⁻¹¹ + ⋯ = 1.0009765625
value: 1.0009765625 · 2⁻² = 0.250244140625
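The same decoding in Python, using `struct` to get at the raw bits (my sketch, not from the slides):

```python
import struct

# Pull out the IEEE 754 single-precision fields of 0.250244140625.
bits = struct.unpack(">I", struct.pack(">f", 0.250244140625))[0]
sign     = bits >> 31
exponent = (bits >> 23) & 0xFF       # 125, i.e. 125 - 127 = -2
mantissa = bits & 0x7FFFFF           # fraction bits, implicit leading 1

value = (1 + mantissa / 2**23) * 2 ** (exponent - 127)
print(f"{bits:032b}")  # 00111110100000000010000000000000
print(value)           # 0.250244140625
```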
Bitwise operations
• and & ∧ ·
• or | ∨ +
• xor ^ ⊻ ⊕
• not ~ ¬ ¯
• shift left: <<
• shift right: >> (sign-extending), >>> (zero-filling, e.g. in Java)
a b a&b
0 0 0
0 1 0
1 0 0
1 1 1
a & b is true only if both a and b are true
a b a|b
0 0 0
0 1 1
1 0 1
1 1 1
a ∨ b is true if at least one of a and b is true
a b a^b
0 0 0
0 1 1
1 0 1
1 1 0
a ⊕ b is true when a and b are not equal
• Setting bit to 1: or 1 (or 0 means don't change)
• Setting bit to 0: and 0 (and 1 means don't change)
• Flipping bit: xor 1 (xor 0 means don't change)
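The three rules as Python expressions (my example, applied to bit 3 of an arbitrary byte):

```python
x = 0b10100101             # bit 3 happens to be 0 here
print(bin(x | (1 << 3)))   # set bit 3:   0b10101101
print(bin(x & ~(1 << 3)))  # clear bit 3: 0b10100101 (unchanged)
print(bin(x ^ (1 << 3)))   # flip bit 3:  0b10101101
```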
decimal: 9 + 5 = 14 decimal: 11 + 7 = 18
binary: 1001 + 101 = 1110 binary: 1011 + 111 = 10010
carry C
A
B
Each row is an operation with three input bits and two output bits
outi = Ai ^ Bi ^ Ci
Ci+1 = Ai & Bi | Ci & (Ai ^ Bi)
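The full-adder equations chained into a ripple-carry adder (my sketch; the fixed 16-bit width is an arbitrary choice):

```python
# Add two nonnegative ints bit by bit using only the bitwise
# operations out = a ^ b ^ c and c' = a & b | c & (a ^ b).
def add(a, b, width=16):
    result, carry = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result

print(add(9, 5))   # 14
print(add(11, 7))  # 18
```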
Groups of bits
• Groups of 3 bits: octal
• Groups of 4 bits: hex[adecimal]
• Groups of 8 bits: byte “strings”
(One strange representation: IPv4 32-bit address: “192.168.10.199”)
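In Python (my example), the built-in conversions show the same grouping for 2022:

```python
n = 2022
print(bin(n))  # 0b11111100110
print(oct(n))  # 0o3746  (bits in groups of 3)
print(hex(n))  # 0x7e6   (bits in groups of 4)
```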
Let’s calculate
• 2022 · 666 = 1346652
• In binary (grouped in bytes): 111|11100110 · 10|10011010 = 10100|10001100|01011100

Split each factor into base-256 digits:
2022 & 0xff = 230    2022 >>> 8 = 7    → 2022 = (7, 230)
666 & 0xff = 154     666 >>> 8 = 2     → 666 = (2, 154)

        7 230
      × 2 154

Multiply by the low digit 154:
154·230 & 0xff = 92           154·230 >>> 8 = 138 (carry)
(154·7 + 138) & 0xff = 192    (154·7 + 138) >>> 8 = 4
→ partial product: 4 192 92

Multiply by the high digit 2:
2·230 & 0xff = 204    2·230 >>> 8 = 1 (carry)
2·7 + 1 = 15
→ partial product: 15 204, shifted one digit left

Add the partial products:
(192 + 204) & 0xff = 140    (192 + 204) >>> 8 = 1 (carry)
1 + 4 + 15 = 20
→ result: 20 140 92

20·256² + 140·256¹ + 92·256⁰ = 1346652 = 666·2022
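The byte-by-byte procedure above, generalized into a Python sketch (mine, not from the slides; Python’s `>>` on nonnegative ints plays the role of the slides’ `>>>`):

```python
# Schoolbook multiplication in base 256: split into byte "digits",
# multiply digit by digit, propagate carries with & 0xff and >> 8.
def mul_base256(a, b):
    da = [(a >> 8 * i) & 0xff for i in range(3)]  # least significant first
    db = [(b >> 8 * i) & 0xff for i in range(3)]
    prod = [0] * (len(da) + len(db))
    for i, x in enumerate(da):
        carry = 0
        for j, y in enumerate(db):
            t = prod[i + j] + x * y + carry
            prod[i + j] = t & 0xff
            carry = t >> 8
        prod[i + len(db)] += carry
    return sum(d << 8 * k for k, d in enumerate(prod))

print(mul_base256(2022, 666))  # 1346652
```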
Endianness
• Little-endian: byte 0 at address 0, byte 1 at address 1, …
Makes sense because it's analogous to bit 0 being the least significant bit
• Big-endian: byte 0 at the last address of the word, most significant byte at address 0…
Makes sense because you can sort (unsigned) numbers as if they were strings
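The two byte orders, shown with Python’s `struct` (my example):

```python
import struct

# The 32-bit value 0x01020304 laid out in memory in both byte orders.
n = 0x01020304
print(struct.pack("<I", n).hex())  # little-endian: 04030201
print(struct.pack(">I", n).hex())  # big-endian:    01020304
```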
Counting multi-byte ints
A first go at sorting in less than O(N log N)
Next lecture!