Lecture 1: Introduction
1
Syllabus and Mechanics
• SIC 201, Tuesday & Friday, 3:30–5:00 PM
• Mechanics:
– Attending Classes – Mandatory
– Evaluation
• Quizzes (two): 20%
• Project/Presentation: 30%
• Midterm: 25%
• Final Exam: 30%
2
What we shall study
• Sequential + Concurrency + Distribution
– Nondeterminism – Guarded Command Notation
– Shared-variable concurrency, interleaving semantics
– Mutual exclusion and other classic synchronization problems
– Language abstractions: monitors
– Distributed programming via communication primitives
• Communicating Sequential Processes (CSP) notation
– Understand the subtleties among concurrency, parallelism, and distribution
– Distributed mutual exclusion, remote procedure calls, time synchronization
• Strategies for parallel programming
– Thread parallelism, run-time scheduling, work stealing
• Logging, recovery, crashes, concurrency control, CDNs (content delivery networks)
• PGAS parallel languages: X10 features
• MapReduce
• Consensus: Paxos, BFT, SBFT, PBFT
– Blockchains, cryptocurrencies
3
Whither Programming: Sequential
to Parallel!
4
Software Crisis
“To put it quite bluntly: as long as there were no
machines, programming was no problem at all;
when we had a few weak computers,
programming became a mild problem, and
now we have gigantic computers,
programming has become an equally gigantic
problem.”
Edsger W. Dijkstra, 1972 Turing Award Lecture
5
First Software Crisis: 1960s-70s
6
Second Software Crisis: ’80s and ’90s
Problem:
• Inability to build and maintain complex, robust applications requiring
millions of lines of code developed by hundreds of programmers
– Needed composability, malleability, and maintainability
– High performance was not an issue – left to Moore’s Law
• Programming in the Large vs. Programming in the Small
– Coordination and discipline of software-system management
– Software management needed to shift focus from software requirements to
realization, while at the same time managing migration of systems to new
technologies or new requirements
• Solution:
– Object-Oriented Programming – C++, C#, and Java
– Better tools
• Component libraries, Purify
– Better software-engineering methodology
• Design patterns, specification, testing, code reviews, UML, …
7
Parallel Programming Models
• Data-parallel model:
– The same or similar computations are performed repeatedly on different
data. Image-processing algorithms that apply a filter to each pixel are a
common example of data parallelism.
– OpenMP is a compiler-directive-based API that can express a data-parallel
model.
• Task-parallel model: independent units of work are encapsulated in
functions and mapped to individual threads, which execute asynchronously.
Thread libraries (e.g., the Win32 thread API or POSIX threads) are designed
to express task-level concurrency.
• Hybrid models: sometimes more than one model is applied to a single
problem, resulting in a hybrid algorithm. A database is a good example:
tasks like inserting records, sorting, or indexing can be expressed in the
task-parallel model, while a database query uses the data-parallel model
to perform the same operation on different data.
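The two models above can be sketched in a few lines of Python (the slides do not prescribe a language; the function names and data here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# Data parallelism: the same operation (here, a toy "filter" that
# doubles a pixel value) is applied to every element of the data set.
def apply_filter(pixel):
    return pixel * 2

pixels = [1, 2, 3, 4]
with ThreadPoolExecutor() as pool:
    filtered = list(pool.map(apply_filter, pixels))  # one task per element

# Task parallelism: independent pieces of work, encapsulated in
# functions, run as separate asynchronous tasks.
def insert_records():
    return "inserted"

def build_index():
    return "indexed"

with ThreadPoolExecutor() as pool:
    f1 = pool.submit(insert_records)
    f2 = pool.submit(build_index)
    results = (f1.result(), f2.result())

print(filtered)   # [2, 4, 6, 8]
print(results)    # ('inserted', 'indexed')
```

A hybrid model would combine both: submit independent tasks, each of which internally maps the same operation over its own data.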
8
Methodology for Parallelization
• The basis for performance gains via parallel processing lies in
decomposition. Dividing a computation into smaller computations and
assigning these to different processors for execution are the two key
steps in parallel algorithm design. The two most common decomposition
techniques:
• Functional decomposition is used to introduce concurrency in the
problems that can be solved by different independent tasks. All these
tasks can run concurrently.
• Data decomposition works best on an application that has a large data
structure. By partitioning the data on which the computations are
performed, a task is decomposed into smaller tasks to perform
computations on each data partition. The tasks performed on the data
partitions are usually similar. There are different ways to perform data
partitioning: partitioning input/output data or partitioning intermediate
data.
– Static/Dynamic
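A minimal sketch of static data decomposition in Python (the language and names are illustrative, not from the slides): the input array is partitioned into equal-sized chunks, and the same computation is performed on each partition.

```python
# Static data decomposition: partition the input array into roughly
# equal-sized chunks, one per worker.
def partition(data, n_workers):
    size = (len(data) + n_workers - 1) // n_workers  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

chunks = partition(list(range(10)), 3)
# 3 chunks: [0..3], [4..7], [8..9]

# Each chunk would be handed to a separate processor; the task performed
# on each partition is the same (here: summing), matching the observation
# that tasks on data partitions are usually similar.
partials = [sum(c) for c in chunks]
total = sum(partials)
print(total)   # 45
```

A dynamic variant would instead hand out chunks from a shared work queue as workers become free.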
9
How Much Do We Gain in Performance?
Amdahl’s law provides the theoretical basis to assess the potential benefits of
converting a serial application to a parallel one. It predicts the speedup
limit of parallelizing an application:
Tparallel = [(1 – P) + P/N] × Tserial + O(N)
where Tserial: time to run the application in its serial version,
P: parallelizable fraction of the application,
N: number of processors,
O(N): parallel overhead of using N threads.
Predict the speedup by looking at the scalability:
Scalability = Tserial / Tparallel
Theoretical limit of scalability (assuming no parallel overhead):
Scalability = 1 / [(1 – P) + P/N]
As N → ∞, Scalability → 1 / (1 – P)
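The speedup limit can be checked numerically; a small Python sketch of the formula above (assuming no parallel overhead):

```python
# Amdahl's law, ignoring the parallel overhead term O(N):
#   speedup = 1 / ((1 - P) + P / N)
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Example: 90% of the application is parallelizable.
print(round(amdahl_speedup(0.9, 10), 2))      # 5.26 on 10 processors
print(round(amdahl_speedup(0.9, 10**9), 2))   # approaches 1/(1-P) = 10.0
```

Even with effectively unbounded processors, the serial 10% caps the speedup at 10×, which is why shrinking the serial fraction matters more than adding cores.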
10
Challenges
• Decomposing applications into units of
computation that can be executed concurrently.
11
Parallel Programming Languages
• Parallel programming languages have not managed to address, at the same
time, both the natural specification of parallel algorithms and the handling
of architectural features needed to achieve performance.
12
Today: Programmers Are
Oblivious to the Processor
13
Emerging Software Crisis: 2002 – 20??
• Problem: sequential performance has been left behind by
Moore’s Law
– Continuous and reasonable performance improvements
are still needed
• to support new features and larger datasets
• while sustaining portability, malleability, and maintainability
without unduly increasing the complexity faced by the programmer
– Critical to keep up with the current rate of evolution in
software
Saman Amarasinghe, 6.189 Multicore Programming Primer, January (IAP) 2007. (Massachusetts Institute of Technology: MIT OpenCourseWare)
14
Programmers’ View of Memory
Till 1985:
• 386, ARM, MIPS, SPARC
• 1–10 MHz
• External memory access – 1 cycle
• Most instructions – 1 cycle
• C-language operations – 1 or 2 cycles
Since then:
• On-chip computation (clock speed) sped up faster (until 2005) than
off-chip communication with memory as feature sizes shrank
• The gap was filled by spending transistor budgets on caches, which hid the mismatch until 2005
• Caches, deep pipelining with bypasses, and superscalar execution burned power
to keep the illusion intact until 2005 – the tipping point
16
Innovations in Architecture
Architecture is nothing but a way of thinking about programming
Edsger Dijkstra
17
Where shall we search for the solution?
• Advances in Compilers
• Advances in Tools
18
Computer Architecture: Uniprocessors
• General-purpose uniprocessors stopped their historic
performance scaling
– Power consumption
– Wire delays
– DRAM access latency
– Diminishing returns of more instruction-level parallelism
19
Technology Trends: Microprocessor
Capacity
Moore’s Law
[Chart: transistor counts of microprocessors (i4004, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, R10000) vs. year, 1970–2005, growing per Moore’s Law]
[Figures: processor power density trend approaching nuclear-reactor levels; multicore die diagram with per-core L3 caches and interconnects]
34
Parallel Computation Ideas
• Concepts -- looking at problems with “parallel eyes”
• Algorithms -- different resources; different goals
• Languages -- reduce control flow; increase
independence; new abstractions
• Hardware -- the challenge is communication, not
instruction execution
• Programming -- describe the computation without
saying it sequentially
• Practical wisdom about using parallelism
• Sequential programming is a special case of parallel
programming
35
A Naïve Parallel Execution
• Juggling -- event-based computation
• House construction -- parallel tasks, wiring
and plumbing performed at once
• Assembly line manufacture – pipelining, many
instances in process at once
• Call center -- independent tasks executed
simultaneously
• CRUX: Describe execution of tasks
36
Concurrent vs. Distributed
Programming
• Concurrency: performance/throughput
• Distributed: partial knowledge

             | Parallel | Distributed
Goal         | Speed    | Partial knowledge
Interactions | Frequent | Infrequent
Granularity  | Fine     | Coarse
Reliability  | Assumed  | Not assumed
37
A simple task
• Summing an array of numbers A[0], …, A[n-1]
into sum
• Classical (Sequential):
(…((sum+A[0])+A[1])+…)+A[n-1]
• How do we execute in parallel?
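One standard answer, sketched here in Python (the language and function name are illustrative): add the elements pairwise in a balanced tree, so the additions in each round are independent and n numbers can be summed in about log2(n) parallel steps instead of n-1 sequential ones.

```python
# Tree (pairwise) summation: instead of the left-to-right chain
# (...((sum + A[0]) + A[1]) + ...) + A[n-1], add disjoint pairs in each
# round; all additions within a round could run on separate processors.
def tree_sum(a):
    a = list(a)
    while len(a) > 1:
        if len(a) % 2:          # odd length: pad with the identity 0
            a.append(0)
        a = [a[i] + a[i + 1] for i in range(0, len(a), 2)]
    return a[0] if a else 0

print(tree_sum([1, 2, 3, 4, 5]))   # 15
```

This reordering is only valid because addition is associative, a point that matters once floating-point rounding or non-associative operators enter the picture.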