Lnotes Linux
File Structures
A file is a collection of data stored on mass storage
(e.g., disk or tape)
Why on mass storage?
too big to fit in main memory
share data between programs
backup (disks and tapes are less volatile than main
memory)
The data is subdivided into records (e.g., student information).
Each record contains a number of fields (e.g., name,
GPA).
One (or more) field is the key field (e.g., name).
Issue: how to organize the records on the mass storage
to provide convenient access for the user?
We will discuss sequential files, indexed files, and hashed
files.
Sequential Files
Records are conceptually organized in a sequential list
and can only be accessed sequentially.
The actual storage might or might not be sequential:
On a tape, it usually is.
On a disk, it might be distributed across sectors and
the operating system would use a linked list of sectors to provide the illusion of sequentiality.
Convenient way to batch (group together) a number of
updates:
Store the file in sorted order of key field.
Sort the updates in increasing order of key field.
Scan through the file once, doing each update in
order as the matching record is reached.
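The batched-update pass described above can be sketched as a single merge over two sorted lists (the record format here is a hypothetical (key, data) pair):

```python
def apply_batch(records, updates):
    """One sequential pass over the sorted file, applying sorted updates.

    records and updates are both lists of (key, data) pairs in
    increasing key order; each update replaces the matching record.
    """
    out, i = [], 0
    for key, data in records:
        if i < len(updates) and updates[i][0] == key:
            out.append((key, updates[i][1]))   # apply the matching update
            i += 1
        else:
            out.append((key, data))            # record unchanged
    return out

file_records = [(1, "Ann"), (3, "Bob"), (7, "Eve")]
batch = [(3, "Bobby"), (7, "Evelyn")]
print(apply_batch(file_records, batch))
# [(1, 'Ann'), (3, 'Bobby'), (7, 'Evelyn')]
```

Because both lists are sorted, each record and each update is examined exactly once, which is what makes batching attractive on sequential storage.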
Not a convenient organization for accessing a particular record quickly.
Indexed Files
Sequential search is even slower on disk/tape than in
main memory. Try to improve performance using more
sophisticated data structures.
An index for a file is a list of key field values occurring
in the file along with the address of the corresponding
record in the mass storage.
Typically the key field is much smaller than the entire
record, so the index will fit in main memory.
The index can be organized as a list, a search tree, a
hash table, etc. To find a particular record:
Search the index for the desired key.
When the search returns the index entry, extract the
record's address on mass storage.
Access the mass storage at the given address to get
the desired record.
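The three lookup steps above can be sketched as follows; an in-memory dict plays the role of the index, and a byte buffer stands in for mass storage (the fixed-width record format is a hypothetical choice):

```python
# Index lookup sketch: the index (key -> byte offset) lives in main
# memory; records live on "mass storage" (a byte buffer here).
import io

RECORD = "{:<10}{:>4}\n"        # hypothetical fixed-width record: name, GPA
RECORD_SIZE = len(RECORD.format("", ""))

storage = io.BytesIO()          # stands in for the disk
index = {}                      # key field (name) -> record's address

for name, gpa in [("alice", "3.9"), ("bob", "3.1")]:
    index[name] = storage.tell()                    # remember the address
    storage.write(RECORD.format(name, gpa).encode())

# 1. search the index for the desired key
# 2. extract the record's address
addr = index["bob"]
# 3. access mass storage at that address
storage.seek(addr)
record = storage.read(RECORD_SIZE).decode()
print(record.split())   # ['bob', '3.1']
```

Only one storage access is needed per lookup, versus a full scan for a sequential file.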
Multiple indexes, one per key field, allow searches based
on different fields.
Hashed Files
An alternative to storing the index as a hash table is to
not have an index at all.
Instead, hash on the key to find the address of the desired record and use open addressing to resolve collisions.
The usual hashing considerations arise.
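A minimal sketch of this idea: hash the key straight to a bucket address and resolve collisions with linear probing (the table size and in-memory table are illustrative stand-ins for bucket addresses on disk):

```python
# Hashed file sketch: no index -- the key hashes directly to an
# address; linear probing (open addressing) resolves collisions.
TABLE_SIZE = 7
table = [None] * TABLE_SIZE          # each slot holds one record or None

def store(key, data):
    addr = hash(key) % TABLE_SIZE
    while table[addr] is not None and table[addr][0] != key:
        addr = (addr + 1) % TABLE_SIZE       # collision: probe next slot
    table[addr] = (key, data)

def lookup(key):
    addr = hash(key) % TABLE_SIZE
    while table[addr] is not None:
        if table[addr][0] == key:
            return table[addr][1]
        addr = (addr + 1) % TABLE_SIZE       # keep probing
    return None                              # hit an empty slot: not present

store("ann", "3.9")
store("bob", "3.1")
print(lookup("bob"))    # 3.1
```

The usual considerations (load factor, clustering, table growth) apply exactly as for in-memory hash tables.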
Databases
A database is a collection of data in mass storage that
can
take on a variety of appearances and
can be used by a variety of applications.
Example: Collection of student records can be viewed
as a database to be used by:
payroll
mailing out report cards
preparing tuition bills
etc.
The advantages of consolidating the data:
saves space
saves duplication of effort to enter, update or correct
information
centralized control within the organization
Database Integrity
Data in a database is typically
long-lived and
of crucial importance to the organization.
Thus it must not get corrupted.
Data can be corrupted if several different programs (or
transactions) access the database at the same time.
Example of corrupted data:
T1 transfers $100 from account A to account B. If another
transaction reads or updates the balances while T1 is partway
through, it can see A already debited but B not yet credited.
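The classic "lost update" is one concrete way such interleaving corrupts data; here is a sketch (account names and amounts are illustrative):

```python
# Two unsynchronized withdrawals interleave: both read the balance
# before either writes it back, so one withdrawal is lost.
balance = {"A": 500}

def read(acct):
    return balance[acct]

def write(acct, value):
    balance[acct] = value

t1_seen = read("A")         # T1 reads 500
t2_seen = read("A")         # T2 also reads 500, before T1 writes back
write("A", t1_seen - 100)   # T1 writes 400
write("A", t2_seen - 100)   # T2 overwrites with 400: T1's update is lost
print(balance["A"])         # 400, but it should be 300
```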
DB Serializability
To prevent transactions from interfering with each other,
the DBMS should provide the illusion that each transaction runs in isolation.
This property is called serializability.
The DBMS does not have to (and should not) actually
make the transactions run serially, but if there is a potential conflict, the DBMS must take steps.
One solution is two-phase locking:
Before accessing any data item, the transaction must
obtain a lock for every data item it plans to access.
Only one transaction at a time can have a lock on
the same data item.
If another transaction already has the lock, then the
first one must wait.
After accessing all the data items, the transaction releases all its locks.
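The locking discipline above can be sketched with thread locks; acquiring the locks in a fixed order is an extra assumption here (it avoids deadlock, which the notes do not cover):

```python
# Sketch of (conservative) two-phase locking: acquire every needed
# lock up front, do the work, then release them all.
import threading

accounts = {"A": 500, "B": 200}
locks = {name: threading.Lock() for name in accounts}

def transfer(src, dst, amount):
    needed = sorted([src, dst])          # fixed order to avoid deadlock
    for name in needed:                  # phase 1: acquire all locks
        locks[name].acquire()
    try:
        accounts[src] -= amount          # no other transaction can observe
        accounts[dst] += amount          # this intermediate state
    finally:
        for name in needed:              # phase 2: release all locks
            locks[name].release()

transfer("A", "B", 100)
print(accounts)   # {'A': 400, 'B': 300}
```

A transaction that wants a lock held by another must wait, which is exactly how the DBMS serializes conflicting transactions.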
Artificial Intelligence
Goal: Develop machines that communicate with their
environment through traditionally human sensory means,
such as
vision
speech recognition
and proceed intelligently without human intervention, e.g.,
planning
expert systems
reasoning
Distinct but related goals:
1. trying to make machines actually intelligent (whatever that would mean),
2. improving technology,
3. understanding how the human mind works by trying
to model it
8-Puzzle Example
Given a 3-by-3 box that holds 8 tiles, numbered 1 through
8. One tile is missing. The goal is to start with the tiles
scrambled and move them around so that they are in
order:
1 2 3
4 5 6
7 8
Computer Vision
It is not enough to simply store the image obtained
from the camera. The program must be able to understand the image:
figure out which parts of the image are the salient
objects, called feature extraction
and then recognize the objects by comparing them
to known symbols, called feature evaluation.
For the 8-puzzle, this problem can be highly simplified:
always expect the digits to be the same size (by
holding the box at a constant distance from the camera)
same perspective
small set of different images to be handled (8 numbers and blank)
no obstruction (one object overlapping another)
But in general this is a very difficult problem and one
where there has been extensive research.
Reasoning
How can the program solve the puzzle?
One solution is to preprogram solutions, i.e., look up
the solution in a table: for each possible input configuration, the table stores the move sequence that solves it.
But the table would need an entry for every reachable
configuration, which quickly becomes infeasible.
Production Systems
Instead, have the program figure out the solution. One
approach is the production system model.
First, consider the state graph of the problem:
Every possible state of the system is a node.
Draw an arrow from one node to another if a single
move (or production, or rule) takes you from one
state to the other.
A tiny piece of the state graph for the 8-puzzle: from
any state, the blank can swap with a neighboring tile
(up, down, left, or right, where the box allows), giving
up to four outgoing arrows per node.
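The productions can be generated programmatically; here is a sketch in which a state is a 9-tuple read row by row, with 0 standing for the blank (an encoding chosen for illustration):

```python
# Production rules for the 8-puzzle: each rule slides one tile
# into the blank square.
def neighbors(state):
    """Return all states reachable from state by a single move."""
    b = state.index(0)                       # position of the blank
    row, col = divmod(b, 3)
    result = []
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:        # move stays inside the box
            s = list(state)
            s[b], s[r * 3 + c] = s[r * 3 + c], s[b]   # slide the tile
            result.append(tuple(s))
    return result

corner = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # blank in a corner: 2 possible moves
print(len(neighbors(corner)))          # 2
```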
Breadth-First Search
Build the search tree in a breadth-first manner:
The root is the start state.
The next level is all states reachable from the start
state with a single production.
The next level is all states reachable from states in
the first level with a single production. Etc.
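The level-by-level construction above is the standard queue-based breadth-first search; this sketch assumes a neighbors(state) function giving the states reachable by one production:

```python
# Breadth-first search of a state graph, recording each state's
# parent so the solution path can be recovered.
from collections import deque

def bfs(start, goal, neighbors):
    frontier = deque([start])
    parent = {start: None}               # also serves as the visited set
    while frontier:
        state = frontier.popleft()       # expand shallowest states first
        if state == goal:
            path = []
            while state is not None:     # walk parents back to the root
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None                          # goal unreachable from start
```

Because levels are completed in order, the first path found to the goal uses the fewest productions.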
Depth-First Search
Another approach is to search the state space depth
first, instead of breadth first.
Pursue more promising paths to greater depths and
consider other options only if the original choices turn
out to be false leads.
To implement this idea, we need some criterion to decide which paths are promising, or appear to be promising.
Such criteria are called heuristics. A heuristic is a rule
of thumb for the program.
We need something quantitative so we can compare
different choices and choose the best.
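One simple quantitative heuristic for the 8-puzzle is the number of tiles out of place (using the 9-tuple state encoding, with 0 for the blank, assumed for illustration):

```python
# Heuristic: count the tiles (not the blank) that are not yet in
# their goal position. Smaller is more promising.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def misplaced(state):
    """Number of misplaced tiles; 0 means the puzzle is solved."""
    return sum(1 for s, g in zip(state, GOAL) if s != 0 and s != g)

print(misplaced(GOAL))                        # 0
print(misplaced((1, 2, 3, 4, 5, 6, 7, 0, 8)))  # 1
```

A depth-first search guided by this heuristic would pursue the neighbor with the smallest value first, reconsidering other options only on a dead end.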
A classic example of logical deduction:
Socrates is a man.
All men are mortal.
Socrates is mortal.
Running-Time Bounds
When the running time T(n) is squeezed between matching
bounds (on input of size n, the running time is at
least c1 f(n) and at most c2 f(n)), T(n) is
precisely identified as Θ(f(n)) (to
within constant factors).
The Class P
All problems (not algorithms) whose time complexity
is at most some polynomial are said to be in the class
P (P for polynomial).
Example: Sorting is in P, since its running time is
Θ(n log n), and n log n is less than the polynomial n²
(although it is larger than n).
NP-Complete Problems
There is an important class of problems that might or
might not be in P: nobody knows!
These problems are called NP-complete.
These problems have the following characteristic:
A candidate solution can be verified in polynomial
time as being a real solution or not.
However, there are an exponential number of candidate solutions.
Many real-world problems in science, math, engineering, operations research, etc. are NP-complete.
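The characteristic above can be made concrete with subset sum (used here as an illustration; it is one of the standard NP-complete problems): checking a proposed subset is fast, but there are exponentially many subsets to propose.

```python
# Subset sum: given numbers and a target, is there a subset that
# sums to the target? Verifying a candidate is polynomial time.
from itertools import combinations

def verify(numbers, target, candidate):
    """Polynomial-time check that candidate is a valid solution."""
    return set(candidate) <= set(numbers) and sum(candidate) == target

numbers, target = [3, 9, 8, 4, 5, 7], 15
print(verify(numbers, target, (3, 4, 8)))    # True: quick to check

# ...but brute force must consider exponentially many candidates:
count = sum(1 for r in range(len(numbers) + 1)
            for c in combinations(numbers, r))
print(count)                                  # 64 = 2^6 subsets
```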
P vs. NP
Imagine an (unrealistically) powerful model of computation in which the computer first makes a lucky guess
(a nondeterministic choice) as to a candidate solution
in constant time, and then behaves as an ordinary computer and verifies the solution.
Problems solvable on this computer in polynomial time
are in the class NP (nondeterministic polynomial).
NP includes all the NP-complete problems.
Having polynomial running time on this funny computer would not seem to ensure polynomial running
time on a real computer.
That is, it seems likely that NP is a strictly larger class
of problems than P, and that the NP-complete problems
cannot be solved in polynomial time.
Computability Theory
Complexity theory focuses on how expensive it is to
solve various problems.
Computability theory focuses on which problems are
solvable at all by a computer (i.e., with an algorithm),
regardless of how expensive a solution might be.
We will focus on computing (mathematical) functions,
with inputs and outputs.
We would like to know if there exist functions that are
so complicated that no algorithm can compute them.
Church-Turing Thesis
First, we have to decide what constitutes an algorithm.
Assembly languages have restricted sets of primitives.
High-level languages have a wider choice of primitives.
What's to say you couldn't have some language with
very powerful primitives?
Church-Turing thesis: (thesis means conjecture)
Anything that can reasonably be considered an algorithm can be represented as a Turing machine.
A Turing machine is a very abstract, yet low-level, model
of computation.
Every actual programming language is equivalent, in
computational power, to the Turing machine model.
Thus, for theoretical purposes, the choice of programming language is irrelevant.
Computing Functions
Some sample functions:
f(x) = 3: very easy to compute; always return 3,
no matter what the input is.
An Uncomputable Function
Define a function f as follows:
1. a program P is the input
2. run program P as a subroutine on input P
3. let r be the value it returns
4. if r = 1 then return 0, else return 1
Then, on input P, f's output differs from what P itself
does on input P. So f differs from the function computed
by every program, and a program that computes f
cannot exist.