Accessing Files
Chris Brown
In This Module
Unbuffered I/O
- Sequential access
- Random access
Using the standard library
- Buffered I/O
- Formatted I/O
Advanced Techniques
- Scatter/gather I/O
- Mapping files into memory
Demonstration:
- Four ways to copy a file
The Heart of the Matter
open()
close()
read()
write()
Unbuffered I/O
Short of crawling out over the disc with a
tiny magnet, these system calls are the
lowest level of input/output in Linux
Opening a File
fd = open(name, flags, mode)
name: Pathname of the file to be opened
flags: Must include one of:
O_RDONLY, O_WRONLY, O_RDWR
May include one or more of:
O_APPEND, O_CREAT, O_TRUNC
mode: Access mode of the file (only used if the file is being created)
Returns lowest available descriptor
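A minimal sketch of the call above. The helper name open_for_append and the 0644 mode are illustrative assumptions, not part of the slides:

```c
#include <fcntl.h>      /* open(), O_* flags */
#include <sys/stat.h>   /* permission bits used with O_CREAT */
#include <sys/types.h>
#include <unistd.h>     /* write(), close(), unlink() */

/* Hypothetical helper: open `name` for appending, creating it if needed.
 * Returns the new descriptor, or -1 on error (errno is set by open()). */
int open_for_append(const char *name)
{
    /* Exactly one access mode (O_WRONLY) plus optional modifier flags.
     * The third argument is consulted only because O_CREAT is present. */
    return open(name, O_WRONLY | O_CREAT | O_APPEND, 0644);
}
```

A caller would check the result against -1 before using the descriptor.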
Standard Streams
Keyboard -> fd = 0 (stdin) -> Program
Program -> fd = 1 (stdout) -> Screen (tty)
Program -> fd = 2 (stderr) -> Screen (tty)
Using and Combining Symbolic Constants
Some system calls accept flag arguments, specified using symbolic constants
Some are integer constants (1, 2, 3, 4, ...)
- These are mutually exclusive (you must specify exactly one)
Some are single-bit values, e.g.:
- #define O_CREAT 0100
- #define O_TRUNC 01000
- #define O_APPEND 02000
These flags may be combined using a bitwise 'OR'
fd = open("foo", O_RDWR | O_TRUNC | O_APPEND);
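The two kinds of constant are also tested differently. The sketch below uses two hypothetical helpers, has_flag and is_access_mode, and assumes the Linux definitions in fcntl.h, including the O_ACCMODE mask:

```c
#include <fcntl.h>
#include <stdbool.h>

/* Single-bit flags (O_CREAT, O_TRUNC, O_APPEND, ...) are tested
 * with a bitwise AND. */
bool has_flag(int flags, int flag)
{
    return (flags & flag) != 0;
}

/* The access mode is a small integer, not a single bit, so it is
 * extracted with the O_ACCMODE mask and compared for equality. */
bool is_access_mode(int flags, int mode)
{
    return (flags & O_ACCMODE) == mode;
}
```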
Unbuffered Output
write(fd, buffer, count)
Copies data in memory (buffer) to the file
Returns the number of bytes actually written (-1 on error)
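Because write() may transfer fewer bytes than requested, callers often loop until everything is out. A minimal sketch, with write_all as a hypothetical helper name:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all `count` bytes
 * have been written. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buffer, size_t count)
{
    const char *p = buffer;

    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                     /* skip past what was written */
        count -= (size_t)n;
    }
    return 0;
}
```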
Unbuffered Input
read(fd, buffer, count)
Copies data from the file into memory (buffer)
Returns the number of bytes actually read (0 on end-of-file, -1 on error)
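One way to copy a file with unbuffered I/O is to read until read() returns 0. This is a sketch only; the helper name copy_fd and the 4096-byte buffer are assumptions:

```c
#include <unistd.h>

/* Hypothetical helper: copy everything from descriptor `in` to
 * descriptor `out`. Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buffer[4096];
    long total = 0;
    ssize_t n;

    /* read() returns 0 at end-of-file, which ends the loop. */
    while ((n = read(in, buffer, sizeof buffer)) > 0) {
        ssize_t done = 0;
        while (done < n) {          /* write() may be short, so loop */
            ssize_t w = write(out, buffer + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```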
Closing a File
close(fd)
fd: An open file descriptor
- Closes the descriptor
- Makes it available for re-use
Descriptors are implicitly closed when a process terminates
There is a finite limit on how many descriptors a process can have open
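Re-use can be observed directly: in a single-threaded program, reopening a file straight after close() normally hands back the same descriptor number, because open() returns the lowest one available. The helper name descriptor_is_reused is hypothetical:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the descriptor number released by
 * close() is handed out again by the next open(), 0 if not, -1 on error.
 * Assumes no other thread opens files in between. */
int descriptor_is_reused(const char *path)
{
    int first = open(path, O_RDONLY);
    if (first < 0)
        return -1;
    close(first);                   /* `first` is now free for re-use */

    int second = open(path, O_RDONLY);
    if (second < 0)
        return -1;
    close(second);

    return first == second;
}
```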
Sequential Access
The file position pointer starts at the beginning of the file and moves toward the end by the number of bytes each call transfers (the offsets below assume each read() returns the full count)
read(fd, buffer, 1200); /* pointer moves from 0 to 1200 */
read(fd, buffer, 600); /* pointer moves from 1200 to 1800 */
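The pointer's movement can be checked with lseek(fd, 0, SEEK_CUR), which reports the current offset without changing it. A sketch, assuming a regular file of at least 1800 bytes; the helper name is hypothetical:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: performs the two reads from the slide and
 * returns the resulting file position, or -1 on error. */
long position_after_two_reads(const char *path)
{
    char buffer[1200];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    if (read(fd, buffer, 1200) < 0 || read(fd, buffer, 600) < 0) {
        close(fd);
        return -1;
    }

    /* An offset of 0 relative to the current position moves nothing
     * and simply returns where the pointer is now. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return (long)pos;
}
```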
Random Access
The file position pointer may be explicitly repositioned:
lseek(fd, offset, whence)
fd: File descriptor
offset: Byte offset. May be positive or negative.
whence: Specifies where the offset is relative to:
SEEK_SET Relative to start of file
SEEK_CUR Relative to current position
SEEK_END Relative to end of file
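Random Access Examples
lseek(fd, 100, SEEK_CUR); moves the pointer 100 bytes forward from its current position
lseek(fd, 100, SEEK_SET); moves the pointer to byte 100, counted from the start of the file
lseek(fd, -100, SEEK_END); moves the pointer to 100 bytes before the end of the file
lseek(fd, 100, SEEK_END); moves the pointer 100 bytes past the end of the file; after a subsequent write, the "hole" reads back as zeros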
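The "hole" case can be reproduced in a few lines. This is a sketch; the helper name make_hole and the 0600 mode are assumptions:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte, seeks 100 bytes past the end,
 * then writes another byte. Returns the resulting file size (1 + 100 + 1),
 * or -1 on error. */
long make_hole(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, "A", 1) != 1 ||
        lseek(fd, 100, SEEK_END) < 0 ||   /* step past the end */
        write(fd, "B", 1) != 1) {         /* this write creates the hole */
        close(fd);
        return -1;
    }

    off_t size = lseek(fd, 0, SEEK_END);  /* offset of the end = file size */
    close(fd);
    return (long)size;
}
```

Reading any byte between the two written bytes returns zero.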
Random Access Example
#include <unistd.h>
#include <fcntl.h>
struct record { /* Define a "record" */
    int id;
    char name[80];
};
int main(void)
{
    int fd, size = sizeof(struct record);
    struct record info;
    fd = open("datafile", O_RDWR); /* Open for read/write */
    if (fd < 0)
        return 1; /* Give up if the file cannot be opened */
    lseek(fd, size, SEEK_SET); /* Skip one record */
    read(fd, &info, size); /* Read second record */
    info.id = 99; /* Modify record */
    lseek(fd, -size, SEEK_CUR); /* Backspace */
    write(fd, &info, size); /* Write modified record */
    close(fd);
    return 0;
}
File IO and the Standard C Library
The Standard C library also specifies file IO routines
Buffered
Available on any conformant "C" environment
Opening a File
fd = fopen(name, mode)
name: Pathname of the file to be opened
mode: Valid modes include:
"r" open text file for reading
"w" truncate and open for writing
"r+" open text file for update
Append "b" to the mode for binary files
Returns a descriptor of type FILE * (or NULL on error)
Output
fwrite(buffer, size, num, fd)
size: Size of each object in bytes
num: Number of objects
fd: FILE * descriptor as returned from fopen()
Returns the number of elements actually written
Input
fread(buffer, size, num, fd)
size: Size of each object in bytes
num: Number of objects
fd: FILE * descriptor as returned from fopen()
Returns the number of elements actually read
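A file copy using the buffered routines might look like the sketch below. The helper name copy_stream and the 4096-byte buffer are assumptions. With size set to 1, the element counts returned by fread() and fwrite() are byte counts:

```c
#include <stdio.h>

/* Hypothetical helper: copy file `src` to file `dst` using stdio.
 * Returns 0 on success, -1 on failure. */
int copy_stream(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buffer[4096];
    size_t n;
    int status = 0;

    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n) {
            status = -1;            /* short write: give up */
            break;
        }
    }
    if (ferror(in))
        status = -1;                /* loop ended on a read error, not EOF */

    fclose(in);
    if (fclose(out) != 0)           /* fclose() flushes buffered data */
        status = -1;
    return status;
}
```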
Closing a File
fclose(fd)
fd: An open file descriptor
- Closes the descriptor
- Flushes any buffered data
Descriptors are implicitly closed when a process terminates
There is a finite limit on how many descriptors a process can have open
So What's the Difference?
Feature                Low-level IO                       Standard Library IO
Read/write access      open(), close(), read(), write()   fopen(), fclose(), fread(), fwrite()
Random access          lseek()                            fseek(), rewind()
Type of descriptor     int                                FILE *
User-space buffering?  No                                 Yes
Part of C standard?    No                                 Yes
Formatted IO
printf() and friends
printf()
Generates a formatted string and writes it to standard output
char *name = "Sharon";
int age = 45;
double wage = 34500.00;
printf("%12s is %d and earns %f", name, age, wage);
Returns the number of characters printed
Other text is treated literally
printf() Format Codes
%d decimal integer
%8d right-justified in 8 character field
%-8d left justified
%s string
%12.3f double, in 12 character field with 3 digits after the decimal point
See "man 3 printf" for the details
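The codes above can be tried side by side. The sketch below formats into a buffer with snprintf() rather than printing, so the result is easy to inspect; the helper name format_codes and the sample values are illustrative:

```c
#include <stdio.h>

/* Hypothetical helper: formats sample values with each code from the
 * list above, bracketed so the field widths are visible.
 * Returns the number of characters in the formatted string. */
int format_codes(char *out, size_t size)
{
    /* Result: "[45] [      45] [45      ] [Sharon] [   34500.000]" */
    return snprintf(out, size, "[%d] [%8d] [%-8d] [%s] [%12.3f]",
                    45, 45, 45, "Sharon", 34500.0);
}
```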
printf's Friends and Relations
fd = fopen(name, mode);
fprintf(fd, "hello"); (Use stderr to write an error message)
char buf[100];
sprintf(buf, "hello"); Formats a string into memory
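A short sketch of both relatives. It swaps in snprintf(), the bounded member of the same family, for sprintf() so the buffer cannot overflow. The helper names are hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: formats a greeting into `buf`, writing at most
 * `size` bytes including the terminating '\0'. Returns the length the
 * full string would have, even if it had to be truncated. */
int format_greeting(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "hello %s", name);
}

/* Hypothetical helper: writes an error message to stderr, which keeps
 * it out of any redirected standard output. */
void report_error(const char *what)
{
    fprintf(stderr, "error: %s\n", what);
}
```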
Scatter/Gather IO
Read or write multiple buffers of data
in a single call
Atomic
readv() and writev()
Scatter/Gather IO
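writev(fd, iov, iocount)
iov: array of iocount entries, each holding:
- iov_base (start address of a buffer)
- iov_len (length of that buffer in bytes)
The buffers are written to the file in array order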
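A gather write of two separate buffers could be sketched as follows. The helper name write_two_parts is an assumption; struct iovec and writev() come from sys/uio.h:

```c
#include <string.h>
#include <sys/uio.h>    /* struct iovec, writev() */
#include <unistd.h>

/* Hypothetical helper: writes `header` followed by `body` with a single
 * writev() call. Returns the total bytes written, or -1 on error. */
ssize_t write_two_parts(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    /* iov_base is a non-const pointer, but writev() only reads from it. */
    iov[0].iov_base = (void *)header;
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);      /* 2 = number of entries in iov */
}
```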
Mapping Files into Memory
mmap() maps a file into memory
and allows you to access it as if it
were an array
Mapping Files into Memory
mmap(addr, length, prot, flags, fd, offset)
addr: Set this to NULL to allow the kernel to choose the address
length: The length of the mapping
prot: PROT_READ, PROT_WRITE
flags: MAP_SHARED, MAP_PRIVATE
fd: File descriptor from open()
offset: Offset within the file
Returns the address at which the file has been mapped
Random Access Using mmap()
#include <sys/mman.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h> /* lseek() */
struct record {
    int id; /* Define a "record" */
    char name[80];
};
int main(void)
{
    int fd;
    size_t size;
    struct record *records; /* Pointer to an array of records */
    fd = open("foo", O_RDWR);
    size = lseek(fd, 0, SEEK_END); /* Get size of file */
    /* Map in the whole file, viewing it as an array of records. */
    /* MAP_SHARED carries the update through to the file; with MAP_PRIVATE it would stay in memory only. */
    records = (struct record *)mmap(NULL, size, PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);
    if (records == MAP_FAILED)
        return 1;
    records[1].id = 99; /* Update record 1 */
    msync(records, size, MS_SYNC);
    return 0;
}
Copying a File Using mmap()
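Input file "foo" is mapped into an in-memory buffer (src) with mmap()
Output file "bar" is mapped into an in-memory buffer (dst) with mmap()
memcpy() copies src to dst
msync() writes dst back to the output file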
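The mmap() copy can be sketched as below. The helper name mmap_copy, the 0600 mode, and the use of fstat() and ftruncate() to size the output file are assumptions; the output file must be as large as the mapping before it is written through:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: copy `src_path` to `dst_path` via two mappings.
 * Returns 0 on success, -1 on failure. */
int mmap_copy(const char *src_path, const char *dst_path)
{
    struct stat st;
    size_t size;
    void *src, *dst;
    int status = -1;

    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (fstat(in, &st) < 0)
        goto done;
    size = (size_t)st.st_size;
    if (size == 0) {                /* mmap() rejects a zero length */
        status = 0;
        goto done;
    }
    if (ftruncate(out, (off_t)size) < 0)   /* grow output to full size */
        goto done;

    src = mmap(NULL, size, PROT_READ, MAP_PRIVATE, in, 0);
    if (src == MAP_FAILED)
        goto done;
    /* MAP_SHARED so that stores into dst reach the output file. */
    dst = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, out, 0);
    if (dst == MAP_FAILED) {
        munmap(src, size);
        goto done;
    }

    memcpy(dst, src, size);
    if (msync(dst, size, MS_SYNC) == 0)
        status = 0;
    munmap(src, size);
    munmap(dst, size);
done:
    close(in);
    close(out);
    return status;
}
```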
Module Summary
The heart of File IO:
- open()
- close()
- read()
- write()
Seeking and random access
Buffered IO
printf() and friends
Advanced topics:
- scatter/gather and memory-mapped IO
Moving Forward
Coming up in the next module:
File-system management
files, inodes, links and directories