Hacking in C 2020

The C programming language

Thom Wiggers

Table of Contents

Introduction

Undefined behaviour

Abstracting away from bytes in memory

Integer representations

The C programming language

• Invented by Dennis Ritchie in 1972–1973

• Not one of the first programming languages: ALGOL for example is
older.
– Another predecessor is B.
• Closely tied to the development of the Unix operating system
• Unix and Linux are mostly written in C
• Compilers are widely available for many, many, many platforms
• Still in development: latest release of standard is C18. Popular
versions are C99 and C11.
• Many compilers implement extensions, leading to versions such as
gnu18, gnu11.
• Default version in GCC: gnu11 (GCC 5 to 7; GCC 8 and later default to gnu17)

Programming for hardware

• Initially C was co-developed with Unix

• Unix is an operating system: main job is managing hardware
• C was used to replace the assembly code and implement new
software for Unix
• Writing code in assembly:
– Almost no abstraction
– Full control
– No types, no bounds checking: everything is just bytes to the
CPU
– Direct access to CPU and memory
– Choice of instructions, register allocation left to programmer
– Need to do everything from scratch for different CPUs
– If a microarchitecture is released with new features, may need
to re-implement parts of the code

Comparing C to assembly code

• C takes away some control from the programmer

• C is portable (in theory)
– In practice, different compilers may not be fully compatible
(Microsoft vs GCC)
– You need to stay away from hardware-specific features and
hardware-specific assumptions
– You need to stay away from implementation-defined behaviour
(turn on at least -pedantic on GCC)
• Compiler can translate C code to the target CPU
• Compiler can optimize code for you, for the target microarchitecture
• C still gives raw access to memory
• Gives you types to detect some errors, but lets you convert between
any of them, often even implicitly.

Comparing C to C++

• C++ was originally developed from C

• C++ is not a strict superset of C.
int *x = malloc(sizeof(int) *10); is valid C, but not C++!
In C++ you will need to cast the void* pointer (but you should use
new in C++).
• It is easy to write some code in C and then call it from C++ code,
however.
– Commonly used when high-performance code is written in C
and a nice-to-use wrapper is written in C++.
– Not restricted to C++, many languages have such a foreign
function interface to link to libraries compiled from C.
– For example: Numpy (Python) implements many core maths
operations in C for performance reasons.

Syntax and semantics

Syntax of a programming language

• Spelling and grammar rules
• Defines the language of valid programs
• Syntax errors are caught by the compiler
• Classical example: forget a ; at the end of a line
Semantics of a programming language
• Defines the meaning of a valid program
• In many languages semantics are fully specified
• Runtime errors (exceptions) are part of the semantics
• C is not fully specified!

Implementation-defined behaviour

• Some behaviour is unspecified by the standard and was left to be
specified by the compiler.
• Reason: simplify compiler implementation and allow compilers to
optimize things better.
• Often such behaviour is also specific to the hardware that you’re
running the software on
• Examples:
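– Order of subexpression evaluation: f(g(), h()). (Strictly, the
standard calls this "unspecified": the compiler does not even
have to document its choice.)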
– Sizes of types (more later)
– Signedness of char
– Number of bits in a byte
• For most of this course, we assume GCC 7+ on a 64-bit AMD64 cpu.

Undefined behaviour

• Certain specific actions are defined as undefined behaviour (UB)

• When a program reaches UB, one or more of the following may
happen
– Crash every time
– Crash 0.01% of the time
– Crash not when you test it, but only when you use it as a library
– Delete everything on your hard drive
– Murder some puppies
– Light your house on fire
– All of the above, and still give the right result
• The existence of UB anywhere in your program makes the entire
thing meaningless!
– Reason: compilers make assumptions based on it not existing,
which may change the meaning of your program
• Often UB leads to exploitable security problems.

Examples of undefined behaviour

• Accessing memory out of bounds

• Reading uninitialized memory
• Division by zero
• Dereferencing a null pointer
• Signed integer overflow (INT_MAX +1)
• Left-shifting a negative signed integer ((-42) << 3)
• Shifting by at least the width of the (promoted) type
(char x = 1; x << 100;)
• Returning nothing from a non-void function (int f() {}) and then
using the result in the caller
Compilers can find some of these problems, but for weird reasons, those
warnings are often switched off by default! Make sure you enable -Wall
-Wextra when you compile code.

Values

• A program typically applies operations to values (add, sub, mul)

• In assembly, you need to carefully manage where you store values
– Limited number of registers, often necessary to spill to stack
• The compiler takes care of that for you in higher-level languages
• When calling a function void f(int x) { x += 10;} as f(y) you
pass it the value of y.
– this is called call-by-value
– The compiler copies the value of y into the parameter x
– Modifying the passed value in f won’t change it outside the
function: y=10; f(y); printf("y = %d\n", y); will still
print 10.

Addresses

• You can get the address of a variable using the & operator:
int a; &a
• You then obtain a pointer to a
• A pointer to a type is denoted as type*, e.g. int*, char*.
We will return to pointers later

Types

• The hardware only understands memory as a bunch of bytes that it
can perform certain operations on
• Bytes are sets of 8 bits (on all mainstream hardware; C itself only
guarantees at least 8)
• For writing software, other types are helpful to help determine
semantics
– it’s helpful that a compiler gives an error when you call
strlen(3).
• You can program without really understanding how these types map
to bytes.
• But we can have more fun if we do know how it works

char

• The most elementary data type

• Always exactly 1 byte by definition (sizeof(char) == 1); almost
everywhere that byte has 8 bits (required by POSIX)
• Can be used to store characters: char a = '2';
• But char is an 8-bit integer type
• We can just assign any 8-bit integer value to char types.
char a = '2';
char b = 2;
char c = 50;
• In fact, a == c because ASCII character ’2’ is 50.
• Writing 'A' + 3 is perfectly valid and will result in 'D'.

Tricky char

How many times will the following line be printed?

for (char i = 42; i >= 0; i--) {
printf("Crypto stands for cryptography");
}
• Trick question! It is implementation-defined whether char is signed
(-128–127) or unsigned (0–255).
• On amd64, char is signed, so it will terminate.
• On Aarch64 (64-bit ARMv8), char is unsigned, so it will loop forever.
• Always write signed char or unsigned char in portable software.

Integral types

• Other types that are important:

– short: at least two bytes
– int: typically 4 bytes (but sometimes only two bytes!)
– long: either 4 bytes or 8 bytes (different between Linux and
Windows!)
– long long: at least 8 bytes (exactly 8 on all common platforms)
• Each of these comes in signed (default) and unsigned variants
• Find the size of a type: printf("%zu\n", sizeof(int));
• We can also do this via variable: int x; sizeof(x);
• We can write integer literals as:
– Decimal: 255
– Octal: 0377 (prefix 0)
– Hexadecimal: 0xFF (prefix 0x)

Other integer types

• There is a special integer type to indicate sizes: size_t

• For example returned by sizeof, expected as argument by malloc
• Pointers also have a specific size, 8 bytes on amd64

Better integer types

• All those varying byte sizes of int et al. make it hard to write
efficient portable code
• Solution: use fixed-size integer types defined by stdint.h
– uint8_t is an 8-bit unsigned integer
– int8_t is an 8-bit signed integer
– uint16_t is a 16-bit unsigned integer
– ...
– int64_t is a 64-bit signed integer

Floating-point and complex values

• C also defines 3 “real” types:

– float: usually 32-bit IEEE 754 “single-precision” floats
– double: usually 64-bit IEEE 754 “double-precision” floats
– long double: usually 80-bit “extended precision” floats (on x86)
• Corresponding “complex” types (need to include complex.h)
• This course: not much float hacking
• However, this is fun, see “What every computer scientist should
know about floating point arithmetic”
www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf
• Small example:
double a; /* assume IEEE 754 standard */
// snip
a += 6755399441055744;
a -= 6755399441055744;
• What does this code do to a?
• Answer: it rounds a according to the currently set rounding mode
Excursion: printf

printf is a function that prints something according to a format string.

#include <stdio.h>
#include <inttypes.h> /* needed for the PRI* macros */
printf("%d", a); /* prints signed integers in decimal */
printf("%u", b); /* prints unsigned integers in decimal */
printf("%x", c); /* prints unsigned integers in hexadecimal */
printf("%o", c); /* prints unsigned integers in octal */
printf("%lu", d); /* prints long unsigned integer in decimal */
printf("%llu", d); /* prints long long unsigned integer in decimal */
printf("%p", (void *)&d); /* prints pointers (usually in hexadecimal) */
printf("%f", e); /* prints doubles; float arguments are promoted to double */
printf("%lf", e); /* same as %f in printf */
printf("%Lf", e); /* prints long doubles (extended precision) */
printf("%zu", f); /* prints a size_t as unsigned decimal */
printf("%" PRIu8, g); /* prints a uint8_t */
printf("%" PRIu64, h); /* prints a uint64_t */
printf("%" PRId64, i); /* prints an int64_t */
printf("%" PRIx64, i); /* prints a uint64_t as hex */

Implicit type conversion

• Sometimes we want to evaluate expressions involving different types

• Example:
float pi, r, circ;
pi = 3.14159265;
circ = 2*pi*r;
• C uses complex rules to implicitly convert types
• Often these rules are perfectly intuitive:
– Convert “less precise” type to more precise type, preserve values
– Compute modulo 2^16 when casting from uint32_t to
uint16_t
• However, these rules can be rather counterintuitive:
unsigned int a = 1;
int b = -1;
if(b < a) printf("all good\n");
else printf("WTF?\n");

Explicit casts

• Sometimes we need to convert explicitly

• Example: multiply two (32-bit) integers:
uint32_t a,b;
...
uint32_t r = a*b;
• By “default”, the result of a*b has 32 bits; the upper 32 bits are “lost”
• Fix by casting one (or both) factors:
uint64_t r = (uint64_t)a*b;
• Can also use this to, e.g., truncate floats:
float a = 3.14159265;
float c = (int) a;
printf("%f\n", trunc(a));
printf("%f\n", c);
• Careful, this does not generally work: converting a float whose
value does not fit in an int is undefined behaviour!

A small quiz

What do you think this program will print?

unsigned char x = 128;
signed char y = x;
printf("The value of y is %d\n", y);

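(Tempting answer: “undefined behavior”, it’s C after all. Strictly, it is implementation-defined: 128 does not fit in a signed char, so the compiler picks the result. GCC on amd64 prints -128.)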

Two’s complement

• Can represent a signed integer as “sign + absolute value”

• Disadvantage: zero has two representations (0 and -0)
• Other idea: flip all bits in a to obtain -a.
• This is known as “ones’ complement”
• Still: zero has two representations
• Much more common: two’s complement
– flip all bits in a
– add 1
• Sanity test: a == -(-a)
• Range of k-bit signed integer: {−2^(k−1), ..., 2^(k−1) − 1}
• Example: signed (8-bit) byte: {−128, . . . , 127}
• Can use the same hardware for signed and unsigned addition

Endianness

• Let’s consider the 32-bit integer 287454020 = 0x11223344

• How would you put it into memory... like this?
| 11 | 22 | 33 | 44 |

0x0...0 0x0...1 0x0...2 0x0...3

• How about like this?

| 44 | 33 | 22 | 11 |

0x0...0 0x0...1 0x0...2 0x0...3

• What do you find more intuitive?

Endianness, let’s try again

• Take a 4-byte integer $a = \sum_{i=0}^{3} a_i 2^{8i}$

• The a_i (written a0 to a3 below) are the bytes of a

• How would you put it into memory... like this?
| a0 | a1 | a2 | a3 |

0x0...0 0x0...1 0x0...2 0x0...3

• Or would you rather have this?

| a3 | a2 | a1 | a0 |

0x0...0 0x0...1 0x0...2 0x0...3

• Again a quick poll: What do you find more intuitive?

Endianness, the conclusion

• Least significant bytes at low addresses: little endian

• Most significant bytes at low addresses: big endian
• This is short for “little/big endian byte first”
• Most CPUs today use little endian
• Examples for big-endian CPUs:
– Classic PowerPC
– UltraSPARC
• ARM and POWER8 can switch endianness (they are “bi-endian”); they
are usually run in little-endian mode
• The problem with little-endian intuition is just that we write
left-to-right (but use Arabic numbers)
• Endianness will become important again when we need to write
memory addresses later

Memory addresses

• On 32-bit x86 processors, addresses were 4 bytes.

• Current AMD64 processors support up to 2^48 bytes of memory
(256 TiB)
– This means you need 6 bytes to represent 2^48 addresses
– 8 bytes are used for addresses though.
▸ Upper 3 bytes are either in 0x000000...–0x00007f...,
or 0xffff80...–0xffffff....
▸ On Linux, the first is userspace and the second is
kernelspace
▸ 0x000080...–0xffff7f... are not used

Back to pointers

We can print the address of a variable:

int a = 4; /* https://xkcd.com/221/ */
int* a_ptr = &a;
printf("The value of the variable a = %d\n", a);
printf("The address of the variable a = %p\n", (void *)&a); /* %p expects a void* */
printf("The value of the variable a_ptr = %p\n", (void *)a_ptr);
printf("The value pointed to by a_ptr = %d\n", *a_ptr);
Output:
The value of the variable a = 4
The address of the variable a = 0x7ffd1be9fb8c
The value of the variable a_ptr = 0x7ffd1be9fb8c
The value pointed to by a_ptr = 4

Variable a is stored very high in the user-space memory, because int a
defines a stack variable.

Heap addresses

We can also print the address of memory allocated on the heap:

int* a_ptr = malloc(sizeof(int));
*a_ptr = 4; /* https://xkcd.com/221/ */
printf("The value stored at a_ptr = %d\n", *a_ptr);
printf("The value of a_ptr = %p\n", a_ptr);
free(a_ptr); /* need to manually manage heap */
Output:

The value stored at a_ptr = 4
The value of a_ptr = 0x55b899d552a0

The address in a_ptr lies somewhere halfway through user-space memory, as it is on the heap.

Note that we have been writing *a_ptr to dereference the pointer, to
get the value stored at the address!
