2021 Lecture Notes MAT1330

University of Ottawa
Lecture Notes
MAT 1330: Calculus I for the Life Sciences
Prof : Monica Nevins Fall 2021
These notes are for students registered in MAT1330.
Last update: August 19, 2021

Contents
1 Preface
1.1 How to use these notes
1.2 How to Succeed
2 Fundamental Skills
2.1 Mathematical language (new)
2.1.1 Sets
2.1.2 Sums and the geometric series formula (new)
2.2 Algebra
2.2.1 Parentheses and the order of operations
2.2.2 Powers and exponentials
2.2.3 Logarithms
2.2.4 Fractions and rationalization
2.2.5 Polynomials: factoring and long division
2.3 Inequalities and absolute values (new)
2.3.1 Solving inequalities
2.3.2 Absolute values: how to handle them and how to solve equations
2.4 Functions
2.4.1 Potential characteristics of functions
2.4.2 Polynomial functions
2.4.3 Rational functions
2.4.4 Root or radical functions
2.4.5 Absolute value function (new)
2.4.6 Trigonometric functions
2.4.7 Exponential and logarithmic functions
2.4.8 Summary
3 Discrete Time Dynamical Systems (DTDS)
3.1 Modeling in the Life Sciences
3.2 First examples
3.2.1 Bacterial growth
3.2.2 Definition of a DTDS
3.3 Solutions of a DTDS
3.3.1 Fixed points
3.3.2 General solution formula for a linear DTDS
3.4 Behaviour of general DTDS: cobwebbing
3.5 Steady states
3.5.1 Examples: finding equilibria
3.5.2 Stability of equilibria
3.6 Stability in linear models: a theorem
3.7 Stability in nonlinear models: examples
4 Limits and the path to Calculus
4.1 Limits of functions: the concept
4.2 Evaluating limits
4.3 Algebraic limit laws
4.4 Continuous functions
4.5 Back to finding limits: the nice case
4.6 Finding limits: methods for the trickier cases
4.7 Discontinuous functions
4.8 Limits involving infinity
4.8.1 Limits that diverge to infinity: vertical asymptotes
4.8.2 Limits as x goes to ∞: horizontal asymptotes and long-term behaviour of functions
4.8.3 Methods for finding limits as x → ±∞
5 The Derivative
5.1 The definition
5.1.1 Rate of change: the idea
5.1.2 Average rate of change (over an interval)
5.1.3 Instantaneous rate of change
5.2 Examples of using the definition
5.3 Five ways not to be differentiable at x
5.4 What $f'$ tells you about $f$
5.5 Differentiation Rules: The basics
5.5.1 Why the power rule is true
5.5.2 Why derivative is linear
5.5.3 Why the product rule is true
5.5.4 Why the chain rule is true
5.5.5 Why the quotient rule is true
5.6 Derivatives of exponential functions
5.7 Derivatives of logarithms
5.8 Derivatives of functions like $f(x)^{g(x)}$
5.9 Implicit differentiation
5.9.1 More examples
5.10 Derivatives of sine and cosine
5.10.1 Geometric arguments
5.11 Derivatives of other trigonometric functions
5.12 Inverse trigonometric functions
5.12.1 The inverse sine function, $\arcsin(x)$ or $\sin^{-1}(x)$
5.12.2 The inverse tangent function $\arctan(x) = \tan^{-1}(x)$
5.12.3 The remaining inverse trig functions
5.13 Summary of known derivatives
6 Applications of the Derivative
6.1 The first derivative
6.2 The second derivative
6.3 Graphing functions
6.4 Extrema
6.4.1 Local extrema: First and second derivative tests
6.4.2 Methods for finding local extrema
6.4.3 Global Extrema and the Extreme Value Theorem
6.4.4 Method for finding global extrema
6.5 Optimization
6.5.1 Maximization with trade-offs
6.5.2 Areas and volumes:
6.5.3 Distances
6.5.4 Maximize yield in a DTDS.
6.5.5 Minimal perimeter for area
6.5.6 Some more examples
6.6 L’Hôpital’s rule for finding limits
6.6.1 Recall: Algebra of limits with infinity
6.6.2 Recall: evaluating limits
6.6.3 Indeterminate forms and L’Hôpital’s rule
6.6.4 Product and difference indeterminate forms
6.6.5 Exponential indeterminate forms
6.6.6 Graphing even more complex functions
6.7 Approximating functions with polynomials
6.7.1 Estimating a function using a secant line
6.7.2 Linear approximation
6.7.3 Taylor polynomials
6.7.4 More examples of Taylor approximations
6.7.5 The Mean Value Theorem
6.7.6 Applications of the Mean Value Theorem (optional)
6.7.7 Proof of Rolle’s theorem and advanced applications of the Mean Value Theorem (optional)
6.8 Stability of Discrete Time Dynamical Systems
6.8.1 Stability of linear DTDS (recall)
6.8.2 Stability of general DTDS
6.8.3 Example: Allee effect
6.8.4 Logistic growth
6.8.5 The Ricker equation
6.8.6 Harvesting and optimization: DTDS
6.9 The Intermediate Value Theorem
6.9.1 Classic solution: Bisection Method
6.9.2 Sophisticated solution: Newton’s Method
6.9.3 Discussion: So why does it work? Can it fail?
MAT 1330 : Fall 2020
7 Integration
7.1 Introduction
7.1.1 Motivation for Differential Equations
7.1.2 Could there be more than one anti-derivative satisfying a given initial condition?
7.1.3 Anti-differentiation
7.1.4 Applications of Anti-differentiation
7.2 Techniques of integration: Substitution
7.2.1 The method of substitution:
7.2.2 Other situations where you might try substitution
7.2.3 Trying a substitution in the hopes of simplifying a complicated integrand
7.2.4 Tips on substitution
7.2.5 Two examples where substitution is not enough
7.3 Techniques of integration: Integration by Parts
7.3.1 Method of integration by parts
7.3.2 Applying by parts more than once: two different kinds of examples
7.3.3 Tips for integration by parts
7.4 Mixed examples, and applications
7.4.1 More examples with integration by parts and substitution
7.4.2 Application examples
7.4.3 Integrals that we still can’t solve
7.5 Definite Integrals
A Solutions to Selected Exercises
Index
Chapter 1
Preface
These lecture notes have been developed from multiple sources, including primarily the handwritten
notes of Dr. Frithjof Lutscher, from the University of Ottawa, Department of Mathematics and
Statistics, who originally developed MAT1330 : Calculus I for the Life Sciences.
For more details, and a different perspective, you can read the corresponding section of the
textbook by Adler and Lovric, a very good textbook with very few typos and excellent, illustrated
explanations and examples. The textbook also includes tons of exercises to practice and to hone
your skills.
I am also grateful to Dheera Venkatraman, who developed the online graphing tool FooPlot
(fooplot.com) which I have used to make most of the graphs in these notes. Elsewhere, I have
borrowed images from the internet and I have attempted to identify the source; please let me know
of errors or omissions.
1.1 How to use these notes
Please note the following features of these lecture notes. The distinctive colours are there to help
you spot different kinds of material more easily.
Definition 1.1.1. A mathematical definition is crucial: it is where we distill an abstract
concept and trap it in a single word or phrase. Read the definitions, read the examples and
non-examples, and then go back, until the word evokes the full range of the mathematical concept.
Theorem 1.1.2. A theorem (or lemma, or proposition, or corollary) is a result that usually has
the form “If Thing A is true, then Thing B is true.” It works like a short-cut: every time you
notice that Thing A is true, you can just jump ahead and know for sure that Thing B is true.
Example 1.1.3. Examples are essential — but remember to always step back afterward and try
to see the important parts: how did we start the problem, and what was the big step in the
solution?
Exercise 1.1.4. Exercises in these notes are usually meant as triggers to keep you thinking about
the concepts just presented. Some solutions are included at the end of the notes.
As with all exercises: once you have read the solutions, the problem has lost most of its value as
a study tool for you — it’s important to graduate to solving problems on your own (including,
importantly, figuring out how to start). That’s why these notes are supplemented with:
The Course Guide, with one-page summaries of each lecture, and a list of problems from the
textbook, as well as problems from old assignments and exams;
A DGD Workbook, which you are encouraged to print off, with a subset of the above problems
that you absolutely should try in advance of the DGD each week, as the TA will solve them
there;
Assignments on Mobius, submitted online and graded automatically, developed specifically
for this course and all with full solutions.
Note. Sometimes, a box like this is used to summarize the important points after a complicated
discussion, or after a sequence of examples. In this case, the biggest message for success in a math
course is:
Work problems regularly;
Be active: don’t just copy the steps of a solved problem — constantly ask yourself what you
should do next, and use the solution to hone your judgement.
The following marker End of lecture # 0
is used to approximately delineate the content of the course by lectures. These markers are
hyperlinked from the Index, at the end of the notes. Alternatively, the Table of Contents, at the
beginning of the notes, has hyperlinks to let you jump directly to a particular section of the text.
1.2 How to Succeed
To succeed in this course, you need to do math problems. Organize your week and build time to
work on math problems into your calendar. The DGD workbook and the Mobius assignments are
tools to help motivate you to achieve the goal of five problems per day. The key to honing your
skills is to keep trying new problems rather than rehashing ones you’ve already seen. So turn to
the Course Guide and the textbook for more depth.
There are many resources available to you as part of this course; it is up to you to choose to use
them. The specific modalities of these in Fall 2020 will be as described in our course home on
Brightspace:
Brightspace The hub for our course, where you’ll find everything you need, and links to everything
on this list.
Classes Two classes per week, each 80 minutes long. Theory and examples and a sense of the big
picture.
DGDs Once per week, 80 minutes, led by a graduate student Teaching Assistant (TA). Lots of
examples, smaller group size, more opportunities to ask questions.
Math Help Centre Open five days a week, staffed by graduate student TAs. If in-person: just
drop in. If on-line: book your same-day or next-day appointment online. Bring your exercises
(but not homework problems that you will subsequently submit for grades). The TAs will
help you learn how to solve math problems on your own. If your question is more theoretical
in nature, come to office hours instead.
Office Hours As scheduled. Come with your questions: theory, examples, exercises, etc.
Textbook Has an index at the back, and a good table of contents at the front, to help you find
things. Well-written, with lots of examples and excellent graphics. Many exercises have a
brief answer at the back of the book.
Homework We will have regular assignments using Mobius Assessments, an interactive platform
that you can use to measure your progress and see what topics you need to focus more
attention on. Always work the problems by hand, writing your steps logically, and then type
in the answers and see how you do. Don’t leave assignments to the last minute; working and
focussing on them is part of your learning process.
Peer help groups There are various mentorship and peer help groups on campus; this can be a
good way to learn and keep motivated. These groups are not overseen by the instructor and
are not part of the course — so please do confirm any rumours about course policies or test
content directly with your professor!
Friends Teaching, by trying to explain an example to a friend, can be the most effective way of
learning. Working with friends can also be very motivating — everyone’s in this together.
Note. In this course, we declare that you can do the assignments with friends or peer help groups,
but not with the help of a tutor (private or through the help centre) or instructor. In all cases, you
should ensure that you, personally, have done the work to get to the answer that you enter.
Midterms and exams are to be done individually, without the help of another person or computer
aids. We set out rules for each test, but note that the rules are not a game to play — any attempt
to undermine the academic integrity of a test, regardless of the outcome, is an act of academic
fraud, and has no place at the University.
Chapter 2
Fundamental Skills
We will review some algebra, trigonometry and functions in our first two modules to ensure we have
a common vocabulary and to preview things to come. Experience has shown that mastering
these pre-Calculus skills is 100% essential for success in Calculus. Bring your questions
to the drop-in centre or office hours; we will be very happy to help.
This chapter includes far more material than what will be reviewed in a typical pair of classes! Our
classes will focus on the material that is probably new to you (as indicated) but you should use the
background as “warm-up.”
2.1 Mathematical language (new)
The beauty of math is its universality; and its universality comes from the precision of its language.
We will practise speaking and writing mathematics in this course, and so become fluent.
2.1.1 Sets
Sets of numbers come up everywhere: the domain of a function, the set of solutions to an equation,
or the set of solutions to an inequality.
Our notation is very precise and can be read out loud as a sentence. For example
$$S = \{x \in \mathbb{R} \mid x^2 > 5\}$$
is read as “S is the set of x in the real numbers such that $x^2 > 5$”. We can parse this as
$$S \underbrace{=}_{\text{is}} \underbrace{\{}_{\text{the set of}} x \underbrace{\in}_{\text{in}} \underbrace{\mathbb{R}}_{\text{the real numbers}} \underbrace{\mid}_{\text{such that}} x^2 > 5 \}.$$
In general, when we write a set this way, you can expect it to have the form
S = {the kind of number x is|what characterizes the x in the set}
or just
S = {x | condition on x to be in the set}
if it’s obvious from context what kind of number x should be. In Calculus, our sets are usually
in the real numbers R, but occasionally we might prefer the integers Z or the natural numbers
$\mathbb{N} = \mathbb{Z}_{\geq 0} = \{x \in \mathbb{Z} \mid x \geq 0\}$.
Sometimes we use : instead of | for “such that.”
Now suppose we are talking about subsets of the real numbers R.
We can be fancier; for example
$$\{x^2 \mid 0 < x < 10\}$$
is a way of saying “the set of all squares of numbers between 0 and 10”, and is shorthand for
$$\{y \in \mathbb{R} \mid y = x^2 \text{ for some } 0 < x < 10\}.$$
Of course, if we think about this set, we might see it’s just a complicated way of saying
{y ∈ R | 0 < y < 100}.
This final form is an example of a very common kind of set called an interval.
Definition 2.1.1. Given two real numbers a < b, the closed interval from a to b is the set
[a, b] = {x ∈ R | a ≤ x ≤ b}.
If we want to exclude one or both endpoints, we replace the square bracket with a round bracket (or,
in French notation, with a square bracket facing the wrong way [a]). That is,
(a, b) = {x ∈ R | a < x < b} = ]a, b[
(a, b] = {x ∈ R | a < x ≤ b} = ]a, b]
[a, b) = {x ∈ R | a ≤ x < b} = [a, b[.
[a] This sometimes shows up in our Mobius homework.
Example 2.1.2. We have worked out that $\{x^2 \mid 0 < x < 10\} = (0, 100)$.
Note. So math is a precise language, yes, but not without its foibles. Notice that if we write just
(3, 4)
then we could mean the open interval 3 < x < 4, or we could mean the point in the Cartesian plane
with coordinates x = 3 and y = 4.
Oh well. We just ran out of different symbols. The takeaway: when you see something like this,
read the whole sentence for context, and it should be clear.
Now we can combine intervals to describe a wide variety of sets.
Definition 2.1.3. The union of two or more intervals is the set of all points that are in at least
one of them. We write the union with the symbol ∪.
So for example
[−5, −3] ∪ [7, 19] = {x ∈ R | −5 ≤ x ≤ −3 or 7 ≤ x ≤ 19}
and
(−1, 0) ∪ (0, 1) = {x ∈ R | −1 < x < 1, x ≠ 0}.
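If it helps to see the two unions above "in action", here is a minimal Python sketch that tests whether a number belongs to each set. The function names `in_first_union` and `in_second_union` are made up for this illustration and are not part of the course material.

```python
def in_first_union(x: float) -> bool:
    """True when x lies in [-5, -3] ∪ [7, 19]; closed intervals keep their endpoints."""
    return (-5 <= x <= -3) or (7 <= x <= 19)


def in_second_union(x: float) -> bool:
    """True when x lies in (-1, 0) ∪ (0, 1); open intervals drop their endpoints."""
    return (-1 < x < 0) or (0 < x < 1)


# Try a few sample points, including endpoints and the excluded point 0.
for x in [-5, -4, 0, 0.5, 1, 7, 20]:
    print(x, in_first_union(x), in_second_union(x))
```

Notice that the word "or" in the set description becomes `or` in the code: a point only needs to be in at least one of the intervals.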
Finally, we need to talk about ∞.
Note. ∞ is not a number.
Infinity, both in terms of the infinitely large and the infinitesimally small, plays a big role in Calculus.
You can say that Calculus was basically discovered because humanity desperately needed to find
a way to correctly talk about infinity. We’ll talk about this in great depth in Chapter 4; for now,
let’s just use it for intervals.
We think of ∞ as something larger than every real number. Thus
{x ∈ R | x > 5} = (5, ∞)
because we could translate that as saying 5 < x < ∞ [1]. Similarly,
{z ∈ R | z ≤ 19} = (−∞, 19],
since −∞ is something smaller than every real number. Note that [−∞, 19] is poor notation: we
don’t want to include −∞ in our set because we only want real numbers in our set [2].
Now, consider $S = \{x \in \mathbb{R} \mid x^2 > 7\}$. We’ll talk about solving inequalities like this in more detail
in Section 2.3.1. You see that
$$S = (-\infty, -\sqrt{7}) \cup (\sqrt{7}, \infty),$$
using our ∪ notation for union introduced earlier. That is, for $x^2$ to be bigger than 7, the number
$x$ has to either be less than $-\sqrt{7}$, or else bigger than $\sqrt{7}$.
Exercise 2.1.4. Sketch/identify the following sets on a number line: $S = (-\infty, -\sqrt{7}) \cup (\sqrt{7}, \infty)$,
$T = [-1, 1]$, $U = \{x \in \mathbb{Z} \mid x^2 \leq 4\}$. (Recall that Z is the set of all integers.)
[1] Saying x < ∞ is totally redundant, so we never say it because it sounds silly.
[2] If at this point you’re wondering about sets with things beyond real numbers, then (a) have you considered a
math major or minor? (b) famous mathematician John Conway created surreal numbers exactly to be able to
include them; unfortunately you can’t do Calculus with surreal numbers.
2.1.2 Sums and the geometric series formula (new)
You can write a long sum like
1 + 2 + 3 + · · · + 64;
but it can get ambiguous. For example, if we write
1 + 2 + 4 + · · · + 64,
then you’d have to creatively think about how to fill in all the points in between. Fun for a game,
but not fun for science. So: we have a notation for that. We write
$$1 + 2 + 3 + \cdots + 64 = \sum_{i=1}^{64} i$$
whereas the second sum meant
$$1 + 2 + 4 + \cdots + 64 = \sum_{n=0}^{6} 2^n.$$
Let’s decipher this symbol:
$$\sum_{j=a}^{b} f(j).$$
First, $\sum$ is the summation symbol; it is there to signal you that this is about adding numbers
together [3].
Below and above the summation symbol are the limits of the sum (in this case, a and b, which
are integers, and a ≤ b) and the summation variable (in this case, j, but you can use any
letter you like, it’s just a pattern holder).
This variable j is a number that starts at a and then counts up (by integers) until it reaches
b. For each value, you get one term of your big sum.
After the summation symbol is the actual formula f (j) for the terms that you are summing.
They usually include the index of summation j, and we think of it as the pattern function.
Let’s do some short examples (that really don’t even need this fancy notation):
$$\sum_{j=0}^{3} 2^j = 2^0 + 2^1 + 2^2 + 2^3.$$
$$\sum_{n=1}^{5} n^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2.$$
$$\sum_{m=4}^{6} (m^2 - 10)x^m = 6x^4 + 15x^5 + 26x^6.$$
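Summation notation translates almost word for word into code. The following Python sketch evaluates the three short examples; the variable and function names (`powers_of_two`, `squares`, `third_sum`) are invented for this illustration only.

```python
# Each sum mirrors the notation: range(a, b + 1) makes the index run over
# the integers a, a+1, ..., b, just like the limits below and above the sigma.
powers_of_two = sum(2**j for j in range(0, 4))   # 2^0 + 2^1 + 2^2 + 2^3
squares = sum(n**2 for n in range(1, 6))         # 1^2 + 2^2 + ... + 5^2


def third_sum(x):
    """Sum of (m^2 - 10) * x^m for m = 4, 5, 6."""
    return sum((m**2 - 10) * x**m for m in range(4, 7))


print(powers_of_two)  # 15
print(squares)        # 55
print(third_sum(1))   # 47, which is 6 + 15 + 26
```

The only trap is the upper limit: Python's `range(a, b + 1)` stops one short of its second argument, so you add 1 to include the last term.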
[3] Adding, as opposed to multiplying, or making a set; note that $\Sigma$ is the Greek capital letter Sigma, which is an
“s” for “sum.”
A particularly interesting sum that we’ll encounter early in this course is the geometric series. For
any real number r, the expression
$$\sum_{i=0}^{n} r^i = 1 + r + r^2 + r^3 + \cdots + r^n$$
is called a geometric series. In any particular example, we could add it up directly; for example,
$$\sum_{i=0}^{5} 2^i = 1 + 2 + 2^2 + 2^3 + 2^4 + 2^5 = 1 + 2 + 4 + 8 + 16 + 32 = 63,$$
and
$$\sum_{i=0}^{4} 3^i = 1 + 3 + 3^2 + 3^3 + 3^4 = 1 + 3 + 9 + 27 + 81 = 121.$$
But there is a better way.
Theorem 2.1.5 (Geometric series formula). Let r be a real number with $r \neq 1$ and let t be a
positive integer. Then
$$1 + r + r^2 + r^3 + \cdots + r^{t-1} = \frac{1 - r^t}{1 - r}.$$
If r = 1, then the sum on the left is just equal to t.
We could have written this theorem more compactly as
$$\sum_{i=0}^{t-1} r^i = \frac{1 - r^t}{1 - r}.$$
Notice how the sum goes to t − 1 on the left but the formula on the right is about t.
Example 2.1.6. With r = 2 and t = 6, the formula is telling us the sum of $1 + 2 + 2^2 + 2^3 + 2^4 + 2^5 =
1 + 2 + 4 + 8 + 16 + 32$; the answer is $\frac{1 - 2^6}{1 - 2} = \frac{-63}{-1} = 63$, which is what we got above.
Example 2.1.7. With $r = \frac{1}{2}$ and t = 6, the formula says that
$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} = \frac{1 - \frac{1}{64}}{1 - \frac{1}{2}} = 1.96875.$$
When t is large, the formula is A LOT faster.
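To see the formula and the term-by-term sum agree, here is a small Python sketch. The function names `geometric_direct` and `geometric_formula` are hypothetical, chosen for this illustration; exact fractions are used (an implementation choice, not something the theorem requires) so the two answers can be compared without rounding error.

```python
from fractions import Fraction


def geometric_direct(r, t):
    """Add up 1 + r + r^2 + ... + r^(t-1) one term at a time."""
    return sum(r**i for i in range(t))


def geometric_formula(r, t):
    """Closed form from Theorem 2.1.5, with the r = 1 case handled separately."""
    if r == 1:
        return t
    return (1 - r**t) / (1 - r)


# Fractions keep the arithmetic exact, so both columns match exactly.
for r, t in [(Fraction(2), 6), (Fraction(1, 2), 6), (Fraction(1), 6)]:
    print(r, t, geometric_direct(r, t), geometric_formula(r, t))
```

The direct sum needs t additions, while the formula needs a single power, one subtraction on top and bottom, and one division, no matter how large t is.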
But why does the magic formula work?
Proof. We use a clever trick. Call the sum S; that is,
$$S = 1 + r + r^2 + \cdots + r^{t-1}.$$
First assume $r \neq 1$. Notice that if we multiply S by r, it almost looks the same:
$$rS = r + r^2 + r^3 + \cdots + r^t.$$
Now do the subtraction S − rS:
$$\begin{array}{rcl} S &=& 1 + r + r^2 + \cdots + r^{t-1} \\ -rS &=& \phantom{1} - r - r^2 - \cdots - r^{t-1} - r^t \\ \hline S - rS &=& 1 - r^t. \end{array}$$
Solving for S gives $(1 - r)S = 1 - r^t$ or $S = \frac{1 - r^t}{1 - r}$, as desired (dividing by $1 - r$ is allowed because $r \neq 1$).
Finally, note that if $r = 1$, then the sum had t terms (because we start our powers at 0 and end
at t − 1) and each of them was equal to $1^i = 1$, so $S = \underbrace{1 + \cdots + 1}_{t \text{ times}} = t$.
We’ll use the geometric series early in the course, but other than that we won’t see summation
notation in MAT1330, although it’s used a lot in Statistics, for example.
2.2 Algebra
Solid algebraic skills are essential — they are the rules of the language of mathematics, and you
will be using them in all your courses and any analytic or quantitative work you do in your field.
Let’s recap some rules that you need to know in order to do the exercises attached to this section
(and indeed, that you will be using throughout the course, without comment). These represent
simplification steps in a calculation that would typically be skipped on the blackboard [4].
2.2.1 Parentheses and the order of operations
It may seem lame to say it, but:
Note. Above all, the proper use of parentheses and respect for the order of operations is essential.
[4] For solutions to the problems in this section, come to office hours, or the math help centre.
[Figure. Picture taken from reddit.]
Sometimes, we put parentheses where they are not, strictly speaking, needed, just to be clear and
to avoid ambiguity.
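As a favourite example, note that
$$\frac{\;\frac{2}{3}\;}{4} \neq \frac{2}{\;\frac{3}{4}\;}$$
i.e. 2/3/4 looks ambiguous: use parentheses!
Why? Well:
$$\frac{\;\frac{2}{3}\;}{4} = \frac{2}{3} \cdot \frac{1}{4} = \frac{2}{12} = \frac{1}{6}$$
whereas
$$\frac{2}{\;\frac{3}{4}\;} = 2 \cdot \frac{4}{3} = \frac{8}{3}$$
which is a totally different answer.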
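You can watch the same thing happen on a computer. This is a minimal Python sketch using exact fractions; the variable names are invented for the illustration. Note that Python happens to read an unbracketed `2/3/4` from left to right, which is a convention of that language and not something to rely on in handwritten work.

```python
from fractions import Fraction

two, three, four = Fraction(2), Fraction(3), Fraction(4)

top_heavy = (two / three) / four     # (2/3)/4
bottom_heavy = two / (three / four)  # 2/(3/4)
unbracketed = two / three / four     # Python groups this left to right, as (2/3)/4

print(top_heavy)     # 1/6
print(bottom_heavy)  # 8/3
print(unbracketed)   # 1/6
```

Same three numbers, two different answers: the parentheses carry real information.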
A typical place where this kind of mistake slips in is with rational functions:
Don’t do this. The “equal sign” with the frown is wrong. Rewrite the first term as
$(x + 1)\,\frac{x+3}{x+2}$ instead (notice the flip of the denominator), which conveniently fits nicely on
two lines, unlike the original headache.
Another dangerous place where this crops up is when you know perfectly well the parentheses are
there, but you’ve only put them in mentally... and then forget:
Don’t do this. Those parentheses around x + 3 were necessary and the “equal sign” with the
frown is wrong. Use parentheses. Please.
2.2.2 Powers and exponentials
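Recall that if n is a positive integer and a ∈ R then
$$a^n = \underbrace{a \times a \times \cdots \times a}_{n \text{ times}}.$$
Therefore, if n, m are two positive integers, we have
$$a^m a^n = (\underbrace{a \times a \times \cdots \times a}_{m \text{ times}}) \times (\underbrace{a \times a \times \cdots \times a}_{n \text{ times}}) = \underbrace{a \times a \times \cdots \times a}_{m + n \text{ times}} = a^{m+n},$$
whereas
$$(a^m)^n = \underbrace{a^m \times a^m \times \cdots \times a^m}_{n \text{ times}} = \underbrace{(\underbrace{a \times a \times \cdots \times a}_{m \text{ times}}) \times \cdots \times (\underbrace{a \times a \times \cdots \times a}_{m \text{ times}})}_{n \text{ times}} = a^{mn}.$$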
Now assume a > 0. Then these two rules continue to hold for exponents that are arbitrary real
numbers. For example we can say
$$a^{1/2} = \sqrt{a} \quad \text{since} \quad (a^{1/2})^2 = a^{\frac{1}{2} \times 2} = a.$$
In general, for a > 0 and integers m and n (with n ≥ 2) we have
$$a^{m/n} = (a^m)^{1/n} = \sqrt[n]{a^m} = (\sqrt[n]{a})^m$$
and
$$a^{-m/n} = \frac{1}{a^{m/n}}.$$
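These rules are easy to spot-check numerically. The sketch below uses arbitrary illustrative values (a = 2.5, m = 3, n = 4, chosen only for this example) and compares both sides of each rule with `math.isclose`, since floating-point arithmetic is only accurate up to rounding.

```python
import math

a, m, n = 2.5, 3, 4  # arbitrary illustrative values with a > 0

# Product rule: a^m * a^n = a^(m+n)
print(math.isclose(a**m * a**n, a**(m + n)))

# Power rule: (a^m)^n = a^(mn)
print(math.isclose((a**m)**n, a**(m * n)))

# Fractional exponents: a^(m/n) = (a^m)^(1/n) = (a^(1/n))^m
print(math.isclose(a**(m / n), (a**m)**(1 / n)))
print(math.isclose(a**(m / n), (a**(1 / n))**m))

# Negative exponents: a^(-m/n) = 1 / a^(m/n)
print(math.isclose(a**(-m / n), 1 / a**(m / n)))
```

A check like this is not a proof, but it is a quick way to catch a misremembered rule before it costs you marks.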
Exercise 2.2.1. Give some examples of exponents m ∈ R that CANNOT be applied to a negative
base like a = −2. Give other examples of exponents that CAN be applied to a negative base. (What
about a = 0?) We default to a positive base a exactly because not all exponents are valid.
(Defining $a^x$ for a real number $x$ that is not a fraction, like $x = \pi$, requires the notion of limits
from Calculus (and $a \geq 0$). But your calculator can approximate it very well.)
Let’s consider how to solve equations that are phrased in terms of exponents.
Example 2.2.2. To solve $x^{3/5} = 8$ for x, we can take both sides to the power 5/3 (since $\frac35 \times \frac53 = 1$) to get
$$(x^{3/5})^{5/3} = 8^{5/3} \implies x = (\sqrt[3]{8})^5 = 2^5 = 32.$$
Example 2.2.3. To solve $x^4 = 64$, we get our positive solution x as above:
$$(x^4)^{1/4} = 64^{1/4} = \sqrt[4]{64}.$$
Since $64 = 8^2$,
$$\sqrt[4]{64} = \sqrt[4]{8^2} = \sqrt{8} = \sqrt{4 \cdot 2} = 2\sqrt{2}.$$
Thus $x = 2\sqrt{2}$ is one solution: the positive solution. But $x = -2\sqrt{2}$ is another solution, since $(-2\sqrt{2})^4 = (8)^2 = 64$ as well. These are the only real solutions.
This example emphasizes that the rules of exponents are designed for a positive base, but sometimes,
there is a negative solution as well. You probably know perfectly well that an equation like $x^2 = 5$
has two solutions, but beware: that’s only for even powers!
Example 2.2.4. To solve $x^3 = 8$, we get our positive solution x as $x = 8^{1/3} = 2$. This time, we
check that $(-2)^3 = -8 \neq 8$, so −2 is not a solution.
On the other hand, when the variable is in the exponent, such as in the equation $3^x = 25$, we need
to use the logarithm function.
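The even-versus-odd distinction in Examples 2.2.3 and 2.2.4 can be sketched in a few lines of Python. This is only an illustration: the function name `real_solutions` is made up for these notes, and it assumes a positive integer power n and a positive right-hand side c.

```python
def real_solutions(n: int, c: float) -> list[float]:
    """Real solutions of x**n == c, assuming n is a positive integer and c > 0."""
    root = c ** (1.0 / n)      # the positive solution, c^(1/n)
    if n % 2 == 0:             # even power: the negative of the root also works
        return [-root, root]
    return [root]              # odd power: only the positive root


print(real_solutions(4, 64))   # two solutions, close to -2*sqrt(2) and 2*sqrt(2)
print(real_solutions(3, 8))    # a single solution, close to 2
```

The answers are floating-point approximations, so compare them with a tolerance rather than with an exact equality.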
2.2.3 Logarithms
Recall that the logarithm of base a > 0 (with a ≠ 1) is the inverse function of the exponential: for x > 0
$$y = \log_a x \quad\text{if and only if}\quad a^y = x. \tag{2.1}$$
When a = 10, then we often just write log; when a = e (Euler’s number) then it’s called the natural
logarithm and we write ln. Of all the logarithm functions, it’s ln(x) that is the easiest to work with
in Calculus, but log(x) is very common in the natural sciences (e.g. measuring pH values).
Example 2.2.5. To evaluate $\log_2(8)$, we have to ask ourselves: is 8 a nice power of 2? Oh, yes, it
is: $8 = 2^3$, so $\log_2(8) = 3$.
Example 2.2.6. To evaluate $\log_2(10)$, we ask ourselves: is 10 a nice power of 2? No, it really isn’t.
But a fancy calculator will tell you that the answer is about 3.321928....
Remark 2.2.7. You can check that the answer $\log_2(10) = 3.321928...$ makes sense (again using a
calculator) because:
$$2^3 = 8, \quad 2^{3.3} \approx 9.85, \quad 2^{3.32} \approx 9.987, \quad 2^{3.3219} \approx 9.9998, \quad \cdots.$$
That is, remembering that 3.321928... is an infinite decimal, we’re noticing that as we take 2 to
powers which are more and more precise approximations of this decimal, we are getting an answer
that is closer and closer to 10. In the language of Calculus: as x approaches $\log_2(10)$, $2^x$ approaches
10.
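If you would rather let a computer play the role of the fancy calculator, the check in Remark 2.2.7 might look like the following sketch. It uses only Python's standard `math` module; the variable names are arbitrary.

```python
import math

target = math.log2(10)                      # about 3.321928
for exponent in (3, 3.3, 3.32, 3.3219, target):
    # Better approximations of log2(10) give powers of 2 closer to 10.
    print(f"2^{exponent:.6f} = {2 ** exponent:.4f}")
```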
Another way to write the fundamental identity of logarithms (2.1) is as the following identities:
$\log_a(a^y) = y$, and similarly $\log(10^z) = z$ and $\ln(e^x) = x$;
$a^{\log_a(x)} = x$ for all x > 0, and similarly $10^{\log(x)} = x$ and $e^{\ln(x)} = x$, for all x > 0.
The restriction on x in the second identity is crucial.
Note. Since $e^x > 0$ for all x, it makes no sense to ask for the value of ln(0) or ln(−5): there are
no such values.
Example 2.2.8. The pH of a solution measures the concentration of hydrogen ions in a solution.
If the molar concentration is $C = 10^{-k}$, then the pH is k. What is the formula for pH in terms of
C?
Solution: To isolate for an exponent, we can apply log (base 10):
$$\log_{10}(C) = \log_{10}(10^{-k}) = -k.$$
Thus the pH k is obtained by multiplying both sides by −1. That is,
$$k = -\log_{10}(C).$$
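As a small illustration of the formula just derived, here is a sketch of it as a Python function. The name `ph_from_concentration` is hypothetical, and the function assumes C is given as a positive molar concentration.

```python
import math


def ph_from_concentration(c: float) -> float:
    """Return k = -log10(C) for a molar hydrogen-ion concentration C > 0."""
    if c <= 0:
        raise ValueError("concentration must be positive (log is undefined otherwise)")
    return -math.log10(c)


print(ph_from_concentration(1e-7))   # C = 10^(-7) should give a pH close to 7
```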
Exercise 2.2.9. (a) Which solution has a greater concentration of hydrogen ions, one with a pH
of k = 1 or one with a pH of k = 7? (b) The formula for k has a minus sign in it. Could it ever
happen that k comes out to be a negative number? Why or why not?
The rules for logarithms are the inverse of the rules for exponentials. Where exponentials take
sums to products and products to exponents, logarithms do the opposite.
Theorem 2.2.10 (Rules of logarithms). If a, x, y > 0 and t ∈ R then
$$\log_a(xy) = \log_a(x) + \log_a(y), \qquad \log_a(x^t) = t\log_a(x). \tag{2.2}$$
In other words, taking the logarithm of an expression can make it simpler, which is part of the
reason they are so useful.
Proof. This proof gives us practice in manipulating logarithms. Suppose a, x, y > 0 and set
$$p = \log_a(x), \quad\text{and}\quad q = \log_a(y).$$
By definition, this means $a^p = x$ and $a^q = y$. Therefore by the rules of exponents (Section 2.2.2)
we have
$$a^{p+q} = a^p a^q = xy.$$
So by definition, $\log_a(xy) = p + q$. Putting this together gives the first statement.
For the second, let t ∈ R. Then by the rules of exponents
$$a^{tp} = a^{pt} = (a^p)^t = x^t$$
which by definition gives $\log_a(x^t) = tp = t\log_a(x)$.
Note. Because we’ll use natural logarithms most in this course, let’s write our identities out that
way:
$$\ln(xy) = \ln(x) + \ln(y), \qquad \ln(x^t) = t\ln(x)$$
and we might as well add
$$\ln(x/y) = \ln(x) - \ln(y)$$
because $\ln(x/y) = \ln(xy^{-1}) = \ln(x) + \ln(y^{-1}) = \ln(x) + (-1)\ln(y) = \ln(x) - \ln(y)$.
Example 2.2.11. Simplify $\ln(xe^y)$.
Solution: $\ln(xe^y) = \ln(x) + \ln(e^y)$ since products become sums. Then, since $\ln = \log_e$, we note that
$\ln(e^y) = y$. Thus we have
$$\ln(xe^y) = \ln(x) + \ln(e^y) = \ln(x) + y.$$
When you have a variable in an exponent, you will very often use a logarithm to simplify the
expression. If you cannot get your terms over a common base, the default is to use ln(x).
Example 2.2.12. To solve $3^x = 25$, we could apply $\log_3$ to both sides to get
$$x = \log_3(25).$$
This is, however, not very enlightening; I don’t have a $\log_3$ button on my calculator. Instead, let’s
use ln (which is what we’ll always do in this course, anyway):
$$3^x = 25 \iff \ln(3^x) = \ln(25) \iff x\ln(3) = \ln(25) \iff x = \frac{\ln(25)}{\ln(3)}.$$
Note. In fact, $\log_3(25) = \frac{\ln(25)}{\ln(3)}$, or, more generally, for all a, b > 0 (with a ≠ 1)
$$\log_a(b) = \frac{\ln(b)}{\ln(a)}$$
which you can either memorize, or else remember how we got it (which is a lot more useful, since
it’s a “trick” we use a couple of times in this course):
$$x = \log_a(b) \iff a^x = b \iff \ln(a^x) = \ln(b) \iff x\ln(a) = \ln(b) \iff x = \frac{\ln(b)}{\ln(a)}.$$
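The change-of-base trick translates directly into code. The sketch below assumes a > 0, a ≠ 1 and b > 0; the helper name `log_base` is just for illustration.

```python
import math


def log_base(a: float, b: float) -> float:
    """Compute log_a(b) as ln(b) / ln(a); requires a > 0, a != 1 and b > 0."""
    return math.log(b) / math.log(a)


x = log_base(3, 25)
print(x, 3 ** x)   # raising 3 to this power should give back (very nearly) 25
```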
Example 2.2.13. Solve $2^x = 3^{2x+1}$.
Well, it’s a nightmare, but the nightmare is in the exponents, and both sides are positive, so we
carefully apply ln to both sides:
$$\ln(2^x) = \ln(3^{2x+1}) \iff x\ln(2) = (2x+1)\ln(3) \iff x\ln(2) = 2x\ln(3) + \ln(3).$$
At this point, your instinct might be to think that this expression is a lot worse: fight it! Notice
that ln(2) and ln(3) are just numbers, and we will treat them as such.⁵ So just as if the equation
were the nicer-looking “5x = 2x(7) + 7 = 14x + 7”, we bring all the x terms to the left hand side
to get
$$x\ln(2) - 2\ln(3)x = \ln(3)$$
$$x(\ln(2) - 2\ln(3)) = \ln(3)$$
$$x = \frac{\ln(3)}{\ln(2) - 2\ln(3)} = \frac{\ln(3)}{\ln(2/9)}.$$
This answer is perfect.⁶
⁵ We don’t want to replace them with their decimal approximations (0.69 and 1.10, respectively) because that
introduces round-off errors into our calculations, and our final answer would be (close but) wrong.
We made some side remarks in the preceding example that are worth pointing out.
Note. It is common to lose accuracy when you round off intermediate steps in a calculation. In
Science, you will keep track using significant figures, which is a good rule of thumb for tracking
measurement error. In Mathematics, you will avoid using decimals, or rounding off, as much as
possible (and when you can’t, you’ll keep many more decimals than you need for your intermediate
steps) — because in math, there is no measurement error. It’s a perfect world!
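To see the effect of rounding intermediate steps, here is a small numerical experiment based on Example 2.2.13. It is a sketch only: `x_exact` keeps full floating-point precision, while `x_rounded` uses the two-decimal values 0.69 and 1.10 for ln(2) and ln(3); both names are made up for this illustration.

```python
import math

x_exact = math.log(3) / math.log(2 / 9)      # ln(3) / ln(2/9), no early rounding
x_rounded = 1.10 / (0.69 - 2 * 1.10)         # same formula with rounded logarithms

for label, x in (("exact", x_exact), ("rounded", x_rounded)):
    # Compare the two sides of 2^x = 3^(2x+1) for each candidate.
    print(label, x, 2 ** x, 3 ** (2 * x + 1))
```

With the unrounded value the two sides agree to many decimal places; with the rounded value they drift apart in the third decimal place or so.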
So that’s a lot of useful stuff about logarithms. But sometimes the best way to understand them
is by noticing things that are NOT true. We learn SO MUCH MORE from our mistakes than we
ever do by getting things right the first time.
Exercise 2.2.14. We want to solve $\ln(e^x + 3) = 5$. DANGER. Which of the following manipulations are correct?
$\ln(e^x + 3) = \ln(e^x) + \ln(3) = x + \ln(3)$
$\ln(e^x + 3) = (x + 3)\ln(e) = x + 3$
$\ln(e^x + 3) = 5 \implies e^x + 3 = 5$
$e^x + 3 = e^5 \iff e^x = e^5 - 3 \iff x = 5 - \ln(3)$
Work this out, then check out your answers with the solutions at the end of the notes!
So many pitfalls to avoid! But actually not: it’s that the only true things are (2.1) (or its equivalent
form) and (2.2), and you have to stick to them like glue.
Exercise 2.2.15. Here are some problems to try, to test your facility with powers, exponentials
and logarithms. In all cases: go back and verify that your answer(s) is(are) correct, including
that they lie in the domain of the functions at hand.⁷
⁶ But how can you check it? Well, using a calculator, x ≈ −0.7304, to a precision of 4 decimal places. Plugging
this value back for x in the left hand side ($2^x$) and right hand side ($3^{2x+1}$) of the equation gives the same answer
(0.603) to (only) three decimal places. Nice!
⁷ This is a precursor to our work later in the course, where we really have to be aware enough to say that “-5” is
not an acceptable answer if the question is about the number of turkeys in a field, for example.
√
x1/2 y 5
simplify x5 y 1/4
, where x, y > 0;
√ √
( x 3 y)−1/2
simplify √
3 2 where x, y > 0;
x y
solve for x: $2^{x+3} = 16^{2x-1}$;
solve for x: $5^{2x+3} = 7^{4x-1}$;
solve for x: $\frac{1}{x} + \frac{1}{y} = \frac{1}{z}$;
solve for x: log(x + 5) − log(x − 1) = log(x + 1);
solve for x: 2 ln(x) − ln(x + 4) = ln(2).
2.2.4 Fractions and rationalization
Simplifying fractions uses the rule:

$$\frac{a}{b} = \frac{a}{b} \times 1 = \frac{a}{b} \times \frac{c}{c} = \frac{ac}{bc}$$
for some nonzero number c. You sometimes instead use d = 1/c so that it reads like
$$\frac{a}{b} = \frac{a}{b} \times 1 = \frac{a}{b} \times \frac{1/d}{1/d} = \frac{a/d}{b/d}.$$
This is easy when a and b are numbers, but takes more deliberation when the numerator and
denominator are algebraic expressions, and it is often not immediately obvious when two are equivalent.
Example 2.2.16. We have that $\frac{e^{-x} + 1}{2e^{-x}} = \frac12(1 + e^x)$ because
$$\frac{e^{-x} + 1}{2e^{-x}} = \frac{e^{-x} + 1}{2e^{-x}} \cdot \frac{e^x}{e^x} = \frac{e^{-x}e^x + e^x}{2e^{-x}e^x} = \frac{e^0 + e^x}{2e^0} = \frac12(1 + e^x).$$
Another kind of simplification comes from the rules of fraction addition:

$$\frac{a+b}{d} = \frac{a}{d} + \frac{b}{d}, \qquad \left(\text{but remember: } \frac{s}{t+u} \neq \frac{s}{t} + \frac{s}{u}\ !!!\right).$$
Example 2.2.17. We have that $\frac{x+3}{x} = \frac{x}{x} + \frac{3}{x} = 1 + \frac{3}{x} = 1 + 3x^{-1}$. (The first form is more
convenient for graphing, but the latter is better for differentiating. Be flexible!)
Example 2.2.18. Is $\frac{x}{x+2} = 1 + \frac{x}{2}$? NO. No way. (Plot their graphs: the left side has a vertical
asymptote at x = −2 and the right side is a straight line!) Sums in denominators rarely disappear.
The correct simplification here uses a simple form of long division:
$$\frac{x}{x+2} = \frac{x+2-2}{x+2} = \frac{(x+2)-2}{x+2} = \frac{x+2}{x+2} - \frac{2}{x+2} = 1 - \frac{2}{x+2}.$$
Note. The goal of simplification is to get a more tractable form of the expression in front of you,
one that is more suitable to the next step you need to do. So what “simplify” means often depends
on your next step. If it’s the last step: then “simplify” just means bringing it to an easily intelligible
form.
Rationalization is a technique based on the identity
$$(a + b)(a - b) = a^2 - b^2$$
and is useful when $a^2$ is friendlier-looking than a (or $b^2$ is friendlier than b).
Example 2.2.19. Rationalize $\frac{1}{\sqrt{3} - 2}$.
Solution: We multiply the numerator and denominator by the conjugate of $\sqrt{3} - 2$, which is $\sqrt{3} + 2$,
to get
$$\frac{1}{\sqrt{3} - 2} = \frac{1}{\sqrt{3} - 2} \times \frac{\sqrt{3} + 2}{\sqrt{3} + 2} = \frac{\sqrt{3} + 2}{(\sqrt{3} - 2)(\sqrt{3} + 2)} = \frac{\sqrt{3} + 2}{3 - 4} = -(\sqrt{3} + 2).$$
With more complex expressions, there are two common mistakes to avoid.
Mistake #1: wrong expression for the conjugate.
Working too quickly, you might put a minus sign in a likely-looking place, instead of where it’s
needed.
Example 2.2.20. The conjugate of
$$x - \sqrt{x^2 + 2}$$
is
$$x + \sqrt{x^2 + 2}, \quad\text{NOT}\quad x - \sqrt{x^2 - 2}.$$
You can tell because the multiplication gives a mess instead of something simpler:
$$(x - \sqrt{x^2 + 2})(x - \sqrt{x^2 - 2}) = x^2 - x\sqrt{x^2 - 2} - x\sqrt{x^2 + 2} + \sqrt{x^4 - 4}$$
and this does not simplify further; nothing in sight is a perfect square.
Mistake #2: neglect of parentheses.
Example 2.2.21. Proper use of parentheses is important; for example
$$(x - \sqrt{x^2 + 2})(x + \sqrt{x^2 + 2}) = x^2 - \left(\sqrt{x^2 + 2}\right)^2 = x^2 - (x^2 + 2) = -2.$$
If you’d neglected the parentheses (in red), you might have thought it was $x^2 - x^2 + 2 = 2$. But
you can check: plug x = 0 into the left side to get $(-\sqrt{2})(\sqrt{2}) = -2$.
Once you have the conjugate, it can help simplify a sum or difference of square root functions in a
denominator.
Example 2.2.22. Using the preceding calculation, we have
$$\frac{1}{x - \sqrt{x^2 + 2}} = \frac{1}{x - \sqrt{x^2 + 2}} \left(\frac{x + \sqrt{x^2 + 2}}{x + \sqrt{x^2 + 2}}\right) = -\frac12\left(x + \sqrt{x^2 + 2}\right).$$
We swept one detail under the rug here: in fact, $(\sqrt{x})^2 = |x|$, not x. It was OK here because
$x^2 + 2 > 0$ for every x, so $|x^2 + 2| = x^2 + 2$. More on this in Section 2.3.2, below.
Exercise 2.2.23. Problems to try:
simplify $\frac{4 + \frac{1}{k}}{\frac{5}{k} - 2}$;
simplify $\frac{z^{-1} + 3}{z^{-2} + 2}$;
rationalize the denominator $\frac{1}{\sqrt{10} - 3}$;
simplify $\frac{\sqrt{6} - \sqrt{8}}{\sqrt{6} + \sqrt{8}}$.
2.2.5 Polynomials: factoring and long division
(We will discuss polynomial functions in greater detail in Section 2.4.2.)
Definition 2.2.24. A polynomial is any expression of the form
$$a_0 + a_1 x + \cdots + a_n x^n$$
for some real numbers $a_0, a_1, \cdots, a_n$, and some integer n ≥ 0. If $a_n \neq 0$, then n is called the degree
of the polynomial.
The polynomials of degree 0 are constants (boring); the polynomials of degree 1 are called linear
(very useful). Where things get exciting is when n ≥ 2.
Basics on quadratics
A degree-2 polynomial is called a quadratic. We solve
$$ax^2 + bx + c = 0$$
using the quadratic formula
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
The solutions are called the roots of the polynomial. The number $b^2 - 4ac$ is called its discriminant,
and it determines the number of solutions. That is, if $b^2 - 4ac > 0$, we have two roots; if $b^2 = 4ac$,
we have exactly one root; and if $b^2 - 4ac < 0$, then we have no real roots. (We have to use complex
numbers to say more about this case, but that is a topic for MAT1332.)
We know how to factor quadratic polynomials. If we had two roots, $r_+$ and $r_-$, then the factorisation
is
$$ax^2 + bx + c = a(x - r_+)(x - r_-).$$
If there is only one root r, then it is a repeated root, and our factorisation is
$$ax^2 + bx + c = a(x - r)^2.$$
If there are no real roots, then it does not factor. The most useful thing you might do in this case
(which you can do for any quadratic, whether it factors or not) is to complete the square:
$$ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^2 + c - \frac{b^2}{4a}.$$
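The three cases governed by the discriminant can be summarized in a short Python sketch. The function name `real_roots` is an illustrative choice, and the code assumes a ≠ 0 and real coefficients.

```python
import math


def real_roots(a: float, b: float, c: float) -> list[float]:
    """Real roots of a*x**2 + b*x + c = 0 (with a != 0), in increasing order."""
    disc = b * b - 4 * a * c                  # the discriminant b^2 - 4ac
    if disc < 0:
        return []                             # no real roots
    if disc == 0:
        return [-b / (2 * a)]                 # exactly one (repeated) root
    s = math.sqrt(disc)
    return sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])


print(real_roots(1, -3, 2))   # two roots
print(real_roots(1, 2, 1))    # one repeated root
print(real_roots(1, 1, 1))    # no real roots
```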
Applications of quadratics
Some equations don’t appear to be quadratics until you simplify — which is another reason to
always try to simplify an expression before doing the next step.
Example 2.2.25. Solve for x if $x + \frac{2}{x} = 3$.
Solution: we simplify by multiplying both sides by x, to get
$$x^2 + 2 = 3x \iff x^2 - 3x + 2 = 0 \iff (x - 1)(x - 2) = 0$$
so there are exactly two solutions, x = 1 and x = 2. We check this by plugging it back into the
ORIGINAL equation⁸ and yes, they are solutions.
Sometimes, you can recognize that your expression is a quadratic in a function of x.

Example 2.2.26. Solve $e^{6x} - e^{3x} = 20$.
Solution: Since $(e^{3x})^2 = e^{6x}$, we can set $u = e^{3x}$ and the equation becomes
$$u^2 - u = 20 \iff u^2 - u - 20 = 0 \iff (u - 5)(u + 4) = 0$$
whose solutions are u = 5 and u = −4. But we want solutions for x. The equation u = 5 gives
$e^{3x} = 5$ or 3x = ln(5) or $x = \frac13\ln(5)$. But equation u = −4 gives $e^{3x} = -4$, which has no solutions.
Therefore, we conclude that $x = \frac13\ln(5)$ is the only solution to our original equation.
⁸ Always check your answer in the original equation: we will see many examples in this course where we pick up
errant solutions from our intermediate steps that do not solve the original problem.
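In the spirit of always checking an answer in the original equation, here is a quick numerical check of Example 2.2.26. It is only a sketch; `residual` is an illustrative name for "left side minus right side".

```python
import math

x = math.log(5) / 3                                   # the candidate x = (1/3) ln(5)
residual = math.exp(6 * x) - math.exp(3 * x) - 20     # should be essentially zero
print(x, residual)
```

A residual that is tiny (but perhaps not exactly 0) is what to expect from floating-point arithmetic.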
Cubics and more
When you have a polynomial of degree 3 or higher like
$$x^3 - 7x + 6$$
then it can be impossible to find a formula for its roots.⁹ But (neat fact) if a polynomial with integer
coefficients has a factorization with integer roots, those integers must divide the constant term.
So in this case, the possible roots are ±1, ±2, ±3, ±6. Plug in values until you find one root. In
this case, for example, you might find that x = 1 is a root, whence (x − 1) is a factor. Then use
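long division
$$\begin{array}{rrrrr}
 & & x^2 & {}+x & {}-6 \\
\hline
x - 1 \,\big) & x^3 & & {}-7x & {}+6 \\
 & -x^3 & {}+x^2 & & \\
\hline
 & & x^2 & {}-7x & \\
 & & -x^2 & {}+x & \\
\hline
 & & & -6x & {}+6 \\
 & & & 6x & {}-6 \\
\hline
 & & & & 0
\end{array}$$
to get
$$\frac{x^3 - 7x + 6}{x - 1} = x^2 + x - 6 = (x - 2)(x + 3)$$
so
$$x^3 - 7x + 6 = (x - 1)(x - 2)(x + 3).$$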
Of course, we might have found all these roots by guessing, and that would have brought us to the
same result (quicker!).
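The "try the divisors of the constant term, then divide" routine can be sketched in Python as follows. Everything here is illustrative: the helper names (`divisors`, `evaluate`, `divide_by_linear`) are made up, polynomials are stored as integer coefficient lists with the highest degree first, and the division uses synthetic division, which is a compact bookkeeping of the same long division by (x − r).

```python
def divisors(n: int) -> list[int]:
    """All positive and negative integer divisors of a nonzero integer n."""
    pos = [d for d in range(1, abs(n) + 1) if n % d == 0]
    return sorted(pos + [-d for d in pos])


def evaluate(coeffs: list[int], x: int) -> int:
    """Evaluate a polynomial given by its coefficients, highest degree first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def divide_by_linear(coeffs: list[int], r: int) -> tuple[list[int], int]:
    """Divide by (x - r) via synthetic division; returns (quotient, remainder)."""
    out = []
    carry = 0
    for c in coeffs:
        carry = carry * r + c
        out.append(carry)
    return out[:-1], out[-1]


p = [1, 0, -7, 6]                                    # x^3 - 7x + 6
roots = [r for r in divisors(p[-1]) if evaluate(p, r) == 0]
quotient, remainder = divide_by_linear(p, 1)         # divide by (x - 1)
print(roots, quotient, remainder)
```

Here the quotient list [1, 1, -6] stands for x^2 + x − 6, matching the long division above.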
Example 2.2.27. Factor $p(x) = x^3 - x^2 - x - 2$.
Solution: Again, any integer roots need to divide −2. We try some values but p(1) ≠ 0, p(−1) ≠ 0,
... but p(2) = 8 − 4 − 2 − 2 = 0, so 2 is a root, so (x − 2) is a factor.
⁹ The technique of Newton’s method will give us roots, to any degree of precision, quickly (later in the course).
MAT 1330 : Fall 2020 2.3. INEQUALITIES AND ABSOLUTE VALUES (NEW)
We don’t find any other roots, so it is time to use long division.
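$$\begin{array}{rrrrr}
 & & x^2 & {}+x & {}+1 \\
\hline
x - 2 \,\big) & x^3 & {}-x^2 & {}-x & {}-2 \\
 & -x^3 & {}+2x^2 & & \\
\hline
 & & x^2 & {}-x & \\
 & & -x^2 & {}+2x & \\
\hline
 & & & x & {}-2 \\
 & & & -x & {}+2 \\
\hline
 & & & & 0
\end{array}$$
Therefore, $p(x) = x^3 - x^2 - x - 2 = (x - 2)(x^2 + x + 1)$. The quadratic factor $x^2 + x + 1$ has
discriminant $1^2 - 4(1)(1) = -3 < 0$, so it has no real roots and does not factor further over the
real numbers. That is, p has exactly one real root, x = 2.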
Exercise 2.2.28. Problems to try; be flexible and in each case think about what you could do to
make the expression look friendlier. Be willing to try a couple of options!
solve for x: $\frac{1}{x} + \frac{1}{x^2} = 1$;
solve for m: $m = \sqrt{m + 6}$;
solve for x: $\frac{4x}{1 + x} = 3x$;
solve for y: $4^y - 3(2^y) + 2 = 0$ (Hint: $4^y = (2^y)^2$. Let $z = 2^y$ and thus make this one
complicated problem into two less complicated problems.)
factor $x^3 + 1000$;
divide $x^3 + x^2 + \frac54 x + 3$ by $x + \frac32$;
find all values of k for which the quadratic $x^2 + 2kx + 9k - 8 = 0$ has only one solution for x.
2.3 Inequalities and absolute values (new)
You have used inequalities and absolute values in high school, but often not to the extent and in
the abstraction that we’ll need in this course. Much of this section may therefore be new to you,
and we’ll spend time on this in class.
2.3.1 Solving inequalities
The important thing to remember:
Note. Multiplying an inequality by a negative value changes its direction. Remember,
$$2 < 3 \iff -2 > -3$$
because multiplication by a negative number acts by a reflection on the number line.
(Figure: number line marked from −3 to 3.)
Example 2.3.1. Solve for x:
$$\frac{1}{x - 2} = 2.$$
We multiply both sides by x − 2, to get 1 = 2(x − 2) or 1 = 2x − 4 or 2x = 5 or $x = \frac52$. So this is
the unique solution.
Example 2.3.2. Solve for x:
$$\frac{1}{x - 2} < 2.$$
Wrong answer: 1 < 2(x − 2) = 2x − 4 so $x > \frac52$. Why is it wrong? Let’s look at the graph!
y = 1/(x − 2) is in blue; y = 2 is in red. The vertical asymptote at x = 2 is marked in yellow and
the solution to the equality is x = 5/2 in green. The answer to the inequality y = 1/(x − 2) < 2
should be all x-values for which the blue graph lies below the red line: so all x < 2 and all
x > 5/2.
What went wrong?
Answer: We multiplied both sides of the inequality by x − 2, which is sometimes positive and
sometimes negative. So when x − 2 > 0 (meaning x > 2) then our reasoning holds, and we find
that the only values of x > 2 which satisfy the inequality are x > 5/2.
But when x − 2 < 0, meaning x < 2, then we have to change the direction of the inequality when
we multiply:
$$(x < 2): \quad \frac{1}{x - 2} < 2 \underbrace{\iff}_{\text{because } x - 2 < 0} 1 > 2(x - 2).$$
We can then further simplify to deduce that the inequality in this case is equivalent to
$$1 > 2x - 4 \iff 5 > 2x \iff x < \frac52 = 2.5.$$
What?! This is saying: when x < 2, the condition to satisfy the inequality is that x must be less
than 5/2 — which is always true. Therefore every x < 2 satisfies the condition.
Our answer: all x < 2 and all x > 5/2 : this is written as
(−∞, 2) ∪ (5/2, ∞)
which we read out loud as: the union of the open interval from negative infinity to 2 and the open
interval from 2.5 to positive infinity.
Example 2.3.3. Another way to solve $\frac{1}{x - 2} < 2$: avoid the problematic multiplication by
a variable term.
$$\begin{aligned}
\frac{1}{x - 2} < 2 &\iff \frac{1}{x - 2} - 2 < 0 \\
&\iff \frac{1}{x - 2} - \frac{2(x - 2)}{x - 2} < 0 && \text{common denominator} \\
&\iff \frac{5 - 2x}{x - 2} < 0 \\
&\iff \frac{2x - 5}{x - 2} > 0 && \text{multiplied both sides by } -1
\end{aligned}$$
which holds only if the numerator and the denominator have the same sign. So we work out:
2x − 5 > 0 iff x > 5/2; and x − 2 > 0 iff x > 2. So both are positive iff x > 5/2, since 5/2 > 2.
2x − 5 < 0 iff x < 5/2, and x − 2 < 0 iff x < 2, so both are negative iff x < 2, since 2 < 5/2.
This again gives
$$(-\infty, 2) \cup (5/2, \infty).$$
Another way to deduce the set of values x for which $\frac{2x - 5}{x - 2} > 0$ is to use a table, based on the
following principles:
the sign of a polynomial can only change at a root;
the sign of a product or quotient is the product of the signs.
Thus, noting that the numerator can only change sign at its root (2.5) and the denominator can
only change sign at its root (2), our table is
x                             x < 2    2 < x < 2.5    x > 2.5
sign of 2x − 5                  −           −             +
sign of x − 2                   −           +             +
∴ sign of (2x − 5)/(x − 2)      +           −             +
This kind of reasoning is solid and valid. If you are using it on a test, be sure to think it through,
rather than using some old memory of how the signs normally turn out — we will have to use this
reasoning for more complex functions, later.
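A sign table can also be sanity-checked numerically by testing one sample point in each interval. The sketch below does this for the inequality 1/(x − 2) < 2; the function name `satisfies` and the particular sample points are arbitrary choices.

```python
def satisfies(x: float) -> bool:
    """True when 1/(x - 2) < 2 holds; x = 2 is not in the domain."""
    return 1 / (x - 2) < 2


# One sample point per interval cut out by the critical values 2 and 5/2.
samples = {"x < 2": 0.0, "2 < x < 5/2": 2.25, "x > 5/2": 3.0}
for interval, x in samples.items():
    print(f"{interval:12s} sample {x:5.2f} -> {satisfies(x)}")
```

This only confirms the table at the sampled points; the reasoning about where signs can change is still what justifies the answer on whole intervals.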
Let’s do another example.
Example 2.3.4. Find all x for which
$$\frac{2}{x} + 5 < 3.$$
Solution: We want to isolate the x. We begin with
$$\frac{2}{x} + 5 < 3 \iff \frac{2}{x} < -2 \iff \frac{2}{-2x} > 1$$
(where in this last step we divided both sides by −2, which is negative, so the inequality changed
direction). So our inequality is actually
$$\frac{-1}{x} > 1.$$
Two cases:
If x > 0 then we can multiply both sides by x, to get −1 > x, or x < −1. But there are no positive
values of x which are less than −1! We conclude that there are no solutions arising from this case.
If x < 0, then we can still multiply both sides by x, but this time the direction of the inequality
changes. We thus get −1 < x. The negative values of x that satisfy the inequality are those between
−1 and 0.
Our total answer: the solution set is the interval (−1, 0) = {x ∈ R | −1 < x < 0}.
We can check our answer against reality by sketching the graph (for the original inequality).
y = 2/x + 5 is in blue; y = 3 is in red; the answer should be all x-values for which the blue graph
lies below the red line: so just the region between −1 and 0.
Example 2.3.5. Another way to solve the previous question:
$$\frac{2}{x} + 5 < 3 \iff \frac{2}{x} + 2 < 0 \iff \frac{2}{x} + \frac{2x}{x} < 0 \iff \frac{2 + 2x}{x} < 0 \iff \frac{2(x + 1)}{x} < 0.$$
This fraction is < 0 iff the numerator and the denominator have opposite signs. If the numerator
is negative (so x < −1) then the denominator is forced to be negative, so that doesn’t work. If the
numerator is positive (so x > −1) and the denominator is negative (so x < 0), then the quotient is
negative. So that’s the only solution interval.
(Or: create a table, based on the roots of the numerator and denominator.)
Exercise 2.3.6. Try the following to practice solving inequalities:
(a) easy: solve for all x such that $\frac{x}{2} - 3 > 5$;
(b) harder: solve for all x such that $\frac{2}{x} - 3 > 5$;
(c) solve for t: $\frac13 t + 4 > -1$;
(d) solve for x: $\frac{3}{x} + 4 < -1$;
(e) solve for x: $x^2 - 3x + 2 < x + 5$;
(f) solve for x: $\frac{x^2 - 3x + 2}{x + 5} < 0$.
2.3.2 Absolute values: how to handle them and how to solve equations
Definition 2.3.7.
$$|x| = \begin{cases} x & \text{if } x \geq 0; \\ -x & \text{if } x < 0. \end{cases}$$
Example 2.3.8. So |5| = 5 and | − 3| = −(−3) = 3.
Note. If your expression has variables, you can’t think of the absolute value as “stripping away the
minus sign” — because you can’t tell if your variable is negative or not. Instead, use the definition.
Exercise 2.3.9. Show that
$$|x - 3| \neq |x| + 3 \quad\text{and}\quad |x - 3| \neq |x| - 3.$$
That is, the above equalities are sometimes accidentally true, but are NOT ALWAYS true. Find
values of x that make the equalities fail.
That is, you cannot simplify an expression by moving the absolute value signs.
In fact, |x − 3| is hiding two formulas. Using Definition 2.3.7 we write it out as

$$|x - 3| = \begin{cases} x - 3 & \text{if } x - 3 \geq 0, \text{ i.e. } x \geq 3 \\ -(x - 3) & \text{if } x - 3 < 0, \text{ i.e. } x < 3. \end{cases}$$
So how do we solve equations when there is an absolute value involved?
Example 2.3.10. Solve |x − 3| = 5.
Solution: So
|x − 3| = 5 means x − 3 = −5 or 5.
That is, one equation with absolute values actually turns out to be two equations, in disguise.
We solve:
x−3=5 ⇐⇒ x = 8, and
x − 3 = −5 ⇐⇒ x = −2.
Thus we found two solutions: x = −2 and x = 8. We check by plugging these back in: yes! ✓
Example 2.3.11. Solve $|x^2 - 3| = 1$.
Solution: So
$|x^2 - 3| = 1$ happens if and only if $x^2 - 3 = \pm 1$.
We solve both equations:
$$x^2 - 3 = 1 \iff x^2 = 4 \iff x = \pm 2;$$
and
$$x^2 - 3 = -1 \iff x^2 = 2 \iff x = \pm\sqrt{2}.$$
We got four solutions; plugging them back in, we see they are correct. ✓
Our answer: there are four solutions: $x \in \{\pm\sqrt{2}, \pm 2\}$.
Sometimes, you get surprises.

Example 2.3.12. Solve $|x^2 - 3| = 7$.
Solution: As before, we solve $x^2 - 3 = \pm 7$. We have
$$x^2 - 3 = 7 \iff x^2 = 10 \iff x = \pm\sqrt{10}.$$
But
$$x^2 - 3 = -7 \iff x^2 = -4 \ldots?$$
which has no (real number) solutions.¹⁰
So in this case, there are only two solutions, $x \in \{\pm\sqrt{10}\}$.
We use the same techniques to solve inequalities with absolute values.

Example 2.3.13. Solve for all x such that $|x^2 - 5| < 4$.
Solution: The simplest way to approach this is to first solve the equality, and then solve the
inequality.
So let’s solve $|x^2 - 5| = 4$, as above (exercise). We get four solutions: x ∈ {±1, ±3}.
Now consider a number line (or a table, if you prefer) with these 4 points:
(Figure: number line from −4 to 4 with the points ±1 and ±3 marked in red.)
Now, just as with polynomials, the function $|x^2 - 5| - 4$ can’t change sign without going through
zero.¹¹ The consequence: on each of these intervals between the red points the inequality we want
is either true or false, and we can detect which by plugging in values, for example. That is, we can
build a table to figure out the signs, just as we did before.
¹⁰ It does, however, have two complex solutions: x = ±2i, where i is a square root of −1. The complex numbers
are obtained from the real numbers by including this “imaginary” number i. We’ll only need or talk about complex
numbers a bit in MAT1332.
¹¹ This is because it’s a continuous function, as we’ll review in Chapter 4.
interval      sample x     |x^2 − 5|    < 4?
(−∞, −3)      −5           20           no
(−3, −1)      −2           1            yes
(−1, 1)       0            5            no
(1, 3)        2.1          0.59         yes
(3, ∞)        23423452     a lot        no
Putting all these intervals together, we deduce the final answer is the union of two intervals:
(−3, −1) ∪ (1, 3)
that is, x satisfies $|x^2 - 5| < 4$ if and only if either −3 < x < −1 or 1 < x < 3.
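The table in Example 2.3.13 is easy to reproduce by machine. This sketch reuses the same sample points as the table; the function name `holds` is an illustrative choice.

```python
def holds(x: float) -> bool:
    """True when |x^2 - 5| < 4."""
    return abs(x * x - 5) < 4


# One sample per interval between the critical points x = -3, -1, 1, 3.
samples = [-5, -2, 0, 2.1, 23423452]
for x in samples:
    print(x, abs(x * x - 5), "yes" if holds(x) else "no")
```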
A slightly easier example is as follows.

Exercise 2.3.14. Solve for all x such that $|x^2 - 4| \leq 5$.
Note. When you want to solve an inequality with an absolute value, first solve the equality, and
then use a table of values to decide where the inequality holds true.
Things get more difficult when there are variables on both sides; it can happen that your two
equations generate ghost answers that do not solve the original equation.
Example 2.3.15. Solve $|x^2 - 3| = 3x + 1$.
Solution: we first solve $x^2 - 3 = \pm(3x + 1)$ (notice the parentheses!). This gives two equations to
solve:
$$x^2 - 3 = 3x + 1 \iff x^2 - 3x - 4 = 0 \iff (x - 4)(x + 1) = 0 \iff x \in \{-1, 4\},$$
and
$$x^2 - 3 = -(3x + 1) \iff x^2 - 3 = -3x - 1 \iff x^2 + 3x - 2 = 0.$$
To solve the latter, we need the quadratic formula; we get
$$x = \frac{-3 \pm \sqrt{9 + 8}}{2} = -\frac32 \pm \frac12\sqrt{17}.$$
We make a table and plug these values in to check, and get a surprise:
x                  |x^2 − 3|            3x + 1              equal?
4                  13                   13                  yes
−1                 2                    −2                  no
½(−3 + √17)        |½(7 − 3√17)|        ½(−7 + 3√17)        yes
½(−3 − √17)        ½(7 + 3√17)          ½(−7 − 3√17)        no
Therefore, the solutions are (only) x = 4 and $x = \frac12(-3 + \sqrt{17})$.
Notice what happened: we accidentally solved the equation $|x^2 - 3| = |3x + 1|$, by allowing all
combinations of signs; but not every solution to this equation is also a solution to our equation.
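Weeding out the ghost answers is exactly the kind of check a few lines of code can do. The sketch below plugs all four candidates from Example 2.3.15 back into the original equation; `is_solution` and the tolerance `tol` are illustrative choices (the tolerance is there because the square roots are only approximated).

```python
import math


def is_solution(x: float, tol: float = 1e-9) -> bool:
    """Check a candidate in the ORIGINAL equation |x^2 - 3| = 3x + 1."""
    return abs(abs(x * x - 3) - (3 * x + 1)) < tol


candidates = [4, -1, (-3 + math.sqrt(17)) / 2, (-3 - math.sqrt(17)) / 2]
solutions = [x for x in candidates if is_solution(x)]
print(solutions)   # only two of the four candidates survive
```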
Note. Moral: When you want to solve an equality with an absolute value, like |f(x)| = 3x + 2:
solve the two equations f(x) = 3x + 2 and f(x) = −(3x + 2) and then plug the solutions back into the original equation to check.
In fact, this is the approach you need to use for many problems in this course: check your answer,
because we do a lot of complicated steps, and sometimes, we accidentally get stray “solutions”.
Exercise 2.3.16. Try the following problems to build your confidence with absolute values:
solve for x: | x2 − 3| > 5;

solve for x: $|x^2 - 5| = 1$;
solve for x: $|x^2 - 16| = 9$;
solve for x: | x2 − 3| > 5;
solve for x: $|12 - x^2| > 3$.
End of lecture # 1
2.4 Functions
A function is a rule which assigns to each element in the domain a unique element in the range. In
this course, the domain and range will always be subsets of the real numbers. One way to specify
the domain D explicitly is to write such a function f with domain D ⊆ R as
$$f : D \to \mathbb{R}, \qquad x \mapsto f(x).$$
For example,
$$f : (0, \infty) \to \mathbb{R}, \qquad x \mapsto \frac{1}{\sqrt{x}}.$$
Very often, we just write something like
$$f(x) = \frac{x - 2}{\sqrt{x - 5}}$$
from which you may deduce that the natural domain or domain of definition is all those x for which
the formula is well-defined. Here, since $\sqrt{x - 5}$ is only defined if x − 5 ≥ 0, and since x = 5 is
excluded because we can’t divide by 0, we conclude that the domain of definition of f is {x | x > 5}
(that is, D = (5, ∞)).
We sometimes restrict the function to a smaller domain (for example, so that it becomes one-to-
one); in this case we say we specify the function on a given domain.
The name of the function is f ; its value on a point x is f (x).
2.4.1 Potential characteristics of functions
You can plug values into a function to get some values, but to understand a function is to understand
its behaviour as a whole. Calculus is about understanding this behaviour, but even without
Calculus, you can often identify key characteristics of a function that dictate its graph.
Even and odd functions
Definition 2.4.1. A function is even if for all x in the domain, −x is also in the domain and
f (x) = f (−x); its graph will be symmetric via reflection in the y-axis. A function is odd if instead
f (−x) = −f (x), which means the symmetry is via reflection in both axes.
Examples of even functions: x2 , cos(x); examples of odd functions: x3 , sin(x). Most functions are
neither even nor odd.
y = cos(x) is on the left, it is even; y = sin(x) is on the right; it is odd.
Increasing and decreasing functions
Definition 2.4.2. A function f is said to be increasing on an interval I in its domain if for every
x1 , x2 ∈ I with x1 < x2 , we have f (x1 ) ≤ f (x2 ). (It is called strictly increasing if you in fact have
the stronger condition f (x1 ) < f (x2 ).) The function f is an increasing function if it is increasing
on every interval of its domain.
y = −1/x is on the left; it is increasing on (−∞, 0) and on (0, ∞) but not defined at 0, so is an
increasing function. (A break in the domain means you are allowed to reset.) The graph of an
increasing function which is not strictly increasing is on the right.
We can define a decreasing function similarly.
Linear transformations We can transform functions by linear operators, which retain the general
shape of their graphs. For example,
the graph of y = f(x) + 2 is obtained from the graph of y = f(x) by shifting two units up;
the graph of y = f(x + 2) is obtained from the graph of y = f(x) by shifting two units to the
left;
the graph of y = 2f(x) is obtained from the graph of y = f(x) by doubling the y-values
(stretching vertically by a factor of 2);
the graph of y = f(2x) is obtained from the graph of y = f(x) by halving the x-values
(stretching horizontally by a factor of $\frac12$).
Notice how transformations to the output variable have the obvious effect you would guess, but
transformations to the input variable have the opposite effect. If you’re ever confused for a moment:
ask yourself what happens at x = 1, for example.
The graph of $y = f(x) = x^4 - x^3 - x^2$ is in blue; the graph of y = f(x) + 5 is in black; and the
graph of y = f(x + 5) is in red.
Composition More generally, we can compose functions, which means to evaluate them sequentially
(rather than in parallel, as one does for multiplication). The important point is that
composition is not commutative, meaning, order matters.
For example, if $f(x) = x^5 + 3x$ and $g(x) = x^2$, then the composition f ◦ g is
$$(f \circ g)(x) = f(g(x)) = f(x^2) = (x^2)^5 + 3(x^2) = x^{10} + 3x^2$$
whereas composing them in the opposite direction gives
$$(g \circ f)(x) = g(f(x)) = g(x^5 + 3x) = (x^5 + 3x)^2 = x^{10} + 6x^6 + 9x^2.$$
Notice that neither is equal to the product $f(x)g(x) = x^7 + 3x^3$.
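Composition is also how functions are chained in code, which makes the "order matters" point easy to see. In this sketch `f` and `g` are the functions from the example above, while the helper `compose` is a made-up convenience.

```python
def f(x):
    return x**5 + 3 * x


def g(x):
    return x**2


def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


f_after_g = compose(f, g)      # corresponds to x^10 + 3x^2
g_after_f = compose(g, f)      # corresponds to (x^5 + 3x)^2
print(f_after_g(2), g_after_f(2), f(2) * g(2))   # three different numbers
```

Evaluating at x = 2 gives three different values for f ◦ g, g ◦ f and the product, consistent with the three different formulas.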
Splicing functions together = piecewise-defined functions Another way to create a new

function is to “splice” two or more functions together.
Example 2.4.3. The recommended maximum dosage, in mg, of a certain pain reliever, as a
function of age x in years, is given by
$$f(x) = \begin{cases} 0 & \text{if } x \leq 2 \\ 250 & \text{if } 2 < x \leq 12 \\ 500 & \text{if } x > 12. \end{cases}$$
This function is obtained by splicing three constant functions together. Some of the most interesting
medical questions about this dosage model concern the transition points; these are also the most
interesting points from the point of view of Calculus.
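A piecewise-defined function corresponds naturally to a chain of if-statements. Here is a sketch of the dosage function from Example 2.4.3; the name `max_dosage_mg` is hypothetical.

```python
def max_dosage_mg(age_years: float) -> int:
    """Piecewise-constant maximum dosage (mg) from Example 2.4.3."""
    if age_years <= 2:
        return 0
    if age_years <= 12:        # reached only when age_years > 2
        return 250
    return 500                 # age_years > 12


# The transition points x = 2 and x = 12 are where the value jumps.
for age in (2, 2.5, 12, 12.1):
    print(age, max_dosage_mg(age))
```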
Example 2.4.4. Another drug’s maximum daily dosage, in mg, expressed as a function of mass x,
in kg, is given by
$$g(x) = \begin{cases} 0 & \text{if } x < 30 \\ 4x - 120 & \text{if } x > 30. \end{cases}$$
Notice that the x-intercept of the function 4x − 120 is x = 30, so there is no jump at the transition
point. The transition is still sharp (sketch the graph); we say the graph has a “cusp”. Again,
this cusp is the most interesting point, from both the applications and the mathematical point of
view.
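Piecewise-defined functions translate naturally into if/else branches. Here is a minimal Python sketch of Examples 2.4.3 and 2.4.4. The function names are hypothetical, and we assume that the formula 4x − 120 also applies at exactly x = 30, where it gives 0.

```python
def max_dose_by_age(age_years):
    """Example 2.4.3: recommended maximum dosage in mg, by age in years."""
    if age_years <= 2:
        return 0
    elif age_years <= 12:
        return 250
    else:
        return 500

def max_dose_by_mass(mass_kg):
    """Example 2.4.4: 0 below 30 kg, then 4x - 120 (assumed to include x = 30)."""
    if mass_kg < 30:
        return 0
    return 4 * mass_kg - 120

# The first function jumps at its transition points; the second does not.
print(max_dose_by_age(12), max_dose_by_age(12.5))       # 250 500
print(max_dose_by_mass(29.9), max_dose_by_mass(30))     # 0 0
```

Evaluating just to the left and right of each transition point is a quick way to tell a jump from a continuous (but sharp) transition.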
Inverse functions If f is one-to-one on an interval I in its domain (equivalently, it passes the
horizontal line test there), then there exists a function, denoted $f^{-1}$, which is the inverse function
taking values in I. The inverse function is characterized by the relation
\[ y = f^{-1}(x) \iff x = f(y) \text{ and } y \in I. \]
In other words, given a function y = f (x), if you solve for x in terms of y (and get a unique answer!)
then the formula you get is called the inverse function.
Note. Notice that we swapped variables to define the function. The convention is always to use x
as the independent variable, and y as the dependent variable.
Example 2.4.5. Find the inverse function of $y = \dfrac{5x + 1}{3x - 2}$.
Solution: To make sure the inverse function exists, we could sketch the graph and verify that f is
one-to-one; or else we can try to solve for x in terms of y, and if we get a unique solution for each
y (as opposed to having choices) then our function is one-to-one and we will have found a formula
for the inverse. So let’s do that:
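\begin{align*}
& (3x - 2)y = 5x + 1 \\
\iff\ & 3xy - 2y = 5x + 1 && \text{expand} \\
\iff\ & 3xy - 5x = 2y + 1 && \text{group terms with } x \text{ together} \\
\iff\ & x(3y - 5) = 2y + 1 \\
\iff\ & x = \frac{2y + 1}{3y - 5}
\end{align*}
therefore the inverse function is
\[ y = f^{-1}(x) = \frac{2x + 1}{3x - 5}. \]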
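A quick numerical sanity check of this answer, as a Python sketch: composing f with the candidate inverse, in either order, should return the input (up to rounding). The sample points are arbitrary choices that avoid x = 2/3 and x = 5/3, where the two formulas are undefined.

```python
def f(x):
    return (5 * x + 1) / (3 * x - 2)      # undefined at x = 2/3

def f_inv(x):
    return (2 * x + 1) / (3 * x - 5)      # undefined at x = 5/3

for x in [-3.0, 0.0, 1.0, 4.5]:
    # Both round trips should give back x, up to floating-point rounding.
    print(x, f_inv(f(x)), f(f_inv(x)))
```

A check like this does not prove the formula, but it catches most algebra slips (a dropped sign, for instance) immediately.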
Some important inverse functions:
$\ln(x)$ is the inverse function of $e^x$ (and vice versa);
$f^{-1}(x) = \sqrt{x}$ is the inverse function of $f(x) = x^2$ with the restricted domain $I = [0, \infty)$;
$g^{-1}(x) = \arcsin(x)$ is the inverse function of $g(x) = \sin(x)$ with the restricted domain
$I = [-\pi/2, \pi/2]$.
In the latter two cases, we had to limit the domain of definition to a smaller one on which the
function was one-to-one. There were many choices but the world (and your calculator) has agreed
to use these.
Now let’s look at several important classes of functions.
2.4.2 Polynomial functions
A polynomial in x is a function of the form
\[ 1 + 2x + 4x^2 \quad \text{or} \quad -\pi x + \frac{1}{47} x^7. \]
Most generally, it would look like
\[ f(x) = a_0 + a_1 x + \cdots + a_n x^n \]
where $n \in \mathbb{N}$ (the natural numbers, $\mathbb{N} = \{0, 1, \cdots\}$) and each of the $a_i$ is a real number.
If n = 1, then the function is a linear function, like

f (x) = mx + b,
where m and b are real numbers. We know that the graph of f is a straight line with slope m and
y-intercept b.
Linear functions are the simplest kind of relationship that we can express between two variables.
Very often, in an experiment, you will hope your points fall on a straight line, confirming a linear
relationship.
If n = 2, then the function is a quadratic function, of the form
\[ f(x) = ax^2 + bx + c, \]
for real numbers a, b, c (and $a \neq 0$). We know that the graph of f is a parabola which is concave
up ∪ if a > 0 and concave down ∩ if a < 0. Its x-intercepts, if any, are given by factoring, or by
applying the quadratic formula:
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
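The phrase “if any” matters: the quadratic formula only produces real x-intercepts when $b^2 - 4ac \ge 0$. A minimal Python sketch of this case analysis follows; the function name real_roots is our own, hypothetical choice.

```python
import math

def real_roots(a, b, c):
    """Real x-intercepts of a*x^2 + b*x + c (with a != 0), in increasing order."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    disc = b * b - 4 * a * c          # the quantity under the square root
    if disc < 0:
        return []                     # parabola never meets the x-axis
    if disc == 0:
        return [-b / (2 * a)]         # parabola touches the x-axis once
    s = math.sqrt(disc)
    return sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])

print(real_roots(1, -3, 2))   # [1.0, 2.0], since x^2 - 3x + 2 = (x - 1)(x - 2)
print(real_roots(1, 0, 1))    # [], since x^2 + 1 > 0 for all real x
```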
Quadratic functions show up in some logistic growth (= limited resource) population models: the
rate of growth relative to population size hits a peak, then decreases.
If n ≥ 3, then this is a higher-degree polynomial. We recognize their shapes, but it is more difficult
to find intercepts algebraically. Moreover, these functions have many more features (local maxima
and minima) which are easiest to find using Calculus.
$y = (x - 5)^3 + (x - 5)^2 - 3(x - 5) + 5$ is in blue; $y = (x + 2)^4 + 2(x + 2)^3 - 5(x + 2)^2 + 2$ is in red;
$y = x^5 - 2x^4 - 5x^3 + 3x^2 + x - 7$ is in black. Note the limits as x → ±∞, as well as the number of
local minima and maxima, in relation to the degree of the polynomial.
Given a polynomial of degree ≥ 3, if we know one root, we can use long division to simplify and
perhaps find other roots. See Section 2.2.5.
2.4.3 Rational functions
A rational function is one of the form
\[ f(x) = \frac{p(x)}{q(x)} \]
where p(x), q(x) are polynomials, so a fraction with polynomials instead of numbers. Since we
can never divide by zero, the domain of such a function is the set of all x such that $q(x) \neq 0$.
These kinds of functions commonly arise from an inverse-proportional relationship, such as in

animal populations where the probability of an individual being eaten is inversely proportional to
the size of the population of the prey (and directly proportional to the size of the population of
predators). Rational functions can also give more realistic models than polynomials, because one
can arrange for various kinds of asymptotes, and for the graph to remain positive for all x > 0.
They can also arise when there is a natural vertical asymptote (such as trying to achieve the speed
of light, or absolute zero in temperature).
Example 2.4.6. The domain of
\[ f(x) = \frac{x^3}{(x - 2)(x - 1)} \]
is D = (−∞, 1) ∪ (1, 2) ∪ (2, ∞), that is, all real numbers except 1 and 2, which are the roots of the
denominator.
Very often (that is, unless the root of the denominator is also a root of the numerator), a root of
the denominator induces a vertical asymptote of the graph of f .
The graph of y = 1/x is in blue; the graph of $y = 1/x^2$ is in red. Notice that y = 1/x is an odd
function whereas $y = 1/x^2$ is even. Both have a vertical asymptote at x = 0.
Exercise 2.4.7. Find the domain of $f(x) = \dfrac{x + 1}{x^2 - 2}$, and of $g(x) = \dfrac{2}{x - 1} + \dfrac{5}{x^2 + 3}$.
2.4.4 Root or radical functions
A function like
\[ f(x) = \sqrt{x^2 + 2}, \quad g(x) = \sqrt{x - 5}, \quad \text{or} \quad h(x) = \sqrt[3]{x - 1} \]
is a root or radical function. More generally, a function of the form
\[ f(x) = g(x)^{a/b} \]
where a/b is a reduced fraction that is not an integer and g(x) is a polynomial, might be called a
radical function.
If b is even then
– if a/b > 0 then the domain is D = {x | g(x) ≥ 0}
– if a/b < 0 then the domain is D = {x | g(x) > 0}.
If b is odd (like a cube root) then f (x) is also defined where g(x) is negative (for example,
$\sqrt[3]{-8} = -2$)
– so if a/b > 0 then the domain is D = R
– if a/b < 0 then the domain is D = {x | g(x) ≠ 0}; for g(x) = x this is (−∞, 0) ∪ (0, ∞).
If the power is a real number but not expressed as a fraction, then the domain of definition
is the same as for the b even case.
These kinds of functions commonly show up as part of a distance formula; for example, the distance
between (x, 1) and (1, 2) is $d(x) = \sqrt{(x - 1)^2 + 1}$. They are also common in various biological
applications. For example, Kleiber’s law states that the metabolic rate is proportional to $m^{3/4}$
where m is the individual’s mass. For another example: heart rate has been observed to be
proportional to $m^{-1/4}$ (larger animals have slower heart rates).
Example 2.4.8. Find the domain of $g(x) = \sqrt{x^2 - 5}$.
Solution: $\sqrt{x^2 - 5}$ is defined only if $x^2 - 5 \ge 0$, or $x^2 \ge 5$. So the domain is the set
$(-\infty, -\sqrt{5}] \cup [\sqrt{5}, \infty)$; the endpoints are included because $\sqrt{0} = 0$ is defined.
The square root function $y = \sqrt{x}$ in particular is only defined on half of the real line.
The graph of $y = \sqrt{x^2 - 5}$ is in blue; the graph of $y = \sqrt{x}$ is in red. Notice how they are not
defined on the entire real line.
2.4.5 Absolute value function (new)
Definition 2.4.9. The absolute value is a function, defined by the formula
\[ |x| = \begin{cases} x & \text{if } x \ge 0; \\ -x & \text{if } x < 0. \end{cases} \]
The graph of y = |x|. The graph of the absolute value function is obtained by splicing together
the graph of y = −x (for x < 0) with the graph of y = x (for x ≥ 0).
Example 2.4.10. In Example 2.3.10 we solved |x − 3| = 5, and found the solutions were x = −2
and x = 8. We can see this makes sense by sketching the graph of y = |x − 3| (which is the
translation of the graph of y = |x| by 3 units to the right) and seeing where it intersects the line
y = 5, below.
y = |x − 3| is in blue; y = 5 is in red; the answer to |x − 3| = 5 is the x-values for the points of

intersection, which are −2 and 8.
What happens when we compose the absolute value function with another function f ?
Given y = f (x), the graph of y = |f (x)| is obtained by reflecting any parts below the x-axis
upwards. That’s because we took the absolute value of the output of f (x), so our final answer
must be ≥ 0.
On the other hand, the graph of y = f (|x|) is obtained by erasing the parts to the left of the
y-axis and instead reflecting the graph to make it even. That’s because we took the absolute
value of the input to f : we refuse to evaluate f on any negative inputs.
The graph of y = | sin(x)| is on the left; the graph of y = sin(|x|) is on the right.
Notice how we had packed a rather complex function into some really concise notation.
Example 2.4.11. Expand the formula for $f(x) = |x^2 - 3|$.
Solution: by the definition of the absolute value, we have
\[ |x^2 - 3| = \begin{cases} x^2 - 3 & \text{if } x^2 - 3 \ge 0 \\ -(x^2 - 3) & \text{if } x^2 - 3 < 0. \end{cases} \]
Now, that’s all well and good, but to understand for which x we need to use $x^2 - 3$ and for which
x we need to use $-(x^2 - 3)$, we need to solve the inequality $x^2 - 3 \ge 0$. Good thing we reviewed
that in Section 2.3.1.
So, to get a really useful formula for $|x^2 - 3|$, we need to solve the conditions $x^2 - 3 \ge 0$ (and
$x^2 - 3 < 0$). Well,
\[ x^2 - 3 \ge 0 \iff x^2 \ge 3 \iff x \ge \sqrt{3} \text{ or } -x \ge \sqrt{3}. \]
Since $-x \ge \sqrt{3}$ is the same as $x \le -\sqrt{3}$, we can rewrite our definition of $|x^2 - 3|$ in the following
very helpful way:
\[ |x^2 - 3| = \begin{cases} x^2 - 3 & \text{if } x \le -\sqrt{3}; \\ -(x^2 - 3) & \text{if } -\sqrt{3} < x < \sqrt{3}; \\ x^2 - 3 & \text{if } x \ge \sqrt{3}. \end{cases} \]
You can verify this is true by plugging in different values for x (like −10, 0, 10) and seeing that
$|x^2 - 3|$ coincides with the formula above.
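The “plug in values” check can be automated. The following Python sketch compares the three-branch formula for $|x^2 - 3|$ with the built-in absolute value at a handful of sample points; the sample points and the function name are arbitrary choices of ours.

```python
import math

def expanded(x):
    """Three-branch formula for |x^2 - 3|, split at -sqrt(3) and sqrt(3)."""
    r3 = math.sqrt(3)
    if x <= -r3 or x >= r3:
        return x**2 - 3
    return -(x**2 - 3)

for x in [-10, -2, -1.5, 0, 1, 1.8, 10]:
    # The two columns on the right should agree at every sample point.
    print(x, expanded(x), abs(x**2 - 3))
```

Choosing sample points from each of the three intervals is what makes the check meaningful: a point in only one interval would test only one branch.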
In Example 2.3.11 we solved $|x^2 - 3| = 1$ algebraically. Now we can see graphically why we ended
up with a total of four solutions: the bottom part of the parabola $y = x^2 - 3$ was reflected up, and
thus the graph of $y = |x^2 - 3|$ meets the line y = 1 four times in total. See the graph below.
$y = |x^2 - 3|$ is in black; y = 1 is in red; the answer is the x-values for the points of intersection,
which are $\pm 2$ and $\pm\sqrt{2}$.
You can also use this graph to understand why the equation $|x^2 - 3| = 7$ in Example 2.3.12 has
only two solutions.
Example 2.4.12. In Example 2.3.13 we solved $|x^2 - 5| < 4$ algebraically and deduced that the
answer was x ∈ (−3, −1) ∪ (1, 3). If we draw the graph we see why it is true.
$y = |x^2 - 5|$ is in blue; y = 4 is in red; the answer is the x-values such that the blue curve lies
below the red line, which are the intervals (−3, −1) and (1, 3).
Exercise 2.4.13. Sketch the graphs of $y = |x^2 - 3|$ and y = 3x + 1 to see where they intersect.
Compare with Example 2.3.15.
2.4.6 Trigonometric functions
Note. Set your calculator to radians.
Recall: 2π radians is 360◦ . π radians is 180◦ .
The basic trigonometric functions are sine and cosine. Their quotient is called the tangent function
because you can think of it as measuring a slope.
Three perspectives:
Triangles Given a right triangle with an acute angle θ, we label the adjacent and opposite sides,
as well as the hypotenuse, and have
\[ \sin(\theta) = \frac{\text{opp}}{\text{hyp}}, \quad \cos(\theta) = \frac{\text{adj}}{\text{hyp}}, \quad \tan(\theta) = \frac{\text{opp}}{\text{adj}}. \]
This perspective is good for practical applications, and for figuring out the values of your
trigonometric functions at your favourite angles (using geometry) but it’s limited to $0 < \theta < \pi/2$.
The circle Given a point on the unit circle, at angle θ (measured counterclockwise from the
positive x-axis), its coordinates are
(x, y) = (cos(θ), sin(θ))
and the slope of the line from the origin to this point is tan(θ) (except when θ is an odd
multiple of π/2, where the line is vertical, so tan(θ) is not defined).
This perspective is excellent: you can tell with a glance at the quadrant what the signs will
be, and can fit your triangles into the picture to figure out the common ratios.
Their graphs For every input θ, we get an output sin(θ). Therefore this is a function on the real
line, and we can draw the graph of input versus output as usual. That said: our favourite
letter for the independent variable (input) of a function is x, and our favourite letter for the
dependent variable (output) of a function is y, so in this context we’d write y = sin(x), or
y = cos(x), etc.
This is the perspective we’ll use most in Calculus: trigonometric functions are the most
fundamental examples of periodic functions.
Note. Calculus only works with radians. Degrees have been obsolete since the 1650s. Spread the
word.
The six trigonometric functions Sine and cosine are two of the six trigonometric functions; the
other four are
\[ \tan(x) = \frac{\sin(x)}{\cos(x)}, \quad \csc(x) = \frac{1}{\sin(x)}, \quad \sec(x) = \frac{1}{\cos(x)}, \quad \cot(x) = \frac{\cos(x)}{\sin(x)}. \]
These kinds of functions are crucial for modeling periodic phenomena, like the swinging of a
pendulum, or population cycles, or circadian rhythms (e.g., the sleep-wake cycle). See their graphs, below.
Graphs of the six basic trigonometric functions, from

https://www.onlinemathlearning.com/trigonometry-graphs.html.
You should be able to sketch these. Remember that sin(0) = 0 (the “sign” of 0 is 0, ha ha) but
cos(0) = 1. For the rest, you identify the vertical asymptotes by where the denominator vanishes,
and then fill in from there. From the graphs above, you can read the domain and range of each of
the functions.
Wave functions These trigonometric functions are the building blocks we use in many instances
to model periodic phenomena. As an important example, a wave function is characterized by its
mean, amplitude, period and phase. For example, the function
\[ f(x) = M + A \cos\left( \frac{2\pi}{T} (x - \varphi) \right) \]
has:
mean M (meaning, this is the level around which it oscillates),
amplitude A (meaning, this is its maximum distance from the mean, in absolute value),
period T (meaning, the graph repeats after time T ; more precisely, f (x) = f (x + T ) for all
x), and
phase ϕ (meaning, the left-right displacement of the wave; here, the peak will be at x = ϕ
rather than the usual x = 0 which it would be for cos(x)).
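The four parameters can be checked numerically. Below is a minimal Python sketch of the wave function; the function name wave and the sample values M = 3, A = 1.5, T = 4π, ϕ = 5π/2 are chosen purely for illustration.

```python
import math

def wave(x, mean, amplitude, period, phase):
    """f(x) = M + A cos((2 pi / T)(x - phi))."""
    return mean + amplitude * math.cos(2 * math.pi / period * (x - phase))

M, A, T, phi = 3, 1.5, 4 * math.pi, 5 * math.pi / 2

print(wave(phi, M, A, T, phi))            # peak value M + A, reached at x = phi
print(wave(phi + T / 2, M, A, T, phi))    # trough value M - A, half a period later
print(wave(1.0, M, A, T, phi), wave(1.0 + T, M, A, T, phi))   # equal up to rounding
```

The last line illustrates the defining property of the period: shifting the input by T does not change the output.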
The graph of $y = 3 + 1.5 \cos\left( \frac{2\pi}{4\pi} \left( x - \frac{5\pi}{2} \right) \right)$, above the graph of y = cos(x), for comparison. It
has mean 3, amplitude 1.5, period 4π and phase 5π/2. Each of these is a standard graph
transformation (shift or stretch), but physicists tend to use these special names to identify them.
Trigonometric identities There are many excellent trigonometric identities, but the key one is
\[ \sin^2(x) + \cos^2(x) = 1. \]
The next most useful thing to know: the values of sin(x) and cos(x) at x = 0, π/6, π/3, π/2, 2π/3, π.
x       (x in degrees)   cos(x)    sin(x)    |   x       (x in degrees)   cos(x)    sin(x)
0       0°               1         0         |   2π/3    120°             −1/2      √3/2
π/6     30°              √3/2      1/2       |   3π/4    135°             −√2/2     √2/2
π/4     45°              √2/2      √2/2      |   5π/6    150°             −√3/2     1/2
π/3     60°              1/2       √3/2      |   π       180°             −1        0
π/2     90°              0         1         |   3π/2    270°             0         −1
There’s a lot of repetition in this table. In fact, by reflecting through the four quadrants you can
see:
sin(−x) = − sin(x) and sin(π − x) = sin(x)
whereas
cos(−x) = cos(x) and cos(π − x) = − cos(x).
Don’t forget that these are not the only repeated values! Sine and cosine are periodic functions of
period 2π, meaning
sin(x) = sin(x + 2kπ) and cos(x) = cos(x + 2kπ)
for every integer (positive or negative) k. On the other hand, tangent is already periodic of period
π, as you can see from the graph:
tan(x) = tan(x + kπ) for all integers k.
Note. Trigonometric functions are not one-to-one.
Example 2.4.14. The solutions of
\[ \sin(x) = \frac{1}{2} \]
are
\[ x = \frac{\pi}{6} + 2k\pi \quad \text{and} \quad x = \frac{5\pi}{6} + 2k\pi, \quad \text{for all integers } k \]
where we got the second basic solution via the identity sin(x) = sin(π − x) and the rest from the
identity sin(x) = sin(x + 2kπ) for all integers k.
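As a sanity check on this example, the following Python sketch evaluates the sine at both families of solutions for a few integers k. The range of k and the function name are arbitrary choices of ours, and math.asin is Python's arcsine.

```python
import math

def basic_solutions(k):
    """The two solutions of sin(x) = 1/2 obtained from the integer k."""
    return (math.pi / 6 + 2 * k * math.pi, 5 * math.pi / 6 + 2 * k * math.pi)

for k in range(-2, 3):
    for x in basic_solutions(k):
        print(k, round(x, 4), round(math.sin(x), 10))   # last column is 0.5 each time

# The arcsine returns only one of these infinitely many solutions:
print(math.isclose(math.asin(0.5), math.pi / 6))        # True
```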
Your calculator has a button for the inverse sine function, called arcsine or inverse sine and written
arcsin or $\sin^{-1}$; but it only gives one answer (being a function!), and the one it gives is the answer
in the interval $-\pi/2 \le \theta \le \pi/2$. USE RADIANS.
Note. WARNING: $\sin^{-1}$ is the inverse function of sin; it is NOT csc. THIS IS HORRIBLE
NOTATION AND I’M REALLY SORRY. We want to get rid of it but notice how we’re still
fighting, almost 400 years later, to get rid of degrees; fixing this annoying notation is going to take
a while.
Forewarned is forearmed. I’ll always use arcsin(x) in class.
Similarly, to solve for all x such that cos(x) = r, arccos(r) gives you just one answer (in the interval
0 ≤ x ≤ π) and you use the identity cos(x) = cos(−x), as well as its periodicity, to generate the
other solutions.
Note. Remember to use radians in Calculus.
2.4.7 Exponential and logarithmic functions
See also Sections 2.2.2 and 2.2.3; there’s repetition because this is important, and will be used a
lot in this course. Our perspective here is about the functions, rather than just the algebra.
For any number a > 0 we define the exponential function with base a as $f(x) = a^x$. The domain
is all of R, but (for a ≠ 1) its range is only the positive real numbers (because $a^x \le 0$ can never happen).
The base $a = e \approx 2.718...$, called Euler’s number, is the distinguished base in Calculus, called the
natural base. We sometimes write $f(x) = e^x = \exp(x)$, especially when the exponent is bulky and
would be hard to write above the e.
The graphs of various exponential functions. We have $y = e^x$ in blue, $y = 2^x$ in black (note that
1 < 2 < e), $y = (1/2)^x = 2^{-x}$ in red (note that 0 < 1/2 < 1), and $y = 10^x$ in green (note that 10 > e).
A short line segment of slope one is tangent to the graph of $y = e^x$ at x = 0; the other exponential
functions $y = a^x$ have slope ln(a) at x = 0.
Exponential functions arise in a huge variety of applications, including bacterial growth, radioactive
decay, and continuously compounded interest. It is one of the most important functions used in
biology.
Note. The laws of exponents: for all a > 0 and x, y ∈ R, we have:
\[ a^{x+y} = a^x a^y, \qquad a^{-x} = \frac{1}{a^x}, \]
\[ a^{xy} = (a^x)^y, \qquad a^{x-y} = \frac{a^x}{a^y}, \]
\[ a^0 = 1, \qquad a^1 = a. \]
For any a > 0 with a ≠ 1 we also have the inverse function of the exponential, which is called the logarithm.
It is defined by the relation
\[ \log_a(x) = y \iff a^y = x \]
which is saying that these are inverse functions. In particular, note that the domain of the log
function is only the positive part of the real line, (0, ∞), but its range is all of R.
The natural logarithm is $\ln(x) = \log_e(x)$. In the physical sciences, $\log_{10}(x)$ is very common (such as
for the pH scale); so if we write log(x) without specifying the base, we mean base 10. [12] Logarithms
come up a great deal because exponential functions do, and to solve an equation like
\[ 300 = 200 + 5e^{x-7} \]
you need the logarithm.
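To illustrate, this equation can be solved by isolating the exponential first: $5e^{x-7} = 100$, so $e^{x-7} = 20$, so $x - 7 = \ln(20)$ and $x = 7 + \ln(20)$. A short Python check of this value, where math.log is the natural logarithm:

```python
import math

# Isolate the exponential: 5 e^(x-7) = 100, so e^(x-7) = 20, so x - 7 = ln(20).
x = 7 + math.log(20)

print(x)                            # approximately 9.9957
print(200 + 5 * math.exp(x - 7))    # approximately 300, up to rounding
```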
Note. The laws of logarithms (let’s state them for ln, but similar rules hold for all logarithms):
for every x, y > 0 we have
\[ \ln(xy) = \ln(x) + \ln(y), \qquad \ln(e^x) = x, \qquad \ln(e) = 1, \]
\[ \ln(x^y) = y \ln(x), \qquad e^{\ln(x)} = x, \qquad \ln(1) = 0. \]
But ln(0) is undefined (because you can’t solve $e^x = 0$).
The graph of y = ln(x).
The logarithm is a key tool when your measurements will vary by orders of magnitude (from $10^{-7}$
to $10^{7}$, for example), and so it is actually the exponent you are interested in, rather than the digits
of the value. For example, the pH of a chemical solution is given by $p = -\log_{10}(k)$.
In Calculus, the logarithm is a wonderful function that allows you to simplify complex expressions
so they are easier to deal with. For example, for all x > 3 we have
\[ \ln\left( \frac{x^2 (x - 3)}{(x + 2)^7} \right) = 2 \ln(x) + \ln(x - 3) - 7 \ln(x + 2). \]
The right side is a lot easier to differentiate, or even to plug in values on your calculator.
[12] In computer science, the most common is base 2, so in that subject log(x) means by default $\log_2(x)$.
Note. Love logarithms! They turn exponents into multiplication, and multiplication into addition
— they make expressions EASIER!
DANGER: ln(x + y) does not simplify. Accept it.
2.4.8 Summary
Knowing the graphs of your basic building-block functions is essential.
From Gizmodo.com, with thanks; posted by Philips Shiu. Can you spot and fix the mistakes?
End of lecture # 2
Chapter 3
Discrete Time Dynamical Systems

(DTDS)
Let’s now put these functions to their intended use: mathematical modeling.
3.1 Modeling in the Life Sciences
Change is what we study in science, and the life sciences are full of examples. Individuals grow and
die; the size of a population varies; individuals physically move within their environment; individuals
can change; wounds heal; hearts beat regularly; the immune system responds to threats; diseases
spread through populations; drugs are absorbed into the bloodstream; ...
One key goal is to be able to predict future states from the present state, based on understanding
the mechanisms of the change. For example, if we know how an organism’s life cycle depends on
the external temperature, we can predict future developments under climate change.
Experiments can sometimes tell us what the future could bring, by allowing us to extrapolate from
past to future — but experiments can be costly, risky, impractical, or have large time requirements.
Mathematical models are invaluable tools to help prediction. Based on experimental data and/or
our understanding of the mechanisms of change, mathematical models are used in a huge variety
of applications. We use them to try to predict the weather, the stock market, the progress of a
pandemic, and to regulate species harvesting and management.
Prediction of events with a mathematical model is a three-step process:
1. from life science to mathematics (modeling)

2. mathematical understanding (analysis)
3. from mathematics to life sciences (interpretation).
In this course, we will work through examples of doing each of the three steps, with a very large
MAT 1330 : Fall 2020 3.2. FIRST EXAMPLES
emphasis on the second step (analysis). The ultimate goal of MAT1330 and MAT1332 is for you
to have the tools to model, analyse and interpret phenomena in the life sciences, pertinent to your
scientific interests.
3.2 First examples
3.2.1 Bacterial growth
Suppose we grow bacteria in four petri dishes, measuring the amount of bacteria in $\mathrm{cm}^2$ (that is,
by area) before and after a 24-hour growth period. We collect the following empirical data.
Dish    Before    After
1       3         6
2       0.5       1
3       2         4
4       4         8
Plot of bacterial population, in $\mathrm{cm}^2$, after a 24 hour period, as a function of the initial
population, in $\mathrm{cm}^2$, at the beginning of the 24 hour period.
From these data, we develop our best guess about the growth of these bacteria: The bacteria double
in 24 hours.
Let’s write this as a formula. If $x_{\text{today}}$ is the area covered by bacteria today, then
\[ x_{\text{tomorrow}} = 2x_{\text{today}}. \]
Excellent! This is a simple mathematical model.
Predicting one day into the future is nice, but how can we go further? Answer: repeat!
\[ x_{\text{in two days}} = 2x_{\text{tomorrow}} = 2(2x_{\text{today}}) = 4x_{\text{today}}, \]
\[ x_{\text{in three days}} = 2x_{\text{in two days}} = 8x_{\text{today}}, \]
and so forth.
Let’s formalize this idea, and also give ourselves a cleaner and clearer notation.
3.2.2 Definition of a DTDS
Definition 3.2.1. A discrete time dynamical system (DTDS) consists of:
a quantity whose change is tracked at time t: $x_t$;
a time step (e.g., days or years: the units in which t is measured);
an updating function f that describes the change during a single time step.
The DTDS is then
\[ x_{t+1} = f(x_t). \]
Example 3.2.2. (Bacteria)
The quantity was the area covered by bacteria in our petri dish. The time step is days (or 24-hour
increments). The updating function in our bacteria example was f (x) = 2x. Writing $x_0$ for the
amount today, $x_1$ for the amount tomorrow, $x_2$ for the amount in two days, etc., we say our DTDS
is $x_{t+1} = f(x_t)$, or, explicitly:
\[ x_{t+1} = 2x_t. \]
Example 3.2.3. (Tree height)
Bamboo is one of the fastest-growing plants on Earth, with a growth rate of 3 cm/hour (!). So
let us take x as the length of bamboo in cm and t the time measured in hours (so the time step
is 1 hour). At the end of each hour, x has increased by 3. Therefore the updating function is
f (x) = x + 3. The DTDS is $x_{t+1} = x_t + 3$.
Example 3.2.4. (Medication)
Suppose you prescribe your patient pain medication as a daily dose. The daily dose increases the
amount of drug in their body, but over the course of the day, the body is absorbing and eventually
eliminating (some of) the drug.
Let us use a time step of 1 day, and let $x_t$ be the amount of drug in the body in mg just after the
daily dose is administered. What is the updating function in this case?
We think of what happens over the course of a day. If the patient began with $x_t$ mg in their
bloodstream, then over the day they eliminate a certain percentage of the drug. Just before we
next measure, a constant amount of drug is added. We could write this mathematically as
\[ x_t \xrightarrow{\text{elimination}} r x_t \xrightarrow{\text{intake}} r x_t + c \]
where 0 ≤ r < 1 is the percentage (meaning: fraction) of drug left in the bloodstream after one
day (so 1 − r is the rate of elimination) and c > 0 is the amount of drug administered.
Therefore our updating function is f (x) = rx + c and the DTDS is $x_{t+1} = r x_t + c$.
Example 3.2.5. In the preceding example, if a certain drug is cleared at a rate of 25% per day
and the daily dose is 10 mg, then r = 1 − 0.25 = 0.75 and c = 10, giving a DTDS of
\[ x_{t+1} = 0.75x_t + 10. \]
Exercise 3.2.6. Here are some exercises for you to try.
1. In Example 3.2.5, suppose instead that we measure the amount of drug in the body each day
immediately before the daily dose. What is the DTDS in this case?
2. Suppose the DTDS is $x_{t+1} = 0.75x_t + 10$ and the initial drug level in the bloodstream was $x_0 = 8$
mg (that is, immediately after the first dose). What is the amount of drug in the body after the
next daily dose? In two days? Plot these points. In your opinion, how does the amount of drug
in the bloodstream vary over the course of the day? Should we just connect the dots with straight
lines, or will the graph be much spikier? Does this model capture the maximum amount of drug
in the bloodstream per day, or the minimum? What about if we model it as in the exercise above?
3. A DTDS need not only model applications in the life sciences. Assume you borrow \$1000 and
agree to pay back \$50 per month. The bank charges 0.5% in continuously compounded interest
per month. [1] Write the updating function for this DTDS, where $x_t$ is the amount that you owe
immediately after making the t-th payment, and t is measured in months.
What else does the updating function of a DTDS give you? The updating function f of
a DTDS tells you what happens to your measured value from one time step to the next. Therefore
it can also see several steps into the future, or the past.
Mathematics                                   Life sciences
composition: $(f \circ f)(x) = f(f(x))$       two time steps: $x_{t+2} = f(f(x_t))$
inverse: $x = f^{-1}(y)$                      previous time: $x_t = f^{-1}(x_{t+1})$
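Both rows of this table can be tried out on the drug model f (x) = 0.75x + 10 of Example 3.2.5. In the Python sketch below, the inverse was found by solving y = 0.75x + 10 for x, and the starting value of 20 mg is an arbitrary choice for illustration.

```python
def f(x):
    """Updating function of the drug model: one day forward."""
    return 0.75 * x + 10

def f_inv(y):
    """Inverse of f: one day backward, from solving y = 0.75x + 10 for x."""
    return (y - 10) / 0.75

def two_steps(x):
    """Composition f(f(x)): two days forward."""
    return f(f(x))

x_t = 20
print(f(x_t))          # 25.0, the value one day later
print(two_steps(x_t))  # 28.75, the value two days later
print(f_inv(f(x_t)))   # 20.0, stepping forward then backward returns the start
```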
Note. The updating function is NOT telling you what your measured value is at time t — it is
only describing CHANGE. That is, we calculate NOT f (t) but rather $f(x_t)$.
Exercise 3.2.7. 1. Consider $f(x) = \frac{1}{2}x + 5$. If we start at time t = 0 with $x_0 = 5$, what
will be the value of $x_1$, $x_2$, $x_3$? Now suppose that instead of a DTDS, we have the function
$g(t) = \frac{1}{2}t + 5$. What are the values of g(0), g(1), g(2) and g(3)? Do you have $x_t = g(t)$, or
are these different? Do they model the same phenomena?
[1] This is because you have a good relationship with your bank. Your credit card would charge you closer to 1.6%
per month.
2. Consider $f(x) = \frac{1}{2}x + 2$. Calculate the two-time step map: $f \circ f$. Calculate the previous time
step map: $f^{-1}$.
3. Suppose our DTDS models bacterial growth by $x_{t+1} = 3x_t$, where $x_t$ is measured in $\mathrm{cm}^2$ and t
is measured in intervals of 6 hours. Thinking of 24 hours as iterating the DTDS four times,
give the DTDS for bacterial growth where t is instead measured in days (that is, each interval
is 24 hours). Starting with an initial condition of 10 $\mathrm{cm}^2$ of bacteria, check your answer by
finding $x_4$ in the first (6-hour) model and finding $x_1$ in the second (24-hour) model.
4. Using the DTDS of the previous exercise: how much bacteria did we start with if after 12
hours we have 100 $\mathrm{cm}^2$? Check by evaluating the original DTDS on the value you got.
3.3 Solutions of a DTDS
We now know what a DTDS is. What is the “solution” of a DTDS? The goal was to predict future
and past events using the present (initial condition) and the short-term mechanism for change (the
updating function). So one answer is: the solution of a DTDS is the sequence of all future values.
Definition 3.3.1. The solution of the DTDS $x_{t+1} = f(x_t)$ with initial value $x_0$ is the sequence
\[ \{x_0, x_1, x_2, x_3, \ldots\} \]
where each $x_t$ satisfies $x_{t+1} = f(x_t)$.
Example 3.3.2. (Bacteria) The solution of $x_{t+1} = 2x_t$ with $x_0 = 20$ is {20, 40, 80, 160, . . . }.
Example 3.3.3. (Bamboo) The solution of $x_{t+1} = x_t + 3$ with $x_0 = 0$ is {0, 3, 6, 9, 12, . . . }.
Example 3.3.4. (Drug model) If our DTDS is $x_{t+1} = 0.75x_t + 10$ with initial concentration $x_0 = 0$,
then we calculate
\begin{align*}
x_1 &= f(x_0) = 0.75(0) + 10 = 10 \\
x_2 &= f(x_1) = 0.75(10) + 10 = 17.5 \\
x_3 &= f(x_2) = 0.75(17.5) + 10 = 23.125 \\
x_4 &= f(x_3) = 0.75(23.125) + 10 = \cdots
\end{align*}
and so our solution is {0, 10, 17.5, 23.125, . . . }.
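Computing a solution is just repeated application of the updating function, which is easy to automate. Here is a minimal Python sketch; the function name solution is our own, and since a true solution is an infinite sequence, the code only produces its first few terms.

```python
def solution(f, x0, steps):
    """First terms x_0, x_1, ..., x_steps of the solution of x_{t+1} = f(x_t)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))    # apply the updating function to the latest value
    return xs

print(solution(lambda x: 2 * x, 20, 3))          # bacteria: [20, 40, 80, 160]
print(solution(lambda x: x + 3, 0, 4))           # bamboo:   [0, 3, 6, 9, 12]
print(solution(lambda x: 0.75 * x + 10, 0, 3))   # drug:     [0, 10.0, 17.5, 23.125]
```

Increasing the last argument produces as many further terms as you like.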
So a solution is not a single number, nor is it a finite set of numbers: it is an entire sequence,
consisting of infinitely many numbers. Here, we are talking about the solution to a dynamical
system, rather than the solution to an equation.
Another way of saying it: the solution is the graph of $x_t$ versus t, which is an infinite sequence of
points, one per time t.
Remark 3.3.5. Note that the solution of the DTDS is not the same as the updating function at
all: the solution is $x_t$ versus t but the updating function was $x_{t+1}$ versus $x_t$. You can see they are
different in the examples above.
So if you have written down the solution up to the 20th element of the sequence [2], then you can
easily answer “What is $x_t$?” for each t from 0 to 20.
That said, wouldn’t it be nice to have a simple formula (in terms of t) for each element of the
sequence? Then we wouldn’t have to write a long list, but could instead answer “What is $x_t$?”
using the formula.
3.3.1 Fixed points
There is one case where it’s super easy to write down the solution: when x0 is a fixed point of the
DTDS.
Definition 3.3.6. Suppose a DTDS has updating function f (x). Then any value $x^*$ satisfying
$f(x^*) = x^*$ is called a fixed point or an equilibrium or a steady state of the DTDS.
In this notation, the ∗ is just a decoration we put on the letter x to make it stand out, and you
don’t need to use it.
Example 3.3.7. We notice that when f (x) = 2x, then $x^* = 0$ is a steady state. Indeed, if we start
with a population of $x_0 = 0$, then the solution is just {0, 0, 0, . . . }; that is, the population stays
steady at that value.
This is always true: if $x_0 = x^*$, then $x_1 = f(x_0) = f(x^*) = x^*$, and $x_2 = f(x_1) = f(x^*) = x^*$, and
so on. That is, the solution is $\{x^*, x^*, x^*, \cdots\}$. It is a fixed point because it doesn’t change over
time; this makes it an equilibrium of the system.
Suppose our updating function is a linear function of the form
\[ f(x) = rx + c. \]
Then we can solve f (x) = x for x in terms of r and c:
\[ rx + c = x \iff c = x - rx \iff x(1 - r) = c \iff x = \frac{c}{1 - r}. \]
Thus there is exactly one fixed point, if $r \neq 1$, which is
\[ x^* = \frac{c}{1 - r}. \tag{3.1} \]
Example 3.3.8. Consider the drug model DTDS $x_{t+1} = 0.75x_t + 10$. Then c = 10 and 1 − r = 0.25
so the fixed point is $x^* = 10/0.25 = 40$. That is, if we start with $x_0 = 40$, then our solution will be
(try it!) {40, 40, 40, 40, . . . }. The daily morning dose of 10 mg is ensuring that the patient has the
same constant (steady state) level of drug (40 mg) in their blood each morning [3].
[2] Let’s call $x_0$ the 0th element of the sequence, for simplicity.
[3] (though of course it goes down over the course of the day, and up with the dose in the morning)
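The “try it!” for the drug model can be done in a few lines. This Python sketch computes the fixed point of a linear updating function f (x) = rx + c from formula (3.1) and confirms that the updating function leaves it unchanged; the function name fixed_point is a hypothetical choice of ours.

```python
def fixed_point(r, c):
    """Fixed point x* = c / (1 - r) of x_{t+1} = r*x_t + c; requires r != 1."""
    if r == 1:
        raise ValueError("no unique fixed point when r = 1")
    return c / (1 - r)

x_star = fixed_point(0.75, 10)
print(x_star)                 # 40.0
print(0.75 * x_star + 10)     # 40.0 again: one time step changes nothing
```

The case r = 1 is excluded because then f (x) = x + c, which has no fixed point at all unless c = 0.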
3.3.2 General solution formula for a linear DTDS
Assume that we are given a linear DTDS, that is, a DTDS of the form
\[ x_{t+1} = r x_t + c \]
where r and c are some constants, that is, real numbers that will not vary over time. For example,
for bacterial growth, we take r = 2 and c = 0; for bamboo growth, we take r = 1 and c = 3; for
medication levels, our constants will satisfy 0 ≤ r < 1 and c > 0, with actual values depending on
the drug.
Let’s calculate the solution and look for a pattern.
\begin{align*}
x_0 & \\
x_1 &= r x_0 + c \\
x_2 &= r x_1 + c = r(r x_0 + c) + c = r^2 x_0 + c(r + 1) \\
x_3 &= r x_2 + c = r(r^2 x_0 + c(r + 1)) + c = r^3 x_0 + cr(r + 1) + c = r^3 x_0 + c(r^2 + r + 1) \\
x_4 &= r x_3 + c = r^4 x_0 + cr(r^2 + r + 1) + c = r^4 x_0 + c(r^3 + r^2 + r + 1)
\end{align*}
Now we see the pattern:
\[ x_t = r^t x_0 + c(r^{t-1} + r^{t-2} + \cdots + r^2 + r + 1). \tag{3.2} \]
This is wonderful! To find xt , we can use (3.2) directly, instead of iterating our updating function
t times. And we can simplify further, using our geometric series formula (Theorem 2.1.5), which
says that if r ≠ 1 then
$$ r^{t-1} + r^{t-2} + \cdots + r^2 + r + 1 = \frac{1 - r^t}{1 - r}. \tag{3.3} $$
Let’s first consider when r = 1, though. Then our DTDS is of the form $x_{t+1} = x_t + c$. The sum on
the right of (3.2) is equal to t, and since $r^t = 1$, our general solution is just
$$ x_t = x_0 + ct \tag{3.4} $$
which is what you’d expect for (e.g.) bamboo growth.
Now let’s consider every other case. If r ≠ 1, then using (3.3) the general formula for the solution
becomes
$$ x_t = r^t x_0 + c\left(\frac{1 - r^t}{1 - r}\right). \tag{3.5} $$
Actually, we can simplify this a bit further, by noticing that the formula we’d discovered earlier for
the fixed point of this linear DTDS, $x^* = \frac{c}{1 - r}$, shows up. After that, let’s factor out the $r^t$:
$$ x_t = r^t x_0 + (1 - r^t)x^* = r^t x_0 - r^t x^* + x^* = r^t(x_0 - x^*) + x^*. $$
Nice! We can summarize what we have found by writing it as a theorem.
Theorem 3.3.9. Let $x_{t+1} = r x_t + c$ be a linear DTDS, with initial condition $x_0$. Then the general
solution formula is
$$ x_t = x_0 + ct \quad \text{if } r = 1, \text{ and} $$
$$ x_t = r^t(x_0 - x^*) + x^* \quad \text{if } r \neq 1, \text{ where } x^* = \frac{c}{1 - r} \text{ is the fixed point.} $$
Example 3.3.10. Suppose $x_{t+1} = 0.75x_t + 10$, as in Example 3.3.4. Then r = 0.75, c = 10 and
the fixed point is x∗ = 40 (as we calculated in Example 3.3.8). Therefore since r ≠ 1 the general
solution formula is
$$ x_t = (0.75)^t (x_0 - 40) + 40. $$
If our initial concentration is x0 = 0, then this simplifies to
$$ x_t = (0.75)^t(-40) + 40. $$
Plugging in t = 0, 1, 2, 3, . . . gives
0, 10, 17.5, 23.125, · · ·
which is exactly what we had computed by iterating f earlier in Example 3.3.4. (But now: I can
calculate x10 ≈ 37.75 in a second flat... with a calculator.)
Notice how the term (x0 − x∗ ) being negative implies that our solution is always less than x∗ —
but it doesn’t force the solution to be negative (which wouldn’t make sense).
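The agreement between the general solution formula of Theorem 3.3.9 and plain iteration can be checked with a short Python sketch. The function names `closed_form` and `iterate_linear` are made up for this illustration.

```python
def closed_form(r, c, x0, t):
    """x_t from Theorem 3.3.9 for the linear DTDS x_{t+1} = r*x_t + c."""
    if r == 1:
        return x0 + c * t
    x_star = c / (1 - r)  # fixed point, formula (3.1)
    return r**t * (x0 - x_star) + x_star


def iterate_linear(r, c, x0, t):
    """x_t obtained by applying the updating function t times."""
    x = x0
    for _ in range(t):
        x = r * x + c
    return x


# Example 3.3.10: r = 0.75, c = 10, x0 = 0
for t in range(4):
    print(t, closed_form(0.75, 10, 0, t), iterate_linear(0.75, 10, 0, t))

# One evaluation of the formula instead of ten iterations
print(closed_form(0.75, 10, 0, 10))
```

The two columns should agree (up to floating-point rounding) with the values 0, 10, 17.5, 23.125 listed above.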
Example 3.3.11. Suppose $x_{t+1} = \frac{1}{2}x_t + 2$. Then $r = \frac{1}{2} \neq 1$, and c = 2, so $x^* = \frac{2}{1 - \frac{1}{2}} = 4$.
Therefore the general solution to the DTDS is
$$ x_t = \left(\frac{1}{2}\right)^t (x_0 - x^*) + x^* = \left(\frac{1}{2}\right)^t (x_0 - 4) + 4. $$
For example, if the initial condition is x0 = 10 then the first few terms of the solution (using the
original DTDS) are
{10, 7, 5.5, 4.75, · · · }
which (check!) we can also get by plugging t = 0, 1, 2, 3 into the formula
$$ x_t = \frac{1}{2^t}(10 - 4) + 4 = \frac{6}{2^t} + 4. $$
In this example, we could graph the solution function $s(t) = 6(2^{-t}) + 4$ on a graph of x versus t:
sketch the decreasing exponential function $y = 2^{-t}$, then scale it by a factor of 6 in the vertical
direction, and then shift it up by 4 units. Then the solution to the DTDS will be the points on
this graph with integer t-coordinates (check). This lets you quickly see what happens as t → ∞!
Example 3.3.12. (Bacterial model) The DTDS was xt+1 = 2xt = 2xt + 0 so it is a linear DTDS
with r = 2 and c = 0. This gives x∗ = 0/(1 − 2) = 0/(−1) = 0, and general solution formula
$$ x_t = 2^t(x_0 - 0) + 0 = 2^t x_0. $$
We are greatly relieved to see this is the general solution, as it was exactly the exponential growth
model we expect for bacterial growth.
MAT 1330 : Fall 2020 3.4. BEHAVIOUR OF GENERAL DTDS: COBWEBBING
Example 3.3.13. (Fixed point) If x0 = x∗ , a fixed point, then the general solution formula tells
us that
$$ x_t = r^t(x_0 - x^*) + x^* = r^t(x^* - x^*) + x^* = x^* $$
for all t. Again, we are happy to see we get the answer we expected.
Note. Take away messages:
A DTDS consists of: a quantity xt that varies with time t ∈ {0, 1, 2, 3, . . . }; the units of this
time variable; and an updating function f . It is written as xt+1 = f (xt ).
A solution of a DTDS is an infinite sequence {x0 , x1 , x2 , . . . } obtained by applying f repeatedly,
starting with an initial value x0 .
A fixed point x∗ is a number that satisfies f (x∗ ) = x∗ ; if x0 = x∗ then xt = x∗ for all t.
There is a general solution formula for any linear DTDS, given by Theorem 3.3.9 : When
you plug in values for r, c and x0 , it gives you a formula for xt as a function of t.
Notice that we derived our wonderful formula by iterating until we saw a pattern, and then using
the geometric series identity. But if our updating function was not linear, then we wouldn’t get
this pattern — in fact, it might be incredibly difficult to find any pattern at all! So what can we
do?
End of lecture # 3
3.4 Behaviour of general DTDS: cobwebbing
So we defined a DTDS as a system of the form xt+1 = f (xt ), where xt is the value of the object of
interest at time t, and a time step for t (and units for xt ) have been given.
When the updating function f is linear, that is, f (x) = ax + b for some constants a and b, then we
found an explicit solution, that is, a formula that immediately tells us the value of xt for every t,
without having to iterate through all the preceding values. (See Example 3.3.11 for instance.)
Our goal now: Visualize the behaviour of solutions, even if f is not linear.
Method 1: Calculate several terms of the solution, and plot them
First, a familiar example:
Example 3.4.1. Suppose our DTDS is $x_{t+1} = \frac{1}{2}x_t + 1$, and the initial condition is x0 = 1.
To make this concrete, we could say that xt is the concentration of a drug in a patient’s bloodstream
at time t, given that the patient receives a daily dose that counteracts the natural elimination of
the drug. So t is measured in units of days and xt is the concentration right after the daily dose.
We calculate:
x1 = .5(1) + 1 = 1.5
x2 = .5(1.5) + 1 = 1.75
x3 = .5(1.75) + 1 = 1.875
···
And we plot these on a graph of t versus xt :
[Figure: plot of $x_t$ versus t for t = 0, 1, 2, . . . , 6, for the DTDS $x_{t+1} = \frac{1}{2}x_t + 1$.]
Note: we only plot the values for t = 0, 1, 2, . . ., not the times in between. Our discrete time
dynamical systems are often models about very specific times (e.g., just after a drug dose) and not
for all the times in between.
This plot shows us that the concentration of the drug is increasing over time, but that this concen-
tration seems to be levelling off to about x∗ = 2.
(Check for yourself that this fits perfectly with the general solution formula we derived in
Theorem 3.3.9: since c = 1 and $r = \frac{1}{2}$, $x^* = \frac{1}{1 - \frac{1}{2}} = 2$ and so $x_t = (0.5)^t(x_0 - 2) + 2$.)
Now for a new example:

Example 3.4.2. Suppose our DTDS is $x_{t+1} = \frac{2x_t}{1 + x_t}$ and our initial condition is x0 = 4. (This is
an example of a limited population model. It expresses that the birth rate is inversely proportional
to 1 + current population, an effect often seen as individuals compete for resources.4 ) Note that
this is a nonlinear DTDS because the updating function
$$ f(x) = \frac{2x}{1 + x} $$
is not a linear function. In particular, our solution formula from the previous section does not
apply.
4
For a similar example with more realistic numbers, see Example 3.3.3 in the book.
We calculate:
$$\begin{aligned}
x_1 &= \frac{8}{5} = 1.6 \\
x_2 &= \frac{3.2}{2.6} \approx 1.23 \\
x_3 &\approx 1.10 \\
x_4 &\approx 1.05
\end{aligned}$$
(Careful: you have to carry many more digits in your intermediate answers, or else the
rounding error on your results will be significant.)
[Figure: plot of $x_t$ versus t for t = 0, 1, 2, 3, 4, for the DTDS $x_{t+1} = \frac{2x_t}{1 + x_t}$.]
This method is tedious and prone to round-off errors. Good for computers, but not for us.
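Since Method 1 is a good job for a computer, here is a minimal Python sketch of it for Example 3.4.2. Python carries full floating-point precision in the intermediate values, which sidesteps the rounding issue mentioned above; the variable names are illustrative only.

```python
def f(x):
    """Updating function of Example 3.4.2."""
    return 2 * x / (1 + x)


xs = [4.0]  # initial condition x0 = 4
for t in range(4):
    xs.append(f(xs[-1]))

for t, x in enumerate(xs):
    print(f"x_{t} = {x:.4f}")
```

The printed values can be compared with the hand calculation x1 = 1.6, x2 ≈ 1.23, x3 ≈ 1.10, x4 ≈ 1.05.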
Method 2: Cobwebbing
Cobwebbing is a technique for calculating iterations of the DTDS graphically, instead of numerically.
It is fast and gives you an overall sense of what is happening to the solution over time.
Note. Here is the algorithm (or recipe) for cobwebbing; it’s done on a graph of xt+1 (on the y-axis)
versus xt (on the x-axis):
1) Graph the updating function y = f (x) and also the diagonal line y = x.
2) Start with x0 on the x-axis and go vertically to the graph of f : this intersection is the point
(x0 , x1 ) where x1 = f (x0 ).
3) Go horizontally to the diagonal; this gives the point (x1 , x1 ).
4) Repeat steps 2) and 3).
5) Collect the values xt ; these are the solution.
If required, we can then draw the solution over time on a separate graph, as the points (t, xt ) on a
graph of t versus x.
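The recipe above can be sketched in Python as a function that returns the corner points of the cobweb path. This is only one possible way to encode the steps; the names `cobweb_points` and `solution_from_cobweb` are invented for this sketch, and the actual drawing (with any plotting tool) is left out.

```python
def cobweb_points(f, x0, steps):
    """Corners of the cobweb: (x0, 0), (x0, x1), (x1, x1), (x1, x2), ..."""
    points = [(x0, 0.0)]          # step 2: start at x0 on the x-axis
    x = x0
    for _ in range(steps):
        y = f(x)
        points.append((x, y))     # step 2: go vertically to the graph of f
        points.append((y, y))     # step 3: go horizontally to the diagonal
        x = y                     # step 4: repeat from the new value
    return points


def solution_from_cobweb(points):
    """Step 5: collect x0 and the heights of the corners lying on the graph of f."""
    return [points[0][0]] + [p[1] for p in points[1::2]]


# Linear DTDS x_{t+1} = 0.5*x_t + 1 with x0 = 1
points = cobweb_points(lambda x: 0.5 * x + 1, 1.0, 3)
solution = solution_from_cobweb(points)
print(points)
print(solution)
```

The list `solution` holds x0, x1, x2, x3, which are the values one would plot as the points (t, xt) on the separate graph.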
Example 3.4.3. Let’s apply this algorithm to our familiar example of a linear DTDS.
An example of a cobweb for the DTDS $x_{t+1} = \frac{1}{2}x_t + 1$. The graph on the left shows the cobweb
diagram, starting from some point x0 ; the corners along the graph of the updating function f give
the values x1 , x2 , . . .. To graph the general solution, we plot (0, x0 ), (1, x1 ), (2, x2 ), . . . in a graph
of t versus xt , at right.
The solution (at right) is the same as the graph plotted in Example 3.4.1.
Example 3.4.4. Now let’s do a nonlinear DTDS using cobwebbing. Suppose the updating function
is
$$ f(x) = \frac{2x}{1 + x} $$
and x0 = 0.3. We can graph y = f (x) using the tools of Calculus. We will remind ourselves of
the techniques for doing so later; for now, note that its graph starts at (0, 0) and has a horizontal
asymptote at y = 2.
An example of a cobweb for the DTDS $x_{t+1} = \frac{2x_t}{1 + x_t}$. The graph on the left shows the cobweb diagram.
Note that the solution (at right) is also what you get by plugging x0 = 0.3 into f and iterating.
Example 3.4.5. Another linear DTDS:
An example of a cobweb for the DTDS $x_{t+1} = x_t + 1$. The graph on the left shows the cobweb diagram.
This geometric approach gives us a lot more qualitative information about the DTDS; plugging
values into the updating function repeatedly gave us more quantitative information.5 Both are
doing the same thing: iterating f .
In the next sections, we infer properties of the DTDS and its solutions from these cobweb diagrams.
3.5 Steady states
Recall that a fixed point or steady state or equilibrium 6 of the DTDS xt+1 = f (xt ) is a number x∗
such that
f (x∗ ) = x∗ .
Thinking now in terms of graphs: a fixed point is where the graph of y = f (x) intersects the graph
of y = x.
Suppose that our initial value x0 happens to be where y = f (x) intersects the diagonal line y = x.
We do the cobweb algorithm... but nothing happens! We stay put. This is what we noticed
quantitatively last time: if x0 = x∗ is a steady state, then the updating function f does nothing to
it, and our solution is just the constant {x∗ , x∗ , x∗ , · · · }.
5
Motivation #1 for Calculus: How can we sketch updating functions and thereby do cobwebbing, rather than
using a calculator and being surprised by how the numbers turn out? Answer: Calculus tells you the shapes of
curves.
6
Plural of equilibrium: equilibria.
These points are very important, both mathematically and for the application we are modeling,
which is why they got a special name (actually, many special names for the same thing).
3.5.1 Examples: finding equilibria
Example 3.5.1. Last time, we saw that for a linear DTDS $x_{t+1} = rx_t + c$ there is exactly one
equilibrium if r ≠ 1, namely $x^* = \frac{c}{1 - r}$.
Example 3.5.2. Suppose $x_{t+1} = \frac{1}{2}x_t + 1$. Find all the equilibria.
Solution: The updating function is $f(x) = \frac{1}{2}x + 1$, which is a linear function. Therefore we can
apply the formula: $x^* = \frac{c}{1 - r} = \frac{1}{1 - \frac{1}{2}} = 2$.
Suppose we’ve forgotten the formula. Here’s what we do. A steady state is a solution to x = f (x),
in other words, to
$$ x = \frac{1}{2}x + 1. $$
Subtract $\frac{1}{2}x$ from both sides to get
$$ \frac{1}{2}x = 1 \iff x = 2. $$
So there is only one steady state, and it is x∗ = 2. (Note that we use a special notation to help us
emphasize the significance of 2 — it’s not just any old value, it’s a fi∗ed point.)
We check our answer: $f(2) = \frac{1}{2}(2) + 1 = 1 + 1 = 2$; yes, it’s a fixed point of the DTDS.
Now let’s do a nonlinear example.

Example 3.5.3. Suppose $x_{t+1} = \frac{2x_t}{1 + x_t}$. Find all steady states of this DTDS.
Solution: The updating function is $f(x) = \frac{2x}{1 + x}$. A steady state is a solution to the equation
x = f (x). In this case, it means solving:
$$\begin{aligned}
x &= \frac{2x}{1 + x} \\
\iff x(1 + x) &= 2x \\
\iff x + x^2 &= 2x \\
\iff x^2 - x &= 0 \\
\iff x(x - 1) &= 0.
\end{aligned}$$
Therefore this DTDS has exactly two steady states: x∗ = 0 and x∗ = 1.
We check: $f(0) = \frac{0}{1} = 0$ and $f(1) = \frac{2}{1 + 1} = 1$. Yes!
1 1+1
Note: common mistake #5 in these problems is to cancel the x while solving, and end up (in this
case) with only one answer, x = 1. What happened? Division by zero! When you divide by x, first
ask yourself: what happens if x = 0? Once you know if 0 is a solution or not, you can say “ok, now
let’s consider x ≠ 0” and continue on to divide by x to find the remaining solutions.
Example 3.5.4. Suppose xt+1 = xt + 1. Find all fixed points.
Solution: The updating function is f (x) = x + 1. This is linear, but we cannot apply our formula
because r = 1 (go see the problem!). A steady state is a solution to the equation x = f (x). Here,
this means solving
x=x+1
but this equation has no solution! (Quick check: indeed, the lines y = f (x) and y = x are parallel
in this case.) Therefore this DTDS does not have any equilibria.
Exercise 3.5.5. Compare the answers found here with the cobweb graphs of the previous section,
to agree that we have found all the equilibria.
Remark 3.5.6. When solving for steady states, it always helps to sketch the graph, so that you
can see how many solutions to expect.
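When a sketch is not at hand, a rough numerical stand-in is to look for sign changes of f(x) − x on a grid and refine each one by bisection. The following Python sketch does this; the function name `find_fixed_points`, the grid size and the search interval are all choices made here for illustration. Note that it can miss a fixed point where the graph of f only touches the diagonal without crossing it.

```python
def find_fixed_points(f, lo, hi, n=1000, tol=1e-10):
    """Approximate the solutions of f(x) = x in [lo, hi] where f(x) - x changes sign."""
    def g(x):
        return f(x) - x

    roots = []
    step = (hi - lo) / n
    for i in range(n):
        a = lo + i * step
        b = lo + (i + 1) * step
        ga, gb = g(a), g(b)
        if ga == 0:
            roots.append(a)               # a grid point is exactly a fixed point
        elif ga * gb < 0:
            while b - a > tol:            # bisection on [a, b]
                mid = (a + b) / 2
                if g(a) * g(mid) <= 0:
                    b = mid
                else:
                    a = mid
            roots.append((a + b) / 2)
    if g(hi) == 0:
        roots.append(hi)
    return roots


# Example 3.5.3: f(x) = 2x / (1 + x); the interval avoids x = -1
fixed = find_fixed_points(lambda x: 2 * x / (1 + x), -0.5, 3.0)
print(fixed)
```

For this updating function the search should report two values, close to the steady states 0 and 1 found algebraically.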
3.5.2 Stability of equilibria
Our second observation from our cobweb examples is that sometimes our solutions approach an
equilibrium, and sometimes they move away from an equilibrium.
To explore this phenomenon more fully, let’s consider a richer example: a population exhibiting
the Allee effect7 .
Example 3.5.7. (Population showing Allee effect)
Consider the DTDS

$$ x_{t+1} = \frac{4x_t^2}{1 + x_t^2}. $$
The updating function is $f(x) = \frac{4x^2}{1 + x^2}$ and the graph of the updating function is characterized
by an S-shaped curve (as below, but we’ll usually exaggerate the S, as in the next figure after that
— which will not change the general nature of the solutions).
7
From Wikipedia: “The Allee effect is a phenomenon in biology characterized by a correlation between population
size or density and the mean individual fitness (often measured as per capita population growth rate) of a population
or species.”
A plot of $y = \frac{4x^2}{1 + x^2}$, on a graph of $x_{t+1}$ versus $x_t$.
The shape of this curve is the interesting part. The way to read it: the slope of the curve at any
point tells you about the reproductive rate as a function of the current population. When it is
shallow, the population does not grow much from xt to xt+1 , so has a low reproductive rate; when
it is steep, the population grows quickly from xt to xt+1 , corresponding to a high reproductive rate.
So what this updating function is modeling about our population are the following observations
(which are written in words below and correspond to the shape of the graph above):
If the population is low, then the scarcity of mates and general low fitness of the population
implies a low reproductive rate.
If the population is large enough, then the reproductive rate is high.
If the population is too high, then the limited resources reduce the reproductive rate.
Let us apply the cobweb algorithm to some different initial values x0 and see what happens.
Cobweb diagram applied to the updating function $f(x) = \frac{4x^2}{1 + x^2}$. An initial condition x0 between
0 and the second fixed point gives a cobweb that tends down to 0 (in blue); an initial condition of
x0 = x∗ (in green) does not change over time; an initial condition x0 between the second and
third fixed points gives a cobweb that tends to the third fixed point (in red).
We see that the middle equilibrium is not approached by any cobweb diagram, but the two other
fixed points are approached by cobweb diagrams from various initial states.
Definition 3.5.8. A fixed point x∗ is called stable (also called “asymptotically stable”) if all nearby
initial conditions give solutions that approach x∗ . A fixed point x∗ is called unstable if there is at
least one initial condition near x∗ such that the solution does not approach x∗ .
Example 3.5.9. In the preceding example, the middle fixed point is unstable.
Example 3.5.10. In the linear DTDS $x_{t+1} = \frac{1}{2}x_t + 1$, the fixed point x∗ = 2 was stable.
Exercise 3.5.11. Calculate the fixed points of $x_{t+1} = \frac{4x_t^2}{1 + x_t^2}$ explicitly. Prove that the smallest
and the largest of these fixed points are stable, by drawing cobwebs starting on either side of each
fixed point.
Exercise 3.5.12. Is the fixed point in Example 3.4.4 stable or unstable?
Exercise 3.5.13. Do a cobweb for the linear DTDS $x_{t+1} = -\frac{1}{2}x_t + 2$. Identify any fixed points
and classify by stability.
Note. Explore cobwebbing and graphing using the Excel file provided: linearDTDS.xls.
End of lecture # 4
3.6 Stability in linear models: a theorem
Recall:
Given a DTDS xt+1 = f (xt ), a fixed point is a number x∗ such that f (x∗ ) = x∗ . There may
be none, one, or many fixed points for a given DTDS.
There are two kinds of fixed points (=equilibria, steady states): stable and unstable.
– A fixed point x∗ is stable if all nearby x0 give solutions that approach x∗ ;

– x∗ is unstable if at least one nearby initial state x0 gives a solution that does not approach
x∗ .
Note. The stability of its steady states tells us about the long-term behaviour of a DTDS.
So let’s start by figuring out the stability of fixed points in linear models.8 All the linear models
with fixed points we have looked at in the last two classes have been stable, but this is a little
misleading, as we’ll see.
Let us begin with simple examples of linear DTDS:
xt+1 = rxt , x∗ = 0.
For different values of r, to decide stability we must draw at least two cobwebs: one for an initial
condition x0 > x∗ and one for an initial condition x0 < x∗ . The fixed point is stable only if all
nearby initial states give solutions that approach it.
8
Why linear models? First: because linear updating functions give DTDS for which we have a general solution
formula and so can answer all questions completely. Second: we will see that this answer gives us the hint we need
on how to predict stability in general.
Cobwebs applied to the updating function y = rx, with two initial conditions each: x0 < 0 and
x0 > 0. At left, the slope of the updating function is 0 < r < 1; at right, the slope of the updating
function is r > 1. We see that in the former x∗ is a stable fixed point; in the latter, x∗ is an
unstable fixed point.
What if r < 0?
Cobwebs applied to the updating function y = rx, with two initial conditions each: x0 < 0 and
x0 > 0. At left, the slope of the updating function is −1 < r < 0; at right, the slope of the
updating function is r < −1. We see that in the former x∗ is a stable fixed point; in the latter, x∗
is an unstable fixed point.
Exercise 3.6.1. Show that if r = 0 then the fixed point is stable. Decide if this makes sense, given
that the DTDS is xt+1 = 0 · xt = 0?
Exercise 3.6.2. Examine the stability of the fixed points when r = 1 or r = −1. Argue (making
reference to the DTDS, and to the kinds of cobwebs you obtain) that these two borderline cases are
an incredible balancing act. We won’t consider them here — they are just too extreme to occur in
nature.
Well, that pattern seems clear enough: the fixed point is stable if −1 < r < 1 and is unstable if
|r| > 1. In fact, it is true even with a general linear updating function; let’s see why.
Theorem 3.6.3. Let $x_{t+1} = rx_t + c$ be a linear DTDS with r ≠ ±1. Then the fixed point
$$ x^* = \frac{c}{1 - r} $$
is stable if |r| < 1 and is unstable if |r| > 1.
Proof. Remember our general solution formula:
$$ x_t = r^t(x_0 - x^*) + x^*. $$
Assume x0 ≠ x∗ . If |r| < 1 then $r^t$ gets smaller and smaller, approaching9 0 as t → ∞. So as
t → ∞, $x_t \to 0 \cdot (x_0 - x^*) + x^* = x^*$. So x∗ is stable (and this stability doesn’t depend on our initial
condition x0 or even the value of c!).
If |r| > 1, then the powers of r will grow (in absolute value!), and in fact $|r^t| \to \infty$ as t → ∞. In
particular, $r^t \not\to 0$, so $x_t \not\to x^*$, so the fixed point is unstable.
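The proof says that the distance from the fixed point is multiplied by |r| at every step. A small Python experiment can illustrate this; the function name `distance_after`, the choice c = 1, the starting offset 0.1 and the 30 steps are all arbitrary choices made for this sketch.

```python
def distance_after(r, c, offset, t):
    """|x_t - x*| for x_{t+1} = r*x_t + c, starting at x0 = x* + offset."""
    x_star = c / (1 - r)
    x = x_star + offset
    for _ in range(t):
        x = r * x + c
    return abs(x - x_star)


for r in (0.5, -0.5, 2, -2):
    print(r, distance_after(r, 1, 0.1, 30))
```

For r = ±0.5 the distance should shrink towards 0, and for r = ±2 it should blow up, in line with Theorem 3.6.3.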
Example 3.6.4. Consider the medication model $x_{t+1} = \frac{1}{2}x_t + 1$, with constant dose 1. The fixed
point is
$$ x^* = \frac{c}{1 - r} = \frac{1}{1 - \frac{1}{2}} = 2. $$
Since $|r| = \frac{1}{2} < 1$, this fixed point is stable. Therefore, if we continue this regular daily dose, the
concentration in the bloodstream will eventually stabilize to x∗ = 2 (see footnote 10).
Now let’s do an important example that explains why we’re working so hard to understand the
relationship between the DTDS and its solution.
Example 3.6.5. Suppose we are in the setting of the previous example and we want to change the
daily dose so that the steady state is x∗ = 3. Your first guess might be: add 1 to the daily dose.
Let’s try that:
$$ x_{t+1} = \frac{1}{2}x_t + 2 \implies x^* = \frac{2}{1 - \frac{1}{2}} = 4. $$
Oops! No, that was not the correct approach, because we forgot that some of the extra daily dose
is being kept in the system from one day to the next (and thus we overdosed our patient).
9
Motivation #2 for Calculus: what does it mean that rt → 0 as t → ∞? How could we decide this if the expression
were more complicated? Answer: limits.
10
Mathematically speaking, xt will get closer and closer to 2 without ever being equal to 2 — but in real life, we
can only measure the concentration to a given precision, so what you’ll measure, eventually, is a concentration of 2.
Solution: the actual problem we want to solve is the following. We want to choose the daily dose
c so that the linear DTDS $x_{t+1} = \frac{1}{2}x_t + c$ has a steady state of x∗ = 3. So we solve
$$ \frac{c}{1 - \frac{1}{2}} = 3 \iff c = \frac{3}{2}. $$
So our answer is: the daily dose should be increased to 1.5 from 1 to achieve a steady state of
x∗ = 3.
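The same calculation works for any linear dosing model: solving c/(1 − r) = x∗ for c gives c = x∗(1 − r). A minimal Python sketch, with the hypothetical helper names `steady_state` and `dose_for_target`:

```python
def steady_state(r, c):
    """Fixed point of x_{t+1} = r*x_t + c (requires r != 1)."""
    return c / (1 - r)


def dose_for_target(r, target):
    """Dose c that makes `target` the fixed point of x_{t+1} = r*x_t + c."""
    return target * (1 - r)


print(steady_state(0.5, 2))      # the naive attempt: dose raised from 1 to 2
print(dose_for_target(0.5, 3))   # dose that actually gives x* = 3
```

The first line reproduces the overdose (steady state 4), and the second recovers the dose of 1.5 found above.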
Exercise 3.6.6. Suppose the DTDS for a different drug is $x_{t+1} = \frac{2}{3}x_t + 4$. Find x∗ and explain
why it is stable, using two different arguments. (Hint: cobweb, theorem). Now suppose we instead
want a steady state of x∗ = 10. What should the new daily dosage be?
3.7 Stability in nonlinear models: examples
Nonlinear models are very common in nature. Our eventual goal is a simple criterion that could
tell us (mathematically) whether a given fixed point is stable or not.
In Life Sciences: only stable states are visible in nature! (You would have to conduct a controlled
experiment to observe the initial few solutions, given any biological process; under normal circum-
stances, you’re seeing what has happened in the long run.) So the long term behaviour is about
the steady states.
Example 3.7.1. (Allee effect; see Example 3.5.7)
$$ x_{t+1} = \frac{0.7x_t^2}{1 + 0.1x_t^2} $$
Figure: The graph of the updating function $f(x) = 0.7x^2/(1 + 0.1x^2)$ of a DTDS displaying the Allee
effect, in red. The diagonal y = x is in blue; the axes are $x_{t+1}$ versus $x_t$. The fixed points are the
solutions to f (x∗ ) = x∗ , which are 0, 2 and 5; these are the intersections of the two graphs.
When we cobweb on the updating function $f(x) = \frac{0.7x^2}{1 + 0.1x^2}$, we see that the fixed points 0 and 5
are stable, whereas the middle fixed point, at x∗ = 2, is unstable.
See also the Excel file AlleeDTDS.xls, to vary the parameters and see the effect on the graph of
the updating function and on the long-term behaviour.
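If the Excel file is not at hand, the same experiment can be approximated in a few lines of Python: start a little to either side of each fixed point and see where the solution ends up. The function name `long_run`, the starting values and the number of steps are arbitrary choices for this sketch, and a numerical run like this suggests stability rather than proving it.

```python
def f(x):
    """Updating function of Example 3.7.1."""
    return 0.7 * x**2 / (1 + 0.1 * x**2)


def long_run(x0, steps=200):
    """Value of x_t after many iterations, starting from x0."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x


for x0 in (0.5, 1.9, 2.1, 4.0, 6.0):
    print(x0, long_run(x0))
```

Initial conditions just below 2 should drift down to 0 and those just above 2 should climb to 5, which matches the cobweb picture of an unstable middle fixed point between two stable ones.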
We make some observations from the examples we have seen so far:
When the graph of f crosses the diagonal from below to above, then x∗ is unstable.
When the graph of f crosses the diagonal from above to below (with positive slope), then x∗ is stable.
We’d love to talk about and calculate the slope of f at x∗ , and compare it to the slope of y = x.11
Let’s now do several more in-depth examples.
Example 3.7.2. (Alcohol absorption dynamics)
So far, we have used a simple model for drug elimination in the bloodstream. In reality, more
factors can come into play. For example, the rate at which a person’s body absorbs or eliminates
alcohol in the bloodstream depends on the alcohol level: the more alcohol in the body, the smaller
the fraction that can be absorbed and eliminated.
Let t be time in hours, and zt the concentration of alcohol in the blood at time t. (The units in
this model are such that z = 7 corresponds to one drink for an average-sized person.)
First model: pure absorption (no new alcohol added to the body). Instead of a constant rate
of absorption R, we imagine there is a function R(z), depending on the concentration z of alcohol
already in the blood, which tells us the fraction of alcohol absorbed over one hour. Then our DTDS
is
zt+1 = zt − R(zt )zt .
What is R(zt )? We do experiments! For example, using empirical data, we establish that for a
certain population a good fit is
$$ R(z) = \begin{cases} \dfrac{10}{4 + z} & \text{when } z \ge 6 \\ 1 & \text{when } z < 6. \end{cases} $$
So in this example, the body can completely eliminate the alcohol in one hour if the concentration
is below 6 at the beginning of the hour, but otherwise, it only eliminates some.
Therefore, we have zt+1 = 0 if zt < 6 and for zt ≥ 6 our model is

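$$ z_{t+1} = \left(1 - \frac{10}{4 + z_t}\right) z_t. $$
The graph of the updating function
$$ f(x) = \begin{cases} \left(1 - \dfrac{10}{4 + x}\right)x = \dfrac{x(x - 6)}{x + 4} & \text{if } x \ge 6 \\ 0 & \text{when } x < 6 \end{cases} $$
is given below.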
11
Motivation #3 for Calculus!
Figure: The graph of the updating function $f(x) = \frac{x(x - 6)}{x + 4}$ (for x > 6, and f (x) = 0 for x < 6) of a DTDS
for alcohol elimination, in red. The diagonal y = x is in blue. Zero is the steady state (as you can
see by cobwebbing). The axes are $z_t$ and $z_{t+1}$.
We only sketched part of the graph; are we sure there isn’t another fixed point way off the graph?12
So we check, for x > 6:
$$ x = f(x) \iff x = \frac{x(x - 6)}{x + 4}. $$
So if x > 6, x ≠ 0 so we can divide by x to conclude
x+4=x−6
which has no solution. Therefore there are no fixed points with x > 6 (and only the obvious fixed
point x = 0 in the region x ≤ 6).
Second model: absorption plus drinking. Assume the subject raises their blood alcohol level by
d each hour through drinking. Then we have
$$ z_{t+1} = z_t - R(z_t)z_t + d = \left(1 - \frac{10}{4 + z_t}\right) z_t + d = f(z_t). $$
Let’s solve for the steady states, which we call z ∗ , that is, we solve f (z) = z:
$$\begin{aligned}
z &= z - R(z)z + d \\
\iff R(z)z &= d \\
\iff \frac{10z}{4 + z} &= d \\
\iff 10z &= 4d + zd \\
\iff (10 - d)z &= 4d \\
\iff z^* &= \frac{4d}{10 - d} \quad (\text{for } d \neq 10).
\end{aligned}$$
What a strange answer! As with all Life Science applications, it is helpful
to ask ourselves: when is this positive, and when is it negative?
12
And what would it mean?
The graphs of the updating function f (x) = x − R(x)x + d of a DTDS for alcohol elimination
with drinking, in red; on the left, d < 10 and on the right, d > 10. The diagonal y = x is in blue.
By cobwebbing, we see that the fixed point in the case d < 10 is stable. When d > 10, the fixed
point is negative and is not biologically relevant.
We see that the steady state is only biologically relevant when d < 10, because that’s the only
condition under which z ∗ ≥ 0; and this corresponds to a steady level of alcohol in the bloodstream
in the long run. We conclude that the body remains intoxicated, but at some steady level (which
is not d, but rather 4d/(10 − d), which can be higher or lower than d !).
When d > 10, we see that the fixed point is negative; doing cobwebbing, we see that it is also
unstable and that the concentration of alcohol over time climbs without bound (until death).
Note. You can experiment with this model (changing the parameter d and the initial value) in the
Excel file provided: AlcoholDTDS.xls.
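A comparable experiment can be sketched in Python. This sketch assumes, as the displayed formula for the second model does, that R(z) = 10/(4 + z) is used for every z; the function names, the initial value z0 = 7 and the two values of d are illustrative choices only.

```python
def steady_state(d):
    """z* = 4d / (10 - d), the fixed point found above (requires d != 10)."""
    return 4 * d / (10 - d)


def update(z, d):
    """Second model: z_{t+1} = (1 - 10/(4 + z_t)) * z_t + d (assumed for all z)."""
    return (1 - 10 / (4 + z)) * z + d


def simulate(z0, d, hours):
    """Return [z_0, z_1, ..., z_hours]."""
    zs = [z0]
    for _ in range(hours):
        zs.append(update(zs[-1], d))
    return zs


print(steady_state(5), simulate(7.0, 5, 100)[-1])    # d < 10
print(steady_state(12), simulate(7.0, 12, 50)[-1])   # d > 10
```

With d = 5 the simulated level should settle near 4d/(10 − d) = 4, while with d = 12 the fixed point is negative and the simulated level keeps climbing.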
Let’s contrast this with another drug: caffeine.
Example 3.7.3. (Caffeine absorption)
Caffeine absorption/elimination is essentially independent of concentration. Let t be time in hours,

ct the concentration of caffeine in the body, and d the amount of caffeine concentration added to
the body each hour. Then the DTDS is
ct+1 = 0.87ct + d.
This is a linear DTDS and its steady state is

$$ c^* = \frac{d}{1 - 0.87} = \frac{d}{0.13}. $$
This steady state always exists (and is biologically relevant). This is a linear DTDS, and the slope
r = 0.87 satisfies |r| < 1, so the fixed point is stable.
Our conclusion: the more coffee one drinks per unit time, the greater the concentration in the body
(as d increases, the steady state increases); but at a constant rate of drinking coffee, the level of
caffeine in the body levels off and stabilizes.
Now let’s consider a famous population model: logistic growth. It captures the phenomenon that
reproductive rate can decline with increasing population.
Example 3.7.4. (The logistic equation)
Let t represent time in years and xt the population at time t, normalized so that 1 represents the
maximum population that the resources can sustain. Then the logistic DTDS is
$$ x_{t+1} = r x_t(1 - x_t) \quad \text{for some } 0 < r < 4 $$
where the per capita growth rate $\frac{x_{t+1}}{x_t}$ is proportional to $1 - x_t$ with factor r. This means that the
rate of growth declines with the density of the population, due to intraspecific13 competition for
resources.
Let’s solve for the fixed points, then determine their stability using cobwebbing. We have f (x) =
rx(1 − x) so a fixed point satisfies
$$ x = rx(1 - x) \iff rx^2 + (1 - r)x = 0 \iff x(rx + 1 - r) = 0 $$
so we have two fixed points: x∗ = 0 and $x^* = \frac{r - 1}{r}$.
Now to set up the cobweb, we need to choose r; and it turns out that the behaviour is very different
depending on the value of r!
Cobweb diagrams for the logistic equation with 0 < r < 1, 1 < r < 2, 2 < r < 3 and 3 < r < 4.
What we conclude is:
13
intraspecific: between individuals of a single species
if 0 < r < 1, then 0 is the only nonnegative equilibrium, and it is stable : r is too small and
the population dies out;
if 1 < r < 3, then there is a positive equilibrium, and it is stable, though if r < 2 the
population climbs up to the steady state and if r > 2 it fluctuates around the steady state
until it stabilizes;
if 3 < r < 4, then the population fluctuates in boom and bust cycles and does not settle
down to the steady state; the positive equilibrium is unstable.
Compare this to the linear model we analysed in Section 3.6, to notice that the stability is similarly
related to the “slope of the tangent line of f at x∗ ”.
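For readers without the spreadsheet, here is a minimal Python sketch that iterates the logistic DTDS for one value of r from each cobweb panel. The function name `logistic_tail`, the initial value x0 = 0.2, the sample values of r and the number of steps are arbitrary choices made for this illustration.

```python
def logistic_tail(r, x0=0.2, steps=1000, keep=4):
    """Last `keep` values of x_{t+1} = r*x_t*(1 - x_t) after `steps` iterations."""
    x = x0
    history = []
    for _ in range(steps):
        x = r * x * (1 - x)
        history.append(x)
    return history[-keep:]


for r in (0.5, 1.5, 2.5, 3.5):
    positive_fixed_point = (r - 1) / r
    print(r, positive_fixed_point, logistic_tail(r))
```

For r = 0.5 the tail should be essentially 0, for r = 1.5 and r = 2.5 it should sit at (r − 1)/r, and for r = 3.5 the last few values should still be jumping around rather than settling at the fixed point.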
Note. Try it out yourself: LogisticDTDS.xls
The variety of interesting applications of DTDS to the life sciences is huge. Check out more
examples in the textbook. Of particular interest is a sophisticated model of the heartbeat, using a
discontinuous updating function that lets one understand arrhythmia from an electrical viewpoint.
End of lecture # 5
Chapter 4
Limits and the path to Calculus
The most important scientific discovery of the second millennium was the discovery of Calculus. It
changed natural philosophers into scientists: able to quantify not only the observations about the
state of matter, but also about its change.
The breakthrough result was the understanding of the concept of a limit. Isaac Newton formulated
limits as a theory of infinitesimals — theoretical “numbers” so small that when you square them
you get zero — but our modern version expresses itself as:
extrapolate what f (x) wants to be at a point x = a from what f (x) is as x

gets infinitely close to a.
Where this becomes Calculus is when you use your understanding of the function at hand to do so.
Note. Weird fact: There is no number that is “right after” or “right next to” 0. If you choose a
number that’s close, like 0.000000001, there are always a ton of numbers that are even closer to 0,
like 0.000000000134345243098. Just like there is no largest number, there is no smallest positive
number. You can always zoom in closer with your microscope!
4.1 Limits of functions: the concept
The goal: characterize the behaviour of a function at a point where it might not be defined.
Example 4.1.1. (Motivating example #1) The average rate of change of a function g over an
interval [x, a] in its domain is the rise over the run (as we’ll discuss in greater detail in Chapter 5).
The rise is g(x) − g(a) and the run is x − a; their quotient is the slope of the so-called secant line
(the line joining the point (a, g(a)) to the point (x, g(x))).
Thinking of a as being a fixed number, we could create a new function
$$ f(x) = \frac{g(x) - g(a)}{x - a} \quad \text{if } x \neq a $$
which, as x varies, gives the slope of all the possible secant lines through (a, g(a)).
The instantaneous rate of change of g at the point a would be obtained by choosing x to be equal
to a (but, oops, no: f (a) is illegal, being division by 0), or “right next to a” (but, oops, no: there
is no such x).
Example 4.1.2. (Motivating example #2) In our study of DTDS, we wanted to know the long-term
behaviour of the general solution, which is a function of the variable t (like $x_t = 4\left(\tfrac{1}{3}\right)^t - 6$).
“Long-term behaviour” kind of means “when t is ∞” — but ∞ is not a number, so the general
solution function is not defined there; there is no number we can plug into our formula to give the
answer. And if we just choose a large number t, how do we know if xt is correctly predicting the
value from then on?
In both of these motivating examples, the solution is to take the limit of the function, in the first
case as x goes to a (today’s lecture) and in the second case as t goes to ∞ (next lecture).
A first try. Suppose in the setting of Example 4.1.1 we have $g(x) = x^3$ and a = 1. Then we are
trying to understand the function
$$f(x) = \frac{x^3 - 1}{x - 1}, \quad x \neq 1,$$
as x approaches 1.¹ Using a calculator, we could make the following tables of values (approaching
from the left and from the right):
x      0.9    0.99     0.999             x      1.1    1.01     1.001
f(x)   2.71   2.9701   2.997...          f(x)   3.31   3.0301   3.003...
We infer that the closer x gets to 1, the closer f (x) gets to 3. We formalize this idea in the following
definition.
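The table of values above can be reproduced with a few lines of code. The following is a minimal sketch in Python (the choice of language and the helper name `secant_slope` are assumptions made for illustration, not part of the course material). It provides numerical evidence only, not a proof.

```python
def secant_slope(x, a=1.0):
    """Slope of the secant line of g(x) = x**3 through (a, g(a)) and (x, g(x))."""
    if x == a:
        raise ValueError("f is not defined at x = a (division by zero)")
    return (x**3 - a**3) / (x - a)

# Approach a = 1 from the left, then from the right.
for x in (0.9, 0.99, 0.999, 1.1, 1.01, 1.001):
    print(f"x = {x:<6} f(x) = {secant_slope(x):.6f}")
```

The function refuses to evaluate at x = a, mirroring the fact that f(a) itself is not defined.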
Definition 4.1.3. We say that the limit of a function f as x approaches a is equal to a number
L, and we write
$$\lim_{x \to a} f(x) = L,$$
if we can make f (x) as close to L as we wish by choosing x sufficiently close to a.
This seems to be the case in our example above, that is, we want to say
$$\lim_{x \to 1} \frac{x^3 - 1}{x - 1} = 3$$
(which is in fact true); but limits can be tricky and sometimes deceiving, so let’s start slowly.
¹ We note that when we plug in x = 1 in the formula for f(x), we get 0/0, which is pure garbage. There is NO
WAY to pretend that this is a valid number; we call it an indeterminate form.
A key observation: independence of f (a). The first thing to notice is that it doesn’t matter
what f (a) is, or even if f is defined at the point a. The limit is inferring what f would like to be
at a, not necessarily what it is.
Example 4.1.4. Consider three functions, whose graphs are drawn below.
The graphs of three different functions, each having the same limit as x goes to 1, despite being
different at the point x = 1.
The first graph represents $f_1(x) = x^2$. The second represents
$$f_2(x) = \frac{x^3 - x^2}{x - 1}, \quad x \neq 1.$$
The third represents
$$f_3(x) = \begin{cases} x^2 & \text{if } x \neq 1; \\ 2 & \text{if } x = 1. \end{cases}$$
For example, f3 could represent the rule for winnings in a game, where the rules have an exception
that when you hit 1 exactly on the nose, your winnings double.
In all three cases,

$$\lim_{x \to 1} f(x) = 1,$$
because from looking at the graph, we see that we can pick a y-value c as close to 1 as we want,
and go back and find an x value b such that f (b) = c. This is what the definition of the limit asks
us to verify.
Another observation: disagreement is possible. It can happen that the function does not
have a limit as x approaches a.
Example 4.1.5. Consider the function
f (x) = sin(π/x),
whose graph is amazing!
The graph of y = sin(π/x).

As x goes towards 0, the function oscillates more and more wildly between −1 and 1, and most
certainly does not settle down to a limit.

Example 4.1.6. Consider the function
$$f(x) = \begin{cases} x & \text{if } x < 1; \\ 3 - x & \text{if } x > 1, \end{cases}$$
whose graph is:
The graph of y = f (x).

As we approach 1 from the left, our y-values are approaching 1. But as we approach 1 from the
right, our y-values are approaching 2. Again, the function has not settled on a value, so the limit
does not exist.²
This last example suggests we might sometimes do well to also consider one-sided limits.
² To relate this to the definition of the limit: the limit can’t be L = 2 because there are bad x-values like x = 0.99
that are super-close to x = 1 and yet f(x) ≈ 1 is very far from 2. Similarly, you can exclude L = 1 as the limit, and
in fact you can exclude every value. Hence the limit doesn’t exist.
Definition 4.1.7. We say that the limit of a function f as x approaches a from above (or from
the right) is equal to the number L, and write
$$\lim_{x \to a^+} f(x) = L$$
if we can make f(x) as close to L as we wish by choosing x sufficiently close to a and larger than
a. Similarly, we say that the limit of a function f as x approaches a from below (or from the left)
is equal to the number L, and write
$$\lim_{x \to a^-} f(x) = L$$
if we can make f(x) as close to L as we wish by choosing x sufficiently close to a and smaller than
a.
These limits are called the one-sided limits of the function f at a; when needed we call them
the right-hand limit and left-hand limit, respectively.
One-sided limits are also the correct thing to consider when the function is only defined on one side
of the point a.
Example 4.1.8. Consider $f(x) = \sqrt{x}$, whose domain of definition is the interval [0, ∞). In this
case, we cannot ask about $\lim_{x \to 0} f(x)$, or $\lim_{x \to 0^-} f(x)$, because f is not defined on any number
less than 0. But it is reasonable to ask about the right-hand limit:
$$\lim_{x \to 0^+} \sqrt{x} = 0,$$
which we reason from the graph.
So in fact we can put this together into a concrete test.
Proposition 4.1.9 (Existence test). We say that the limit of f as x approaches a exists if the two
one-sided limits exist and are equal, that is,
$$\lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x).$$
So for example, we would say that $\lim_{x \to 0} \sqrt{x}$ does not exist, because the left-hand limit is not
defined.
On the other hand, notice that
$$\lim_{x \to 0} \sqrt{|x|}$$
does exist, because the function is defined on both sides of 0, and the limit from either side is 0.
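As a rough illustration of the existence test, the sketch below samples the piecewise function with the jump at x = 1 (x for x < 1, 3 − x for x > 1) from each side. The helper name `one_sided_samples` and the sampling points a ± 10^(−k) are assumptions chosen for illustration; sampling suggests a value for each one-sided limit but does not prove it.

```python
def f(x):
    """Piecewise function with a jump at 1: x for x < 1, 3 - x for x > 1."""
    if x < 1:
        return x
    if x > 1:
        return 3 - x
    raise ValueError("f is not defined at x = 1")

def one_sided_samples(func, a, side, steps=6):
    """Evaluate func at a - 10**-k (side '-') or a + 10**-k (side '+'), k = 1..steps."""
    sign = -1.0 if side == "-" else 1.0
    return [func(a + sign * 10.0**-k) for k in range(1, steps + 1)]

left = one_sided_samples(f, 1.0, "-")
right = one_sided_samples(f, 1.0, "+")
print("from the left: ", left[-1])    # close to 1
print("from the right:", right[-1])   # close to 2
```

The two sides settle near different values, which is consistent with the two-sided limit not existing.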
4.2 Evaluating limits
We have three options for evaluating limits:
(a) by calculator. But this can go wrong (see below) and is generally not accepted in this course.
(b) by reading the graph (see above). Great if you know the graph; but otherwise, not an option.
(c) by limit laws and algebraic manipulation. This is the most elegant method, and the only precise
way, to determine limits — and is the method required in this course.
Motivation: Examples of how a calculator can fail.

Example 4.2.1. Consider the function
$$f(x) = \frac{\sqrt{x^6 + 25} - 5}{x^6}.$$
We want to know if $\lim_{x \to 0} f(x)$ exists. If we make a table of values (with my calculator), we get
x      1       0.1   0.01   0.001
f(x)   0.099   0.1   0      0
whereas in fact, as we will show later, the limit is 0.1. The problem here was the round-off error
inherent to calculators.
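The same failure can be reproduced in double-precision floating point. The sketch below (Python; the function names `naive` and `rationalized` are hypothetical) compares the formula as written with the algebraically equivalent form $1/(\sqrt{x^6+25}+5)$ obtained by rationalizing. A computer carries more digits than a typical calculator, so the breakdown appears at a smaller x, but it still appears: at x = 0.001 the sum x⁶ + 25 rounds to exactly 25.

```python
import math

def naive(x):
    """Evaluate (sqrt(x**6 + 25) - 5) / x**6 exactly as written."""
    return (math.sqrt(x**6 + 25) - 5) / x**6

def rationalized(x):
    """Equivalent form 1 / (sqrt(x**6 + 25) + 5), valid for x != 0."""
    return 1 / (math.sqrt(x**6 + 25) + 5)

for x in (1, 0.1, 0.01, 0.001):
    print(x, naive(x), rationalized(x))
```

The rationalized form avoids subtracting two nearly equal numbers, which is where the round-off error comes from.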
Example 4.2.2. Consider the function f (x) = sin(π/x) we drew the graph of earlier. If we make
the following table of values
x      1   0.1   0.01   0.001
f(x)   0   0     0      0
we might erroneously think that the limit was 0. But the problem here was our choice of numbers
x approaching 0; we could have chosen a different sequence of x-values to make a table, like
x      3       0.3      0.03     0.003
f(x)   0.866   −0.866   −0.866   −0.866
and think the limit was $-\sqrt{3}/2$ (!!).
The problem with using a calculator is that we can’t possibly test every single way that x approaches
0, and by focussing just on some numbers, we might miss the big picture completely.
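To see how much the answer depends on the chosen x-values, here is a small sketch in Python. The first two sequences are the ones used in the tables; the third, x = 2/(4k + 1), is an extra sequence added here purely for illustration, chosen so that π/x = (4k + 1)π/2 and the sine equals 1.

```python
import math

def f(x):
    return math.sin(math.pi / x)

seq_a = [1, 0.1, 0.01, 0.001]                      # sin(k*pi) = 0
seq_b = [0.3, 0.03, 0.003]                         # values near -sqrt(3)/2
seq_c = [2 / (4 * k + 1) for k in (1, 10, 100)]    # values near 1 (illustrative)

values_a = [f(x) for x in seq_a]
values_b = [f(x) for x in seq_b]
values_c = [f(x) for x in seq_c]
print(values_a)
print(values_b)
print(values_c)
```

Three sequences tending to 0 suggest three different "limits", which is exactly why a table of values cannot settle the question.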
4.3 Algebraic limit laws
The following are the indisputable laws of limits.
Theorem 4.3.1 (Limit laws).
1. $\lim_{x \to a} c = c$. (“If the function doesn’t depend on x, neither does its limit.”)
2. $\lim_{x \to a} x = a$. (“If x goes to a, then x goes to a.”)
For the rest of the laws, suppose you already know that $\lim_{x \to a} f(x)$ and $\lim_{x \to a} g(x)$ exist (e.g., from
Laws #1 and #2). Then
3. $\lim_{x \to a} \big(f(x) + g(x)\big) = \lim_{x \to a} f(x) + \lim_{x \to a} g(x)$;
4. $\lim_{x \to a} \big(f(x)g(x)\big) = \left(\lim_{x \to a} f(x)\right)\left(\lim_{x \to a} g(x)\right)$;
5. if $\lim_{x \to a} g(x) \neq 0$ then $\lim_{x \to a} \dfrac{f(x)}{g(x)} = \dfrac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)}$.
Example 4.3.2. Let’s use the limit laws to determine
$$\lim_{x \to 2} (x^3 - 3x + 5).$$
Assuming for the moment that all the limits exist, we could use the sum rule (Law 3) to write
$$\lim_{x \to 2} (x^3 - 3x + 5) = \lim_{x \to 2} x^3 + \lim_{x \to 2} (-3x) + \lim_{x \to 2} 5.$$
We can apply the product rule (Law 4) to each of $x^3 = x \times x \times x$ and $-3x = (-3) \times x$ (and apply
the constant rule (Law 1) to the limit of −3) to get
$$= \left(\lim_{x \to 2} x\right)^3 + (-3) \lim_{x \to 2} x + \lim_{x \to 2} 5.$$
Finally, applying Laws 1 and 2, we see that yes, in fact, all the limits exist, so we deduce that our
assumption was valid and thus conclude
$$= (2)^3 - 3(2) + 5 = 7.$$
Therefore $\lim_{x \to 2} (x^3 - 3x + 5) = 7$.
What this example shows is that if your function f(x) is a polynomial function, then
$$\lim_{x \to a} f(x) = f(a),$$
because you can repeatedly apply the limit laws until you’re just taking the limit of x as x goes to
a (and the constant limit).
Are there other functions that behave this perfectly? Certainly. They are the most magnificent,
wonderful, desirable functions we know.
4.4 Continuous functions
Definition 4.4.1. A function f is called continuous at a point a (in its domain) if $\lim_{x \to a} f(x)$ exists
and is equal to f(a). If $\lim_{x \to a} f(x)$ does not exist, or is not equal to f(a), then the function is called
discontinuous (at a). A function that is continuous at every point in its domain is simply called
continuous.
Example 4.4.2. The function $f(x) = x^3 - 3x + 5$ is continuous, as is any polynomial function,
like $g(x) = 15x^5 - \pi x^2 + \frac{1}{17}$.
We know the graphs of many key functions already. Consider the next figure, and note from the
graph that each function has the following features: for every point a in the domain, you can make
f(x) be as close to f(a) as you like by choosing x sufficiently close to a (that is, $\lim_{x \to a} f(x) = f(a)$).
$f(x) = e^x$ at left; g(x) = ln(x) at right.
f (x) = cos(x) at left, g(x) = |x| at right. Although g(x) has a cusp at x = 0, there is no question
that for x very near zero, |x| is very near zero, and vice-versa.
f (x) = 1/x at left, g(x) = tan(x) at right. There are points missing from their domains, but these
functions are continuous at every point in their domains.
Another way to say that a function is continuous: you can draw its graph without lifting your
pencil (accepting that you can reset at a vertical asymptote, for example).
Theorem 4.4.3 (Our favourite functions are continuous). Polynomial, rational, exponential,
logarithmic, trigonometric, inverse trigonometric, absolute value, and root functions are continuous at
every point in their domain.
Proof. We know from their graphs that this is true: there are no gaps or jumps (except at those
points we have removed from the domain).
Combining this theorem with our limit laws gives us an incredible array of continuous functions to
work with!
Corollary 4.4.4 (Combining continuous functions).
(a) if f and g are continuous, so are their sum, difference, product and quotient;ᵃ
(b) if f and g are continuous, so is their composition f ◦ g.
(c) if f is continuous and g is any function such that $\lim_{x \to a} g(x) = b$ exists, then
$$\lim_{x \to a} f(g(x)) = f\left(\lim_{x \to a} g(x)\right) = f(b).$$
ᵃ Don’t forget that the domain of the quotient f/g excludes any point where g(x) = 0.
Proof. (This proof may seem very pedantic; I include it here to illuminate how the corollary can
be deduced from the limit laws as easily as we deduced the continuity of polynomial functions. For
any particular function at hand, you could plug it into this proof to be convinced that the function
must be continuous — without needing to be able to sketch the graph of the function! Cool, eh?
(But you can skip ahead to the example if this is not your cup of tea.))
(a) Suppose f and g are functions that are continuous at a common point x = a of their domains.
That means we know that
$$\lim_{x \to a} f(x) = f(a) \quad \text{and} \quad \lim_{x \to a} g(x) = g(a).$$
Therefore, by Limit Laws 3 and 4, respectively, we know that
$$\lim_{x \to a} \big(f(x) + g(x)\big) = f(a) + g(a) \quad \text{and} \quad \lim_{x \to a} \big(f(x)g(x)\big) = f(a)g(a).$$
Therefore the sum f + g and product fg functions are also continuous at x = a.
Since f(x) − g(x) = f(x) + (−1)g(x), we can combine limit laws to deduce
$$\lim_{x \to a} \big(f(x) - g(x)\big) = \lim_{x \to a} f(x) + \lim_{x \to a} (-1)g(x) = f(a) + \lim_{x \to a} (-1) \lim_{x \to a} g(x) = f(a) + (-1)g(a) = f(a) - g(a),$$
and therefore the difference function f − g is continuous at a. And finally x = a is in the domain
of f/g if and only if g(a) ≠ 0, in which case by Law 5 we have
$$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f(a)}{g(a)}.$$
(b) Now suppose that a is in the domain of g, and that g(a) = b, and that furthermore b is in the
domain of f. Then f ◦ g is the composition function and (f ◦ g)(a) = f(g(a)) = f(b). Then by
hypothesis we know that
$$\lim_{x \to a} g(x) = g(a) \quad \text{and} \quad \lim_{x \to b} f(x) = f(b).$$
But that’s actually a bit confusing, since it’s not the same x on both sides. So let’s use the variable
name y for f(y) instead, corresponding to the fact that it’s the “y-values” of g that we plug into
the function f. Mathematically speaking, there is no difference in meaning if I write instead
$$\lim_{x \to a} g(x) = g(a) \quad \text{and} \quad \lim_{y \to b} f(y) = f(b).$$
But now we can see: if x goes to a, then the continuity of g says that g(x) goes to g(a); but y = g(x)
and b = g(a) so we’re saying y goes to b, and then the continuity of f says that f(y) goes to f(b),
or in other words
$$\lim_{x \to a} f(g(x)) = f(g(a)).$$
Thus f ◦ g is continuous at x = a.
(c) The new thing here is that it is not important if g is even defined at a, or continuous at a;
we just need the innermost limit to exist. In fact you can just repeat the preceding paragraph
replacing g(a) with b and you’ll conclude that
$$\lim_{x \to a} f(g(x)) = f(b),$$
which is quite handy.
Note. We usually remember part (c) as saying: if f is continuous, then

$$\lim_{x \to a} f(g(x)) = f\left(\lim_{x \to a} g(x)\right),$$
that is, if you know where the pieces of your function are going in the limit, then you can deduce
where the whole function is going in the limit.
Example 4.4.5. The following functions are continuous:

$$\frac{1}{2 - x}, \quad \ln(e^x + x^2), \quad \sin(3x^5),$$
since these are formed from continuous functions by a combination of the above constructions.
4.5 Back to finding limits: the nice case
So, what was the point of continuity, again? Ah, yes, it made finding limits easy: if f is continuous
at a then
$$\lim_{x \to a} f(x) = f(a),$$
that is, we can just substitute x by a and that’s the answer!
Example 4.5.1. Suppose we wanted to know the following limit:
$$\lim_{x \to 2} \frac{5x^4 - 3x + 4}{x^2 - 7}.$$
This is a rational function, which is continuous on its domain. We see that 2 is in the domain;
therefore, the limit can be evaluated by direct substitution. That is:
$$\lim_{x \to 2} \frac{5x^4 - 3x + 4}{x^2 - 7} = \frac{5(2)^4 - 3(2) + 4}{(2)^2 - 7} = \frac{78}{-3} = -26.$$
(But we cannot use the direct substitution rule to determine its limit as $x \to \sqrt{7}$ or $x \to -\sqrt{7}$,
because these values are not in the domain.)
Example 4.5.2. Find $\lim_{x \to a} \dfrac{e^{x-2} - 1}{e^x - 1}$.
Solution: First note that this function is continuous on its domain. The denominator vanishes when $e^x = 1$, that is, when
x = 0. So if a ≠ 0, then a is in the domain, and so by direct substitution we may conclude
$$(a \neq 0): \quad \lim_{x \to a} \frac{e^{x-2} - 1}{e^x - 1} = \frac{e^{a-2} - 1}{e^a - 1}.$$
However, if a = 0, then this is not in the domain, and we need other methods to find the limit.
Example 4.5.3. Find the value of the parameter c, if it exists, such that the following function is
continuous:
$$f(x) = \begin{cases} x + c & \text{if } x < 0; \\ \cos(x) & \text{if } x \geq 0. \end{cases}$$
Solution: this question doesn’t at first seem to have anything to do with limits, until you look
at the definition of continuity (Definition 4.4.1). What the question is actually saying is: for what
value of c does $\lim_{x \to 0} f(x) = f(0)$?
Well, since the function is defined by different formulas on the left and on the right of 0, we have
to split the problem into left-hand and right-hand limits.
On the right side, we have x > 0 so the formula for f (x) is f (x) = cos(x). Therefore we have
$$\lim_{x \to 0^+} f(x) = \lim_{x \to 0^+} \cos(x) = \cos(0) = 1,$$
since cos(x) is continuous everywhere and so direct substitution applies. On the other hand, on
the left side we have x < 0 and so the formula for f(x) is f(x) = x + c. Thus
$$\lim_{x \to 0^-} f(x) = \lim_{x \to 0^-} (x + c) = c,$$
by our limit laws (or continuity). For the limit to exist, these two one-sided limits have to be
equal, so we deduce that we must choose c = 1. Finally, we have to check that the resulting limit
coincides with f (0). Looking at the formula, you see that you use the second one for x = 0, so
f (0) = cos(0) = 1. Excellent! That’s exactly the limit we got.
Conclusion: this function is continuous at 0 if and only if c = 1.
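A quick numerical sanity check of this conclusion can be sketched as follows (Python; the helper `jump_at_zero` and the step size h are assumptions for illustration). It compares the value of f just to the right and just to the left of 0 for a few candidate values of c.

```python
import math

def f(x, c):
    """Piecewise function from the example: x + c for x < 0, cos(x) for x >= 0."""
    return x + c if x < 0 else math.cos(x)

def jump_at_zero(c, h=1e-8):
    """Approximate gap between the values of f just right and just left of 0."""
    return f(h, c) - f(-h, c)

for c in (0.0, 1.0, 2.0):
    print(f"c = {c}: jump near 0 is about {jump_at_zero(c):.6f}")
```

Only c = 1 makes the gap shrink to (approximately) zero, in agreement with the algebra above.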
End of lecture # 6
4.6 Finding limits: methods for the trickier cases
The substitution rule is very helpful in many cases, but the real cases of interest are those for which
the rule cannot be applied (like in motivating example #1) — in particular, when a is not in the
domain. So the strategy in such cases is: use algebraic manipulation to transform your function
into another form where a is in the domain.
Example 4.6.1. (Simplify) Consider $\lim_{x \to 1} \dfrac{x^3 - 1}{x - 1}$. When we plug in x = 1, we get 0/0, which
means we don’t know. In this case, we can use long division to see that
$$\frac{x^3 - 1}{x - 1} = \frac{(x - 1)(x^2 + x + 1)}{x - 1} = x^2 + x + 1 \quad \text{except at } x = 1.$$
That is, these two functions are identical everywhere except at x = 1, where the former function is
not defined but the latter function is. Therefore their limits as x → 1 are the same; and since 1 is
in the domain of the latter function, we can evaluate by the Direct Substitution Rule:
$$\lim_{x \to 1} \frac{x^3 - 1}{x - 1} = \lim_{x \to 1} (x^2 + x + 1) = 3.$$
This was the example we considered at the beginning of this section.
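The claim that the two functions agree everywhere except at x = 1 can be spot-checked with exact rational arithmetic. This is a small sketch in Python using the standard `fractions` module; the function names `quotient` and `simplified` are hypothetical, and checking finitely many points illustrates the identity rather than proving it.

```python
from fractions import Fraction

def quotient(x):
    """(x**3 - 1) / (x - 1); undefined at x = 1."""
    return (x**3 - 1) / (x - 1)

def simplified(x):
    """x**2 + x + 1; defined everywhere."""
    return x**2 + x + 1

# Exact rational inputs near 1 avoid any round-off error.
samples = [Fraction(1) + Fraction(1, 10**k) for k in range(1, 6)]
samples += [Fraction(1) - Fraction(1, 10**k) for k in range(1, 6)]
agree = all(quotient(x) == simplified(x) for x in samples)
print(agree, simplified(1))
```

At x = 1 itself, `quotient(1)` raises a `ZeroDivisionError`, while `simplified(1)` returns the value of the limit.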
When direct substitution yields the completely illegal expression “0/0”, we call the limit an
indeterminate form. It is not a number, and could come out to anything. For example, the indeterminate
form in the preceding example came out to be 3!
Example 4.6.2. (Rationalize) When the problem has a difference of square roots, rationalisation
can transform it into something very different that might be easier to deal with. For example,
plugging in x = 0 into $\dfrac{\sqrt{x^6 + 25} - 5}{x^6}$ gives the indeterminate form 0/0. So we transform this
quotient by rationalizing it:
$$\begin{aligned}
\lim_{x \to 0} \frac{\sqrt{x^6 + 25} - 5}{x^6} &= \lim_{x \to 0} \left( \frac{\sqrt{x^6 + 25} - 5}{x^6} \cdot \frac{\sqrt{x^6 + 25} + 5}{\sqrt{x^6 + 25} + 5} \right) \\
&= \lim_{x \to 0} \frac{x^6 + 25 - 25}{x^6 \left(\sqrt{x^6 + 25} + 5\right)} \\
&= \lim_{x \to 0} \frac{1}{\sqrt{x^6 + 25} + 5} \\
&= \frac{1}{\sqrt{0 + 25} + 5} = \frac{1}{10} \quad \text{by direct substitution.}^{3}
\end{aligned}$$
Nice to get this answer: this was the example where our calculator lied to us.
The key is: if you plug the value into a quotient and it comes out as “0/0”, then what you
can hope is that there is secretly a common factor between the numerator and denominator that
could be cancelled by algebraic manipulation.
Example 4.6.3. (Simplify: find a common factor) Plugging in t = 0 in $\dfrac{(2 + t)^2 - 4}{t}$ gives the
nonsense answer 0/0 so to evaluate the following limit we expand and factor:
$$\lim_{t \to 0} \frac{(2 + t)^2 - 4}{t} = \lim_{t \to 0} \frac{4 + 4t + t^2 - 4}{t} = \lim_{t \to 0} \frac{4t + t^2}{t} = \lim_{t \to 0} (4 + t) = 4.$$
Example 4.6.4. (Simplify: find a common factor) Plugging in z = 3 in $\dfrac{z - 3}{z^2 - 9}$ gives the
indeterminate form 0/0 so we look for a common factor:
$$\lim_{z \to 3} \frac{z - 3}{z^2 - 9} = \lim_{z \to 3} \frac{z - 3}{(z - 3)(z + 3)} = \lim_{z \to 3} \frac{1}{z + 3} = \frac{1}{6}.$$
Sometimes, your function is just a mess, and simplifying is about cleaning it up.
Example 4.6.5. (Simplify) We can’t even plug in x = 0 in the expression $\dfrac{\frac{3}{x} - x^2 + 4}{5 + \frac{1}{x}}$ so we just
clean it up:
$$\lim_{x \to 0} \frac{\frac{3}{x} - x^2 + 4}{5 + \frac{1}{x}} = \lim_{x \to 0} \frac{3 - x^3 + 4x}{5x + 1} = \frac{3}{1} = 3,$$
where in the second-last equality we evaluated the limit using direct substitution.
You can also use the limit laws to evaluate one-sided limits.
³ Notice how we use the language of limits here: as long as the expression has an “x” in it, we wrote “$\lim_{x \to 0}$” in
front of it. We removed the limit symbol exactly when we replaced the x with 0 for our direct substitution.
Example 4.6.6. Find $\lim_{x \to 0} f(x)$, if it exists, when
$$f(x) = \begin{cases} x^2 - 4x & \text{if } x > 0; \\ e^x & \text{if } x \leq 0. \end{cases}$$
In this case, we do not have a single formula that is valid on both sides of 0; therefore we have no
choice but to consider the one-sided limits. Namely,
$$\begin{aligned}
\lim_{x \to 0^+} f(x) &= \lim_{x \to 0^+} (x^2 - 4x) && \text{because that’s the formula when } x > 0 \\
&= 0 && \text{by Direct Substitution Rule,}
\end{aligned}$$
whereas
$$\begin{aligned}
\lim_{x \to 0^-} f(x) &= \lim_{x \to 0^-} e^x && \text{because that’s the formula when } x < 0 \\
&= e^0 = 1 && \text{by Direct Substitution Rule.}
\end{aligned}$$
In this case, the left-hand and right-hand limits are different, so $\lim_{x \to 0} f(x)$ does not exist.
Remember that the absolute value function is one of these piecewise-defined functions in disguise!
Example 4.6.7. (The sign function)
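The sign function sgn(x) is
$$\operatorname{sgn}(x) = \frac{x}{|x|} \quad \text{for } x \neq 0.$$
What is $\lim_{x \to 0} \operatorname{sgn}(x)$, if it exists? To figure this out, we have to deal with the absolute value, which
means we need to separate when its argument is positive or negative.
$$\begin{aligned}
\lim_{x \to 0^+} \frac{x}{|x|} &= \lim_{x \to 0^+} \frac{x}{x} && \text{since } x > 0 \text{ means } |x| = x \\
&= \lim_{x \to 0^+} 1 = 1 && \text{by Limit Law 1,}
\end{aligned}$$
whereas
$$\begin{aligned}
\lim_{x \to 0^-} \frac{x}{|x|} &= \lim_{x \to 0^-} \frac{x}{-x} && \text{since } x < 0 \text{ means } |x| = -x \\
&= \lim_{x \to 0^-} (-1) = -1 && \text{by Limit Law 1.}
\end{aligned}$$
Since the two limits are different, $\lim_{x \to 0} \dfrac{x}{|x|}$ does not exist.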
That’s kind of obvious from the graph! Because in fact, sgn(x) is the function that returns 1 if
x > 0 and −1 if x < 0.
Notice that when we compute $\lim_{x \to a} f(x)$, we only care about values of x near a. That is a handy
observation when the function has many strange features.
Example 4.6.8. Find $\lim_{x \to -2} f(x)$ where
$$f(x) = \begin{cases} -4x - 8 & \text{if } x < -2; \\ \sin(\pi x) & \text{if } -2 \leq x \leq 2; \\ -4x + 8 & \text{if } x > 2. \end{cases}$$
Again, we use one-sided limits.
$$\lim_{x \to -2^-} f(x) = \lim_{x \to -2^-} (-4x - 8) = -4(-2) - 8 = 0.$$
On the right side, it is not true that f(x) = sin(πx) for all x > −2, but this equality does hold for
all x > −2 and close to −2 (namely, x < 2). So we may still write
$$\lim_{x \to -2^+} f(x) = \lim_{x \to -2^+} \sin(\pi x) = \sin(\pi(-2)) = 0,$$
by direct substitution. Since the limit exists and equals f(−2) = 0, f is continuous at x = −2.
Exercise 4.6.9. Is the function f from the above example continuous at x = 2? Justify your
answer.
Just because a function is piecewise-defined, doesn’t mean you have to use one-sided limits.
Example 4.6.10. Find $\lim_{x \to 4} \operatorname{sgn}(x)$.
In this case, we notice that near 4 (namely, for all x > 0), the function is given by sgn(x) = 1.
Therefore $\lim_{x \to 4} \operatorname{sgn}(x) = \lim_{x \to 4} 1 = 1$.
4.7 Discontinuous functions
Continuous functions are the best, but many very interesting functions are discontinuous.
Example 4.7.1. Examples of discontinuous functions.
The extended sign function given by
$$\operatorname{sgn}(x) = \begin{cases} 0 & \text{if } x = 0; \\ \dfrac{x}{|x|} & \text{if } x \neq 0 \end{cases}$$
is discontinuous at 0,⁴ because (as we saw) $\lim_{x \to 0} \operatorname{sgn}(x)$ does not exist.
The floor function
⌊x⌋ = round x down to the nearest integer that’s ≤ x
has a staircase-like graph. It is discontinuous at each integer value of x.
⁴ No matter how we chose the value of sgn(0), in fact.
In our model of drug absorption in the body, with a daily dose, we agreed that “connecting the
dots” of the general solution was not a good model of what actually happened over the course
of the day. Instead, we could model the concentration of drug in the body as a discontinuous
function: decreasing linearly but with a jump discontinuity with each daily dose that makes
the concentration suddenly jump much higher. (See graph below.)
A graph representing the change in level of a drug in the body over time, according to the DTDS
$x_{t+1} = \tfrac{1}{2} x_t + 1$ from Example 3.4.1 with a discrete daily dose. This function is naturally
discontinuous, since it models a discrete (not continuous) phenomenon.
Note. The most common occurrence of a discontinuity is at the junctions of a piecewise defined
function. The points of discontinuity, and the points excluded from the domain of f , are often
where some of the most critical features of a function are to be found.
4.8 Limits involving infinity
There are several ways that a limit question might involve ∞. That said, it is really important to
remember:
Note. Infinity is NOT a number. Infinity is a concept. Arithmetic with ∞ does not follow all the
rules of arithmetic. So we can say
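$$\infty + \infty = \infty, \qquad \infty \times \infty = \infty,$$
$$n \times \infty = \infty \ \text{ if } n > 0, \qquad n \times \infty = -\infty \ \text{ if } n < 0,$$
and similarly
$$(\infty)(-\infty) = -\infty, \qquad \frac{1}{\pm\infty} = 0.$$
But the following expressions make NO SENSE; we call them indeterminate forms (as 0/0 was):
$$\infty - \infty, \qquad \frac{\infty}{\infty}.$$
For example, see the illogical mess we get if we “subtract ∞ from both sides of ∞ + ∞ = ∞.”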
At issue is that functions can approach ∞ at different rates, and that makes all the difference.
4.8.1 Limits that diverge to infinity: vertical asymptotes
Definition 4.8.1. We say that the limit of a function f as x approaches a is infinity (or: diverges
to infinity), and we write
$$\lim_{x \to a} f(x) = \infty$$
if we can make f(x) as large as we wish by choosing x sufficiently close to a. Similarly, we say the
limit of f as x approaches a is negative infinity (or: diverges to negative infinity), and we write
$$\lim_{x \to a} f(x) = -\infty$$
if we can make f(x) as large a negative value as we wish by choosing x sufficiently close to a.
We also apply this definition to one-sided limits. Geometrically, a one-sided limit which diverges
to ∞ or −∞ corresponds to a vertical asymptote on the graph.
Example 4.8.2. $f(x) = \dfrac{1}{x}$; what is $\lim_{x \to 0} f(x)$?
Note. You almost always have to do the one-sided limits separately.
First consider $\lim_{x \to 0^+} f(x)$. If x > 0 but is getting smaller, then $\frac{1}{x}$ gets bigger. In fact, if we want
$f(x) > 10^n$, then we should take $0 < x < 10^{-n}$. (Example, to get f(x) > 1000, choose 0 < x < 0.001.)
Thus we can make f(x) arbitrarily large by taking x close enough to 0, so we conclude
$$\lim_{x \to 0^+} \frac{1}{x} = \infty.$$
Next consider $\lim_{x \to 0^-} f(x)$. If x < 0 and is close to zero, then $\frac{1}{x}$ will be very “large negative”. For
example, to get $f(x) < -10^n$, we should choose $-10^{-n} < x < 0$. So we conclude that
$$\lim_{x \to 0^-} \frac{1}{x} = -\infty.$$
These results are confirmed when we look at the graph of y = 1/x. (See Section 4.4 if you’ve
forgotten the graph, but then memorize it for future reference.) There is a vertical asymptote at
x = 0, and the graph goes down to −∞ to the left of it, but up to +∞ to the right of it.
In this case, the two one-sided limits are different; this often happens. We say the limit does not
exist, but then we say: it diverges to +∞ on the right and −∞ on the left.
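The "how close must x be" reasoning for 1/x can be checked directly. In this Python sketch (the sample point x = 0.5 · 10⁻ⁿ is an arbitrary choice satisfying 0 < x < 10⁻ⁿ), each positive sample pushes f above 10ⁿ and each mirrored negative sample pushes f below −10ⁿ.

```python
def reciprocal(x):
    return 1.0 / x

for n in range(1, 7):
    x = 0.5 * 10.0**-n               # any x with 0 < x < 10**-n would do
    assert reciprocal(x) > 10.0**n    # right of 0: f(x) exceeds 10**n
    assert reciprocal(-x) < -(10.0**n)  # left of 0: f(x) is below -10**n
    print(n, reciprocal(x), reciprocal(-x))
```

No single number L is approached from either side, which is what divergence to ±∞ means here.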
Other rational functions can be treated similarly.
Exercise 4.8.3. Find $\lim_{x \to 5^+} \dfrac{3}{x - 5}$ and also $\lim_{x \to 5^+} \dfrac{3}{5 - x}$. Describe your reasoning.
Note. We abbreviate what we have understood in this example with the following mnemonic:
$$\frac{1}{0^+} = \infty, \qquad \frac{1}{0^-} = -\infty,$$
which means: if f(t) goes to 0 on the positive side then $\frac{1}{f(t)}$ goes to ∞; and if f(t) goes to 0 on the
negative side then $\frac{1}{f(t)}$ goes to −∞.
In general, if substitution of x = a gives c/0 (for some number c ≠ 0), then you can reason that
the function is going to grow very large as x → a, and can reason whether it is going to ∞, −∞,
or oscillating in between (in which case we just say it diverges).
Example 4.8.4. We have

$$\lim_{x \to \frac{\pi}{2}^+} \tan(x) = \lim_{x \to \frac{\pi}{2}^+} \frac{\sin(x)}{\cos(x)} = -\infty,$$
because when x > π/2, cos(x) < 0 but very close to 0, while the numerator is near the positive
number 1. Therefore the quotient is a large negative number, and so in the limit goes to −∞.
On the other hand, since cos(x) > 0 if x < π/2 and x is close to π/2 (and sin(x) is still near 1 > 0),
we conclude
$$\lim_{x \to \frac{\pi}{2}^-} \tan(x) = \infty.$$
Again, these results are exactly consistent with what we see when we sketch the graph of y = tan(x)
(see Section 4.4 if needed).
Exercise 4.8.5. Argue that

$$\lim_{x \to 0^+} \ln(x) = -\infty,$$
both from reasoning about what the logarithm function does on very small values of x, and from
describing the graph of y = ln(x).
Note that in this case we cannot take the left-hand limit because ln(x) is not defined for x ≤ 0.
4.8.2 Limits as x goes to ∞: horizontal asymptotes and long-term behaviour of functions
The kind of limits we encountered in DTDS were those where t → ∞, where in that case the function
was $x_t$ (depending on the variable t). We define these limits as follows.
Definition 4.8.6. We say that the limit of a function f as x goes to ∞ is equal to the number L,
and write
$$\lim_{x \to \infty} f(x) = L,$$
if we can make f(x) as close to L as we wish by choosing x sufficiently large. We can define each
of the following expressions in a similar way:
$$\lim_{x \to -\infty} f(x) = L, \qquad \lim_{x \to \infty} f(x) = \infty, \qquad \lim_{x \to -\infty} f(x) = \infty, \qquad \ldots$$
When the limit as x → ∞ or x → −∞ is a number (as opposed to not existing, or diverging to
±∞) then geometrically this corresponds to a horizontal asymptote.
Example 4.8.7.
$$\lim_{x \to \infty} \frac{1}{x} = 0$$
because as x grows huge (like $10^{400}$), its reciprocal ($10^{-400}$) gets closer and closer to 0, and we can
get $\frac{1}{x}$ as close to zero as we like by choosing x sufficiently large.
Similarly,
$$\lim_{x \to -\infty} \frac{1}{x} = 0$$
as well. We again confirm this by looking at the graph of $y = \frac{1}{x}$, which has horizontal asymptote
y = 0 at both extremes (that is, as x → ∞ and as x → −∞).
Note. We could abbreviate what we have understood here by saying that $\frac{1}{\pm\infty} = 0$.
Another favourite function with horizontal asymptote is $e^x$, but only as x → −∞:
$$\lim_{x \to -\infty} e^x = 0.$$
Many functions grow without bound as x goes to ∞. For example,
$$\lim_{x \to \infty} x^n = \infty, \qquad \lim_{x \to \infty} e^x = \infty, \qquad \text{and} \qquad \lim_{x \to \infty} \ln(x) = \infty,$$
where n > 0 is any positive power of x (even not an integer).

Exercise 4.8.8. Confirm the limits above by sketching the graphs of the corresponding functions.
Exercise 4.8.9. Let α be a real number. What is $\lim_{x \to \infty} x^{\alpha}$? Hint: your answer will be different for
certain different values of α; there are three cases.
Exercise 4.8.10. Let r > 0 be a real number. What is $\lim_{x \to \infty} r^x$? Hint: your answer will be different
for certain different values of r; there are three cases.
Exercise 4.8.11. We didn’t define one-sided limits when talking about limits to ±∞. Write a little
story about what “$\lim_{x \to \infty^+} f(x)$” or “$\lim_{x \to \infty^-} f(x)$” should mean; you’ll have to be creative. Result: we
don’t.
You might like to use the Limit Laws — but remember that they only apply to limits that exist.
We can extend them to infinite limits ONLY if it doesn’t lead to any indeterminate forms. So for
example, it’s OK to say that
$$\lim_{x \to \infty} (3000x^2 + 4x^3) = \text{“}\infty + \infty\text{”} = \infty$$
but it is wrong to say
$$\lim_{x \to \infty} (3000x^2 - 4x^3) = \text{“}\infty - \infty\text{”} = \; ????$$
because ∞ − ∞ is an indeterminate form. Here: you’d have to decide which of the two terms goes
to ∞ the “fastest” to see who wins, or if there is a tie.⁵
A function need not have a limit, or diverge to ∞, as x → ∞. Key examples to remember are the
trigonometric functions. For example,
$$\lim_{x \to \infty} \sin(x) \ \text{ does not exist}$$
because the value of sin(x) oscillates between −1 and 1 as x grows larger and larger, never settling
to any single value L.
4.8.3 Methods for finding limits as x → ±∞
There are some standard techniques for working out limits as x → ±∞.
Example 4.8.12. (Factor out the dominant term.) We have
$$\lim_{x \to \infty} (3000x^2 - 4x^3) = \lim_{x \to \infty} (-4x^3)\left(-750 \cdot \frac{1}{x} + 1\right) = -\infty,$$
since $-4x^3 \to -\infty$ while $-750 \cdot \frac{1}{x} + 1 \to 1$.
In fact, all polynomial functions f of degree at least 1 give $\lim_{x \to \infty} f(x) = \pm\infty$, where the sign is
determined by the coefficient of the highest degree term.
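The following sketch (Python; the function names `p` and `factored` are hypothetical) evaluates 3000x² − 4x³ at a few growing values of x. It illustrates why "∞ − ∞" cannot be decided by looking at small or moderate x: the polynomial is positive up to x = 750 and only then does the cubic term take over.

```python
def p(x):
    """The polynomial 3000 x**2 - 4 x**3."""
    return 3000 * x**2 - 4 * x**3

def factored(x):
    """Same polynomial with the dominant term factored out (x != 0)."""
    return (-4 * x**3) * (-750 / x + 1)

# The second factor tends to 1, so the sign is eventually that of -4 x**3.
for x in (10, 100, 750, 1000, 10**6):
    print(x, p(x), -750 / x + 1)
```

The printed second factor approaches 1, which is the numerical counterpart of the factoring argument above.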
⁵ We will come back to this question again later in the course, when we have more tools of Calculus to help. For
now, we focus on a few very important methods (that, incidentally, teach us important properties of these functions).
Example 4.8.13. (Factor numerator and denominator by highest power term in the
denominator)
$$\lim_{x \to \infty} \frac{3x^2 + 2x + 1}{-x^2 + 3} = \lim_{x \to \infty} \frac{3 + 2/x + 1/x^2}{-1 + 3/x^2} = \frac{3}{-1} = -3$$
since the other terms in the numerator and the denominator are going to zero. Similarly
$$\lim_{x \to \infty} \frac{3x^3 + 2x + 1}{-x^2 + 3} = \lim_{x \to \infty} \frac{3x + 2/x + 1/x^2}{-1 + 3/x^2} = -\infty$$
because the numerator is growing without bound towards ∞ while the denominator is staying very
close to −1. At the other extreme,
$$\lim_{x \to \infty} \frac{3x^2 + 2x + 1}{-x^3 + 3} = \lim_{x \to \infty} \frac{3/x + 2/x^2 + 1/x^3}{-1 + 3/x^3} = 0$$
since this time the numerator is going to 0 while the denominator is staying near −1.
This technique also works with other rapidly-growing functions. What it amounts to is to scale
the numerator and the denominator by a factor which will make the denominator go to a constant,
nonzero value.⁶
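The three cases worked above (equal degrees, larger numerator degree, larger denominator degree) can be summarized in a short routine. This is a sketch in Python under stated assumptions: the function name `limit_at_infinity` and the coefficient-list input format are hypothetical conventions chosen for illustration, and only the limit as x → +∞ is handled.

```python
from fractions import Fraction
import math

def limit_at_infinity(num, den):
    """Limit as x -> +infinity of a ratio of polynomials.

    num and den list coefficients from the highest power down, with
    nonzero leading coefficients, e.g. [3, 2, 1] for 3x^2 + 2x + 1.
    """
    deg_num, deg_den = len(num) - 1, len(den) - 1
    lead = Fraction(num[0], den[0])     # ratio of leading coefficients
    if deg_num < deg_den:
        return Fraction(0)              # numerator goes to 0 after scaling
    if deg_num == deg_den:
        return lead                     # only the leading terms survive
    return math.inf if lead > 0 else -math.inf

print(limit_at_infinity([3, 2, 1], [-1, 0, 3]))      # -3
print(limit_at_infinity([3, 0, 2, 1], [-1, 0, 3]))   # -inf
print(limit_at_infinity([3, 2, 1], [-1, 0, 0, 3]))   # 0
```

The three calls correspond, in order, to the three limits computed in the example.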
Example 4.8.14. (Scaling with exponentials)
$$\lim_{x \to \infty} \frac{e^x - 1}{4 + 5e^x} = \lim_{x \to \infty} \frac{1 - e^{-x}}{4e^{-x} + 5} = \frac{1}{5}$$
where in the first step we divided numerator and denominator through by $e^x$, the dominant term.
Note that we could find the final answer by a kind of direct substitution (coming from continuity)
since $\lim_{x \to \infty} e^{-x} = 0$.
Example 4.8.15. (Scaling with radicals) Consider $f(x) = \frac{x-2}{\sqrt{3x^2+4}}$. As x → ∞, both the numerator
and the denominator go to ∞. You might be tempted to divide numerator and
denominator by $x^2$, but look what that gives in the denominator:
\[ \frac{1}{x^2}\sqrt{3x^2 + 4} = \sqrt{\frac{1}{x^4}(3x^2 + 4)} = \sqrt{\frac{3}{x^2} + \frac{4}{x^4}} \]
which goes to 0 as x → ∞. Since the scaled numerator would also go to 0 (check!) we deduce we
scaled by too much: we changed our indeterminate form ∞/∞ into another indeterminate form of
type 0/0. So actually, we meant to multiply by $\frac{1}{x}$ in the numerator and denominator instead.
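A different approach: factor the leading (most important) term out of the denominator, carefully:
\[
\begin{aligned}
\lim_{x\to\infty} f(x) &= \lim_{x\to\infty} \frac{x-2}{\sqrt{3x^2+4}} \\
&= \lim_{x\to\infty} \frac{x-2}{\sqrt{x^2(3 + 4/x^2)}} \\
&= \lim_{x\to\infty} \frac{x-2}{|x|\sqrt{3 + 4/x^2}} && \text{since } \sqrt{x^2} = |x| \\
&= \lim_{x\to\infty} \frac{x-2}{x\sqrt{3 + 4/x^2}} && \text{since } x \to \infty \text{ means } x > 0 \\
&= \lim_{x\to\infty} \frac{1 - 2/x}{\sqrt{3 + 4/x^2}} && \text{since } \frac{x-2}{x} = 1 - \frac{2}{x} \\
&= \frac{1 - 0}{\sqrt{3 + 0}} && \text{since } 1/x \to 0 \\
&= \frac{1}{\sqrt{3}}
\end{aligned}
\]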
In fact, this is the same thing as multiplying numerator and denominator by $\frac{1}{x}$.
6
Alternately, you can scale so as to make the numerator go to a constant, nonzero value instead; the only difference
is that if in that case your denominator goes to zero, you need to decide if it is approaching zero from above or below
to decide if the limit diverges to ∞ or −∞.
Note. Scale by the net power of the denominator, and watch out for the algebra.
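Example 4.8.16. (Scaling with radicals: careful with negatives) Consider now $\lim_{x\to-\infty} \frac{x-2}{\sqrt{3x^2+4}}$.
This is legitimate, as the domain of this function is all of R. The work is almost the same, except,
crucially, as x → −∞, we have x < 0 so |x| = −x. Therefore we have:
\[
\begin{aligned}
\lim_{x\to-\infty} \frac{x-2}{\sqrt{3x^2+4}} &= \lim_{x\to-\infty} \frac{x-2}{\sqrt{x^2(3 + 4/x^2)}} \\
&= \lim_{x\to-\infty} \frac{x-2}{|x|\sqrt{3 + 4/x^2}} && \text{since } \sqrt{x^2} = |x| \\
&= \lim_{x\to-\infty} \frac{x-2}{-x\sqrt{3 + 4/x^2}} && \text{since } x \to -\infty \text{ means } x < 0 \\
&= \lim_{x\to-\infty} \frac{-1 + 2/x}{\sqrt{3 + 4/x^2}} && \text{since } \frac{x-2}{-x} = -1 + \frac{2}{x} \\
&= \frac{-1 + 0}{\sqrt{3 + 0}} && \text{since } 1/x \to 0 \\
&= \frac{-1}{\sqrt{3}}
\end{aligned}
\]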
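As a rough numerical check of Examples 4.8.15 and 4.8.16, the following sketch evaluates the same quotient far out in both directions and compares with ±1/√3. The name `radical_quotient` is a hypothetical label for this illustration.

```python
import math

def radical_quotient(x):
    # The function f(x) = (x - 2) / sqrt(3x^2 + 4) from Examples 4.8.15 and 4.8.16.
    return (x - 2) / math.sqrt(3 * x**2 + 4)

target = 1 / math.sqrt(3)

for x in [1e2, 1e4, 1e6]:
    # Values for large positive x should be near +target,
    # values for large negative x should be near -target.
    print(x, radical_quotient(x), radical_quotient(-x), target)
```

The sign change between the two columns reflects the step |x| = −x for x < 0.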
We used a nice shortcut in our last few examples: we know that 1/x → 0 as x → ±∞, so we “plugged
in 0” for 1/x when we evaluated the limit. This is in fact an application of Corollary 4.4.4 for
infinite limits. Let’s do some more examples.
Example 4.8.17. Since the exponential function is continuous,
\[ \lim_{x\to\infty} e^{1/x} = e^{\lim_{x\to\infty} 1/x} = e^0 = 1. \]
Example 4.8.18. We want to evaluate
\[ \lim_{x\to\infty} \left(\ln(x + 3) - \ln(x - 1)\right) \]
but plugging in ∞ for x gives us ∞ − ∞, which is meaningless. So instead we simplify (always the
first thing to do) using rules of logarithms, which gives
\[ = \lim_{x\to\infty} \ln\left(\frac{x+3}{x-1}\right) \]
and since ln is continuous, we can exchange it with the limit to say
\[ = \ln\left(\lim_{x\to\infty} \frac{x+3}{x-1}\right) = \ln\left(\lim_{x\to\infty} \frac{1 + 3/x}{1 - 1/x}\right) = \ln(1) = 0. \]
End of lecture # 7
Chapter 5
The Derivative
One of the two central notions in Calculus is that of the derivative. The derivative of a function
at a point is its instantaneous rate of change at that point; if we know the derivative of f at every
point x, this gives us a new function $f'(x)$. Finding this function, and understanding what it tells
us about f, is the object of this chapter.
5.1 The definition
We begin by talking about rates of change.
5.1.1 Rate of change: the idea
Suppose first that we have a linear function like y = 3x + 2. Then the rate of change of y with
respect to x is 3: for each unit of increase of x, we get a 3 unit increase of y, and so forth.
Now consider what happens if we have a nonlinear function, like $y = x^2$. When x = 1, increasing
x by ∆x = 1 unit increases y from 1 to 4, so the change in y is ∆y = 3; when x = 2, increasing x
by ∆x = 1 unit increases y from 4 to 9, so ∆y = 5. Point: the rate of change $\frac{\Delta y}{\Delta x}$ depends on the
value of x.
Even worse: the fraction $\frac{\Delta y}{\Delta x}$ depends on the value of ∆x. For example, say x = 1. Then
if ∆x = 1 then we saw ∆y = 3 so $\frac{\Delta y}{\Delta x} = 3$; but
if ∆x = 0.5 then y goes from 1 to $(1.5)^2 = 2.25$ so ∆y = 1.25, and $\frac{\Delta y}{\Delta x} = \frac{1.25}{0.5} = 2.5$.
Upshot: we have to be precise about what we mean by “rate of change.”
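To see this dependence concretely, here is a small sketch that tabulates the ratio ∆y/∆x for y = x² at a fixed starting point and several step sizes. The helper name `delta_ratio` is a hypothetical choice for this illustration.

```python
def delta_ratio(f, x, dx):
    # Change in y divided by change in x when x increases by dx.
    return (f(x + dx) - f(x)) / dx

def square(x):
    return x * x

# Same function, different starting points: the ratio depends on x.
print(delta_ratio(square, 1.0, 1.0), delta_ratio(square, 2.0, 1.0))

# Same starting point x = 1, different step sizes: the ratio depends on dx.
for dx in [1.0, 0.5, 0.1, 0.01]:
    print(dx, delta_ratio(square, 1.0, dx))
```

Watching how the printed ratios behave as the step shrinks is exactly what motivates taking a limit later in this section.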
MAT 1330 : Fall 2020 5.1. THE DEFINITION
5.1.2 Average rate of change (over an interval)
What we did above was to say: suppose we start at a particular value x = a and then change to a
new point x = b. We want to know how this changes the y-value, from y = f (a) to y = f (b).
The line joining (a, f (a)) and (b, f (b)) is called a secant line of the curve y = f (x). This image is
taken from https://www.shmoop.com/derivatives/slope-function.html, with thanks.
The rate of change is often denoted $\frac{\Delta y}{\Delta x}$, where this notation means: the change in y-value divided by the
change in x-value. You can think of it as the slope of the secant line:
\[ \frac{\Delta y}{\Delta x} = \frac{f(b) - f(a)}{b - a}. \]
This leads to the following definition.
Definition 5.1.1. The average rate of change of a function f over an interval [a, b] in its domain
is the rise over the run:
\[ f_{av} = \frac{f(b) - f(a)}{b - a} \]
which is the slope of a secant line of the curve.
The average rate of change tells us something about the function. For example:
if f(t) represents the reading on your odometer at time t, then $f_{av}$ is your average speed from
time t = a to time t = b;
if f(t) represents the population of an organism at time t, then $f_{av}$ is the average net growth
rate from time t = a to time t = b;
if f(x) is the amount in mg of a chemical (or drug) absorbed by the lungs when the amount
in each breath is x (which varies as x varies! 1), then $f_{av}$ is the average marginal rate of
absorption of the drug as the concentration rises from x = a to x = b.
This information is crude, however: in the case of your odometer, it doesn’t tell you if you drove
within the speed limit during that interval; in the case of chemical absorption, it only gives a kind
of rule of thumb (eg: “when x increases from a to b, you are probably absorbing half of the extra
drug with each breath”). This is not precise enough to do science (or avoid a speeding ticket).
1
This kind of variation is called functional response. In Chemistry, you might call it Michaelis-Menten or Monod
reaction kinetics; it’s the effect that your lungs (or any absorbing substance) reach saturation and can’t absorb more.
See also Absorption Functions in your textbook for varied examples.
Example 5.1.2. Find the average rate of change of $f(x) = e^x$ on the interval [0, 1] and on the
interval [1, 2].
Solution: On the interval [0, 1], the average rate of change of f is
\[ \frac{f(1) - f(0)}{1 - 0} = e^1 - e^0 = e - 1 \approx 1.718, \]
whereas on the interval [1, 2], the average rate of change of f is
\[ \frac{f(2) - f(1)}{2 - 1} = e^2 - e^1 = e(e - 1) \approx 4.671. \]
This says that the graph of f is rising much more steeply from x = 1 to x = 2 than from x = 0 to
x = 1.
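Definition 5.1.1 translates directly into a few lines of code. The sketch below, with the hypothetical helper name `average_rate_of_change`, recomputes the two secant slopes of Example 5.1.2.

```python
import math

def average_rate_of_change(f, a, b):
    # Slope of the secant line through (a, f(a)) and (b, f(b)).
    if a == b:
        raise ValueError("the interval [a, b] must have a != b")
    return (f(b) - f(a)) / (b - a)

first = average_rate_of_change(math.exp, 0.0, 1.0)   # equals e - 1
second = average_rate_of_change(math.exp, 1.0, 2.0)  # equals e(e - 1)
print(round(first, 3), round(second, 3))
```

Any function of one variable can be passed in place of `math.exp`, for instance a table lookup of odometer readings.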
5.1.3 Instantaneous rate of change
What we want is the instantaneous rate of change of f at the point x in its domain. In the case
of your odometer, the instantaneous rate of change means the value on your speedometer (your
instantaneous speed); in the case of chemical absorption, the instantaneous rate of change tells you
about the precise sensitivity of your lungs to the uptake of the drug as a function of concentration,
which can let you correctly prescribe an increased dosage that will have the effect you want.
The idea: we know how to find the average rate of change on any interval [a, b]. So now consider
smaller and smaller intervals
[a, b] and [b, a]
for values of b getting closer and closer to a. These correspond to secant lines that are getting
closer and closer to the tangent line of f at a: the line that represents the slope of the curve “at
a”. So we take the limit as b → a.
Definition 5.1.3. Let f be a function defined on an interval around a. Then f is called differentiable
at a if the following limit exists:
\[ \lim_{b\to a} \frac{f(b) - f(a)}{b - a} \]
in which case we denote the limit $f'(a)$, and call it the derivative of f at a. If f is differentiable at
every point in an interval, then this defines a function $f'$ and we say that f is differentiable, and
$f'$ is the derivative (function).
So the derivative at a point a is a number $f'(a)$; we can also say the derivative at a point x is the
number $f'(x)$. If we evaluate the derivative at every point x, then we get a function $f'(x)$.
The names of the variables are not relevant, as long as we are consistent. So if we want a formula
for the derivative at x, we might rename the variables so that we have
\[ f'(x) = \lim_{u\to x} \frac{f(u) - f(x)}{u - x}. \]
It’s often easier to make a new variable h = u − x and then notice that as u → x, we have h → 0.
This gives the equivalent definition:
\[ f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h}. \]
Sometimes we’ll use ∆x for this difference h, because it represents the difference in the x-value; so
we could write
\[ f'(x) = \lim_{\Delta x\to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}. \]
All these equivalent definitions mean (and calculate) the same thing: the slope of the tangent line
to the curve y = f(x) at the point (x, f(x)).
Other notation for the derivative: if y = f(x) we can write
\[ f'(x) = \frac{df}{dx} = \frac{dy}{dx} = y'. \]
Also: when necessary, we call the quotient $\frac{f(x+h) - f(x)}{h}$ or $\frac{f(u) - f(x)}{u - x}$ or $\frac{\Delta y}{\Delta x}$ the difference
quotient.
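The definition suggests a simple numerical experiment: evaluate the difference quotient for smaller and smaller h and watch the values settle. This is a sketch under stated assumptions, not a rigorous method; the names `difference_quotient` and `shrinking_steps` are hypothetical, and very tiny h would eventually run into floating-point round-off.

```python
def difference_quotient(f, x, h):
    # The quotient (f(x + h) - f(x)) / h from the definition of the derivative.
    return (f(x + h) - f(x)) / h

def shrinking_steps(f, x, steps=(1e-1, 1e-2, 1e-3, 1e-4, 1e-5)):
    # Pairs (h, difference quotient) for a decreasing list of step sizes h.
    return [(h, difference_quotient(f, x, h)) for h in steps]

def square(x):
    return x * x

# For f(x) = x^2 at x = 3 the quotient simplifies to 6 + h,
# so the printed values should approach 6.
for h, value in shrinking_steps(square, 3.0):
    print(h, value)
```

The table only estimates the limit; the algebraic simplifications in the next section are what actually compute it.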
5.2 Examples of using the definition
Mathematical definitions are gold: they tell you exactly what you have to do to calculate the
answer.
Example 5.2.1. Consider the function f(x) = mx + b. This is a straight line with slope m; the
derivative should therefore come out to be equal to m. Let’s check:
\[
\begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h\to 0} \frac{m(x+h) + b - (mx + b)}{h} \\
&= \lim_{h\to 0} \frac{mx + mh + b - mx - b}{h} = \lim_{h\to 0} \frac{mh}{h} = \lim_{h\to 0} m = m,
\end{aligned}
\]
so yes, indeed, the derivative of a linear function is its slope.
Notice how we had to algebraically manipulate the difference quotient in order to evaluate the
limit. By construction, the difference quotient gives an indeterminate form of type $\frac{0}{0}$ (that’s kind
of the point); but as we saw in the last chapter, we know several ways of simplifying such a fraction
to cancel the hidden common factor in the numerator and denominator and thus reveal the actual
limit.
Example 5.2.2. Consider a quadratic function like $f(x) = 5x^2$. The slope of this curve varies
with x, so we expect a more interesting answer. Indeed:
\[
\begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h\to 0} \frac{5(x+h)^2 - 5x^2}{h} \\
&= \lim_{h\to 0} \frac{5(x^2 + 2xh + h^2) - 5x^2}{h} = \lim_{h\to 0} \frac{5x^2 + 10xh + 5h^2 - 5x^2}{h} \\
&= \lim_{h\to 0} \frac{10xh + 5h^2}{h} = \lim_{h\to 0} (10x + 5h) = 10x.
\end{aligned}
\]
We look at this formula and judge that it makes sense: as x gets larger positive, y = f (x) gets
steeper (larger slope), and when x = 0, the slope of $y = 5x^2$ is 0, and when x is large negative, then
the slope is a large negative number, too. We could also sketch the graph of y = f (x) carefully and
measure the slope of the tangent line at each point to compare.
The number 5 was almost just a decoration in the preceding calculation. We could do the same
thing for a general abstract quadratic function.
Example 5.2.3. Let $f(x) = ax^2 + bx + c$, where a, b, c are parameters (that is, do not vary with
x). Then we compute
\[
\begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h\to 0} \frac{a(x+h)^2 + b(x+h) + c - \left(ax^2 + bx + c\right)}{h} \\
&= \lim_{h\to 0} \frac{a(x^2 + 2xh + h^2) + bx + bh + c - ax^2 - bx - c}{h} = \lim_{h\to 0} \frac{2axh + ah^2 + bh}{h} \\
&= \lim_{h\to 0} (2ax + ah + b) = 2ax + b.
\end{aligned}
\]
This gives us the general formula
\[ f'(x) = 2ax + b, \]
which says the derivative of any quadratic function is a linear function. Note that it coincides with
the particular case of a = 5, b = c = 0 of the preceding example.
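The formula for a general quadratic can be compared against a raw difference quotient. In this sketch the factory functions `quadratic` and `quadratic_derivative` are hypothetical names; the step h = 1e-6 is an arbitrary small value chosen for illustration.

```python
def quadratic(a, b, c):
    # Returns the function x -> a x^2 + b x + c.
    return lambda x: a * x * x + b * x + c

def quadratic_derivative(a, b):
    # Returns the function x -> 2 a x + b, the formula derived in Example 5.2.3.
    return lambda x: 2 * a * x + b

def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

f = quadratic(5.0, 0.0, 0.0)            # the function 5x^2 of Example 5.2.2
df = quadratic_derivative(5.0, 0.0)     # should agree with 10x

for x in [-2.0, 0.0, 1.0, 3.0]:
    # The two printed numbers should nearly agree at every x.
    print(x, difference_quotient(f, x, 1e-6), df(x))
```

The small remaining gap between the two columns is the term ah from the derivation, plus round-off.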
More complex functions generally take more work to solve for the limit.
Example 5.2.4. Let $f(x) = \sqrt{x}$. Then if x > 0, we have:
\[
\begin{aligned}
f'(x) &= \lim_{h\to 0} \frac{f(x+h) - f(x)}{h} = \lim_{h\to 0} \frac{\sqrt{x+h} - \sqrt{x}}{h} \\
&= \lim_{h\to 0} \frac{\sqrt{x+h} - \sqrt{x}}{h} \cdot \frac{\sqrt{x+h} + \sqrt{x}}{\sqrt{x+h} + \sqrt{x}} = \lim_{h\to 0} \frac{x + h - x}{h(\sqrt{x+h} + \sqrt{x})} \\
&= \lim_{h\to 0} \frac{h}{h(\sqrt{x+h} + \sqrt{x})} = \lim_{h\to 0} \frac{1}{\sqrt{x+h} + \sqrt{x}} = \frac{1}{2\sqrt{x}}.
\end{aligned}
\]
If x = 0, this formula fails, and with good reason.
First: since $\sqrt{x}$ is not defined on both sides of x = 0, we are not allowed to define the derivative of
$\sqrt{x}$ at 0. The rule is: the function must be defined on both sides of the point; we have to take the
two-sided limit.
Secondly: as $x \to 0^+$, the slope of the curve is increasing to ∞, so the instantaneous rate of change
is growing without bound. Although it seems reasonable to do so, no, we don’t say “$f'(0) = \infty$”;
we say “$f'(0)$ does not exist”.
5.3 Five ways not to be differentiable at x
There are several ways that a function could fail to be differentiable. Each one indicates that the
behaviour of the function at that point is in some way unpredictable, which means it will be an
important point to understand — it will be the place where the function acts in an interesting way.
Reason #1: The graph has a corner or a cusp at x. Consider for example
\[
f(x) = |x| =
\begin{cases}
x & \text{if } x \geq 0; \\
-x & \text{if } x < 0.
\end{cases}
\]
The graph of y = |x|. Its slope is −1 if x < 0 and +1 if x > 0, and the two do not agree at x = 0.
Since this is a piecewise defined function, to calculate the derivative at x = 0 we need to compute
the two one-sided limits. As $h \to 0^+$ we have h > 0, so |h| = h and therefore
\[ \lim_{h\to 0^+} \frac{|0 + h| - |0|}{h} = \lim_{h\to 0^+} \frac{h}{h} = 1 \]
whereas as $h \to 0^-$ we have h < 0, so |h| = −h and
\[ \lim_{h\to 0^-} \frac{|0 + h| - |0|}{h} = \lim_{h\to 0^-} \frac{-h}{h} = -1. \]
Since the two one-sided limits disagree, the (two-sided) limit does not exist, so f is not differentiable
at x = 0.
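The disagreement between the two one-sided limits is easy to observe numerically. The helper `one_sided_quotients` below is a hypothetical name for this sketch; it evaluates the difference quotient with a step to the left and a step to the right of the point.

```python
def one_sided_quotients(f, x, h):
    # Difference quotients using a step of size h to the left and to the right of x.
    # Here h is assumed to be a small positive number.
    left = (f(x - h) - f(x)) / (-h)
    right = (f(x + h) - f(x)) / h
    return left, right

for h in [0.1, 0.01, 0.001]:
    # For the absolute value function at 0, the left and right quotients stay apart
    # no matter how small h becomes.
    print(h, one_sided_quotients(abs, 0.0, h))
```

At any point other than 0, the same experiment on |x| gives two values that agree.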
Reason #2: The graph is vertical at x. Consider for example $f(x) = \sqrt[3]{x} = x^{1/3}$, which is
the inverse function to $y = x^3$. Its graph is below.
The graph of $y = x^{1/3}$. Its slope increases to ∞ as you approach 0.
Note that f(x) is still a function (it passes the vertical line test) but at the instant it passes
zero its tangent line is vertical. We can see this from the definition as well:
\[ \lim_{h\to 0} \frac{f(0 + h) - f(0)}{h} = \lim_{h\to 0} \frac{h^{1/3}}{h} = \lim_{h\to 0} \frac{1}{h^{2/3}} = \infty. \]
So $f'(0)$ does not exist.
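A short experiment shows the difference quotient of the cube root at 0 growing without bound from both sides. This is only a sketch: `cube_root` is a hypothetical helper, needed because raising a negative float to the power 1/3 in Python does not return the real cube root.

```python
import math

def cube_root(x):
    # Real cube root, valid for negative inputs as well.
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def quotient_at_zero(h):
    # Difference quotient of the cube root at 0; algebraically this is 1 / h^(2/3).
    return (cube_root(0.0 + h) - cube_root(0.0)) / h

for h in [1e-3, 1e-6, 1e-9]:
    # Both columns should be positive and grow as h shrinks.
    print(h, quotient_at_zero(h), quotient_at_zero(-h))
```

Because the quotients are large and positive on both sides, the secant lines are tilting towards a vertical line.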
Reason #3: The function does not exist on both sides of x. This is part of the requirement
for the derivative, and it reflects the idea that it only makes sense to talk about the instantaneous
rate of change at a point if you can pass through that instant.
For example, consider $f(x) = x^{3/2} = \sqrt{x^3}$, which is only defined for x ≥ 0. Its graph is drawn
below.
The graph of $y = x^{3/2}$. The derivative of this function is not defined at 0, because the function is
not defined on both sides of 0.
So $f'(0)$ does not exist. In this case, however, we might reasonably ask about $\lim_{x\to 0^+} f'(x)$, which
would be answering the (interesting, but not equivalent) question “What is the limit of the slope
of f as x approaches 0?”. You can verify that $f'(x) = \frac{3}{2}\sqrt{x}$, so this limiting slope (as opposed to
instantaneous slope) is 0, which is vaguely reasonable-looking from the graph.
Reason #4: The function is discontinuous at x. For example, consider a function like
\[
\operatorname{sgn}(x) =
\begin{cases}
1 & \text{if } x \geq 0; \\
-1 & \text{if } x < 0.
\end{cases}
\]
This is discontinuous at 0 and we claim that the derivative at 0 does not exist. We look at the
graph, below.
The graph of y = sgn(x) in blue. A secant line of the function going through (0, f (0)) but starting
from the left, is drawn in green.
You might feel it is quite reasonable to say that the derivative of f(x) = sgn(x) is 0 at 0, since at
every point except x = 0, we have $f'(x) = 0$, and therefore
\[ \lim_{x\to 0} f'(x) = 0. \]
But wait: that is a cute fact, and good reasoning, but it is NOT THE DEFINITION OF THE
DERIVATIVE. The derivative of f at 0 is given by
\[ f'(0) = \lim_{h\to 0} \frac{f(h) - f(0)}{h}. \]
But since f(h) = 1 if h ≥ 0 and f(h) = −1 if h < 0, we have
\[ \lim_{h\to 0^+} \frac{f(h) - f(0)}{h} = 0 \quad \text{whereas} \quad \lim_{h\to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h\to 0^-} \frac{-1 - (1)}{h} = \infty. \]
(This latter limit is measuring the slope of the secant lines that are drawn in green in the figure
above.) Since the one-sided limit diverges, $f'(0)$ doesn’t exist.
This is not just a technical oddity: it’s really important. When the curve is discontinuous, we
don’t have a tangent line, so we can’t have a derivative.
This is a general fact, which we can state in two equivalent ways, as follows.
Theorem 5.3.1. Every function f that is differentiable at a point x is automatically continuous
there. Equivalently: if a function f is not continuous at a point x, then it cannot be differentiable
there.
Note. As a rule, if you have a piecewise-defined function like
\[
f(x) =
\begin{cases}
g(x) & \text{if } x \geq a; \\
h(x) & \text{if } x < a
\end{cases}
\]
then all we can write, as a first pass, is
\[
f'(x) =
\begin{cases}
g'(x) & \text{if } x > a; \\
h'(x) & \text{if } x < a.
\end{cases}
\]
That is, we omit the transition point; if you then determine that the function is continuous, you
can see if the derivatives on both sides match as well.a
a
Technically, we are assuming that the derivatives on both sides are also continuous functions here; see upper-year
math courses for examples where this can fail.
Reason #5: The function f is not defined at x. If f is not defined at x, we cannot even
write down the formula for the derivative, because we have no value for f (x). At best, we could be
asking for the limit of the derivative as we approach x (which, as we saw in the previous example,
is just absolutely totally different from asking for the derivative at x itself).
An interesting example might be $f(x) = \frac{1}{x}$, which is undefined at 0.
The graph of y = 1/x in blue. It is undefined at 0 so we cannot draw a secant line through
(0, f (0)); moreover, any average rate of change across an interval containing 0 is meaningless.
Here, f is not defined at 0 so neither is $f'$. We can encode this observation as follows.
Note. The domain of $f'$ can never be bigger than the domain of f, but it can be smaller.
We saw an example where the domain of $f'$ is smaller than the domain of f: remember $f(x) = x^{1/3}$ from Reason #2.
5.4 What f 0 tells you about f
Since we defined the derivative in terms of the tangent line, there is a nice correspondence between
the graph of a function and the properties of its derivative:
$f'(x) > 0$ ⇔ f is increasing
$f'(x) < 0$ ⇔ f is decreasing
$f'(x) = 0$ ⇔ f has a horizontal tangent
So for example, by looking at the graph of a function f, we can sketch the graph of $f'$:
The graph of f(x) in blue, and the inferred graph of $f'$ in red. Where f has a horizontal tangent,
$f'$ is 0; where f is increasing, $f'$ is positive; where f is decreasing, $f'$ is negative.
We can also reverse this process, that is, sketch f from the graph of $f'$; but the answer will not be
unique. The derivative determines the shape of the function, but not where exactly it is.2 Thus
f(x) and f(x) + c for any constant c will have the same derivative.
End of lecture # 8
2
For example, if you know exactly what speed you were driving at every instant of a day, you could figure out
how far you’d travelled in that day — but not where you were!
5.5 Differentiation Rules: The basics
Although the definition of the derivative can be used to compute derivatives, this is quite tedious.
Thankfully, over the years since the discovery of the derivative, people have figured out a number
of simple rules that, taken together, can be used to evaluate the derivative of almost any function that
is given by a formula. The definition of the derivative then only needs to be used if
one is given a function as a graph, or as a table of data;
or when the function is one that is spliced together from others, so that you don’t
have a single formula for f.
We state all the rules here and then give some examples of how they are used. This section concludes
with an explanation of why each of the rules is true.
Theorem 5.5.1. The following rules of derivatives hold:
1. (Power rule) If $f(x) = x^n$ for some $n \in \mathbb{R}$, then $f'(x) = nx^{n-1}$. In particular, the derivative of
the constant function 1 is 0.
2. (Constant multiple rule) If f is differentiable and c is a constant, then g(x) = cf(x) is differentiable
and $g'(x) = cf'(x)$.
Suppose now that f and g are differentiable functions. Then:
3. (Sum/difference rule) $h(x) = f(x) \pm g(x)$ is differentiable and $h'(x) = f'(x) \pm g'(x)$.
4. (Product rule) $h(x) = f(x)g(x)$ is differentiable and $h'(x) = f'(x)g(x) + f(x)g'(x)$.
5. (Quotient rule) $h(x) = \frac{f(x)}{g(x)}$ is differentiable (wherever $g(x) \neq 0$) and
\[ h'(x) = \frac{g(x)f'(x) - f(x)g'(x)}{(g(x))^2}. \]
6. (Chain rule) $h(x) = f(g(x))$ is differentiable and $h'(x) = f'(g(x))g'(x)$.
Note. Please memorize these formulas, in whatever way works for you. I remember the
quotient rule as: “the bottom times the derivative of the top, minus the top times the derivative
of the bottom, all over the bottom squared.” Others write $h(x) = \frac{u}{v}$ and remember “$v\,du - u\,dv$ over
$v^2$”.
Note. For the chain rule: remember that f was evaluated at g(x), so that is where you have to
evaluate $f'$: it’s $f'(g(x))$, NOT $f'(x)$, in the chain rule.
We can apply these rules to infer some of the things we proved directly from the definition. For
example:
If $f(x) = x = x^1$ then $f'(x) = 1x^0 = 1$, by the power rule.
So by the constant multiple rule, if f(x) = mx for some constant m, then $f'(x) = m$.
Finally, by the sum rule, if f(x) = mx + b for some constants m and b, then
$f'(x) = (mx)' + (b)' = m(x)' + b(1)' = m(1) + b(0) = m$.
Example 5.5.2. If $f(x) = x^{47.5}$ then $f'(x) = 47.5x^{46.5}$ by the power rule.
Example 5.5.3. If $f(x) = \sqrt[3]{x}$ then rewrite this as $f(x) = x^{1/3}$. So by the power rule,
\[ f'(x) = \frac{1}{3}x^{\frac{1}{3} - 1} = \frac{1}{3}x^{-2/3} = \frac{1}{3x^{2/3}}. \]
Notice that the domain of $f'(x)$ excludes 0.
Example 5.5.4. If $f(x) = \frac{1}{x^3}$, then rewrite this as $f(x) = x^{-3}$. So by the power rule,
\[ f'(x) = -3x^{-4} = \frac{-3}{x^4}. \]
These rules can be combined to compute derivatives of all rational functions.

Example 5.5.5. If $f(x) = 5x^2 + 4x + 2$ then by the power, sum and constant multiple rules,
\[ f'(x) = 5(2x) + 4(1) + 2(0) = 10x + 4. \]
Example 5.5.6. If $f(x) = (2x + 1)(3x + 4)$ then by the product rule, we have
\[ f'(x) = (2)(3x + 4) + (2x + 1)(3) = 12x + 11. \]
We could also have gotten this answer by multiplying out $f(x) = 6x^2 + 11x + 4$ and applying the
sum and constant multiple rules.
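For polynomials, the power, sum and constant multiple rules combine into a purely mechanical recipe on the list of coefficients. The sketch below assumes a hypothetical representation in which `coefficients[k]` is the coefficient of x^k; the function name is likewise an illustrative choice.

```python
def differentiate_polynomial(coefficients):
    # coefficients[k] is the coefficient of x**k, so [2, 4, 5] means 2 + 4x + 5x^2.
    # Power rule on each term: c * x**k becomes (k * c) * x**(k - 1).
    # The constant term (k = 0) differentiates to 0 and is dropped.
    derivative = [k * c for k, c in enumerate(coefficients)][1:]
    return derivative or [0]

# Example 5.5.5: 5x^2 + 4x + 2, stored lowest degree first.
print(differentiate_polynomial([2, 4, 5]))

# Example 5.5.6 multiplied out: 6x^2 + 11x + 4.
print(differentiate_polynomial([4, 11, 6]))
```

Each output list is again a polynomial in the same representation, so the function can be applied repeatedly to obtain higher derivatives.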
The product rule extends to as many factors as needed. For example,
\[ \frac{d}{dx}\left(f(x)g(x)h(x)\right) = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x). \]
(You can check this by defining j(x) = g(x)h(x) and then using the product rule carefully twice:
once on f(x)j(x) and then, when needed, to find $j'(x)$.)
Example 5.5.7. If
\[ f(x) = \frac{x^2}{x + 4} \]
then by the quotient rule, we have
\[ f'(x) = \frac{(x + 4)(2x) - x^2(1)}{(x + 4)^2} = \frac{x^2 + 8x}{(x + 4)^2}. \]
Using the chain rule requires you to be strongly aware of the composition of functions. For example,
here is a table of some compositions of functions:

inner function g(x)    outer function f(u)    composition F(x) = f(g(x))
$4x + 1$               $\sqrt{u}$             $\sqrt{4x + 1}$
$\sqrt{x}$             $4u + 1$               $4\sqrt{x} + 1$
$1 + x^2$              $1/u$                  $\frac{1}{1 + x^2}$
In each case, we apply the chain rule to find the derivative as $F'(x) = f'(g(x))g'(x)$:
Consider $F(x) = \sqrt{4x + 1}$. This is f(g(x)) where g(x) = 4x + 1 (so $g'(x) = 4$) and
$f(u) = \sqrt{u} = u^{1/2}$ (so $f'(u) = \frac{1}{2}u^{-1/2}$). Therefore $F(x) = f(g(x)) = \sqrt{4x + 1}$ has derivative
\[ F'(x) = \frac{1}{2}(4x + 1)^{-1/2} \cdot 4 = \frac{2}{\sqrt{4x + 1}}. \]
When we want to work it out in stages, we might write:
\[ \frac{d}{dx}\sqrt{4x + 1} = \frac{d}{dx}(4x + 1)^{\frac{1}{2}} = \frac{1}{2}(4x + 1)^{-\frac{1}{2}}\frac{d}{dx}(4x + 1) = \frac{1}{2}(4x + 1)^{-\frac{1}{2}} \cdot 4 = \frac{2}{\sqrt{4x + 1}}. \]
Consider $F(x) = 4\sqrt{x} + 1$. We could think of it as f(g(x)) with $g(x) = \sqrt{x}$ and f(u) = 4u + 1.
Then $g'(x) = \frac{1}{2}x^{-1/2}$ and $f'(u) = 4$, so
\[ F'(x) = 4 \cdot \frac{1}{2}x^{-1/2} = \frac{2}{\sqrt{x}} \]
as we can see by applying the constant multiple rule directly.
Consider $F(x) = \frac{1}{1 + x^2}$. We can think of this as f(g(x)) with $g(x) = 1 + x^2$ and f(u) = 1/u.
Then $g'(x) = 2x$ and $f'(u) = -u^{-2}$, so we have
\[ F'(x) = -(1 + x^2)^{-2} \cdot (2x) = \frac{-2x}{(1 + x^2)^2} \]
as we can check directly with the quotient rule (but this way is faster).
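The three chain rule answers can be spot-checked against a numerical slope. This sketch uses a symmetric (central) difference, which is an assumption of this illustration rather than the definition used in these notes, because it is usually more accurate for a given h; all names are hypothetical.

```python
import math

def central_difference(F, x, h=1e-6):
    # Symmetric estimate of the slope of F at x (an illustrative numerical choice).
    return (F(x + h) - F(x - h)) / (2 * h)

# Each entry: (label, the composition F, the derivative obtained by the chain rule).
cases = [
    ("sqrt(4x+1)", lambda x: math.sqrt(4 * x + 1), lambda x: 2 / math.sqrt(4 * x + 1)),
    ("4 sqrt(x) + 1", lambda x: 4 * math.sqrt(x) + 1, lambda x: 2 / math.sqrt(x)),
    ("1/(1+x^2)", lambda x: 1 / (1 + x * x), lambda x: -2 * x / (1 + x * x) ** 2),
]

for label, F, dF in cases:
    # The numerical slope and the chain rule formula should nearly agree at x = 1.5.
    print(label, central_difference(F, 1.5), dF(1.5))
```

Agreement at one sample point is not a proof, but a mismatch is a quick way to catch an algebra slip.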
Example 5.5.8. If $f(x) = \frac{1}{3x^4 + x}$ then by the quotient rule
\[ f'(x) = \frac{(3x^4 + x)(0) - 1(12x^3 + 1)}{(3x^4 + x)^2} = \frac{-12x^3 - 1}{(3x^4 + x)^2}. \]
Alternately, we could write $f(x) = (3x^4 + x)^{-1}$ and then use the chain rule
\[ f'(x) = -(3x^4 + x)^{-2}(12x^3 + 1) = \frac{-12x^3 - 1}{(3x^4 + x)^2}, \]
which of course comes out the same.
We can apply these rules in surprising ways.
Example 5.5.9. Suppose h(x) is a differentiable function and $h'(x)$ is its derivative. Now suppose
that
\[ f(x) = (h(x))^3. \]
Then by the chain rule
\[ f'(x) = 3(h(x))^2 \cdot h'(x). \]
Similarly, if $g(x) = \frac{h(x)}{x}$, then by the quotient rule,
\[ g'(x) = \frac{xh'(x) - h(x)}{x^2}. \]
Note. It is hugely important to practice these rules! Over the coming sections, we will be adding
the rules for differentiating more functions, and combining functions in better ways. These rules
get easier to use the more you practice with them. Like knowing your multiplication tables by
heart, being able to differentiate easily will make everything we do after this point make better
sense and go more easily.
Exercise 5.5.10. Differentiate each of the following functions using the rules in this section. In
each case, consider if there are multiple ways of writing or interpreting the function, so that you
use different rules, and verify that you always get the same answer.
(a) f (x) = x2 (3x + 2) (d) j(x) = (3x(x + 1))2/3 5x1/5 (x + 1)

(g) m(x) =
√ x+2
x2
(b) g(x) = (e) k(x) = 3x1/3
3x + 2
√ √ p √
(c) h(x) = x2 + 4 (f ) `(x) = 5x 2x (h) n(x) = 2 x
The following subsections on why the rules are true are optional, but you are encouraged to read
them. Appreciating where these rules come from and why they are true is an important part of
making the connection between the definition of the derivative and these great rules — and really
helps with remembering them.
5.5.1 Why the power rule is true
Note. The power rule: for any real number n,
\[ \frac{d}{dx}x^n = nx^{n-1}. \]
We can explain here why the power rule is true for the case that n is a positive integer. (To prove
it holds for all values of n requires using the exponential and logarithm functions.)
Recall the binomial theorem, which says that
\[ (x + h)^n = \sum_{k=0}^{n} \binom{n}{k} x^k h^{n-k} = x^n + nx^{n-1}h + \cdots + nxh^{n-1} + h^n. \]
In particular, each term is divisible by h except the first; and after the second term, each term is
divisible by $h^2$.
Thus when we calculate, for $f(x) = x^n$,
\[ \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h\to 0} \frac{x^n + nx^{n-1}h + \cdots + nxh^{n-1} + h^n - x^n}{h} \]
the $x^n$ terms cancel and we can divide what is left evenly by h, leaving
\[ = \lim_{h\to 0} \left( nx^{n-1} + \frac{n(n-1)}{2}x^{n-2}h + \cdots + nxh^{n-2} + h^{n-1} \right) = nx^{n-1} \]
as we expected.
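To make the general argument concrete, here is the same computation written out for the single case n = 3. This is offered only as a worked instance of the steps above; the choice n = 3 is arbitrary.

```latex
\begin{aligned}
\frac{(x+h)^3 - x^3}{h}
  &= \frac{x^3 + 3x^2h + 3xh^2 + h^3 - x^3}{h} \\
  &= \frac{3x^2h + 3xh^2 + h^3}{h} \\
  &= 3x^2 + 3xh + h^2 \;\longrightarrow\; 3x^2 \quad \text{as } h \to 0 .
\end{aligned}
```

Every term after $3x^2$ still carries a factor of h, which is why only the second term of the binomial expansion survives in the limit.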
5.5.2 Why the derivative is linear
Note. If a and b are constants, and f(x) and g(x) are differentiable functions, then k(x) = af(x) +
bg(x) is differentiable and its derivative is
\[ k'(x) = af'(x) + bg'(x). \]
The constant multiple rule and the sum/difference rule can be summarized as the one rule above,
which is mathematically the statement that “differentiation is a linear operator on the vector space
of functions.” You get the constant multiple rule by taking b = 0 and you get the sum rule by
taking a = 1 = b and the difference rule by taking a = 1, b = −1.
Let’s prove it is true.
\[
\begin{aligned}
k'(x) &= \lim_{h\to 0} \frac{k(x + h) - k(x)}{h} \\
&= \lim_{h\to 0} \frac{af(x + h) + bg(x + h) - (af(x) + bg(x))}{h} \\
&= \lim_{h\to 0} \frac{a(f(x + h) - f(x)) + b(g(x + h) - g(x))}{h} \\
&= \lim_{h\to 0} \left( a\,\frac{f(x + h) - f(x)}{h} + b\,\frac{g(x + h) - g(x)}{h} \right) \\
&= af'(x) + bg'(x)
\end{aligned}
\]
where in that last part we used the Limit Laws (Theorem 4.3.1) to evaluate the parts of the limit
separately.
5.5.3 Why the product rule is true
Note. The product rule says that
\[ \frac{d}{dx}\left(f(x)g(x)\right) = f'(x)g(x) + f(x)g'(x). \]
In particular, the derivative of a product is NOT the product of the derivatives.
The reason for the strange mixed term comes from geometry. Algebraically, the way we have to
look at the difference is as follows:
\[
\begin{aligned}
f(x + h)g(x + h) - f(x)g(x) &= f(x + h)g(x + h) - f(x)g(x + h) + f(x)g(x + h) - f(x)g(x) \\
&= (f(x + h) - f(x))\,g(x + h) + f(x)\,(g(x + h) - g(x)).
\end{aligned}
\]
So now we can see what happens as we take the limit as h → 0:
\[
\begin{aligned}
\lim_{h\to 0} \frac{f(x + h)g(x + h) - f(x)g(x)}{h}
&= \lim_{h\to 0} \left( \frac{f(x + h) - f(x)}{h}\,g(x + h) + f(x)\,\frac{g(x + h) - g(x)}{h} \right) \\
&= \lim_{h\to 0} \left( \frac{f(x + h) - f(x)}{h}\,g(x + h) \right) + \lim_{h\to 0} \left( f(x)\,\frac{g(x + h) - g(x)}{h} \right) \\
&= \left( \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} \right) \left( \lim_{h\to 0} g(x + h) \right) + f(x) \lim_{h\to 0} \frac{g(x + h) - g(x)}{h} \\
&= f'(x)g(x) + f(x)g'(x)
\end{aligned}
\]
where we have used the continuity of g at x to conclude that $\lim_{h\to 0} g(x + h) = g(x)$, and we have
used the definition of the derivative in the other two cases.
5.5.4 Why the chain rule is true
Note. The chain rule says that
\[ \frac{d}{dx}f(g(x)) = f'(g(x))g'(x). \]
Notice that we differentiate each function once, and multiply the results.
Example 5.5.11. f(h) = distance travelled in km as a function of time h measured in hours;
g(t) = t/60 is the function that converts minutes to hours, so f(g(t)) measures distance travelled in
km as a function of time measured in minutes. So if you travel 10 km/h (so $f'(h) = 10$ for all h)
then you are travelling
\[ \frac{d}{dt}f(g(t)) = f'(g(t))g'(t) \]
km per minute. Since $g'(t) = \frac{1}{60}$, this gives 10 km/h × $\frac{1}{60}$ h/min = $\frac{1}{6}$ km/min.
This example was boring because the derivatives were all constants. To see why the rule holds in
general, here is an argument.
So let’s write g(x) = y and for each h, define a new variable k by the formula k = g(x + h) − g(x).
So g(x + h) = y + k for some small value k. Since g is continuous at x, we see that as h → 0, we
also have k → 0. That lets us write
\[
\begin{aligned}
\lim_{h\to 0} \frac{f(g(x + h)) - f(g(x))}{h} &= \lim_{h\to 0} \frac{f(y + k) - f(y)}{h} \\
&= \lim_{h\to 0} \frac{f(y + k) - f(y)}{k} \cdot \frac{k}{h} \\
&= \left( \lim_{k\to 0} \frac{f(y + k) - f(y)}{k} \right) \left( \lim_{h\to 0} \frac{g(x + h) - g(x)}{h} \right) \\
&= f'(y)g'(x) \\
&= f'(g(x))g'(x).
\end{aligned}
\]
(What we didn’t allow for in this formula was the possibility that g is constant, so that k = 0; but
there are other ways to deduce the same formula even in these weird cases.)
5.5.5 Why the quotient rule is true
Note. The quotient rule says that
\[ \frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \frac{g(x)f'(x) - f(x)g'(x)}{g(x)^2}. \]
Since we already know why the product rule and the chain rule are true, we can use those to prove
the quotient rule (which is shorter than using the definition).
First, rewrite $F(x) = \frac{f(x)}{g(x)}$ as $F(x) = f(x)(g(x))^{-1}$. By the product rule
\[ \frac{d}{dx}\left( f(x)(g(x))^{-1} \right) = f'(x)(g(x))^{-1} + f(x)\frac{d}{dx}\left( (g(x))^{-1} \right). \]
By the chain rule,
\[ \frac{d}{dx}\left( (g(x))^{-1} \right) = -(g(x))^{-1-1}g'(x) = -\frac{g'(x)}{g(x)^2}. \]
Therefore, putting this together gives
\[ \frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = f'(x)(g(x))^{-1} + f(x)\left( -\frac{g'(x)}{g(x)^2} \right) \]
which over a common denominator gives
\[ = \frac{f'(x)}{g(x)} - \frac{f(x)g'(x)}{g(x)^2} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2} \]
as required.
5.6 Derivatives of exponential functions
Consider an exponential function
\[ f(x) = a^x, \text{ where } a > 0. \]
To find its derivative, lacking any other ideas, we use the definition:
\[ f'(x) = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h\to 0} \frac{a^{x+h} - a^x}{h} = \lim_{h\to 0} a^x\,\frac{a^h - 1}{h} = a^x \lim_{h\to 0} \frac{a^h - 1}{h} = a^x f'(0). \tag{5.1} \]
We have gone in a bit of a circle here: we wanted the derivative of $f(x) = a^x$ at some random point
x, but instead figured out that the derivative will satisfy
\[ f'(x) = a^x f'(0). \]
Well, at least that is a simpler problem: just figure out the derivative at 0, which should be the
slope of the tangent line to the curve y = f(x) at x = 0.
Graphs of various exponential functions: $y = 2^x$ in black, $y = 5^x$ in red, and $y = e^x$ in green. The
short line segment in blue has slope exactly 1, and is tangent to the graph of $e^x$.
This leads us to one way to define the natural base e (Euler’s number): e is the base of the
exponential function $f(x) = e^x$ for which $f'(0) = 1$. To three decimal places, $e \approx 2.718$. Leonhard
Euler (1707–1783) calculated e to 18 decimal places (!).
So $f(x) = e^x$, the natural exponential, is the one that satisfies, for any x, $f'(x) = e^x$, that is,
Note.
\[ \frac{d}{dx}e^x = e^x \]
That’s an incredible property, for a function to be equal to its own derivative; in fact, the only
functions with this property are those of the form $Ke^x$ for some constant K.
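This characterization of e can be explored numerically: search for the base a at which the slope of a^x at 0 equals 1. The sketch below is one possible illustration, not how e is normally computed. It approximates the slope by a difference quotient with a fixed small h, and uses bisection on the interval [2, 3], relying on the assumption (visible in the figure above) that the slope at 0 increases with the base. All names are hypothetical.

```python
import math

def slope_at_zero(base, h=1e-6):
    # Difference quotient (a^h - 1) / h, an approximation to the slope of a^x at x = 0.
    return (base ** h - 1.0) / h

def find_natural_base(low=2.0, high=3.0, iterations=60):
    # Bisection: keep the half of [low, high] on which the slope crosses 1.
    for _ in range(iterations):
        middle = (low + high) / 2.0
        if slope_at_zero(middle) < 1.0:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0

estimate = find_natural_base()
print(estimate, math.e)
```

The estimate is limited by the fixed step h, so it should only be expected to match e to a few decimal places.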
Example 5.6.1. Find the derivative of $g(x) = x^2e^x$.
Solution: This is a product of two functions, so we use the product rule.
\[ g'(x) = (2x)e^x + x^2(e^x) = 2xe^x + x^2e^x = (2x + x^2)e^x = x(x + 2)e^x. \]
(Tip: this last form is the most useful for finding roots of $g'(x)$, but if we needed to compute $g''(x)$,
leave it in the second-last form for efficiency.)
Example 5.6.2. The normal distribution describes a standard bell curve, and the basic form is
\[ h(x) = e^{-x^2}. \]
The graph of $y = e^{-x^2}$, which describes a standard bell curve in Statistics.
Problem: Find $h'(x)$.
Solution: This is a composition of two functions, so we apply the chain rule. We have h(x) = f(g(x))
where $g(x) = -x^2$ is the innermost function and $f(u) = e^u$ is the outermost function; therefore
\[ h'(x) = f'(g(x))g'(x) = e^{g(x)}g'(x) = e^{-x^2}(-2x) = -2xe^{-x^2}. \]
Example 5.6.3. Consider $F(x) = x^n e^{-x}$, where $n$ is some fixed number; this is related to another
important function in Statistics, called the Gamma Distribution. Then by the product rule and
the chain rule
\[ \frac{d}{dx}F(x) = nx^{n-1}e^{-x} + x^n(-e^{-x}) = (n - x)x^{n-1}e^{-x}. \]
Example 5.6.4. If $f(x) = e^{g(x)}$ then $f'(x) = e^{g(x)}g'(x)$ by the chain rule. Similarly, if $h(x) = g(e^x)$
then $h'(x) = g'(e^x)e^x = e^x g'(e^x)$ by the chain rule.
So we have solved for the derivative of $f(x) = e^x$. What about $f(x) = a^x$, for some other $a > 0$?
We don’t have a formula to differentiate this directly. Instead, we rewrite $a^x$ as an exponential to
base $e$:
\[ y = a^x \iff \ln(y) = \ln(a^x) \iff \ln(y) = x\ln(a) \iff e^{\ln(y)} = e^{x\ln(a)} \iff y = e^{x\ln(a)}. \]
In other words, we have just proven the following exceptionally useful formula.
Theorem 5.6.5. For any $a > 0$,
\[ a^x = e^{x\ln(a)}. \]
So instead of tackling $f(x) = a^x$, we rewrite it as $f(x) = e^{x\ln(a)}$ and apply the chain rule
(remembering that $\ln(a)$ is just a number, because $a$ is some fixed number):
\[ f'(x) = e^{x\ln(a)}\ln(a) = a^x\ln(a), \]
that is,
Note.
\[ \frac{d}{dx}a^x = a^x\ln(a). \]
In fact, we have actually found the mysterious value from the beginning of this section, in (5.1):
the derivative of $a^x$ at $x = 0$! That is, since $f'(0) = \ln(a)$ we have shown
\[ \lim_{h\to 0}\frac{a^h - 1}{h} = \ln(a). \]
Exercise 5.6.6. What a surprising limit; we didn’t do any of our usual tricks to find it. Convince
yourself it is true, at least for $a = 2$: evaluate $\frac{2^h - 1}{h}$ for smaller and smaller values of $h$ and compare
the answer with $\ln(2)$.
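One way to carry out the experiment suggested in Exercise 5.6.6 is with a few lines of Python. This is only a sketch: the function name `difference_quotient` and the particular values of $h$ are choices made for this illustration, not part of the course material.

```python
import math

def difference_quotient(a, h):
    """Return (a**h - 1) / h, the difference quotient of a**x at x = 0."""
    return (a ** h - 1) / h

a = 2.0
# Evaluate the quotient for h = 0.1, 0.01, ..., 0.000001 (illustrative choices).
for k in range(1, 7):
    h = 10.0 ** (-k)
    print(f"h = {h:.0e}    (2^h - 1)/h = {difference_quotient(a, h):.8f}")

# Value the quotients should be approaching, according to the limit above.
print(f"ln(2) = {math.log(a):.8f}")
```

The printed quotients should settle toward the value of ln(2) as $h$ shrinks; taking $h$ extremely small eventually runs into floating-point rounding, so the table stops at a moderate value.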
Example 5.6.7. If $f(x) = 2^{x^3+1}$ then by the above and the chain rule we have
\[ f'(x) = 2^{x^3+1}\ln(2)\,(3x^2) = (3\ln(2)x^2)\,2^{x^3+1}. \]
Alternatively, we’d rewrite $f(x) = e^{\ln(2)(x^3+1)}$ to deduce
\[ f'(x) = e^{\ln(2)(x^3+1)}\cdot 3\ln(2)x^2 \]
which is the same thing, if you look carefully.
5.7 Derivatives of logarithms
In the previous section, we found the derivatives of exponential functions, by using the definition
and discovering the number e for which the definition gives a nice limit. Now we want to differentiate
$f(x) = \ln(x)$ (or more generally $\log_a(x)$ for some fixed constant $a$). We could use the definition,
but now we actually have more tools available, so there is an easier way.
We start with the identity
\[ e^{\ln(x)} = x. \]
The left hand side is a function $F(x) = e^{\ln(x)}$ which is the composition $f(g(x))$ where $f(u) = e^u$
and $g(x) = \ln(x)$. We do not know what $g'(x)$ is, but we reason: the left hand side is a function
equal to the function on the right hand side at every point $x$, and so their graphs are the same and
their derivatives are the same. So this should give us an equation to find $g'(x)$!
Write $g(x) = \ln(x)$ to keep distractions minimal. Then our equation is
\[ e^{g(x)} = x. \]
The derivative of the left hand side is
\[ f'(g(x))g'(x) = e^{g(x)}g'(x) \]
whereas the derivative of the right hand side is 1. Therefore, differentiating both sides gives the
new equation
\[ e^{g(x)}g'(x) = 1 \]
which says that
\[ g'(x) = \frac{1}{e^{g(x)}}. \]
Now $g(x) = \ln(x)$ so $e^{g(x)} = e^{\ln(x)} = x$; thus we conclude:
Note.
\[ \frac{d}{dx}(\ln(x)) = \frac{1}{x} \]
This is an incredible formula! When we differentiate the natural logarithm, the answer doesn’t
have a logarithm, or even an exponential, in it.
Notice that it fills a gap in our differentiation tables: the derivative of $x^n$ is $nx^{n-1}$, so there was
no power function that gave us as derivative the function $x^{-1}$. Now we’ve found it: it’s the natural
logarithm, which you can think of as a very slowly growing function that still goes to $\infty$ as $x \to \infty$.
Example 5.7.1. Find the derivative of $\ln(x^2 + 1)$.
Solution: This is a composition of functions, so we apply the chain rule.
\[ \frac{d}{dx}(\ln(x^2+1)) = \frac{1}{x^2+1}(2x) = \frac{2x}{x^2+1}. \]
Example 5.7.2. Find the derivative of $\sqrt{\ln(x) + 4}$.
Solution: This is a composition of functions so we apply the chain rule:
\[ \frac{d}{dx}\left(\sqrt{\ln(x)+4}\right) = \frac{1}{2}(\ln(x)+4)^{-1/2}\cdot\frac{1}{x} = \frac{1}{2x\sqrt{\ln(x)+4}}. \]
What about other bases?
Let $a > 0$, $a \neq 1$, be a constant. Let’s find the derivative of
\[ f(x) = \log_a(x). \]
We only know the derivative of $g(x) = \ln(x)$, so we need to change base. As with exponentials,
there is a standard method:
\[ y = \log_a(x) \iff a^y = x \iff \ln(a^y) = \ln(x) \iff y\ln(a) = \ln(x), \]
which implies the following theorem.
Theorem 5.7.3. Let $a > 0$, $a \neq 1$. Then
\[ \log_a(x) = \frac{\ln(x)}{\ln(a)}. \]
Therefore
\[ \frac{d}{dx}(\log_a(x)) = \frac{d}{dx}\left(\frac{\ln(x)}{\ln(a)}\right) = \frac{1}{\ln(a)}\cdot\frac{1}{x} = \frac{1}{x\ln(a)}, \]
that is
Note.
\[ \frac{d}{dx}(\log_a(x)) = \frac{1}{x\ln(a)}, \]
where we have remembered that $a$ is just a constant, so $\ln(a)$ is just a number.
Example 5.7.4. If $f(x) = \log_2(e^x + x)$ then by the above and the chain rule we have
\[ f'(x) = \frac{1}{\ln(2)(e^x + x)}\,(e^x + 1). \]
Alternatively, we rewrite $f(x) = \ln(e^x + x)/\ln(2)$ and then
\[ f'(x) = \frac{1}{\ln(2)}\cdot\frac{1}{e^x + x}\cdot(e^x + 1) \]
which is the same.
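A formula like the one in Example 5.7.4 can be sanity-checked numerically by comparing it with a difference quotient. The sketch below assumes a symmetric (central) difference with step $h = 10^{-6}$; that choice, and the helper name `central_difference`, are illustrative assumptions rather than anything prescribed in these notes.

```python
import math

def f(x):
    # f(x) = log_2(e^x + x), the function from Example 5.7.4.
    return math.log2(math.exp(x) + x)

def f_prime(x):
    # The derivative claimed in Example 5.7.4.
    return (math.exp(x) + 1) / (math.log(2) * (math.exp(x) + x))

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient: an approximation to func'(x).
    return (func(x + h) - func(x - h)) / (2 * h)

# Compare at a few points where e^x + x > 0 (so f is defined).
for x in (0.0, 1.0, 2.5):
    print(f"x = {x}: formula = {f_prime(x):.8f}, numerical = {central_difference(f, x):.8f}")
```

Agreement to several decimal places is evidence, not proof, that the differentiation was done correctly.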
5.8 Derivatives of functions like $f(x)^{g(x)}$
We have seen that $\frac{d}{dx}x^n = nx^{n-1}$, if $n$ is a constant. We have also seen that $\frac{d}{dx}a^x = a^x\ln(a)$, if $a$
is a positive constant. So what is
\[ \frac{d}{dx}x^x\,? \]
Would it be $x(x^{x-1})$ (“power rule”) or $x^x\ln(x)$ (“exponential rule”)? Answer: NEITHER. You may
only apply a rule under the hypotheses in which it was derived, and in this case, both are wrong.
Instead, we go back and recall how we solved for the derivative of $a^x$: we converted it to base $e$.
That process will work here as well:
\[ y = x^x \iff \ln(y) = \ln(x^x) \iff \ln(y) = x\ln(x) \iff y = e^{x\ln(x)}. \]
Fabulous! This is now a function that we can differentiate, using the exponential, chain and product rules:
\[ \frac{d}{dx}x^x = \frac{d}{dx}e^{x\ln(x)} = e^{x\ln(x)}\frac{d}{dx}(x\ln(x)) = e^{x\ln(x)}\left(1\cdot\ln(x) + x\cdot\frac{1}{x}\right) = x^x(\ln(x) + 1). \]
(We used that $e^{x\ln(x)} = x^x$ to simplify the expression in the last step.)
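To see concretely that neither the “power rule” nor the “exponential rule” gives the right answer for $x^x$, one can compare all three candidates against a numerical slope. This is a sketch only: the function names and the sample point $x = 2$ are assumptions made for the illustration.

```python
import math

def x_to_the_x(x):
    # x**x rewritten as e**(x ln x), valid for x > 0.
    return math.exp(x * math.log(x))

def derivative_formula(x):
    # The result derived above: x^x (ln(x) + 1).
    return x_to_the_x(x) * (math.log(x) + 1)

def wrong_power_rule(x):
    # Misapplied power rule: x * x^(x-1).
    return x * x ** (x - 1)

def wrong_exponential_rule(x):
    # Misapplied exponential rule: x^x * ln(x).
    return x ** x * math.log(x)

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x).
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 2.0
print("numerical slope        :", central_difference(x_to_the_x, x0))
print("x^x (ln x + 1)         :", derivative_formula(x0))
print("'power rule' guess     :", wrong_power_rule(x0))
print("'exponential rule' guess:", wrong_exponential_rule(x0))
```

Only the second line should agree with the numerical slope; the two guesses each capture just one of the two terms in the correct derivative.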
More generally, we can remember the following formula.
Proposition 5.8.1. $f(x)^{g(x)} = e^{g(x)\ln(f(x))}$
It comes from the identity $a^b = e^{b\ln(a)}$, for any $a > 0$ and any $b$.
Example 5.8.2. Find the derivative of $h(x) = (x^2 + 1)^{\ln(x)}$.
Solution: We use the identity of Proposition 5.8.1 with $a = f(x) = x^2 + 1$ and $b = g(x) = \ln(x)$ to rewrite $h$ as
\[ (x^2+1)^{\ln(x)} = e^{\ln(x)\ln(x^2+1)}. \]
(Since we want base $e$, it isn’t useful to simplify this to $x^{\ln(x^2+1)}$; notice what a crazy function this
is, with so many equivalent forms.)
Now we use the chain rule:
\[ h'(x) = e^{\ln(x)\ln(x^2+1)}\left(\frac{1}{x}\ln(x^2+1) + \ln(x)\frac{1}{x^2+1}(2x)\right) = (x^2+1)^{\ln(x)}\left(\frac{\ln(x^2+1)}{x} + \frac{2x\ln(x)}{x^2+1}\right). \]
Note. Therefore, we have a general rule:
\[ \frac{d}{dx}f(x)^{g(x)} = f(x)^{g(x)}\left(g'(x)\ln(f(x)) + \frac{g(x)f'(x)}{f(x)}\right) \]
but this is too ridiculous to memorize; instead we remember the technique of Proposition 5.8.1.
Exercise 5.8.3. You can use this method to go back and prove the power rule for any power $n \in \mathbb{R}$,
not just positive integers, by rewriting $x^n = e^{n\ln(x)}$ (for $x > 0$) and simplifying your answer. So the power rule
is a consequence of the derivatives of exponentials!
End of lecture # 9
5.9 Implicit differentiation
A function $y = f(x)$ is a particular kind of relation between the variables $x$ and $y$: one whose
graph passes the vertical line test. If our variables satisfy a relation like
\[ x^2 + y^2 = 9 \]
then the corresponding graph is not the graph of a function, and does not pass the vertical line test. However,
we know we can decompose this graph into pieces, such that each piece is the graph of a function; in this case,
the graph is the union of the graphs of
\[ y = \sqrt{9 - x^2} \quad\text{and}\quad y = -\sqrt{9 - x^2}. \]
Now here’s the clever idea: if we want to find the slope of the tangent line to the circle at a certain
point, do we really need to solve for $y$ in terms of $x$? After all, if we know that $y$ is a function of $x$
near each point, then we can differentiate $y$ with respect to $x$. For example,
\[ \frac{d}{dx}(y^2) = 2y\frac{dy}{dx}. \]
What this implies is that sometimes we can solve for $y'$ without first having to solve for $y$. (This is
in fact the trick we used when finding the derivative of $y = \ln(x)$; now we’ll state it more generally.)
For example, if $x^2 + y^2 = 9$ then near any point we can think of both sides as being functions of $x$;
since they’re equal, their derivatives are equal. So we have
\[ 2x + 2yy' = 0 \]
and we can solve for $y'$, to get
\[ y' = -\frac{x}{y}. \]
We check that at various points (x, y) on the circle, this formula does indeed give the correct slope
of the tangent line.
Remark 5.9.1. Notice that our answer for the derivative in this case contains both x and y! That’s
because unlike a function, where x is enough to determine the point on the graph, here we’ll need
both coordinates because there might be more than one point with that x-coordinate.
Example 5.9.2. Find the equation of the tangent line to the curve $x^2 + y^2 = 9$ at the point
$(1, -\sqrt{8})$.
Solution: this is indeed a point on the curve, and by the preceding, the slope of the tangent line at
that point is
\[ m = y' = \frac{-x}{y} = \frac{-1}{-\sqrt{8}} = \frac{1}{\sqrt{8}} = \frac{\sqrt{2}}{4}. \]
A line is $y = mx + b$; given the point $(x, y) = (1, -\sqrt{8})$ and the slope $m = \frac{\sqrt{2}}{4}$ we solve to get
\[ b = y - mx = -\sqrt{8} - \frac{\sqrt{2}}{4} = -2\sqrt{2} - \frac{1}{4}\sqrt{2} = -\frac{9}{4}\sqrt{2}. \]
Thus the equation of the tangent line is $y = \frac{\sqrt{2}}{4}x - \frac{9}{4}\sqrt{2}$, which you can judge to be about right
(using a calculator to find out what these values are like).
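The “judge with a calculator” step in Example 5.9.2 can be sketched in Python. The idea is to compare the implicit-differentiation slope $-x/y$ with a numerical slope of the explicit lower semicircle $y = -\sqrt{9 - x^2}$, which is the piece of the circle containing the point. The helper names and the step size are assumptions of this illustration.

```python
import math

def lower_semicircle(x):
    # Explicit formula for the piece of x^2 + y^2 = 9 that lies below the x-axis.
    return -math.sqrt(9 - x * x)

def implicit_slope(x, y):
    # Slope from implicit differentiation: y' = -x / y.
    return -x / y

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x).
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.0
y0 = lower_semicircle(x0)      # the point (1, -sqrt(8))
m = implicit_slope(x0, y0)     # slope of the tangent line
b = y0 - m * x0                # intercept, from y = m x + b

print("slope from -x/y       :", m)
print("numerical slope       :", central_difference(lower_semicircle, x0))
print("sqrt(2)/4             :", math.sqrt(2) / 4)
print("intercept b           :", b)
print("-(9/4) sqrt(2)        :", -9 * math.sqrt(2) / 4)
```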
So:
• At a theoretical level, implicit differentiation is saying that if $y$ is a differentiable function of
  $x$, then the derivative exists even if we can’t actually find a formula for $y$ in terms of $x$. As
  a consequence of the chain rule, you can therefore differentiate the relation itself to deduce a
  formula for $y'$.
• At a mechanical level, implicit differentiation is saying that if you have an equation with
  variables which depend on $x$, then you can differentiate both sides with respect to $x$ using
  the chain rule, remembering that
Note.
\[ \frac{dx}{dx} = 1 \quad\text{but}\quad \frac{dy}{dx} = y'. \]
Example 5.9.3. Find the derivative of $y = (x^2 + 1)^{\ln(x)}$.
We could rewrite this as $y = e^{\ln(x)\ln(x^2+1)}$ but here is an equivalent way; check that it gives you
the same answer.
Apply ln to both sides to get
\[ \ln(y) = \ln(x)\ln(x^2 + 1) \]
and now differentiate with respect to $x$:
\[ \frac{1}{y}y' = \frac{1}{x}\ln(x^2+1) + \ln(x)\frac{1}{x^2+1}(2x) = \frac{\ln(x^2+1)}{x} + \frac{2x\ln(x)}{x^2+1} \]
whence, upon multiplying the far left and the far right sides by $y$ we get
\[ y' = (x^2+1)^{\ln(x)}\left(\frac{\ln(x^2+1)}{x} + \frac{2x\ln(x)}{x^2+1}\right) \]
which is the same as we’d get by the other method, of course.
Applying the logarithm to a complicated equation $y = f(x)$ to make it $\ln(y) = \ln(f(x))$ (and
then using the laws of logarithms to simplify the $\ln(f(x))$ term), and then differentiating, is called
logarithmic differentiation. It is a good approach to use when $f(x)$ is deeply ugly and unwieldy.
Exercise 5.9.4. Find the derivative of $y = (\ln(x^2 + 1))^{x^2}$ using two methods: (1) by rewriting
the function as $y = e^{x^2\ln(\ln(x^2+1))}$; and (2) by writing $\ln(y) = x^2\ln(\ln(x^2 + 1))$ and differentiating
implicitly.
5.9.1 More examples
Example 5.9.5. Find the equation of the tangent line to the astroid
\[ x^{2/3} + y^{2/3} = 5 \]
at the point $(1, 8)$.
Solution: we verify that $(1, 8)$ is indeed a point on the curve, since $1^{2/3} + 8^{2/3} = 1 + 4 = 5$. We differentiate both sides with
respect to $x$:
\[ \frac{2}{3}x^{-1/3} + \frac{2}{3}y^{-1/3}y' = 0 \]
whence
\[ y' = -\frac{x^{-1/3}}{y^{-1/3}} = \left(\frac{-y}{x}\right)^{1/3}. \]
At the point $(1, 8)$, we get $y' = (-8/1)^{1/3} = -2$. Therefore the equation of the tangent line is
\[ y - 8 = -2(x - 1) \quad\text{or}\quad y = -2x + 10. \]
Using software to sketch the graph of this curve, we see this answer looks about right.
The graph of $x^{2/3} + y^{2/3} = 5$. This shape is called an astroid and is traced out by a point on a small circle
rolling along the inside of a large circle: http://mathworld.wolfram.com/Astroid.html.
Example 5.9.6. The bifolium has equation
\[ (x^2 + y^2)^2 = 4xy^2. \]
Three points on the graph are $(0, 0)$, $(1, 1)$ and $(3/4, \sqrt{3}/4)$. Find the slope of the tangent
line, when defined.
Solution: We differentiate both sides with respect to $x$ to get
\[ 2(x^2 + y^2)(2x + 2yy') = 4y^2 + 4x(2yy'). \]
Now we isolate and solve for $y'$:
\[ 4x(x^2 + y^2) + 4y(x^2 + y^2)y' = 4y^2 + 8xyy' \]
whence
\[ (4y(x^2 + y^2) - 8xy)y' = 4y^2 - 4x(x^2 + y^2), \]
or
\[ y' = \frac{y^2 - x(x^2 + y^2)}{y(x^2 + y^2) - 2xy}. \]
At $(0, 0)$, this is $0/0$ so undefined. On the graph we see that there’s a huge mess at the origin; of
course there’s no tangent line.
At $(1, 1)$ this is $-1/0$ so again undefined; but on the graph we see that in fact there’s a vertical
tangent line at this point. (So it’s not a function of $x$ there; rather, $x$ is a function of $y$.)
At $(3/4, \sqrt{3}/4)$, we have
\[ y' = \frac{3/16 - (3/4)(3/4)}{(\sqrt{3}/4)(3/4) - (3\sqrt{3}/8)} = \frac{2\sqrt{3}}{3} \]
which looks reasonable from the graph.
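The three evaluations in Example 5.9.6 can be reproduced with a short script. This sketch simply codes the slope formula derived above and reports `None` when the denominator vanishes; the function names are hypothetical, and the exact comparison with zero is only adequate for the sample points used here.

```python
import math

def on_bifolium(x, y, tol=1e-12):
    # Check that (x, y) satisfies (x^2 + y^2)^2 = 4 x y^2 up to rounding.
    return abs((x * x + y * y) ** 2 - 4 * x * y * y) < tol

def bifolium_slope(x, y):
    # y' = (y^2 - x(x^2 + y^2)) / (y(x^2 + y^2) - 2xy), or None if undefined.
    numerator = y * y - x * (x * x + y * y)
    denominator = y * (x * x + y * y) - 2 * x * y
    if denominator == 0:
        return None
    return numerator / denominator

points = [(0.0, 0.0), (1.0, 1.0), (0.75, math.sqrt(3) / 4)]
for x, y in points:
    print(f"point ({x:.4f}, {y:.4f}): on curve = {on_bifolium(x, y)}, slope = {bifolium_slope(x, y)}")

print("2*sqrt(3)/3 =", 2 * math.sqrt(3) / 3)
```

A result of `None` does not distinguish between “no tangent line” and “vertical tangent line”; as in the example, that distinction has to come from looking at the graph.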
You can even find the second derivative this way.
Example 5.9.7. Consider $x^4 = x^2 - y^2$. The set of all points $(x, y)$ satisfying this equation
is a lemniscate (pictured at right). We want to find $y'$ and $y''$ at the point $\left(-\frac{1}{2}, -\frac{\sqrt{3}}{4}\right)$.
We differentiate once, to get:
\[ 4x^3 = 2x - 2yy' \]
or $y' = \frac{1}{y}x(1 - 2x^2)$, after simplifying. At $\left(-\frac{1}{2}, -\frac{\sqrt{3}}{4}\right)$, this is $\frac{1}{\sqrt{3}}$.
Now differentiate the relation $4x^3 = 2x - 2yy'$, noting that $\frac{d}{dx}(yy') = y'y' + yy''$ by the
product rule, to get:
\[ 12x^2 = 2 - 2y'y' - 2yy'' \]
or $y'' = (1 - (y')^2 - 6x^2)/y$ after simplifying. Plugging in the point $(x, y) = \left(-\frac{1}{2}, -\frac{\sqrt{3}}{4}\right)$ and the first
derivative $y' = \frac{1}{\sqrt{3}}$ at this point yields
\[ y'' = \frac{1 - \frac{1}{3} - \frac{3}{2}}{-\frac{\sqrt{3}}{4}} = \frac{10}{3\sqrt{3}}. \]
We compare with the graph, and agree that the slope is positive and around 0.6 at that point; we
agree that the curve is concave up (see Section 6.2).
5.10 Derivatives of sine and cosine
We go back to the definition to understand the derivative of $f(x) = \sin(x)$:
\begin{align*}
f'(x) &= \lim_{h\to 0}\frac{\sin(x + h) - \sin(x)}{h} \\
&= \lim_{h\to 0}\frac{\sin(x)\cos(h) + \sin(h)\cos(x) - \sin(x)}{h} \\
&= \lim_{h\to 0}\left(\sin(x)\frac{\cos(h) - 1}{h} + \cos(x)\frac{\sin(h)}{h}\right) \\
&= \sin(x)\lim_{h\to 0}\frac{\cos(h) - 1}{h} + \cos(x)\lim_{h\to 0}\frac{\sin(h)}{h}.
\end{align*}
So it all comes down to understanding these two limits, which are exactly the derivatives of
$\cos(x)$ and of $\sin(x)$ at $x = 0$.
In fact, we have (see footnote 3):
[Footnote 3] That $\sin'(0) = 1$ is ONLY TRUE when we measure our angle in RADIANS. If you change the units with which
you measure the x-axis, such as by using degrees, the value of the slope will change (in this case, to the useless and
annoying value $\pi/180$). Use RADIANS for Calculus.
Note.
\[ \cos'(0) = \lim_{h\to 0}\frac{\cos(h) - 1}{h} = 0 \quad\text{and}\quad \sin'(0) = \lim_{h\to 0}\frac{\sin(h)}{h} = 1 \]
(see below). Thus $\sin'(x) = \cos(x)$. A similar process with the definition of the derivative of cosine
comes down to the same two limits, and after some work we conclude that
Note.
\[ \frac{d}{dx}\sin(x) = \cos(x) \quad\text{and}\quad \frac{d}{dx}\cos(x) = -\sin(x). \]
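The two limits, and the remark about radians versus degrees, can be explored numerically. The following is a rough sketch; the function names are made up for this illustration, and the degree version simply converts its input to radians before calling `math.sin`, which is how one would model “sine of an angle measured in degrees”.

```python
import math

def sin_quotient(h):
    # sin(h)/h with h in radians.
    return math.sin(h) / h

def cos_quotient(h):
    # (cos(h) - 1)/h with h in radians.
    return (math.cos(h) - 1) / h

def sin_quotient_degrees(h):
    # sin(h)/h where h is an angle measured in degrees.
    return math.sin(math.radians(h)) / h

for k in range(1, 5):
    h = 10.0 ** (-k)
    print(f"h = {h:.0e}: sin(h)/h = {sin_quotient(h):.8f}, "
          f"(cos(h)-1)/h = {cos_quotient(h):.8f}, "
          f"degrees: {sin_quotient_degrees(h):.8f}")

print("pi/180 =", math.pi / 180)
```

The first column should approach 1, the second 0, and the third the value $\pi/180$ mentioned in the footnote.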
5.10.1 Geometric arguments
Here are some good arguments, using geometry, to explain why the derivative of sin(x) at 0 is 1
and the derivative of cos(x) at 0 is 0.
Why $\lim_{h\to 0}\frac{\sin(h)}{h} = 1$: We can make a geometric argument.
Why $\lim_{h\to 0}\frac{\cos(h) - 1}{h} = 0$: One way to know this: we know that the graph of $y = \cos(x)$ at $x = 0$
has a horizontal tangent, so we expect the derivative to be 0 there.
Or:
\[ \cos'(0) = \lim_{h\to 0}\frac{\cos(0 + h) - \cos(0)}{h} = \lim_{h\to 0}\frac{\cos(h) - 1}{h}. \]
We know that
\[ \sin^2(x) + \cos^2(x) = 1. \]
(This is more correctly written as $(\sin(x))^2 + (\cos(x))^2 = 1$.) Therefore we can differentiate both
sides to give
\[ 2(\sin(x))\sin'(x) + 2\cos(x)\cos'(x) = 0. \]
Now at this moment we don’t know the derivative of $\sin(x)$ everywhere (that depends on knowing
this limit!), but when $x = 0$, the fact that $\sin(0) = 0$ and $\cos(0) = 1$ is enough to tell us that
\[ 0 + 2\cos'(0) = 0 \iff \cos'(0) = 0. \]
Hence the limit of the difference quotient is 0.
A deeper connection with exponentials. As an aside: the same mathematician Euler who
discovered and calculated $e$ also figured out why the derivatives of exponential and trig functions
show the same dependency on their values at 0. If we let $i = \sqrt{-1}$ denote a complex number whose
square is $-1$, and flesh out what this should mean in terms of functions, we get the identity:
\[ e^{ix} = \cos(x) + i\sin(x). \]
This formula is only valid if x is measured in radians. So sine and cosine functions are essentially
special cases of exponential functions — if you are willing to work with complex numbers.
That said, although complex numbers are the only way to discuss electricity and magnetism, for
example, their main application in the life sciences is through linear algebra rather than Calculus
(as we’ll see later in MAT1332). So we won’t be pursuing this thought further here.
5.11 Derivatives of other trigonometric functions
The other trigonometric functions are
Note.
\[ \tan(x) = \frac{\sin(x)}{\cos(x)}, \quad \cot(x) = \frac{\cos(x)}{\sin(x)}, \quad \csc(x) = \frac{1}{\sin(x)}, \quad \sec(x) = \frac{1}{\cos(x)}. \]
Therefore we can just apply the quotient rule to deduce their derivatives from those of $\sin(x)$ and
$\cos(x)$.
Example 5.11.1. Find $\frac{d}{dx}\tan(x)$.
Solution: we apply the quotient rule
\[ \frac{d}{dx}\tan(x) = \frac{d}{dx}\frac{\sin(x)}{\cos(x)} = \frac{\cos(x)\cos(x) - \sin(x)(-\sin(x))}{\cos^2(x)} = \frac{1}{\cos^2(x)} = \sec^2(x). \]
In this example, we applied the key trigonometric identity:
Note.
\[ \sin^2(x) + \cos^2(x) = 1 \]
Example 5.11.2. Find $\frac{d}{dx}\csc(x)$.
Solution: we apply the chain rule to $\csc(x) = (\sin(x))^{-1}$. This gives
\[ \frac{d}{dx}\csc(x) = -(\sin(x))^{-2}\cos(x) = \frac{-\cos(x)}{\sin^2(x)} = -\frac{\cos(x)}{\sin(x)}\cdot\frac{1}{\sin(x)} = -\cot(x)\csc(x). \]
Exercise 5.11.3. Use the quotient rule and standard identities to find the derivatives of sec(x)
and of cot(x).
It is very useful to memorize the derivatives of the six standard trigonometric functions:
Note.
\[ \frac{d}{dx}\sin(x) = \cos(x), \qquad \frac{d}{dx}\cos(x) = -\sin(x) \]
\[ \frac{d}{dx}\tan(x) = \sec^2(x), \qquad \frac{d}{dx}\cot(x) = -\csc^2(x) \]
\[ \frac{d}{dx}\sec(x) = \sec(x)\tan(x), \qquad \frac{d}{dx}\csc(x) = -\csc(x)\cot(x) \]
Example 5.11.4. If $f(x) = \sec(x^2 + 1)$, then
\[ f'(x) = \sec(x^2 + 1)\tan(x^2 + 1)(2x) = 2x\sec(x^2 + 1)\tan(x^2 + 1) \]
by the chain rule. Notice that $\sec(x)\tan(x)$ does NOT occur in this expression.
5.12 Inverse trigonometric functions
Inverse trigonometric functions have a special place in Calculus, because their derivatives are such
astonishingly normal-looking functions. This means that inverse trigonometric functions sometimes
pop up when you need to find anti-derivatives (see “integrals”, later in this course) even when there
are no trigonometric functions in sight! (This is a bit how the logarithm ln(x) shows up as an
anti-derivative of 1/x, a rational function.)
We also need inverse trigonometric functions whenever we want to solve an equation like cos(x) = 0.3
or tan(x) = 17.
5.12.1 The inverse sine function, arcsin(x) or $\sin^{-1}(x)$
Both notations are acceptable but you must recall that $\sin^{-1}(x)$ means the inverse function of sine,
NOT csc(x), DESPITE the suggestive −1. The “−1” is intended to evoke “inverse function” NOT
reciprocal. We write arcsin(x) for the inverse sine function in this course (and in the homework
software Mobius).
So we sketch the graph of y = sin(x); this is not one-to-one; therefore, as we did for $y = x^2$, we
have to agree on a portion of the domain of y = sin(x) to which we can restrict the function. We
have universally agreed on [−π/2, π/2].
The graph of y = sin(x) on left, with the portion over [−π/2, π/2] on which it is one-to-one
highlighted in green, together with the graph of y = arcsin(x), which is the inverse of sin
restricted to [−π/2, π/2], and thus has domain [−1, 1]. Note the scales on the axes.
So we conclude that:
Note.
sine is one-to-one on domain [−π/2, π/2] with image [−1, 1]; so
arcsine is defined on domain [−1, 1] with image [−π/2, π/2].
They are related by:
y = sin(x) ⇔ x = arcsin(y) for all x ∈ [−π/2, π/2]
Example 5.12.1. Find all x such that sin(x) = 0.
Solution: exactly one solution is given by x = arcsin(0) = 0. To find all the others, we look at the
graph of y = sin(x) and see that kπ, for any integer k, is also a solution.
Example 5.12.2. Find all x such that $\sin(x) = \frac{1}{2}$.
Solution: exactly one solution is given by $x = \arcsin(\frac{1}{2}) = \pi/6$. To find all others, we look at the
graph of y = sin(x), or we use the identities:
sin(x + 2πk) = sin(x) for any integer k
and
sin(x) = sin(π − x).
We see that these account for all the possible solutions, so our final answer is:
x = π/6 + 2πk, or x = (π − π/6) + 2πk = 5π/6 + 2πk
for any integer k.
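As a quick check of Example 5.12.2, the two families of solutions can be generated for a handful of integers $k$ and substituted back into the equation. This is an illustrative sketch; the function name and the range of $k$ are arbitrary choices.

```python
import math

def solutions_sin_half(ks):
    # Both families from Example 5.12.2: pi/6 + 2*pi*k and 5*pi/6 + 2*pi*k.
    result = []
    for k in ks:
        result.append(math.pi / 6 + 2 * math.pi * k)
        result.append(5 * math.pi / 6 + 2 * math.pi * k)
    return sorted(result)

for x in solutions_sin_half(range(-2, 3)):
    print(f"x = {x:9.5f}   sin(x) = {math.sin(x):.12f}")
```

Every printed value of sin(x) should equal 1/2 up to rounding. Note that this confirms the listed values are solutions; that there are no others comes from the graph, as argued in the example.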
The derivative of arcsin(x)
So
\[ y = \arcsin(x) \iff \sin(y) = x \ \text{ and } -\pi/2 \le y \le \pi/2. \]
We apply implicit differentiation to the equation $\sin(y) = x$ to get
\[ \cos(y)y' = 1 \iff y' = \frac{1}{\cos(y)} = \frac{1}{\cos(\arcsin(x))}. \]
This looks somewhat hideous: but let’s simplify it.
So arcsin(x) is the angle y, with −π/2 ≤ y ≤ π/2, such that sin(y) = x. Now we know
\[ \sin^2(y) + \cos^2(y) = 1 \]
and moreover, on −π/2 ≤ y ≤ π/2, cos(y) ≥ 0. Thus
\[ \cos(\arcsin(x)) = \cos(y) = \sqrt{1 - \sin^2(y)} = \sqrt{1 - x^2}. \]
Therefore:
Note.
\[ \frac{d}{dx}(\arcsin(x)) = \frac{1}{\sqrt{1 - x^2}} \quad\text{for all } x \in (-1, 1) \]
A quick reality check: indeed, this function is defined only for −1 < x < 1, which is what you’d
expect for the derivative of arcsin(x). It is, nonetheless, a little shocking that the derivative of this
function isn’t another inverse trig function — but notice that the formula is definitely related to
trigonometry, which is more obvious from the following example.
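Both the derivative formula and the simplification $\cos(\arcsin(x)) = \sqrt{1 - x^2}$ used to obtain it can be spot-checked numerically. The sketch below uses Python’s `math.asin` for arcsin; the sample points and the difference-quotient helper are assumptions of this illustration.

```python
import math

def arcsin_derivative(x):
    # The formula from the Note above, valid for -1 < x < 1.
    return 1 / math.sqrt(1 - x * x)

def central_difference(func, x, h=1e-6):
    # Symmetric difference quotient approximating func'(x).
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-0.9, -0.5, 0.0, 0.3, 0.8):
    numerical = central_difference(math.asin, x)
    print(f"x = {x:5.2f}: formula = {arcsin_derivative(x):.8f}, numerical = {numerical:.8f}, "
          f"cos(arcsin(x)) = {math.cos(math.asin(x)):.8f}, sqrt(1 - x^2) = {math.sqrt(1 - x * x):.8f}")
```

Sample points very close to ±1 are avoided on purpose: the derivative blows up there, and the difference quotient would step outside the domain of arcsin.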
Example 5.12.3. Let’s prove that $\cos(\arcsin(x)) = \sqrt{1 - x^2}$ using triangles. We pretend that
0 < arcsin(x) < π/2 but the argument can be adapted for −π/2 < arcsin(x) ≤ 0 to give the same
answer as well.
Draw a right angled triangle with base angle $\theta = \arcsin(x)$. Then $\sin(\theta) = x = \frac{x}{1} = \frac{\text{opp}}{\text{hyp}}$, so up
to similarity, our triangle has opposite of length $x$ and hypotenuse of length 1. By the Pythagorean
theorem, the adjacent side has length $\sqrt{1 - x^2}$.
Therefore $\cos(\arcsin(x)) = \cos(\theta) = \frac{\text{adj}}{\text{hyp}} = \sqrt{1 - x^2}$.
(Figure: a right triangle with base angle $\theta$, opposite side $x$, hypotenuse 1 and adjacent side $\sqrt{1 - x^2}$.)
Now that we have a formula for the derivative of arcsin(x), we can use it to differentiate any
function involving the arcsine function; we don’t need to rederive it each time.
Example 5.12.4. Let $y = \arcsin(e^x + x^2)$. Then
\[ y' = \frac{1}{\sqrt{1 - (e^x + x^2)^2}}\cdot(e^x + 2x) \]
Remark 5.12.5. Notice that when arcsin is the outermost function of a composition, it does not
occur in the derivative.
5.12.2 The inverse tangent function arctan(x) = $\tan^{-1}(x)$
Sketching the graph of y = tan(x), we see that our natural choice for restricting the domain is
similar to that for sin(x); except we must exclude endpoints because of the vertical asymptotes.
The graph of y = tan(x), with a maximal portion selected on which it is one-to-one, together with
the graph of y = arctan(x), which is the inverse of tan restricted to (−π/2, π/2), which has
domain all of R.
Note.
tan is one-to-one on domain (−π/2, π/2) with image R; so
arctan is defined on domain R with image (−π/2, π/2).
y = tan(x) ⇔ x = arctan(y) for all x ∈ (−π/2, π/2)
Example 5.12.6. Find all solutions to tan(x) = 1.
One solution is arctan(1) = π/4. The graph of y = tan(x) is periodic with period π, and so in fact,
we simply have that all solutions are
x = π/4 + πk
for some integer k.
The derivative of arctan(x)
Suppose y = arctan(x); then tan(y) = x. Implicit differentiation gives
\[ \sec^2(y)y' = 1 \]
so
\[ y' = \frac{1}{\sec^2(y)} = \frac{1}{1 + \tan^2(y)} = \frac{1}{1 + x^2} \]
by the identity
Note.
\[ \sec^2(\theta) = 1 + \tan^2(\theta). \]
So
Note.
\[ \frac{d}{dx}\arctan(x) = \frac{1}{1 + x^2} \quad\text{for all } x. \]
Again, this is reasonable; this function is defined on all of R and goes to 0 as x goes to ±∞, as
you’d expect from the graph of arctan(x).
We can use this formula to differentiate functions involving arctan.
Example 5.12.7. Let $y = \frac{1}{\arctan(x)}$. Then $y = (\arctan(x))^{-1}$ so
\[ y' = -(\arctan(x))^{-2}\,\frac{1}{1 + x^2}. \]
(This function, you’ll notice, is completely unrelated to $\sec^2(x)$, because 1/arctan(x) is completely
unrelated to tan(x).)
5.12.3 The remaining inverse trig functions
We can do the same with arccos(x).
The graph of y = cos(x) on left, with the portion over [0, π] on which it is one-to-one highlighted
in green, together with the graph of y = arccos(x), which is the inverse of cos restricted to [0, π],
and thus is defined on domain [−1, 1]. Note the scales on the axes.
Note.
cosine is one-to-one on domain [0, π] with image [−1, 1]; so
arccosine is defined on domain [−1, 1] with image [0, π].

y = cos(x) ⇔ x = arccos(y) for all x ∈ [0, π]
Example 5.12.8. Find all solutions to $\cos(x) = \sqrt{3}/2$.
One solution is $\arccos(\sqrt{3}/2) = \pi/6$; this is the only solution in the interval [0, π] since cos(x) is
one-to-one there. To find all other solutions, we use the identities:
cos(x + 2πk) = cos(x) for all integers k
and
cos(−x) = cos(x).
We see from the graph that these give us all other solutions. Therefore the answer is
x = π/6 + 2πk, or x = −π/6 + 2πk
for any integer k.
The derivative of arccos(x)
We could repeat the argument used above, but instead we might stare at the graph of arccos(x)
and realize that it has a very similar shape to that of arcsin(x), because they are each portions of
a sinusoidal curve in y. How can we use this?
We can relate the portions of the sine and cosine graphs on which we took the inverse functions.
We know that for 0 ≤ x ≤ π,
cos(x) = sin(π/2 − x)
with −π/2 ≤ π/2 − x ≤ π/2. So
y = arccos(x) ⇒ x = cos(y) = sin(π/2 − y) ⇒ π/2 − y = arcsin(x)
so that
Note.
arccos(x) = π/2 − arcsin(x).
It follows that
Note.
\[ \frac{d}{dx}(\arccos(x)) = \frac{-1}{\sqrt{1 - x^2}} \]
So that was a bit boring.
Exercise 5.12.9. You might ask about the remaining inverse trigonometric functions (which no
one uses). They can be expressed in terms of the ones we know. For example, if you want to know
about the inverse cosecant function, you would reason as follows:
\[ y = \operatorname{arccsc}(x) \ \text{ means } \ x = \csc(y) = \frac{1}{\sin(y)}, \]
which is valid for y ∈ [−π/2, 0) ∪ (0, π/2]. (Exercise: why?) Then you solve:
\[ x = \frac{1}{\sin(y)} \iff \sin(y) = \frac{1}{x} \iff y = \arcsin(1/x). \]
Therefore arccsc(x) = arcsin(1/x).
(a) Find similar expressions for arcsec(x) and arccot(x), as well as their domains and ranges.
(b) Find the derivatives of arccsc(x), arcsec(x) and arccot(x).
They’re interesting, but since most calculators don’t even have these functions as buttons, it’s kind
of pointless to use them.
5.13 Summary of known derivatives
It is good to remember all our rules of differentiation as the first step of a chain rule. In the
following, u represents some function of x; we are giving a formula to reduce the problem of finding
the derivative of a composite function to finding the derivative of u (i.e. reducing to a smaller
problem). Iterating this gives you the answer.
Note.
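\begin{align*}
\frac{d}{dx}1 &= 0 & \frac{d}{dx}u^n &= nu^{n-1}\frac{du}{dx}, \quad n \neq 0 \\
\frac{d}{dx}e^u &= e^u\frac{du}{dx} & \frac{d}{dx}\ln(u) &= \frac{1}{u}\frac{du}{dx} \\
\frac{d}{dx}\sin(u) &= \cos(u)\frac{du}{dx} & \frac{d}{dx}\cos(u) &= -\sin(u)\frac{du}{dx} \\
\frac{d}{dx}\tan(u) &= \sec^2(u)\frac{du}{dx} & \frac{d}{dx}\cot(u) &= -\csc^2(u)\frac{du}{dx} \\
\frac{d}{dx}\sec(u) &= \sec(u)\tan(u)\frac{du}{dx} & \frac{d}{dx}\csc(u) &= -\csc(u)\cot(u)\frac{du}{dx} \\
\frac{d}{dx}\arcsin(u) &= \frac{1}{\sqrt{1 - u^2}}\frac{du}{dx} & \frac{d}{dx}\arctan(u) &= \frac{1}{1 + u^2}\frac{du}{dx}
\end{align*}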
End of lecture # 10
Chapter 6
Applications of the Derivative
Suppose we are given a formula for a function that models a phenomenon of interest (eg. drug
absorption over time, population as a function of environmental pollutants). From that formula,
using our pre-Calculus skills, we can deduce its domain and the values at its endpoints. Using
limits, we can often work out its behaviour near gaps in its domain, or towards ±∞.
In the previous chapter, we learned how to differentiate everything. If a function has a formula
that you recognize, then you can differentiate it at most points, and can additionally tell where
there is a cusp (a place where it is continuous but turns sharply, as $y = |x|$ does at $x = 0$). That is, we can
now say how the function changes with its input.
The goal in this chapter is to show how powerful differentiation is as a tool for understanding
functions. We begin by interpreting the first and second derivative of a function (Sections 6.1 and
6.2), and then see how to use these to sketch graphs (Section 6.3). We then identify local and
global extrema of functions (Section 6.4). These extrema help us model phenomena and optimize
functions (Section 6.5).
The derivative does much more, too. We also develop a new tool for finding certain kinds of
limits, called L’Hôpital’s rule (Section 6.6). It lets us approximate complex functions with simpler
ones (Section 6.7), come up with criteria for when fixed points of nonlinear DTDS are stable or not
(Section 6.8), and find roots of unsolvable equations to any degree of accuracy we want (Section 6.9).
6.1 The first derivative
One of the key things the derivative can tell us is where the function does something interesting.
We call these critical points.
Definition 6.1.1. A number $c$ in the domain of a function $f$ is a critical point or a critical number
of $f$ if either $f'(c) = 0$ or else $f'(c)$ is undefined.
Example 6.1.2. Here are some typical examples:
• The function $f(x) = 2x + 3$ has derivative $f'(x) = 2$ everywhere, so it has no critical points.
• The function $f(x) = 5x^2$ has derivative $f'(x) = 10x$, which is 0 at $x = 0$, so $c = 0$ is a critical
  point of $f$.
• The derivative of the function $f(x) = |x|$ is not defined at $x = 0$, so 0 is a critical point of $f$.
In each case, the critical point identifies the existence of an interesting feature of the graph (none,
a minimum, and a cusp, respectively).
What happens between critical points? Well: suppose $f'(x)$ is continuous (see footnote 1). Then $f'(x)$ can’t
change sign between critical points: it’s either positive on the whole interval, or negative on the
whole interval.
Note. Recall from Section 5.4 that if $f'(x) > 0$ on an interval, then $f$ is increasing; and if $f'(x) < 0$
on an interval, then $f$ is decreasing.
So if we divide the real line into intervals delimited by critical points, then we can determine the
sign on each interval, and thereby see where f is increasing, decreasing, or has a horizontal tangent.
Example 6.1.3. Consider $f(x) = -x^3 + 3x^2 + 45x - 8$. We find
\[ f'(x) = -3x^2 + 6x + 45 = -3(x^2 - 2x - 15) = -3(x - 5)(x + 3) \]
so the domain of $f'$ is $\mathbb{R}$ and the only critical points are its roots, where $f'(x) = 0$, that is,
$x \in \{-3, 5\}$.
We make a table with columns for the critical points and the intervals in the domain that they
define. On each interval and point, we evaluate $f'(x)$; then we interpret this in terms of the
behaviour of $f(x)$. We also record the value of the function at each critical point, because this can
help us check our answer makes sense.
x | (−∞, −3) | −3 | (−3, 5) | 5 | (5, ∞)
sign of −3 | − | − | − | − | −
sign of (x − 5) | − | − | − | 0 | +
sign of (x + 3) | − | 0 | + | + | +
sign of f′(x) | − | 0 | + | 0 | −
behaviour of f(x) | decreasing | f(−3) = −89 | increasing | f(5) = 167 | decreasing
The values of $f$ at the critical points are consistent with the function being increasing in between.
We can also check that $\lim_{x\to-\infty} f(x) = \infty$ and $\lim_{x\to\infty} f(x) = -\infty$ so it’s all consistent, and therefore
we have confidence that we didn’t make a mistake.
The graph of f is below; notice how it has a horizontal tangent line exactly at the critical points
we found.
[Footnote 1] This is true of all the functions we consider in this course. There are stranger functions for which $f'(x)$ is not
continuous, but then we just add the points of discontinuity to the list of critical points.
The graph of $y = -x^3 + 3x^2 + 45x - 8$. Notice that the graph is increasing between −3 and 5, and
decreasing otherwise.
Don’t forget to take into account the domain in your analysis!

Example 6.1.4. Find out where $f(x) = x^{-2}$ is increasing and decreasing.
Solution: we compute $f'(x) = -2x^{-3}$. This is undefined at $x = 0$ because you can’t divide by 0
(and $-2x^{-3} = \frac{-2}{x^3}$) but $x = 0$ isn’t in the domain of $f$, so is not a critical point. Thus $f$ has no
critical points.
DANGER: You might be tempted to say “since $f$ has no critical points, $f'$ never changes sign; since $f'(1) = -2$, $f$ is always decreasing.” But let's look at the graph of $f(x) = 1/x^2$ to see if this is true.
The graph of $y = x^{-2}$. Notice that the graph is increasing on $(-\infty, 0)$ and decreasing on $(0, \infty)$.
What happened? Ah: $f'$ changed sign at $x = 0$, which is where it is undefined. Makes sense.
The correct table to make is the following, which is defined by ALL the points where $f'$ is zero or undefined:
x                 | (−∞, 0)    | 0         | (0, ∞)
sign of −2        | -          | -         | -
sign of x^(−3)    | -          | undefined | +
sign of f'(x)     | +          | undefined | -
behaviour of f(x) | increasing | undefined | decreasing
The point: the table should include all points where f could change from increasing to decreasing
and vice versa.
Note. Algorithm for finding where f is increasing and decreasing:
1. Identify all critical points of f .
2. Create a table with all intervals in the domain cut out by critical points. Include critical points,
gaps in the domain, and endpoints (if applicable).
3. On each interval, find the sign of $f'(x)$. If $f'(x) > 0$, then $f$ is increasing; if $f'(x) < 0$, then $f$ is decreasing.
Note. Two ways to find the sign of $f'(x)$ on each interval:
- write $f'(x)$ as a product of factors, reason out the sign of each factor, and then multiply them (as in the table in Example 6.1.3); or
- choose a sample point in each interval, and just plug that point in for $x$ in the formula for $f'(x)$. (This works because the sign of $f'(x)$ is the same for all points in that interval, by our construction; but for complicated functions it is tedious and prone to errors.)
Either way is fine. On a test, you must clearly communicate your reasoning, so either work out the
signs of the factors as above or state which sample point you used in each interval, for example.
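The sample-point method can be sketched in a few lines of Python. This is only an illustration, not part of the course method: the helper names `sign` and `sign_chart` are made up here, and the derivative used is $f'(x) = -3(x-5)(x+3)$ from the table in Example 6.1.3. The sketch assumes you have already found every point where $f'$ is zero or undefined.

```python
def sign(value, tol=1e-12):
    """Return '+', '-' or '0' according to the sign of value."""
    if value > tol:
        return "+"
    if value < -tol:
        return "-"
    return "0"


def sign_chart(fprime, cut_points):
    """Sign of fprime on each interval cut out by cut_points.

    One sample point is used per interval: the midpoint of each bounded
    interval, and a point one unit beyond the outermost cut point for
    the two unbounded intervals.
    """
    pts = sorted(cut_points)
    samples = [pts[0] - 1.0]
    samples += [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    samples.append(pts[-1] + 1.0)
    return [(x, sign(fprime(x))) for x in samples]


# Derivative taken from the sign table of Example 6.1.3.
def fprime(x):
    return -3.0 * (x - 5.0) * (x + 3.0)


chart = sign_chart(fprime, [-3.0, 5.0])
for sample, s in chart:
    behaviour = {"+": "increasing", "-": "decreasing", "0": "flat"}[s]
    print(f"sample x = {sample:5.1f}: f' is {s}, so f is {behaviour}")
```

On a test you would still state which sample point you used in each interval; here they are $-4$, $1$ and $6$.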
Some more examples. Note that we can use this method to analyse functions for which we do not yet know anything about the graph.
Example 6.1.5. Find the critical points of $f(x) = x^2 e^{-x}$, and where this function is increasing and decreasing. This is an example of a Gamma distribution, which is used to model the expected length of time it will take for something to occur (e.g. for three synaptic impulses to occur; for you to receive four phone calls) when the average time between these random events is known.
Solution: We first calculate the derivative.
$$f'(x) = 2xe^{-x} + x^2 e^{-x}(-1) = (2x - x^2)e^{-x}.$$
The domain of $f'(x)$ is $\mathbb{R}$, so the only critical points are where $f'(x) = 0$. Since $e^{-x} \neq 0$ for any $x$,
$$f'(x) = 0 \iff 2x - x^2 = 0 \iff x(2 - x) = 0 \iff x \in \{0, 2\}.$$
As in Example 6.1.3, we make a table: on each interval and at each critical point we find the sign of $f'(x)$, then interpret this in terms of the behaviour of $f(x)$. We also record the value of the function at each critical point, because this can help us check our answer makes sense.
x                 | (−∞, 0)    | 0 | (0, 2)     | 2        | (2, ∞)
sign of x         | -          | 0 | +          | +        | +
sign of 2 − x     | +          | + | +          | 0        | -
sign of e^(−x)    | +          | + | +          | +        | +
sign of f'(x)     | -          | 0 | +          | 0        | -
behaviour of f(x) | decreasing | 0 | increasing | 4e^(−2)  | decreasing
Notice that f (0) < f (2) so it makes sense that the function is increasing there. When f models the
Gamma distribution, we are only interested in x ≥ 0. This table tells us that the probability that
the random event will occur increases until x = 2, and then decreases: meaning it is most likely
the event will occur around x = 2.
Example 6.1.6. Consider $f(x) = x\ln(x)$. We have $f'(x) = \ln(x) + x \cdot \frac{1}{x} = \ln(x) + 1$. This is
- undefined if $x \le 0$ (OK, that's not in the domain of $f$ anyway) and
- zero if $\ln(x) = -1$ (which we solve: $x = e^{-1}$).
So there is just one critical point: $x = e^{-1}$.
Behaviour of $f(x)$:
x     | (0, e^(−1)) | e^(−1)  | (e^(−1), ∞)
f'(x) | negative    | 0       | positive
f(x)  | decreasing  | −e^(−1) | increasing
So our function decreases on $0 < x < e^{-1}$ until it bottoms out at the value $f(e^{-1}) = -e^{-1}$, after which it is increasing; since $\lim_{x\to\infty} x\ln(x) = \infty$, there is no horizontal asymptote. (But what happens near $x = 0$? Good question! Stay tuned in a few classes...)
6.2 The second derivative
So the first derivative tells us where f is increasing, but that isn’t the whole story. Let’s consider
an example for motivation.
Example 6.2.1. Consider the two functions $f(x) = e^x$ and $g(x) = \frac{x}{1+x}$ for $x \ge 0$. Exercise: $f'(x) = e^x$ and $g'(x) = \frac{1}{(1+x)^2}$.
Note that both functions have positive first derivatives, hence, both are increasing functions on
x ≥ 0. However, their graphs look quite different.
The graphs of $y = e^x$ in red and $y = \frac{x}{1+x}$ in blue. Both are increasing on $x \ge 0$ but their shapes are opposite.
We observe that the slope of $f$ increases with $x$ but the slope of $g$ decreases with $x$. In other words, the function $f'(x)$ is increasing but the function $g'(x)$ is decreasing. Let's calculate this (for $x \ge 0$):
$$\frac{d}{dx}[f'(x)] = e^x > 0, \quad\text{whereas}\quad \frac{d}{dx}[g'(x)] = \frac{-2}{(1+x)^3} < 0.$$
Given a function $f(x)$, its second derivative is the derivative of $f'(x)$, which we denote $f''(x)$. The third derivative is the derivative of $f''(x)$, and we usually write $f^{(3)}(x)$ rather than $f'''(x)$ just because it's confusing otherwise. We can define the $n$th derivative of a function, denoted $f^{(n)}(x)$.
The first derivative of $f$ tells us about the rate of change of $f$: if $f'(x) > 0$ then $f$ is increasing; if $f'(x) < 0$ then $f$ is decreasing.
In the same way, the second derivative of $f$ tells us about the rate of change of $f'$: if $f''(x) > 0$ then $f'$ is increasing; if $f''(x) < 0$ then $f'$ is decreasing. What does this look like?
Note how the slopes of the tangent lines are decreasing in the figure on the left (as x increases)
and increasing with x on the right. This change in slope of the tangent line forces the curve to
take on a characteristic shape. With thanks to The MathRoom
http://www.the-mathroom.ca/cal1/ and this diagram from
http://www.the-mathroom.ca/cal1/cald4/cald4.htm.
Let’s reason this out further:
- If $f''(x) > 0$ on an interval, meaning $f'$ is increasing there, then the slope of the tangent line to $f$ is increasing.
  On this kind of graph, the tangent line is under the curve, because the tangent line is “pushing it up”.
  We call this shape concave up, and the shape is like that of a cup ∪ (or a smile).
- Similarly, if $f''(x) < 0$ on an interval, then $f'$ is decreasing there.
  Since the slope is decreasing, the tangent line is on top of the curve; it is pushing the curve down.
  We call this shape concave down, and the shape ∩ is like that of a frown.
Example 6.2.2. Consider $f(x) = x^3$ and $g(x) = \ln(x)$.
Since $f'(x) = 3x^2$ and $f''(x) = 6x$, we see that $f''(x) < 0$ for $x < 0$ and $f''(x) > 0$ for $x > 0$. This confirms that $f'(x) = 3x^2$ is decreasing on $(-\infty, 0)$ and increasing on $(0, \infty)$. By the above, it tells us that $f$ is concave down on $(-\infty, 0)$ and concave up on $(0, \infty)$.
On the other hand, $g'(x) = 1/x = x^{-1}$ and $g''(x) = -x^{-2}$, which is negative everywhere on its domain (which is bigger than the domain of $g$, but we only care about the domain of $g$). Thus $g''(x) < 0$ and so $g'(x)$ is decreasing and so $g(x)$ is concave down.
We confirm all these observations by sketching the well-known graphs of these functions.
The graph of $f(x) = x^3$ is on the left and the graph of $g(x) = \ln(x)$ is on the right. Note their concavity.
The change from concave up to concave down (or vice versa) is a subtle but important feature on
a graph. We give it a name.
Definition 6.2.3. An inflection point on the graph of y = f (x) is a point (x, y) on the curve where
the concavity of the curve changes: from concave up to concave down, or vice versa.
Example 6.2.4. So $(0, 0)$ is an inflection point of $f(x) = x^3$ because the graph goes from concave down to concave up there. The graph of $g(x) = \ln(x)$ has no inflection points.
Note. Notice how we use the language: each inflection point of $f$ occurs at a critical point of $f'$ (that is, a point in the domain where $f''$ is either 0 or undefined), but not every critical point of $f'$ is an inflection point of $f$. The critical points of $f'$ are just “potential inflection points”.
Example 6.2.5. The function $f(x) = 1/x$ is concave down on $(-\infty, 0)$ and concave up on $(0, \infty)$ but does not have any inflection points: the place where its concavity changes is not a point on the curve (it's at the vertical asymptote).
The function $g(x) = x^4$ is always concave up, even though $g''(x) = 12x^2$ is zero at $x = 0$. This is an example of a critical point of $g'$ (that is, a place where $g''$ is zero) that didn't turn out to be an inflection point of $g$ (because the concavity ended up staying the same on both sides).
The graphs of $f(x) = 1/x$ at left and $g(x) = x^4$ at right. Neither has an inflection point, for different reasons.
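The distinction between a “potential inflection point” and an actual one can be checked mechanically. The sketch below is an illustration under stated assumptions: the function name `is_inflection_point` and the step size `h` are choices made here, and the second derivatives are typed in by hand ($6x$ for $x^3$, $12x^2$ for $x^4$, and $2/x^3$ for $1/x$).

```python
def is_inflection_point(fsecond, c, in_domain=True, h=1e-3):
    """Check whether the concavity changes at x = c.

    The candidate c must be a point of the curve (in_domain) and the
    second derivative must have opposite signs just left and just
    right of c. The step h is an arbitrary small number.
    """
    if not in_domain:
        return False
    return fsecond(c - h) * fsecond(c + h) < 0


def cubic_second(x):       # second derivative of x^3
    return 6.0 * x


def quartic_second(x):     # second derivative of x^4
    return 12.0 * x ** 2


def reciprocal_second(x):  # second derivative of 1/x
    return 2.0 / x ** 3


print(is_inflection_point(cubic_second, 0.0))    # True
print(is_inflection_point(quartic_second, 0.0))  # False
print(is_inflection_point(reciprocal_second, 0.0, in_domain=False))  # False
```

The three calls mirror the three situations in the text: a genuine inflection point, a zero of the second derivative with no change of concavity, and a change of concavity at a point that is not on the curve.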
A good family of examples to know is the power functions. Let $p$ be a real number and consider
$$f(x) = x^p.$$
We compute
$$f(x) = x^p, \quad f'(x) = px^{p-1}, \quad f''(x) = p(p-1)x^{p-2}.$$
Let's focus on what they look like when $x > 0$ (because this is in the common domain of all of these functions). We make a table:
Signs on x > 0 only (note x < 0 may not be in the domain, or be different):
case                        | f(x) = x^p | f'(x) = p x^(p−1) | f''(x) = p(p − 1) x^(p−2) | shape
p > 1 (like x^3 or x^(3/2)) | +          | +                 | +                         | increasing and concave up
0 < p < 1 (like √x or ∛x)   | +          | +                 | -                         | increasing and concave down
p < 0 (like 1/x or 1/√x)    | +          | -                 | +                         | decreasing and concave up
We can sketch the three kinds of shapes (for the part with x > 0).
Graphs of power functions in the first quadrant: case $p < 0$ at left ($x^{-1}$); case $0 < p < 1$ in the middle ($x^{1/3}$); case $p > 1$ at right ($x^3$).
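The table for power functions boils down to the signs of $p$ and $p(p-1)$, since $x^{p-1}$ and $x^{p-2}$ are positive when $x > 0$. As a small illustration (the function name `power_shape` is invented here, and the sketch assumes $p$ is neither 0 nor 1, where the graph is a straight line):

```python
def power_shape(p):
    """Describe y = x**p on x > 0 from the signs of p and p*(p - 1).

    Assumes p is not 0 or 1 (those cases give a line, with no concavity).
    """
    slope = "increasing" if p > 0 else "decreasing"
    bend = "concave up" if p * (p - 1) > 0 else "concave down"
    return f"{slope} and {bend}"


for p in (3, 0.5, -1):
    print(f"p = {p}: {power_shape(p)}")
```

The three sample exponents correspond to the three rows of the table: $x^3$, $\sqrt{x}$ and $1/x$.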
6.3 Graphing functions
Our knowledge of limits and derivatives lets us graph functions by identifying the most important
features, rather than by plotting points and connecting the dots. This is important for two reasons:
(a) when you plot points and connect the dots, you may easily miss key features of the graph that
would completely change a cobwebbing, for example;
(b) when you consider functions of more than one variable next term, you definitely can't plot points anymore, because the graph is 3D (or more!).
Here are the things to look for when you are graphing a function:
1. The domain and the zeros of f ;
2. The limit of $f$ as $x$ approaches a point not in the domain (e.g. asymptotes) as well as $\lim_{x\to\infty} f(x)$ and $\lim_{x\to-\infty} f(x)$;
3. The derivative;
4. The critical points of $f$ (i.e. where $f'(x) = 0$ or is undefined) and the intervals of increase or decrease (i.e. the sign of $f'$ between critical points);
5. The second derivative;
6. Where $f''(x) = 0$ or is undefined, and the sign of $f''$ between these points: to tell us the concavity of $f$, and to identify any inflection points of the graph;
7. Consistency! Make sure these clues all fit together and make sense. (Otherwise: check your
work.)
Example 6.3.1. We saw that the updating function for one kind of limited population model was
$$f(x) = \frac{2x}{1+x}.$$
Graph this function.
1. The domain is all $x \neq -1$, and $f$ is zero at $x = 0$ only.
2. There are four limits to take:
$$\lim_{x\to-\infty} \frac{2x}{1+x} = \lim_{x\to-\infty} \frac{2}{1/x + 1} = 2;$$
$$\lim_{x\to\infty} \frac{2x}{1+x} = 2, \text{ by the same calculation;}$$
$$\lim_{x\to-1^-} \frac{2x}{1+x} = \infty;$$
and $\lim_{x\to-1^+} \frac{2x}{1+x} = -\infty$ (in both cases: we observe that the numerator approaches $-2$ and the denominator goes to zero; then we look at the signs to decide if it is going to $+\infty$ or $-\infty$).
3. $f'(x) = \frac{(1+x)(2) - 2x(1)}{(1+x)^2} = \frac{2}{(1+x)^2}$;
4. $f'(x) > 0$ for all $x$ at which it's defined, so $f$ is increasing on each interval of its domain and there are no critical points;
5. $f''(x) = -4(1+x)^{-3}$;
6. If $x < -1$ then $(1+x)^3 < 0$ so $f''(x) > 0$ and the function is concave up;
if $x > -1$ then $(1+x)^3 > 0$ so $f''(x) < 0$ and the function is concave down;
7. Since it never reaches a local extremum, but goes to a horizontal asymptote, it must approach the asymptote from one side. We sketch the graph by drawing dotted lines for the horizontal asymptotes at each side (not shown: just draw them at the extremes of the graph, since the graph can sometimes cross a horizontal asymptote), marking the point $(0, 0)$ (the point we found in part 1), then drawing little arrows for the 4 limits; then we connect them, respecting concavity:
We then look back at our analysis and check: yes, increasing on each interval of the domain. No critical points, inflection points or extrema; just some asymptotes.
Note. The key thing is that all the clues you collect offer multiple confirmations of the same overall
picture; if something seems impossible to draw: check your derivatives!!
Example 6.3.2. The updating function for a population showing the Allee effect was
$$f(x) = \frac{4x^2}{1+x^2}.$$
Let's graph it.
1. The domain is $\mathbb{R}$, and $f(x) = 0$ only at $x = 0$.
2. $\lim_{x\to\pm\infty} \frac{4x^2}{1+x^2} = 4$ (divide by highest power)
3. $$f'(x) = \frac{(1+x^2)(8x) - 4x^2(2x)}{(1+x^2)^2} = \frac{8x}{(1+x^2)^2}$$
4. The only critical point is 0; $f'(x) < 0$ when $x < 0$, so $f$ is decreasing there; but $f'(x) > 0$ for $x > 0$, so $f$ is increasing there;
5. $f'(x) = 8x(1+x^2)^{-2}$ so
$$f''(x) = 8(1+x^2)^{-2} - 16x(1+x^2)^{-3}(2x) = \left(8(1+x^2) - 32x^2\right)(1+x^2)^{-3} = \frac{8 - 24x^2}{(1+x^2)^3}$$
6. $f''(x) = 0$ when $24x^2 = 8$, that is, $x = \pm\sqrt{1/3}$. We do a table for concavity:
x      | (−∞, −1/√3)  | −1/√3 ≈ −0.58 | (−1/√3, 1/√3) | 1/√3 ≈ 0.58 | (1/√3, ∞)
f''(x) | -            | 0             | +             | 0           | -
f(x)   | concave down | 1             | concave up    | 1           | concave down
7. Putting this together : we plot our 3 points and 2 limits and connect the dots (with the
concavity as indicated) to yield
Again, we confirm our result by comparing with our analysis of the first derivative. There
are two inflection points and one local and global minimum.
We’ve written it as seven steps above, but another way to think about it: we glean information
from the formula for f , then from f 0 , and then from f 00 , and put it all together.
Example 6.3.3. Sketch the graph of $f(x) = \frac{\ln(x)}{x}$.
Solution:
Information from $f(x)$:
- the domain is $x > 0$;
- the $x$-intercept is where $\ln(x) = 0$, that is, $x = 1$;
- $\lim_{x\to 0^+} \frac{\ln(x)}{x} = -\infty$ (vertical asymptote);
- $\lim_{x\to\infty} \frac{\ln(x)}{x} = 0$ (as we'll see in Example 6.6.6) (horizontal asymptote).
Information from $f'(x) = \frac{x(1/x) - \ln(x)}{x^2} = \frac{1 - \ln(x)}{x^2}$:
- $f'(x)$ is defined on the entire domain of $f$;
- $f'(x) = 0$ when $1 = \ln(x)$, that is, $x = e$;
- so there is only one critical point: $x = e$ (where $f(x) = 1/e \approx 0.37$);
- if $x < e$ then $f'(x) > 0$, so $f$ is increasing there;
- if $x > e$ then $f'(x) < 0$, so $f$ is decreasing there;
- conclude that there's a local maximum at $x = e$ (with coordinates $(e, 1/e) \approx (2.72, 0.37)$), which must in fact be the absolute maximum since there are no gaps in the domain and $f$ never increases again.
Information from $f''(x) = \frac{x^2(-1/x) - 2x(1 - \ln(x))}{x^4} = \frac{-3 + 2\ln(x)}{x^3}$:
- $f''(x)$ is defined on the entire domain of $f$;
- $f''(x) = 0$ when $3 = 2\ln(x)$, that is, $x = e^{3/2} \approx 4.48$ (where $f(x) = 1.5e^{-1.5} \approx 0.33$);
- when $x < e^{3/2}$, $f''(x) < 0$ so $f$ is concave down there (good, because our critical point is on this interval and we'd said it was a maximum!);
- when $x > e^{3/2}$, $f''(x) > 0$ so $f$ is concave up there;
- conclude that we change concavity at $(e^{1.5}, 1.5e^{-1.5}) \approx (4.48, 0.33)$, so this is an inflection point.
Putting these clues together gives the graph, and our sketch will be quite accurate.
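A quick numerical check of these clues is one way to practise the “Consistency!” step. The sketch below simply types in the formulas for $f$, $f'$ and $f''$ derived above and evaluates them with Python's standard `math` module; the test points 2, 3, 4 and 5 are arbitrary choices on either side of the critical point and the inflection point.

```python
import math


def f(x):
    return math.log(x) / x


def fprime(x):
    return (1.0 - math.log(x)) / x ** 2


def fsecond(x):
    return (-3.0 + 2.0 * math.log(x)) / x ** 3


crit = math.e          # where f'(x) = 0
infl = math.exp(1.5)   # where f''(x) = 0

print("critical point:", round(crit, 2), round(f(crit), 2))
print("inflection point:", round(infl, 2), round(f(infl), 2))
print("f' changes from + to -:", fprime(2.0) > 0 > fprime(3.0))
print("f'' changes from - to +:", fsecond(4.0) < 0 < fsecond(5.0))
```

If any of these checks disagreed with the hand computation, that would be a signal to recheck the derivatives before sketching.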
Graph of $y = \ln(x)/x$. The intercepts, critical point and point of inflection are marked. Notice the properties of this graph are consistent with all clues from the function and its derivatives.
End of lecture # 11
6.4 Extrema
When a function f comes up in an application, we are particularly interested in its extrema. For
example, if f is the function describing the valuation of a certain stock, then you’re interested in
its highs and lows: they tell you about the volatility and about the range over a period of time.
Definition 6.4.1.
1. A function $f$ has a local maximum at $x = c$ if there is an open interval containing $c$ which is in the domain of $f$ and such that for every $x$ in that interval, we have $f(x) \le f(c)$.
2. A function $f$ has a local minimum at $x = c$ if there is an open interval containing $c$ which is in the domain of $f$ and such that for every $x$ in that interval, we have $f(x) \ge f(c)$.
3. A local maximum or a local minimum is also called a local extremum (plural: local extrema).
So a local maximum (respectively, minimum) is a part of the curve that’s not an endpoint, and
where if you zoom in close enough, f (c) is the largest (respectively, the smallest) y-value of your
function in that neighbourhood. It’s an interesting feature of the graph.
Contrast this with global, or absolute, extrema:
Definition 6.4.2.
1. A function f attains a global or absolute maximum at a point c in its domain if f (x) ≤ f (c)
for all x in the domain of f .
2. Similarly, we say $f$ attains a global or absolute minimum at a point $c$ in its domain if $f(x) \ge f(c)$ for all $x$ in the domain.
3. A global maximum or global minimum is called a global extremum (plural: global extrema).
So a global maximum or minimum is the very largest or smallest y-value of the graph.
Here are some examples to help us think about the difference between “local” and “global” and
also about how not every graph will have local or global extrema.
Example 6.4.3. Local extrema but no global extrema. Consider
$$f(x) = x(x-1)(x+1) = x^3 - x.$$
This function does not attain a global maximum or a global minimum on its domain, since $\lim_{x\to\infty} (x^3 - x) = \infty$ and $\lim_{x\to-\infty} (x^3 - x) = -\infty$. It does, however, attain local extrema at $x = \pm 1/\sqrt{3}$.
So: $y = x^3 - x$ has a local max (at $x = -1/\sqrt{3}$), a local min (at $x = 1/\sqrt{3}$), but no global max or min.
Graph of $y = x^3 - x$.
Example 6.4.4. Local and global max coincide, but no local or global min. Consider the bell curve function
$$y = e^{-x^2}.$$
It has a local and global maximum at $x = 0$, but no local minimum, and no global minimum (since there is no value of $x$ that gives $y = 0$).
The graph of $y = e^{-x^2}$.
Example 6.4.5. Global extrema exist, but no local extrema.
The graph of $y = \arcsin(x)$ has a global maximum at $x = 1$, a global minimum at $x = -1$, but these are at the endpoints. The function has no local maxima or minima because local extrema are by definition features on the interior of the domain (on an open interval in the domain).
The graph of $y = \arcsin(x)$.
Example 6.4.6. Infinitely many local and global extrema.
$y = \cos(x)$ has infinitely many local and global maxima and minima, occurring at each peak or trough $x = k\pi$ for $k \in \mathbb{Z}$. The ones with $k$ even are local and global maxima, while the ones with $k$ odd are local and global minima.
The graph of $y = \cos(x)$.
Example 6.4.7. Consider $\sin(x)$. The graph of $\sin(x)$ attains an absolute maximum at $x = \pi/2$ because $\sin(\pi/2) = 1$ and for all $x \in \mathbb{R}$, $\sin(x) \le 1$. This is also a local maximum. Sine also attains an absolute maximum at $x = 5\pi/2$, and at $9\pi/2$ (since it also takes value 1 at those points).
Example 6.4.8. No extrema, local or global.
The graph of $y = \tan(x)$ has no local or global extrema: the function is always strictly increasing at every point in its domain. (If instead we restrict the domain to a closed interval $[a, b]$ within $(-\pi/2, \pi/2)$ then we'd have global extrema at each endpoint.)
The graph of $y = \tan(x)$, which has no extrema (local or global).
Example 6.4.9. Again, no local or global extrema, but for different reasons. The function
$$f(x) = \begin{cases} x & \text{if } 0 \le x < 1 \\ 0 & \text{if } 1 \le x \le 2 \\ x - 3 & \text{if } 2 < x \le 3 \end{cases}$$
is not continuous, has no local extrema, and does not attain its global max or min. That is, there is no value of $x$ that gives $f(x) = 1$; we can get as close as we want to this peak value but we never actually achieve it. Similarly no $x$ gives $f(x) = -1$. See the graph. Therefore there are no global extrema.
The graph of $y = f(x)$.
Note. Summary: Not every function attains a global maximum or global minimum, and even if it
does, the x-value where the extremum is attained need not be unique. Not every local extremum
is a global extremum; not every global extremum is a local extremum.
6.4.1 Local extrema : First and second derivative tests
Considering these examples, we notice that the local extrema, when there were any, occurred at the critical points of $f$ (Definition 6.1.1: a critical point is a number $c$ in the domain of $f$ such that either $f'(c) = 0$ or $f'(c)$ is undefined). In fact, local extrema can only occur at critical points. This is a theorem due to Fermat (see footnote 2).
Theorem 6.4.10. If $f$ has a local extremum at $c$, and $f'(c)$ exists, then $f'(c) = 0$.
Equivalently:
If $f$ is a continuous function, then its local extrema, if any, can only occur at critical points (but not all critical points give local extrema).
Please note: Fermat's theorem only goes one way. That is, just because $f'(c) = 0$ you cannot deduce that $c$ gives a local extremum. Think of $f(x) = x^3$, which has a critical point at 0 but no extremum there.
Knowing that local extrema occur at critical points, we can now use what we learned in previous
sections to classify all local extrema (without needing to know the graph).
6.4.2 Methods for finding local extrema
Proposition 6.4.11 (First Derivative Test). Suppose $c$ is a critical point of $f$ and $f$ is continuous at $c$. Then, looking at a small interval on both sides of $c$:
- if $f'(x) < 0$ for $x < c$ and $f'(x) > 0$ for $x > c$, then $(c, f(c))$ is a local minimum of $f$;
- if $f'(x) > 0$ for $x < c$ and $f'(x) < 0$ for $x > c$, then $(c, f(c))$ is a local maximum of $f$.
This just says that you increase to a maximum and then decrease afterwards, and vice-versa for a
minimum.
Example 6.4.12. We saw in Example 6.1.6 that $x = e^{-1}$ is a critical point of the function $f(x) = x\ln(x)$, since $f'(x) = \ln(x) + 1$ is zero there. When $x \in (0, e^{-1})$, we had $f'(x) < 0$ and when $x > e^{-1}$ we have $f'(x) > 0$, therefore by the first derivative test, the point $(e^{-1}, f(e^{-1})) = (e^{-1}, -e^{-1})$ is a local minimum.
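The first derivative test can be mimicked numerically by looking at the sign of $f'$ a small step to each side of the critical point. This is a rough sketch, not a proof: the function name `first_derivative_test` and the step size `h` are assumptions made for illustration, and the step must be small enough that no other critical point lies between $c - h$ and $c + h$.

```python
import math


def first_derivative_test(fprime, c, h=1e-4):
    """Classify the critical point c from the sign of fprime on each side.

    Assumes f is continuous at c and that c is the only critical point
    in the interval (c - h, c + h).
    """
    left, right = fprime(c - h), fprime(c + h)
    if left < 0 < right:
        return "local minimum"
    if left > 0 > right:
        return "local maximum"
    return "no sign change"


# Derivative of f(x) = x ln(x), as computed in Example 6.1.6.
def fprime(x):
    return math.log(x) + 1.0


print(first_derivative_test(fprime, math.exp(-1)))  # local minimum
```

When $f'$ has the same sign on both sides, as for $x^3$ at 0, the sketch reports no sign change, which matches the fact that there is no local extremum there.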
Thinking about concavity gives us a second test that is simpler, but please note that sometimes it
doesn’t apply or it gives no answer:
Footnote 2: Pierre Fermat had a lot of theorems; his most famous was called Fermat's Last Theorem, about solutions to equations like $x^3 + y^3 = z^3$. He'd written the theorem in the margin of a book in 1637, with a little note to the effect that he had a “marvelous proof, but the margin is too small to contain it.” No one ever found the proof, but Andrew Wiles famously finally proved the theorem by other means in 1995.
Proposition 6.4.13 (Second Derivative Test). Suppose $c$ is a critical point of $f$ such that $f'(c) = 0$.
- If $f''(c) > 0$, then $(c, f(c))$ is a local minimum of $f$.
- If $f''(c) < 0$, then $(c, f(c))$ is a local maximum of $f$.
- If $f''(c) = 0$, then anything can happen.
This is saying that if your curve is concave up at a critical point, then it must be a local minimum; and if it is concave down, then it must be a local maximum. Nice!
Example 6.4.14. Find and classify the local extrema of $f(x) = x^2 e^{-x}$.
Solution: We have
$$f'(x) = 2xe^{-x} + x^2 e^{-x}(-1) = (2x - x^2)e^{-x},$$
which gives us two critical points: $x = 0$ and $x = 2$. To use the second derivative test, we first compute the second derivative:
$$f''(x) = (2 - 2x)e^{-x} + (2x - x^2)e^{-x}(-1) = (x^2 - 4x + 2)e^{-x}.$$
Now we put the critical points into $f''$. Since $f''(0) = 2 > 0$, $x = 0$ gives a local minimum. Since $f''(2) = (4 - 8 + 2)e^{-2} = -2e^{-2} < 0$, $x = 2$ gives a local maximum. (Compare with Example 6.1.5, using the First Derivative Test.)
Example 6.4.15. Consider $f(x) = x^2$. Then $f'(x) = 2x$ so 0 is a critical point; since $f''(x) = 2 > 0$, by the second derivative test, 0 is a local minimum.
Example 6.4.16. Consider $f(x) = sx^p$ where $s$ is a real number and $p > 2$ is an integer. Then the domain of $f$ is $\mathbb{R}$ and $f'(x) = spx^{p-1}$ so 0 is a critical point; but $f''(x) = sp(p-1)x^{p-2}$ so $f''(0) = 0$, which tells us nothing, according to Proposition 6.4.13. In fact, anything can happen:
- $s = 1$, $p = 3$ gives $f(x) = x^3$, which does not have a local extremum at $(0, 0)$;
- $s = 1$, $p = 4$ gives $f(x) = x^4$, which has a local minimum at $(0, 0)$;
- $s = -1$, $p = 4$ gives $f(x) = -x^4$, which has a local maximum at $(0, 0)$.
So: the second derivative test fails; we have to use the first derivative test for these functions.
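The three outcomes of the second derivative test, including the case where it says nothing, can be written as a small function. This is an illustrative sketch only: the name `second_derivative_test` and the tolerance `tol` are choices made here, and the caller is assumed to have already checked that $f'(c) = 0$.

```python
import math


def second_derivative_test(fsecond_at_c, tol=1e-12):
    """Classify a critical point c with f'(c) = 0 from the value f''(c)."""
    if fsecond_at_c > tol:
        return "local minimum"
    if fsecond_at_c < -tol:
        return "local maximum"
    return "inconclusive"


# Second derivative of f(x) = x^2 e^(-x), from Example 6.4.14.
def fsecond(x):
    return (x ** 2 - 4.0 * x + 2.0) * math.exp(-x)


print(second_derivative_test(fsecond(0.0)))  # local minimum
print(second_derivative_test(fsecond(2.0)))  # local maximum

# For x^3, x^4 and -x^4 the second derivative vanishes at 0,
# so the test cannot distinguish between them.
for second_at_zero in (6.0 * 0.0, 12.0 * 0.0 ** 2, -12.0 * 0.0 ** 2):
    print(second_derivative_test(second_at_zero))  # inconclusive
```

An "inconclusive" result is the cue to fall back on the first derivative test.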
Example 6.4.17. Consider $f(x) = |x|$, which is the function
$$f(x) = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{if } x < 0 \end{cases}$$
so its derivative is
$$f'(x) = \begin{cases} 1 & \text{if } x > 0 \text{ (note that we exclude } x = 0\text{)} \\ -1 & \text{if } x < 0. \end{cases}$$
We know from the graph that $f$ is not differentiable at 0. Thus the only critical point is $x = 0$. We cannot apply the second derivative test, but by the first derivative test we infer this is a local minimum.
Example 6.4.18. The function $f(x) = x(x+1)(x-1) = x^3 - x$ has $f'(x) = 3x^2 - 1$, so it has two critical points, where $f'(x) = 0$, or $3x^2 = 1$, or $x = \pm\frac{1}{\sqrt{3}}$. We have $f''(x) = 6x$ so by the second derivative test, $x = 1/\sqrt{3}$ gives a local minimum and $x = -1/\sqrt{3}$ gives a local maximum. This is consistent with our sketch of this cubic function (see Example 6.4.3).
So $y = x(x-1)(x+1)$ has a local maximum at $x = -\frac{1}{\sqrt{3}}$, where $f(x) = \frac{1}{\sqrt{3}} - \frac{1}{3\sqrt{3}} = \frac{2}{3\sqrt{3}} \approx 0.385$. But you can find many points $x$ such that $f(x) > 0.385$, like $x = 10$ for which $f(10) = 990$, so this local maximum is not a global maximum.
Example 6.4.19. The function $f(x) = \sqrt{x}$ is defined only on $x \ge 0$. We have that $f'(x) = \frac{1}{2\sqrt{x}}$, which is not defined at 0, so 0 is a critical point. Since $f'(0)$ is undefined, the second derivative test does not apply. Since $f'(x)$ is not defined for $x < 0$, the first derivative test does not apply either.
Is this a problem? No: $(0, 0)$ can't be a local extremum because 0 is an endpoint of the domain: the function is not defined on both sides of $c = 0$. (As it happens, $(0, 0)$ is the global minimum of $f$.)
6.4.3 Global Extrema and the Extreme Value Theorem
In the previous section, we determined that local extrema can only occur at critical points, and learned two tests to classify them. In this section we classify the global extrema.
The first question to ask: When do global (or absolute) extrema exist? Well, there is one case,
which is very common in practice, where we are in fact guaranteed that both an absolute maximum
and an absolute minimum exist.
Theorem 6.4.20 (Extreme Value Theorem). Suppose that f is a continuous function. Then for
any closed interval [a, b] in the domain of f , f attains both a global maximum and a global minimum
on [a, b].
Example 6.4.21. The function f (x) = x2 attains a global maximum and a global minimum on
any interval of the form [a, b], by the Extreme Value Theorem. For example,
if [a, b] = [−10, 15] then the global maximum is (15, 225) and the global minimum is (0, 0)
because these are points on the graph and 0 ≤ f (x) ≤ 225 for all x ∈ [−10, 15];
if [a, b] = [−2, −1] then the global maximum is at (−2, 4) and the global minimum at (−1, 1),
because these are points on the graph and 1 ≤ f (x) ≤ 4 for all x ∈ [−2, −1].
What the theorem amounts to saying is that if you start at a point $(a, f(a))$ and draw the graph of a continuous function (with no gaps or breaks) until $(b, f(b))$, then you necessarily hit a maximum and a minimum value: no asymptotes, no jumps, no question about “close but not quite”.
This seems kind of obvious! But it’s helpful to consider why we needed “continuous” and “closed
interval” to make the theorem work, by looking back on our examples.
Example 6.4.22. In Example 6.4.9 we considered
$$f(x) = \begin{cases} x & \text{if } 0 \le x < 1 \\ 0 & \text{if } 1 \le x \le 2 \\ x - 3 & \text{if } 2 < x \le 3. \end{cases}$$
It is defined on a closed interval $[0, 3]$ but is not continuous so the Extreme Value Theorem does not apply. In fact, $-1 < f(x) < 1$ on this interval and we can get arbitrarily close to these extremes, but there is no value of $x$ for which $f(x) = 1$ or $f(x) = -1$ so $f$ does not attain an absolute max or an absolute min on $[0, 3]$.
Example 6.4.23. Consider f (x) = 2x on the interval (1, 3). This interval is not closed so the
Extreme Value Theorem does not apply. In fact, we can get arbitrarily close to 2 and to 6, but
there is no value of x such that f (x) = 2 or f (x) = 6, so f does not attain its max or min.
This failure to attain a maximum or a minimum can also happen when the interval is an open
unbounded interval.
Example 6.4.24. Consider $f(x) = 1/x$. It is defined and continuous on the open unbounded interval $(0, \infty)$ but is always strictly decreasing there, and never attains either a global maximum or a global minimum.
The Extreme Value Theorem doesn’t say that if f is discontinuous, or if f is defined on some other
kind of domain, that it doesn’t have global extrema. It’s just removing the guarantee in that case.
So those were some examples of where the Extreme Value Theorem doesn’t apply. Let’s now see
how to use it when it does apply.
6.4.4 Method for finding global extrema
We can’t evaluate the function at all of the points in its domain; there are infinitely many. But we
can use Calculus to reduce the question to consideration of a few points.
Note. Principle: If f is continuous on a closed interval, then its absolute maximum and absolute
minimum must occur at critical points or endpoints of the interval.
So suppose f is continuous and the domain is a closed interval [a, b]:
1. Find all critical points c of f and evaluate f (c) at each one.
2. Evaluate f at each boundary point, that is, calculate f (a) and f (b).
3. The largest is the global maximum of f , and the smallest is the global minimum of f .
As a perk: in the process, you are sometimes able to deduce what critical points are local extrema
(not always, but often).
Example 6.4.25. Consider f (x) = |x| on the interval −1 ≤ x ≤ 2. Find its global maximum and
minimum on this domain.
Solution:
1. The only critical point is x = 0 and f (0) = 0.
2. The endpoints are −1 and 2. f (−1) = | − 1| = 1 and f (2) = |2| = 2.
3. Comparing these values, and knowing that the function is continuous between these points,
we conclude that f attains a global max at x = 2 and a global min at x = 0.
We can also therefore deduce that there is a local minimum at 0 but no local maximum.
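The three-step method for a continuous function on a closed interval amounts to comparing finitely many values. A minimal sketch, assuming the critical points have already been found by hand (the function name `global_extrema_on_closed_interval` is invented for this illustration):

```python
def global_extrema_on_closed_interval(f, a, b, critical_points):
    """Return ((x_max, f(x_max)), (x_min, f(x_min))) for f on [a, b].

    Assumes f is continuous on [a, b] and that critical_points lists
    every critical point of f; those outside (a, b) are ignored.
    """
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    x_max = max(candidates, key=f)
    x_min = min(candidates, key=f)
    return (x_max, f(x_max)), (x_min, f(x_min))


# Example 6.4.25: f(x) = |x| on [-1, 2], with critical point 0.
print(global_extrema_on_closed_interval(abs, -1.0, 2.0, [0.0]))


# Example 6.4.21: f(x) = x^2 on two different closed intervals.
def square(x):
    return x * x


print(global_extrema_on_closed_interval(square, -10.0, 15.0, [0.0]))
print(global_extrema_on_closed_interval(square, -2.0, -1.0, [0.0]))
```

Note that on $[-2, -1]$ the critical point 0 is outside the interval, so only the endpoints are compared.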
Now, suppose f is not continuous, or the interval is not closed. Then there is no guarantee that a
global maximum or minimum exists. Nevertheless, the method is similar:
1. Find all critical points c of f and evaluate f (c) at each one.
2. Evaluate f at each boundary point, and at each point of discontinuity.
3. Evaluate the left and right hand limits (as appropriate) as x approaches the boundary (if the
boundary is not in the domain) or the points of discontinuity.
4. If there is a largest value (that is, you didn’t get ∞ as a limit) and that value is actually a
point on the curve (c, f (c)) (that is, it’s not just a limit you never reach), then you have a
global maximum.
5. If there is a smallest value (that is, you didn’t get −∞ as one of your limits) and that value
is actually attained for some point on the graph (c, f (c)) (that is, it’s not just a limit you
never reach), then you have a global minimum.
Example 6.4.26. Let $f(x) = \sqrt{x}\,e^{-x}$. Find all local and global maxima and minima.
Solution: By default, we use the domain of definition, which is $[0, \infty)$.
1. We find the derivative
$$f'(x) = \left(\frac{1}{2\sqrt{x}} - \sqrt{x}\right)e^{-x} = \frac{1 - 2x}{2\sqrt{x}}\,e^{-x}.$$
The critical points are $x = 0$ (since $f'(0)$ is undefined) and $x = \frac{1}{2}$ (where $f'$ is zero). We evaluate:
$$f(0) = 0, \qquad f\left(\tfrac{1}{2}\right) = \frac{e^{-1/2}}{\sqrt{2}} > 0.$$
2. The endpoint is 0, and f (0) = 0.
3. In this case, we don't necessarily know $\lim_{x\to\infty} \sqrt{x}\,e^{-x}$ (wait for L'Hôpital's rule in Section 6.6), so we reason otherwise: when $x > 1/2$ we have $f'(x) < 0$ so $f$ decreases after $x = \frac{1}{2}$, but from the formula we deduce that $f(x) \ge 0$ for all $x \ge 0$. So the limit is between 0 and $f(\frac{1}{2})$ (in fact, it's 0).
4. Therefore: $f$ attains a global maximum at $x = \frac{1}{2}$. Moreover, since it has a global maximum at a critical point that is not an endpoint, that's a local maximum.
5. Also, $f$ has a global minimum at $(0, 0)$. It has no local minima, since the minimum is at an endpoint.
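Numerical evidence can support (though never replace) this reasoning. The following sketch evaluates $f(x) = \sqrt{x}\,e^{-x}$ at the endpoint and the critical point, and samples a few large values of $x$; the sample points 10, 50 and 100 are arbitrary choices made for illustration.

```python
import math


def f(x):
    return math.sqrt(x) * math.exp(-x)


# Values at the endpoint x = 0 and at the critical point x = 1/2.
print("f(0)   =", f(0.0))
print("f(1/2) =", f(0.5))

# A few samples far to the right: positive, decreasing, and tiny,
# which is consistent with the limit being 0 as x grows.
tail = [f(x) for x in (10.0, 50.0, 100.0)]
print("tail samples:", tail)
```

The samples stay positive and below $f(\frac{1}{2})$, which is consistent with a global maximum at $x = \frac{1}{2}$ and a global minimum at $x = 0$.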
Example 6.4.27. Consider $f(x) = \frac{\sqrt{x}}{1+x}$. Find its local and global extrema.
Solution: The domain of definition is $[0, \infty)$.
1. The derivative is
$$f'(x) = \frac{1}{(1+x)^2}\left(\frac{1}{2\sqrt{x}}(1+x) - \sqrt{x}\right) = \frac{1-x}{2\sqrt{x}(1+x)^2},$$
so the critical points are $x = -1$ (not in the domain), $x = 0$ (an endpoint) and $x = 1$ (where $f'(x) = 0$). We compute $f(1) = \frac{1}{2}$.
2. The only endpoint is 0 and $f(0) = 0$.
3. In the other extreme, we take a limit
$$\lim_{x\to\infty} \frac{\sqrt{x}}{1+x} = \lim_{x\to\infty} \frac{x^{-1/2}}{x^{-1}+1} = \frac{0}{1} = 0.$$
4. Comparing values, we see that $f$ has a global max at $x = 1$ and a global min at $x = 0$.
We also infer that $f$ has a local max at 1 but no local min. We could also infer this from other tests, as follows.
Note that we could apply the first derivative test to the critical point 1, because calculating $f''(x)$ looks hard (making the second derivative test unappealing). In fact, we can reason as follows:
$$f'(x) > 0 \iff \frac{1+x}{2\sqrt{x}} > \sqrt{x},$$
and since $\sqrt{x} > 0$, we can safely multiply both sides by $\sqrt{x}$ to get
$$\frac{1+x}{2\sqrt{x}} > \sqrt{x} \iff \frac{1+x}{2} > x \iff 1 + x > 2x \iff x < 1.$$
Thus $f$ is increasing before $x = 1$ and decreasing after, so this is a local max. In fact it must also be a global max.
MAT 1330 : Fall 2020 6.5. OPTIMIZATION
These examples came out nicely; answering about the global extrema told us about the local
extrema as well. So effectively there are several methods, not all perfect, that you can use to verify
if a critical point is a local extremum:
first derivative test;

second derivative test;
evaluating f on all critical points, endpoints and finding all limits and deducing which are
local extrema by connecting the dots.
Note. All methods are acceptable but you must share your logical reasoning — explain why a
point is irrefutably a local (or global) maximum or minimum (with reference to a test, the Extreme
Value Theorem, or how the function must behave, given that you have determined all of the critical
points etc.).
End of lecture # 12
6.5 Optimization
Finding extreme values of functions is the goal of optimization. Respecting our constraints and our
goals, we want the highest yield, the minimum cost, the highest temperature or the lowest dosage.
These are all the maximum or the minimum of a function.
6.5.1 Maximization with trade-offs
The yield of crop in agriculture changes with the amount of fertilizer (for example, nitrogen)
applied. When nitrogen levels in the soil are low, then adding some nitrogen will greatly increase
yield. When nitrogen levels are already very high, however, adding more might decrease yield.
Assume that yield $Y$ as a function of the amount of nitrogen in the soil $N$ is given by the equation
$$Y(N) = \frac{N}{1+N^2}.$$
What is the optimal level of nitrogen in the soil?
Solution. We want to choose N so as to maximize Y , so we are looking for the absolute maximum
value of the function Y (N ).
We compute
$$Y'(N) = \frac{(1+N^2) - N(2N)}{(1+N^2)^2} = \frac{1-N^2}{(1+N^2)^2},$$
so the only critical points are where $1 - N^2 = 0$, that is, $N = \pm 1$. Since $N$ is the amount of nitrogen, $N \geq 0$, and there is only one critical point to consider. Since $Y'(N) > 0$ if $0 \leq N < 1$ and $Y'(N) < 0$ if $N > 1$, we conclude that $N = 1$ gives a local and global maximum.
Thus the maximum yield is $Y(1) = \frac{1}{2}$ and it occurs at $N = 1$.
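As a sanity check on this answer, here is a minimal numerical sketch in Python. It simply evaluates $Y(N)$ on a grid and picks the largest value. The grid range $[0, 5]$ and the step size are assumptions chosen for illustration; they are not part of the problem.

```python
def crop_yield(n):
    """Yield model from the text: Y(N) = N / (1 + N^2)."""
    return n / (1.0 + n * n)

# Hypothetical grid of nitrogen levels on [0, 5] with step 0.001
# (the range and step are illustrative assumptions).
grid = [i / 1000.0 for i in range(0, 5001)]

# Brute-force search: the grid point with the largest yield.
best_n = max(grid, key=crop_yield)
best_yield = crop_yield(best_n)

print("best N on the grid:", best_n)
print("yield at that N:", best_yield)
```

A grid search like this does not prove anything, but it is a quick way to catch an algebra slip in the derivative.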
6.5.2 Areas and volumes:
Minimize the material used to produce a cylindrical can of a fixed volume $V = 355$ cm³.
Solution. We draw a picture, and introduce some variables by labeling the important parts of the
picture. We obtain equations by relating the variables. Once we have thus translated the question
into math, we can decide what we need to optimize.
Denote by r the radius of the bottom of the can and by h its height, in cm.
Then the volume is
$$V = \pi r^2 h = 355 \text{ cm}^3 \qquad (6.1)$$
and the surface area is
$$A = 2\pi r h + 2\pi r^2. \qquad (6.2)$$
So the question says we want to minimize A — but right now this is a function of two variables, r
and h (which is bad, because this is a one-variable Calculus course). We need to use the information
that V is a fixed value (355); this constraint allows us to solve for h in terms of r. Namely, from
(6.1)
$$h = \frac{355}{\pi r^2}$$
so that we rewrite (6.2) as
$$A = 2\pi r \cdot \frac{355}{\pi r^2} + 2\pi r^2 = \frac{710}{r} + 2\pi r^2$$
which expresses $A$ as a function of one variable, $r$.
Now differentiate $A$ with respect to $r$ to get
$$A' = -710r^{-2} + 4\pi r$$
whose zeros satisfy $4\pi r = 710r^{-2}$, or $r^3 = \frac{355}{2\pi}$. This has only one root, at about $r \simeq 3.8372$ cm.
Caution: We chose more than 3 significant figures at this point because we will need
our final answer to be accurate to the same precision as the data given: 3 significant
figures, and round-off error comes in whenever we use estimates.
Since $A'' = 1420r^{-3} + 4\pi > 0$ for all $r > 0$, this function is always concave up, and so our critical point is a local and global minimum. The dimensions of the cylinder having volume 355 cm³ and with minimal surface area are thus $r = 3.84$ and $h = 7.67$ (cm); the minimal surface area is $A = 278$ cm². (Note all these answers were only rounded to 3 significant digits at the last step.)
Note that if we used a parameter $V$ in place of 355 in the above computation, we could have come up with a formula for the optimal dimensions as a function of volume: $r = \sqrt[3]{V/2\pi}$, so $h = 2r$ and $A = 6\pi r^2 = 6\pi(V/2\pi)^{2/3}$.
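The general formulas are easy to evaluate numerically. The following Python sketch computes the optimal radius, height and surface area for a given volume, keeping full precision until the end as the Caution above recommends. The function name `can_dimensions` is a hypothetical helper introduced here for illustration.

```python
import math

def can_dimensions(volume):
    """Return (r, h, A) for the can of given volume with minimal surface area.

    Uses r = (V / (2*pi))^(1/3) from the critical-point computation,
    then h from the volume constraint and A from the surface-area formula.
    """
    r = (volume / (2.0 * math.pi)) ** (1.0 / 3.0)
    h = volume / (math.pi * r * r)
    area = 2.0 * math.pi * r * h + 2.0 * math.pi * r * r
    return r, h, area

r, h, area = can_dimensions(355.0)

# Round only when reporting, not during the computation.
print("r =", round(r, 4), "cm")
print("h =", round(h, 4), "cm")
print("A =", round(area, 2), "cm^2")
```

Note that doubling an already rounded radius to get the height would introduce exactly the kind of round-off error the Caution warns about.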
6.5.3 Distances
Find the distance of the line y = 1 + 2x from the origin and find the point on the line that is closest
to the origin.
Solution. Draw a picture, assign variable names and decide what we are trying to minimize.
Here, a point on the line is $(x, y)$, and its distance to the origin is $d = \sqrt{x^2 + y^2}$.
Again, there are 3 variables and we need to cut it down to two; again, there is an equation relating
x and y, namely y = 1 + 2x.
In this case: minimizing the distance is equivalent to minimizing the square of the distance, since
the square root function is an increasing function. So let’s minimize
$$D = x^2 + y^2 = x^2 + (1 + 2x)^2$$
instead, because it is easier. We have $D' = 2x + 2(1 + 2x)(2) = 10x + 4$, which has a unique critical point at $x = -\frac{2}{5}$. Since $D'' = 10 > 0$, the function is concave up everywhere, so this is a local minimum and must also be a global minimum (by concavity).
So the minimum distance occurs when $x = -\frac{2}{5}$, and thus $y = \frac{1}{5}$; the distance is
$$d = \sqrt{\left(-\frac{2}{5}\right)^2 + \left(\frac{1}{5}\right)^2} = \frac{1}{\sqrt{5}}.$$
Exercise 6.5.1. In the setting of the example above, minimize instead the distance function
$$d(x) = \sqrt{x^2 + (1 + 2x)^2} = \sqrt{5x^2 + 4x + 1}$$
(rather than the square of the distance function) and verify that you get the exact same answer
(with a touch more work).
6.5.4 Maximize yield in a DTDS.
Assume that a population grows logistically and is being harvested regularly. In this case, we model “harvesting” by saying that at each time $t$, a certain fraction $h$ of the population is removed. (Thus, a higher population yields a higher harvest, but a smaller population yields a smaller harvest.)
Actually (to make the final formula simpler), the way we determine harvest is relative to last month’s population: We start with population $x_t$, we let it grow over the course of a month, then harvest a total of $hx_t$ individuals (instead of $hx_{t+1}$).$^3$
We model our logistic population with harvesting according to the DTDS
$$x_{t+1} = 2.5x_t(1 - x_t) - hx_t$$
where $h > 0$ is a parameter which denotes the intensity of harvesting. Let us assume that this DTDS has a stable positive steady state $x^*$. Then in the long term, the yield of the harvest is
$$Y(h) = hx^*.$$
How should we choose h so that the steady state yield Y is maximized?
Solution. We notice that our function $Y(h)$ depends on $h$ and on $x^*$, but not on $x_t$ or $t$. So in fact, the first step is to find the equilibria of the DTDS and decide what $x^*$ is.
Recall that $x^*$ is an equilibrium of a DTDS with updating function $f$ if $x^* = f(x^*)$. In this case, we have updating function
$$f(x) = 2.5x(1 - x) - hx$$
so $f(x) = x$ means
$$2.5x(1 - x) - hx = x \iff 2.5x - 2.5x^2 - hx = x \iff 2.5x^2 = (1.5 - h)x.$$
Therefore we have two equilibria, $x^* = 0$ (of course) and
$$x^* = \frac{1.5 - h}{2.5} = \frac{3/2 - h}{5/2} = \frac{3 - 2h}{5}.$$
The zero equilibrium means zero harvest, so let’s consider the other one. In the question, we are told that this is a stable steady state$^4$.
$^3$ So in a slow-growing population, $h < 1$, but in a fast-growing population, you could have $h > 1$: harvesting more than what was left after the last harvest.
$^4$ which we will learn to verify in a few lectures
But wait: $\frac{1}{5}(3 - 2h)$ will be negative if $3 - 2h < 0$, meaning $h > 3/2$. That doesn’t make sense: so we have to restrict our values of $h$ to where the equilibrium is positive (so actually exists) and also where $h \geq 0$. So the domain we are considering is
$$0 \leq h \leq 1.5.$$
Good. For this range of values of $h$ we now have a function for $Y(h)$ just in terms of $h$:
$$Y(h) = hx^* = h\,\frac{(3 - 2h)}{5} = \frac{1}{5}(3h - 2h^2).$$
We want the maximum value of this function on the domain [0, 1.5], which exists by the Extreme
Value Theorem.
Now in fact this is a parabola which is concave down, so has a unique global maximum at its critical point. We compute $Y'(h) = \frac{3}{5} - \frac{4}{5}h$ so the unique critical point is where
$$\frac{3}{5} - \frac{4}{5}h = 0 \iff 4h = 3 \iff h = 3/4.$$
Since $0.75 \in [0, 1.5]$, this is in our domain; since it’s the global maximum of $Y$ on $\mathbb{R}$, it is for sure the global maximum on our interval. Success!
Thus the optimal harvesting rate is $h = 0.75$ and the maximum steady state harvest will be
$$Y^* = Y\left(\frac{3}{4}\right) = \frac{3}{4}\left(\frac{3 - \frac{3}{2}}{5}\right) = \frac{9}{40}$$
which is measured in the units of our original population.
So for example if this is an annual harvest, then by harvesting 75% of the population each year our population continues to grow to a positive steady state of $x^* = 0.3$ (30% of the maximum population possible given the limited resources, as per the logistic growth model) and we harvest 75% of this (which is 9/40).
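We can also watch this happen by simulating the DTDS directly. The sketch below, in Python, iterates the updating function for a range of harvesting rates and records the long-term yield $hx_t$. The starting population $x_0 = 0.1$, the number of iterations (500) and the grid of $h$ values are all assumptions made for illustration; the simulation is only a rough stand-in for the exact steady-state analysis above.

```python
def harvest_step(x, h):
    """One step of the DTDS x_{t+1} = 2.5 x_t (1 - x_t) - h x_t."""
    return 2.5 * x * (1.0 - x) - h * x

def steady_state(h, x0=0.1, steps=500):
    """Approximate the positive steady state by iterating the DTDS.

    x0 and steps are illustrative choices, not part of the model.
    """
    x = x0
    for _ in range(steps):
        x = harvest_step(x, h)
    return x

def long_term_yield(h):
    """Approximate Y(h) = h * x*, using the simulated steady state."""
    return h * steady_state(h)

# Hypothetical grid of harvesting rates 0.01, 0.02, ..., 1.49.
rates = [i / 100.0 for i in range(1, 150)]
best_h = max(rates, key=long_term_yield)

print("best h on the grid:", best_h)
print("simulated steady state:", steady_state(best_h))
print("simulated yield:", long_term_yield(best_h))
```

Comparing the simulated steady state for a given $h$ with the formula $(3 - 2h)/5$ is a good way to convince yourself that the equilibrium really is stable on this range.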
Of course, we can add a harvesting component to any population model. For example, if we consider a population growing according to a Ricker model $x_{t+1} = 2x_te^{1 - 0.4x_t}$, then if we harvest at a rate of $h$ (meaning: we harvest $hx_t$ just before the next $(t + 1)$ population count), this gives us a new model with harvesting of
$$x_{t+1} = 2x_te^{1 - 0.4x_t} - hx_t.$$
Exercise 6.5.2. (Challenge) You might ask: why didn’t we just set up a constant harvesting rate and optimize from there? That is, define your DTDS with constant harvesting by $x_{t+1} = 2.5x_t(1 - x_t) - h$ for some parameter $h$, meaning you harvest the same amount, regardless of population.
Biologically speaking, this isn’t a good strategy, because you know that you should harvest less when the population is small. Mathematically speaking, this is a more complex model that must take into account how intensely one can harvest before driving the population to extinction, and the stability of the fixed point is a huge concern.
6.5.5 Minimal perimeter for area
Let’s start with a farmer wanting to build a rectangular yard for his sheep. The fence costs 20 dollars per meter, and he wants to enclose 100 m². What should the dimensions be to minimize cost?
Solution. We draw a picture.
A rectangle, representing the fence and field. We label the sides a, b, a, b, which represent length
in meters.
So the perimeter of the fence is $2a + 2b$ and so the cost, in dollars, is $C = 20 \times (2a + 2b) = 40a + 40b$.
Now $C$ is currently a function of two variables, $a$ and $b$, so we are not ready yet. We look back and see that we haven’t yet taken into account that the area should be 100 m². The area of the rectangle is $ab$; so the equation is $ab = 100$ or $b = 100a^{-1}$.
That gives $C = 40a + 4000a^{-1}$. Excellent: cost as a function of the length of one side; let’s minimize.
The derivative is
$$C' = 40 - 4000a^{-2}$$
so $C' = 0$ when $4000a^{-2} = 40$ or $100 = a^2$ or $a = 10$ (we take the positive root, since $a$ is a length).
To decide if it is a minimum, we compute
$$C'' = 8000a^{-3} > 0$$
so the function is concave up at $a = 10$ (and in fact on $(0, \infty)$) so this is a global minimum on the domain $(0, \infty)$.
When $a = 10$, $b = 100/a = 10$. It’s a square!
The minimum cost is $C(10) = 400 + 400 = 800$ dollars. The minimal perimeter is $P = 40$.
So the optimal shape for minimizing perimeter for fixed area (or: for maximizing area given the
perimeter) if you start with a rectangle is a square: the most symmetric one.
Next question: consider the different regular polygons: equilateral triangle, square, regular pen-
tagon, regular hexagon, etc. Could the farmer do better? With a perimeter of P = 40, could he
get more area with a different shape? (See textbook.) Answer: yes, with a circle.
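To get a feel for this without the textbook, here is a short Python sketch comparing the areas enclosed by regular polygons and by a circle, all with perimeter 40. It relies on the standard formula $A = P^2/(4n\tan(\pi/n))$ for a regular $n$-gon with perimeter $P$, which is an outside fact not derived in these notes, and on $A = P^2/(4\pi)$ for a circle of circumference $P$.

```python
import math

def regular_polygon_area(n_sides, perimeter):
    """Area of a regular polygon with n_sides sides and the given perimeter.

    Standard formula (not derived in the notes): P^2 / (4 n tan(pi / n)).
    """
    return perimeter ** 2 / (4.0 * n_sides * math.tan(math.pi / n_sides))

def circle_area(perimeter):
    """Area of a circle whose circumference equals the given perimeter."""
    return perimeter ** 2 / (4.0 * math.pi)

perimeter = 40.0
areas = {n: regular_polygon_area(n, perimeter) for n in (3, 4, 5, 6, 12)}

for n, area in areas.items():
    print(n, "sides:", round(area, 2), "m^2")
print("circle:", round(circle_area(perimeter), 2), "m^2")
```

With the same amount of fence, each extra side buys a little more area, and the circle beats them all.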
Neat fact: Bees literally deal with this problem, wanting to use the least amount of material to
build their honeycombs but have the most space for honey. But they don’t want just one cell; they
want to create dozens of them stacked together, so if there are gaps, they would waste space (and
material). The only regular polygons that tile the plane are triangles, squares and hexagons — so
bees use hexagons!
6.5.6 Some more examples
Optimal age of reproduction Semelparous organisms, like Pacific salmon, reproduce only once
in their lifetime, and then die. Typically, they can produce more female offspring as they get older,
which is an advantage for population growth. But if they wait too long, then they might die before
they reproduce.
What then is the optimal age of reproduction?
To answer this, we need a mathematical model and we need to refine our question.
Fact:$^5$ If we denote by $\ell(x)$ the probability that an individual lives to age $x$ and by $m(x)$ the average number of female offspring of an individual at age $x$, then the average annual reproduction is given by
$$r(x) = \frac{\ln(\ell(x)m(x))}{x}.$$
We$^6$ want to maximize $r$ as a function of $x$.
Specific problem: Suppose that our semelparous organism is such that $\ell(x) = e^{-ax}$ and $m(x) = bx^c$, for some positive constants $a, b, c$. Find the value of $x$ that maximizes $r$ as above.
Solution. We are given r as a function of x, so this question is purely an extreme value problem.
Let’s simplify $r$ before differentiating:
$$r(x) = \frac{1}{x}\big(\ln(\ell(x)) + \ln(m(x))\big) = \frac{1}{x}\big(\ln(e^{-ax}) + \ln(bx^c)\big) = \frac{1}{x}\big(-ax + \ln(b) + c\ln(x)\big)$$
so
$$r(x) = -a + \frac{\ln(b)}{x} + c\,\frac{\ln(x)}{x}.$$
So
$$r'(x) = -\ln(b)x^{-2} + c\,\frac{x \cdot \frac{1}{x} - \ln(x)}{x^2} = \frac{1}{x^2}\big(-\ln(b) + c - c\ln(x)\big) = \frac{1}{x^2}\big(c - \ln(bx^c)\big).$$
The critical points are $x = 0$ (technically not a critical point, since it’s not in the domain of $r$, but it certainly is a critical value in this model) and where $c = \ln(bx^c)$. We solve this:
$$e^c = bx^c \iff x^c = \frac{e^c}{b} \iff x = e\,b^{-1/c}.$$
$^5$ For example, see Vaupel JW, Missov TI, Metcalf CJE (2013) Optimal Semelparity. PLoS ONE 8(2): e57133. https://doi.org/10.1371/journal.pone.0057133
$^6$ Why isn’t it just $\ell(x)m(x)$? Because that’s just for one individual; we have to average over the whole population size, so there are more individuals if they reproduce more often. This formula takes all that into account; see MAT2379 intro to biostatistics.
The critical point $x = 0$ is irrelevant, as it means a lifetime of length 0. The other critical point is positive, since $b > 0$. Since $\ln$ is an increasing function, as is $x^c$, we deduce that $c - \ln(bx^c)$ is decreasing, so goes from positive to negative. Thus $e\,b^{-1/c}$ is a local maximum (and in fact global maximum).
We deduce that $x = e\,b^{-1/c}$ is a formula for the optimal age of reproduction for this species.
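A formula with three parameters is easy to get wrong, so here is a small Python sketch that compares it against a brute-force search. The parameter values $a = 0.1$, $b = 0.5$, $c = 2$ and the search grid are hypothetical, chosen only for illustration.

```python
import math

def annual_reproduction(x, a, b, c):
    """r(x) = (-a x + ln(b) + c ln(x)) / x, the simplified form from the text."""
    return (-a * x + math.log(b) + c * math.log(x)) / x

def optimal_age(b, c):
    """Closed-form critical point x = e * b^(-1/c) derived above."""
    return math.e * b ** (-1.0 / c)

# Hypothetical parameter values (assumptions for illustration only).
a, b, c = 0.1, 0.5, 2.0
x_star = optimal_age(b, c)

# Brute-force check on a grid of ages from 0.5 to about 20.5.
ages = [0.5 + i / 1000.0 for i in range(20000)]
best_age = max(ages, key=lambda x: annual_reproduction(x, a, b, c))

print("formula:", x_star)
print("grid search:", best_age)
```

Notice that the optimal age does not depend on $a$: the constant $-a$ shifts $r$ up or down without moving its maximum.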
Optimal clutch size. If an organism produces only few offspring, then each has a high probability of survival; if there are many offspring then the survival probability individually declines$^7$.
At how many offspring is the total number of survivors maximized?
Again, to get to a mathematical question, we need to convert this concept to an equation using a
mathematical model.
Let $R$ denote the total resources (per adult female) available for reproduction and $N$ the clutch size. Then the amount of resources per offspring is $x = R/N$. Denote the survival probability of an offspring having resources $x$ as $f(x)$. This function should be positive (between 0 and 1) and non-decreasing (since more resources should not decrease survival). Then the expected number of surviving offspring is
$$w(x) = Nf(x) = \frac{R}{x}f(x).$$
We want to maximize the number of offspring w and we have expressed this number as a function
of x, the amount of resources per offspring.
Let us solve this question with
$$f(x) = \frac{x^2}{x^2 + k^2}$$
for some constant $k > 0$.
Solution. Since
$$w(x) = \frac{R}{x}f(x)$$
we have
$$w'(x) = R\,\frac{xf'(x) - f(x)}{x^2},$$
which gives critical points $x = 0$ (not relevant) and $xf'(x) = f(x)$.
If $f(x) = \frac{x^2}{x^2 + k^2}$, then
$$f'(x) = \frac{(x^2 + k^2)(2x) - x^2(2x)}{(x^2 + k^2)^2} = \frac{2xk^2}{(x^2 + k^2)^2}$$
so
$$xf'(x) = f(x) \iff \frac{2x^2k^2}{(x^2 + k^2)^2} = \frac{x^2}{x^2 + k^2} \iff 2k^2 = x^2 + k^2 \iff x^2 = k^2$$
gives the only critical points as $x = \pm k$; only $x = k$ is biologically relevant.
$^7$ This is often called r vs K strategy; see Wikipedia, for example
We compute $w''(x)$ to classify this critical point. To do so efficiently, let’s write
$$w'(x) = \frac{-R}{x^2}f(x) + \frac{R}{x}f'(x)$$
so that
$$w''(x) = \frac{2R}{x^3}f(x) - \frac{R}{x^2}f'(x) - \frac{R}{x^2}f'(x) + \frac{R}{x}f''(x) = \frac{2R}{x^3}\big(f(x) - xf'(x)\big) + \frac{R}{x}f''(x),$$
and at the critical point, the first term in this last expression is 0 since $xf'(x) = f(x)$. Great! So the concavity at the critical point comes down to the sign of $\frac{R}{x}f''(x)$, which is the same as the sign of $f''(x)$.
We compute
$$f''(x) = \frac{(x^2 + k^2)^2(2k^2) - 2xk^2\big(2(x^2 + k^2)(2x)\big)}{(x^2 + k^2)^4}$$
$$= \frac{(x^2 + k^2)(2k^2) - 2xk^2\big(2(2x)\big)}{(x^2 + k^2)^3}$$
$$= \frac{2x^2k^2 + 2k^4 - 8x^2k^2}{(x^2 + k^2)^3}$$
so that $f''(k) = -4k^4/(2k^2)^3 < 0$ and thus the critical point is a local maximum.
Since the function $w(x)$ is concave down at $x = k$, we deduce that $w$ is increasing before $k$ and decreasing after $k$; and since there are no other critical points, this must therefore be a global max.
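Here is a brief Python sketch that checks the conclusion numerically and translates it back into a clutch size $N = R/x$. The values $R = 10$ and $k = 2$, and the search grid, are hypothetical and chosen only for illustration.

```python
def survival(x, k):
    """Survival probability f(x) = x^2 / (x^2 + k^2)."""
    return x * x / (x * x + k * k)

def surviving_offspring(x, resources, k):
    """Expected survivors w(x) = (R / x) * f(x)."""
    return resources / x * survival(x, k)

# Hypothetical values (assumptions for illustration only).
resources, k = 10.0, 2.0

# Brute-force search over resources-per-offspring x in (0, 20].
grid = [i / 1000.0 for i in range(1, 20001)]
best_x = max(grid, key=lambda x: surviving_offspring(x, resources, k))
best_clutch = resources / best_x

print("best x on the grid:", best_x)
print("corresponding clutch size N = R/x:", best_clutch)
print("expected survivors:", surviving_offspring(best_x, resources, k))
```

At the optimum each offspring receives $x = k$ resources, which is exactly the level at which its survival probability is $f(k) = 1/2$.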
Optimize food intake by adjusting residence time. Consider a bee consuming nectar from
flowers. Suppose that it remains at each flower for a fixed amount of time before it travels to the
next flower. If that residence time is small, then the bee might leave valuable nectar behind. If it
is large, then it might be depleting all the nectar and getting less than if it went to look for the
next flower. What is the optimal residence time?
Approach: To answer this question, we need to know how much food the bee collects in $t$ time units while at one flower, measured from $t = 0$, its arrival at the flower. Let’s call this function $F(t)$.
So we do some science. First, we reason that $F$ should be a positive, non-decreasing function, but would probably be concave down after some point, as the bee has to work harder or wait longer to get more nectar, a bit like:
[Figure: Guess at what $F$ should look like.]
So we collect some data and plot points and choose a curve that seems to fit the data and our expectations well, and come up with
$$F(t) = \frac{t}{t + 0.5}.$$
But we’re not done yet. We have to take into account how long it takes the bee to go between
flowers — if the flowers are close, it’s negligible, but if the flowers are far apart, maybe bees should
spend more time where they are, right?
Thus we create a parameter: suppose now that the bee takes on average $d$ time units to fly to the next flower. Then if it spent time $t$ at the flower and time $d$ flying, and gained $F(t)$ units of nectar over that time, then the rate of nectar collection (amount of nectar per unit time) is
$$R(t) = \frac{F(t)}{t + d}.$$
The bee wants to maximize R as a function of t, the amount of time it spends at one flower.
Solution for the specific function $F(t) = \frac{t}{t + \frac{1}{2}}$. We have
$$R(t) = \frac{t}{(t + \frac{1}{2})(t + d)}, \quad \text{where } d \text{ is a positive constant parameter.}$$
Thus
$$R'(t) = \frac{(t + \frac{1}{2})(t + d) - t(2t + d + \frac{1}{2})}{(t + \frac{1}{2})^2(t + d)^2} = \frac{t^2 + (d + \frac{1}{2})t + \frac{1}{2}d - 2t^2 - (d + \frac{1}{2})t}{(t + \frac{1}{2})^2(t + d)^2}$$
thus
$$R'(t) = \frac{\frac{1}{2}d - t^2}{(t + \frac{1}{2})^2(t + d)^2}.$$
This is undefined when $t = -\frac{1}{2}$ and $t = -d$ (neither of which is biologically relevant); and it is zero when $t^2 = \frac{1}{2}d$ or $t = \pm\sqrt{d/2}$. Again, only the positive root is relevant.
Therefore there is only one biologically relevant critical point, $t = \sqrt{d/2}$. The denominator is positive; the sign of $R'(t)$ is positive when $0 < t < \sqrt{d/2}$ and is negative if $t > \sqrt{d/2}$, so this is a local (and in fact, on this domain, global) maximum.
Conclusion: in this model, the bee should spend $\sqrt{d/2}$ time at each flower to maximize its average yield. For example, if $d = 50$ then $t = 5$; if $d = 2$ then $t = 1$. Funny: if the flowers are closer together, it spends less time at each flower (but spends a larger percentage of its time on flowers).
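The two numerical examples, and the remark about the percentage of time spent on flowers, can be checked with a few lines of Python. This is only a sketch of the specific model above; the helper names are hypothetical.

```python
import math

def nectar_rate(t, d):
    """R(t) = t / ((t + 1/2)(t + d)): nectar per unit time when the bee
    stays t time units on a flower and flies d time units between flowers."""
    return t / ((t + 0.5) * (t + d))

def optimal_residence_time(d):
    """Critical point t = sqrt(d / 2) found above."""
    return math.sqrt(d / 2.0)

def fraction_on_flower(d):
    """Share of each cycle (flower time plus flying time) spent on the flower."""
    t = optimal_residence_time(d)
    return t / (t + d)

for d in (2.0, 50.0):
    t = optimal_residence_time(d)
    print("d =", d, "optimal t =", t,
          "rate =", nectar_rate(t, d),
          "fraction on flower =", fraction_on_flower(d))
```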
What does it all mean? Sometimes, it’s helpful to work with a more general formula so we can
better see the patterns.
General Solution. So now just suppose that $F$ is increasing and concave down and $R(t) = F(t)/(t + d)$ for some constant parameter $d$. We differentiate $R$:
$$R'(t) = \frac{(t + d)F'(t) - F(t)}{(t + d)^2}.$$
This is zero when $(t + d)F'(t) = F(t)$ or $F'(t) = \frac{F(t)}{t + d} = R(t)$.
Wow: this means something! The critical point is where the average rate of nectar collection ($R(t)$) is equal to the instantaneous rate of nectar collection ($F'(t)$) on the flower.
Since $F$ is concave down, the numerator $(t + d)F'(t) - F(t)$ has derivative $(t + d)F''(t) < 0$, so it is decreasing. Therefore the numerator is positive before the critical point and negative after. That is, we conclude that $R' > 0$ before the critical point and $R' < 0$ after, meaning it is a local (and in fact global) maximum.
The result is the marginal value theorem: the bee should leave the flower if the instantaneous food
intake falls below the average food intake.
This gives us an alternative solution:
Solving the $F(t) = \frac{t}{t + \frac{1}{2}}$ model with the general solution. So, specifically, if $F(t) = \frac{t}{0.5 + t}$, then $F'(t) = ((0.5 + t) - t)/(t + 0.5)^2 = 0.5/(t + 0.5)^2$. So now we solve for $t$ by setting $F'(t) = R(t)$ (the marginal value theorem):
$$\frac{0.5}{(t + 0.5)^2} = \frac{t}{(t + 0.5)(t + d)} \iff 0.5(t + d) = t(t + 0.5) \iff \frac{1}{2}d = t^2$$
so when $t = \sqrt{\frac{1}{2}d}$ (since $t$ in this case must be $\geq 0$).
End of lecture # 13
6.6 L’Hôpital’s rule for finding limits
We studied limits, and continuous functions, back in Chapter 4, and used them to help analyse
the graphs and behaviour of functions in Section 6.3. But we did encounter some functions whose
limits we could not evaluate using existing algebraic methods. Calculus, to the rescue!
6.6.1 Recall: Algebra of limits with infinity
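For infinite limits, some arithmetic is valid; for example, let $c > 0$ be any finite number. Then
$$\infty \pm c = \infty \qquad (-1)\infty = -\infty$$
$$\infty + \infty = \infty \qquad \infty \cdot \infty = \infty$$
$$c \cdot \infty = \infty$$
$$\frac{c}{0^+} = \infty \qquad \frac{c}{0^-} = -\infty$$
$$\frac{c}{\infty} = 0 \qquad \frac{\infty}{c} = \infty$$
$$\frac{\infty}{0^+} = \infty \qquad \frac{\infty}{0^-} = -\infty$$
(along with many other variations).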
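Example 6.6.1.
$$\lim_{x\to 3^-} \frac{x + 4}{x - 3} = \text{“}\tfrac{7}{0^-}\text{”} = -\infty$$
$$\lim_{x\to\infty} e^{2x}\left(\frac{1}{x} + 4\right) = \text{“}\infty \cdot 4\text{”} = \infty$$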
Note. But the following are examples of what are called indeterminate forms and their value cannot be assessed without analyzing the functions involved:
$$\frac{0}{0} \qquad \frac{\infty}{\infty} \qquad \infty - \infty \qquad 0 \cdot \infty$$
Example 6.6.2.
$$\lim_{x\to 2} \frac{x^2 - 4}{x - 2}$$
is an indeterminate form of type 0/0; to find the limit we have to algebraically manipulate the function (here, simplify: $\frac{x^2 - 4}{x - 2} = x + 2$ for $x \neq 2$) to deduce the true value (which is 4). The point is that the indeterminate form holds no information about the value of the limit.
6.6.2 Recall: evaluating limits
Earlier in the course, we talked about the limit, but then we jumped to the theorem that all our favourite functions are continuous, and in that case you can evaluate the limit
$$\lim_{x\to a} f(x)$$
by just plugging in $a$ for $x$. In many cases this works when $a = \pm\infty$ as well.
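Example 6.6.3.
$$\lim_{x\to(\pi/2)^+} \tan(x) = -\infty \quad \text{whereas} \quad \lim_{x\to(\pi/2)^-} \tan(x) = \infty$$
$$\lim_{x\to\infty} e^x = \infty \quad \text{whereas} \quad \lim_{x\to-\infty} e^x = 0$$
So therefore
$$\lim_{x\to(\pi/2)^+} e^{\tan(x)} = 0$$
since as $x \to (\pi/2)^+$, we have $\tan(x) \to -\infty$, so $e^{\tan(x)} \to 0$.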
Example 6.6.4. Find $\lim_{x\to 0^+} \frac{\ln(x)}{x}$.
We know that $\lim_{x\to 0^+} \ln(x) = -\infty$ (so there’s a vertical asymptote at 0).
So as $x \to 0^+$, the numerator goes to $-\infty$ and the denominator goes to 0 (on the positive side). Dividing by a very small positive number makes you bigger in absolute value, so we see that the fraction $\ln(x)/x$ also tends to $-\infty$, by our rules of “algebra with infinity.”
So $\lim_{x\to 0^+} \frac{\ln(x)}{x} = -\infty$.
6.6.3 Indeterminate forms and L’Hôpital’s rule
Let’s consider
$$\lim_{x\to\infty} \frac{\ln(x)}{x}.$$
This time we cannot reason it out so easily: both $\ln(x)$ and $x$ go to $\infty$ as $x \to \infty$. We call this an indeterminate form of type $\infty/\infty$; indeterminate because we can’t determine the answer without thinking more about the functions involved.
Another one we’ve previously encountered was
$$\lim_{x\to 0} \frac{\sin(x)}{x}$$
which we can solve geometrically (see Section 5.10.1).
So what’s going on with these types of limits? For ln(x)/x it’s a question of who “gets to infinity”
fastest, and for sin(x)/x, it’s about how quickly sin(x) goes to 0 as compared with x going to 0. In
other words: if they are both headed for zero or both headed for ∞, then what we need to do is
compare their rates — that is, their derivatives.
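Theorem 6.6.5 (L’Hôpital’s rule). Let $f$ and $g$ be differentiable functions such that $g'(x)$ is nonzero around $a$. If
$$\lim_{x\to a} \frac{f(x)}{g(x)} \text{ is an indeterminate form of type } \frac{0}{0} \text{ or } \pm\frac{\infty}{\infty}$$
then
$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)},$$
provided the limit on the right exists (or is $\pm\infty$).
This formula is also valid for one-sided limits, and limits $x \to \pm\infty$.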
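Example 6.6.6. Type: $\frac{\infty}{\infty}$
$$\lim_{x\to\infty} \frac{\ln(x)}{x} = \text{“}\tfrac{\infty}{\infty}\text{”, so we can apply L’Hôpital’s rule}$$
$$\overset{L'H}{=} \lim_{x\to\infty} \frac{1/x}{1} = 0$$
Example 6.6.7. Type: $\frac{0}{0}$
$$\lim_{x\to 0} \frac{\sin(x)}{x} = \text{“}\tfrac{0}{0}\text{”, so we can apply L’Hôpital’s rule}$$
$$\overset{L'H}{=} \lim_{x\to 0} \frac{\cos(x)}{1} = \cos(0) = 1$$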
Note this was a lot easier than our geometric argument — but we had to use the geometric argument
back then because we were trying to find out the derivative of sin(x)!
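Numerical evidence is no substitute for L’Hôpital’s rule, but it is a useful sanity check. The Python sketch below tabulates both ratios along sequences of inputs heading to the limit point; the particular sample points (powers of 10) are an arbitrary choice made for illustration.

```python
import math

def ratio_log(x):
    """ln(x) / x, the quotient from Example 6.6.6."""
    return math.log(x) / x

def ratio_sin(x):
    """sin(x) / x, the quotient from Example 6.6.7."""
    return math.sin(x) / x

# Sample points heading to infinity and to 0 (an illustrative choice).
large = [10.0 ** k for k in range(1, 9)]
small = [10.0 ** (-k) for k in range(1, 9)]

log_values = [ratio_log(x) for x in large]
sin_values = [ratio_sin(x) for x in small]

for x, value in zip(large, log_values):
    print("x =", x, "ln(x)/x =", value)
for x, value in zip(small, sin_values):
    print("x =", x, "sin(x)/x =", value)
```

The first column of values shrinks towards 0 and the second settles at 1, in agreement with the two computations above.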
Example 6.6.8. What about
$$\lim_{x\to 0^+} \frac{e^x}{x}?$$
It is WRONG to say this is $\lim_{x\to 0} \frac{e^x}{1} = 1$. Why? BECAUSE IT WASN’T AN INDETERMINATE FORM!!!
$$\lim_{x\to 0^+} \frac{e^x}{x} = \text{“}\tfrac{1}{0^+}\text{”} = \infty$$
and it’s quite clear that yes, there should be a vertical asymptote at $x = 0$, and this answer is the correct one.
Note. You CANNOT apply L’Hôpital’s rule UNLESS it’s an indeterminate form of type $\frac{0}{0}$ or $\pm\frac{\infty}{\infty}$.
6.6.4 Product and difference indeterminate forms
Type $0 \cdot \infty$
This doesn’t mean actually 0, it means a limit of a product where one term is shrinking to 0 and the other is growing to infinity.
For example, consider
$$\lim_{x\to 0^+} \sin(x)\ln(x).$$
Since $\sin(x) \to 0$ and $\ln(x) \to -\infty$, this is an indeterminate form of type $0 \cdot \infty$. Does the function $\sin(x)$ go to zero faster than $\ln(x)$ goes to $-\infty$? Or vice versa? Or do their rates match and cancel?
Note. Convert $0 \cdot \infty$ into $\frac{0}{0}$ or $\frac{\infty}{\infty}$ by the identity: $ab = \frac{b}{a^{-1}}$ or $ab = \frac{a}{b^{-1}}$.
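We have
$$\lim_{x\to 0^+} \sin(x)\ln(x) = \lim_{x\to 0^+} \frac{\ln(x)}{\csc(x)} \quad \text{since } \frac{1}{\sin(x)} = \csc(x)$$
$$= \text{“}\tfrac{-\infty}{\infty}\text{” so we can apply L’Hôpital’s rule}$$
$$\overset{L'H}{=} \lim_{x\to 0^+} \frac{1/x}{-\csc(x)\cot(x)} \quad \text{(still } \tfrac{\infty}{\infty}\text{, but let’s simplify)}$$
$$= \lim_{x\to 0^+} \frac{1/x}{-\cos(x)/\sin^2(x)} \quad \text{(converted to sine and cosine)}$$
$$= \lim_{x\to 0^+} \frac{\sin^2(x)}{-x\cos(x)}$$
$$= \text{“}\tfrac{0}{0}\text{” so we can apply L’Hôpital’s rule}$$
$$\overset{L'H}{=} \lim_{x\to 0^+} \frac{2\sin(x)\cos(x)}{-\cos(x) + x\sin(x)}$$
$$= 0,$$
(because the numerator is 0 and the denominator is $-1$) so $\sin(x)$ won the race.
[Figure: The graph of $y = \sin(x)\ln(x)$, which indeed tends to 0 as $x \to 0$, even though $\ln(x) \to -\infty$.]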
Type $\infty - \infty$
To solve, put over a common denominator.
Example 6.6.9. We note that
$$\lim_{x\to 0^+} (\csc(x) - \cot(x)) = \text{“}\infty - \infty\text{”, an indeterminate form}$$
so we think of ways to simplify this and/or put it over a common denominator to make it a fraction. Since it’s trig, our first thought is to write everything in terms of sine and cosine:
$$\lim_{x\to 0^+} (\csc(x) - \cot(x)) = \lim_{x\to 0^+} \left(\frac{1}{\sin(x)} - \frac{\cos(x)}{\sin(x)}\right)$$
$$= \lim_{x\to 0^+} \frac{1 - \cos(x)}{\sin(x)}$$
$$= \text{“}\tfrac{0}{0}\text{”}$$
$$\overset{L'H}{=} \lim_{x\to 0^+} \frac{\sin(x)}{\cos(x)} = 0,$$
so these two functions $\cot(x)$ and $\csc(x)$ get almost identical as $x \to 0^+$.
Once it’s over a common denominator, plug in the values again, and decide if you’ve got an
indeterminate form (so apply L’Hôpital’s rule) or else a limit that you can reason out.
Sometimes you have to use tricks like rationalization to “create” a fraction.

Example 6.6.10. Find $\lim_{x\to\infty} (\sqrt{x^3 - x} - \sqrt{x^3 + x})$. We know that $x^3 \pm x \to \infty$ as $x \to \infty$ (because we know how polynomials behave), and $\sqrt{\ }$ is a continuous function, so overall this is an indeterminate form of type $\infty - \infty$. Looking at the functions, you see the difference is always negative, and it looks like it should get bigger in absolute value as $x$ increases. Let’s see what we can do:
$$\lim_{x\to\infty} (\sqrt{x^3 - x} - \sqrt{x^3 + x}) = \text{“}\infty - \infty\text{”}$$
$$= \lim_{x\to\infty} (\sqrt{x^3 - x} - \sqrt{x^3 + x})\,\frac{\sqrt{x^3 - x} + \sqrt{x^3 + x}}{\sqrt{x^3 - x} + \sqrt{x^3 + x}}$$
$$= \lim_{x\to\infty} \frac{(x^3 - x) - (x^3 + x)}{\sqrt{x^3 - x} + \sqrt{x^3 + x}}$$
$$= \lim_{x\to\infty} \frac{-2x}{\sqrt{x^3 - x} + \sqrt{x^3 + x}}$$
$$= \text{“}\tfrac{\infty}{\infty}\text{”}$$
$$\overset{L'H}{=} \lim_{x\to\infty} \frac{-2}{\frac{1}{2}(3x^2 - 1)(x^3 - x)^{-1/2} + \frac{1}{2}(3x^2 + 1)(x^3 + x)^{-1/2}}.$$
This is terrifying, but let’s understand it piece by piece. Note that
$$\lim_{x\to\infty} \frac{1}{2}(3x^2 - 1)(x^3 - x)^{-1/2} = \lim_{x\to\infty} \frac{x^2(3 - 1/x^2)}{2x^{3/2}\sqrt{1 - 1/x^2}} = \lim_{x\to\infty} \frac{x^{1/2}(3 - 1/x^2)}{2\sqrt{1 - 1/x^2}} = \infty.$$
Similarly,
$$\lim_{x\to\infty} \frac{1}{2}(3x^2 + 1)(x^3 + x)^{-1/2} = \infty,$$
so in fact the denominator is going to $\infty + \infty = \infty$ while the numerator is constant. Thus we can conclude from the above calculations that
$$\lim_{x\to\infty} (\sqrt{x^3 - x} - \sqrt{x^3 + x}) = 0.$$
So in fact: in the limit, the little ±x under the square root didn’t make a difference, and “at
infinity” these functions are indistinguishable! (Check it out with a calculator to see that this is
completely true.)
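If you do check this with a calculator or a computer, be aware that subtracting two nearly equal square roots loses precision for large $x$. The Python sketch below compares the direct difference with the rationalized form $-2x/(\sqrt{x^3 - x} + \sqrt{x^3 + x})$ from the computation above; the sample values of $x$ are arbitrary choices for illustration.

```python
import math

def difference_direct(x):
    """sqrt(x^3 - x) - sqrt(x^3 + x), computed exactly as written.

    For very large x the two square roots agree to machine precision,
    so this subtraction can lose all of its significant digits.
    """
    return math.sqrt(x ** 3 - x) - math.sqrt(x ** 3 + x)

def difference_rationalized(x):
    """The same quantity after rationalizing: -2x / (sqrt(x^3 - x) + sqrt(x^3 + x))."""
    return -2.0 * x / (math.sqrt(x ** 3 - x) + math.sqrt(x ** 3 + x))

# Illustrative sample points.
for x in (1e2, 1e4, 1e8, 1e12):
    print("x =", x,
          "direct:", difference_direct(x),
          "rationalized:", difference_rationalized(x))
```

Both versions head to 0, but the rationalized form shows how: the difference behaves like $-1/\sqrt{x}$, so it shrinks slowly, while the direct form eventually just reports 0 because of round-off.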
Exercise 6.6.11. Our initial step was to convert our $\infty - \infty$ limit into the more tractable limit
$$\lim_{x\to\infty} \frac{-2x}{\sqrt{x^3 - x} + \sqrt{x^3 + x}}$$
and then to apply L’Hôpital’s rule. We didn’t have to use L’Hôpital’s rule, though. Show that you can reason out this limit using our old techniques.
6.6.5 Exponential indeterminate forms
Note. Three exponential indeterminate forms are $\infty^0$, $0^0$ and $1^\infty$.
Examples of these three indeterminate forms are
$$\lim_{x\to\infty} x^{1/x} \quad \text{type } \infty^0$$
$$\lim_{x\to 0^+} x^x \quad \text{type } 0^0$$
$$\lim_{x\to\infty} \left(1 + \frac{1}{x}\right)^x \quad \text{type } 1^\infty$$
In all three cases, it’s an exponent where we have two competing rules that each say the opposite:
“$\infty^0$ should be $\infty$ because the base is $\infty$” but “$\infty^0$ should be 1 because any number to the power 0 is 1” (but remember that $\infty$ is not a number and the exponent is only tending to 0 in the limit, it is not really 0)
“$0^0$ should be 0 because the base is 0”, but “$0^0$ should be 1 because the exponent is 0”
“$1^\infty$ should be 1 because 1 to any power is 1” but “if the base is just a bit bigger than 1 then $(1^+)^\infty$ should be $\infty$ because anything bigger than 1 to the power $\infty$ is $\infty$” but then again “if the base is just a bit smaller than 1 then $(1^-)^\infty$ should be 0 because anything just under 1 to the power $\infty$ is 0”. Phew!
These lines of reasoning are fine: the fact that you get contradictory answers is what says you don’t
have enough information, and that this is an indeterminate form.
The method of solution is the same for all exponential types like this: convert to base e. (Notice
how this is the solution to a lot of problems with exponentials in this course.)
Note. Recall that
$$y = f(x)^{g(x)} \iff \ln(y) = g(x)\ln(f(x)) \iff y = e^{g(x)\ln(f(x))}.$$
So suppose we figure out (by doing a fair bit of work) that
$$\lim_{x\to a} g(x)\ln(f(x)) = c.$$
Then the continuity of the exponential function tells us that
$$\lim_{x\to a} f(x)^{g(x)} = \lim_{x\to a} e^{g(x)\ln(f(x))} = e^c.$$
Example 6.6.12. So
$$\lim_{x\to\infty} x^{1/x} = \lim_{x\to\infty} e^{\ln(x)/x} = e^0 = 1$$
by Example 6.6.6.
Example 6.6.13. On the other hand
$$\lim_{x\to 0^+} x^x = \lim_{x\to 0^+} e^{x\ln(x)} = e^{\lim_{x\to 0^+} x\ln(x)}$$
and since $\lim_{x\to 0^+} x\ln(x)$ is an indeterminate form of type $0 \cdot \infty$, we have to work a bit harder; let’s focus on this limit.
$$\lim_{x\to 0^+} x\ln(x) = \lim_{x\to 0^+} \frac{\ln(x)}{1/x} \overset{L'H}{=} \lim_{x\to 0^+} \frac{1/x}{-1/x^2} = \lim_{x\to 0^+} -x = 0$$
where we had carefully checked that we had an indeterminate form of type $\infty/\infty$ before we applied L’Hôpital’s rule in the second step.
Now we can finish our calculation:
$$\lim_{x\to 0^+} x^x = e^{\lim_{x\to 0^+} x\ln(x)} = e^0 = 1.$$
The answer isn’t always 1!

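Example 6.6.14.
$$\lim_{x\to\infty} \left(1 + \frac{1}{x}\right)^x = \lim_{x\to\infty} e^{x\ln(1 + \frac{1}{x})}$$
Again, this exponent is an indeterminate form of type $0 \cdot \infty$, and so we apply the techniques of the preceding section to solve it.
$$\lim_{x\to\infty} x\ln\left(1 + \frac{1}{x}\right) = \lim_{x\to\infty} \frac{\ln(1 + \frac{1}{x})}{\frac{1}{x}}$$
$$\overset{L'H}{=} \lim_{x\to\infty} \frac{\frac{1}{1 + 1/x}\left(-\frac{1}{x^2}\right)}{-1/x^2}$$
$$= \lim_{x\to\infty} \frac{1}{1 + 1/x} = 1$$
So that
$$\lim_{x\to\infty} \left(1 + \frac{1}{x}\right)^x = \lim_{x\to\infty} e^{x\ln(1 + \frac{1}{x})} = e^1 = e.$$
This is one of the coolest formulas for $e$, ever.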
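It is fun to watch this limit converge. The Python sketch below evaluates $(1 + 1/x)^x$ two ways: directly, and through the base-$e$ form $e^{x\ln(1 + 1/x)}$ used in the computation above. The sample points are an arbitrary choice, and the use of `math.log1p` (which computes $\ln(1 + u)$ accurately for small $u$) is an implementation choice made here, not something from the notes.

```python
import math

def compound(x):
    """(1 + 1/x)^x evaluated directly."""
    return (1.0 + 1.0 / x) ** x

def compound_via_exp(x):
    """The same quantity written in base e: exp(x * ln(1 + 1/x)).

    math.log1p(u) returns ln(1 + u) and stays accurate when u is tiny.
    """
    return math.exp(x * math.log1p(1.0 / x))

# Illustrative sample points heading to infinity.
for x in (10.0, 100.0, 1000.0, 1e6):
    print("x =", x,
          "direct:", compound(x),
          "via exp:", compound_via_exp(x),
          "gap to e:", math.e - compound_via_exp(x))
```

The values creep up towards $e$ from below, and the gap shrinks roughly in proportion to $1/x$.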
Summary
To find limits of continuous functions, try plugging in the values. When the result is an indeterminate form ($0/0$, $\infty/\infty$, $0 \cdot \infty$, $\infty - \infty$, $0^0$, $\infty^0$ or $1^\infty$), either simplify it algebraically to find the answer, or else turn the limit into an indeterminate form of type $0/0$ or $\infty/\infty$ so that you can apply L’Hôpital’s rule (as many times as it takes).
As you saw in the examples in this section: always take the time to simplify; and ALWAYS check
that you have the right kind of indeterminate form before using L’Hôpital’s rule.
6.6.6 Graphing even more complex functions
Example 6.6.15. Graph the function f (x) = x2 e−x .
1. The domain is R, and f (x) = 0 only when x = 0
2. lim x2 e−x = ∞ since both x2 → ∞ and e−x → ∞; lim x2 e−x — oops, an indeterminate
x→−∞ x→∞
form of type ∞ · 0 so we have to work:
x2
lim x2 e−x = lim x
x→∞ x→∞ e
∞
= “ ” so apply L’Hôpital’s rule
∞
0 2x
=L H lim x
x→∞ e
∞
= “ ” so apply L’Hôpital’s rule
∞
L0 H 2
= lim
x→∞ ex
=0
3. $f'(x) = 2xe^{-x} + x^2 e^{-x}(-1) = (2x - x^2)e^{-x}$.
4. The only critical points are 2 and 0 (see Example 6.1.5):

              x < 0        x = 0    0 < x < 2     x = 2                     x > 2
   f'(x):     -            0        +             0                         -
   f(x) is    decreasing   0        increasing    $4e^{-2} \approx 0.541$   decreasing

5. $f''(x) = (2 - 2x)e^{-x} + (2x - x^2)e^{-x}(-1) = (x^2 - 4x + 2)e^{-x}$.
6. This is zero only if $x^2 - 4x + 2 = 0$, so when $x = \frac{1}{2}(4 \pm \sqrt{16 - 8}) = 2 \pm \sqrt{2}$:

              x < 2-√2    x = 2-√2    2-√2 < x < 2+√2    x = 2+√2    x > 2+√2
   f''(x):    +           0           -                  0           +
   f(x) is    ∪           0.19        ∩                  0.38        ∪
7. This gives the following graph, once you put all the pieces together. Start by plotting the
points of interest (x, f (x)) (the critical points, and potential inflection points), and draw
little arrows for all the limits, then connect them, using the concavity information to get the
curvature right:
Label all the features on your graph (sorry, can’t do it with FooPlot): critical points (there
are two), inflection points (there are two), local and global minimum at (0, 0), local maximum
at $(2, 4e^{-2}) \approx (2, 0.541)$, horizontal asymptote at y = 0 as x → ∞.
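Before drawing, it can be reassuring to recompute the points of interest numerically. The following is a minimal sketch in Python; the names `f`, `f1` and `f2` are our own shorthand for the function and the two derivatives computed above.

```python
import math

def f(x):
    """The function being graphed: x^2 e^{-x}."""
    return x**2 * math.exp(-x)

def f1(x):
    """First derivative: (2x - x^2) e^{-x}."""
    return (2 * x - x**2) * math.exp(-x)

def f2(x):
    """Second derivative: (x^2 - 4x + 2) e^{-x}."""
    return (x**2 - 4 * x + 2) * math.exp(-x)

critical = [0.0, 2.0]
inflection = [2 - math.sqrt(2), 2 + math.sqrt(2)]

for x in critical:
    print(f"critical point   x = {x:.3f}: f(x) = {f(x):.3f}, f'(x) = {f1(x):.1e}")
for x in inflection:
    print(f"inflection point x = {x:.3f}: f(x) = {f(x):.3f}, f''(x) = {f2(x):.1e}")
```

The heights it prints are the ones to mark on your sketch.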
Let’s do another example of graphing a function using all the information we can readily glean
from the function itself, and its first and second derivatives.
Example 6.6.16. Let $f(x) = e^{1/x}$.
Information from f :
f has domain all real numbers except x = 0.
f is always positive.
Since f is undefined at 0, we need to find the limit of f as x approaches 0, that is, what does
f look like near x = 0? So:
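\[ \lim_{x\to 0^+} e^{1/x} = \text{“}e^{\infty}\text{”} = \infty \]
since for $x > 0$, $1/x > 0$ and as $x\to 0^+$ we have $1/x\to\infty$; so $e^{1/x}\to\infty$ also. On the
other hand
\[ \lim_{x\to 0^-} e^{1/x} = \text{“}e^{-\infty}\text{”} = 0 \]
since as $x\to 0^-$, $1/x\to-\infty$; but we know that $\lim_{z\to-\infty} e^z = 0$ so we deduce $e^{1/x}\to 0$.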
This is a weird answer; so we check, but it’s right.
Finally, we would like to know if there are horizontal asymptotes, or more generally, how the
graph of f behaves as x → ∞ and x → −∞:
\[ \lim_{x\to\infty} e^{1/x} = \text{“}e^{1/\infty}\text{”} = e^0 = 1 \]
and
\[ \lim_{x\to-\infty} e^{1/x} = \text{“}e^{-1/\infty}\text{”} = e^0 = 1 \]
so there are horizontal asymptotes at both ends.
Remember: a graph can cross a horizontal asymptote! (Look at our graph for y = ln(x)/x, for
example.)
Ok, this has given us quite a few details about the graph but now we look for the bumps and
valleys, the local extrema, which really start to define the shape of the curve in between the points
we’ve figured out so far.
Next, the first derivative. We calculate

\[ f'(x) = \frac{-1}{x^2}\, e^{1/x} \]
and then:
$f'(x)$ is undefined only at $x = 0$, which is not in the domain of f anyway.
$f'(x) = 0$ is never true, since $f'(x)$ is a product of two functions and neither one is ever zero.
So f has no critical point on its domain, meaning it attains no local extrema.
We see that $e^{1/x} > 0$ for all $x \neq 0$, and $-1/x^2 < 0$ for all $x \neq 0$. So $f'(x) < 0$ for all $x \neq 0$.
This says f is decreasing on every connected component of its domain. That is, f is
decreasing on (−∞, 0) and also on (0, ∞).
Remark 6.6.17. We might also want to know
\[ \lim_{x\to 0^-} \frac{-1}{x^2}\, e^{1/x} \]
since that tells us the angle at which we will be approaching (0, 0). (We didn’t need to know this
angle for the other limits, because (try it) there are no choices: the graph flattens out to horizontal
at infinity, and at the vertical asymptote the graph gets steeper and steeper.) Written as $e^{1/x}/(-x^2)$,
the above limit is an indeterminate form of type 0/0 so we use L’Hôpital’s rule:
\[ \lim_{x\to 0^-} \frac{e^{1/x}}{-x^2} \overset{L'H}{=} \lim_{x\to 0^-} \frac{e^{1/x}(-x^{-2})}{-2x} = \lim_{x\to 0^-} \frac{e^{1/x}}{2x^3} \]
YUCK! This is worse than what we started with!! So let’s flip things around and see if it improves:
\[
\begin{aligned}
\lim_{x\to 0^-} \frac{-1}{x^2}\, e^{1/x} &= \lim_{x\to 0^-} \frac{-x^{-2}}{e^{-1/x}} && \text{(indeterminate form } \infty/\infty\text{)} \\
&\overset{L'H}{=} \lim_{x\to 0^-} \frac{2x^{-3}}{e^{-1/x}\, x^{-2}} \\
&= \lim_{x\to 0^-} \frac{2x^{-1}}{e^{-1/x}} && \text{(indeterminate form } \infty/\infty\text{)} \\
&\overset{L'H}{=} \lim_{x\to 0^-} \frac{-2x^{-2}}{e^{-1/x}\, x^{-2}} \\
&= \lim_{x\to 0^-} -2e^{1/x} = 0
\end{aligned}
\]
so the graph comes in to (0, 0) from the left at a shallow angle.
Now for the second derivative:
\[ f''(x) = \frac{x^2\left(-e^{1/x}(-x^{-2})\right) - \left(-e^{1/x}\right)(2x)}{x^4} = \frac{1 + 2x}{x^4}\, e^{1/x} \]
$f''(x)$ is undefined only for $x = 0$.
$f''(x) = 0$ when $1 + 2x = 0$, or $x = -\frac{1}{2}$.
For $x < -\frac{1}{2}$, $f''(x) < 0$ so the graph is concave down.
For $-\frac{1}{2} < x < 0$, $f''(x) > 0$ so the graph is concave up.
For $x > 0$, $f''(x) > 0$ so the graph is concave up.
We see that $(-0.5, e^{-2})$ is an inflection point since the graph changes concavity there.
From these details, we piece together a very good sketch of the graph. Your hand-drawn sketch
would be better, because you’d exaggerate the features and pick a better scale.
MAT 1330 : Fall 2020 6.7. APPROXIMATING FUNCTIONS WITH POLYNOMIALS
Graph of $y = e^{1/x}$. The point of inflection and horizontal asymptote are marked. Notice the
properties of this graph are consistent with all clues from the function and its derivatives and the
values of the limits as $x \to 0^{\pm}$ and $x \to \pm\infty$.
You should do practice problems with functions that make you fearful, because they will challenge
you! The hard parts are finding the derivatives and then finding the critical points of $f$ and of $f'$;
once you have those, it becomes a fun puzzle of connecting the dots.
Exercise 6.6.18. Sketch $f(x) = xe^{-1/x}$ and $f(x) = \dfrac{x}{\sqrt{x^2+1}}$, noting all the critical points, points
of inflection, asymptotes, and local and global extrema, as above.
End of lecture # 14
6.7 Approximating functions with polynomials
Having the full graph of a function, or a formula for it, is wonderful. But many functions are
difficult to work with and to evaluate (such as $\sqrt{x}$, $\ln(x)$, $e^x$) compared to polynomial functions
(such as linear, quadratic functions). Can we locally approximate the more complex functions with
polynomials?
This kind of approximation is used to simplify complex mathematical calculations. But it is also
important when you need to make complex mathematics accessible. For example, if the absorption
model of a drug is given by a complicated function, but you want those who use it to understand
the effect of modifying their dosage, then you are better off estimating it with a linear function
that can be explained in words.
Let’s start with a simple example of where we use these kinds of estimates quite naturally. If we
want to estimate $\sqrt{150}$, we would say: $\sqrt{144} = 12$ and $\sqrt{169} = 13$ so since $\sqrt{x}$ is a monotone
(increasing) function, we know that
\[ 12 < \sqrt{150} < 13. \]
Can we get a better estimate, without a calculator?
6.7.1 Estimating a function using a secant line
The simplest approach is to connect the dots. Formally, we are saying we might approximate our
function by a secant line approximation. That is, given a function f (x) and two points a, b defining
an interval in the domain of the function, the secant line of f from a to b is the straight line from
(a, f (a)) to (b, f (b)).
The slope of the secant line is thus

\[ m = \frac{f(b) - f(a)}{b - a} \]
and the equation for the line is
\[ y = f(a) + m(x - a) = f(a) + \frac{f(b) - f(a)}{b - a}\,(x - a). \]
We usually leave this in factored form, because it’s quite natural to evaluate $x - a$.
Example 6.7.1. Give the secant line approximation of the function $\sqrt{x}$ on the interval [144, 169],
and use it to approximate $\sqrt{150}$.
Solution: We have (a, f(a)) = (144, 12) and (b, f(b)) = (169, 13) so
\[ m = \frac{f(b) - f(a)}{b - a} = \frac{1}{25} \]
and thus the secant line is given by
\[ y = 12 + \frac{1}{25}(x - 144). \]
When x = 150, this gives $y = 12 + \frac{6}{25} = 12.24$. That’s quite a decent approximation to the actual
value $\sqrt{150} \approx 12.247$.
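The same computation can be sketched in a few lines of Python. The helper name `secant_line` is ours, chosen for illustration; it just packages the formula $y = f(a) + m(x - a)$ for any function f and interval [a, b].

```python
import math

def secant_line(f, a, b):
    """Return the function y(x) = f(a) + m*(x - a), where m is the secant slope of f on [a, b]."""
    m = (f(b) - f(a)) / (b - a)
    return lambda x: f(a) + m * (x - a)

# Secant line of sqrt(x) on [144, 169], evaluated at x = 150.
approx = secant_line(math.sqrt, 144, 169)
estimate = approx(150)
print(f"secant estimate: {estimate:.3f}")
print(f"actual value:    {math.sqrt(150):.3f}")
```

Passing a different function or a different interval to `secant_line` gives the corresponding "connect the dots" estimate.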
So this is great. But can we get a more sophisticated approximation by using our knowledge of the
derivative of f ? In particular, can we get away with approximating values of f from only knowing
f at a single point?
6.7.2 Linear approximation
The point of Calculus and the derivative is that locally near any point, the graph of a differentiable
function is quite close to its tangent line. In practice, this means you can estimate f (x) for x near
a by its linear approximation, meaning, the function for this tangent line.
That is: think of the tangent line to f at a as another function (a far simpler function!!!) which is
pretty close to f near a.
What is a formula for the tangent line to f at (a, f (a))?
\[ y - f(a) = f'(a)(x - a) \]
or
\[ y = f(a) + f'(a)(x - a). \]
Note. So given a function f(x) and a base point a, the tangent line to f at a is the graph of
\[ T(x) = f(a) + f'(a)(x - a) \]
and this function is the linear approximation to f at a (also called the linearization of f at a) in
the sense that
it’s a linear function, and
T (x) is close to f (x) for x near a, and
T (a) = f (a), that is, the functions coincide at a, and
of all the lines that satisfy these three properties, T (x) gives the best approximation to f (x).
Some other common notation for T (x) is L(x), for Linearization.

Example 6.7.2. Consider $f(x) = \sqrt{x}$ at a = 100. We have $f'(x) = \dfrac{1}{2\sqrt{x}}$. Then
\[ T(x) = f(100) + f'(100)(x - 100) = \sqrt{100} + \frac{1}{2\sqrt{100}}(x - 100) = 10 + \frac{1}{20}(x - 100). \]
So $\sqrt{150} \approx T(150) = 10 + \frac{1}{20}(50) = 12.5$. Check: $\sqrt{150} \approx 12.247$, not bad.
We should get a better approximation if we choose a base point closer to 150.

Example 6.7.3. Consider $f(x) = \sqrt{x}$ with base point a = 144. Then
\[ T(x) = f(144) + f'(144)(x - 144) = 12 + \frac{1}{24}(x - 144) \]
so $\sqrt{150} \approx T(150) = 12 + \frac{1}{24}(6) = 12.25$, which is really close; this is even better than our secant
line (“connect the dots”) estimate!
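To compare the two base points side by side, here is a small Python sketch. The helper `linearization` and the derivative function `d_sqrt` are names we introduce for illustration; the derivative is typed in by hand from the formula $f'(x) = 1/(2\sqrt{x})$.

```python
import math

def linearization(f, df, a):
    """Return T(x) = f(a) + f'(a)*(x - a), the linear approximation of f at the base point a."""
    return lambda x: f(a) + df(a) * (x - a)

def d_sqrt(x):
    """Derivative of sqrt(x), valid for x > 0."""
    return 1.0 / (2.0 * math.sqrt(x))

T100 = linearization(math.sqrt, d_sqrt, 100)
T144 = linearization(math.sqrt, d_sqrt, 144)

actual = math.sqrt(150)
for name, T in (("a = 100", T100), ("a = 144", T144)):
    print(f"{name}: T(150) = {T(150):.4f}, error = {abs(T(150) - actual):.4f}")
```

The error shrinks when the base point moves closer to 150, as expected.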
The graph of $y = \sqrt{x}$ is in red. Its tangent line at the base point a = 100 is given in blue and its
tangent line at the base point a = 144 is given in black. Compare the closeness of the
approximation at x = 150.
Example 6.7.4. Find the linear approximation to $f(x) = e^x$ near x = 0.
Solution: We write down the equation of the tangent line.
\[ T(x) = f(a) + f'(a)(x - a) = e^0 + e^0(x - 0) = 1 + x. \]
We compare:
\[ f(0.1) = e^{0.1} = 1.10517 \]
whereas
\[ T(0.1) = 1 + 0.1 = 1.1 \]
and we needed a calculator to find f(0.1) whereas we didn’t for T(0.1).
However, if x is not near a, then your linear approximation won’t be very good: for example
$e^1 \approx 2.718$ but T(1) = 1 + 1 = 2. This is obvious from the graph.
The graph of $y = e^x$ and its tangent line T(x) = 1 + x at the origin.
Note. The linear approximation $T(x) = f(a) + f'(a)(x - a)$ is useful anytime you know $f(a)$ and
$f'(a)$ and want to estimate $f(b)$ for some number b near a: just calculate $T(b)$.
If your function f is given numerically, then you might estimate its derivative by divided differences,
and then T is some nice numerical extrapolation of your data.
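As a minimal sketch of that last idea, suppose (hypothetically; these numbers are invented for illustration and are not from the notes) that a population was measured as 120 at time t = 4 and 126 at time t = 5. The divided difference between the two measurements stands in for the derivative at t = 5.

```python
def linear_extrapolation(x0, y0, x1, y1):
    """Approximate the tangent line at x1, using the divided difference (y1 - y0)/(x1 - x0) as the slope."""
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y1 + slope * (x - x1)

# Hypothetical measurements (assumed for illustration): 120 at t = 4 and 126 at t = 5.
predict = linear_extrapolation(4, 120.0, 5, 126.0)
print(predict(5.5))  # estimate of the population half a time step later
```

As with any linear approximation, the estimate is only trustworthy close to the last data point.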
Remark 6.7.5. In Physics, it is routine to replace sin(x) or cos(x) with their linearizations at 0
when the physical problem involves smaller angles. For example, if
f (x) = sin(x), a=0
then its linear approximation is
T (x) = sin(0) + cos(0)(x − 0) = 0 + 1(x) = x.
Replacing sin(x) by x makes it easier to solve for x in certain formulas, such as approximating the
period of oscillation of a pendulum, or in the diffraction of light through a lens.
6.7.3 Taylor polynomials
Linear approximation is fine, but we could do better. The second derivative tells us about the
concavity of the function, so if we took it into account, we could find a simple function that
matches both the slope of f and the concavity of f at a point a.
Example 6.7.6. Our linear approximation of $\sqrt{150}$ was too high, because the graph of $y = \sqrt{x}$ is
concave down near x = 150. Our linear approximation of $e^{0.1}$ was too low, because the graph of
$y = e^x$ is concave up near x = 0.1.
Theorem 6.7.7. Let f be a function and a a base point, and let n > 0 be such that the first n
derivatives of f exist at a. Then there is a polynomial of degree at most n, called the Taylor
polynomial of degree n and denoted $T_n(x)$, with the property that
\[ T_n(a) = f(a), \quad T_n'(a) = f'(a), \quad T_n''(a) = f''(a), \quad \cdots \quad T_n^{(n)}(a) = f^{(n)}(a) \]
where $f^{(n)}(a)$ denotes the nth derivative of f evaluated at the number a.
That’s quite cool: it says for any function whatsoever, you can come up with a polynomial that
matches it perfectly, up to the nth derivative, at the point a. (And notice that this is not about
fitting a curve to datapoints — we are only using our knowledge of the function at a single point
a.)
But even cooler is the formula for Tn (x):
\[ T_n(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n. \]
Note that:
$f^{(n)}(a) = \dfrac{d^n}{dx^n} f(x)$ at x = a, the n-th derivative;
$n! = 1\cdot 2\cdot 3\cdots (n-1)\cdot n$, called “n-factorial”: for example $3! = 1\cdot 2\cdot 3 = 6$, $4! = 1\cdot 2\cdot 3\cdot 4 = 24$
and by convention we say $0! = 1$;
$T_1(x) = T(x)$, the linear approximation = the function defining the tangent line to f at a;
You get the nth approximation from the (n − 1)st approximation by just adding one term:
\[ \frac{1}{n!}\, f^{(n)}(a)(x - a)^n. \]
Example 6.7.8. So let $f(x) = \sqrt{x}$, so that $f'(x) = \dfrac{1}{2\sqrt{x}}$, $f''(x) = \dfrac{-1}{4x^{3/2}}$. At a = 100 we have
$\sqrt{a} = 10$ and $a^{3/2} = 10^3 = 1000$, so
\[
\begin{aligned}
T_2(x) &= f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 \\
&= 10 + \frac{1}{20}(x - 100) + \frac{-1}{2\cdot 4000}(x - 100)^2 \\
&= 10 + \frac{1}{20}(x - 100) - \frac{1}{8000}(x - 100)^2
\end{aligned}
\]
(which will indeed be less than our linear approximation, as desired) and we can evaluate
\[ T_2(150) = 10 + \frac{50}{20} - \frac{50^2}{8000} = 10 + \frac{5}{2} - \frac{5}{16} = 12.1875. \]
This time we are a little too small, but much closer.
We can always do better by choosing the base point a closer to the x at which we want the
approximation.
Example 6.7.9. Find $T_2(x)$ for $f(x) = \sqrt{x}$ at a = 144, and use it to estimate $\sqrt{150}$.
Solution: as above, we need to know $\sqrt{144} = 12$, $144^{3/2} = 12^3$ so
\[ T_2(x) = 12 + \frac{1}{24}(x - 144) - \frac{1}{2\cdot 4\cdot 12^3}(x - 144)^2 \]
and therefore
\[ T_2(150) = 12.25 - \frac{36}{8\cdot 12^3} = 12.25 - \frac{1}{384} \approx 12.247 \]
which is correct to three decimal places!
The graph of $y = \sqrt{x}$ is in red. Its quadratic Taylor approximation at the base point a = 100 is
given in blue and its quadratic Taylor approximation at the base point a = 144 is given in black.
Compare the closeness of the approximation over a large interval to that of the linear
approximation.
So we can get a better approximation by choosing a base point closer to x; or we can get a better
approximation by choosing a higher order Taylor polynomial.
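Example 6.7.10. Find $T_3(x)$ for $f(x) = \sqrt{x}$ at base point a = 100, and use it to estimate $\sqrt{150}$.
Solution:
\[
\begin{aligned}
T_3(x) &= f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 \\
&= \sqrt{a} + \frac{1}{2\sqrt{a}}(x - a) + \frac{-1}{2\cdot 4a^{3/2}}(x - a)^2 + \frac{1}{6}\cdot\frac{3}{8a^{5/2}}(x - a)^3 \\
&= 10 + \frac{1}{20}(x - 100) - \frac{1}{8000}(x - 100)^2 + \frac{1}{16\cdot 10^5}(x - 100)^3
\end{aligned}
\]
so that
\[ T_3(150) = 12.1875 + \frac{5}{64} \approx 12.2656. \]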
So we can approximate $\sqrt{x}$ with a linear, quadratic, cubic or even higher order polynomial.
What happens if we do the cubic Taylor approximation of a cubic function?
Example 6.7.11. Find the cubic Taylor polynomial of $f(x) = x^3$ at the point a = 8 and use it to
estimate $(8.1)^3$.
Solution: We have $f(x) = x^3$ so $f'(x) = 3x^2$, $f''(x) = 6x$ and $f'''(x) = 6$. Thus $f(a) = 8^3 = 512$,
$f'(a) = 3(64) = 192$, $f''(a) = 6(8) = 48$ and $f'''(a) = 6$. Then
\[ T_3(x) = 512 + 192(x - 8) + \frac{48}{2}(x - 8)^2 + \frac{6}{3!}(x - 8)^3 = 512 + 192(x - 8) + 24(x - 8)^2 + (x - 8)^3 \]
which gives
\[ T_3(8.1) = 512 + 19.2 + 0.24 + 0.001 = 531.441 \]
which is exactly spot on!
Amazing, right? Well, not so amazing. If you multiply out the formula for $T_3(x)$ you get simply
$x^3$. So Taylor’s theorem in this case just gives us a really cool refactorisation of $x^3$ which makes it
easy to evaluate near the base point.
6.7.4 More examples of Taylor approximations
Example 6.7.12. Find the 4th Taylor polynomial of f(x) = cos(x) at the base point a = 0.
Solution: We make a table:

    function                  evaluated at a = 0    Taylor polynomial coefficient (1/n!) f^(n)(a)
    f(x)       =  cos(x)      1                     1
    f'(x)      = -sin(x)      0                     0
    f''(x)     = -cos(x)      -1                    -1/2
    f'''(x)    =  sin(x)      0                     0
    f^(4)(x)   =  cos(x)      1                     1/4! = 1/24

and conclude that
\[ T_4(x) = 1 - \frac{1}{2}(x - 0)^2 + \frac{1}{24}(x - 0)^4 = 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4. \]
Remark 6.7.13. In fact, you can continue this pattern infinitely to get the Taylor series of f at
a = 0:
\[ \cos(x) = 1 - \frac{1}{2}x^2 + \frac{1}{24}x^4 - \frac{1}{6!}x^6 + \frac{1}{8!}x^8 \pm \cdots \]
and yes, if you really could add all these infinitely many terms, the answer would completely equal
cos(x). Basically, this is how your calculator evaluates cos(x) (with x in RADIANS!!); it works
better than drawing triangles and measuring ratios...
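To watch the partial sums of this series close in on cos(x), here is a minimal Python sketch. The function name `cos_taylor` is ours, and this is only an illustration of the series; it is not a claim about how any particular calculator is actually implemented.

```python
import math

def cos_taylor(x, n_terms):
    """Sum the first n_terms nonzero terms of the Taylor series of cos at a = 0 (x in radians)."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(n_terms))

x = 1.0  # radians
for n_terms in (1, 2, 3, 5, 10):
    print(f"{n_terms:>2} terms: {cos_taylor(x, n_terms):.12f}")
print(f"math.cos: {math.cos(x):.12f}")
```

With three terms this is exactly $T_4(x)$ from Example 6.7.12; adding more terms improves the match quickly for x near 0.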
Example 6.7.14. Find the third Taylor polynomial of ln(x) at base point a = 1 (you can’t use
a = 0 here!!) and use it to estimate ln(1.1).
Solution: We make a table

    function                  evaluated at a = 1    Taylor polynomial coefficient (1/n!) f^(n)(a)
    f(x)       =  ln(x)       0                     0
    f'(x)      =  1/x         1                     1
    f''(x)     = -1/x^2       -1                    -1/2
    f'''(x)    =  2/x^3       2                     2/3! = 1/3

so we have
\[ T_3(x) = 0 + (x - 1) - \frac{1}{2}(x - 1)^2 + \frac{1}{3}(x - 1)^3 \]
and thus we estimate ln(1.1) by
\[ T_3(1.1) = 0.1 - \frac{1}{2}(0.01) + \frac{1}{3}(0.001) \approx 0.0953. \]
With a calculator, we check that ln(1.1) = 0.0953101798; pretty fabulous approximation to have
done by hand.
Again, you can infer a pattern from the derivatives and figure out the Taylor series in this case.
Exercise 6.7.15. Estimate $\sqrt{6}$ using the cubic Taylor polynomial of $f(x) = \sqrt{x}$ at the base point
a = 4.
Exercise 6.7.16. Estimate sin(1) (radians!!!!) using the quintic Taylor polynomial of f (x) = sin(x)
at the base point a = 0.
Our final question would be: can we estimate the error
\[ |f(x) - T_n(x)| \]
on the approximation, for a given value of x near a, if we do not know f (x)? This is a hugely
significant question, and absolutely essential in mathematical modeling. After all, in science it’s
not good enough to say that the answer is “near” 5; we need to say something like “it’s within the
range 5 ± 0.5”.
In brief: the answer is yes, we certainly can; Proposition 6.7.23, coming up below, is just the barest
of introductions to the subject, which you can explore in much greater depth in a subsequent course
on mathematical modeling or differential equations.
To get there, we need to talk about the second (of a trio) of theorems in Calculus.
6.7.5 The Mean Value Theorem
We know how to get from a function to its derivative: take the limit of the slope of the secant line.
But soon we will want to go backwards: from the derivative back to the function. The first clue is
a deceptively simple theorem which will end up being the key to understanding how things work,
called the Mean Value Theorem. (This is one of the grand trio of key theorems about continuous
and differentiable functions: the Intermediate Value Theorem, the Mean Value Theorem and the
Extreme Value Theorem.)
Theorem 6.7.17 (Mean Value Theorem). Suppose f is a function that satisfies the following
hypotheses:
(i) f is continuous on the closed interval [a, b] (and maybe more); and
(ii) f is differentiable on the interval (a, b) (and maybe more).
Then there is at least one (unknown) value c somewhere in the interval (a, b) for which
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
That is, this theorem says that for a nice function like f , at some point in the interval, the
instantaneous rate of change of f is equal to the average rate of change over the whole interval (or,
the slope of the secant line on that interval).
Let us agree that this is very plausible. If you drive from Ottawa to Montreal (200km) in 2 hours
(f (b) − f (a) = 200, b − a = 2) you could not have driven less than 100km/h (your average speed)
the whole time; nor could you have driven more than 100km/h the whole time. At some instant —
maybe just ONE instant in the whole 2 hours, maybe for lots of minutes during those two hours,
you were driving exactly 100km/h. Any one of those instants could be called c.
6.7.6 Applications of the Mean Value Theorem (optional)
Sometimes we can solve for c.

Example 6.7.18. Suppose $f(x) = x^3$ on the interval [0, 1]. Then the slope of the secant line on
that interval is
\[ m = \frac{f(b) - f(a)}{b - a} = \frac{1^3 - 0^3}{1 - 0} = 1. \]
The derivative of f(x) is $f'(x) = 3x^2$. So the derivative equals the slope of the secant line when
\[ 3x^2 = 1 \iff x^2 = \frac{1}{3} \iff x = \pm\frac{1}{\sqrt{3}}. \]
So setting $c = \frac{1}{\sqrt{3}}$ we have $f'(c) = 1$ and $a \le c \le b$, as required.
But where this is particularly interesting is when we can’t solve for c — the theorem still tells us
that c exists.
Example 6.7.19. Consider f(x) = sin(x) ln(x) on the interval [π, 2π]. Since f(π) = 0 and f(2π) =
0, the slope of the secant line is 0. Therefore, by the Mean Value Theorem, there is some number
c between π and 2π such that $f'(c) = 0$. If we calculate:
\[ f'(x) = \cos(x)\ln(x) + \frac{\sin(x)}{x} \]
we are completely stuck: it is impossible to solve $f'(x) = 0$ algebraically!! But we know a solution exists.
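Even though we cannot solve for c by hand, we can hunt for it numerically once we know it exists. The sketch below uses bisection, a method of our choosing that is not part of these notes; it relies only on the fact that $f'$ is continuous and has opposite signs at π and 2π. The function names are ours.

```python
import math

def fprime(x):
    """Derivative of sin(x) ln(x), valid for x > 0."""
    return math.cos(x) * math.log(x) + math.sin(x) / x

def bisect(g, lo, hi, tol=1e-10):
    """Find a root of g in [lo, hi], assuming g is continuous and g(lo), g(hi) have opposite signs."""
    if g(lo) * g(hi) > 0:
        raise ValueError("g must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid   # the sign change is in the left half
        else:
            lo = mid   # the sign change is in the right half
    return (lo + hi) / 2

c = bisect(fprime, math.pi, 2 * math.pi)
print(f"c = {c:.6f}, f'(c) = {fprime(c):.2e}")
```

The Mean Value Theorem is what guarantees the search is not in vain.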
This special case of the Mean Value Theorem is so common that it has its own name.
Theorem 6.7.20 (Rolle’s theorem). Let f be a function and let [a, b] be an interval in the domain
of f. Suppose that
f is continuous on [a, b];
f is differentiable on (a, b); and
f(a) = f(b).
Then there is a number c ∈ (a, b) such that $f'(c) = 0$.
Notice that this is just the Mean Value Theorem in the case that m = 0.
Example 6.7.21. Suppose an object follows a straight line path, and occupies the same position
at two different moments in time; that is, there are times $t_1 < t_2$ such that $s(t_1) = s(t_2)$.
It then follows that there was a time in between, say $t_3$, such that $s'(t_3) = 0$. But this is just saying
the velocity was zero at time $t_3$, such as when it turned around.
6.7.7 Proof of Rolle’s theorem and advanced applications of the Mean Value
Theorem (optional)
Proof of Rolle’s theorem. If f is a constant function, then in fact $f'(c) = 0$ for all c ∈ (a, b), so the
theorem is true but boring.
So let’s assume f is not a constant function. Then since it is continuous, by the Extreme Value
Theorem it attains its absolute maximum and its absolute minimum somewhere on [a, b].
If the absolute maximum is f(a) = f(b), then the absolute minimum has to be somewhere in the
middle, at a point c ∈ (a, b), meaning it is a local minimum. Since f is differentiable, then c is a
critical point with $f'(c) = 0$. Done.
Otherwise, the absolute maximum is somewhere in the middle, at a point c ∈ (a, b), meaning it is
a local maximum. Again, this means $f'(c) = 0$.
So no matter what: there is some point c ∈ (a, b) such that $f'(c) = 0$.
As a more advanced application of the Mean Value Theorem, we can prove the following result
about the function sin(x).
Proposition 6.7.22. For all a, b ∈ R,
| sin(a) − sin(b)| ≤ |a − b|.
Interpretation: the difference in the y-values of the function y = sin(x) is always less than or equal
to the difference in the x-values; i.e. the slope of any secant line is, in absolute value, always less
than or equal to 1. That sounds good!
Proof. The Mean Value Theorem applies to f(x) = sin(x), and says that for every a < b in R there
is a number c ∈ [a, b] such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a} \]
or, since $f'(c) = \cos(c)$, we can rewrite this as
\[ \frac{\sin(b) - \sin(a)}{b - a} = \cos(c). \]
Taking absolute values of both sides, this gives
\[ \left|\frac{\sin(b) - \sin(a)}{b - a}\right| = |\cos(c)| \le 1. \]
Therefore
\[ \frac{|\sin(b) - \sin(a)|}{|b - a|} \le 1 \]
which gives
\[ |\sin(b) - \sin(a)| \le |b - a|. \]
Since |x − y| = |y − x|, this is equivalent to the inequality we wanted to prove.
The Mean Value Theorem can also be used to figure out how far from the correct answer your
Taylor approximation can be, that is, to estimate the error $|f(x) - T_n(x)|$. As a simple example,
we do this for n = 0, which is the constant approximation $T_0(x) = f(a)$.
Proposition 6.7.23. If for all x ∈ [a, b], we have $m \le f'(x) \le M$, then
\[ m(x - a) \le f(x) - f(a) \le M(x - a) \]
for all x ∈ [a, b].
Proof. For x = a all three quantities are zero, so assume $a < x \le b$. By the Mean Value Theorem,
$\dfrac{f(x) - f(a)}{x - a}$ is equal to $f'(c)$ for some c between a and x; by
hypothesis this number satisfies $m \le f'(c) \le M$. Since x − a > 0 in this case, we can multiply
all parts of the inequality by x − a and the inequalities keep the same direction. The same argument
works for every such x, so the bounds hold for all x ∈ [a, b].
End of lecture # 15
6.8 Stability of Discrete Time Dynamical Systems
Recall: A DTDS is an iteration
\[ x_{t+1} = f(x_t) \]
over discrete time periods t = 0, 1, 2, .... A fixed point or equilibrium or steady state is a value $x^*$
that satisfies $x^* = f(x^*)$. A fixed point is called stable if all solutions from nearby initial values
converge to $x^*$, meaning
\[ \lim_{t\to\infty} x_t = x^* \quad \text{if } x_0 \text{ is near } x^* \]
and unstable otherwise.
Our method for distinguishing stable from unstable fixed points was cobwebbing. For example, if
we consider the DTDS
\[ x_{t+1} = f(x_t) = \frac{2x_t}{1 + x_t} \]
we see it has two fixed points, $x_1^* = 0$ and $x_2^* = 1$. If we draw the cobweb on some initial value
between the two fixed points, we get the following:
A cobweb for $x_{t+1} = (2x_t)/(1 + x_t)$.
which shows that $x_1^*$ is unstable and $x_2^*$ is stable. (Well, we should also check an initial value beyond
$x_2^*$; exercise.)
Goal: Find an analytical way to distinguish stable and unstable fixed points, that is, a method
that doesn’t rely on graphing and cobwebbing, or on numerical tests.
6.8.1 Stability of linear DTDS (recall)
A linear DTDS is a special kind of DTDS, where the updating function f(x) is a linear function
f(x) = rx + c. If the slope $r \neq 1$, then the corresponding DTDS has exactly 1 fixed point
\[ x^* = rx^* + c \iff (1 - r)x^* = c \iff x^* = \frac{c}{1 - r}. \]
We had examined the stability of the fixed point in Section 3.6. Our conclusion was:
Note. For a linear DTDS $x_{t+1} = rx_t + c$, with slope $r \neq 1$, the fixed point $x^* = \frac{c}{1-r}$ is stable if
the slope satisfies |r| < 1 and unstable if the slope satisfies |r| > 1.
6.8.2 Stability of general DTDS
We saw in Section 6.7.2 that we can approximate a function near a point like $a = x^*$ by its
linearization. So the stability of the fixed point should be determined by the stability of the fixed
point of the corresponding linear DTDS.
So the idea is: near a fixed point $x^*$, the DTDS $x_{t+1} = f(x_t)$ should have similar behaviour to the
DTDS $y_{t+1} = L(y_t)$, where L is the linearization of f at $x^*$.
What is the slope r of the linearization? Why, it’s the derivative evaluated at $a = x^*$, that is,
$r = f'(x^*)$, of course!
Example 6.8.1. Let’s check in our previous example. We have
\[ f(x) = \frac{2x}{1 + x} \]
so
\[ f'(x) = \frac{(1 + x)\cdot 2 - 2x(1)}{(1 + x)^2} = \frac{2}{(1 + x)^2}. \]
We note that f(0) = 0 and $f'(0) = 2$ so the linearization of this DTDS at $x^* = 0$ is
\[ y_{t+1} = 2y_t \]
and 0 is an unstable fixed point of this DTDS since r = 2 > 1. On the other hand, f(1) = 1 and
$f'(1) = 0.5$ so the linearization of this DTDS at $x^* = 1$ is
\[ y_{t+1} = 0.5(y_t - 1) + 1 \]
and 1 is a stable fixed point of this DTDS since r = 0.5 satisfies |r| < 1.
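Here is a minimal Python sketch of this check, which also iterates the DTDS from a small initial value to see where the solution goes. The helper names `classify` and `iterate` are ours, introduced for illustration.

```python
def f(x):
    """Updating function of the DTDS x_{t+1} = 2 x_t / (1 + x_t)."""
    return 2 * x / (1 + x)

def fprime(x):
    """Derivative of the updating function, computed by hand above."""
    return 2 / (1 + x) ** 2

def classify(slope):
    """Compare |slope| with 1; when |slope| = 1 the comparison tells us nothing."""
    if abs(slope) < 1:
        return "stable"
    if abs(slope) > 1:
        return "unstable"
    return "inconclusive"

def iterate(update, x0, steps):
    """Apply the updating function `steps` times, starting from x0."""
    x = x0
    for _ in range(steps):
        x = update(x)
    return x

for x_star in (0.0, 1.0):
    print(f"x* = {x_star}: f'(x*) = {fprime(x_star)}, {classify(fprime(x_star))}")
print("starting at 0.01, after 100 steps:", iterate(f, 0.01, 100))
```

A solution starting just above the unstable fixed point 0 moves away from it and settles at the stable fixed point 1.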
Note that in fact, in the preceding example, we didn’t need to actually write down the linearized
DTDS: it was enough to find the value of $f'(x^*)$ to figure out how the stability would work out.
This leads to the following theorem.
Theorem 6.8.2 (Stability of Fixed Points). Suppose $x_{t+1} = f(x_t)$ is a DTDS and $x^*$ is a fixed
point. We have:
if $|f'(x^*)| < 1$, then $x^*$ is a stable fixed point; and
if $|f'(x^*)| > 1$, then $x^*$ is an unstable fixed point.
If $|f'(x^*)| = 1$, we can’t use this test; see further courses on differential equations.
The idea of the proof. To make the proof easier to read, let’s set
\[ a = x^*, \]
the fixed point in question of the DTDS. Let’s write down the linearization of f at $a = x^*$:
\[
\begin{aligned}
L(x) &= f(a) + f'(a)(x - a) \\
&= a + f'(a)x - f'(a)a && \text{using } f(a) = a \text{, a fixed point} \\
&= rx + c
\end{aligned}
\]
where we have gathered our terms and see that
\[ r = f'(a) \quad \text{and} \quad c = a(1 - f'(a)). \]
If $x_t$ is close to a, then since the linearization gives a good approximation of f near a we have
\[ x_{t+1} = f(x_t) \approx L(x_t) = rx_t + c \]
which is saying that $x_t$ approximately satisfies the linear DTDS
\[ x_{t+1} = rx_t + c \]
whose fixed point is also a (check this, using the formulas for r and c above!).
So if $|f'(a)| = |r| < 1$, then since a is a stable fixed point of the linear DTDS, it attracts all
solutions, and $x_{t+1}$ will be closer to a than $x_t$. So if we repeat the argument, $x_{t+2}$ will be closer to
a, and so on. Thus it follows that $\lim_{t\to\infty} x_t = a$, and the fixed point is stable.
But if $|f'(a)| = |r| > 1$, then $x_{t+1}$ will be further away from a, so the approximation will get
worse, not better. Thus although we don’t know what $\lim_{t\to\infty} x_t$ is, we at least can be sure that
$\lim_{t\to\infty} x_t \neq a$, and the fixed point is unstable.
Example 6.8.3. Suppose we have the DTDS
\[ x_{t+1} = x_t e^{-3x_t + 1}. \]
Its fixed points are the solutions of $x = xe^{-3x+1}$, so $x^* = 0$, or $1 = e^{-3x+1}$, that is, $x^* = \frac{1}{3}$. Are these
fixed points stable or unstable? By the theorem, we should evaluate $f'(x^*)$ for each, and compare
its absolute value with 1.
Since $f(x) = xe^{-3x+1}$, $f'(x) = e^{-3x+1} - 3xe^{-3x+1} = e^{-3x+1}(1 - 3x)$. So:
\[ |f'(0)| = |e| > 1 \implies x^* = 0 \text{ is unstable, by the theorem} \]
and
\[ \left|f'\left(\tfrac{1}{3}\right)\right| = |e^0 (0)| = 0 < 1 \implies x^* = \tfrac{1}{3} \text{ is stable, by the theorem.} \]
We can confirm this by drawing the graph of $y = f(x) = xe^{-3x+1}$ and cobwebbing.
6.8.3 Example: Allee effect
Consider a population displaying the Allee effect, and described by the DTDS
\[ x_{t+1} = f(x_t) = \frac{3x_t^2}{1 + x_t^2}. \]
Find its fixed points and classify them according to their stability.
Solution: First, we find the fixed points. We solve
\[ x = f(x) = \frac{3x^2}{1 + x^2} \iff x(1 + x^2) = 3x^2 \]
so either x = 0 (the usual fixed point we expect) or $1 + x^2 = 3x$ which means
\[ x^2 - 3x + 1 = 0 \iff x = \frac{3 \pm \sqrt{9 - 4}}{2} = \frac{3}{2} \pm \frac{\sqrt{5}}{2}. \]
We note that $\sqrt{5} < 3$ (or use a calculator) to see that both of these roots are positive, which gives
a total of three non-negative fixed points:
\[ x_1 = 0, \quad x_2 = \frac{3}{2} - \frac{1}{2}\sqrt{5} \approx 0.382, \quad x_3 = \frac{3}{2} + \frac{1}{2}\sqrt{5} \approx 2.618. \]
2 2 2 2
Now, we discuss stability of the fixed points. For that, we need to find $f'(x)$:
\[ f'(x) = \frac{(1 + x^2)(6x) - 3x^2(2x)}{(1 + x^2)^2} = \frac{6x}{(1 + x^2)^2}. \]
Therefore:
At $x_1 = 0$, we have $f'(0) = 0$. Since $|f'(0)| < 1$, this is a stable fixed point. (So if our
population is too small, it dies out.)
At $x_2$, we have
\[
\begin{aligned}
f'(x_2) &= \frac{6x_2}{(1 + x_2^2)^2} \\
&= \frac{6x_2}{(3x_2)^2} && \text{since } x_2 \text{ is a root of } 1 + x^2 = 3x \text{, see above} \\
&= \frac{6}{9x_2} = \frac{2}{3x_2} \approx 1.745 > 1
\end{aligned}
\]
so that $x_2$ is an unstable fixed point.
At $x_3$, the same argument gives us
\[ f'(x_3) = \frac{2}{3x_3} \approx 0.255 < 1 \]
so that $x_3$ is a stable fixed point.
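A quick numerical check of the fixed points and of the derivative at each of them, as a minimal Python sketch (the variable names are ours):

```python
import math

def f(x):
    """Updating function of the Allee-effect DTDS."""
    return 3 * x**2 / (1 + x**2)

def fprime(x):
    """Derivative of the updating function: 6x / (1 + x^2)^2."""
    return 6 * x / (1 + x**2) ** 2

fixed_points = [0.0, (3 - math.sqrt(5)) / 2, (3 + math.sqrt(5)) / 2]

for x_star in fixed_points:
    slope = fprime(x_star)
    # None of these slopes has absolute value exactly 1, so the test always applies here.
    label = "stable" if abs(slope) < 1 else "unstable"
    print(f"x* = {x_star:.3f}: f(x*) = {f(x_star):.3f}, f'(x*) = {slope:.3f}, {label}")
```

Evaluating $f'$ directly, rather than through the shortcut $1 + x^2 = 3x$, is a useful independent check on the algebra.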
We can confirm this by cobwebbing on the graph:
6.8.4 Logistic growth
A population growing under logistic growth follows the dynamics of a DTDS of the form
\[ x_{t+1} = rx_t(1 - x_t) \]
for some parameter r satisfying r > 0. Let’s find the fixed points and classify their stability.
(Notice that we can’t use cobwebbing to solve this question because to cobweb, we need to sketch
the graph, which means we have to choose a value of r.)
The updating function is
\[ f(x) = rx(1 - x). \]
To find the fixed points, we solve x = f(x), which gives
\[ x = rx(1 - x) \]
so the fixed points satisfy either x = 0 or else
\[ 1 = r(1 - x) \iff 1 = r - rx \iff rx = r - 1 \iff x = \frac{r - 1}{r}. \]
This second fixed point is thus positive (and relevant) only if r > 1.
To determine stability, we compute the derivative of f; since $f(x) = rx - rx^2$ this is
\[ f'(x) = r - 2rx. \]
So the fixed point $x^* = 0$ gives $f'(x^*) = f'(0) = r$, meaning that it is stable if |r| < 1 and unstable
if |r| > 1. But we said at the beginning that r > 0; so we conclude that 0 is stable when 0 < r < 1
and is unstable when r > 1.
Now assume r > 1 and we have a second relevant fixed point $x^* = \frac{r-1}{r}$. We have $f'(x^*) =
r - 2(r - 1) = 2 - r$. Thus $x^*$ is stable if |2 − r| < 1. How do we solve this?
First method: |2 − r| is the distance between 2 and r. So we need r to be less than one unit
away from 2, which means 1 < r < 3.
Second method: the numbers with absolute value less than 1 are the numbers between −1
and 1. Therefore we have
\[ -1 < 2 - r < 1 \iff -3 < -r < -1 \iff 1 < r < 3. \]
On the other hand, $x^*$ is unstable if |2 − r| > 1. Remembering that r > 1, we deduce that this
comes down to r > 3.
Putting these together gives the several different cobwebbing scenarios we found in Example 3.7.4:

    range of r      fixed points    stability
    0 < r < 1       0               stable
    1 < r < 3       0               unstable
                    (r-1)/r         stable
    r > 3           0               unstable
                    (r-1)/r         unstable
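The table can be turned into a small Python sketch that, for a given value of r, lists the relevant fixed points and applies the derivative test. The function name `logistic_fixed_points` and the label "inconclusive" (for the borderline cases $|f'(x^*)| = 1$) are our own choices.

```python
def logistic_fixed_points(r):
    """For x_{t+1} = r x_t (1 - x_t) with r > 0, return (fixed point, slope, classification) tuples."""
    def classify(slope):
        if abs(slope) < 1:
            return "stable"
        if abs(slope) > 1:
            return "unstable"
        return "inconclusive"

    points = [0.0]
    if r > 1:
        # The second fixed point (r - 1)/r is positive only when r > 1.
        points.append((r - 1) / r)
    # f'(x) = r - 2 r x, evaluated at each fixed point.
    return [(x, r - 2 * r * x, classify(r - 2 * r * x)) for x in points]

for r in (0.5, 2.5, 3.5):
    for x_star, slope, label in logistic_fixed_points(r):
        print(f"r = {r}: x* = {x_star:.3f}, f'(x*) = {slope:.3f}, {label}")
```

Each of the three sample values of r falls in a different row of the table.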
We had to do some clever work with absolute values in this example.
Note. If $f'(x^*)$ contains a parameter, then you need to solve the inequality
\[ |f'(x^*)| < 1 \]
to find the range of parameters that give a stable fixed point. Rewrite this inequality as
\[ -1 < f'(x^*) < 1. \]
You can also, if you prefer, write this latter as the pair of inequalities
\[ f'(x^*) > -1 \quad \text{and} \quad f'(x^*) < 1. \]
Similarly, if $f'(x^*)$ has parameters, to solve for the parameters that give $|f'(x^*)| > 1$, we rewrite it as
$f'(x^*) < -1$ OR $f'(x^*) > 1$. This time you have to solve the two inequalities separately; each one
gives you conditions for an unstable fixed point.
If $x^*$ is just a number, then you can apply the Stability of Fixed Points Theorem directly; there’s
nothing to solve for.
6.8.5 The Ricker equation
The logistic model incorporates a diminishing per capita rate of reproduction r(1 − xt ), reflecting
that the rate of reproduction goes down as the size of the population increases. But this model for
the reproduction rate is only valid for xt < 1 (or else it becomes negative).
A more sophisticated model would model the per capita reproduction rate with a function like
$$e^{r(1-x_t)}$$
for some positive constant r. This function is always positive and decreasing. This leads to the
Ricker model, which is often used in modeling the population in fisheries:
$$x_{t+1} = x_t e^{r(1-x_t)}.$$
So here, $f(x) = xe^{r(1-x)}$ is the updating function.
Find the fixed points and classify them by their stability.
We need to solve $x = f(x) = xe^{r(1-x)}$; one solution is $x = 0$, the other must satisfy
$$1 = e^{r(1-x)} \iff 0 = r(1-x) \iff x = 1.$$
To discuss stability, we differentiate:
$$f'(x) = e^{r(1-x)} + x(-r)e^{r(1-x)} = (1 - rx)e^{r(1-x)}.$$
Thus $f'(0) = e^r > 1$ for $r > 0$; this fixed point is always unstable, which means that small populations always grow.
For $x^* = 1$, we compute $f'(1) = (1-r)e^{r(1-1)} = 1 - r$. So this fixed point is stable if and only if
$$\begin{aligned}
|1 - r| < 1 &\iff -1 < 1 - r < 1 \\
&\iff -2 < -r < 0 && \text{(subtracted 1 from all terms)} \\
&\iff 0 < r < 2 && \text{(multiplied all terms by } -1 \text{ and changed the direction of the inequalities).}
\end{aligned}$$
So this fixed point is stable if $0 < r < 2$ and unstable if $r > 2$.
The graph of $f(x) = xe^{3(1-x)}$ is given below. Try cobwebbing to see what kind of instability we
have at x∗ = 1 in this case. Both fixed points being unstable essentially drives the population into
chaos. In terms of what’s happening in the fishery: the fish reproduce so much one year that they
annihilate the resources in their environment, leading to a population crash the following year. The
population swings can seem completely random — the dynamics of this population are in fact an
example of the mathematical concept of chaos.
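To see the two regimes without cobwebbing by hand, one can simply iterate the Ricker model. The sketch below is illustrative only: the starting value 0.5, the number of steps, and the helper names `ricker` and `orbit` are our own choices, not part of the notes.

```python
import math

def ricker(x, r):
    """One step of the Ricker model x_{t+1} = x_t * exp(r * (1 - x_t))."""
    return x * math.exp(r * (1 - x))

def orbit(x0, r, steps):
    """Return the list [x_0, x_1, ..., x_steps]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(ricker(xs[-1], r))
    return xs

stable = orbit(0.5, 1.5, 200)  # 0 < r < 2: the orbit settles at x* = 1
wild = orbit(0.5, 3.0, 200)    # r > 2: both fixed points are unstable

print(stable[-1])
print([round(x, 3) for x in wild[-6:]])
```

With r = 1.5 the values settle down at 1, while with r = 3 the population keeps swinging between boom and crash.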
6.8.6 Harvesting and optimization : DTDS
Recall that in Section 6.5.4 we considered a population undergoing logistic growth, but being
harvested regularly at a rate h, which led to a DTDS of the form
$$x_{t+1} = 2.5x_t(1 - x_t) - hx_t$$
where h > 0 is a parameter which denotes the intensity of harvesting.
We started by assuming that this DTDS had a stable positive steady state $x^*$, so that we could
talk about the yield of the harvest in the long term, which would be $Y(h) = hx^*$.
Now, we can complete the problem by determining if the level of harvest that we chose gives a
steady state that is stable.
We begin by finding the steady states by solving $x = f(x) = 2.5x(1-x) - hx$. This gives one
solution $x = 0$ and the other satisfies
$$1 = 2.5(1-x) - h \iff 1 = 2.5 - 2.5x - h \iff 2.5x = 1.5 - h \iff x = \frac{1.5 - h}{2.5}.$$
As always, we first decide what the biologically relevant range is. This second fixed point is
nonnegative if 1.5 − h ≥ 0 or h ≤ 1.5. Since h is a rate of harvesting, h ≥ 0. So the region of
interest is
0 ≤ h ≤ 1.5.
Next, we discuss stability.
We calculate $f(x) = 2.5x - 2.5x^2 - hx$ and differentiate with respect to x to get
$$f'(x) = 2.5 - 5x - h.$$
When $x^* = \frac{1.5 - h}{2.5}$ this gives
$$f'(x^*) = 2.5 - 5\left(\frac{1.5 - h}{2.5}\right) - h = 2.5 - 2(1.5 - h) - h = 2.5 - 3 + 2h - h = -0.5 + h$$
so the steady state arising from harvesting is stable only if $|-0.5 + h| < 1$ or
$$-1 < -0.5 + h < 1$$
or (adding 0.5 to all three terms):
$$-\frac{1}{2} < h < \frac{3}{2}.$$
We conclude that, among the harvesting rates that give positive steady states ($0 \le h < \frac{3}{2}$),
all give stable steady states. (At the endpoint $h = \frac{3}{2}$ the steady state is $x^* = 0$ and $|f'(x^*)| = 1$, so the test is
inconclusive there. For $-\frac{1}{2} < h < 0$, these steady states are also stable, but do not
correspond to the biologically relevant part of our model.)
In particular, the harvesting rate we found, of h = 0.75, falls in the good range.
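As a sanity check, the formulas for this harvesting model can be evaluated at h = 0.75 and compared with a direct simulation of the DTDS. This is only a sketch: the starting population 0.1, the number of steps, and the function names are assumptions made for illustration.

```python
def steady_state(h):
    """Nonzero steady state x* = (1.5 - h) / 2.5 of x_{t+1} = 2.5 x_t (1 - x_t) - h x_t."""
    return (1.5 - h) / 2.5

def slope_at_steady_state(h):
    """f'(x*) = 2.5 - 5 x* - h, which simplifies to h - 0.5."""
    return 2.5 - 5 * steady_state(h) - h

def simulate(h, x0=0.1, steps=500):
    """Iterate the DTDS from x0 and return the final value."""
    x = x0
    for _ in range(steps):
        x = 2.5 * x * (1 - x) - h * x
    return x

h = 0.75
print(steady_state(h), slope_at_steady_state(h), simulate(h))
```

The slope has absolute value less than 1, and the simulated population settles at the steady state, as the Stability of Fixed Points Theorem predicts.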
Exercise 6.8.4. In the above example, the range of values of h for which the nonzero steady state
was stable was wider than the range of values of h for which the nonzero steady state was positive.
Find the range of h ≥ 0 that make the nonzero steady state x∗ of the following DTDS (a) positive
(b) stable:
$$x_{t+1} = 2x_t(2 - x_t) - hx_t$$
and notice that this time it is possible to choose h to give a nonstable positive steady state. This
would be a very problematic rate of harvesting, as it would give a different yield each year, in some
kind of chaotic dynamics, and the potential of extinction.
End of lecture # 16
MAT 1330 : Fall 2020 6.9. THE INTERMEDIATE VALUE THEOREM
6.9 The Intermediate Value Theorem
The final application of differentiation we’ll consider is about solving equations. We know a great
many algebraic techniques, but occasionally come across equations that are impossible to solve
algebraically. Along the way, we’ll meet the third of our trio of theorems in Calculus.
Motivation: Find the fixed points of the DTDS $x_{t+1} = e^{-x_t}$.
Solution: We have to solve $x = e^{-x}$. But applying ln gives $\ln(x) = -x$, which is just as difficult.
We look at the graph:
The graphs of $y = e^{-x}$ (in blue) and $y = x$ (in red). They have a unique point of intersection near
x = 0.55.
We agree that there is a solution. Actually, we can do better than that.
Strategy: Form the function $f(x) = x - e^{-x}$. Then
$$f(0) = 0 - e^{0} = -1 < 0 \quad\text{and}\quad f(1) = 1 - e^{-1} > 0,$$
so there should be a number c, with 0 < c < 1, such that f(c) = 0.
In other words, if our function is negative in one place and positive in another, then it must have
gone through 0 ... right?
Caution: Imagine our function f had a graph like one of the following.
On the left, the graph of a discontinuous function y = f (x) which satisfies that f (0) < 0 and
f (1) > 0 but there is no number c between 0 and 1 such that f (c) = 0. On the right, the graph of
f (x) = 1/x, which satisfies f (−1) < 0 and f (1) > 0 but there is no x for which f (x) = 0.
It is important that the function be continuous on your interval for this strategy to work!
Theorem 6.9.1 (Intermediate Value Theorem). Suppose that f is a function which is continuous
on the interval [a, b], and that y is a number between f (a) and f (b). Then there is a number
c ∈ [a, b] such that f (c) = y.
The idea of the theorem is: if your function is continuous on [a, b], then if you draw the curve from
(a, f (a)) to (b, f (b)), you have to cross every horizontal line (every y-value) between f (a) and f (b).
You can’t avoid solving f (c) = y.
The graphs of a continuous function f (x) and two marked points in red (−2, −18) and (6, 43.12),
whose y-values are marked on the y-axis in green. The Intermediate Value Theorem says: every
horizontal line between y = f (a) and y = f (b) has to cross the graph at least once, at an x-value
lying between a and b.
Remark 6.9.2. This is a different application of an idea we have used before in a completely
different context: if we have an interval (a, b) and f has no critical points in that interval (meaning
$f'(x)$ is never zero or undefined) then necessarily $f'$ is either always positive or always negative on
that interval. In other words: if $f'$ changes sign on an interval, then that interval must contain a
critical point of f. This is the Intermediate Value Theorem, applied to $f'$!
Now the theorem says that if your function is continuous on an interval and changes sign from one
endpoint to the other, then it must cross the x-axis somewhere in between.
But WHERE? How do we find this root?
6.9.1 Classic solution: Bisection Method
So let’s apply the Intermediate Value Theorem to our example, to solve $x = e^{-x}$.
The function $f(x) = x - e^{-x}$ is defined and continuous on all of R, so in particular on any subinterval.
Tip: in general, watch out for asymptotes!!
We have f (0) < 0 and f (1) > 0 so there is a root of f (that is, a value x such that f (x) = 0)
between 0 and 1.
We compute f (0.5) = −0.11 < 0. So there is a root of f between 0.5 and 1.
We compute f (0.75) = 0.28 > 0. So there is a root of f between 0.5 and 0.75.
We compute f (0.625) = 0.09 > 0. So there is a root of f between 0.5 and 0.625.
We compute f (0.57) = 0.004, which is equal to zero to two decimal places, which we think is
good enough.
Our method yields the estimate of 0.57 for the solution to x = e−x .
This method is a good classic, but it is slow.
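The halving procedure above is easy to automate. The following is a minimal sketch of the bisection method for f(x) = x − e^(−x) on [0, 1]; the tolerance, the step counter, and the name `bisect` are our own illustrative choices.

```python
import math

def f(x):
    return x - math.exp(-x)

def bisect(f, a, b, tol=1e-6):
    """Bisection: assumes f is continuous on [a, b] and f(a), f(b) have opposite signs."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    steps = 0
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m, steps
        if fa * fm < 0:
            b, fb = m, fm   # sign change in [a, m]: the root is in the left half
        else:
            a, fa = m, fm   # otherwise the root is in the right half
        steps += 1
    return (a + b) / 2, steps

root, steps = bisect(f, 0.0, 1.0)
print(root, steps)
```

Each step only halves the interval, which is why many steps are needed to gain a few decimal places.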
6.9.2 Sophisticated solution: Newton’s Method
(So named because he used it very effectively.)
Let’s use information about the derivative and the tangent line to make better guesses at a root of
$f(x) = x - e^{-x}$. The idea is in the following picture, where we try the root of the linearization as
a guess at the root of the function.
The graph of $f(x) = x - e^{-x}$ in black. An initial guess $x_0 = 1$ gives a point on the curve $y = f(x)$.
We draw the tangent line to the curve and see where it intersects the x-axis.
1. Make an initial guess x0 , so that f (x0 ) ≈ 0.

2. Write down the equation of the tangent line of f at (x0 , f (x0 )).
3. Let x1 be where this tangent line intersects the x-axis.
4. Take x1 as your next guess, and repeat steps 2 to 4 until f (xn ) is as close to zero as you need.
Actually, if we just work this out in general, we’ll get a simple formula.
So given f (x), our goal is to solve f (x) = 0. We assume we have used the Intermediate Value
Theorem to help us make an initial guess x0 .
The equation of the tangent line to f at $x_0$ is
$$L(x) = f(x_0) + f'(x_0)(x - x_0).$$
Solve for x such that L(x) = 0:
$$0 = f(x_0) + f'(x_0)x - f'(x_0)x_0 \iff f'(x_0)x = f'(x_0)x_0 - f(x_0) \iff x = x_0 - \frac{f(x_0)}{f'(x_0)}.$$
Therefore:
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}, \quad x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}, \quad \ldots$$
that is, we have in effect created a DTDS which ought to converge to the root we were looking for!
Note. Newton’s method: To solve f(x) = 0 with initial guess $x_0$, use the iterative formula
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \quad n = 0, 1, 2, 3, \ldots$$
until your numbers agree to the accuracy you need.
Memorize this formula correctly, or else remember how to derive it. Wrong formula = garbage.
Example 6.9.3. Solve $x = e^{-x}$ with an accuracy of three decimal places using Newton’s method.
Solution: We first have to convert the problem into a root-finding problem. Let
$$f(x) = x - e^{-x}.$$
Then the roots of f are the solutions to the equation $x = e^{-x}$ that we are looking for. Good; we
can apply Newton’s method. We compute
$$f'(x) = 1 + e^{-x}.$$
So our Newton’s method formula is
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n - e^{-x_n}}{1 + e^{-x_n}}.$$
Start with $x_0 = 0.57$, which is the best guess we got with our bisection method. Then
$$x_1 = 0.57 - \frac{0.57 - e^{-0.57}}{1 + e^{-0.57}} = 0.56714181501$$
$$x_2 = x_1 - \frac{x_1 - e^{-x_1}}{1 + e^{-x_1}} = 0.5671432904$$
and we stop, because our solution is already accurate to 5 decimal places!
Example 6.9.4. Find $\sqrt{2}$ to 3 decimal places using Newton’s method.
Solution: We need to create a nice function with $\sqrt{2}$ as a root; $f(x) = x^2 - 2$ is a good choice.
Then we need a first guess; $x_0 = 1$ is close. Our formula is
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - 2}{2x_n}$$
so that gives
$$x_1 = 1 - \frac{1 - 2}{2} = 1.5$$
$$x_2 = 1.5 - \frac{2.25 - 2}{3} = 1.4166667$$
$$x_3 = 1.41421568627$$
$$x_4 = 1.41421356237$$
Here $x_3$ and $x_4$ already agree to 5 decimal places, more than the 3 we needed; in fact $x_5 = 1.41421356237 = x_4$ is already the maximum precision my
calculator can do.
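Both examples can be reproduced with a few lines of code. This is a minimal sketch of the Newton iteration, assuming we supply the derivative by hand; the stopping tolerance, the iteration cap, and the name `newton` are our own choices for illustration.

```python
import math

def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n); return the list of all guesses."""
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        slope = fprime(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x_n) = 0: the tangent line never meets the x-axis")
        xs.append(x - f(x) / slope)
        if abs(xs[-1] - xs[-2]) < tol:
            break  # successive guesses agree to the requested accuracy
    return xs

# Example 6.9.3: f(x) = x - e^{-x}, starting from the bisection estimate
a = newton(lambda x: x - math.exp(-x), lambda x: 1 + math.exp(-x), 0.57)
# Example 6.9.4: f(x) = x^2 - 2, starting from x0 = 1
b = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)

print(a)
print(b)
```

Printing the full list of guesses makes it easy to see how quickly successive values start to agree.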
6.9.3 Discussion: So why does it work? Can it fail?
The iterative formula for Newton’s method is
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
So if $f'(x_n)$ is not near zero, but $f(x_n)$ is near zero, then we are correcting our guess by a small
amount each time, and by our graph, we see we are getting better and better.
If $f'(x_n)$ is near zero (that is, if we have a critical point near our guess) then Newton’s method
can just spike off and give ridiculous numbers. You can tell when that happens: $x_n$ is nowhere near
$x_{n+1}$, and your value of $f'(x_n)$ was suspiciously small.
In particular, if x is a root of both f and $f'$ (like a double root of a polynomial) then you shouldn’t
apply Newton’s method to f; apply it to $f'$ instead!
Another possible failure: if you apply Newton’s method to a function with many roots, it could
happen that it converges to a different root than the one you wanted. The solution is to make as
good a first guess x0 as possible, so that your function more or less looks like a straight line from
the root to your guess (no local maxima or local minima in between).
So: Newton’s method can fail, but only for reasons that you should have noticed when you started.
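A quick experiment shows the "spike" described above. The sketch below takes one Newton step for f(x) = x² − 2 from a sensible guess and from a guess right next to the critical point x = 0; the starting values and the name `newton_step` are our own illustrative choices.

```python
def newton_step(f, fprime, x):
    """One Newton step: the x-intercept of the tangent line at x."""
    return x - f(x) / fprime(x)

f = lambda x: x * x - 2
fprime = lambda x: 2 * x

good = newton_step(f, fprime, 1.0)    # f'(1) = 2: a modest correction
bad = newton_step(f, fprime, 0.001)   # f'(0.001) = 0.002: the guess is thrown far away

print(good, bad)
```

The second guess lands about a thousand units away from where it started, exactly the warning sign mentioned above: the new guess is nowhere near the old one and the derivative was suspiciously small.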
Finally: using a computer, it’s very fun and easy to get maximum precision. Use a spreadsheet,
for example, and just cut and paste the formula from one line to the next. Therefore, as long as
you can differentiate the function, you never have to worry about solving an equation again.
(Newton himself was pretty happy about that.)
End of lecture # 17
Chapter 7
Integration
We have explored lots of applications of the derivative; now it’s time to turn the story around and
consider anti-derivatives. We begin with some motivation, and setting the stage for why we really
want to do this.
7.1 Introduction
7.1.1 Motivation for Differential Equations
Utility companies measure the rate of flow of water (or gas, or electricity) into your home in
litres/second (respectively, $\mathrm{m}^3$/second, Watts = joules/second). In the end, though, they bill you
for the total amount consumed in litres (respectively, $\mathrm{m}^3$, kWh). In other words: they measure the
instantaneous rate of change and use that to compute the total amount used. This is the inverse
process of differentiation.
Examples abound:
f(x) (value)        differentiation →                 f′(x) (rate of change)
position                                              velocity
mass                                                  growth rate
volume                                                flow rate
amount              ← anti-differentiation (new)      production rate
But wait: does this really make sense? Think of a car:
If my speedometer is broken, I can use my GPS to find out my speed: it knows my position
each second and can use that to estimate my velocity. So knowing your position as a function
of time implies knowing your velocity.
If my GPS is broken, I cannot use my speedometer to find out where I am. The speedometer
could tell me exactly the same velocity function whether I was driving in B.C. or Newfoundland.
BUT if you told me that we started in Ottawa and drove along the TransCanada
Highway west for 22 hours at a steady speed of 100 km/h, then I know that we are now in
Winnipeg.
In math terms:
Given $f(x)$, we can find $f'(x)$.
Given $f'(x)$, and the value of $f(a)$ for some a, we can find $f(x)$.
Example 7.1.1. Suppose that the volume of a cell increases continuously at a rate of $2\,\mu\mathrm{m}^3$ per
second. What is the volume of the cell after 3 seconds, if the cell starts at a volume of $1\,\mu\mathrm{m}^3$?
Solution: Denote V(t) = volume of the cell. We are given that $V'(t) = 2\,\mu\mathrm{m}^3/\mathrm{s}$. Since the derivative
is a constant, the function must be linear, with slope 2, which gives:
$$V(t) = 2\,\frac{\mu\mathrm{m}^3}{\mathrm{s}} \cdot t + V(0),$$
and this initial value is $V(0) = 1\,\mu\mathrm{m}^3$. So our solution is
$$V(t) = (2t + 1)\,\mu\mathrm{m}^3.$$
Notice that:
The units work out! Rate in $\mu\mathrm{m}^3/\mathrm{s}$ times time in s gives $\mu\mathrm{m}^3$.
Different initial conditions give different volumes, of course! Without the initial condition,
we couldn’t tell you V(t) just from the rates of change.
Our answer: after 3 seconds, the volume is $V(3) = 7\,\mu\mathrm{m}^3$.
Remark 7.1.2. Problems like the above — and like we will do in the rest of MAT1330 this term —
are called pure-time differential equations. Next term, we will consider also autonomous differential
equations, in which the derivative is given in terms of the function itself.
For example, if the amount x(t) of radioactive material decreases at a continuous rate of 1% per
day, and we start with 10 grams, then this is telling us that
$$x'(t) = -0.01x(t), \quad x(0) = 10. \tag{7.1}$$
That is: we don’t know the actual rate of change at any given time. Instead, we know the rate
of change based on how much there is; but we’re trying to find out how much there is, so it feels
circular! (Don’t worry, it works out: MAT1332.)
If it wasn’t the derivative, but instead just the average rate of change each day, then you would
replace $x'(t)$ with $\frac{x(t+1) - x(t)}{(t+1) - t}$ on the left, and the result would be the DTDS $x(t+1) = 0.99x(t)$
(which we would write $x_{t+1} = 0.99x_t$). Notice that this DTDS would give a rough approximation to
the correct answer, but we agree that the autonomous differential equation (7.1), which takes into
account that the growth compounds continuously, should give the more accurate answer.
In MAT1332, we’ll show that the solution to (7.1) is $x(t) = 10e^{-0.01t}$; for now, you should be able
to verify that $x'(t) = -0.01x(t)$ and that $x(0) = 10$, which means it is a solution to (7.1).
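To get a feeling for how rough the DTDS approximation is, we can tabulate both models side by side. This is a small sketch: the function names `discrete` and `continuous` and the chosen days are ours, and the continuous formula is the solution quoted above.

```python
import math

def discrete(t, x0=10.0):
    """DTDS x_{t+1} = 0.99 x_t, so x_t = x0 * 0.99**t."""
    return x0 * 0.99 ** t

def continuous(t, x0=10.0):
    """Solution x(t) = x0 * exp(-0.01 t) of the differential equation (7.1)."""
    return x0 * math.exp(-0.01 * t)

for t in (1, 10, 100):
    print(t, discrete(t), continuous(t))
```

The two columns stay close but not identical, and the discrete model always sits slightly below the continuous one.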
In MAT1330, we are concerned with so-called pure-time differential equations, like
$$x'(t) = e^t, \quad x(0) = 5,$$
or (since the independent variable doesn’t have to be called t):
$$f'(x) = 3x^2 + 4, \quad f(0) = 7.$$
In this case, a solution is called an anti-derivative of the given right-hand side satisfying the given initial condition. If
no initial condition is specified, then any function F(x) satisfying $F'(x) = f(x)$ is called an anti-derivative of f.
7.1.2 Could there be more than one anti-derivative satisfying a given initial
condition?
First, an intuitive argument: Let’s work with some graphs to see how it’s possible to go backwards
from the derivative to the function, given an initial condition.
Example 7.1.3. Suppose $\frac{dV}{dt} = 2$ and $V(0) = 1$. We have sketched the graph of the
derivative (the red horizontal line), together with the initial value (0, 1) (red dot). Starting at the red dot,
draw a curve (in blue) with slope equal to $V'(t)$ at each point t (which in this case means a line with
constant slope 2); this is the graph of V(t).
Example 7.1.4. Suppose $\frac{dV}{dt} = 4 - 2t$ and $V(0) = 2$. We have sketched the graph of the
derivative (the red line), together with the initial value (0, 2) (red dot). Starting at the red dot,
draw a curve (in blue) with slope equal to $V'(t)$ at each point t (so: initially steep, with slope 4,
but this slope decreases as t increases); this is the graph of V(t).
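The "start at the dot and follow the slope" idea of Example 7.1.4 can be imitated numerically by taking many tiny straight steps. This is only a sketch: the step count and the function names are our own, and the comparison curve V(t) = 4t − t² + 2 is an assumption that you can verify by differentiating and checking V(0) = 2.

```python
def slope(t):
    """The given derivative dV/dt = 4 - 2t."""
    return 4 - 2 * t

def follow_slopes(v0, t_end, n=1000):
    """Start at (0, v0) and take n short straight steps, each with the current slope."""
    dt = t_end / n
    t, v = 0.0, v0
    for _ in range(n):
        v += slope(t) * dt
        t += dt
    return v

def exact(t):
    """Candidate formula for V(t); its derivative is 4 - 2t and exact(0) = 2."""
    return 4 * t - t ** 2 + 2

print(follow_slopes(2.0, 1.0), exact(1.0))
```

The stepped curve and the formula agree closely, and the agreement improves as the steps get shorter.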
Now, a proof. Suppose you had two functions F(x) and G(x) that both satisfied $F'(x) = f(x) = G'(x)$.
Let’s look at $h(x) = F(x) - G(x)$. This is a function with derivative
$$h'(x) = F'(x) - G'(x) = f(x) - f(x) = 0.$$
What could it be? By the Mean Value Theorem applied to the function h(x) on the interval [a, b],
there is a point c between a and b so that
$$h'(c) = \frac{h(b) - h(a)}{b - a}.$$
But since $h'(x) = 0$ for all x, we have $h'(c) = 0$. Therefore after simplifying, we get that $h(b) = h(a)$.
This is true for every such b, so in fact h is a constant function (with graph a horizontal line).
Conclusion: if F and G are two anti-derivatives of f on an interval, then they are almost the
same; in fact, there is a constant C such that
F (x) = G(x) + C
i.e. the graphs of the two anti-derivatives are identical, up to a vertical shift.
In particular, if they satisfy the same initial condition, then C = 0.
7.1.3 Anti-differentiation
Let’s begin by being clear about what we are looking for: we are given a function f (t) and want to
find a new function F (t) such that F 0 (t) = f (t).
Definition 7.1.5. An antiderivative of a function f(t) is a function F(t) with the property that
$F'(t) = f(t)$. If F is one anti-derivative of f, then so is F + c, for any constant c. We write
$$\int f(t)\,dt = F(t) + c,$$
which we read out loud as “the integral of f of t dt is F(t) plus an arbitrary constant c.” In this
expression, the function f is called the integrand and $\int f(t)\,dt$ is called the indefinite integral of f.
We will meet the “definite integral”, which is a number representing the area between the graph of
f and the x-axis, at the end of MAT1330.
Note. So an antiderivative is one function whose derivative is f ; the indefinite integral is the set
of all functions whose derivative is f .
Example 7.1.6. Suppose f(t) = 1. Then F(t) = t is an antiderivative; and F(t) = t + 5 is another
antiderivative. The indefinite integral is
$$\int f(t)\,dt = \int 1\,dt = t + c \quad\text{with } c \in \mathbb{R}.$$
We can turn our differentiation rules into rules for anti-derivatives. Let’s start by making a list of
anti-derivatives of common functions.
The power rule
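Recall that
$$\frac{d}{dt}\,t^{n+1} = (n+1)t^n;$$
therefore,
Note. “Anti power-rule”:
$$\int t^n\,dt = \frac{1}{n+1}\,t^{n+1} + c, \quad\text{if } n \neq -1.$$
Examples:
$$\int t^3\,dt = \frac{1}{4}t^4 + c$$
$$\int x^5\,dx = \frac{1}{6}x^6 + c$$
$$\int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac{1}{\frac{1}{2}+1}\,x^{\frac{1}{2}+1} + c = \frac{2}{3}x^{3/2} + c$$
$$\int \frac{1}{t^2}\,dt = \int t^{-2}\,dt = \frac{1}{-2+1}\,t^{-2+1} + c = -t^{-1} + c.$$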
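The bookkeeping in the anti power-rule (add 1 to the exponent, divide by the new exponent) can be captured in a tiny helper. This is a sketch using exact fractions; the function name `anti_power` and its return format are our own conventions.

```python
from fractions import Fraction

def anti_power(n):
    """Return (coefficient, exponent) of an antiderivative of t**n, valid for n != -1."""
    n = Fraction(n)
    if n == -1:
        raise ValueError("the anti power-rule does not apply to n = -1")
    return 1 / (n + 1), n + 1

print(anti_power(3))               # antiderivative of t^3
print(anti_power(Fraction(1, 2)))  # antiderivative of sqrt(x)
print(anti_power(-2))              # antiderivative of 1/t^2
```

The three calls correspond to the first, third and fourth worked examples above; the constant c still has to be added by hand.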
Constant multiple rule and sum rule
If a is a constant, then $\frac{d}{dx}(aF(x)) = a\frac{d}{dx}(F(x))$. Therefore, thinking about what this says about
anti-derivatives, we get the rule
Note.
$$\int af(x)\,dx = a\int f(x)\,dx \quad\text{for any constant } a.$$
We also have that $(F + G)' = F' + G'$, so turning this around gives
Note.
$$\int \big(f(x) + g(x)\big)\,dx = \int f(x)\,dx + \int g(x)\,dx.$$
Examples:
$$\int (3x^2 - 7x)\,dx = 3\int x^2\,dx - 7\int x\,dx = 3\cdot\frac{1}{3}x^3 - 7\cdot\frac{1}{2}x^2 + c = x^3 - \frac{7}{2}x^2 + c.$$
$$\int \left(\frac{3}{\sqrt{x}} + \frac{4}{x^3}\right)dx = 3\int x^{-1/2}\,dx + 4\int x^{-3}\,dx = 3\cdot\frac{1}{1/2}\,x^{1/2} + 4\cdot\frac{1}{-2}\,x^{-2} + c = 6\sqrt{x} - \frac{2}{x^2} + c.$$
But watch out!
$$\int \frac{3 + 4x^2}{\sqrt{x}}\,dx \neq \frac{\int 3\,dx + 4\int x^2\,dx}{\int \sqrt{x}\,dx} \quad !!!$$
The reason is: we most DEFINITELY did NOT say that the integral of a quotient is a quotient of
the integrals, because that’s false. The derivative of a quotient is MUCH MORE INTERESTING
than just the quotient of the derivatives.
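So the way to CORRECTLY solve the final indefinite integral above is to simplify it algebraically
until it is in a form we can handle:
$$\begin{aligned}
\int \frac{3 + 4x^2}{\sqrt{x}}\,dx &= \int \left(\frac{3}{\sqrt{x}} + \frac{4x^2}{\sqrt{x}}\right)dx \\
&= \int \left(3x^{-1/2} + 4x^{3/2}\right)dx \\
&= 3\int x^{-1/2}\,dx + 4\int x^{3/2}\,dx \\
&= 3\left(\frac{1}{-\frac{1}{2}+1}\,x^{-\frac{1}{2}+1}\right) + 4\left(\frac{1}{\frac{3}{2}+1}\,x^{\frac{3}{2}+1}\right) + c \\
&= \frac{3}{1/2}\,x^{1/2} + \frac{4}{5/2}\,x^{5/2} + c \\
&= 6\sqrt{x} + \frac{8}{5}x^{5/2} + c.
\end{aligned}$$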
And of course we check the answer by quickly differentiating.
Note. Tip: always check your final answer by differentiating. It is very easy to mess up the
coefficients, but if you differentiate and your answer is $\frac{3}{4}x^{-1/2}$ instead of $3x^{-1/2}$, for example, you
know that you were off by a factor of 4, and you can fix it, without trying to just repeat your
calculation line by line.
Note. If when you differentiate you get a totally different function than what you were supposed
to get, then you know you did something wrong. Try again.
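The "check by differentiating" tip can also be carried out numerically when you are unsure of your algebra. The sketch below approximates the derivative with a symmetric difference quotient; the step size, test points, tolerance and function names are our own choices, and the "slipped" answer is a made-up mistake of the kind described in the tip.

```python
import math

def numerical_derivative(F, x, h=1e-6):
    """Approximate F'(x) by the symmetric difference quotient."""
    return (F(x + h) - F(x - h)) / (2 * h)

def looks_like_antiderivative(F, f, points, tol=1e-5):
    """True if F'(x) is numerically equal to f(x) at every test point."""
    return all(abs(numerical_derivative(F, x) - f(x)) < tol for x in points)

def integrand(x):
    return (3 + 4 * x ** 2) / math.sqrt(x)

def answer(x):
    return 6 * math.sqrt(x) + 8 / 5 * x ** 2.5

def slipped_answer(x):
    # same as above but with the first coefficient off by a factor of 4
    return 1.5 * math.sqrt(x) + 8 / 5 * x ** 2.5

points = [0.5, 1.0, 2.0, 3.0]
print(looks_like_antiderivative(answer, integrand, points))
print(looks_like_antiderivative(slipped_answer, integrand, points))
```

A numerical check like this cannot prove an answer right, but it catches coefficient slips very quickly.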
Special functions
Note.
Since $(e^x)' = e^x$, we have $\int e^t\,dt = e^t + c$;
Since $(\sin(x))' = \cos(x)$, we have $\int \cos(x)\,dx = \sin(x) + c$;
Since $(\cos(x))' = -\sin(x)$, we have $\int \sin(x)\,dx = -\cos(x) + c$;
Since $(\arctan(x))' = \frac{1}{1+x^2}$, we have $\int \frac{1}{1+x^2}\,dx = \arctan(x) + c$;
Since $(\arcsin(x))' = \frac{1}{\sqrt{1-x^2}}$, we have $\int \frac{1}{\sqrt{1-x^2}}\,dx = \arcsin(x) + c$.
We also know that (ln(x))0 = 1/x — but this one is annoying. The domain of ln(x) is (0, ∞)
whereas the domain of 1/x is all real numbers except 0. But there turns out to be an easy solution.
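Example 7.1.7. The function $f(x) = \ln|x|$ satisfies
$$f(x) = \ln|x| = \begin{cases} \ln(x) & \text{if } x > 0 \\ \ln(-x) & \text{if } x < 0 \end{cases}$$
so if we differentiate, we get
$$f'(x) = \begin{cases} \dfrac{1}{x} & \text{if } x > 0 \\[2ex] \dfrac{1}{-x}(-1) = \dfrac{1}{x} & \text{if } x < 0 \end{cases}$$
which is just perfect!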
Remark 7.1.8. There is another little glitch: since there is a gap in the domain of $\ln|x|$, you could
vertically shift the two parts of the graph independently and still have the same derivative. So the
indefinite integral of $\frac{1}{x}$ is the function
$$\int \frac{1}{x}\,dx = \begin{cases} \ln|x| + c_1 & \text{if } x > 0 \\ \ln|x| + c_2 & \text{if } x < 0 \end{cases} \tag{7.2}$$
where $c_1$ and $c_2$ are constants which might not be equal to each other.
Note. This is too much bother! We will be lazy and just write
$$\int \frac{1}{x}\,dx = \ln|x| + c$$
and if we are ever given a question with two initial conditions (one on each half of the domain)
then we can revert to (7.2).
In that same spirit, for all the functions with vertical asymptotes, we use the same shorthand
notation (understanding that in the unlikely event that we need to adjust the vertical shifts in
different parts, we can, by writing them as spliced functions):
Note.
$$\int \sec^2(x)\,dx = \tan(x) + c$$
$$\int \sec(x)\tan(x)\,dx = \sec(x) + c$$
$$\int \csc^2(x)\,dx = -\cot(x) + c$$
$$\int \csc(x)\cot(x)\,dx = -\csc(x) + c$$
In fact, you could write down lots of rules like this, but it gets a little out of hand. For example,
since $(xe^x)' = e^x + xe^x$ we know that $\int (e^x + xe^x)\,dx = xe^x + c$,
since $(e^{x^2})' = 2xe^{x^2}$ we know that $\int 2xe^{x^2}\,dx = e^{x^2} + c$.
But we can’t possibly write down all the possible rules. What we need are methods to undo the
chain rule and the product rule (coming up in the next two lectures).
Before that, let’s tackle some applications.
7.1.4 Applications of Anti-differentiation
Let’s explore some examples where anti-differentiation arises naturally.

Example 7.1.9. In 1981, there were about 340 cases of HIV infection in the USA. In the years following,
the number of cases grew at a continuous rate of around $500t^2$/year, where t = 0 corresponds
to 1981 and t is measured in years. How many cases were there in 1991?
Solution: Let A(t) = number of cases, with A(0) = 340. We are told that $A'(t) = 500t^2$.
Therefore,
$$A(t) = \int A'(t)\,dt = \int 500t^2\,dt = \frac{500}{3}t^3 + c$$
for some constant c. Since A(0) = 340, plugging in t = 0 gives c = 340. Therefore $A(t) = \frac{500}{3}t^3 + 340$.
We want to know the number of cases in 1991, which corresponds to t = 10. So plug in:
$$A(10) = \frac{500}{3}(10)^3 + 340 \approx 167{,}000.$$
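The arithmetic in this example is quick to confirm by evaluating the formula for A(t). This is a small sketch; the function name `cases` and its default argument are our own.

```python
def cases(t, initial=340.0):
    """A(t) = (500/3) t^3 + 340: the antiderivative of 500 t^2 with A(0) = 340."""
    return 500 / 3 * t ** 3 + initial

print(cases(0))   # the initial condition, t = 0 (1981)
print(cases(10))  # t = 10 (1991)
```

Evaluating at t = 0 is a quick way to confirm that the constant c was chosen correctly.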
Example 7.1.10. A bucket falls from a window cleaner’s platform, and experiences constant
acceleration due to gravity of $a = -9.8\ \mathrm{m/s^2}$.
Recall: the rate of change of position is velocity, and the rate of change of velocity is acceleration.
Suppose the platform is 49m up and the initial speed of the bucket is 0.
(a) Find the equation for the position p(t) of the bucket.
(b) Where is the bucket after 1 second?
(c) When will it hit the ground?
(d) At what speed will it hit the ground?
Solution:
(a) Write v(t) for the velocity of the bucket. So
$$v(t) = \int a\,dt = -9.8t + c$$
and since v(0) = 0, we conclude c = 0. Next, we have
$$p(t) = \int v(t)\,dt = \int -9.8t\,dt = \frac{-9.8}{2}t^2 + c'$$
and since p(0) = 49, we conclude that $c' = 49$. Therefore an equation for the position of the
bucket is
$$p(t) = -4.9t^2 + 49.$$
Check the units: $9.8\ \mathrm{m/s^2}$ times $t^2$ in $\mathrm{s}^2$ gives m, good.
(b) After 1 second, the position will be p(1) = −4.9 + 49 = 44.1 m above the ground.
(c) It will hit the ground when its position is 0. So we solve p(t) = 0 for t, which gives
$$0 = -4.9t^2 + 49 \iff 4.9t^2 = 49 \iff t^2 = 10 \iff t = \pm\sqrt{10}\ \mathrm{s};$$
since we are going forward in time, the positive solution is our answer, giving $t \approx 3.2$ s.
(d) Since it hits the ground after $\sqrt{10}$ seconds, its velocity at that time is
$$v(\sqrt{10}) = -9.8\sqrt{10} \approx -31\ \mathrm{m/s}$$
(negative because downwards), which is about 112 km/h.
Example 7.1.11. (Buckets, continued) Suppose now that the window cleaner tosses the bucket
straight upwards towards another platform, but it misses and falls to the ground. If the bucket is
thrown with an initial velocity of 10m/s,
(a) When will it reach its highest point?
(b) How high will this be?
(c) When will it hit the ground?
(d) How fast will it be going?
Solution: Again, the only force exerted on the bucket is gravity, so we have
$$v(t) = \int a(t)\,dt = \int -9.8\,dt = -9.8t + c.$$
Since v(0) = 10, we have c = 10, so $v(t) = -9.8t + 10$ m/s. Next, we solve
$$p(t) = \int v(t)\,dt = \int (-9.8t + 10)\,dt = \frac{-9.8}{2}t^2 + 10t + c',$$
and since p(0) = 49, we have $c' = 49$. Therefore the equation of motion for the bucket is
$$p(t) = -4.9t^2 + 10t + 49.$$
(a) The highest point is attained where the bucket stops for a moment, that is, when v(t) = 0. We
solve
$$v(t) = 0 \iff -9.8t + 10 = 0 \iff t = \frac{10}{9.8} \approx 1.02\ \mathrm{s}.$$
(b) Its position at this time is $p(10/9.8) = -4.9(10/9.8)^2 + 10(10/9.8) + 49 \approx 54.1$ m, which is about
5 m above the platform.
(c) It hits the ground when its position is equal to 0:
$$-4.9t^2 + 10t + 49 = 0 \iff t = \frac{-10 \pm \sqrt{100 + 960.4}}{-9.8} = 1.02 \pm 3.32$$
and we choose the positive root, which yields $t \approx 4.34$ s.
(d) Its velocity when it hits the ground will be $v(4.34) \approx -32.6$ m/s, which is a speed of about 117 km/h.
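The numbers in the two bucket examples can be reproduced from the formulas for v(t) and p(t). This is a sketch under the same assumptions as the examples (constant acceleration −9.8 m/s², heights in metres above the ground); the function names and the parameters v0 and p0 are our own.

```python
import math

G = 9.8  # magnitude of the acceleration due to gravity, in m/s^2

def velocity(t, v0):
    """v(t) = -9.8 t + v0"""
    return -G * t + v0

def position(t, v0, p0):
    """p(t) = -4.9 t^2 + v0 t + p0"""
    return -G / 2 * t ** 2 + v0 * t + p0

def time_at_peak(v0):
    """Solve v(t) = 0."""
    return v0 / G

def time_at_ground(v0, p0):
    """Positive root of -4.9 t^2 + v0 t + p0 = 0, from the quadratic formula."""
    return (v0 + math.sqrt(v0 ** 2 + 2 * G * p0)) / G

# Bucket tossed upward at 10 m/s from 49 m
t_peak = time_at_peak(10)
t_hit = time_at_ground(10, 49)
print(t_peak, position(t_peak, 10, 49), t_hit, velocity(t_hit, 10))

# Bucket simply dropped from 49 m
print(time_at_ground(0, 49))
```

Changing v0 or p0 lets you explore variations of the problem without redoing the integration.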
Example 7.1.12. (Buckets, continued) Suppose now that the window cleaner tosses the bucket
straight upwards towards another platform 10m higher, and it gets to exactly the correct height
but it isn’t caught and the bucket falls to the ground.
When will it reach its highest point?
Solution: Again, the only force exerted on the bucket is gravity, so we have
$$v(t) = \int a(t)\,dt = \int -9.8\,dt = -9.8t + c.$$
This time we don’t know anything about v(0), so we just have to continue as is. We have
$$p(t) = \int v(t)\,dt = \int (-9.8t + c)\,dt = \frac{-9.8}{2}t^2 + ct + c',$$
where we’ve remembered the constants are most likely different, and c is just a number so the
constant multiple rule applied.
We are given that p(0) = 49, so $c' = 49$. Therefore we have two functions:
$$p(t) = -4.9t^2 + ct + 49 \quad\text{and}\quad v(t) = -9.8t + c.$$
The highest point is attained where the bucket stops for a moment, that is, when v(t) = 0. We
solve
$$v(t) = 0 \iff -9.8t + c = 0 \iff t = \frac{c}{9.8}.$$
So at time t = c/9.8, we’re at the highest point, which according to the question is 10 m above the
platform, so at position p(t) = 49 + 10 = 59 m above the ground. Using our equation for p(t) with
t = c/9.8 we get
$$59 = -4.9(c/9.8)^2 + c(c/9.8) + 49 \iff 10 = c^2\left(\frac{-4.9}{9.8^2} + \frac{1}{9.8}\right) \iff c^2 = 196$$
so c = ±14. Since t = c/9.8 is the time when it reaches the max, and time is positive, we deduce
that c = 14 and t = 14/9.8 = 10/7 s.
End of lecture # 18
7.2 Techniques of integration: Substitution
So far we can only calculate indefinite integrals when the integrand is a function whose anti-
derivative we already know. This is a fairly small list (see Table 7.2.1 in the textbook, for example).
Today we’ll learn and practice a method which is based on undoing the chain rule; we call it the
method of substitution.
Recall that the chain rule tells us that
$$\int f'(g(x))g'(x)\,dx = f(g(x)) + c.$$
Example 7.2.1. Consider
$$\int 2xe^{x^2}\,dx.$$
Try $g(x) = x^2$, then $g'(x) = 2x$, so indeed:
$$\int e^{x^2}\,2x\,dx = \int e^{g(x)}g'(x)\,dx = e^{g(x)} + c = e^{x^2} + c.$$
But how would we recognize that our integrand has this special form? This is where the special
notation of integrals comes in handy.
7.2.1 The method of substitution:
1. Define a new variable u = g(x) (typically the innermost function).
2. Differentiate and write:
$$\frac{du}{dx} = g'(x) \iff du = g'(x)\,dx.$$
This is just notation: a clever way to keep track of things.
3. Write the entire integral in terms of u and du:
Note.
$$\int f'(g(x))g'(x)\,dx = \int f'(u)\,du.$$
This can be hard.
You typically start by finding and replacing $g'(x)\,dx$ in your integral with du and then
transforming every other occurrence of x into a u using your equation u = g(x).
Important: the resulting integral CANNOT have a mix of x and u: it must all be in one
variable.
If you cannot find a way to do it, then you cannot do the substitution (which implies that
in fact your integrand did not come from a chain rule with g(x) as the innermost function,
which of course is reasonable).
4. Then integrate:
$$\int f'(u)\,du = f(u) + c$$
5. Then substitute back u = g(x):
$$= f(g(x)) + c.$$
6. Check your math by differentiating!
Let’s do some examples.

Example 7.2.2. Back to $\int 2xe^{x^2}\,dx$: with the substitution $u = x^2$ we get $du = 2x\,dx$, so
$$\int 2xe^{x^2}\,dx = \int e^{x^2}\,2x\,dx = \int e^u\,du = e^u + c = e^{x^2} + c.$$
We check by differentiating: yes!

Example 7.2.3. Find
$$\int e^{3x}\,dx.$$
We try u = 3x, which gives du = 3 dx. We don’t have a 3 in the integral, but it’s just a constant,
so we can write $dx = \frac{1}{3}\,du$. That gives:
$$\int e^{3x}\,dx = \int e^u\,\frac{1}{3}\,du = \frac{1}{3}\int e^u\,du = \frac{1}{3}e^u + c = \frac{1}{3}e^{3x} + c.$$
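Example 7.2.4. Find
$$\int \frac{1}{1 + 5x}\,dx.$$
We try the substitution u = 1 + 5x which gives du = 5 dx or $dx = \frac{1}{5}\,du$ so:
$$\int \frac{1}{1 + 5x}\,dx = \int \frac{1}{u}\left(\frac{1}{5}\right)du = \frac{1}{5}\int \frac{1}{u}\,du = \frac{1}{5}\ln|u| + c = \frac{1}{5}\ln|1 + 5x| + c.$$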
Example 7.2.5. Given $\int \cos(2\pi(x-1))\,dx$, we try the substitution $u = 2\pi(x-1)$ which gives
$du = 2\pi\,dx$ or $dx = \frac{1}{2\pi}\,du$ so
$$\int \cos(2\pi(x-1))\,dx = \int \cos(u)\,\frac{1}{2\pi}\,du = \frac{1}{2\pi}\sin(u) + c = \frac{1}{2\pi}\sin(2\pi(x-1)) + c.$$
Note. It is nice and easy to make a linear substitution like $u = mx + b$ because then $dx = \frac{1}{m}\,du$,
so it always works.
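The pattern in the last three examples (substitute u = mx + b, then divide by m) can be packaged once and for all. The sketch below builds the antiderivative from a known antiderivative F of the outer function and checks it with a difference quotient; the helper names, the test point 0.3 and the step size are our own choices.

```python
import math

def linear_substitution(F, m, b):
    """Given an antiderivative F of f, return an antiderivative of x -> f(m*x + b)."""
    if m == 0:
        raise ValueError("m must be nonzero")
    return lambda x: F(m * x + b) / m

def derivative(G, x, h=1e-6):
    """Symmetric difference quotient, used only to check the answers."""
    return (G(x + h) - G(x - h)) / (2 * h)

G1 = linear_substitution(math.exp, 3, 0)                       # (1/3) e^{3x}
G2 = linear_substitution(lambda u: math.log(abs(u)), 5, 1)     # (1/5) ln|1 + 5x|
G3 = linear_substitution(math.sin, 2 * math.pi, -2 * math.pi)  # sin(2 pi (x - 1)) / (2 pi)

x = 0.3
print(derivative(G1, x), math.exp(3 * x))
print(derivative(G2, x), 1 / (1 + 5 * x))
print(derivative(G3, x), math.cos(2 * math.pi * (x - 1)))
```

In each printed pair the two numbers should agree closely, which is the numerical version of checking by differentiating.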
Example 7.2.6. Find
$$\int \frac{\sec^2(1/x)}{x^2}\,dx.$$
Again, the trickiest part is $\sec^2(1/x)$ and we know that $\sec^2(u)$ has a nice anti-derivative; so we try
$$u = 1/x \iff du = -\frac{1}{x^2}\,dx \iff \frac{1}{x^2}\,dx = -du$$
(and we wrote this because we saw the $\frac{1}{x^2}\,dx$ sitting there in the integral) which gives
$$\begin{aligned}
\int \frac{\sec^2(1/x)}{x^2}\,dx &= \int \sec^2(1/x)\,\frac{1}{x^2}\,dx \\
&= \int \sec^2(u)(-1)\,du \\
&= -\int \sec^2(u)\,du \\
&= -\tan(u) + c \\
&= -\tan(1/x) + c.
\end{aligned}$$
Check by differentiating: yes!
Remark 7.2.7. What happens if you pick a “wrong” substitution? What does it mean to say that
it “doesn’t work”? Consider again
$$\int \frac{\sec^2(1/x)}{x^2}\,dx.$$
If you’d tried $u = x^2$, then $du = 2x\,dx$. There is no “2x” in the integrand; so we can say $x = \sqrt{u}$
so $du = 2\sqrt{u}\,dx$ or $dx = \frac{1}{2\sqrt{u}}\,du$, fine. But we still have a 1/x, which is $1/\sqrt{u}$. So OK, it worked:
$$\int \frac{\sec^2(1/x)}{x^2}\,dx = \int \frac{\sec^2(1/\sqrt{u})}{u} \cdot \frac{1}{2\sqrt{u}}\,du = \int \frac{\sec^2(1/\sqrt{u})}{2u^{3/2}}\,du$$
which is definitely worse, not better.
Alternately, maybe you’d try $u = \sec(1/x)$, so $du = -\frac{\sec(1/x)\tan(1/x)}{x^2}\,dx$ ... which is awful, because
we don’t have any of these functions in the integrand, so we probably give up rather than starting
to look at $1/x = \operatorname{arcsec}(u)$.
Moral: keep your options open, and keep thinking critically as you go along.
7.2.2 Other situations where you might try substitution
Example 7.2.8. Find
\[ \int \frac{e^{-3t}}{e^{-3t}+1}\,dt. \]
We can do the substitution $u = -3t$, but it doesn't get us to a substantially better integral (try it!). There isn't an obvious composition of functions in this case. But the thing which is making this integral difficult is that there is a sum in the denominator, and we notice that the derivative of $u = e^{-3t} + 1$ is just $du = -3e^{-3t}\,dt$, which (up to a constant) is right there in the numerator. So we get
\[ \int \frac{1}{e^{-3t}+1}\,e^{-3t}\,dt = \int \frac1u\left(-\tfrac13\right)du = -\tfrac13\ln|u| + c = -\tfrac13\ln|e^{-3t}+1| + c = -\tfrac13\ln(e^{-3t}+1) + c \]
where in the last step we noticed that $e^{-3t} + 1 > 0$ for all $t$ so the absolute value sign was superfluous.
We check by differentiating: yes!
Substitution is a great thing to try whenever you see both a function and its derivative in the
integrand.
Example 7.2.9. Find
\[ \int \frac{\arctan(x)}{1+x^2}\,dx. \]
This time, there is no composition of functions, so it's not obvious that substitution is the thing to do. But we notice that the integrand is a product of a function $u = \arctan(x)$ and its derivative $du = \frac{1}{1+x^2}\,dx$, so we will just go ahead and make the substitution and see what happens:
\[ \int \frac{\arctan(x)}{1+x^2}\,dx = \int u\,du = \tfrac12 u^2 + c = \tfrac12(\arctan(x))^2 + c. \]
We check by differentiation.
So what happened here was: we couldn't see the term $f'(g(x))$ that normally clues us in because in this case $f(u) = \frac12 u^2$ and so $f'(u) = u$.
Note. Be willing to try a substitution, to see if it can work out.

Example 7.2.10. Find
\[ \int \frac{\sin(\ln(3x))}{x}\,dx. \]
The integrand is a composition of functions (in fact, of three functions). We could try $u = 3x$, or $u = \ln(3x)$. Let's choose this second one:$^1$
\[ u = \ln(3x) \implies du = \frac{1}{3x}\,3\,dx = \frac1x\,dx \]
which is wonderful, since we have $\frac1x\,dx$ in our integrand. So we go ahead and make the substitution:
\[ \int \frac{\sin(\ln(3x))}{x}\,dx = \int \sin(u)\,du = -\cos(u) + c = -\cos(\ln(3x)) + c. \]
We check that this is correct by differentiation.
$^1$ You could also have tried $u = 3x$ first, which yields $\int \sin(\ln(u))/u\,du$, and then you have to do a second substitution $v = \ln(u)$. So it all works out just fine, it just takes a little longer.
7.2.3 Trying a substitution in the hopes of simplifying a complicated integrand
Sometimes, you try a substitution without actually realizing what $f(u)$ is going to be.
Example 7.2.11. Find
\[ \int \frac{x^3}{\sqrt{x^2+4}}\,dx. \]
We see that the toughest part is $\sqrt{x^2+4}$. We have a few choices, namely $u = x^2$ or $u = x^2 + 4$ or $u = \sqrt{x^2+4}$; the first wouldn't get us very far ($\sqrt{u+4}$ is yucky) and the last one is too greedy (it's a composition of 3 functions, really, that's excessive): let's try $u = x^2 + 4$. So
\[ u = x^2 + 4 \iff du = 2x\,dx \iff x\,dx = \tfrac12\,du. \]
So now we try to rewrite our integral in terms of $u$:
\begin{align*}
\int \frac{x^3}{\sqrt{x^2+4}}\,dx &= \int \frac{1}{\sqrt{x^2+4}}\,x^2\cdot x\,dx \\
&\neq \int \frac{1}{\sqrt{u}}\,(x^2)\,\tfrac12\,du && \text{INCOMPLETE! you can't mix $x$ and $u$} \\
&= \int \frac{1}{\sqrt{u}}\,(u-4)\,\tfrac12\,du && \text{SUCCESS! we used $x^2 = u - 4$} \\
&= \frac12\int \frac{u-4}{u^{1/2}}\,du \\
&= \frac12\int \left(u^{1/2} - 4u^{-1/2}\right)du \\
&= \frac12\left(\frac{1}{3/2}\,u^{3/2} - \frac{4}{1/2}\,u^{1/2}\right) + c \\
&= \frac12\left(\frac23 (x^2+4)^{3/2} - 8(x^2+4)^{1/2}\right) + c \\
&= \frac13 (x^2+4)^{3/2} - 4\sqrt{x^2+4} + c
\end{align*}
and again we check by differentiating — this time, even the check is a lot more work!
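Since the check is described as more work here, it may help to see it written out. The following is one way to carry out the differentiation, using only the chain rule and a common denominator:

```latex
\begin{align*}
\frac{d}{dx}\left[\tfrac13 (x^2+4)^{3/2} - 4(x^2+4)^{1/2}\right]
  &= \tfrac13\cdot\tfrac32\,(x^2+4)^{1/2}(2x) - 4\cdot\tfrac12\,(x^2+4)^{-1/2}(2x) \\
  &= x(x^2+4)^{1/2} - 4x(x^2+4)^{-1/2} \\
  &= \frac{x(x^2+4) - 4x}{(x^2+4)^{1/2}} \\
  &= \frac{x^3}{\sqrt{x^2+4}}.
\end{align*}
```

The last line is the original integrand, so the antiderivative is confirmed.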
Note. Lesson: sometimes you just try a substitution to see if it will work out.

Example 7.2.12. Find
\[ \int \frac{1}{x^{1/3}+1}\,dx. \]
Well, this one seems hopeless but the nasty part is $x^{1/3}$ so let's just try
\[ u = x^{1/3} + 1 \iff du = \tfrac13 x^{-2/3}\,dx. \]
We certainly don't have $\frac13 x^{-2/3}\,dx$ in our integrand; but since $u = x^{1/3} + 1$, it follows that $x^{1/3} = (u-1)$ so $x^{-2/3} = (u-1)^{-2}$. Therefore we can rewrite:
\[ du = \tfrac13 x^{-2/3}\,dx \iff du = \tfrac13 (u-1)^{-2}\,dx \iff 3(u-1)^2\,du = dx \]
which means we can in fact perform the substitution:
\begin{align*}
\int \frac{1}{x^{1/3}+1}\,dx &= \int \frac1u\,3(u-1)^2\,du \\
&= 3\int \frac{u^2 - 2u + 1}{u}\,du \\
&= 3\int \left(u - 2 + \frac1u\right)du \\
&= 3\left(\tfrac12 u^2 - 2u + \ln|u|\right) + c \\
&= \tfrac32 (x^{1/3}+1)^2 - 6(x^{1/3}+1) + 3\ln|x^{1/3}+1| + c \\
&= \tfrac32 x^{2/3} - 3x^{1/3} + 3\ln|x^{1/3}+1| + c'
\end{align*}
(where we have gathered all the constants into a new $c'$). We check our answer$^2$ is correct, by differentiating it:
\begin{align*}
\frac{d}{dx}\left(\tfrac32 x^{2/3} - 3x^{1/3} + 3\ln|x^{1/3}+1| + c'\right)
&= x^{-1/3} - x^{-2/3} + \frac{3}{x^{1/3}+1}\left(\tfrac13 x^{-2/3}\right) \\
&= \frac{1}{x^{1/3}+1}\left(x^{-1/3}(x^{1/3}+1) - x^{-2/3}(x^{1/3}+1) + x^{-2/3}\right) \\
&= \frac{1}{x^{1/3}+1}\left(1 + x^{-1/3} - x^{-1/3} - x^{-2/3} + x^{-2/3}\right) \\
&= \frac{1}{x^{1/3}+1}
\end{align*}
as required, as if by a minor miracle.
Note. Sometimes, it pays to be persistent and creative with substitutions.

Example 7.2.13. Find
\[ \int \frac{\sin(x)}{1+\cos^2(x)}\,dx. \]
In this one, the danger is being too greedy with your substitution. If you try $u = 1 + \cos^2(x)$ then $du = -2\cos(x)\sin(x)\,dx$ which is a disaster.
However, if we just try
\[ u = \cos(x) \iff du = -\sin(x)\,dx \]
then things go very nicely:
\begin{align*}
\int \frac{\sin(x)}{1+\cos^2(x)}\,dx &= \int \frac{1}{1+u^2}(-1)\,du \\
&= -\int \frac{1}{1+u^2}\,du \\
&= -\arctan(u) + c \\
&= -\arctan(\cos(x)) + c
\end{align*}
which we check by differentiation.
$^2$ Careful with Mobius, please: we'll often rig the questions in Mobius so the absolute value is not needed, because you'd have to type abs(x) not $|x|$ which is just a pain. In real life, put the absolute values and then remove them if it is correct to do so; remember the derivative of $\ln|x|$ is $1/x$ for all $x \neq 0$.
Example 7.2.14. Find
\[ \int \frac{\tan(x)}{\ln(\cos(x))}\,dx. \]
We do not see what this will be, but there is a composition of $\ln$ with $\cos(x)$ as the innermost function, and $\tan(x)$ is related to $\cos(x)$, so we give it a shot:
\[ u = \cos(x) \iff du = -\sin(x)\,dx \]
and although we do not see $\sin(x)$, we realize we can rewrite:
\[ \int \frac{\tan(x)}{\ln(\cos(x))}\,dx = \int \frac{\frac{\sin(x)}{\cos(x)}}{\ln(\cos(x))}\,dx = \int \frac{\sin(x)}{\cos(x)\ln(\cos(x))}\,dx \]
and therefore the substitution will work fine:
\[ = -\int \frac{1}{u\ln(u)}\,du. \]
Well, this integral is definitely an improvement, but we're still not done. This time, however, we see that there is a $\ln(u)$ in our integral, and also a $\frac1u\,du$, which is exactly what we'd need to make the substitution for $\ln(u)$! So (and of course we have to take a different letter) we set
\[ w = \ln(u) \implies dw = \frac1u\,du \]
so that our integral becomes
\[ = -\int \frac{1}{\ln(u)}\cdot\frac1u\,du = -\int \frac1w\,dw = -\ln|w| + c \]
which we have to devolve back into our original variable $x$ as:
\[ = -\ln|\ln(u)| + c = -\ln|\ln(\cos(x))| + c. \]
Wow, we really didn't see that one coming. We check by differentiating:
\[ \frac{d}{dx}\left(-\ln|\ln(\cos(x))|\right) = \frac{-1}{\ln(\cos(x))}\cdot\frac{1}{\cos(x)}\,(-\sin(x)) = \frac{\tan(x)}{\ln(\cos(x))}, \]
which is what we wanted.
Example 7.2.15. Find
\[ \int \frac{(4t+2)^2}{t^2}\,dt. \]
We look at this integral and think that the most complicated piece is $4t + 2$. But wait: a substitution is the SECOND thing you think of, after "CAN I SIMPLIFY THIS?", because in fact
\[ \int \frac{(4t+2)^2}{t^2}\,dt = \int \frac{16t^2 + 16t + 4}{t^2}\,dt = \int \left(16 + \frac{16}{t} + \frac{4}{t^2}\right)dt = 16t + 16\ln|t| - \frac{4}{t} + c \]
as you can check by differentiating. If you had done the substitution, you would have had to rewrite the denominator, making it more complex; but that would be awful: a sum in the numerator is great, because we can simplify, but a sum in the denominator is usually a big headache.
7.2.4 Tips on substitution
• Not everything needs substitution. Always look to see if you can find an anti-derivative directly, first.
• When in doubt, start small. A little linear substitution like $u = 3x$ or $u = x - 2$ is always possible (since $du = 3\,dx$ or $du = dx$ can always be done) and it might make the mess look a lot more clear to you.
• Don't be too greedy with your substitution: don't take $u = \ln(\sin(x))$ in one go but instead start with $u = \sin(x)$ and see what happens. You can always do a second substitution.
• Be meticulous in your work. Sloppy substitution will give you garbage and is worthless. Make sure that you have translated every part of your integral to your new variable. Never write any integral with two different variables in it.
• Be flexible. Try a substitution even if you can't see how it will turn out. If you can't get it to work, or it gives a yuckier integral, don't erase it: you might later see what to do with it after trying something else.
7.2.5 Two examples where substitution is not enough
Example 7.2.16. Consider
\[ \int \frac{e^{3t}}{e^{-3t}+1}\,dt. \]
This is very similar to one we had before. We again try $u = e^{-3t} + 1$ which gives $du = -3e^{-3t}\,dt$, that is, $dt = -\frac13 e^{3t}\,du$. This is trickier: if we try directly we get
\[ -\int \frac{e^{6t}}{3u}\,du \qquad \text{incomplete substitution: it's a mix of $t$ and $u$} \]
so now we have to use: $u = e^{-3t} + 1$ which gives $e^{-3t} = u - 1$ so $e^{3t} = \frac{1}{u-1}$ or $e^{6t} = \frac{1}{(u-1)^2}$. Thus the real substitution is
\[ \int \frac{e^{3t}}{e^{-3t}+1}\,dt = -\int \frac{1}{3u(u-1)^2}\,du. \]
This is fine; but we don’t have the necessary techniques to solve this yet. That’s for MAT1332.
Example 7.2.17. Consider $\int e^{x^2}\,dx$. If we try $u = x^2$, we would need $du = 2x\,dx$. There is no $x$ in the integral. But we could say: $u = x^2$ so $\sqrt{u} = x$ (taking $x > 0$). Therefore $dx = \frac{1}{2x}\,du = \frac{1}{2\sqrt{u}}\,du$. That means we have
\[ \int e^{x^2}\,dx = \int \frac{1}{2\sqrt{u}}\,e^u\,du. \]
Nice, but we still don't see an antiderivative. In fact: there is no formula in terms of elementary functions for a function whose derivative is $e^{x^2}$ (or $u^{-1/2}e^u$, for that matter). The antiderivative must exist, but it is a brand new function that has no name or formula besides "an anti-derivative of $e^{x^2}$".
End of lecture # 19
7.3 Techniques of integration : Integration by Parts
Last time we learned how to use substitution (which is like an anti-chain rule) to change one integral into a (hopefully) simpler integral. The goal is to change our integrand into an elementary function whose anti-derivative we know. Today: we will learn how to use integration by parts, which you can think of as the anti-product rule.
Recall that the product rule, $(f(x)g(x))' = f(x)g'(x) + g(x)f'(x)$, tells us that
\[ \int \left(f(x)g'(x) + g(x)f'(x)\right)dx = f(x)g(x) + c. \]
So of course if your integrand looks like the left side, you can solve it. However, that's not what usually happens. Let's rewrite the above equation by splitting the left side into a sum of two integrals and moving one of them to the other side of the equation:
\[ \int f(x)g'(x)\,dx = f(x)g(x) - \int g(x)f'(x)\,dx. \]
This tells us: if you have to solve $\int f(x)g'(x)\,dx$, then by using this identity you can reduce the problem to finding $\int g(x)f'(x)\,dx$. This is great if this other integral is easier!
7.3.1 Method of integration by parts
1. Divide your integrand into two pieces: a function that is easy to differentiate, and one that
is easy to anti-differentiate.
2. Call the piece you will differentiate $u = f(x)$; call the rest $dv = g'(x)\,dx$. Then differentiate $u$ to give $du = f'(x)\,dx$ and choose an antiderivative $v = g(x) = \int dv$. I write this in a little table like
\[ \begin{array}{ll} u = f(x) & dv = g'(x)\,dx \\ du = f'(x)\,dx & v = g(x) \end{array} \]
3. Using the notation of the integral makes the rule easier to remember:
\[ \int u\,dv = uv - \int v\,du \]
since we multiply the two functions (on the diagonal in our table) and then subtract the
integral of the product across the bottom row.
4. Note: unlike with substitution, your resulting integral is still in terms of x; just solve and
check.

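Example 7.3.1. Find $\int xe^x\,dx$.
We choose to split our integrand into $u = f(x)$ and $dv = g'(x)\,dx$, so:
\[ \begin{array}{ll} u = x & dv = e^x\,dx \\ du = dx & v = e^x \end{array} \]
Now using the rule for integration by parts:
\begin{align*}
\int xe^x\,dx &= x\cdot e^x - \int e^x\,dx \\
&= xe^x - e^x + c \\
&= (x-1)e^x + c
\end{align*}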
which we verify is correct by differentiation.
Remark 7.3.2. What happens if we pick $u = e^x$ and $dv = x\,dx$? Let's try:
\[ \begin{array}{ll} u = e^x & dv = x\,dx \\ du = e^x\,dx & v = \frac12 x^2 \end{array} \]
That gives
\[ \int xe^x\,dx = \tfrac12 x^2\cdot e^x - \int \tfrac12 x^2\cdot e^x\,dx \]
which is clearly more complicated. It is correct (that is an honest equals sign), but not helpful for
your goal of finding an antiderivative, since the new integral is a bit harder to solve than the old
one.

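Example 7.3.3. Find $\int x^2e^{3x}\,dx$.
We like to use something that differentiates well for $u$, so $u = f(x) = x^2$; and the rest should be easy to integrate, $dv = e^{3x}\,dx$:
\[ \begin{array}{ll} u = x^2 & dv = e^{3x}\,dx \\ du = 2x\,dx & v = \frac13 e^{3x} \end{array} \]
(where we solved $\int e^{3x}\,dx$ either by staring really hard, or by using the substitution $t = 3x$). So we have
\begin{align*}
\int x^2e^{3x}\,dx &= x^2\cdot\tfrac13 e^{3x} - \int 2x\cdot\tfrac13 e^{3x}\,dx \\
&= \tfrac13 x^2e^{3x} - \tfrac23\int xe^{3x}\,dx
  && \text{by parts again: } \begin{array}{ll} u = x & dv = e^{3x}\,dx \\ du = dx & v = \frac13 e^{3x} \end{array} \\
&= \tfrac13 x^2e^{3x} - \tfrac23\left(\tfrac13 xe^{3x} - \tfrac13\int e^{3x}\,dx\right) \\
&= \tfrac13 x^2e^{3x} - \tfrac29 xe^{3x} + \tfrac29\left(\tfrac13 e^{3x}\right) + c \\
&= \tfrac13 x^2e^{3x} - \tfrac29 xe^{3x} + \tfrac{2}{27}e^{3x} + c
\end{align*}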
which we verify is correct by differentiation.
Example 7.3.4. Find
\[ \int \ln(x)\,dx. \]
We do not know this antiderivative, and there is no substitution to make, since there is only the one function, so we try integration by parts. No choice:
\[ \begin{array}{ll} u = \ln(x) & dv = dx \\ du = \frac1x\,dx & v = x \end{array} \]
Therefore
\[ \int \ln(x)\,dx = x\ln(x) - \int x\,\frac1x\,dx = x\ln(x) - \int dx = x\ln(x) - x + c \]
where we have remembered that $\int dx = \int 1\,dx$. We check by differentiating!
Example 7.3.5. Find
\[ \int x\ln(x)\,dx. \]
This time we have choices. You might be tempted to set $dv = \ln(x)\,dx$, since we now know the integral (try it! it gets messy) but the strategy is: choose $u$ to have a nice derivative, and $dv$ to have a nice integral. So here we go with
\[ \begin{array}{ll} u = \ln(x) & dv = x\,dx \\ du = \frac1x\,dx & v = \frac12 x^2 \end{array} \]
which gives
\[ \int x\ln(x)\,dx = \tfrac12 x^2\ln(x) - \int \tfrac12 x^2\,\frac1x\,dx = \tfrac12 x^2\ln(x) - \tfrac12\int x\,dx = \tfrac12 x^2\ln(x) - \tfrac14 x^2 + c \]
which we again check by differentiating.
Example 7.3.6. Find
\[ \int \frac{\ln(x)}{x^2}\,dx. \]
Again, this doesn't seem to be one we know, or a good candidate for substitution, so we go to integration by parts; again, $\ln(x)$ is the one you'd like to differentiate, because then it turns into a function in the same family as $x^{-2}$, which will make the integral easier. This means
\[ \begin{array}{ll} u = \ln(x) & dv = \frac{1}{x^2}\,dx \\ du = \frac1x\,dx & v = -x^{-1} \end{array} \]
which gives
\[ \int \frac{\ln(x)}{x^2}\,dx = -\frac{\ln(x)}{x} - \int \left(-\frac1x\right)\frac1x\,dx = -\frac{\ln(x)}{x} + \int x^{-2}\,dx = -\frac{\ln(x)}{x} - x^{-1} + c \]
which we can check by differentiation.
Remark 7.3.7. If the answers to the two preceding examples seem strangely similar, you might note that
\[ -2x^{-2}\ln(x) = x^{-2}\ln(x^{-2}) = u\ln(u) \quad\text{with } u = x^{-2}. \]
Example 7.3.8. Find
\[ \int \frac{\ln(x)}{x}\,dx. \]
Given our success, we just dive right in:
\[ \begin{array}{ll} u = \ln(x) & dv = \frac1x\,dx \\ du = \frac1x\,dx & v = \ln(x) \end{array} \]
which gives$^3$
\[ \int \frac{\ln(x)}{x}\,dx = \ln(x)\ln(x) - \int \frac{\ln(x)}{x}\,dx \]
(!!!!??!!) But actually: this is marvelous. It's an equation, and the thing we want ($\int \frac{\ln(x)}{x}\,dx$) can be isolated:
\[ 2\int \frac{\ln(x)}{x}\,dx = \ln(x)^2 \]
so
\[ \int \frac{\ln(x)}{x}\,dx = \tfrac12(\ln(x))^2 + c \]
where we remember to put $+c$ in the end. We check by differentiating.
Remark 7.3.9. Actually, in the previous example, we should have noticed that it was a prime candidate for substitution: set $w = \ln(x)$ then $dw = \frac1x\,dx$, so
\[ \int \frac{\ln(x)}{x}\,dx = \int w\,dw = \tfrac12 w^2 + c = \tfrac12(\ln(x))^2 + c. \]
That’s easier!
$^3$ We used $\ln(x)$ instead of $\ln|x|$ here because the $\ln(x)$ in the integrand means we are only considering $x > 0$ anyway.
Exercise 7.3.10. Find $\int t^2\ln(t)\,dt$.
Example 7.3.11. Find $\int \arcsin(x)\,dx$.
We don't know this anti-derivative, and there's no substitution to make, so we try by parts. Our only real choice is:
\[ \begin{array}{ll} u = \arcsin(x) & dv = dx \\ du = \frac{1}{\sqrt{1-x^2}}\,dx & v = x \end{array} \]
which gives
\[ \int \arcsin(x)\,dx = x\arcsin(x) - \int \frac{x}{\sqrt{1-x^2}}\,dx. \]
Now we figure out the resulting integral; since we see an $x\,dx$ in the numerator, we'll do a substitution $w = 1 - x^2$ and $dw = -2x\,dx$. (You can use $u$ if you want to, but I don't want to be confusing.)
\[ \int \frac{x}{\sqrt{1-x^2}}\,dx = \int \frac{-1}{2}\,w^{-1/2}\,dw = -w^{1/2} + c = -\sqrt{1-x^2} + c \]
Therefore:
\[ \int \arcsin(x)\,dx = x\arcsin(x) - \left(-\sqrt{1-x^2}\right) + c' = x\arcsin(x) + \sqrt{1-x^2} + c' \]
Exercise 7.3.12. Find $\int \arctan(x)\,dx$.
7.3.2 Applying by parts more than once: two different kinds of examples
Example 7.3.13. Find $\int x^2\sin(3x)\,dx$.
This is a product of two unrelated functions so a good candidate for by parts. As usual, we pick for $u$ the one that gets simpler when you differentiate:
\[ \begin{array}{ll} u = x^2 & dv = \sin(3x)\,dx \\ du = 2x\,dx & v = -\frac13\cos(3x) \end{array} \]
So
\begin{align*}
\int x^2\sin(3x)\,dx &= -\tfrac13 x^2\cos(3x) - \int \left(-\tfrac13\right)\cos(3x)(2x)\,dx \\
&= -\tfrac13 x^2\cos(3x) + \tfrac23\int x\cos(3x)\,dx. \tag{7.3}
\end{align*}
To work out the resulting integral, we need to use by parts again. So let's solve
\[ \int x\cos(3x)\,dx \]
using by parts
\[ \begin{array}{ll} u = x & dv = \cos(3x)\,dx \\ du = dx & v = \frac13\sin(3x) \end{array} \]
which gives
\[ \int x\cos(3x)\,dx = \tfrac13 x\sin(3x) - \tfrac13\int \sin(3x)\,dx = \tfrac13 x\sin(3x) + \tfrac19\cos(3x) + c. \]
Therefore plugging it back into (7.3) we have:
\begin{align*}
\int x^2\sin(3x)\,dx &= -\tfrac13 x^2\cos(3x) + \tfrac23\int x\cos(3x)\,dx \\
&= -\tfrac13 x^2\cos(3x) + \tfrac23\left(\tfrac13 x\sin(3x) + \tfrac19\cos(3x)\right) + c' \\
&= -\tfrac13 x^2\cos(3x) + \tfrac29 x\sin(3x) + \tfrac{2}{27}\cos(3x) + c'
\end{align*}
There’s also a stranger way that doing integration by parts twice can pay off.
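Example 7.3.14. Find $\int e^{-\theta}\cos(\theta)\,d\theta$.
This is a product of two unrelated functions so a good candidate for by parts. We have two choices in this case, since both functions differentiate and integrate equally easily; it doesn't matter which one we take. Let's go with:
\[ \begin{array}{ll} u = e^{-\theta} & dv = \cos(\theta)\,d\theta \\ du = -e^{-\theta}\,d\theta & v = \sin(\theta) \end{array} \]
so
\[ \int e^{-\theta}\cos(\theta)\,d\theta = e^{-\theta}\sin(\theta) - (-1)\int e^{-\theta}\sin(\theta)\,d\theta. \]
Now the resulting integral looks analogous to the one we had before; it is certainly not easier. But we persevere. We do integration by parts again.
CAREFUL: if at this point you were to choose $u = \sin(\theta)$ and $dv = e^{-\theta}\,d\theta$, you would just UNDO your first step and get back exactly to where you started. Try this, and then compare what happens with the magic of the following steps.
So we're trying to solve
\[ \int e^{-\theta}\sin(\theta)\,d\theta \]
and we choose
\[ \begin{array}{ll} u = e^{-\theta} & dv = \sin(\theta)\,d\theta \\ du = -e^{-\theta}\,d\theta & v = -\cos(\theta) \end{array} \]
which gives
\begin{align*}
\int e^{-\theta}\sin(\theta)\,d\theta &= -e^{-\theta}\cos(\theta) - \int (-e^{-\theta})(-\cos(\theta))\,d\theta \\
&= -e^{-\theta}\cos(\theta) - \int e^{-\theta}\cos(\theta)\,d\theta.
\end{align*}
Now let's carefully write out what we've figured out, putting all this together:
\[ \int e^{-\theta}\cos(\theta)\,d\theta = e^{-\theta}\sin(\theta) + \left(-e^{-\theta}\cos(\theta) - \int e^{-\theta}\cos(\theta)\,d\theta\right) \]
and the integral we want to solve for DOES NOT CANCEL OUT. In other words, we can add $\int e^{-\theta}\cos(\theta)\,d\theta$ to both sides of this equation to get
\[ 2\int e^{-\theta}\cos(\theta)\,d\theta = e^{-\theta}\sin(\theta) - e^{-\theta}\cos(\theta) \]
or
\[ \int e^{-\theta}\cos(\theta)\,d\theta = \tfrac12 e^{-\theta}\sin(\theta) - \tfrac12 e^{-\theta}\cos(\theta) + c \]
which we check by differentiation.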
Note: in this example, we could have swapped the functions we used for u and dv; the answer
comes out the same.
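Because so many signs are in play in this example, it may be worth writing the check out. One way to differentiate the answer, using the product rule on each term:

```latex
\begin{align*}
\frac{d}{d\theta}\left[\tfrac12 e^{-\theta}\sin\theta - \tfrac12 e^{-\theta}\cos\theta\right]
 &= \tfrac12\left(-e^{-\theta}\sin\theta + e^{-\theta}\cos\theta\right)
  - \tfrac12\left(-e^{-\theta}\cos\theta - e^{-\theta}\sin\theta\right) \\
 &= \tfrac12 e^{-\theta}\left(-\sin\theta + \cos\theta + \cos\theta + \sin\theta\right) \\
 &= e^{-\theta}\cos\theta.
\end{align*}
```

The sine terms cancel and the two cosine terms combine, which recovers the original integrand.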
7.3.3 Tips for integration by parts
• Good candidates for $u$: polynomials, exp, log, trig, inverse trig: anything whose derivative is a bit simpler.
• Good candidates for $dv$: polynomials, exp, sine, cosine: functions whose anti-derivative is (a) known and (b) hopefully simpler.
• If you do integration by parts twice, don't UNDO the first one.
• Keep CAREFUL TRACK of all signs. There's a minus sign in the formula, and often extra constants floating around. Be meticulous!
• If the result of your by parts doesn't look helpful, don't erase it! Try another combination of $u\,dv$, or maybe come back and do a second by parts, or look for a substitution .... persistence is key!
• Remember that you have two big methods: substitution and by parts. They sometimes both work on the same integrand, but most of the time, substitution helps you with compositions of functions and most of the time, by parts helps you with products of functions.
7.4 Mixed examples, and applications
7.4.1 More examples with integration by parts and substitution
Example 7.4.1. Find
\[ \int \sin(\sqrt{x})\,dx. \]
Now the messy part is $\sqrt{x}$ so we try a substitution, which effectively turns a composition problem into a product problem. So set $t = \sqrt{x}$. Knowing that we don't have the derivative available, we shortcut to saying $t^2 = x$ so $2t\,dt = dx$ by implicit differentiation. Then we can make the substitution:
\[ \int \sin(\sqrt{x})\,dx = \int \sin(t)(2t)\,dt = 2\int t\sin(t)\,dt. \]
This is now a great candidate for by parts:
\[ \begin{array}{ll} u = t & dv = \sin(t)\,dt \\ du = dt & v = -\cos(t) \end{array} \]
which gives
\[ 2\int t\sin(t)\,dt = 2\left(-t\cos(t) - \int (-\cos(t))\,dt\right) = -2t\cos(t) + 2\sin(t) + c \]
and then finally come back to our original variable $x$:
\[ = -2\sqrt{x}\cos(\sqrt{x}) + 2\sin(\sqrt{x}) + c. \]
We check by differentiating; as always with by parts, you see one of the product rule factors of the
first cancels off the derivative of the second summand.
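Example 7.4.2. Find
\[ \int \frac{\ln(y)}{\sqrt{y}}\,dy. \]
We can try substitution: $x = \sqrt{y}$ so $dx = \frac{1}{2\sqrt{y}}\,dy$ and $y = x^2$; so
\[ \int \frac{\ln(y)}{\sqrt{y}}\,dy = \int \ln(x^2)\,2\,dx = 2\int 2\ln(x)\,dx = 4\int \ln(x)\,dx \]
which is a bit of a surprise, perhaps. Anyway, this last integral we solved before, whence
\[ = 4(x\ln(x) - x) + c = 4\left(\sqrt{y}\ln(\sqrt{y}) - \sqrt{y}\right) + c = 2\sqrt{y}\ln(y) - 4\sqrt{y} + c, \]
where in the last steps we returned to the original variable $y$.
(We could also have solved this last one using by parts with $u = \ln(y)$, $dv = \frac{1}{\sqrt{y}}\,dy$ and no substitution.)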
7.4.2 Application examples
Example 7.4.3. A fish grows in length over time by a function $L(t)$ which obeys the pure-time differential equation
\[ L'(t) = 7e^{-0.1t} \text{ cm/year}. \]
Suppose that $L(0) = 0$ (meaning we measure from fertilization); how long until the fish reaches 50 cm in length?
Solution: Since $L'(t) = 7e^{-0.1t}$, we have
\[ L(t) = \int 7e^{-0.1t}\,dt \]
and we make the substitution $u = -0.1t$, so $du = -0.1\,dt$ or $dt = -10\,du$. Thus
\[ L(t) = 7\int e^u(-10)\,du = -70e^u + c = -70e^{-0.1t} + c. \]
This seems like a stupid answer since the function takes negative values, but let's find $c$. Since $L(0) = 0$, we have
\[ -70e^{-0.1(0)} + c = 0 \iff c = 70, \]
and thus our answer is
\[ L(t) = 70 - 70e^{-0.1t} = 70(1 - e^{-0.1t}) \]
which makes perfect sense! In fact, we can sketch the graph of $y = L(t)$ to get:
[Figure: graph of $y = L(t)$.]
Therefore yes, this is a positive function for $t > 0$ and we compute
\begin{align*}
L(t) = 50 &\iff 1 - e^{-0.1t} = \tfrac57 \\
&\iff e^{-0.1t} = 1 - \tfrac57 \\
&\iff e^{-0.1t} = \tfrac27 \\
&\iff -0.1t = \ln(2/7) \\
&\iff t = \ln(2/7)/(-0.1) \approx 12.5 \text{ years}.
\end{align*}
In its lifetime, it approaches a length of 70cm although it never stops growing.
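The arithmetic in this example is easy to reproduce. The following is a small sketch, assuming the formula $L(t) = 70(1 - e^{-0.1t})$ derived above; the function names `fish_length` and `time_to_reach` are made up for illustration.

```python
import math

def fish_length(t):
    """Length in cm at age t years: L(t) = 70(1 - e^(-0.1 t)), from Example 7.4.3."""
    return 70 * (1 - math.exp(-0.1 * t))

def time_to_reach(length_cm):
    """Solve 70(1 - e^(-0.1 t)) = length_cm for t; only valid for 0 <= length_cm < 70."""
    if not 0 <= length_cm < 70:
        raise ValueError("in this model the fish never reaches 70 cm or more")
    return math.log(1 - length_cm / 70) / (-0.1)

t50 = time_to_reach(50)
print(round(t50, 2))               # age (in years) at which the fish is 50 cm long
print(round(fish_length(t50), 6))  # plugging that age back in should give 50
print(round(fish_length(100), 3))  # after a very long time: close to, but below, 70
```

Asking `time_to_reach` for 70 cm or more raises an error, which matches the observation that the fish approaches 70 cm without ever reaching it.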
Example 7.4.4. The mass of a worm, $M(t)$, changes over time according to the pure-time differential equation $M'(t) = ate^{-t}$, for some positive constant $a$. If $M(0) = 0$, find a formula for $M(t)$ and $\lim_{t\to\infty} M(t)$.
Solution: We need an antiderivative of $M'(t)$, so we have
\[ M(t) = \int ate^{-t}\,dt = a\int te^{-t}\,dt. \]
This is a good candidate for by parts:
\[ \begin{array}{ll} u = t & dv = e^{-t}\,dt \\ du = dt & v = -e^{-t} \end{array} \]
giving
\[ M(t) = a\left(-te^{-t} - \int (-e^{-t})\,dt\right) = -ate^{-t} + a\int e^{-t}\,dt = -ate^{-t} - ae^{-t} + c. \]
It seems worrisome that the functions are all negative, but let's plug in the initial condition of $M(0) = 0$. This gives
\[ -a(0)e^0 - ae^0 + c = 0 \iff c = a > 0. \]
So our formula is
\[ M(t) = a(1 - (t+1)e^{-t}). \]
Since $M'(t) > 0$ for all $t > 0$ and $M(0) = 0$, this is always positive for $t > 0$ (try it out!) and
\[ \lim_{t\to\infty} M(t) = \lim_{t\to\infty} a\left(1 - \frac{t+1}{e^t}\right). \]
The quotient gives an indeterminate form of type $\infty/\infty$ in the limit, so we can apply l'Hôpital's rule
\[ \lim_{t\to\infty} \frac{t+1}{e^t} = \lim_{t\to\infty} \frac{1}{e^t} = 0, \]
whence limt→∞ M (t) = a. The mass of the worm increases over its lifetime to asymptotically
approach a.
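To see the limit numerically, one can tabulate $M(t) = a(1 - (t+1)e^{-t})$ for increasing $t$. This is only an illustration: the value `a = 2.5` below is a hypothetical choice, since the example leaves $a$ unspecified, and the name `worm_mass` is made up.

```python
import math

def worm_mass(t, a=1.0):
    """M(t) = a(1 - (t + 1)e^(-t)) from Example 7.4.4; the default a = 1 is arbitrary."""
    return a * (1 - (t + 1) * math.exp(-t))

a = 2.5  # hypothetical value of the positive constant a
for t in [0, 1, 5, 10, 20]:
    print(t, worm_mass(t, a))
```

The printed values start at 0, increase with $t$, and get very close to $a$, in line with the l'Hôpital computation.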
7.4.3 Integrals that we still can’t solve
Example 7.4.5. Find
\[ \int \frac{e^x}{x}\,dx. \]
It is not a function we recognize. If we do a substitution $w = e^x$, then $dw = e^x\,dx$ which is the numerator (weird!); but we still have the $x$ in the denominator. So we use our first equation to write $x = \ln(w)$, and the substitution becomes
\[ \int \frac{e^x}{x}\,dx = \int \frac{dw}{\ln(w)} = \int \frac{1}{\ln(w)}\,dw \]
which we can't solve, either. If we now try the substitution $t = \ln(w)$ then we will end up right back at $x$.
For integration by parts we have two choices; let's try them.
With
\[ \begin{array}{ll} u = e^x & dv = \frac1x\,dx \\ du = e^x\,dx & v = \ln|x| \end{array} \]
we get
\[ \int \frac{e^x}{x}\,dx = e^x\ln|x| - \int \ln|x|\,e^x\,dx \]
which is not better; and if we do by parts again with $u = \ln|x|$ we'll end up back where we started.
With
\[ \begin{array}{ll} u = x^{-1} & dv = e^x\,dx \\ du = -x^{-2}\,dx & v = e^x \end{array} \]
we get
\[ \int x^{-1}e^x\,dx = x^{-1}e^x + \int x^{-2}e^x\,dx \]
which is again worse than when we started.
In fact, the reason for our failure is not for lack of ability: this integral again has no elementary
function as antiderivative; there is no formula for the anti-derivative.
Or is there?
End of lecture # 20
7.5 Definite Integrals
We now get to the geometric interpretation of the integral. Remember that taking the derivative
of a function means finding the slope of the tangent line at every point. It turns out that taking
the integral of a function is about measuring the area under the curve.
The area problem is as follows. Given a function y = f (x), and two points a and b, find the area
of the region bounded by y = f (x), x = a, x = b and the x-axis, as in the following picture:
Our strategy is to divide the interval $[a, b]$ into $n$ subintervals, approximate the area over each subinterval by the area of a rectangle, and add them all together. If we choose $n$ larger and larger, then we should get closer and closer to the actual value; we will define the integral as the limit as $n \to \infty$.
This looks like:
But what are we really calculating? To make this feel concrete, suppose that f (t) was the velocity
function (the reading from your speedometer) at time t, and that we’re doing this on a long trip
from time t = 0 to t = 3 (hours).
If we use one interval, then we get one rectangle. We pick our sample point (say t = 1.5) and so
the height is f (1.5) (the speed we were going at that time). The area of the rectangle is
\[ h \times w = f(1.5) \times 3 = \text{speed} \times \text{time} \approx \text{distance travelled}. \]
Of course, if our speed varied a lot, this estimate would be pretty bad. But we could do better, by
taking, say, 3 intervals of one hour each:
• Choose a moment in each interval, and measure your speed at that moment.
• Pretend that we were driving exactly that speed for the entire hour.
• Then the area of each rectangle is $f(t)$ km/h $\times$ 1 h = the distance in km you would have travelled if you stuck to that speed for one hour.
• Adding them together: you have estimated how far you drove in the 3 hours.
As we choose smaller and smaller intervals, we should get better and better estimates of how far
we actually drove. In the limit (when our intervals become “infinitely small”) we should have the
exact distance that we drove.
In other words:
Note. The area under the velocity curve from time $a$ to time $b$ is the distance travelled over that time period. That is, if $s(t)$ denotes the position at time $t$ (so that $s'(t) = v(t)$), then
\[ \int_a^b v(t)\,dt = s(b) - s(a). \]
But there is nothing special about calling f (t) velocity and the answer displacement: this is a general
statement about the area under the curve being a difference of the values of an anti-derivative.
Theorem 7.5.1 (Fundamental Theorem of Calculus, Part 2 OR Evaluation Theorem). Suppose $f$ is a function and that $F$ is an antiderivative of $f$ on $[a, b]$, that is, $F'(x) = f(x)$ for all $x \in [a, b]$. Then
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
The notation we use when solving problems is:
\[ \int_a^b f(t)\,dt = F(t)\Big|_a^b = F(b) - F(a) \quad\text{where } F' = f. \]
We call $\int_a^b f(t)\,dt$ the definite integral of $f(t)$ between $t = a$ and $t = b$. The answer is a number, unlike the indefinite integral, which gives a function. So in fact:
\[ \int_a^b f(t)\,dt = \int_a^b f(x)\,dx. \]
The name of the variable doesn't matter.
Let’s do some examples to understand this better.

Example 7.5.2. Consider
\[ \int_a^b 3\,dx. \]
This represents the area under the curve of the constant function $y = 3$ between $x = a$ and $x = b$. That's a rectangle; its area is $h \times w = 3 \times (b - a)$ (draw a picture). Using the fundamental theorem of calculus, we can alternately compute this via:
\[ \int_a^b 3\,dx = 3x\Big|_a^b = 3b - 3a \]
which equals $3(b - a)$, as expected. That is to say, we had $f(x) = 3$ and $F(x) = 3x$, so $F(b) - F(a) = 3b - 3a$.
Example 7.5.3. Consider
\[ \int_0^{10} 2x\,dx. \]
This represents the area under the graph of $y = 2x$ between $x = 0$ and $x = 10$, which is a triangle. The area of a triangle is $\frac12 bh$, so since the base is 10 and the height is 20 (draw a picture!), we deduce the area should be $\frac12(10)(20) = 100$. Using the FTC, we get
\[ \int_0^{10} 2x\,dx = x^2\Big|_0^{10} = (10)^2 - (0)^2 = 100, \]
as required.
Where it gets interesting is when we calculate areas that we didn’t previously have formulas for.
Example 7.5.4. Find the area under the curve of y = x2 between x = 0 and x = 1.
Solution: you should sketch this.
[Figure: The graph of $y = x^2$ between $x = 0$ and $x = 1$, in red. For reference, the unit square is outlined in green, as is its diagonal. The definite integral $\int_0^1 x^2\,dx$ represents the area of the region under the red curve but above the $x$-axis and to the left of $x = 1$.]
So it's definitely less than $\frac12$, since it's less than half the unit square, but how much should it really be? The Fundamental Theorem of Calculus gives us that
\[ \int_0^1 x^2\,dx = \tfrac13 x^3\Big|_0^1 = \tfrac13(1)^3 - \tfrac13(0)^3 = \tfrac13. \]
Wow, cool. You can count squares to estimate the area yourself and see if this is reasonable.
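Instead of counting squares, the rectangle strategy described at the start of this section can be tried directly. The sketch below is an illustration only: the function name `midpoint_riemann_sum` and the choice of midpoints as sample points are assumptions, and any other sample point in each subinterval would also do.

```python
def midpoint_riemann_sum(f, a, b, n):
    """Approximate the area under f on [a, b] with n rectangles of equal width,
    using the midpoint of each subinterval as the sample point."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) * width for i in range(n))

square = lambda x: x ** 2

# As n grows, the estimates should approach the exact value 1/3 from the FTC.
for n in [1, 4, 16, 64, 256]:
    approx = midpoint_riemann_sum(square, 0.0, 1.0, n)
    print(n, approx, abs(approx - 1 / 3))
```

The last column is the gap between the rectangle estimate and $\frac13$; it shrinks as $n$ grows, which is the limit that defines the integral.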
Example 7.5.5. If $f(x) = 2x$ then we saw $F(x) = x^2$ and so
\[ \int_2^7 2x\,dx = x^2\Big|_2^7 = 7^2 - 2^2 = 49 - 4 = 45. \]
When the function is below the $x$-axis, then the integral can give a negative answer. The correct interpretation is of net area: the difference between the area under the curve but above the $x$-axis, and the area under the $x$-axis but above the curve.
Example 7.5.6.
\[ \int_1^2 (-3)\,dx = (-3x)\Big|_1^2 = (-3(2)) - (-3(1)) = -6 + 3 = -3, \]
which is the negative of the area of the rectangle, because it’s below the axis.
Thinking about the integral as relating to (net) area gives us some very nice consequences.
Example 7.5.7. Suppose $r(t)$ is the rate of change of your population, where your population is given by $n(t)$. So $r(t) = n'(t)$. Then
\[ \int_a^b r(t)\,dt = n(b) - n(a), \]
that is, the area under the rate-of-change curve is just the total change in population.
Example 7.5.8. If $\rho(x)$ represents the linear density of a rod, then each rectangle corresponds to density times length, which is mass. So in the limit,
\[ \int_a^x \rho(s)\,ds \]
is the total mass of a piece of rod starting at the point $a$ and ending at $x$. That is, if $\rho(x)$ is the linear density of a rod, then $\rho(x) = m'(x)$, where $m(x)$ is the mass of the rod up to the point $x$, measured from any arbitrary starting point. Then $\int_a^b \rho(x)\,dx = m(b) - m(a)$.
Remark 7.5.9. What this last example gives us is something unexpected: a formula for the anti-
derivative of any function at all.
For example: What is an anti-derivative of $f(x) = e^{-x^2}$?
We saw earlier that we couldn't find a formula for a function $F(x)$ for which $F'(x) = f(x)$. But now we have something:
\[ F(x) = \int_0^x e^{-t^2}\,dt. \]
That is: let $F(x)$ be the function that measures the area under the curve of $f(t) = e^{-t^2}$ between $0$ and $x$ (the "area so far" function, which we can approximate to any degree of precision we want). Then $F'(x) = f(x)$, so it's an anti-derivative of $f(x)$. This is quite cool, and is the other part of the fundamental theorem of Calculus (part 1).
But one weird thing: the fundamental theorem of calculus doesn’t specify which anti-derivative we
have to use, and we know there are infinitely many choices.
Suppose F (x) and G(x) are two anti-derivatives of f (x) on the interval [a, b]. Then we proved
earlier that there must be a constant c such that F (x) = G(x) + c. Now FTC tells us that
\[ \int_a^b f(x)\,dx = F(b) - F(a) \quad\text{and also}\quad \int_a^b f(x)\,dx = G(b) - G(a) \]
since both are anti-derivatives. But that's perfectly OK:
\[ F(b) - F(a) = (G(b) + c) - (G(a) + c) = G(b) + c - G(a) - c = G(b) - G(a). \]
So indeed: take any antiderivative.
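The "area so far" function from this remark, $F(x) = \int_0^x e^{-t^2}\,dt$, can be approximated numerically even though it has no elementary formula. The sketch below uses Simpson's rule, a standard numerical method that is not covered in these notes; the names `area_so_far` and `closed_form` and the number of subintervals are assumptions made for illustration. For comparison it uses the fact that this particular function is a rescaling of the error function, which Python's standard library provides as `math.erf`.

```python
import math

def area_so_far(x, n=200):
    """Approximate F(x) = integral from 0 to x of e^(-t^2) dt with Simpson's rule.
    n is the number of subintervals (must be even); 200 is an arbitrary choice."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = x / n
    f = lambda t: math.exp(-t * t)
    total = f(0.0) + f(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def closed_form(x):
    """The same area written with the error function: (sqrt(pi) / 2) * erf(x)."""
    return math.sqrt(math.pi) / 2 * math.erf(x)

for x in [0.5, 1.0, 2.0]:
    print(x, area_so_far(x), closed_form(x))

# F'(1) should be close to f(1) = e^(-1): compare a difference quotient with the integrand.
h = 1e-4
print((area_so_far(1 + h) - area_so_far(1 - h)) / (2 * h), math.exp(-1))
```

The last line illustrates part 1 of the fundamental theorem: the slope of the "area so far" function matches the height of the curve.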
Note. This interpretation of the anti-derivative, and definite integrals, is the starting point we’ll
use in MAT1332.
Example 7.5.10. Find $\int_0^1 \arctan(x)\,dx$.
Answer: we begin by finding an antiderivative, that is, solving the indefinite integral
\[ \int \arctan(x)\,dx. \]
We don't recognize an antiderivative, we can't simplify, substitution is a non-starter, so let's try integration by parts:
\[ \begin{array}{ll} u = \arctan(x) & dv = dx \\ du = \frac{1}{1+x^2}\,dx & v = x \end{array} \]
\[ \int \arctan(x)\,dx = x\arctan(x) - \int \frac{x}{1+x^2}\,dx, \]
and to solve this new integral, we see a good substitution: $w = 1 + x^2$, $dw = 2x\,dx$ or $x\,dx = \frac12\,dw$. Thus
\[ \int \frac{x}{1+x^2}\,dx = \int \frac1w\cdot\frac12\,dw = \frac12\int \frac1w\,dw = \frac12\ln|w| + c = \frac12\ln|1+x^2| + c = \frac12\ln(1+x^2) + c \]
whence
\[ \int \arctan(x)\,dx = x\arctan(x) - \tfrac12\ln(1+x^2) + c' \]
(for some constant $c'$; here $c' = -c$). We can check this by differentiating. Once we're sure we have found the general antiderivative, we pick our favourite one (say, the one with no constant term) and apply FTC. Therefore
\begin{align*}
\int_0^1 \arctan(x)\,dx &= \left(x\arctan(x) - \tfrac12\ln(1+x^2)\right)\Big|_0^1 \\
&= \left(1\arctan(1) - \tfrac12\ln(2)\right) - \left(0\arctan(0) - \tfrac12\ln(1)\right) \\
&= \left(\tfrac{\pi}{4} - \tfrac12\ln(2)\right) - (0 - 0) \\
&= \tfrac{\pi}{4} - \tfrac12\ln(2)
\end{align*}
End of lecture # 21
And guess what? We’ve completed MAT1330!
Appendix A
Solutions to Selected Exercises
Solution to Exercise 2.2.14:
• $\ln(e^x + 3) = \ln(e^x) + \ln(3) = x + \ln(3)$ : WRONG. $\ln(a + b) \neq \ln(a) + \ln(b)$, so the first
step is false. (But the second equality is true since $\ln(e^x) = x$.)
• $\ln(e^x + 3) = (x + 3)\ln(e) = x + 3$ : WRONG. If it were $\ln(e^{x+3})$, then we’d be able to
simplify this, but that’s not what we have, so the first step is false. (But the second equality
is true since $\ln(e) = 1$.)
• $\ln(e^x + 3) = 5 \implies e^x + 3 = 5$ : WRONG. You are not allowed to erase ln in an
equation, even if you do find it irritating. Instead, remember that to get rid of ln, you take
the exponential of BOTH SIDES of the equation. So a correct move is
$$\ln(e^x + 3) = 5 \iff e^{\ln(e^x + 3)} = e^5 \iff e^x + 3 = e^5.$$
(I often skip writing the middle step but it is still there.)
• $e^x + 3 = e^5 \iff e^x = e^5 - 3 \iff x = 5 - \ln(3)$ : WRONG. This time it’s the second
step that went haywire. (The first step is fine.) To get rid of an exponential, we have to apply
the logarithm to BOTH SIDES of the equation, and we HAVE TO USE PARENTHESES:
$$e^x = e^5 - 3 \iff \ln(e^x) = \ln(e^5 - 3) \iff x = \ln(e^5 - 3).$$
(I often skip writing the middle step but it is still there.)
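A quick numerical check makes the difference between the correct and incorrect last steps concrete. This is only a sketch; the variable names are ours.

```python
import math

x_right = math.log(math.exp(5) - 3)   # corrected answer x = ln(e^5 - 3)
x_wrong = 5 - math.log(3)             # answer produced by the faulty step

# Substitute each candidate back into the original left-hand side ln(e^x + 3).
lhs_right = math.log(math.exp(x_right) + 3)
lhs_wrong = math.log(math.exp(x_wrong) + 3)
print(lhs_right, lhs_wrong)
```

Only the first value should come out equal to 5 (up to rounding), which is what the original equation demands.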
Solution to Exercise 2.3.14: Solve the equality $|x^2 - 4| = 5$ following the pattern of the preceding
examples to get only two solutions: $x \in \{\pm 3\}$. Make a table of values as you normally do to deduce
that the answer is the closed interval $[-3, 3]$.
MAT 1330 : Fall 2020
The intersection of $y = |x^2 - 3|$ and $y = 3x + 1$. There are only two solutions, whereas
intersecting $y = x^2 - 3$ with $y = 3x + 1$ and $y = -(3x + 1)$ gave four solutions.
1. If we measure before the daily dose, then we start with $x_t$ mg/L in the blood, and immediately
add 10 mg/L, giving $x_t + 10$ mg/L. Then as the day progresses, 25% is absorbed, leaving
$0.75(x_t + 10)$. Thus the DTDS is
$$x_{t+1} = 0.75(x_t + 10) = 0.75x_t + 7.5.$$
It’s a different DTDS! But that should make sense: we’re measuring at a different time in the
daily cycle.
2. Using the DTDS $x_{t+1} = 0.75x_t + 10$: If the initial drug level is $x_0 = 8$, then
$$x_1 = 0.75x_0 + 10 = 0.75 \cdot 8 + 10 = 16$$
and
$$x_2 = 0.75x_1 + 10 = 0.75 \cdot 16 + 10 = 22.$$
We plot these on a graph of $x_t$ vs $t$ as points (0, 8), (1, 16) and (2, 22). To make a continuous
graph, we have to go beyond the DTDS, and go back to our understanding of drug absorption.
So we’d add points like (0.99, 6) and (1.99, 12), representing the concentration in the blood just
before the daily dose, and connect the dots to get a zig-zag graph. So measuring after the daily
dose gives a (local) maximum, and measuring just before gives a (local) minimum.
3. Let $x_t$ be the amount you owe after the $t$th payment. Then over the course of the month, the
bank adds 0.5% of interest to the amount you owe, increasing the total to $x_t + 0.005x_t = 1.005x_t$.
But then you pay off \$50, reducing this total, so $x_{t+1} = 1.005x_t - 50$. Our updating function
is thus $f(x) = 1.005x - 50$.
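The two drug-level models above are easy to compare by iterating them. The following is a minimal sketch: the helper `iterate` is ours, and starting the "before the dose" model at 8 is our choice purely for illustration.

```python
def iterate(f, x0, steps):
    """Return [x0, x1, ..., x_steps] for the DTDS x_{t+1} = f(x_t)."""
    values = [x0]
    for _ in range(steps):
        values.append(f(values[-1]))
    return values

# Measured just after the daily dose: x_{t+1} = 0.75 x_t + 10.
after_dose = iterate(lambda x: 0.75 * x + 10, 8.0, 2)
# Measured just before the daily dose: x_{t+1} = 0.75 (x_t + 10).
before_dose = iterate(lambda x: 0.75 * (x + 10), 8.0, 2)

print(after_dose)   # [8.0, 16.0, 22.0]
print(before_dose)  # [8.0, 13.5, 17.625]
```

The first list reproduces the points (0, 8), (1, 16), (2, 22) computed by hand in item 2.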
1. $x_0 = 5$ so $x_1 = \frac{1}{2}x_0 + 5 = 7.5$, $x_2 = \frac{1}{2}x_1 + 5 = 8.75$, $x_3 = \frac{1}{2}x_2 + 5 = 9.375$. On the other hand,
$g(0) = 5$, $g(1) = 5.5$, $g(2) = 6$, $g(3) = 6.5$. So no, $x_t \neq g(t)$ and in fact these functions are very
different; $g(t)$ is linear but $x_t$ is not. The function f was modeling the dynamics of change; the
function g is modeling the growth (of something simple) over time.
2. $(f \circ f)(x) = f(f(x)) = \frac{1}{2}f(x) + 2 = \frac{1}{2}\left(\frac{1}{2}x + 2\right) + 2 = \frac{1}{4}x + 3$. Interpretation: if $x_{t+1} = f(x_t)$
then $x_{t+2} = (f \circ f)(x_t)$. Check.
If $y = \frac{1}{2}x + 2$ then $\frac{1}{2}x = y - 2$ so $x = 2y - 4$. Thus the inverse function is $f^{-1}(x) = 2x - 4$.
Interpretation: $x_{t-1} = f^{-1}(x_t)$.
3. In words: if you triple every six hours, then you will have done this four times in 24 hours,
yielding a total of $3^4$ times what you started with. In math: if t = 0 is now, and t = 1 is in 6
hours, then t = 4 is in 24 hours. So we calculate:
$$x_1 = 3x_0; \quad x_2 = 3x_1 = 3(3x_0) = 9x_0; \quad x_3 = 3x_2 = 3(9x_0) = 27x_0; \quad x_4 = 3x_3 = 3(27x_0) = 81x_0.$$
Therefore the DTDS tells us that $x_4 = 81x_0$. Now let’s write $y_t$ for the population of bacteria
in cm$^2$ where t is measured in days; we have decided that the model for growth is
$$y_{t+1} = 81y_t.$$
Indeed, $y_1 = x_4$.
4. We are told that $x_2 = 100$ and that we want to know $x_0$. We found an equation relating these
two values above: $x_2 = 9x_0$. Therefore $x_0 = 100/9$. Check.
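The composition, the inverse and the tripling computation can all be spot-checked in a few lines. This is a sketch under our own choice of test value x = 10; the function names are ours.

```python
def f(x):
    return 0.5 * x + 2      # updating function from item 2

def f_inv(x):
    return 2 * x - 4        # inverse found in item 2

def triple(x):
    return 3 * x            # one six-hour step from item 3

x = 10.0                                    # arbitrary test value
two_steps = f(f(x))                         # (f o f)(x)
formula = 0.25 * x + 3                      # claimed closed form x/4 + 3
round_trip = f_inv(f(x))                    # should give back x
one_day = triple(triple(triple(triple(1)))) # four six-hour steps from x0 = 1
x0 = 100 / 9                                # item 4: x2 = 9 x0 = 100

print(two_steps, formula, round_trip, one_day, triple(triple(x0)))
```

If the two-step value and the closed form disagree, the usual culprit is a dropped x, as in writing 1/4 + 3 instead of x/4 + 3.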
Solution to Exercise 3.5.11: The updating function is $f(x) = \dfrac{4x^2}{1 + x^2}$. Therefore we have to
solve
$$x = \frac{4x^2}{1 + x^2} \iff x(1 + x^2) = 4x^2.$$
We see the common factor of x on both sides, that we would like to cancel. But of course because
there is a common factor of x, $x^* = 0$ is a solution! So that is one fixed point. Continuing:
$$1 + x^2 = 4x \iff x^2 - 4x + 1 = 0 \iff x^* = \frac{4 \pm \sqrt{16 - 4}}{2} = 2 \pm \frac{1}{2}\sqrt{12} = 2 \pm \sqrt{3}.$$
Since $3 < 4$, $\sqrt{3} < \sqrt{4} = 2$, so both of these are positive. Therefore there are a total of 3 fixed
points, all of which are biologically relevant. Do the cobwebbing on a well-drawn graph to determine
stability.
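Before cobwebbing by hand, it can help to confirm the three fixed points numerically and to mimic a cobweb by iterating. The sketch below is only a companion to the graph, not a replacement for it; the helper `iterate`, the offset 0.01 and the 200 steps are our assumptions.

```python
import math

def f(x):
    """Updating function from Exercise 3.5.11."""
    return 4 * x**2 / (1 + x**2)

def iterate(x0, steps=200):
    """Apply f repeatedly, which is what cobwebbing does graphically."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x

low, high = 2 - math.sqrt(3), 2 + math.sqrt(3)
residuals = [f(p) - p for p in (0.0, low, high)]  # each should be ~0
print(residuals)

# Start just below and just above the middle fixed point and see where we land.
print(iterate(low - 0.01), iterate(low + 0.01))
```

Comparing the last two printed values with your cobweb diagram is a good way to check the stability conclusions you drew from the graph.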
Solution to Exercise 3.5.12: In Example 3.4.4, we had the updating function $f(x) = \dfrac{2x}{1 + x}$. Its
fixed points are solutions of
$$x = \frac{2x}{1 + x} \iff x(1 + x) = 2x \iff x^2 - x = 0 \iff x(x - 1) = 0 \iff x^* = 0, 1.$$
We had already cobwebbed with a value near $x^* = 0$ and seen that the solution moved away;
therefore $x^* = 0$ is an unstable fixed point. For $x^* = 1$, our work showed that a cobweb from below
converges towards the fixed point $x^* = 1$; it only remains to do a cobweb from above, say $x_0 = 1.1$.
A quick cobweb shows that it also converges to 1, and therefore we conclude that the fixed point
$x^* = 1$ is stable.
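The same conclusions can be reproduced by iterating the updating function directly. A minimal sketch follows; the starting values 0.9 and 0.01 and the step counts are our choices, while 1.1 is the value suggested above.

```python
def f(x):
    """Updating function from Example 3.4.4."""
    return 2 * x / (1 + x)

def orbit(x0, steps):
    """Return x_steps for the DTDS x_{t+1} = f(x_t) started at x0."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x

from_above = orbit(1.1, 50)    # start above the fixed point x* = 1
from_below = orbit(0.9, 50)    # start below x* = 1
near_zero = orbit(0.01, 50)    # start near the fixed point x* = 0
print(from_above, from_below, near_zero)
```

All three orbits should end up very close to 1: the first two illustrate that $x^* = 1$ attracts nearby values, and the third that values near $x^* = 0$ are pushed away from it.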
Solution to Exercise 5.11.3: Consider
$$f(x) = \sec(x) = \frac{1}{\cos(x)} = (\cos(x))^{-1}.$$
Then by the chain rule
$$f'(x) = -(\cos(x))^{-2}(-\sin(x)) = \frac{\sin(x)}{\cos^2(x)} = \frac{1}{\cos(x)} \cdot \frac{\sin(x)}{\cos(x)} = \sec(x)\tan(x).$$
Next, consider
$$g(x) = \cot(x) = \frac{\cos(x)}{\sin(x)}.$$
Then by the quotient rule
$$g'(x) = \frac{\sin(x)(-\sin(x)) - \cos(x)\cos(x)}{\sin^2(x)} = \frac{-1}{\sin^2(x)} = -\csc^2(x).$$
We have used the identity (the favourite one, the one you should know):
$$\sin^2(x) + \cos^2(x) = 1.$$
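Both formulas can be tested against a numerical derivative. The sketch below assumes a central difference quotient with step $h = 10^{-6}$ and the sample point x = 0.7; both are our choices.

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def sec(x):
    return 1 / math.cos(x)

def cot(x):
    return math.cos(x) / math.sin(x)

x = 0.7
sec_numeric, sec_formula = central_diff(sec, x), sec(x) * math.tan(x)
cot_numeric, cot_formula = central_diff(cot, x), -1 / math.sin(x) ** 2
print(sec_numeric, sec_formula)
print(cot_numeric, cot_formula)
```

Each pair should match to roughly six or more decimal places.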
Solution to Exercise 5.12.9: We found that $\operatorname{arccsc}(x) = \arcsin(1/x)$. The domain of
$y = \arcsin(x)$ is $x \in [-1, 1]$ and its range is $[-\pi/2, \pi/2]$. The domain of $\operatorname{arccsc}(x)$ is thus all x for which
$1/x \in [-1, 1]$, which is $x \in (-\infty, -1] \cup [1, \infty)$; its range is $y \in [-\pi/2, \pi/2]$ except we can never
get $y = 0$ as an answer. In other words: you can’t evaluate csc(0) so there’s no value x that gives
$\operatorname{arccsc}(x) = 0$.
Given $\operatorname{arccsc}(x) = \arcsin(1/x)$, by the chain rule we find
$$\frac{d}{dx}\operatorname{arccsc}(x) = \frac{1}{\sqrt{1 - (1/x)^2}} \cdot \frac{-1}{x^2} = \frac{-1}{\sqrt{x^4 - x^2}} = \frac{-1}{|x|\sqrt{x^2 - 1}},$$
where in this last step we had to use that $\sqrt{x^2} = |x|$. Notice that the derivative is always negative,
regardless of the sign of x (at every step of our simplification), which is consistent with the graph.
The graph of $y = \operatorname{arccsc}(x)$, which is always decreasing.
Similarly, if $y = \operatorname{arcsec}(x)$ then $\sec(y) = x$ or $x = 1/\cos(y)$, which means $\cos(y) = 1/x$. Thus
$y = \arccos(1/x)$, in other words:
$$\operatorname{arcsec}(x) = \arccos(1/x).$$
The domain of $\operatorname{arcsec}(x)$ is $(-\infty, -1] \cup [1, \infty)$ and its range is $[0, \pi/2) \cup (\pi/2, \pi]$. Its derivative,
using the above formula, the derivative of arccos(x) and the chain rule, is
$$\frac{d}{dx}\operatorname{arcsec}(x) = \frac{-1}{\sqrt{1 - (1/x)^2}} \cdot \frac{-1}{x^2} = \frac{1}{|x|\sqrt{x^2 - 1}},$$
by the same process as above.
The graph of y = arcsec(x), which is always increasing, and has the same (mirrored) shape as
arccsc(x).
Now y = arccot(x) is a different situation. We’re asking about x = cot(y), so let’s begin with the
graph of y = cot(x).
The graph of y = cot(x). It is one-to-one (i.e. passes the horizontal line test) on the interval
(0, π), for example.
Thus while it would seem reasonable to deduce arccot(x) = arctan(1/x), the images only overlap
on (0, π/2). So instead we go at it straight:
$$y = \operatorname{arccot}(x) \iff \cot(y) = x \text{ and } 0 < y < \pi$$
so
$$-\csc^2(y)\,y' = 1 \iff y' = \frac{-1}{\csc^2(y)} = \frac{-1}{1 + \cot^2(y)} = \frac{-1}{1 + x^2},$$
where we used $1 + \cot^2(y) = \csc^2(y)$ (derived from our favourite identity by dividing both sides by
$\sin^2(y)$).
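The three derivative formulas in this solution can be compared with numerical derivatives at a positive and a negative sample point. This is a sketch with several assumptions of ours: the sample points $x = \pm 2$, the central difference helper, and the expression $\pi/2 - \arctan(x)$ for the branch of arccot with range $(0, \pi)$.

```python
import math

def central_diff(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def arccsc(x):
    return math.asin(1 / x)

def arcsec(x):
    return math.acos(1 / x)

def arccot(x):
    # Assumption: the branch with range (0, pi), written via arctan.
    return math.pi / 2 - math.atan(x)

checks = []
for x in (2.0, -2.0):
    root = abs(x) * math.sqrt(x * x - 1)
    checks.append((central_diff(arccsc, x), -1 / root))
    checks.append((central_diff(arcsec, x), 1 / root))
    checks.append((central_diff(arccot, x), -1 / (1 + x * x)))

for numeric, formula in checks:
    print(numeric, formula)
```

Testing at a negative x matters here: it is exactly where forgetting the absolute value in $|x|\sqrt{x^2 - 1}$ would give the wrong sign.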

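$$\begin{aligned}
\lim_{x\to\infty} \frac{-2x}{\sqrt{x^3 - x} + \sqrt{x^3 + x}}
&= \lim_{x\to\infty} \frac{-2x}{x^{3/2}\left(\sqrt{1 - \frac{1}{x^2}} + \sqrt{1 + \frac{1}{x^2}}\right)} \\
&= \lim_{x\to\infty} \frac{-2}{x^{1/2}\left(\sqrt{1 - \frac{1}{x^2}} + \sqrt{1 + \frac{1}{x^2}}\right)} \\
&= 0,
\end{aligned}$$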
since the denominator goes to ∞ · 2 and the numerator is −2.
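Evaluating the expression at a few large values of x gives a feel for how slowly it approaches 0. The sample points below (powers of 10) are our choice, and the function name `g` is ours.

```python
import math

def g(x):
    """The expression whose limit as x -> infinity is computed above."""
    return -2 * x / (math.sqrt(x**3 - x) + math.sqrt(x**3 + x))

values = [g(10.0 ** k) for k in (2, 4, 6, 8)]
print(values)
```

The values behave roughly like $-1/\sqrt{x}$, in line with the factor $x^{1/2}$ left in the denominator.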

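Solution to Exercise 6.7.15: So our function is $f(x) = \sqrt{x}$ and the base point is $a = 4$. We
want the cubic Taylor polynomial so we make a table:
$$\begin{array}{l|l|c|c}
\text{derivative} & \text{formula} & \text{value at } a = 4 & \frac{1}{n!} f^{(n)}(4) \\ \hline
f(x) & \sqrt{x} & 2 & 2 \\
f'(x) & \frac{1}{2}x^{-1/2} & \frac{1}{4} & \frac{1}{4} \\
f''(x) & -\frac{1}{4}x^{-3/2} & -\frac{1}{32} & -\frac{1}{64} \\
f'''(x) & \frac{3}{8}x^{-5/2} & \frac{3}{256} & \frac{1}{512}
\end{array}$$
So our Taylor polynomial is
$$T(x) = 2 + \frac{1}{4}(x - 4) + \frac{-1}{64}(x - 4)^2 + \frac{1}{512}(x - 4)^3,$$
so that our estimate of $\sqrt{6} = f(6)$ is
$$T(6) = 2 + \frac{1}{4}(2) - \frac{1}{64}(4) + \frac{1}{512}(8) = 2.453125$$
whereas we can see with our calculators that $\sqrt{6} = 2.44949$; not bad, given how far away we started.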
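The estimate and its error are quick to reproduce. This sketch simply hard-codes the coefficients from the table; the function name `T` mirrors the notation above.

```python
import math

def T(x):
    """Cubic Taylor polynomial of sqrt(x) at a = 4, using the table's coefficients."""
    d = x - 4
    return 2 + d / 4 - d**2 / 64 + d**3 / 512

estimate = T(6)
error = abs(estimate - math.sqrt(6))
print(estimate, error)
```

Trying `T(4.5)` or `T(5)` in the same way shows the error shrinking as x moves closer to the base point.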
Solution to Exercise 6.7.16: The quintic Taylor polynomial for the sine function centered at 0
is
$$T(x) = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5$$
(as you can check). So $T(x) \sim \sin(x)$ for x near 0; this means
$$\sin(1) \sim T(1) = 1 - \frac{1}{6} + \frac{1}{120} = 0.84167,$$
and our calculator tells us that $\sin(1) = 0.8415$.
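To see how the quality of the approximation depends on the distance from 0, we can tabulate the error at a few points. The extra sample points 0.5 and 0.1 are our choice; the name `T5` is ours.

```python
import math

def T5(x):
    """Quintic Taylor polynomial of sin(x) centered at 0."""
    return x - x**3 / math.factorial(3) + x**5 / math.factorial(5)

errors = {}
for x in (1.0, 0.5, 0.1):
    errors[x] = abs(T5(x) - math.sin(x))
    print(x, T5(x), math.sin(x), errors[x])
```

The error drops sharply as x approaches 0, which is what "for x near 0" is meant to convey.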
Solution to Exercise 6.8.4: Consider the DTDS $x_{t+1} = 2x_t(2 - x_t) - hx_t$, for a positive parameter
h representing the harvesting rate. We are first asked about when the steady state is positive. So
we need to find $x^*$. We note $f(x) = 2x(2 - x) - hx$ is the updating function, so the fixed points
are all solutions to $f(x) = x$. So $x^* = 0$ is a solution; otherwise, we can divide by x and the other
fixed point is a solution to $1 = 2(2 - x) - h$ or $\frac{1+h}{2} = 2 - x$ or $x^* = 2 - \frac{1+h}{2}$. Thus we have a fixed
point satisfying $x^* > 0$ only if $2 - \frac{1+h}{2} > 0$ or $2 > \frac{1+h}{2}$ or $4 > 1 + h$ or $h < 3$. Since we said $h > 0$,
the answer is: we have a strictly positive steady state if $0 < h < 3$.
Next we are asked when the nonzero steady state is stable. So we compute $f'(x) = 4 - 4x - h$. At
$x^* = 2 - \frac{1+h}{2}$ we have
$$f'(x^*) = 4 - 4\left(2 - \frac{1+h}{2}\right) - h = -4 + 2(1 + h) - h = -2 + h,$$
so that $x^*$ is stable iff $|-2 + h| < 1$ iff $-1 < -2 + h < 1$ iff $1 < h < 3$.
Thus there are some harvesting rates h ($0 < h < 1$) for which there is a positive fixed point but it
is unstable.
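The stability criterion can be illustrated with one harvesting rate from each regime. In the sketch below, the rates h = 0.5 and h = 2, the offset 0.01 and the five steps are our choices, and all function names are ours.

```python
def make_f(h):
    """Updating function f(x) = 2x(2 - x) - hx for harvesting rate h."""
    return lambda x: 2 * x * (2 - x) - h * x

def fixed_point(h):
    return 2 - (1 + h) / 2          # nonzero fixed point found above

def slope_at_fixed_point(h):
    return 4 - 4 * fixed_point(h) - h   # f'(x*), which simplifies to h - 2

def orbit(h, x0, steps):
    f = make_f(h)
    x = x0
    for _ in range(steps):
        x = f(x)
    return x

for h in (0.5, 2.0):
    x_star = fixed_point(h)
    after_five = orbit(h, x_star + 0.01, 5)   # start slightly off the fixed point
    print(h, x_star, slope_at_fixed_point(h), abs(after_five - x_star))
```

For h = 0.5 the distance from the fixed point grows over the five steps, while for h = 2 it collapses to essentially zero, matching the condition $1 < h < 3$.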
Index
absolute maximum, 146
absolute minimum, 146
absolute value, 37
anti-derivative, 203
anti-derivative, initial condition, 203
antiderivative, 204
astroid, 120, 121
autonomous differential equations, 202
average rate of change, 97

bifolium, 122
binomial theorem, 110
biologically relevant, 71

complete the square, 20
compose, 32
concave down, 139
concave up, 139
continuous, 81
continuous at a point a (in its domain), 81
critical number, 133
critical point, 133

definite integral, 231
degree of the polynomial, 19
derivative, 98
derivatives of the six standard trigonometric functions, 126
difference quotient, 99
differentiable, 98
differentiation: summary of rules, 132
discrete time dynamical system (DTDS), 50
discriminant, 20
diverges to infinity, 90
diverges to negative infinity, 90
domain, 30
domain of definition, 30
DTDS, 187

End of lecture # 1, 30
End of lecture # 10, 132
End of lecture # 11, 145
End of lecture # 12, 154
End of lecture # 13, 164
End of lecture # 14, 176
End of lecture # 15, 187
End of lecture # 16, 195
End of lecture # 17, 200
End of lecture # 18, 211
End of lecture # 19, 219
End of lecture # 2, 47
End of lecture # 20, 229
End of lecture # 21, 234
End of lecture # 3, 56
equilibrium, 53, 187
Evaluation Theorem, 231
even, 31
Exercise 3.2.6, 51
Exercise 3.2.7, 51
Exercise 5.12.9, 132
Exercise 2.4.13, 40
Exercise 2.2.14, 16
Exercise 2.3.14, 29
Exercise 6.6.11, 170
Exercise 3.5.11, 64
Exercise 3.5.12, 64
Exercise 6.8.4, 195
Exercise 6.7.15, 184
Exercise 6.7.16, 184
Exercise 5.11.3, 126
Extreme Value Theorem, 150

first derivative test, 148
fixed point, 53, 187
formula for $T_n(x)$, 180
function, 30
Fundamental Theorem of Calculus, Part 1, 233
Fundamental Theorem of Calculus, Part 2, 231

geometric series, 9
Geometric series formula, 9
given domain, 30
global extremum, 146
global maximum, 146
global minimum, 146

horizontal asymptote, 92

increasing function, 31
increasing on an interval, 31
indefinite integral of f, 204
indeterminate form, 75, 85
indeterminate forms, 165
inflection point, 139
initial condition, 203
initial value, 52
instantaneous rate of change, 98
integral, 204
integrand, 204
integration by parts, method of, 220
interval, 6
inverse function, 33
inverse functions, 46
inverse secant function, 132

Kleiber’s law, 37

left-hand limit, 78
lemniscate, 123
limit of a function, 75
limits of the sum, 8
linear approximation to f at a, 178
linear DTDS, 54
linearization of f at a, 178
local extremum, 145
local maximum, 145
local minimum, 145
logarithmic differentiation, 120

marginal value theorem, 164
Mean Value Theorem, 184

natural base, 45
natural domain, 30
natural logarithm, 46
nonlinear DTDS, 57
normal distribution, 114

odd, 31
one-sided limits, 78

polynomial, 19
pure-time differential equations, 202

quadratic, 20
quadratic formula, 20

radical, 36
range, 30
Ricker model, 193
right-hand limit, 78
roots, 20

secant line, 97
secant line approximation, 177
second derivative, 138
second derivative test, 149
solution, 52
stable, 64
stable fixed point, 187
steady state, 53, 187
strictly increasing, 31
substitution, method of, 212
summation symbol, 8
summation variable, 8

tangent line, 99
Taylor polynomial of degree n, 180
Taylor series, 183
trigonometric functions, 41

union, 7
unstable, 64
unstable fixed point, 188
updating function, 50

vertical asymptote, 90

wave function, 42