So, very simply, generative AI takes input that is in natural language. Right? So it listens to us.
Okay. If we want a sort of working definition, here's what's going to happen. A large language model uses something called neural networks. Neural networks, n-e-u-r-a-l, which we'll talk about later. In particular, these are a particular kind of neural network called the transformer. And when I say recent, transformers were actually labelled as such, let me not get the date wrong, in 2017, in work that came out of Google. So they use neural networks, a particular type called transformers, to train on very large collections of natural language text. So large language models train on text, basically the internet, as well as other sources that are gathered for them. So there are a couple of steps that happen before a language model is in place.
What's a paragraph? What's a word? Those sorts of things. So the foundational text gives it a broad set of information, and this is costly in a number of ways.
First of all, it takes time for this training process to happen effectively. And secondly, there are resources, because one of the things with generative AI, as opposed to earlier versions of AI, is that there isn't a box in the corner that's running it. When you run generative AI you're using processors from all over. It's sort of like the model of Napster from the early 2000s, or BitTorrent, those sites. The idea that if we can get other computers working on little pieces of things at the same time, we can put them together.
And this is sort of one of the great insights of AI research in, say, the last 20 years: we don't have to do what I'll talk about next week, which is, you know, the goal, the holy grail of creating a box on your desk that runs like a human brain. What we can do instead is recognize that one of the things about the human brain is that it's electrical. It runs on cause and effect, on and off switches of electrical and chemical impulses, on things called neurons. And if we take that metaphor to computers and break things down, then instead of one brain we can create lots of little programs, each running like one little neuron firing.
And then what do we have to do? We have to connect it to others. And by connecting to those others, the assumption is that the program becomes more robust, more accurate, and, if we believe the strong AI proponents, will eventually go through some sort of transformation and the machines will become sentient on their own. Why? Just because of the sheer volume of processing and information and digital thinking that they're engaged in.
So if you're using thousands of computer processors at a time, you're using resources. You're using those computers for a purpose when they could be used for something else. It's like mining bitcoin. You're also using resources like electricity and water, not just electricity to process the information, but to cool the computers that are used on server farms and things like that as well.
So you create an AI that's conversational, that has been trained on social cues. One of the very first examples of this kind of AI is from the 1960s, something called ELIZA, a program that mimicked the behavior of a psychotherapist. So you could type in conversational things and it would seem to be engaging with you, but really it was just running from a database of responses, recognizing keywords in the sentences you were typing in. We're easy to fool. In the early 2000s, there were experiments in digital TAs.
Same thing. People thought they were actually getting attention, and course ratings went up, but in fact they were just AI bots that were very, very rudimentary. But having any kind of feedback from an organization or a position of authority seems to make us satisfied. Right? So we're happy to get that.
So what do you do then? You fine-tune the AI. You fine-tune the AI with a bunch of examples in the domain. And then what it does is it tries to follow those instructions. And like learning a musical instrument, the system will make lots of mistakes at the beginning, but then get better and better and better as you go along.
Does anybody remember dictation software like Dragon? You'd buy this software and you had to go through a three-week training process of reading to your computer so that the software would come to recognize your voice. I could never do it, because I have a weird Ontario mid-Atlantic way of pronouncing things. Sometimes I pronounce things like it's English English. Sometimes I pronounce it like it's American English. But my r's and my a's aren't hard enough to be recognized by the American settings, and they're not soft enough and they don't roll enough to be recognized by the British settings. So any time I spent money on that software, the training process never worked. And then all of a sudden, about 10 years ago, we could type things into Google Translate. We could say things, and suddenly Microsoft Word was better at recognizing speech than Dragon had been not that long before. The advances were enormous.
And there are three ways in which that fine-tuning happens. One, we go through reinforcement learning. So let's think about a visual image. You insert the image of a cat into your AI. You want it to be able to identify cats, so you show it all kinds of different versions of cats, and every time it identifies a cat and doesn't call it a dog or a unicorn, it gets rewarded. It's very easy to reward an AI. All you have to do is say yes. Right? So it's rewarded learning, where you tell the AI that yes, you've identified a cat. It's not a horse. It's not a car. The second one is something called supervised learning. Here's an example: you give the AI a bunch of text and it's supposed to identify the key people. And if it identifies the key people, over time it will become better and better at doing that.
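To make that concrete, here is a minimal sketch in Python of the supervised idea: every example already carries a human-supplied label, and the system just uses those labels to make its next guess. The features, numbers, and labels are all invented for illustration; a real system would learn from millions of examples rather than four.

```python
# A toy illustration of supervised learning: every example comes with a label,
# and the system is "told" the right answer so it can do better next time.
# The features here (ear shape, tail length) are invented for illustration only.

labeled_examples = [
    ((0.9, 0.3), "cat"),   # pointy ears, short tail -> labelled "cat" by a human
    ((0.8, 0.4), "cat"),
    ((0.2, 0.9), "dog"),   # floppy ears, long tail -> labelled "dog" by a human
    ((0.3, 0.8), "dog"),
]

def classify(features):
    """Guess a label by finding the closest labelled example (nearest neighbour)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = min(labeled_examples, key=lambda ex: distance(ex[0], features))
    return closest[1]

# A new, unlabelled picture described by the same two invented features:
print(classify((0.85, 0.35)))  # -> "cat": the supervision came from the labels above
```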
How do you get the key points of an article? You read the introduction and the conclusion. So the AI will read the introduction and the conclusion. And if it identifies the key points, then maybe it's a combination of supervised learning and reinforcement learning. And then the final version of training that the AI can do is something called self-supervised learning.
This is just where you let it go. You don't give it rules. You don't say look for a
cat. You don't say read the introduction or the conclusion or give me the key
points. You just give it things and let it find stuff out.
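One rough way to picture self-supervised learning, sketched below in Python, is that the raw text supplies its own answers: hide a word, and the task of guessing it back becomes the training signal, with no human labelling anything. The sentence and the setup are invented for illustration; real systems do this over billions of sentences.

```python
# A toy illustration of self-supervised learning: nobody labels anything.
# The raw text itself supplies the "answers" -- we hide a word, and the task
# is to guess it back, so the training signal comes for free from the data.

raw_text = "the cat sat on the mat and the dog sat on the rug"
words = raw_text.split()

# Automatically turn the raw text into (context, hidden word) training pairs.
training_pairs = []
for i in range(1, len(words) - 1):
    context = (words[i - 1], words[i + 1])        # the words on either side
    training_pairs.append((context, words[i]))    # the hidden word is the target

for context, target in training_pairs[:4]:
    print(f"given {context}, learn to predict: {target!r}")
```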
And the example that textbooks often use is security, like international security, face recognition at airports. You just let the AI loose with a bunch of things and let it come up with classifications. And one of the difficulties with self-directed learning, or self-supervised learning, is that it can come up with odd results, like announcing that a flash flood watch will be "in affect," not the correct word. And remember when we looked at Google last week, in our last class, and what was it we found out? What's the name of a fish that catches fire? Dinner. Right? I guess. But it seemed like an odd solution.
And sometimes AIs can give us something that's called a hallucination. And a hallucination is simply a wrong answer that is generated, one that doesn't make sense to us. And that's partially because, as we'll discuss in a bit with large language models, the AI recognizes patterns. The AI doesn't understand English or Mandarin or Portuguese or the rules of writing novels. It understands abstractions of those processes that have been turned into mathematical algorithms, sets of steps to follow to achieve an outcome.
And so sometimes the AI will simply lie to us, because it doesn't know the difference between reality and fantasy. So, I described before a neural network. In an AI there are multiple steps. There's an input layer, where we put in a picture of a cat, or we type in the word cat, or we make an inquiry of the AI. There are any number of hidden layers where processing happens, and the processing is basically comparison and classification.
So if I'm looking for a cat: each of these units is called a neuron. One of these neurons might output a plus 1. If there's a picture of a cat and it recognizes a cat, it gets a value of 1. If you put in a picture and it's a dog or a car, it gets a 0. And you could also have another value, like a minus 1, which means that if you get to this stage and then move on, it's less likely that anything that particular neuron is connected to will be used in the process.
And then ultimately the goal is to have an output layer that tells you yes or no. That's a very, very simplistic, high-level description of what's going on. In fact, in what we call deep learning in AI circles, there are many hidden layers.
Because what ends up happening is you provide input, that input is evaluated, it either moves on or it doesn't, and the process can recur a number of times.
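Here is a bare-bones sketch, in Python, of that input, hidden layer, output idea. The weights and the "cat" interpretation are hand-invented just to show the mechanics of neurons firing or not firing; a real network learns millions or billions of these values during training.

```python
# A bare-bones sketch of the input -> hidden -> output idea described above.
# The weights are invented by hand just to show the mechanics; a real network
# learns these values rather than having them written in.

def neuron(inputs, weights, bias):
    """One 'neuron': weigh the evidence, add it up, and fire (1) or not (0)."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

def tiny_network(inputs):
    # Hidden layer: two neurons, each looking at the same inputs differently.
    h1 = neuron(inputs, weights=[1.0, -1.0], bias=0.0)   # "pointy ears, not floppy"
    h2 = neuron(inputs, weights=[0.5, 0.5], bias=-0.4)   # "furry overall"
    # Output layer: combine the hidden neurons into a yes/no answer.
    return neuron([h1, h2], weights=[1.0, 1.0], bias=-1.5)  # 1 means "cat"

print(tiny_network([0.9, 0.2]))  # pointy-eared, short-tailed input -> 1 ("cat")
print(tiny_network([0.1, 0.9]))  # floppy-eared, long-tailed input -> 0 ("not cat")
```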
Here are some examples of the scale involved. OpenAI, the people who make ChatGPT, released their first large language model, GPT-1, in 2018. So we're not talking a long time ago. Right? 2018. It had 768-dimensional word vectors, I'll talk about what that means in a second, and 12 layers of processing, for a total of 117 million parameters. 117 million choices the machine can make as it's going through. A few months later, GPT-2 had 1,600-dimensional word vectors, 48 layers, and 1.5 billion parameters. And when GPT-3 came out, it had roughly 12,000-dimensional word vectors and 96 layers, for a total of 175 billion parameters, or choices that the machine can make.
So from 2018 to 2021, a huge amount of learning on the part of the machine. Right? Scraping the internet, and fine-tuning through self-direction and through supervision from human input. So what happens to create a large language model? I talked about words, or word vectors.
A vector is a list of numbers in a specific order. I'm not going to go into it; it's complicated and annoying. Head-scratching. But let's take cat. Cat isn't just the three characters c-a-t. It's a long string of numbers that identifies cat to an AI in numeric form. And those numbers are in a specific order. It's like having a password.
Okay. So let's think about this. Let's think about a number of words that are sort
of related. Friend, acquaintance, colleague, playmate. All words that have
relationships to friend.
And in the process of embedding, you have to create vectors that are similar, so that they'll have a relationship with one another. And then what that allows you to do is, the AI can then find, say, the word friend and the word work, put them together, and predict that the outcome would be colleague rather than playmate, unless all you do at work is socialize. So the first step, then, is that you create these vectors.
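As a rough illustration, here is a toy version in Python of that friend-plus-work-gives-colleague idea. The vectors are tiny and written by hand, which real embeddings never are; real ones are learned and have hundreds or thousands of dimensions.

```python
import math

# Toy, hand-made "embeddings": each word is a short list of numbers (a vector).
# These three dimensions are invented purely for illustration.
embeddings = {
    "friend":    [0.9, 0.1, 0.8],
    "colleague": [0.8, 0.9, 0.6],
    "playmate":  [0.9, 0.0, 0.2],
    "work":      [0.1, 1.0, 0.1],
}

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# Combine "friend" and "work" by adding their vectors, then see which word
# in our tiny vocabulary the combination lands closest to.
combined = [f + w for f, w in zip(embeddings["friend"], embeddings["work"])]
for word in ("colleague", "playmate"):
    print(word, round(cosine_similarity(combined, embeddings[word]), 3))
# "colleague" scores higher than "playmate", mirroring the prediction above.
```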
Now, these vectors don't necessarily correspond to whole words. So you don't necessarily have cat or colleague. Colleague may be broken down into two parts, c-o-l and then league. And when c-o-l goes together with league, it's colleague. It's no longer baseball. Right?
And that is called tokenization. You take words and you turn them into tokens. You take the words of a human language and you turn them into math that the computer can use. And the thing that's interesting is that in English there are about a million words, which sounds like a lot, but that's a limited set.
Only a million words. And the definitions of those words stay relatively stable. They certainly stayed stable from 2018 to 2021, so we don't have to worry about the definitions changing. So we can assign different vectors to those million words. And because it's a computer and not us, it can actually keep track of those and use them.
So, yes, it's time-consuming to codify a million words, but it's possible. And there are a number of online databases that have already done parts of that. So there's a lot of information for an AI to be trained on that already exists.
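Here is a minimal sketch, in Python, of what that tokenization step looks like in principle: split a word into pieces the system already knows, then swap each piece for a numeric ID. The vocabulary and the IDs are invented for illustration; real tokenizers learn tens of thousands of pieces from data.

```python
# A toy sketch of tokenization: break words into known pieces, then replace
# each piece with its numeric ID. The vocabulary and IDs are invented here.

vocabulary = {"col": 101, "league": 102, "cat": 103, "play": 104, "mate": 105}

def tokenize(word):
    """Greedily split a word into the longest known pieces, left to right."""
    pieces, rest = [], word
    while rest:
        for end in range(len(rest), 0, -1):
            if rest[:end] in vocabulary:
                pieces.append(rest[:end])
                rest = rest[end:]
                break
        else:
            raise ValueError(f"no known piece at the start of {rest!r}")
    return pieces

for word in ("colleague", "playmate", "cat"):
    pieces = tokenize(word)
    print(word, "->", pieces, "->", [vocabulary[p] for p in pieces])
# colleague -> ['col', 'league'] -> [101, 102], and so on.
```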
And embedding a word places it in context with a whole range of other words. So, if we're using different words for house, a cottage, a two-storey, an apartment block, and a castle, those words can all go together if the question being asked about a dwelling is its size. Right? But we could also have Airbnb, long-term let, mortgaged, if we're talking about sorts of dwellings or domiciles. Places we live. Right? We live there for a little bit of time.
We live there for a long period of time. And depending on how the words are codified and on the relationships, some words go together in some contexts and some go together in other contexts. And it's very interesting, because even though this is sort of math and mumbo jumbo, philosophers have recognized for a long time that we don't learn language by memorizing a dictionary. We learn language through context. So, for example, you may not all be able to see this.
What are the missing words? If I say children's book, what are the missing words? Cat in the Hat. Yeah. The Cat in the Hat. How did you get that? Three spaces and then three more spaces. Contextually, because we have knowledge of the past, and contextually, because we recognize that there's most likely going to be a noun here and a noun here in the sentence, we're able to figure that out. That's how we do it, and we don't even know we're doing it. We do it in a weird kind of mumbo jumbo, inside-our-minds way. Right? If Doctor Seuss is tiny whiny, we can say mindy whiny. Right? And in fact, we don't fully know how generative AIs work, because they have up to 96 hidden layers of processing. But we do know that, mathematically, what they're doing is taking an input, comparing it, and then either moving it on or saying go back.
And it goes through an iterative process to make predictions. Because the whole purpose of this is to predict what's going to come next. So, fill in the blank in a Google search, or generate something. And if you're going to generate something, you've got to have the background rules and regulations. You've got to have recognition of context.
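To see the predict-what-comes-next idea in its simplest possible form, here is a toy sketch in Python that just counts which word tends to follow which in a sample of text. A large language model is doing something enormously more sophisticated, but the goal is the same: guess what comes next.

```python
from collections import Counter, defaultdict

# A toy next-word predictor: count which word follows which in some sample text,
# then predict by picking the most common follower. The sample text is invented.

sample_text = "the cat sat on the mat and the cat sat on the rug"

followers = defaultdict(Counter)
words = sample_text.split()
for current, nxt in zip(words, words[1:]):
    followers[current][nxt] += 1   # remember that `nxt` appeared after `current`

def predict_next(word):
    """Return the word most often seen right after `word` in the sample text."""
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))   # -> "cat": it followed "the" most often above
print(predict_next("sat"))   # -> "on"
```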
The difficulty with AIs at this point is still something called polysemy, or polysemy, depending on how you pronounce it. Remember, COMS 1002, I think, is the course where you get that. Who knows what it means? It simply means that words can have more than one meaning. So think of my example of the words for house and then the words for dwellings. What if I use the word home? Well, home is going to mean something different if I'm talking about the size of the building. It's going to mean something different if I'm talking about something temporary or something long-term, something I have deep roots with.
It's going to mean something different if it's connected to run and I'm talking about baseball. Right? Words have different meanings depending on context. And we recognize that language derives its meaning from the context of the words. So if we're going to engage using a large language model, after the machine has broken words down into chunks it can look for, and after we've embedded them by assigning unique identifiers to those different tokens, we engage in something called pre-training.
Before the AI can actually be released to the world to do all its damage, it has to have some sort of sense of the material. So the first thing that happens in pre-training is data collection. And data collection involves gathering a large, diverse data set: text from books, text from websites, comic books, images, sometimes sounds. If we have an AI that can process all of those different things and not simply text, we call it multimodal, which is a word that we often use now. We didn't 20 years ago.
We use it now when we're talking about understanding media, when we're talking about media in terms of all the kinds of forms that can go together rather than simply one. So we're not just looking at images, but we might be looking at images and sound, or at images, the sound, the text they come from, the adaptations. So multimodal means multiple sources. And the reason behind the idea of large language models is that in order to be able to process language, or process information from human beings, and to give them an output back that looks like human language, you have to have a huge data set to understand the intricacies and the differences in language.
So you can read 10,000 sentences and recognize that sentences have something in common. They're a grammatical unit, but the content of the sentences can be different depending on the context that they're in. The only way the machine can do that is by learning, and the way that it learns is by being exposed to thousands upon thousands upon thousands of samples. We don't learn that way. We can learn by examples. We can learn, you know, by seeing a representation on TV of somebody putting their hand on a stove and getting burned, and as a five-year-old realize that maybe I shouldn't do that. Although we've got other sorts of psychological characteristics that might make us more likely to put our hand on it and see what happens. But what happens is that eventually we kind of learn domains of knowledge, where we have some of the basics.
Like, how many people have ever played a musical instrument? Okay. So you've got a rudimentary idea. It doesn't matter how good you are on the particular instrument that you play, but you have a rudimentary idea of music. Which means you can take that to a different context. Which means you can learn a different musical instrument. Which means you can write music, or arrange music, or learn how to read music in different forms. You can listen to it, read it in sheet music, and so forth. Those capabilities that we have, we learn. And then we go out into the world and get experience.
And we read texts and other media as well. But what do we do? We use those texts as material to glean information and to do other things with it. We don't learn a text just to memorize it for the sake of memorizing it. So, data collection.
The next thing that has to happen: if you're going to go out and look at the Internet, and you're going to search for text, this is where people come in. Right? There's a lot of information on a website that's not the text you want. There's coding behind it. There's punctuation. There's the graphical user interface information, the HTML code. There are images, there are banner ads, there are all kinds of things. So if you're reading a text, and you're going to Wikipedia to read it, or you're going to, you know, the Canadian Music Hall of Fame website, there's all kinds of extraneous information there. It's not necessarily the material that you want. So the data has to be cleaned and prepared for training the AI.
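Here is a very rough sketch, in Python, of that cleaning step: stripping the markup and boilerplate out of a web page so only the prose is left for training. The page snippet is invented, and real pipelines use dedicated tools plus heavy deduplication and quality filtering rather than a couple of regular expressions.

```python
import re

# An invented snippet of a web page: the actual prose is buried in markup,
# navigation, and ads, which we don't want the model to learn from.
raw_page = """
<html><head><title>Canadian Music Hall of Fame</title></head>
<body>
  <div class="banner-ad">Buy concert tickets now!</div>
  <p>The Canadian Music Hall of Fame honours musicians for their achievements.</p>
  <footer>&copy; 2024 | Privacy | Login</footer>
</body></html>
"""

def clean(html):
    """Very rough cleanup: drop tags, decode one entity, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", html)        # strip anything that looks like a tag
    text = text.replace("&copy;", "(c)")         # handle one HTML entity by hand
    return re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace

print(clean(raw_page))
# Real pipelines also drop boilerplate like the ad and footer lines entirely,
# deduplicate pages, and filter by language and quality before training.
```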
And then it's broken down into smaller pieces in this process that I call tokenization, which allows the machine to work efficiently. I don't have to have the whole word. I can have parts of words, and I know that certain parts go together to make words. The words become a larger unit than the tokens, and then we put those tokens into numerical form using vectors. So, I showed you a picture of, well, I'll come back to this and talk about attention and transfer learning. I showed you the picture of a simple neural network, and I suggested that we actually use much more complicated networks.
In generative AI, there's a particular kind of neural network called a transformer, which is increasingly what gets used. As you can see here, here are our inputs, the input neurons. Here are our hidden layers, and each layer is connected to what goes before it and what comes after it. So information can move forward and back in the neural network as many times as it needs to, to come up with some sort of output for us. Interesting? For those of you for whom it's not interesting, we don't have to go back over it.
We've already done this. Okay. So, the kind of neural network that we're talking about is something called a transformer. And because this slide isn't big enough, imagine this part being down here. So you've got one long process going through. Now, imagine you're in a crowded environment and you're trying to pay attention to the person in front of you. What goes on when we do that? What are some things that happen when we are in a crowded environment? Yes. You can get overwhelmed.
You can get overwhelmed. There's too much information. Yes. I was going to say the same thing. Too many, like, voices all at once, over and over. Okay. So what do we do in those situations? Other than running away, which sometimes we do when we're too tired. But what do we do? I want to listen to Yeager. You do, a lot of the time, like, if you're in a big crowd, a lot of people look around to see what everyone else is doing. Okay. And then kind of follow that sort of structure, at least. Okay.
Okay. We'll mimic the behavior of others. Yeah. What else? Prioritize. We prioritize and we pay attention. Right? Because I've got a hearing aid and I'm deaf on this side, if I have to, I'll walk a little closer, or I'll go like this to hear Xavier or whatever. I'll do something, because we have the ability, and we do this all the time. When I come into the classroom, you folks are having conversations, and unless something's particularly interesting to me and I go, what's that, I go about my business and I focus on my task.
So the AI is able to focus on things, and either ignore or remember and set aside other information. The example that's often used is a sentence like "the cat sat on the mat." If the word being processed by the AI is "sat," then the word "cat" is going to be more important, because the cat is doing the sitting, than "on" or "the," which are different kinds of words. So "cat" is going to be more important than "mat." But if we're processing the word "on," the cat sat on, then that word becomes more important.
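Here is a toy sketch, in Python, of that attention idea for "the cat sat on the mat": while processing one word, score how relevant every other word is, then turn the scores into weights that sum to one. The scores are hand-invented; a real transformer learns them and runs many attention heads in parallel.

```python
import math

# A toy version of attention: while processing "sat", score the relevance of
# every word in the sentence, then turn the scores into weights that sum to 1.
# The scores below are invented purely for illustration.

words = ["the", "cat", "sat", "on", "the", "mat"]

# "cat" gets a high score because the cat is the one doing the sitting.
relevance_for_sat = {"the": 0.2, "cat": 2.0, "sat": 1.0, "on": 0.4, "mat": 0.8}

def softmax(scores):
    """Turn raw scores into positive weights that add up to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([relevance_for_sat[w] for w in words])

for word, weight in zip(words, weights):
    print(f"{word:>4}: {weight:.2f}")
# "cat" ends up with the largest weight, so it contributes most to how the
# model represents "sat" at this layer.
```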
So the AI is able to selectively attend to certain things. And then, as I say, as it becomes more sophisticated, it's able to remember more of the past. Now, right now, one of the weird things about AI is that if you use a chatbot, you have a conversation with the chatbot, and then you go back the next day, you can't pick up the conversation that you already had. It doesn't remember what's gone on before.
Each session is a new session. And so one of the things that's being developed now, and supposedly will be one of the big things of 2025, is that your AI will have some memory. That comes from this transformer model, where you pay attention to what's relevant and you can either ignore or set aside other information. But if it's information that's in your training set, in your foundation model, that information is available to you as an AI.
Now I'm anthropomorphizing and treating them like they're our little brothers and sisters. So how does this work, this process of shifting attention? By doing two things. Showing the AI many, many, many examples, which they call samples. So the AI is able to sample.
So think about visual recognition again. We want it to recognize, you know, a cat or a dog, by having many representations: animals with fur, animals with hair, animals with short hair, animals with long hair, cats, domestic cats, wild cats. You know, the AI builds a repertoire, or builds an ability to know that, on the topic of cats, anything that has to do with a Dalmatian is not relevant to identifying cats. Right? And so, let's go back to this previous slide. What it does is, with each of these neurons, so think of it this way again.
You put in input and a decision is made. Once that decision is made, it moves on to another step to see whether there is another decision that can be made. And those decisions serve two purposes. One, to recognize an image as belonging to a classification, because in some ways these are classification decisions. They're grouping. Three of these things belong together; one of these things is not quite the same. Right? Or two, to predict what the next word is going to be. And then what happens with these complex neural models is that we don't have to wait for each word of the text to get spat out for us.
It does it internally, with all of these layers. Remember ChatGPT: 96 layers of iteration, back and forth, back and forth, back and forth. So it can create complex responses. But it's all based on those same building blocks. Right? And there are all kinds of fancy terms inside the transformer, feed-forward networks and things like that, but I don't think I'm going to go into those.
This is the idea: tokenization, where we break down words into smaller units. Embedding, where you use vectors to mathematically and uniquely identify each of those tokens. The idea of pre-training, which happens because you as a human being give it a set of data, and you as a human being reward the machine for coming up with proper choices, or because of that self-regulated, self-directed study, which is simply looking for patterns. And sometimes it will come up with patterns that we wouldn't necessarily think of, but sometimes it comes up with bizarre patterns, like the correlation, I think there is one, between yogurt consumption and the number of PhDs in engineering in a particular place.
Then there's the idea of trying to train bots to work within a very limited domain of information. Why? Because it's really expensive, a resource drain and a time drain, to create these large language models. So we know we can have weak and strong AI.
We know that AI can be embedded or embodied in devices, hardware, robots, and it can also be virtual, in terms of running as bots on the web. In both those cases, though, we know that AI is material. It uses resources. It relies on hardware even though it's software. And so to talk about it as if it's purely virtual, as if it's non-intrusive, as if it doesn't require money and time and resources, is mistaken.
And we have a sense, then, of how this works. We codify information that we extract from the world. We turn it into mathematically readable tokens that are then processed, based on our inquiries or interactions with the machine, to give us answers. And this is the big leap that's happened from the 2010s to now in AI. So when we talk about AI, we're not generally talking about that general AI.
That super brain. We'll come back to that later. The CEO of OpenAI tells us we're going to reach that this year. We'll talk about that and unpack it. What am I going to do next week? I'm going to get these slides ready, and they'll be up. Next week we'll throw back. We'll give a little bit of history in terms of where AI comes from, so we understand the shift in goals behind it that gets us to this world we live in, where there are these powerful tools that are able to sell us things, talk to us as if they're sort of human beings until we can trick them, and maybe do our homework for us. And we have this sense, you know, that it doesn't always work. And we do have a sense now.
We're starting to have a sense that, even in the way these things are built, there are only certain things the machines can do. Even though they learn, they learn to mimic rather than to, well, I'm not going to use words like creativity for a while. But rather than absolutely creating something new and novel, they're creating things that look like the material they can train on. Okay.
Now, maybe Ultron is out there somewhere learning on its own. We'll have to wait and see. But next week we get a bit of history; we'll round it out. And then the week after that, which will be the week leading up to our test, I may talk a little more about the material consequences. The actual materiality. What goes on around AI. And then February 6 is, apparently, when you're leaving. I'll accept that.