
A Beginner’s Guide to Large Language Models


Part 1

Contributors:
Annamalai Chockalingam
Ankur Patel
Shashank Verma
Tiffany Yeung

Preface

Language has been integral to human society for thousands of years. A long-prevailing theory, the laryngeal descent theory (LDT), suggests that speech, and thus language, may have evolved between 200,000 and 300,000 years ago, while newer research suggests it may have emerged even earlier.

Regardless of when it first appeared, language remains the cornerstone of human communication. It
has taken on an even greater role in today’s digital age, where an unprecedented portion of the
population can communicate via both text and speech across the globe.

This is underscored by the fact that 347.3 billion email messages are sent and received worldwide
every day, and that five billion people – or over 63% of the entire world population – send and receive
text messages.

Language has therefore become a vast trove of information that can help enterprises extract valuable
insights, identify trends, and make informed decisions. As an example, enterprises can analyze texts
like customer reviews to identify their products’ best-selling features and fine-tune their future
product development.

Similarly, language production – as opposed to language analysis – is also becoming an increasingly important tool for enterprises. Creating blog posts, for example, can help enterprises raise brand awareness to a previously unheard-of extent, while composing emails can help them attract new stakeholders or partners at an unmatched speed.

However, both language analysis and production are time-consuming processes that can distract employees and decision-makers from more important tasks. For instance, leaders often need to sift through vast amounts of text to make informed decisions, rather than having the key information extracted for them.

Enterprises can minimize these and other problems, such as the risk of human error, by employing
large language models (LLMs) for language-related tasks. LLMs can help enterprises accelerate and
largely automate their efforts related to both language production and analysis, saving valuable time
and resources while improving accuracy and efficiency.

Unlike previous solutions, such as rule-based systems, LLMs are incredibly versatile and can be easily
adapted to a wide range of language-related tasks, like generating content or summarizing legal
documentation.



The goal of this book is to help enterprises understand what makes LLMs so groundbreaking
compared to previous solutions and how they can benefit from adopting or developing them. It also
aims to help enterprises get a head start by outlining the most crucial steps to LLM development,
training, and deployment.

To achieve these goals, the book is divided into three parts:

> Part 1 defines LLMs and outlines the technological and methodological advancements over the years that made them possible. It also tackles more practical topics, such as how enterprises can develop their own LLMs, and surveys the most notable companies in the LLM field. This should help enterprises understand how adopting LLMs can unlock cutting-edge possibilities and revolutionize their operations.

> Part 2 discusses five major use cases of LLMs within enterprises, including content generation,
summarization, and chatbot support. Each use case is exemplified with real-life apps and case
studies, so as to show how LLMs can solve real problems and help enterprises achieve specific
objectives.

> Part 3 is a practical guide for enterprises that want to build, train, and deploy their own LLMs. It provides an overview of the necessary prerequisites and the trade-offs of different development and deployment methods. ML engineers and data scientists can use this as a reference throughout their LLM development processes.

Hopefully, this will inspire enterprises that have not yet adopted or developed their own LLMs to do so soon in order to gain a competitive advantage and offer new state-of-the-art (SOTA) services or products. The greatest benefits will, as usual, be reserved for early adopters and truly visionary innovators.



Glossary

Deep learning systems: Systems that rely on neural networks with many hidden layers to learn complex patterns.

Generative AI: AI programs that can generate new content, like text, images, and audio, rather than just analyze it.

Large language models (LLMs): Language models that recognize, summarize, translate, predict, and generate text and other content. They’re called large because they are trained on large amounts of data and have many parameters, with popular LLMs reaching hundreds of billions of parameters.

Natural language processing (NLP): The ability of a computer program to understand and generate text in natural language.

Long short-term memory neural network (LSTM): A special type of RNN with more complex cell blocks that allow it to retain more past inputs.

Natural language generation (NLG): A part of NLP that refers to the ability of a computer program to generate human-like text.

Natural language understanding (NLU): A part of NLP that refers to the ability of a computer program to understand human-like text.

Neural network (NN): A machine learning algorithm in which the parameters are organized into consecutive layers. The learning process of NNs is inspired by the human brain. Much like humans, NNs “learn” important features via representation learning and require less human involvement than most other approaches to machine learning.

Perception AI: AI programs that can process and analyze but not generate data, mainly developed before 2020.

Recurrent neural network (RNN): A neural network that processes data sequentially and can memorize past inputs.

Rule-based system: A system that relies on human-crafted rules to process data.

Traditional machine learning: A statistical approach that draws probability distributions of words or other tokens from a large annotated corpus. It relies less on rules and more on data.

Transformer: A type of neural network architecture designed to process sequential data non-sequentially.

Structured data: Data that is quantitative in nature, such as phone numbers, and can be easily standardized and adjusted to a pre-defined format that ML algorithms can quickly process.

Unstructured data: Data that is qualitative in nature, such as customer reviews, and difficult to standardize. Such data is stored in its native formats, like PDF files, before use.

Fine-tuning: A transfer learning method used to improve model performance on selected downstream tasks or datasets. It’s used when the target task is similar to the pre-training task and involves copying the weights of a pre-trained language model (PLM) and tuning them on the desired tasks or data.

Customization: A method of improving model performance by modifying only a small subset of a PLM’s parameters instead of updating the entire model. It involves using parameter-efficient techniques (PEFT).

Parameter-efficient techniques (PEFT): Techniques like prompt learning, LoRA, and adapter tuning that allow researchers to customize PLMs for downstream tasks or datasets while preserving and leveraging the existing knowledge of PLMs. These techniques are used during model customization and allow for quicker training and often more accurate predictions.

Prompt learning: An umbrella term for two PEFT techniques, prompt tuning and p-tuning, which help customize models by inserting virtual token embeddings among discrete or real token embeddings.

Adapter tuning: A PEFT technique that involves adding lightweight feed-forward layers, called adapters, between existing PLM layers and updating only their weights during customization while keeping the original PLM weights frozen.

Open-domain question answering: Answering questions from a variety of different domains, like legal, medical, and financial, instead of just one domain.

Extractive question answering: Answering questions by extracting the answers from existing texts or databases.

Throughput: A measure of model efficiency and speed. It refers to the amount of data or the number of predictions that a model can process or generate within a pre-defined timeframe.

Latency: The amount of time a model needs to process input and generate output.

Data readiness: The suitability of data for use in training, based on factors such as data quantity, structure, and quality.
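To make the throughput and latency definitions concrete, here is a minimal timing sketch in Python; generate_fn is a hypothetical placeholder for whatever model call an application actually makes, not an API from this guide:

```python
# Minimal sketch for measuring latency and throughput around any text
# generation function. generate_fn is a hypothetical placeholder.
import time

def measure(generate_fn, prompts):
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        generate_fn(prompt)  # the model call being timed
        latencies.append(time.perf_counter() - t0)  # latency: time per request
    elapsed = time.perf_counter() - start
    throughput = len(prompts) / elapsed  # throughput: requests per unit time
    mean_latency = sum(latencies) / len(latencies)
    return mean_latency, throughput
```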


Introduction to LLMs

A large language model is a type of artificial intelligence (AI) system that is capable of generating human-like text based on the patterns and relationships it learns from vast amounts of data. Large language models use a machine learning technique called deep learning to analyze and process large sets of data, such as books, articles, and web pages.

Large language models unlocked unprecedented possibilities in the fields of NLP and AI. This was most notably demonstrated by the release of OpenAI’s GPT-3 in 2020, then the largest language model ever developed.

These models are designed to understand the context and meaning of text and can generate text that
is grammatically correct and semantically relevant. They can be trained on a wide range of tasks,
including language translation, summarization, question answering, and text completion.
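As a minimal illustration, the sketch below runs two of these tasks (text completion and summarization) with small, publicly available models via the open-source Hugging Face transformers library; the models named here are merely illustrative stand-ins for full-scale LLMs, not models discussed in this guide:

```python
from transformers import pipeline

# Text completion / generation with GPT-2, a small stand-in for a full LLM.
generator = pipeline("text-generation", model="gpt2")
print(generator("Language models can", max_new_tokens=20)[0]["generated_text"])

# Summarization with a distilled BART model.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
text = ("Large language models are deep learning systems trained on vast text "
        "corpora, enabling them to translate, summarize, answer questions, "
        "and complete text with minimal task-specific engineering.")
print(summarizer(text, max_length=25, min_length=5)[0]["summary_text"])
```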

GPT-3 made it evident that large-scale models can accurately perform a wide – and previously
unheard-of – range of NLP tasks, from text summarization to text generation. It also showed that
LLMs could generate outputs that are nearly indistinguishable from human-created text, all while
learning on their own with minimal human intervention.

This represented an enormous improvement over earlier, mainly rule-based models that could neither learn on their own nor successfully solve tasks they weren’t trained on. It is no surprise, then, that many other enterprises and startups soon started developing their own LLMs or adopting existing ones in order to accelerate their operations, reduce expenses, and streamline workflows.

Part 1 is intended to provide a solid introduction and foundation for any enterprise that is considering
building or adopting its own LLM.

What Are Large Language Models (LLMs)?


Large language models (LLMs) are deep learning algorithms that can recognize, extract, summarize,
predict, and generate text based on knowledge gained during training on very large datasets.

They’re also a subset of a more general technology called language models. All language models have
one thing in common: they can process and generate text that sounds like natural language. This is
known as performing tasks related to natural language processing (NLP).



Although all language models can perform NLP tasks, they differ in other characteristics, such as their
size. Unlike other models, LLMs are considered large in size because of two reasons:

1. They’re trained using large amounts of data.

2. They comprise a huge number of learnable parameters (i.e., representations of the underlying
structure of training data that help models perform tasks on new or never-before-seen data).
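To make the notion of learnable parameters tangible, here is a minimal sketch, assuming PyTorch and the open-source Hugging Face transformers library, that counts the parameters of GPT-2, a small, downloadable stand-in for the far larger models in Table 1:

```python
# Count the learnable parameters of a pretrained language model. GPT-2 has
# roughly 124 million parameters; the models in Table 1 below are three
# orders of magnitude larger.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Learnable parameters: {num_params:,}")
```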

Table 1 showcases two large language models, MT-NLG and GPT-3 Davinci, to help clarify what’s considered large by contemporary standards.

Table 1. Comparison of MT-NLG and GPT-3

Large Language Model                                         Parameters    Tokens in training data
NVIDIA Megatron-Turing Natural Language Generation (MT-NLG)  530 billion   270 billion
OpenAI GPT-3 Davinci                                         175 billion   499 billion

Since the quality of a model heavily depends on the model size and the size of training data, larger
language models typically generate more accurate and sophisticated responses than their smaller
counterparts.



Figure 1. Answer Generated by GPT-3.

However, the performance of large language models doesn’t just depend on the model size or data
quantity. Quality of the data matters, too.

For example, LLMs trained on peer-reviewed research papers or published novels will usually perform
better than LLMs trained on social media posts, blog comments, or other unreviewed content. Low-
quality data like user-generated content may lead to all sorts of problems, such as models picking up
slang, learning incorrect spellings of words, and so on.

In addition, models need very diverse data in order to perform various NLP tasks. However, if a model is intended to be especially good at solving a particular set of tasks, it can be fine-tuned using a narrower, more relevant dataset. By doing so, a foundation language model, one that is good at performing various NLP tasks across a broad set of domains, is transformed into a fine-tuned model that specializes in performing tasks in a narrowly scoped domain.
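As a rough sketch of that transformation, the example below continues training a small foundation model on a hypothetical domain corpus using the open-source Hugging Face transformers and datasets libraries; the file name, model choice, and hyperparameters are illustrative assumptions, not recommendations from this guide:

```python
# Hedged sketch: full fine-tuning of a small foundation model on a narrower,
# domain-specific dataset. "legal_docs.txt" is a hypothetical corpus with
# one document per line.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no padding token
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("text", data_files={"train": "legal_docs.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-legal", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # every weight in the model is updated
```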



Foundation Language Models vs. Fine-Tuned Language Models
Foundation language models, such as the aforementioned MT-NLG and GPT-3, are what is usually
referred to when discussing LLMs. They’re trained on vast amounts of data and can perform a wide
variety of NLP tasks, from answering questions and generating book summaries to completing and
translating sentences.

Thanks to their size, foundation models can perform well even when they have little domain-specific
data at their disposal. They have good general performance across tasks but may not excel at
performing any one specific task.

Fine-tuned language models, on the other hand, are large language models derived from foundation
LLMs. They’re customized for specific use cases or domains and, thus, become better at performing
more specialized tasks.

Apart from the fact that fine-tuned models can perform specific tasks better than foundation models,
their biggest strength is that they are lighter and, generally, easier to train. But how does one actually
fine-tune a foundation model for specific objectives?

Currently, the most popular method is customizing a model using parameter-efficient customization
techniques, such as p-tuning, prompt tuning, adapters, and so on. Customization is far less time-
consuming and expensive than fine-tuning the entire model, although it may lead to somewhat
poorer performance than other methods. Customization methods are further discussed in Part 3.
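As an illustration, here is a minimal sketch of one such technique, LoRA (low-rank adaptation), using the open-source Hugging Face peft library; the base model and hyperparameters are illustrative assumptions, not recommendations from this guide:

```python
# Hedged sketch of parameter-efficient customization with LoRA: small
# trainable adapter matrices are injected into the attention layers while
# the original model weights stay frozen.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the update
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only a fraction of a percent is trainable
```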

Evolution of Large Language Models


AI systems were historically about processing and analyzing data, not generating it. They were oriented more toward perceiving and understanding the world around us than toward generating new information. This distinction marks the main difference between Perception AI and Generative AI, with the latter becoming increasingly prevalent since around 2020, after companies started adopting transformer models and developing increasingly robust LLMs at scale.

The advent of large language models further fueled a revolutionary paradigm shift in the way NLP
models are designed, trained, and used. To truly understand this, it may be helpful to compare large
language models to previous NLP models and how they worked. For this purpose, let’s briefly explore
three regimes in the history of NLP: pre-transformers NLP, transformers NLP, and LLM NLP.

1. Pre-transformers NLP was mainly marked by models that relied on human-crafted rules rather
than machine learning algorithms to perform NLP tasks. This made them suitable for simpler tasks
that didn’t require too many rules, like text classification, but unsuitable for more complex tasks,
such as machine translation. Rule-based models also performed poorly in edge-case scenarios
because they couldn’t make accurate predictions or classifications for never-before-seen data for
which no clear rules were set. This problem was somewhat solved with simple neural networks,
such as RNNs and LSTMs, developed during the later phases of this period. RNNs and LSTMs could
memorize past data to a certain extent and, thus, provide context-dependent predictions and
