huggingface basics
April 2, 2025
2 Pipeline
The pipeline in Hugging Face’s Transformers library is a high-level abstraction that simplifies the
use of pre-trained models for various natural language processing (NLP) tasks. It allows users to
perform complex tasks with minimal code.
2.0.1 Hugging Face Pipeline API Tasks
• Text Classification: Sentiment analysis, spam detection, etc.
• Named Entity Recognition (NER): Identifying entities like names, dates, and locations
in text.
• Question Answering: Answering questions based on a given context.
• Text Generation: Generating text based on a given prompt (e.g., with GPT-2).
• Translation: Translating text from one language to another.
• Summarization: Generating a summary of a given text.
• Text2Text Generation: Tasks like summarization or translation using models like T5.
• Fill-Mask: Predicting masked words in a sentence (e.g., with BERT).
• Zero-Shot Classification: Classifying text into categories without explicit training on those
categories.
[1]: # ! pip show transformers
[ ]: # Sentiment Analysis
from transformers import pipeline

classifier = pipeline('sentiment-analysis')
result = classifier("I hate using Hugging Face transformers!")
print(result)
[ ]: # Text Generation
generator = pipeline('text-generation')
result = generator("Once upon a time,")
print(result)
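The task list above also includes zero-shot classification, which follows the same pipeline pattern; a minimal sketch (the input sentence and candidate labels below are illustrative):
[ ]: # Zero-Shot Classification
classifier = pipeline('zero-shot-classification')
result = classifier(
    "Hugging Face makes it easy to share models.",
    candidate_labels=["technology", "sports", "politics"]
)
print(result)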
AutoTokenizer:
Key Features:
• Auto Detection: Identifies the correct tokenizer based on the model name or path.
• Easy Initialization: from transformers import AutoTokenizer; tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
• Automatic Addition: The AutoTokenizer automatically adds special tokens (like [CLS],
[SEP], etc.) required by the model.
• Purpose: Special tokens are used for tasks like classification, separation of sentences, and
padding.
[7]: from transformers import AutoTokenizer

# Load the tokenizer that matches the pre-trained model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Example text
text = ["Transformers are incredibly powerful.", "Transformers are awesome"]

# Tokenize text
tokens = tokenizer(text, padding=True, truncation=True, return_tensors='pt')  # 'pt' for PyTorch tensors, 'tf' for TensorFlow tensors
[8]: tokens
[8]: {'input_ids': tensor([[  101, 19081,  2024, 11757,  3928,  1012,   102],
        [  101, 19081,  2024, 12476,   102,     0,     0]]),
 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 0, 0]])}
2.0.3 Tokenizer Output Explanation
input_ids:
• Description: These are the token IDs corresponding to the input text.
• Details:
– Each token in the vocabulary of the model is assigned a unique ID.
– Special tokens like [CLS] (start of sequence) and [SEP] (end of sequence) are included.
– Padding tokens (usually 0) are added to ensure all sequences in a batch have the same
length.
• Example:
– [101, 19081, 2024, 11757, 3928, 1012, 102]: Represents “Transformers are incredibly powerful.” with [CLS] (101) at the start and [SEP] (102) at the end.
– [101, 19081, 2024, 12476, 102, 0, 0]: Represents “Transformers are awesome” with padding tokens (0) added to match the length of the longest sequence.
token_type_ids:
• Description: These indicate the segment to which each token belongs. Used primarily for
tasks involving sentence pairs (e.g., question answering).
• Details:
– For single sentences, all values are 0.
– For sentence pairs, the first sentence tokens are 0 and the second sentence tokens are 1.
• Example:
– [[0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]]: Since both examples are single
sentences, all token type IDs are 0.
attention_mask:
• Description: This indicates which tokens should be attended to (1) and which are just
padding (0).
• Details:
– 1 for actual tokens and 0 for padding tokens.
– Helps the model to ignore the padding tokens during processing.
• Example:
– [[1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 0, 0]]: Indicates that the first sequence is fully attended to, and the second sequence has padding that should be ignored.
[9]: # Example text
text = "Transformers are incredibly powerful."
The decode method allows us to check how the final output of the tokenizer translates back into
text.
[10]: # Convert tokens to token IDs
token_ids = tokenizer(text, return_tensors='pt')
print("Token IDs:", token_ids)

# Decode back to text (special tokens skipped) and inspect the padding token
print(tokenizer.decode(token_ids['input_ids'][0], skip_special_tokens=True))
print(tokenizer.pad_token_id)
print(tokenizer.pad_token)
Token IDs: {'input_ids': tensor([[ 101, 19081, 2024, 11757, 3928, 1012,
102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0]]), 'attention_mask':
tensor([[1, 1, 1, 1, 1, 1, 1]])}
transformers are incredibly powerful.
0
[PAD]
Key Features:
• Customization: Tailored to the model’s architecture and vocabulary.
• Special Tokens: Automatically adds model-specific special tokens (e.g., [CLS], [SEP] for
BERT).
• Tokenization: Breaks down text into tokens that the model can process.
• BertTokenizer: Adds [CLS] and [SEP] tokens, lowercases the text, and splits words into
word pieces.
• GPT2Tokenizer: Does not add special tokens by default, uses Byte Pair Encoding (BPE)
for tokenization.
• RobertaTokenizer: Similar to BertTokenizer but designed for the RoBERTa model, which uses a different pre-training strategy and vocabulary.
[12]: from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Example sentence (assumed)
inputs = tokenizer("Hello, this is an example using BertTokenizer.", return_tensors='pt')
print(inputs)
{'input_ids': tensor([[  101,  7592,  1010,  2023,  2003,  2019,  2742,  2478, 14324, 18715,
         18595,  6290,  1012,   102]]),
 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
AutoModel:
• Automatically selects the appropriate model architecture for a given pre-trained model.
Key Features:
• Model Agnostic: Works with any model in the Hugging Face library.
• Ease of Use: Simplifies loading pre-trained models with a single line of code.
[ ]: from transformers import AutoTokenizer, AutoModel

model_checkpoint = 'bert-base-uncased'

# Tokenize an example sentence (assumed input)
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
inputs = tokenizer("Transformers are incredibly powerful.", return_tensors='pt')
print(inputs)
print("\n")

# Initialize the model
model = AutoModel.from_pretrained(model_checkpoint)
The output is not the final predictions here but rather the hidden states or embeddings produced
by the model. These outputs can be used as features for further processing or as input to other
layers (e.g., classification layers).
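A minimal sketch of retrieving those hidden states, assuming the model and inputs from the cell above:
[ ]: import torch

# Forward pass through the base model (no gradients needed for inspection)
with torch.no_grad():
    outputs = model(**inputs)

# One embedding per input token: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 7, 768])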
# bert_model = AutoModel.from_pretrained('bert-base-uncased')
# print(type(bert_model))
# print(bert_model)
# gpt_model = AutoModel.from_pretrained('gpt2')
# print(type(gpt_model))
# print(gpt_model)
# bart_model = AutoModel.from_pretrained('facebook/bart-large-cnn')
# print(type(bart_model))
# print(bart_model)
[22]: inputs1 = tokenizer("Hello, this is an example using a custom classification head. and it is bad", return_tensors='pt')
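The predictions shown in the next cell could be produced by a small, hypothetical classification head on top of the base model's [CLS] embedding; a minimal sketch (the head is untrained, so its predicted class is arbitrary):
[ ]: import torch
import torch.nn as nn

# Hypothetical custom head: a single linear layer over the [CLS] embedding
class CustomClassificationHead(nn.Module):
    def __init__(self, hidden_size=768, num_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, cls_embedding):
        return self.classifier(cls_embedding)

head = CustomClassificationHead()

with torch.no_grad():
    outputs = model(**inputs1)                       # hidden states from the base AutoModel
    cls_embedding = outputs.last_hidden_state[:, 0]  # embedding of the [CLS] token
    logits = head(cls_embedding)
    predictions = torch.argmax(logits, dim=-1)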
[25]: predictions
[25]: tensor([1])
Key Features:
• Task-Specific: Each class is designed for a specific NLP task.
• Ease of Use: Simplifies loading and using pre-trained models with the appropriate heads.
AutoModelForSequenceClassification:
[26]: from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch.nn.functional as F
import torch
# Generate text
outputs = model.generate(inputs['input_ids'], max_length=50, num_return_sequences=1)
AutoModelForTokenClassification:
[ ]: from transformers import AutoTokenizer, AutoModelForTokenClassification
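A minimal sketch of token classification (NER) with these classes, assuming a publicly available NER checkpoint; the input sentence is illustrative:
[ ]: import torch

# Assumed checkpoint fine-tuned for NER; any token-classification model works here
model_checkpoint = 'dbmdz/bert-large-cased-finetuned-conll03-english'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint)

inputs = tokenizer("Hugging Face is based in New York City.", return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# Highest-scoring label for each token, mapped back to label names
predicted_ids = torch.argmax(outputs.logits, dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
predicted_tokens = [(tok, model.config.id2label[i.item()]) for tok, i in zip(tokens, predicted_ids)]
print("\n")
print(predicted_tokens)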
# Load the tokenizer and model
from transformers import BertTokenizer, BertForSequenceClassification

model_checkpoint = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_checkpoint)
model = BertForSequenceClassification.from_pretrained(model_checkpoint)

# Define labels (these are examples; adjust based on your actual model's training)
labels = ["Negative", "Positive"]

# Input sentences
sentences = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this"
]
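A minimal sketch of the forward pass that could produce the probabilities printed below, using the tokenizer, model, and labels defined above (the classification head is not fine-tuned, so the probabilities stay close to uniform):
[ ]: import torch
import torch.nn.functional as F

for sentence in sentences:
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=-1)[0]
    print("Sentence:", sentence)
    print("Probabilities:", probs.tolist())
    print("Predicted Class:", labels[int(torch.argmax(probs))])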
Sentence: I hate this
Probabilities: [0.4586593806743622, 0.5413405895233154]
Predicted Class: Positive
GPT2LMHeadModel:
[ ]: # GPT2LMHeadModel
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Input prompt
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors='pt')

# Generate text
outputs = model.generate(inputs['input_ids'], max_length=50, num_return_sequences=1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
AutoConfig:
Key Features:
• Automatic Configuration Loading: Load configurations without specifying the model
class explicitly.
• Customization: Modify model configurations to suit specific needs.
[31]: from transformers import AutoConfig

config = AutoConfig.from_pretrained('bert-base-uncased')
print(config)
BertConfig {
"architectures": [
"BertForMaskedLM"
],
"attention_probs_dropout_prob": 0.1,
"classifier_dropout": null,
"gradient_checkpointing": false,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"layer_norm_eps": 1e-12,
"max_position_embeddings": 512,
"model_type": "bert",
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pad_token_id": 0,
"position_embedding_type": "absolute",
"transformers_version": "4.50.3",
"type_vocab_size": 2,
"use_cache": true,
"vocab_size": 30522
}
2.0.12 Model-Specific Configuration Classes
Purpose: Model-specific configuration classes in Hugging Face Transformers are used to define
the architecture and hyperparameters for specific models. These configurations are essential for
initializing models correctly and can be customized to suit specific needs.
[ ]: # BERT Configuration
from transformers import BertConfig, BertForSequenceClassification
[ ]: # GPT-2 Configuration
[ ]: # DistilBERT Configuration
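A minimal sketch of customizing a model-specific configuration (the override values below are illustrative):
[ ]: from transformers import BertConfig, BertForSequenceClassification

# Load the default BERT config and override a few hyperparameters
config = BertConfig.from_pretrained('bert-base-uncased', num_labels=3, hidden_dropout_prob=0.2)

# Build a model from the customized config; the new classification head is randomly initialized
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', config=config)
print(model.config.num_labels)  # 3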
2.0.13 Dataset Class
Purpose: The Dataset class in the Hugging Face datasets library is used to handle and manipulate datasets efficiently. It supports a wide range of operations for loading, processing, and
transforming datasets, making it easier to prepare data for machine learning models.
Key Features:
• Loading Datasets: Load datasets from local files or the Hugging Face Hub.
• Processing: Apply various preprocessing and transformation functions.
• Splitting: Split datasets into training, validation, and test sets.
• Batching: Efficiently batch data for model training and evaluation.
Important Methods:
1. Loading a Dataset:
• load_dataset(): Loads a dataset from a local file or the Hugging Face Hub.
[37]: # %%capture
# !pip install datasets
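The DatasetDict inspected in the next cell matches the split sizes of the IMDB reviews dataset; a minimal sketch of loading it (the dataset name is an assumption):
[ ]: from datasets import load_dataset

# Assumed: the IMDB reviews dataset (25k train / 25k test / 50k unsupervised)
dataset = load_dataset("imdb")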
[39]: dataset
[39]: DatasetDict({
train: Dataset({
features: ['text', 'label'],
num_rows: 25000
})
test: Dataset({
features: ['text', 'label'],
num_rows: 25000
})
unsupervised: Dataset({
features: ['text', 'label'],
num_rows: 50000
})
})
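One plausible way to build the 10,000-example subset inspected next (the shuffle seed and exact call are assumptions):
[ ]: # Take a shuffled 10,000-example subset of the training split
train_subset = dataset['train'].shuffle(seed=42).select(range(10000))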
[41]: train_subset
[41]: Dataset({
features: ['text', 'label'],
num_rows: 10000
})
[ ]: import pandas as pd

# View a single training example as a one-row DataFrame
pd.DataFrame(dataset['train'][0], index=[0])
[44]: pd.DataFrame(dataset['train'][0:3])
[46]: train_subset
[46]: Dataset({
features: ['text', 'label'],
num_rows: 10000
})
[47]: dataset['train'].features
tokenizer.pad_token = tokenizer.eos_token if tokenizer.eos_token else '[PAD]'
tokenizer.add_special_tokens({'pad_token': tokenizer.pad_token})
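The loop below iterates over a train_dataloader; a minimal sketch of building one from the subset above, assuming dynamic padding with DataCollatorWithPadding:
[ ]: from torch.utils.data import DataLoader
from transformers import DataCollatorWithPadding

# Tokenize the subset, dropping the raw text column
def tokenize_function(examples):
    return tokenizer(examples['text'], truncation=True)

tokenized_train = train_subset.map(tokenize_function, batched=True, remove_columns=['text'])

# Pad each batch dynamically and return PyTorch tensors
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
train_dataloader = DataLoader(tokenized_train, batch_size=8, shuffle=True, collate_fn=data_collator)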
[ ]: import pprint
# Print the first batch
for batch in train_dataloader:
pprint.pprint(batch)
break
Example 1: Loading a Dataset from a Local File Path
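A minimal sketch, assuming a local CSV file with 'text' and 'label' columns (the path is a placeholder):
[ ]: from datasets import load_dataset

# Load a dataset from a local CSV file (placeholder path)
local_dataset = load_dataset('csv', data_files='path/to/your_data.csv')
print(local_dataset)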
[52]: ## Loading a Dataset from the Internet
{'Month': 'JAN', ' "1958"': 340, ' "1959"': 360, ' "1960"': 417}
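A minimal sketch of how a record like the one above could be loaded from a remote CSV (the URL is an assumption; its columns match the row shown):
[ ]: from datasets import load_dataset

# Monthly air travel totals, 1958-1960 (assumed source URL)
url = "https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv"
remote_dataset = load_dataset('csv', data_files=url)
print(remote_dataset['train'][0])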
Data Collators:
Key Features:
• Padding: Ensures that all sequences in a batch have the same length by adding padding
tokens.
• Masking: Creates attention masks to distinguish between real tokens and padding tokens.
• Formatting: Prepares data in the correct format required by the model.
1. DataCollatorWithPadding:
• Automatically pads the sequences in a batch to the same length.
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Define a simple dataset
dataset = [
    {"text": "I've been waiting for a HuggingFace course my whole life."},
    {"text": "I hate this"}
]

# Tokenize each example (no padding yet; the collator pads each batch)
tokenized_dataset = [tokenizer(example["text"]) for example in dataset]

# Create a DataLoader that pads every batch via the collator
dataloader = DataLoader(tokenized_dataset, batch_size=2, collate_fn=data_collator)
{'input_ids': tensor([[ 101, 1045, 1005, 2310, 2042, 3403, 2005, 1037,
17662, 12172,
2607, 2026, 2878, 2166, 1012, 102],
[ 101, 1045, 5223, 2023, 102, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0]]), 'token_type_ids':
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask':
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])}
[54]: # DataCollatorForLanguageModeling
# Prepares data for language modeling tasks by masking tokens.
from transformers import DataCollatorForLanguageModeling

# Randomly mask 15% of the tokens; labels hold the original IDs at masked positions
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

# Create a DataLoader (reuses the tokenized_dataset defined above)
dataloader = DataLoader(tokenized_dataset, batch_size=2, collate_fn=data_collator)
{'input_ids': tensor([[ 101, 1045, 1005, 2310, 103, 103, 2005, 1037,
17662, 12172,
27589, 103, 2878, 2166, 1012, 102],
[ 101, 1045, 5223, 2023, 102, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0]]), 'token_type_ids':
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask':
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'labels':
tensor([[-100, -100, -100, -100, 2042, 3403, -100, -100, -100, -100, 2607, 2026,
-100, -100, -100, -100],
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100]])}
[55]: # # DataCollatorForSeq2Seq
# # Prepares data for sequence-to-sequence tasks such as translation and summarization.
# {"text": "translate English to French: HuggingFace is a great library."},
# {"text": "translate English to French: I love programming."}
# ]
# # Create a DataLoader
# dataloader = DataLoader(tokenized_dataset, batch_size=2, collate_fn=data_collator)
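A minimal sketch of the sequence-to-sequence collator, assuming a T5 checkpoint and illustrative translation pairs (the target strings are made up for the example):
[ ]: from torch.utils.data import DataLoader
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained('t5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

# Tokenize source texts and attach target token IDs as 'labels'
examples = [
    {"text": "translate English to French: HuggingFace is a great library.",
     "target": "HuggingFace est une excellente bibliothèque."},
    {"text": "translate English to French: I love programming.",
     "target": "J'aime programmer."}
]
tokenized_dataset = [
    {**tokenizer(ex["text"]), "labels": tokenizer(ex["target"])["input_ids"]}
    for ex in examples
]

# The collator pads input_ids, attention_mask, and labels (labels are padded with -100)
dataloader = DataLoader(tokenized_dataset, batch_size=2, collate_fn=data_collator)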
TrainingArguments:
Key Features:
• Learning Rate: Configure the learning rate for the optimizer.
• Batch Size: Set the batch size for training and evaluation.
• Number of Epochs: Specify the number of training epochs.
• Logging: Enable logging of training metrics and save logs to a specified directory.
• Evaluation Strategy: Configure when to perform evaluation during training.
• Checkpointing: Set up model checkpointing to save the model at specified intervals.
Example Parameters:
• output_dir: Directory to save the model checkpoints.
• evaluation_strategy: Evaluation strategy to use during training ("steps" or "epoch").
• learning_rate: Learning rate for the optimizer.
• per_device_train_batch_size: Batch size for training.
• per_device_eval_batch_size: Batch size for evaluation.
• num_train_epochs: Number of training epochs.
• weight_decay: Weight decay for the optimizer.
• logging_dir: Directory to save the logs.
• logging_steps: Log training metrics every specified number of steps.
7 Define the training arguments
training_args = TrainingArguments(
    output_dir='./results',              # Directory to save the model checkpoints
    evaluation_strategy='epoch',         # Evaluate at the end of every epoch
    learning_rate=2e-5,                  # Learning rate for the optimizer
    per_device_train_batch_size=8,       # Batch size for training
    per_device_eval_batch_size=8,        # Batch size for evaluation
    num_train_epochs=3,                  # Number of training epochs
    weight_decay=0.01,                   # Weight decay for the optimizer
    logging_dir='./logs',                # Directory to save the logs
    logging_steps=10,                    # Log training metrics every 10 steps
    save_steps=500,                      # Save model checkpoint every 500 steps
    save_total_limit=2,                  # Limit the total number of checkpoints
)
Trainer:
Key Features:
• Training: Handles the training loop, including forward and backward passes, optimizer steps,
and learning rate scheduling.
• Evaluation: Supports model evaluation on validation and test sets.
• Data Loading: Integrates seamlessly with DataLoader and Dataset objects.
• Logging: Provides logging and tracking of training metrics.
• Checkpointing: Supports model checkpointing to save and load models during training.
Important Methods:
• train(): Starts the training loop.
• evaluate(): Evaluates the model on a given dataset.
• predict(): Generates predictions on a given dataset.
• save_model(): Saves the model and tokenizer to disk.
• log(): Logs training metrics.
8jfrffycd
April 2, 2025
from transformers import AutoTokenizer

# split_dataset is assumed to be a DatasetDict with 'train' and 'test' splits
train_data = split_dataset["train"].select(range(1000))
test_data = split_dataset["test"].select(range(1000))

model_checkpoint = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Tokenization function
def tokenize_function(examples):
    return tokenizer(examples['text'], truncation=True, padding=True)

train_tokenized_dataset = train_data.map(tokenize_function, batched=True, remove_columns=['text'])
test_tokenized_dataset = test_data.map(tokenize_function, batched=True, remove_columns=['text'])
print(train_tokenized_dataset)  # Should contain input_ids, attention_mask, and label
print("\n")
Dataset({
features: ['label', 'input_ids', 'token_type_ids', 'attention_mask'],
num_rows: 1000
})
[3]: print(train_tokenized_dataset.column_names)
print(test_tokenized_dataset.column_names)
train_tokenized_dataset = train_tokenized_dataset.select_columns(["input_ids", "attention_mask", "token_type_ids", "label"])
test_tokenized_dataset = test_tokenized_dataset.select_columns(["input_ids", "attention_mask", "token_type_ids", "label"])
[5]: train_tokenized_dataset
[5]: Dataset({
features: ['input_ids', 'attention_mask', 'token_type_ids', 'label'],
num_rows: 1000
})
[6]: test_tokenized_dataset
[6]: Dataset({
features: ['input_ids', 'attention_mask', 'token_type_ids', 'label'],
num_rows: 1000
})
[7]: # print(train_tokenized_dataset.keys())
# print(test_tokenized_dataset.keys())
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint)
model.to("cuda")
training_args = TrainingArguments(
output_dir='./results',
evaluation_strategy='epoch',
eval_steps=2,
learning_rate=2e-5,
per_device_train_batch_size=32,
per_device_eval_batch_size=32,
num_train_epochs=1,
weight_decay=0.01,
logging_dir='./logs',
logging_steps=2,
fp16=True,
gradient_accumulation_steps=4
)
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(p):
    preds = np.argmax(p.predictions, axis=1)
    return {"accuracy": accuracy_score(p.label_ids, preds)}
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_tokenized_dataset,
eval_dataset=test_tokenized_dataset,
compute_metrics=compute_metrics
)
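With the Trainer configured, a run would typically be launched and evaluated before the model is saved below; a minimal sketch using the objects defined above:
[ ]: # Fine-tune and then evaluate on the test split
trainer.train()
eval_metrics = trainer.evaluate()
print(eval_metrics)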
[10]: trainer.save_model()
train_loss = []
eval_steps = []
eval_loss = []
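One way these lists could be filled for plotting is from the Trainer's log history (an assumed approach; trainer.state.log_history stores the metrics logged during training):
[ ]: # Collect losses recorded during training and evaluation
for entry in trainer.state.log_history:
    if 'loss' in entry:
        train_loss.append(entry['loss'])
    if 'eval_loss' in entry:
        eval_loss.append(entry['eval_loss'])
        eval_steps.append(entry['step'])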
[ ]: