Type of Chunking
[Diagram: Chunking strategies — structured vs. unstructured data; fixed-length vs. dynamic chunking; a large document split into Chunk 1 … Chunk n]
Why is Chunking required?
Memory Limitation
Large documents can exceed memory capacity; chunking breaks the data into manageable pieces.
Processing Efficiency
Smaller chunks are faster to process and reduce computational cost.
Scalability
Chunking lets the system handle larger datasets, making it more robust and scalable.
Fixed-Length Chunking
As the name suggests, we create chunks of a fixed size from an
existing document.
Steps: Input Document → split into fixed-size chunks → Output
Considerations
Parameters
• chunk_size: the number of characters each chunk will contain (200 characters in this example).
• chunk_overlap: the number of characters shared between consecutive chunks, to preserve context across splits (20 characters in this example).
• separator: ensures that we don't split in the middle of a word; here it is set to a space so splits fall between words.
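The slide's original code isn't reproduced here; the parameters above match a splitter such as LangChain's CharacterTextSplitter. Below is a minimal pure-Python sketch of the same idea — the function name and the small sample values are illustrative, not from the original:

```python
def fixed_length_chunks(text, chunk_size=200, chunk_overlap=20, separator=" "):
    """Split text into chunks of at most chunk_size characters,
    overlapping by roughly chunk_overlap characters, and breaking
    only at separator boundaries (never mid-word)."""
    words = text.split(separator)
    chunks = []
    current = ""
    for word in words:
        candidate = current + separator + word if current else word
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Start the next chunk with the tail of the previous one so
        # context is preserved across the split; drop any partial
        # leading word so the overlap also respects word boundaries.
        tail = current[-chunk_overlap:].split(separator, 1)[-1] if current else ""
        current = tail + separator + word if tail else word
    if current:
        chunks.append(current)
    return chunks


# Small values so the overlap is easy to see in the printed chunks
sample = " ".join(["alpha", "beta", "gamma", "delta"] * 10)
for chunk in fixed_length_chunks(sample, chunk_size=40, chunk_overlap=12):
    print(len(chunk), repr(chunk))
```

With the slide's values (chunk_size=200, chunk_overlap=20), each chunk stays under 200 characters and repeats the last ~20 characters of its predecessor, so sentences that straddle a boundary keep some shared context.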
When to use Fixed-Length Chunking
• When the data is uniform in structure: ideal for tasks like processing log files or structured datasets where content length is predictable.
• When memory and processing efficiency are a priority: useful when handling large datasets, or models with strict token limits where predictable chunk sizes help with system optimization.
• Efficient for uniform data: works well for text whose content length is consistent, allowing predictable chunk sizes.
• Scalable: fixed-length chunks suit systems that need predictable resource allocation and processing limits (e.g., token limits in LLMs).
Srinivas Mahakud
Cloud & AI Leader @EY
https://www.linkedin.com/in/srinivasmahakud/