rag.chunkers¶
BaseChunker
¶
Bases: BaseModel
, ABC
Base class for chunkers.
Example:
from mirascope.rag import BaseChunker, Document
class TextChunker(BaseChunker):
chunk_size: int
chunk_overlap: int
def chunk(self, text: str) -> list[Document]:
chunks: list[Document] = []
start: int = 0
while start < len(text):
end: int = min(start + self.chunk_size, len(text))
chunks.append(Document(text=text[start:end], id=str(uuid.uuid4())))
start += self.chunk_size - self.chunk_overlap
return chunks
Source code in mirascope/rag/chunkers/base_chunker.py
chunk(text)
abstractmethod
¶
TextChunker
¶
Bases: BaseChunker
A text chunker that splits a text into chunks of a certain size and overlaps.
Example:
from mirascope.rag import TextChunker
text_chunker = TextChunker(chunk_size=1000, chunk_overlap=200)
chunks = text_chunker.chunk("This is a long text that I want to split into chunks.")
print(chunks)