Core Concepts
| Concept | Description |
|---|---|
| Knowledge Base | An independent collection of documents and content that an agent can query. Each base is isolated — documents from one base never appear in searches from another. |
| Chunk | A text fragment produced by splitting long documents. The chunker divides content into pieces of ~4,000 characters with an overlap of ~400 characters between consecutive chunks to preserve context at boundaries. |
| Embedding | A vector representation of a chunk generated by the text-embedding-3-small model. Each embedding has 1,536 dimensions and captures the semantic meaning of the text. |
| Retrieval | A cosine-similarity search process that compares the embedding of the user’s question with stored embeddings and returns the semantically closest chunks. |
| RAG | Retrieval-Augmented Generation — retrieved chunks are injected into the LLM context alongside the user’s message, allowing the agent to respond based on specific indexed information rather than generic pre-training knowledge. |
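
The chunking behavior described in the table can be illustrated with a short sketch. This is not the platform's actual chunker: the function name `chunk_text` and its parameters are hypothetical, and it splits on raw character offsets, whereas a real chunker may prefer sentence or paragraph boundaries (hence the "~" in the sizes above).

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 400) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: sizes mirror the ~4,000 / ~400 figures in the table.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # each new chunk starts this far after the previous one
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With these defaults, each chunk starts 3,600 characters after the previous one, so the last 400 characters of one chunk are repeated at the start of the next.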
Why It Matters
- Responses grounded in real data: the agent answers with the exact content you indexed — policies, pricing, catalogs — not generic inferences from the base model.
- Updates without reprogramming: update the content in the Knowledge Base, and the agent uses the new information in subsequent conversations as soon as it has been indexed, with no need to modify the system prompt.
- Controlled scope: searches are always filtered by knowledge_base_id and company_id, ensuring that one customer’s data never appears to another, and that distinct bases remain isolated even when an agent accesses multiple bases simultaneously.
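To make the scoping rule concrete, here is a minimal in-memory sketch. Only the field names `knowledge_base_id` and `company_id` come from this page; the `ChunkRecord` type and the `scoped_chunks` function are hypothetical, and the real system presumably applies the same filter inside the database query rather than in application code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ChunkRecord:
    # Hypothetical record shape; only the two ID fields are named in the docs.
    chunk_id: str
    company_id: str
    knowledge_base_id: str
    text: str


def scoped_chunks(
    records: Iterable[ChunkRecord],
    company_id: str,
    knowledge_base_ids: Iterable[str],
) -> list[ChunkRecord]:
    """Keep only chunks that belong to the company AND to one of the allowed bases."""
    allowed = set(knowledge_base_ids)
    return [
        r for r in records
        if r.company_id == company_id and r.knowledge_base_id in allowed
    ]
```

Because both conditions must hold, a chunk from another company is excluded even if its base ID happens to match one the agent is connected to.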
How It Works
Content ingestion
You add content to the base: documents, Q&A pairs, website pages, or YouTube videos. The content is stored as a document in the knowledge_base_documents table.
Chunking
The knowledge-process-document function splits the text into pieces of ~4,000 characters. An overlap of ~400 characters is maintained between consecutive chunks so context is not lost at boundaries.
Embedding generation
Each chunk is converted into a 1,536-dimensional vector by OpenAI’s text-embedding-3-small model, sent in batches of up to 100 chunks per request. Vectors are stored in the knowledge_chunks table.
Semantic search
When the agent needs information, the knowledge-search function converts the user’s question into a vector and runs a cosine-similarity search. The semantically closest chunks are returned, ordered by relevance.
Supported Content Types
| Type | Description |
|---|---|
| Documents | PDFs, text files, and uploaded documents. The processor extracts the text, splits it into chunks, and generates embeddings. |
| Q&A | Manually added question-and-answer pairs. Indexed instantly, with no asynchronous pipeline. High precision because you control exactly what will be retrieved. |
| Website | URLs crawled by the crawler. The system traverses the pages, extracts textual content, and indexes it as documents. Useful for keeping the base in sync with public documentation. |
| YouTube | Video URLs. The system downloads the automatic transcript, splits it into chunks, and indexes it. Useful for video-based tutorial knowledge bases. |
Knowledge Lifecycle
Base creation
A Knowledge Base is created in the Knowledge module, given a name, and associated with the workspace. It starts empty, with no documents.
Content addition
Documents, Q&As, URLs, and videos are added to the base. Each source goes through the processing pipeline corresponding to its type.
Processing and indexing
Content is processed asynchronously: chunking, vectorization, and storage. The status changes from Processing to Indexed when complete.
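The vectorization step sends chunks to the embedding model in batches of up to 100 per request. The sketch below shows one way such batching could look. The names `batched` and `embed_all` are hypothetical, and `embed_batch` is an injected placeholder for whatever client actually calls the embedding model, so no real API signature is assumed here.

```python
from typing import Callable, Iterator, Sequence

Vector = list[float]


def batched(chunks: Sequence[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive groups of at most batch_size chunks, preserving order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(chunks), batch_size):
        yield list(chunks[i:i + batch_size])


def embed_all(
    chunks: Sequence[str],
    embed_batch: Callable[[list[str]], list[Vector]],  # hypothetical embedding client
    batch_size: int = 100,
) -> list[Vector]:
    """Embed every chunk, one request per batch, returning vectors in chunk order."""
    vectors: list[Vector] = []
    for batch in batched(chunks, batch_size):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise ValueError("embedding client returned a different number of vectors")
        vectors.extend(result)
    return vectors
```

A document that produces 250 chunks would therefore be embedded in three requests of 100, 100, and 50 chunks.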
Connecting to an agent
The base is connected to one or more agents in the Training tab. Retrieval parameters — top-k and similarity threshold — are configured per base.
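The two retrieval parameters interact in a simple way: the similarity threshold discards weak matches, and top-k caps how many of the remaining chunks are returned. The sketch below illustrates this with plain Python lists. The function names and the default values (`top_k=5`, `threshold=0.7`) are assumptions for illustration, not the platform's actual defaults, and a production system would typically run this search inside a vector index rather than in a loop.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    stored: Sequence[tuple[str, Sequence[float]]],  # (chunk_id, embedding) pairs
    top_k: int = 5,          # assumed default, configured per base in practice
    threshold: float = 0.7,  # assumed default, configured per base in practice
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs scoring at least threshold, best first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in stored]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

Raising the threshold trades recall for precision: fewer chunks reach the LLM context, but those that do are closer to the question.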
Example
A software company connects three bases to the same support agent:
- “Product FAQ” base with answers to the most frequent questions about features
- “Technical Documentation” base with manuals and integration guides
- “Commercial Policies” base with cancellation, refund, and contract rules