
# RAG Settings

> Configure chunking, embeddings, and retrieval for optimal results

RAG (Retrieval Augmented Generation) settings control how your documents are processed and retrieved. Tuning these settings can significantly improve the quality of agent responses.

## Quick Start: Presets

When creating a knowledge base, choose a preset that matches your needs:

| Preset       | Best For                               | Trade-off                         |
| ------------ | -------------------------------------- | --------------------------------- |
| **Fast**     | Quick setup, general content           | Speed over precision              |
| **Balanced** | Most use cases                         | Good balance of speed and quality |
| **Quality**  | Complex documents, high accuracy needs | Slower processing, better results |

Each preset configures the parser, chunking strategy, and chunk sizes automatically. You can customize settings after creation if needed.

<Tip>
  Start with **Balanced** for most use cases. Switch to **Quality** if you notice retrieval issues with complex documents like PDFs with tables or multi-column layouts.
</Tip>

## Document Parsing

Before chunking, documents are parsed to extract text. The parser affects how well structure is preserved.

| Parser                 | Identifier         | Speed   | OCR | Best For                               |
| ---------------------- | ------------------ | ------- | --- | -------------------------------------- |
| **Tika**               | `tika`             | Fast    | No  | Plain text, simple documents           |
| **Tika + OCR**         | `tika-ocr`         | Slow    | Yes | Scanned documents, images with text    |
| **Unstructured**       | `unstructured`     | Medium  | No  | Documents with headings, lists, tables |
| **Unstructured + OCR** | `unstructured-ocr` | Slowest | Yes | Complex scanned layouts, multi-column  |

The preset you choose selects an appropriate parser, but you can override it at any time in the knowledge base settings — see [Configuring the Document Parser](#configuring-the-document-parser).
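
The table above can be read as a two-question decision: does the content need OCR, and does it have structure worth preserving? The sketch below encodes that reading. The function name and its parameters are hypothetical; only the returned identifiers come from the table.

```python
def choose_parser(needs_ocr: bool, has_rich_structure: bool) -> str:
    """Return a parser identifier from the table, based on two yes/no questions.

    Hypothetical helper for illustration; it is not part of any product API.
    """
    if has_rich_structure:
        # Headings, lists, tables, or multi-column layouts
        return "unstructured-ocr" if needs_ocr else "unstructured"
    # Plain text and simple documents
    return "tika-ocr" if needs_ocr else "tika"
```

For example, `choose_parser(needs_ocr=True, has_rich_structure=False)` returns `"tika-ocr"`, the choice the table suggests for scanned documents.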

## Understanding RAG

When an agent uses a knowledge base:

1. **Query** - The user's question is converted to an embedding
2. **Search** - Similar document chunks are retrieved
3. **Context** - Retrieved chunks are sent to the AI model
4. **Response** - The model generates an answer using the context

Each step can be configured to optimize for your use case.
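
To make the four steps concrete, here is a minimal sketch in Python. It is only an illustration: the bag-of-words `embed` function stands in for a real embedding model, all function names are hypothetical, and step 4 (calling the AI model) is left out.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Steps 1-2: embed the query, then return the most similar chunks."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 3: assemble the retrieved chunks into the context sent to the model."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"
```

Calling `retrieve("When are invoices due?", chunks, top_k=1)` on a small list of chunks returns the single chunk that shares the most words with the question; `build_prompt` then wraps it with the question for the model.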

## Chunking Settings

Chunking splits documents into smaller pieces for retrieval.

### Chunk Size

How many tokens per chunk (default: 512).

| Size             | Pros                   | Cons                  |
| ---------------- | ---------------------- | --------------------- |
| **Small (256)**  | Precise retrieval      | May split context     |
| **Medium (512)** | Good balance           | Default choice        |
| **Large (1024)** | More context per chunk | Less precise matching |

**When to adjust:**

* Decrease for Q\&A with short, specific answers
* Increase for documents where context spans paragraphs

### Chunk Overlap

Tokens shared between consecutive chunks (default: 50).

Overlap ensures that information at chunk boundaries isn't lost. The last tokens of one chunk are repeated at the start of the next, so a sentence that straddles a boundary is more likely to appear intact in one of them.

| Overlap          | Pros              | Cons                        |
| ---------------- | ----------------- | --------------------------- |
| **Small (0-25)** | Less redundancy   | May lose boundary context   |
| **Medium (50)**  | Balanced          | Default choice              |
| **Large (100+)** | Better continuity | More storage, slower search |
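
The interaction between chunk size and overlap can be sketched as a sliding window. This is an illustrative sketch, not the platform's implementation: `chunk_tokens` is a hypothetical name, and it works on any pre-tokenized list rather than a specific tokenizer's output.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    stride = chunk_size - overlap  # how far the window advances each step
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

With 10 tokens, `chunk_size=4`, and `overlap=1`, the window advances 3 tokens at a time and yields 3 chunks, each starting with the last token of the previous one.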

### Chunking Strategy

How text is split:

| Strategy      | Description                    |
| ------------- | ------------------------------ |
| **Fixed**     | Split at token count (default) |
| **Paragraph** | Split at paragraph boundaries  |
| **Sentence**  | Split at sentence boundaries   |
| **Semantic**  | Split by meaning changes       |

<Tip>
  Start with the default fixed strategy. Switch to semantic chunking if you notice important concepts being split awkwardly.
</Tip>
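
To illustrate how the paragraph and sentence strategies differ from fixed splitting, here is a deliberately naive sketch based on regular expressions. The function names are hypothetical, and production splitters handle abbreviations, lists, and other edge cases that this version ignores.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Paragraph strategy: split wherever a blank line separates blocks of text."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text: str) -> list[str]:
    """Sentence strategy: split after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
```

Unlike fixed splitting, both functions produce chunks of varying length, because the boundaries follow the text rather than a token count.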

## Embedding Settings

### Embedding Model

The model that converts text to vectors. Pick it carefully — it cannot be changed in place later.

Consider:

* **Language** - Some models specialize in specific languages
* **Domain** - Specialized models for code, legal, medical, etc.
* **Size** - Larger models are more accurate but slower

<Warning>
  The embedding model and its dimensions are **frozen at the moment a knowledge base is created**. The Reindex button reapplies chunking and parsing, but it cannot swap the embedding model — the physical vector index is allocated for the original model's dimensions and cannot be resized. To move to a different embedding model, follow the side-by-side migration in [Changing the embedding model](/products/agent-factory/knowledge-architecture#changing-the-embedding-model-the-a-b-pattern).
</Warning>

### Embedding Dimensions

Higher dimensions capture more nuance but use more storage. Most models have a fixed dimension (e.g., 1536 for OpenAI's `text-embedding-ada-002` and `text-embedding-3-small`). Like the model itself, dimensions are frozen at creation.
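
As a rough guide to the storage impact, raw vector size grows linearly with both the dimension and the number of chunks. The estimate below assumes 4-byte (float32) values and ignores index structures and metadata, so treat it as a lower bound; the function name is hypothetical.

```python
def vector_storage_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the stored vectors, assuming float32 values by default.

    Ignores index overhead and chunk metadata, so real usage is higher.
    """
    return num_chunks * dimensions * bytes_per_value
```

For 100,000 chunks at 1536 dimensions this gives 614,400,000 bytes, roughly 614 MB of raw vectors.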

## Retrieval Settings

### Top K

How many chunks to retrieve (default: 5).

| K Value           | Pros          | Cons                          |
| ----------------- | ------------- | ----------------------------- |
| **Small (3-5)**   | Focused, fast | May miss relevant info        |
| **Medium (5-10)** | Good coverage | Default choice                |
| **Large (10-20)** | Comprehensive | May include irrelevant chunks |

### Similarity Threshold

Minimum similarity score to include a chunk (0-1 scale).

| Threshold        | Effect                          |
| ---------------- | ------------------------------- |
| **Low (0.3)**    | More results, lower relevance   |
| **Medium (0.5)** | Balanced                        |
| **High (0.7)**   | Fewer results, higher relevance |

Set higher thresholds when precision matters more than recall.
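
Top K and the similarity threshold act together: the threshold removes weak matches, and Top K caps how many of the remaining chunks are kept. A minimal sketch, assuming scores are already on the 0-1 scale; `select_chunks` is a hypothetical name, and the 0.5 default is only the "Medium" value from the table, not a documented default.

```python
def select_chunks(
    scored: list[tuple[str, float]],
    top_k: int = 5,
    threshold: float = 0.5,  # assumed value for illustration
) -> list[tuple[str, float]]:
    """Drop chunks below the threshold, then keep the top_k highest-scoring ones."""
    kept = [(chunk, score) for chunk, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

Raising the threshold can return fewer than `top_k` chunks, or none at all, which is the precision-over-recall trade-off described above.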

### Reranking

Optional second pass to improve retrieval quality:

1. The initial search retrieves more candidates than the final Top K (e.g., 20)
2. A reranking model scores each candidate against the query
3. Only the top-scoring results are passed to the model

Reranking improves quality but adds latency. Enable for use cases where accuracy is critical.
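
The two-stage flow can be sketched as follows. This is an illustration under stated assumptions: `retrieve_with_rerank` is a hypothetical name, and `rerank_score` is a placeholder for whatever reranking model is configured; any callable that scores a query against a chunk will do.

```python
from typing import Callable


def retrieve_with_rerank(
    query: str,
    candidates: list[tuple[str, float]],
    rerank_score: Callable[[str, str], float],
    candidate_count: int = 20,
    top_k: int = 5,
) -> list[str]:
    """Keep the best candidates from the initial search, rescore them, return top_k."""
    # Stage 1: widen the net using the initial similarity scores
    first_pass = sorted(candidates, key=lambda pair: pair[1], reverse=True)[:candidate_count]
    # Stage 2: rescore each candidate with the (slower, more accurate) reranker
    rescored = [(chunk, rerank_score(query, chunk)) for chunk, _ in first_pass]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in rescored[:top_k]]


def word_overlap(query: str, chunk: str) -> float:
    """Toy reranker for demonstration: counts words shared by query and chunk."""
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))
```

The added latency comes from stage 2, which calls the reranker once per candidate, so `candidate_count` is the main cost lever.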

## Knowledge Base Settings

Access these in the knowledge base's Settings tab.

### Configuring Chunking

1. Open the knowledge base
2. Go to **Settings** > **Processing**
3. Adjust chunk size and overlap
4. Click **Save**

<Warning>
  Changing chunking settings only affects new documents. Click **Reindex All** to apply changes to existing documents.
</Warning>

### Configuring Retrieval

1. Open the knowledge base
2. Go to **Settings** > **Retrieval**
3. Adjust Top K, threshold, and reranking
4. Click **Save**

Retrieval settings take effect immediately; no reindexing is needed.

### Configuring the Document Parser

The parser can be changed after the knowledge base is created; it is not locked to the preset chosen at creation time.

1. Open the knowledge base
2. Go to **Settings** > **Processing**
3. Select the parser — use **Tika + OCR** (`tika-ocr`) for scanned documents or images that contain text
4. Click **Save**, then **Reindex All**

<Warning>
  Switching the parser only affects new documents until you click **Reindex All** to reprocess existing ones. OCR parsers (`tika-ocr`, `unstructured-ocr`) extract text from scans and images but are noticeably slower to index.
</Warning>

<Tip>
  For files uploaded directly into an agent's chat, the parser lives on the agent's conversation store rather than a regular knowledge base. See [Enabling OCR for chat-uploaded files](/products/agent-factory/knowledge-architecture#enabling-ocr-for-chat-uploaded-files).
</Tip>

## Testing Retrieval

After changing settings, test the impact:

1. Go to the knowledge base
2. Use the **Test Search** feature (if available)
3. Enter a query
4. Review which chunks are retrieved
5. Check if relevant content is included

Or test via an agent:

1. Open an agent that uses this knowledge base
2. Go to the Playground
3. Ask questions and observe tool calls
4. Check if the retrieved context is relevant
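
To compare settings more systematically than by spot checks, you can keep a small set of representative queries and measure how often the expected content is retrieved. The sketch below is hypothetical throughout: `search` stands for any function that returns the retrieved chunk texts for a query, however you obtain them.

```python
from typing import Callable


def hit_rate(test_cases: dict[str, str], search: Callable[[str], list[str]]) -> float:
    """Fraction of queries for which the expected text appears in a retrieved chunk.

    test_cases maps each query to a snippet that a good result should contain.
    """
    if not test_cases:
        return 0.0
    hits = sum(
        1
        for query, expected in test_cases.items()
        if any(expected.lower() in chunk.lower() for chunk in search(query))
    )
    return hits / len(test_cases)
```

Running the same test cases before and after a settings change gives a simple number to compare, instead of relying on impressions from individual queries.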

## Common Scenarios

### FAQ-Style Content

Short questions with specific answers:

* Chunk size: 256-512
* Top K: 3-5
* High similarity threshold (0.6+)

### Long-Form Documents

Research papers, manuals, reports:

* Chunk size: 512-1024
* Overlap: 100+
* Top K: 5-10

### Technical Documentation

Code examples, API references:

* Consider code-aware chunking
* Higher overlap to preserve examples
* Semantic chunking if available

### Mixed Content

Various document types:

* Start with defaults
* Test with representative queries
* Adjust based on results

## Performance Considerations

### Index Size

Smaller chunks and more overlap both produce more chunks per document, and therefore a larger index. The trade-offs:

* More storage used
* Potentially slower search
* More precise retrieval, which suits accuracy-critical use cases
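
The effect on index size can be estimated with simple arithmetic: each new chunk advances by `chunk_size - overlap` tokens. The sketch below assumes fixed-size chunking over a known token count; the function name is hypothetical.

```python
import math


def estimate_chunk_count(total_tokens: int, chunk_size: int = 512, overlap: int = 50) -> int:
    """Approximate number of fixed-size chunks produced for a given token count."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap  # new tokens covered by each additional chunk
    return 1 + math.ceil((total_tokens - chunk_size) / stride)
```

For 100,000 tokens, the defaults (512 tokens, 50 overlap) give 217 chunks, while 256-token chunks with an overlap of 100 give 641, about three times as many vectors to store and search.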

### Query Latency

Factors affecting search speed:

* Number of documents
* Chunk count
* Top K value
* Reranking enabled

For large knowledge bases, weigh retrieval quality against speed: a lower Top K or disabling reranking reduces latency at some cost in quality.

### Cost

Consider:

* Embedding API costs for indexing
* Reranking costs if enabled
* Token costs from larger context
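
The context-related token cost can be bounded with one multiplication: each query sends at most Top K chunks of at most chunk-size tokens to the model. This sketch gives only that upper bound; it ignores the prompt, the question, and the response, and the function name is hypothetical.

```python
def max_context_tokens(top_k: int = 5, chunk_size: int = 512) -> int:
    """Upper bound on retrieved-context tokens sent to the model per query."""
    return top_k * chunk_size
```

With the defaults this is 2,560 tokens per query; moving to Top K 10 with 1024-token chunks raises the bound to 10,240.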

## Troubleshooting

### "Agent doesn't find relevant content"

* Decrease similarity threshold
* Increase Top K
* Check if content is actually indexed
* Verify that the query uses the same language and terminology as the documents

### "Retrieved context seems irrelevant"

* Increase similarity threshold
* Enable reranking
* Review chunk boundaries
* Consider a different embedding model (this requires a side-by-side migration, because the model is frozen at creation)

### "Important context is split across chunks"

* Increase chunk size
* Increase overlap
* Try semantic chunking

### "Too much irrelevant content in responses"

* Decrease Top K
* Increase similarity threshold
* Enable reranking
* Add more specific instructions to the agent prompt

## Next Steps

<CardGroup cols="2">
  <Card title="Test with agents" icon="flask" href="/products/agent-factory/playground">
    See how settings affect real queries in the Playground
  </Card>

  <Card title="Advanced RAG" icon="wand-sparkles" href="./advanced-rag">
    Learn about multi-query, hierarchical retrieval, and more
  </Card>
</CardGroup>