Document Ingestion
Upload and manage documents in your knowledge repositories to power RAG-enabled AI agents.
Overview
The Knowledge Repository system allows you to upload various document types that your AI agents can use to answer questions and provide information. Documents are processed, indexed, and made searchable for intelligent retrieval.
Accessing Knowledge Repositories
Step 1: Navigate to Knowledge
- Click on KNOWLEDGE in the top navigation menu
- You will see a list of configured repositories
Figure: Knowledge section with list of configured repositories
Viewing Repository Documents
Step 1: Access Repository
- Locate your repository in the list (e.g., "Product Documentation Repository")
- Click on the Documents link to view all documents in that repository
Document List View
The documents list displays:
- Document ID: Unique identifier for the document
- Document Type: File type (PDF, DOCX, CSV, TXT, etc.)
- Document Name: Name of the uploaded file
- Modified Date: Last modification timestamp
- Status: Processing status (Ready, Processing, Failed)
Figure: List of documents within a knowledge repository
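For scripts that work with an export of this list, the columns map naturally onto a small record type. The sketch below is illustrative only: the class and field names (`DocumentRecord`, `doc_id`, and so on) and the choice of a string ID are assumptions, not identifiers defined by the platform.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Status(Enum):
    """Processing statuses shown in the Status column."""
    READY = "Ready"
    PROCESSING = "Processing"
    FAILED = "Failed"


@dataclass(frozen=True)
class DocumentRecord:
    """One row of the documents list (field names are hypothetical)."""
    doc_id: str
    doc_type: str        # e.g. "PDF", "DOCX", "CSV", "TXT"
    name: str
    modified: datetime
    status: Status


def ready_documents(records):
    """Return only the records that are indexed and searchable."""
    return [r for r in records if r.status is Status.READY]


rows = [
    DocumentRecord("1", "PDF", "guide.pdf", datetime(2024, 1, 5), Status.READY),
    DocumentRecord("2", "CSV", "prices.csv", datetime(2024, 1, 6), Status.PROCESSING),
]
print([r.name for r in ready_documents(rows)])  # ['guide.pdf']
```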
Adding New Documents
Step 1: Access Upload Options
- Click the Add Document(s) button in the top-left corner of the documents page
- A modal dialog will appear with different upload options
Figure: Add Document(s) button in the documents view
Step 2: Select Upload Method
The platform offers several upload methods:
Upload Options
| Method | Description | Use When |
|---|---|---|
| Folder (Auto OCR) | Automatically detects need for OCR | Uploading scanned PDFs or mixed content |
| Folder (No OCR) | Upload without OCR processing | All documents are digital/searchable |
| Folder (Force OCR) | Forces OCR on all documents | All documents need OCR |
| Plain Text | Upload raw text data | Uploading text-only content |
Crawl Options
| Method | Description | Use When |
|---|---|---|
| Confluence | Crawl Confluence data | Syncing from Confluence |
| Website | Crawl and extract website content | Importing web pages |
| SharePoint/OneDrive | Crawl Microsoft document libraries | Syncing from Microsoft 365 |
| Human Assisted | Crawl with browser plugin | Sites requiring authentication |
API Option
| Method | Description | Use When |
|---|---|---|
| API | Add documents via API | Programmatic integration |
Recommended: Folder (Auto OCR)
Choose Folder (Auto OCR) from the Upload section. This option:
- ✅ Automatically detects the need for OCR
- ✅ Supports multiple file formats (.pdf, .docx, .csv, .txt)
- ✅ Can handle up to 1000 files per upload
- ✅ Chooses processing based on document type
Figure: Select Folder (Auto OCR) for intelligent document processing
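If a folder holds more than the 1000 files that a single upload accepts, it has to be split before uploading. A minimal sketch of that split, assuming you already have a list of file paths; `split_into_uploads` is a hypothetical helper name, not part of the platform:

```python
MAX_FILES_PER_UPLOAD = 1000  # per-upload limit stated above


def split_into_uploads(paths, limit=MAX_FILES_PER_UPLOAD):
    """Yield consecutive slices of `paths`, each within the per-upload limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    paths = list(paths)
    for start in range(0, len(paths), limit):
        yield paths[start:start + limit]


files = [f"scan_{i:04d}.pdf" for i in range(2500)]
print([len(batch) for batch in split_into_uploads(files)])  # [1000, 1000, 500]
```

Each yielded batch can then be uploaded as its own Folder (Auto OCR) upload.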
Step 3: Upload Files
You can upload files in two ways:
Option 1: Drag and Drop
- Drag file(s) directly into the upload area
- Visual feedback appears when files are over the drop zone
- Supports multiple files at once
Option 2: Browse
- Click in the upload area to open a file browser
- Select one or more files
- Click "Open" to add them to the upload queue
Figure: Drag and drop or browse to select files
Supported formats:
- .pdf: PDF documents
- .docx: Microsoft Word documents
- .csv: Comma-separated values
- .txt: Plain text files
Step 4: Configure Document Locks (Optional)
Document Locks allow you to restrict document visibility based on user roles.
How It Works
- Enter keywords in the "Document Locks" field
- Press Enter after each keyword to add more than one
- Only users with roles matching these keywords can see results from this document in search/retrieval
Use Cases
- Confidential Documents: Lock to "executive", "finance"
- Department-Specific: Lock to "sales", "marketing"
- Role-Based Access: Lock to specific role names
Figure: Configure document locks for role-based access control
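The matching rule behind document locks can be pictured as a set comparison between lock keywords and user roles. The sketch below is one plausible reading, not the platform's confirmed behavior. It assumes that a document without locks is visible to everyone, that a single matching role is enough, and that matching ignores case. `can_see` is a hypothetical name.

```python
def can_see(document_locks, user_roles):
    """Return True if the user may see results from the document.

    Assumptions (not confirmed by the platform documentation):
    - a document with no locks is visible to everyone;
    - one matching role is enough;
    - matching ignores case and surrounding whitespace.
    """
    locks = {lock.strip().lower() for lock in document_locks}
    roles = {role.strip().lower() for role in user_roles}
    return not locks or bool(locks & roles)


print(can_see(["executive", "finance"], ["Finance"]))  # True
print(can_see(["executive", "finance"], ["sales"]))    # False
print(can_see([], ["sales"]))                          # True
```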
Step 5: Save Documents
- Review your file selections
- Click the Save button to complete the upload
- Documents will be processed and added to the repository
- The status will show as "Ready" once processing is complete
Processing Statuses
Status Types
| Status | Description | What It Means |
|---|---|---|
| Processing | Document is being ingested | Wait for processing to complete |
| Ready | Document is indexed and searchable | Available for agent retrieval |
| Failed | Processing encountered an error | Check document format or size |
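A script that uploads documents usually needs to wait until each one leaves the Processing state. The following is a rough polling sketch. How the status is obtained is left to a caller-supplied `fetch_status` function, which is hypothetical here, and the 30-minute default timeout and 15-second interval are assumptions rather than platform settings.

```python
import time


def wait_until_done(fetch_status, timeout_s=1800, interval_s=15, sleep=time.sleep):
    """Poll `fetch_status()` until it returns "Ready" or "Failed".

    `fetch_status` is a caller-supplied function (hypothetical) that returns
    the document's current status string. Raises TimeoutError if the document
    is still processing once `timeout_s` seconds have passed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status in ("Ready", "Failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Simulated status sequence; the no-op sleep keeps the demo instant.
statuses = iter(["Processing", "Processing", "Ready"])
print(wait_until_done(lambda: next(statuses), sleep=lambda s: None))  # Ready
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.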
Processing Time
- Small files (< 1 MB): Usually under 1 minute
- Medium files (1-10 MB): 1-5 minutes
- Large files (10-50 MB): 5-15 minutes
- Bulk uploads: Processed in parallel; monitor the status of each document
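The size tiers above can be turned into a quick lookup, for example to set a sensible wait before checking a document again. This is only a sketch: `expected_minutes` is a hypothetical helper, and since the list does not say which tier owns the exact 1 MB and 10 MB boundaries, both are treated as medium here.

```python
def expected_minutes(size_mb):
    """Return the (min, max) processing time in minutes for a file size in MB.

    Tiers follow the list above. Boundary handling is an assumption:
    exactly 1 MB and exactly 10 MB are both treated as "medium".
    """
    if size_mb < 0:
        raise ValueError("size_mb must be non-negative")
    if size_mb < 1:
        return (0, 1)
    if size_mb <= 10:
        return (1, 5)
    if size_mb <= 50:
        return (5, 15)
    raise ValueError("files over 50 MB are outside the documented ranges")


print(expected_minutes(0.4))  # (0, 1)
print(expected_minutes(25))   # (5, 15)
```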
Important Notes
OCR Processing
- Auto OCR automatically detects whether OCR is needed for uploaded files
- Files that contain both scanned (OCR) and digital (non-OCR) content may lose data during processing
- Use "Force OCR" only if all documents require OCR
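The guidance on choosing between the three folder options can be summarized as a small decision function. The sketch assumes you already know, per file, whether it needs OCR; how you determine that is outside its scope. `choose_upload_method` is a hypothetical name.

```python
def choose_upload_method(needs_ocr_flags):
    """Pick an upload method from per-file flags (True = file needs OCR).

    Applies the notes above: Force OCR only when every file needs OCR,
    No OCR when none do, and Auto OCR otherwise (including an empty batch).
    """
    flags = list(needs_ocr_flags)
    if flags and all(flags):
        return "Folder (Force OCR)"
    if flags and not any(flags):
        return "Folder (No OCR)"
    return "Folder (Auto OCR)"


print(choose_upload_method([True, True]))    # Folder (Force OCR)
print(choose_upload_method([False, False]))  # Folder (No OCR)
print(choose_upload_method([True, False]))   # Folder (Auto OCR)
```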
Upload Limits
- Maximum upload limit: 1000 files per upload
- Supported formats: .pdf, .docx, .csv, .txt files
- File size limit: 50 MB per file (varies by plan)
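Checking a batch against these limits before uploading can save a failed attempt. A minimal pre-flight sketch follows. It assumes 1 MB means 1024 × 1024 bytes, that a file of exactly 50 MB is accepted, and that extensions are compared case-insensitively. None of these details are stated by the platform, and the limit may differ on your plan.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".csv", ".txt"}
MAX_FILES_PER_UPLOAD = 1000
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # 50 MB; varies by plan


def check_upload(files):
    """Return a list of problems for `files`, a list of (name, size_in_bytes).

    An empty list means the batch is within the documented limits.
    """
    problems = []
    if len(files) > MAX_FILES_PER_UPLOAD:
        problems.append(
            f"{len(files)} files exceeds the limit of {MAX_FILES_PER_UPLOAD}"
        )
    for name, size in files:
        if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        if size > MAX_FILE_SIZE_BYTES:
            problems.append(f"{name}: larger than 50 MB")
    return problems


batch = [("a.pdf", 1024), ("b.md", 10), ("c.txt", 60 * 1024 * 1024)]
print(check_upload(batch))
# ['b.md: unsupported format', 'c.txt: larger than 50 MB']
```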
Best Practices
- Organize Before Uploading: Group related documents
- Use Consistent Naming: Makes documents easier to find
- Clean Content: Remove unnecessary pages or sections
- Test with Sample: Upload a few files first to verify processing
- Monitor Status: Check processing status for errors
Crawl Options
Confluence
Integrate with Confluence to automatically sync spaces and pages.
Requirements:
- Confluence URL
- API token or credentials
- Appropriate permissions
What Gets Crawled:
- Pages and sub-pages
- Attachments
- Comments (optional)
Website Crawling
Crawl public websites to extract content.
Configuration:
- Starting URL
- Crawl depth
- Include/exclude patterns
Use Cases:
- Competitor documentation
- Public knowledge bases
- Blog content
SharePoint/OneDrive
Sync document libraries from Microsoft 365.
Requirements:
- Microsoft Graph API credentials
- Library/folder URLs
- Proper permissions
Supported Content:
- Documents
- Folders
- Metadata
Managing Documents
Viewing Document Details
- Click on a document name in the list
- View metadata, processing status, and content preview
Editing Documents
- Click the Edit icon next to a document
- Update document locks or metadata
- Save changes
Deleting Documents
- Click the Delete icon next to a document
- Confirm deletion
- Document is removed from the repository and search index
Warning: Deletion Is Permanent
Deleted documents cannot be recovered and will no longer be available for agent retrieval.
Troubleshooting
Document Status Stuck on "Processing"
Issue: Document remains in "Processing" status for an extended time
Solution:
- Wait at least 15 minutes for large files
- Refresh the page to check for a status update
- If the document is stuck for over 30 minutes, delete and re-upload it
- Contact support if the issue persists
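The escalation steps above depend mainly on how long the document has been processing. A small sketch of that mapping, using the 15-minute and 30-minute thresholds from the list; `stuck_processing_action` is a hypothetical helper and the returned strings are only illustrative:

```python
def stuck_processing_action(minutes_elapsed):
    """Map time spent in "Processing" to the next troubleshooting step."""
    if minutes_elapsed < 15:
        return "wait"
    if minutes_elapsed <= 30:
        return "refresh the page and check the status"
    return "delete and re-upload; contact support if the issue persists"


for minutes in (5, 20, 45):
    print(minutes, "->", stuck_processing_action(minutes))
```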
Upload Failed Error
Issue: File upload returns an error
Solution:
- Check file size (must be under 50 MB)
- Verify file format is supported (.pdf, .docx, .csv, .txt)
- Ensure file is not corrupted
- Try uploading files individually instead of in bulk
Documents Not Appearing in Agent Responses
Issue: Uploaded documents are not being used by agents
Solution:
- Verify document status is "Ready"
- Check that agent has the repository configured in Tools section
- Ensure document locks don't restrict access
- Test agent with specific questions related to document content
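The first three checks above can be run together as a simple checklist. The function below is a sketch under stated assumptions. It takes the relevant facts as plain arguments, because how you look them up depends on your setup. It also assumes that a document with no locks is unrestricted and that one shared role is enough to unlock it. `retrieval_blockers` is a hypothetical name.

```python
def retrieval_blockers(status, repo_in_agent_tools, document_locks, user_roles):
    """Return descriptions of the checks that currently fail (empty if none)."""
    blockers = []
    if status != "Ready":
        blockers.append(f'document status is "{status}", not "Ready"')
    if not repo_in_agent_tools:
        blockers.append("repository is not configured in the agent's Tools section")
    # Assumption: no locks means no restriction; any shared role unlocks.
    if document_locks and not set(document_locks) & set(user_roles):
        blockers.append("document locks exclude the user's roles")
    return blockers


for line in retrieval_blockers("Processing", True, ["finance"], ["sales"]):
    print("-", line)
```

If the list comes back empty, the remaining step is to test the agent with questions that target the document's content.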
Related Topics
- RAG Overview - Understanding Retrieval-Augmented Generation
- Vector Search - How documents are searched and retrieved
- Web Crawling - Crawl websites for content
- Agent Builder - Add knowledge tools to your agents
- Document API - Programmatic document management