The Website Crawler lets you add your website content to the Knowledge Base without copying and pasting manually. Simply provide the URL and configure the crawl depth. Timely.ai uses Firecrawl to extract the content of each page and then generates embeddings automatically.
Screenshot: the indexed sites manager.

Crawl Modes

Scrape: Extracts content from one specific URL. Use for standalone pages such as a FAQ, a pricing page, or a specific blog post. Faster: results are available in seconds.
Crawl: Traverses the site from the root URL, following internal links up to the configured page limit and navigation depth. Use for indexing a full site or documentation section. Slower, since each discovered page is processed in turn.

Indexing a Website

1. Open the Website Crawler: In the agent, go to Knowledge Base > Website Crawler.
2. Enter the URL: Paste the URL of the page or root site. Include https://.
3. Choose the mode: Select Scrape for a single page or Crawl for full site traversal.
4. Configure limits (Crawl only): Set the maximum number of pages (limit) and the maximum navigation depth (max_depth).
5. Start the crawl: Click Start. A crawl job is created and you can track the progress in real time.

Configuration Parameters

Parameter | Mode | Description
url | Both | Entry URL for the crawl
crawl_type | Both | scrape or crawl
limit | Crawl | Maximum number of pages to process
max_depth | Crawl | Maximum link depth to follow from the root
The crawl mode follows only internal links from the same domain. External links are not crawled, preserving the scope of the indexed content.
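The parameters above can be sanity-checked before a crawl job is submitted. A minimal sketch: the parameter names (url, crawl_type, limit, max_depth) come from the table, but the function itself and its specific validation rules are illustrative assumptions based on this page's guidance, not part of Timely.ai's API.

```python
def validate_crawl_config(config: dict) -> list[str]:
    """Return a list of problems with a crawl configuration (empty if valid).

    Parameter names mirror the Configuration Parameters table; the checks
    themselves are illustrative assumptions.
    """
    errors = []
    # The docs say to include https:// in the entry URL.
    url = config.get("url", "")
    if not url.startswith("https://"):
        errors.append("url must include https://")
    # crawl_type accepts exactly two values: scrape or crawl.
    crawl_type = config.get("crawl_type")
    if crawl_type not in ("scrape", "crawl"):
        errors.append("crawl_type must be 'scrape' or 'crawl'")
    # limit and max_depth apply only in crawl mode.
    if crawl_type == "crawl":
        if not isinstance(config.get("limit"), int) or config["limit"] < 1:
            errors.append("limit must be a positive integer")
        if not isinstance(config.get("max_depth"), int) or config["max_depth"] < 1:
            errors.append("max_depth must be a positive integer")
    return errors
```

For example, a scrape job needs only url and crawl_type, while a crawl job should also pass the limit and max_depth checks.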

Crawl Job Status

Status | Description
started | Job started, awaiting response from Firecrawl
crawling | Actively traversing pages
processing_embeddings | Generating vectors for extracted chunks
completed | Indexing completed successfully
failed | Error during crawling or processing
Progress is updated in real time on the panel: pages crawled / total pages.
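If you mirror these statuses in your own tooling, they reduce to a short label like the one shown on the panel. A small sketch: the status names come from the table above, but the function and its label text are illustrative assumptions.

```python
def progress_label(status: str, pages_crawled: int = 0, total_pages: int = 0) -> str:
    """Map a crawl job status to a short panel-style label.

    Status names mirror the Crawl Job Status table; the label wording
    is an illustrative assumption.
    """
    if status in ("started", "crawling"):
        # While the job is running, the panel shows pages crawled / total pages.
        return f"{pages_crawled} / {total_pages} pages crawled"
    if status == "processing_embeddings":
        return "Generating embeddings..."
    if status == "completed":
        return "Indexing complete"
    if status == "failed":
        return "Crawl failed"
    raise ValueError(f"unknown status: {status}")
```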

Viewing Indexed Pages

After completion, each crawled page appears as an individual item in the list. For each page you see:
  • URL and title
  • Preview of extracted content
  • Number of chunks generated
  • Quality score (when available)
  • Indexing date
Click the preview icon to read the full content extracted from each page.
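If you export the indexed-page list for review, the quality score can be used to flag pages worth inspecting or removing. A sketch under stated assumptions: the field names echo the bullet list above, but the dict layout, the quality_score key, and the threshold are all hypothetical.

```python
def pages_needing_review(pages: list[dict], min_quality: float = 0.5) -> list[str]:
    """Return URLs of pages whose quality score is missing or below min_quality.

    The dict shape and the 0.5 threshold are illustrative assumptions;
    only the field concepts (URL, quality score) come from the docs.
    """
    flagged = []
    for page in pages:
        score = page.get("quality_score")
        # Pages with no score, or a low one (e.g. login or error pages),
        # contribute little to the agent and deserve a manual look.
        if score is None or score < min_quality:
            flagged.append(page["url"])
    return flagged
```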

Extracted Content Quality

Firecrawl extracts the main text of the page, discarding navigation, footers, and scripts. Pages with little textual content (e.g., login pages, error pages) may have a low quality score and contribute little to the agent.
Sites with bot protection (CAPTCHA, Cloudflare with challenge) may fail during crawling. In that case, use scrape mode for individual URLs with static content, or add the content manually via document or Q&A.
Re-index the site whenever content changes significantly. Delete the site from the list and add it again to ensure the agent uses the most up-to-date version — Timely.ai does not perform automatic re-crawling at this time.