Why AI Assist’s web crawler might not be able to crawl some sites

AI Assist’s web crawler scans your public website to learn from your content and answer customer questions more effectively. If the crawler can’t access your site, AI Assist may miss important details or give incomplete answers.


This guide explains the most common reasons for crawler issues, how to fix them, and alternative solutions.

Quick fix: Whitelist the crawler

In many cases, access problems are caused by firewall or security settings that block automated traffic.


To allow crawling:

  1. Whitelist this user-agent: Tawktobot-AIAssistant/1.0

  2. If you use a CDN or WAF (for example, Cloudflare), add a rule that skips or allows this user-agent.

  3. Wait 24-48 hours for the crawler to re-scan your site.


Learn more with this guide:

How to whitelist tawk.to's AI Assist crawler bot
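
To confirm whether a firewall or WAF treats the crawler differently after you apply the steps above, you can request the same public page twice, once with the crawler's user-agent and once with a browser-like one, and compare the status codes. The following is a minimal sketch, not an official tawk.to tool: the function names, the browser user-agent string, and the reading of the status codes are assumptions. A request sent from your own machine also won't reproduce rules that depend on the crawler's IP address.

```python
import urllib.error
import urllib.request

CRAWLER_UA = "Tawktobot-AIAssistant/1.0"  # user-agent named in this guide
BROWSER_UA = "Mozilla/5.0"  # assumption: generic browser-like string for comparison


def build_request(url, user_agent):
    """Build a GET request that identifies itself with the given user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this user-agent."""
    try:
        request = build_request(url, user_agent)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises 4xx/5xx responses as exceptions


def interpret(crawler_status, browser_status):
    """Rough, hypothetical reading of the two status codes."""
    if crawler_status == browser_status:
        return "no user-agent based difference detected"
    if crawler_status in (401, 403, 429, 503):
        return "crawler user-agent appears to be blocked or challenged"
    return "responses differ; check firewall or WAF rules"
```

For a public page on your site, `interpret(fetch_status(url, CRAWLER_UA), fetch_status(url, BROWSER_UA))` gives a quick first indication. A matching result doesn't guarantee that the real crawler gets through.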

Common reasons AI Assist can’t crawl your site

Here are the most frequent causes, with suggested solutions:


  • Login-required or gated content
    Crawlers cannot access pages that require a username and password. Provide public versions of key pages, upload documents (CSV/PDF/TXT), or connect an API for private data.

  • robots.txt or explicit crawler blocking
    If your robots.txt file or server rules block crawlers, update them to allow the Tawktobot-AIAssistant/1.0 user-agent.

  • CAPTCHAs or anti-bot protections
    Security tools may block automated crawlers. Exempt the crawler’s user-agent where possible.

  • Heavy client-side rendering (JavaScript/SPA)
    The crawler reads server-rendered HTML. It does not see content that JavaScript loads after the page opens in a browser. Use server-side rendering, pre-rendered snapshots, or upload documents instead.

  • Rate limiting or IP blocking
    Some hosts and CDNs throttle or block clients that send many requests in a short time. Exempt the crawler's user-agent from those limits, or use alternative data sources.

  • Session, cookie, or redirect loops
    The crawler can’t handle session-based access or infinite redirects. Provide a simple, static version of important pages.

  • Content obfuscation or encryption
    The crawler can’t read hidden or encrypted text. Offer plain text alternatives or supply the data via API.
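
For the robots.txt case in the list above, Python's standard `urllib.robotparser` module can show whether a given set of rules would admit the crawler. This is a sketch under assumptions: the two robots.txt samples are made up for illustration, and it assumes the crawler is matched on the product token `Tawktobot-AIAssistant` without the version suffix, which is how Python's parser compares user-agents. How the crawler itself interprets robots.txt is not stated in this guide.

```python
from urllib.robotparser import RobotFileParser

CRAWLER_UA = "Tawktobot-AIAssistant/1.0"  # user-agent named in this guide


def crawler_allowed(robots_txt, url, user_agent=CRAWLER_UA):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules that block every crawler, including this one.
BLOCKING = """\
User-agent: *
Disallow: /
"""

# Illustrative rules that admit the AI Assist crawler but block the rest.
ALLOWING = """\
User-agent: Tawktobot-AIAssistant
Allow: /

User-agent: *
Disallow: /
"""

print(crawler_allowed(BLOCKING, "https://example.com/pricing"))  # False
print(crawler_allowed(ALLOWING, "https://example.com/pricing"))  # True
```

To test your own rules, pass the contents of your site's robots.txt file as the first argument.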

Alternatives if whitelisting isn’t possible

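If you can't whitelist the crawler, you can still give AI Assist the content it needs:

  • Upload documents (CSV/PDF/TXT) that contain the key information.

  • Connect an API for private or frequently changing data.

  • Provide public, static versions of important pages.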


Troubleshooting steps

  1. Preview AI Assist responses in your property’s AI Assist settings. Check the data source list to see what content was used. Learn more with this guide: How to fix incorrect responses delivered by AI Assist

  2. Check server/firewall logs. Look for requests from the Tawktobot-AIAssistant/1.0 user-agent and check whether they were served or blocked.

  3. Update data sources. Remove outdated files or pages and replace them with current content.

  4. Use alternative sources. Upload documents or connect APIs if certain content is still inaccessible.
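
The log check in step 2 can be scripted. The sketch below tallies HTTP status codes for log lines that mention the crawler. It assumes your server writes the common "combined" access log format, where the status code follows the quoted request line. The function name and the sample lines are hypothetical, and the IP addresses come from documentation-only ranges.

```python
from collections import Counter

CRAWLER_TOKEN = "Tawktobot-AIAssistant"  # product token from the user-agent in this guide


def crawler_status_counts(log_lines):
    """Count HTTP status codes for log lines that mention the crawler."""
    counts = Counter()
    for line in log_lines:
        if CRAWLER_TOKEN.lower() not in line.lower():
            continue
        # Combined format: ... "REQUEST" STATUS SIZE "REFERER" "USER-AGENT"
        parts = line.split('"')
        if len(parts) < 3:
            continue
        fields = parts[2].split()
        if fields and fields[0].isdigit():
            counts[fields[0]] += 1
    return counts


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Tawktobot-AIAssistant/1.0"',
    '203.0.113.7 - - [10/Jan/2025:10:00:05 +0000] "GET /pricing HTTP/1.1" 403 312 "-" "Tawktobot-AIAssistant/1.0"',
    '198.51.100.4 - - [10/Jan/2025:10:00:09 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

print(dict(crawler_status_counts(SAMPLE_LOG)))  # {'200': 1, '403': 1}
```

A high share of 403, 429, or 503 responses for the crawler suggests that a firewall, rate limit, or anti-bot rule is still blocking it. If the crawler doesn't appear in the logs at all, the requests may be stopped earlier, for example at a CDN.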


Best practices

  • Keep key information accessible
    Place important information in server-rendered pages or uploaded documents so AI Assist can access it reliably.

  • Maintain accurate data sources
    Review your data sources regularly and replace outdated information with current, accurate content.

  • Use APIs for specific data types
    Connect an API for real-time, private, or frequently changing data to ensure AI Assist always has the latest information.

  • Allow the crawler on public pages
    Ensure the crawler’s user-agent remains unblocked for public pages so AI Assist can read and process the content.

Additional considerations

  • Privacy and security
    AI Assist can only work with the information it can access. Keep sensitive or private details protected, and use a secure API with authentication for any data you don’t want made public.

  • Text-only ingestion
    The crawler can read only the text it sees on a page. It can’t process images, videos, or interactive elements, so make sure important information is written out in plain text.

  • Dynamic sites
    If your website uses a lot of JavaScript to load content, the crawler may not see it. Use server-side rendering, pre-rendered pages, or upload your content through other methods so AI Assist can access it.
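
To estimate whether important text is present in the server-rendered HTML rather than injected by JavaScript, you can extract the visible text from the raw page source and search it for a key phrase. The sketch below uses Python's standard `html.parser`. The class and function names and the two sample pages are hypothetical, and it is an assumption that this approximates what the crawler sees, since the crawler's actual parsing is not documented here.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a non-JavaScript reader would find in the raw HTML."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up pages: one server-rendered, one that fills an empty container via JavaScript.
SERVER_RENDERED = "<html><body><h1>Refunds</h1><p>Refunds are available within 30 days.</p></body></html>"
CLIENT_RENDERED = '<html><body><div id="root"></div><script>render("Refunds are available within 30 days.")</script></body></html>'

print("within 30 days" in visible_text(SERVER_RENDERED))  # True
print("within 30 days" in visible_text(CLIENT_RENDERED))  # False
```

Run `visible_text` on the raw source of your own page (for example, what "View source" shows, not the browser's rendered DOM). If the key phrase is missing, that page is a candidate for server-side rendering, pre-rendering, or a document upload.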


If you’re still having trouble getting AI Assist to crawl your website, schedule a call with us and we’ll help you further.

