Why AI Assist’s web crawler might not be able to crawl some sites

AI Assist’s web crawler scans your public website to learn from your content and answer customer questions more effectively. If the crawler can’t access your site, AI Assist may miss important details or give incomplete answers.

This guide explains the most common reasons for crawler issues, how to fix them, and alternative solutions.

Quick fix: Whitelist the crawler

In many cases, access problems are caused by firewall or security settings that block automated traffic.

To allow crawling:

Whitelist this user-agent: Tawktobot-AIAssistant/1.0
If you use a CDN or WAF (for example, Cloudflare), add a rule that skips or allows this user-agent.
Wait 24-48 hours for the crawler to re-scan your site.

Learn more with this guide:

How to whitelist tawk.to's AI Assist crawler bot
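After you add the allow rule, you can run a rough check from your own machine by requesting a page with the crawler's user-agent. The sketch below is only an approximation: the helper names (`build_crawler_request`, `classify_status`, `probe`) and the status categories are illustrative assumptions, not part of tawk.to. A request from your computer also comes from a different IP address than the real crawler, so it only exercises rules that match on the user-agent.

```python
import urllib.error
import urllib.request

# User-agent string given in this article.
CRAWLER_USER_AGENT = "Tawktobot-AIAssistant/1.0"


def build_crawler_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the crawler's user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": CRAWLER_USER_AGENT})


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough diagnosis (illustrative categories)."""
    if 200 <= status < 300:
        return "reachable"
    if status in (401, 403):
        return "blocked or login required"
    if status == 429:
        return "rate limited"
    return "other"


def probe(url: str, timeout: float = 10.0) -> str:
    """Fetch the URL with the crawler's user-agent and describe the result."""
    try:
        with urllib.request.urlopen(build_crawler_request(url), timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still useful.
        return classify_status(err.code)
```

For example, `probe("https://www.example.com/")` (replace the placeholder address with one of your public pages) returns one of the labels above. A result of "blocked or login required" suggests a firewall or WAF rule is still rejecting the user-agent.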

Common reasons AI Assist can’t crawl your site

Here are the most frequent causes, with suggested solutions:

Site requires login or gated content
Crawlers cannot access pages that require a username and password. Provide public versions of key pages, upload documents (CSV/PDF/TXT), or connect an API for private data.

robots.txt or explicit crawler blocking
If your robots.txt file or server rules block crawlers, update them to allow the Tawktobot-AIAssistant/1.0 user-agent.
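To see whether a robots.txt file would turn the crawler away, you can test its rules with Python's standard `urllib.robotparser` module. This is a minimal sketch: the robots.txt content and the `is_allowed` helper are hypothetical, and Python's matching rules may differ in detail from how the AI Assist crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent string given in this article.
CRAWLER_USER_AGENT = "Tawktobot-AIAssistant/1.0"

# Hypothetical robots.txt: block every crawler except the AI Assist crawler.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Tawktobot-AIAssistant
Allow: /
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = CRAWLER_USER_AGENT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the example rules, `is_allowed(EXAMPLE_ROBOTS_TXT, "https://www.example.com/pricing")` is true for the AI Assist user-agent and false for any other bot. To test your own file, pass its text in place of `EXAMPLE_ROBOTS_TXT`.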

CAPTCHAs or anti-bot protections
Security tools may block automated crawlers. Exempt the crawler’s user-agent where possible.

Heavy client-side rendering (JavaScript/SPA)
The crawler reads the server-rendered HTML, not content that JavaScript adds after the page loads in a browser. Use server-side rendering or pre-rendered snapshots, or upload documents instead.
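One way to estimate what a crawler without JavaScript can read is to extract the text from a page's raw HTML, for example the output of "View page source". The sketch below uses Python's standard `html.parser`. The `VisibleTextExtractor` class and `visible_text` helper are illustrative names, and the real crawler's text extraction may work differently.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the readable text found in the raw HTML, joined by spaces."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)
```

If `visible_text` returns little more than the page title for a page that looks full of content in your browser, that content is probably loaded by JavaScript and is a good candidate for server-side rendering or a document upload.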

Rate limiting or IP blocking
Some hosts and CDNs limit repeated requests from the same client. Exempt the crawler from these limits, or use alternative data sources such as uploaded documents or an API integration.

Session, cookie, or redirect loops
The crawler can’t handle session-based access or infinite redirects. Provide a simple, static version of important pages.
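To check whether a page sends clients without cookies into a redirect loop, you can follow its redirects one hop at a time and stop when a URL repeats. This is a sketch under assumptions: `http_next_hop` and `trace_redirects` are hypothetical helper names, and the real crawler's redirect handling isn't documented here. The helper keeps no cookies between requests, which is the situation where session-based loops usually appear.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# User-agent string given in this article.
CRAWLER_USER_AGENT = "Tawktobot-AIAssistant/1.0"
REDIRECT_CODES = (301, 302, 303, 307, 308)


def http_next_hop(url):
    """Return the URL this page redirects to, or None if it does not redirect."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # No cookies are sent, so session-dependent redirects show up here.
        conn.request("HEAD", path, headers={"User-Agent": CRAWLER_USER_AGENT})
        response = conn.getresponse()
        location = response.getheader("Location")
        if response.status in REDIRECT_CODES and location:
            return urljoin(url, location)
        return None
    finally:
        conn.close()


def trace_redirects(start_url, next_hop=http_next_hop, max_hops=10):
    """Follow redirects and return (chain_of_urls, loop_detected)."""
    chain = [start_url]
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):
        target = next_hop(current)
        if target is None:
            return chain, False
        chain.append(target)
        if target in seen:
            return chain, True
        seen.add(target)
        current = target
    # Hop limit reached without settling: treat it as a likely loop.
    return chain, True
```

Calling `trace_redirects("https://www.example.com/account")` (a placeholder address) returns the chain of URLs visited and whether a loop was found. A chain that bounces between a page and a login or consent URL points to session-based access.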

Content obfuscation or encryption
The crawler can’t read hidden or encrypted text. Offer plain text alternatives or supply the data via API.

Alternatives if whitelisting isn’t possible

If you can’t whitelist the crawler:

Upload documents
Add CSV, PDF, or TXT files containing your product information, FAQs, or policy details.

Learn more with this guide: Understanding AI Assist’s Data Sources

Use API integrations
Give AI Assist direct access to real-time data, such as order tracking and inventory.

Learn more with this guide: How to set up a custom API integration with AI Assist

Troubleshooting steps

Preview AI Assist responses in your property’s AI Assist settings. Check the data source list to see what content was used. Learn more with this guide: How to fix incorrect responses delivered by AI Assist
Check server/firewall logs. Look for requests from the Tawktobot-AIAssistant/1.0 user-agent.
Update data sources. Remove outdated files or pages and replace them with current content.
Use alternative sources. Upload documents or connect APIs if certain content is still inaccessible.
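For the log check above, a short script can count the response codes your server sent to the crawler. This sketch assumes your access log uses the common "combined" format, with the user-agent as the last quoted field. The `crawler_status_counts` helper and the regular expression are illustrative, so adjust them to your server's log layout.

```python
import re
from collections import Counter
from typing import Iterable

# Product token from the user-agent given in this article.
CRAWLER_TOKEN = "Tawktobot-AIAssistant"

# Assumed layout: ... "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"\s*$'
)


def crawler_status_counts(lines: Iterable[str]) -> Counter:
    """Count HTTP status codes for requests made with the crawler's user-agent."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and CRAWLER_TOKEN.lower() in match.group("agent").lower():
            counts[int(match.group("status"))] += 1
    return counts
```

You can pass an open log file directly, for example `crawler_status_counts(open("access.log"))`, where `access.log` stands in for your own log path. Mostly 200 responses suggest the crawler is getting through. Frequent 403 or 429 responses point to a firewall block or rate limit, and an empty result means no requests with this user-agent were logged.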

Best practices

Keep key information accessible
Place important information in server-rendered pages or uploaded documents so AI Assist can access it reliably.

Maintain accurate data sources
Review your data sources regularly and replace outdated information with current, accurate content.

Use APIs for specific data types
Connect an API for real-time, private, or frequently changing data to ensure AI Assist always has the latest information.

Allow the crawler on public pages
Ensure the crawler’s user-agent remains unblocked for public pages so AI Assist can read and process the content.

Additional considerations

Privacy and security
AI Assist can only work with the information it can access. Keep sensitive or private details protected, and use a secure API with authentication for any data you don’t want made public.

Text-only ingestion
The crawler can read only the text it sees on a page. It can’t process images, videos, or interactive elements, so make sure important information is written out in plain text.

Dynamic sites
If your website uses a lot of JavaScript to load content, the crawler may not see it. Use server-side rendering, pre-rendered pages, or upload your content through other methods so AI Assist can access it.

If you’re still having trouble getting AI Assist to crawl your website, schedule a call with us and we’ll help you further.
