Why AI Assist’s web crawler might not be able to crawl some sites

AI Assist uses a wide range of data sources to deliver timely, accurate responses, helping your team support customers effectively across all interactions.

This smart approach ensures each response is tailored to your customers' needs, enhancing their experience and building trust.

One common way businesses provide AI Assist with data is by allowing it to scrape or crawl their websites.

AI Assist may have trouble accessing your website for several reasons, which are listed at the bottom of this guide. If that happens, here’s a solution you can try:

Whitelist AI Assist’s crawler bot

To ensure unrestricted access to your pages, whitelist this user-agent:
Tawktobot-AIAssistant/1.0

Learn how in this guide:
How to whitelist tawk.to's AI Assist crawler bot
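
Before or after changing your whitelist, it can help to see how your server answers a request that carries this user-agent. The following is a minimal sketch in Python, not tawk.to's own tooling. The function names `build_probe` and `probe_status` are hypothetical, and the probe only imitates the user-agent string: it runs from your own IP address, so a successful response does not guarantee that the real crawler will get through.

```python
import urllib.error
import urllib.request

# User-agent string taken from this guide.
AI_ASSIST_UA = "Tawktobot-AIAssistant/1.0"


def build_probe(url: str, user_agent: str = AI_ASSIST_UA) -> urllib.request.Request:
    """Build a GET request that identifies itself with the given user-agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="GET"
    )


def probe_status(url: str, timeout: float = 10.0) -> int:
    """Send the probe and return the HTTP status code the server answers with."""
    try:
        with urllib.request.urlopen(build_probe(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx answers are raised as exceptions; the code is what we want.
        return err.code
```

Calling `probe_status("https://www.example.com/")` with your own URL returns the status code. A 403 or 429 for this user-agent, where a normal browser gets a 200, suggests that a firewall or bot rule is filtering the crawler.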

If AI Assist still can’t access your website, try these alternatives:

CSV files as data sources
Uploading CSV files containing product details or key website content provides AI Assist with a reliable source of information. Remember to keep these files updated.
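
If your product details live in another system, a short script can regenerate the CSV whenever the data changes. This is a sketch only: the column names (`name`, `price`, `description`) and the sample row are assumptions made for illustration, not a format required by AI Assist, so check the data sources guide for what the upload expects.

```python
import csv
import io

# Hypothetical columns and sample data; replace with your own catalogue.
FIELDNAMES = ["name", "price", "description"]
PRODUCTS = [
    {
        "name": "Trail Backpack 30L",
        "price": "89.00",
        "description": "Water-resistant, 30 litre pack",
    },
]


def products_to_csv(rows: list[dict]) -> str:
    """Serialize product rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas are quoted automatically
    return buffer.getvalue()


if __name__ == "__main__":
    # newline="" lets the csv module control line endings in the file.
    with open("products.csv", "w", newline="", encoding="utf-8") as handle:
        handle.write(products_to_csv(PRODUCTS))
```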

To learn more about AI Assist’s data sources, check out this guide:
Understanding AI Assist’s Data Sources

Explore API integrations
If you use third-party platforms for inventory, product information, or order processing, you can connect AI Assist to them via API. AI Assist can then retrieve real-time data directly from those platforms, so your customers get current, accurate information.

To learn more about setting up API integrations, check out this guide:
How to set up a custom API integration with Apollo AI

The following reasons may be behind AI Assist’s challenges in crawling your website.

Reasons why AI Assist can’t crawl a website

Here are common reasons why AI Assist may not be able to crawl a website:

Access restrictions
AI Assist’s web crawler cannot access websites that require login credentials. Some websites also block crawlers in general through rules in a robots.txt file.
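
To check whether your robots.txt rules would turn the crawler away, you can test them locally. The sketch below uses Python's standard `urllib.robotparser`. The sample robots.txt and the helper name `is_allowed` are hypothetical, and the result reflects how Python's parser reads the rules. It is an assumption that AI Assist's crawler matches rules the same way.

```python
from urllib.robotparser import RobotFileParser

# User-agent string taken from this guide.
AI_ASSIST_UA = "Tawktobot-AIAssistant/1.0"

# Hypothetical robots.txt: blocks crawlers by default, allows AI Assist.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Tawktobot-AIAssistant
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/products"
    print(is_allowed(SAMPLE_ROBOTS_TXT, AI_ASSIST_UA, page))
    print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot/2.0", page))
```

Paste the contents of your own robots.txt in place of the sample to see which pages the rules leave open.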

Security measures
Websites using CAPTCHAs or anti-bot measures prevent automated programs (like crawlers) from accessing their content. While these measures effectively deter malicious bots, they may also hinder legitimate tools like AI Assist.

These are more technical reasons that may hinder AI Assist’s web crawling efforts:

Dynamic content
Websites that rely on JavaScript for dynamic content loading can be difficult for crawlers to interpret.

Rate limiting and IP blocking
To protect their servers, websites may limit requests or block IP addresses that send too many requests too quickly.

User-agent blocking and CDNs
Websites or their CDNs (Content Delivery Networks) may block requests that come from unrecognized user-agents or follow unusual request patterns.

Session management and redirection loops
Websites that depend on session tokens or cookies, or that contain redirection loops, can present obstacles for crawlers.

Encryption and obfuscation
Websites using complex encryption and obfuscation techniques may hinder AI Assist’s data extraction.
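
When you troubleshoot, the response your server sends to a crawler often hints at which of these causes applies. The sketch below maps one HTTP response to a likely cause. It is a rough heuristic written for illustration: the function name `diagnose`, the returned labels, and the 200-character threshold are all assumptions, and it does not describe how AI Assist itself classifies failures.

```python
from html.parser import HTMLParser
from typing import Mapping


class _VisibleText(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a non-JavaScript crawler would see in the raw HTML."""
    parser = _VisibleText()
    parser.feed(html)
    return " ".join(parser.chunks)


def diagnose(status: int, headers: Mapping[str, str], body: str) -> str:
    """Guess which obstacle a single HTTP response points to."""
    names = {name.lower() for name in headers}
    if status == 429 or (status == 503 and "retry-after" in names):
        return "rate limiting"
    if status == 401:
        return "login required"
    if status == 403:
        return "blocked (user-agent, IP, or CDN rule)"
    if 300 <= status < 400:
        return "redirect (check for loops)"
    if "captcha" in body.lower():
        return "CAPTCHA or anti-bot challenge"
    # Hypothetical threshold: very little text outside scripts suggests the
    # page builds its content in the browser.
    if "<script" in body.lower() and len(visible_text(body)) < 200:
        return "content probably rendered by JavaScript"
    return "no obvious obstacle"
```

For example, a page whose raw HTML holds only an empty container and a script tag is reported as JavaScript-rendered, which usually means the text your customers see never reaches a crawler that does not run scripts.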

Understanding these potential challenges and following the solutions above can help AI Assist crawl your website and deliver better support in live chat interactions.

If you’re still having trouble getting AI Assist to crawl your website, schedule a call with us and we’ll help you further.
