Why AI Assist’s web crawler might not be able to crawl some sites

AI Assist draws on a range of data sources to deliver timely, accurate responses, helping your team support customers effectively across all interactions.

This approach tailors each response to your customers' needs, improving their experience and building trust.

One common way businesses provide AI Assist with data is by allowing it to scrape or crawl their websites.

If AI Assist has trouble accessing your website (this can happen for several reasons, listed at the bottom of this guide), here’s a solution you can try:

Whitelist AI Assist’s crawler bot

To ensure unrestricted access to your pages, whitelist this user-agent:
Tawktobot-AIAssistant/1.0


Learn how in this guide:
How to whitelist tawk.to's AI Assist crawler bot
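Where exactly you whitelist the user-agent depends on your setup (for example, a firewall, CDN, or bot-protection service). Purely to illustrate the idea, the Python sketch below models a hypothetical bot filter that rejects user-agents containing common bot keywords but lets Tawktobot-AIAssistant/1.0 through. The should_block() function and the keyword list are assumptions for illustration, not part of tawk.to or any specific server product.

```python
# Hypothetical sketch: exempt AI Assist's crawler from a generic bot filter.
# should_block() and BLOCKED_AGENT_MARKERS are illustrative only; real
# whitelisting normally lives in your firewall, CDN, or server config.

ALLOWED_AGENT = "Tawktobot-AIAssistant/1.0"

# Substrings a simple bot filter might reject (hypothetical list).
BLOCKED_AGENT_MARKERS = ("bot", "crawler", "spider")


def should_block(user_agent: str) -> bool:
    """Return True if a request with this User-Agent should be rejected."""
    if ALLOWED_AGENT in user_agent:
        # Whitelisted: let AI Assist's crawler through before any other check.
        return False
    ua = user_agent.lower()
    # Without the whitelist above, "Tawktobot" would match "bot" and be blocked.
    return any(marker in ua for marker in BLOCKED_AGENT_MARKERS)


if __name__ == "__main__":
    for ua in ("Tawktobot-AIAssistant/1.0", "SomeOtherBot/2.0", "Mozilla/5.0"):
        print(ua, "->", "blocked" if should_block(ua) else "allowed")
```

Because any client can send any User-Agent header, a user-agent check like this is a convenience rule, not a security control.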

If AI Assist still can’t access your website, try these alternatives:


  • Use CSV files as data sources
    Uploading CSV files that contain product details or key website content gives AI Assist a reliable source of information. Keep these files up to date so AI Assist’s answers stay accurate.

    To learn more about AI Assist’s data sources, check out this guide:
    Understanding AI Assist’s Data Sources

  • Explore API integrations
    If you use third-party platforms for inventory, product information, or order processing, AI Assist can be integrated with them via API. This lets AI Assist retrieve real-time data directly from those platforms, so your customers get the most current information.

    To learn more about setting up API integrations, check out this guide:
    How to set up a custom API integration with Apollo AI
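If your product data changes often, regenerating the CSV from its source makes it easier to keep the uploaded file current. The Python sketch below uses hypothetical column names (name, price, in_stock, url) and sample records; check the data sources guide for the format AI Assist actually expects.

```python
import csv
import io

# Hypothetical product records; in practice these might come from your
# store's database or an export from your e-commerce platform.
products = [
    {"name": "Trail Running Shoe", "price": "89.99", "in_stock": "yes",
     "url": "https://example.com/products/trail-shoe"},
    {"name": "Rain Jacket", "price": "129.00", "in_stock": "no",
     "url": "https://example.com/products/rain-jacket"},
]

# Illustrative column names, not a confirmed AI Assist format.
FIELDS = ["name", "price", "in_stock", "url"]


def products_to_csv(rows, fields=FIELDS):
    """Serialize a list of product dicts to CSV text with a header row."""
    buffer = io.StringIO()
    # Use "\n" line endings so printed output is consistent across platforms.
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(products_to_csv(products), end="")
```

You could redirect the output to a file (for example, `python export_products.py > products.csv`, with a hypothetical script name) and upload that file; rerunning the script whenever your catalog changes keeps the upload in sync.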


Reasons why AI Assist can’t crawl a website

Here are common reasons why AI Assist may not be able to crawl a website:


  • Access restrictions
    AI Assist’s web crawler cannot access pages that require login credentials. Some websites also block crawlers in general through rules in their robots.txt file.


  • Security measures
    Websites using CAPTCHAs or anti-bot measures prevent automated programs (like crawlers) from accessing their content. While these measures effectively deter malicious bots, they may also hinder legitimate tools like AI Assist.
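To check whether a robots.txt file blocks AI Assist’s crawler, Python’s standard urllib.robotparser module can evaluate the rules locally. This sketch assumes the crawler honors robots.txt and matches groups by its product token, Tawktobot-AIAssistant; the sample file and example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks every crawler site-wide, but gives
# AI Assist's crawler its own group that allows all paths.
ROBOTS_TXT = """\
User-agent: Tawktobot-AIAssistant
Allow: /

User-agent: *
Disallow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/products"
    for ua in ("Tawktobot-AIAssistant/1.0", "SomeOtherBot/2.0"):
        print(ua, "->", crawler_allowed(ROBOTS_TXT, ua, url))
```

To test your live file instead, RobotFileParser can download it with set_url() followed by read().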

These are more technical reasons that may hinder AI Assist’s web crawling efforts:

  • Dynamic content
    Websites that load content dynamically with JavaScript can be difficult for crawlers to interpret, since that content may not be present in the HTML the server first sends.

  • Rate limiting and IP blocking
    To protect their servers, websites may limit requests or block IP addresses that send too many requests too quickly.

  • User-agent blocking and CDNs
    Servers that filter by user-agent, and CDNs (Content Delivery Networks), can interfere with crawlers when they see an unrecognized user-agent or an unusual request pattern.

  • Session management and redirection loops
    Websites that depend on session tokens or cookies, or that send visitors through redirection loops, can be hard for crawlers to navigate.

  • Encryption and obfuscation
    Websites using complex encryption and obfuscation techniques may hinder AI Assist’s data extraction.
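One quick way to check for the dynamic-content issue is to see whether key text appears in the HTML your server returns before any JavaScript runs. The Python sketch below assumes that a crawler which does not execute scripts sees only that initial HTML; the sample pages and the text_in_raw_html() helper are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text from raw HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_in_raw_html(html: str, phrase: str) -> bool:
    """Return True if phrase is in the page text without running JavaScript."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return phrase.lower() in " ".join(extractor.chunks).lower()


# Hypothetical pages: one renders the product name on the server,
# the other only inserts it with JavaScript after the page loads.
SERVER_RENDERED = "<html><body><h1>Rain Jacket</h1><p>Price: $129</p></body></html>"
CLIENT_RENDERED = (
    '<html><body><div id="app"></div>'
    '<script>document.getElementById("app").textContent = "Rain Jacket";</script>'
    "</body></html>"
)

if __name__ == "__main__":
    print(text_in_raw_html(SERVER_RENDERED, "Rain Jacket"))
    print(text_in_raw_html(CLIENT_RENDERED, "Rain Jacket"))
```

To try it on your site, save the page’s raw source (for example, with your browser’s "View page source") and pass it in with a phrase you expect AI Assist to find. If the phrase only appears after the page renders, the alternatives above, such as CSV files, may be more reliable.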


Once you understand these potential challenges and apply the solutions above, AI Assist should be able to crawl your website effectively and deliver better support in live chat interactions.


If you’re still having trouble getting AI Assist to crawl your website, schedule a call with us and we’ll help you further.

