Google has quietly updated its documentation to cover NotebookLM, making explicit that the NotebookLM fetcher ignores robots.txt. Here’s how to block it.
Google has quietly updated its list of user-triggered fetchers with new documentation for Google NotebookLM. What makes this small change important is that it now states clearly that Google NotebookLM does not obey robots.txt.
Google NotebookLM
NotebookLM is an AI research and writing tool that lets users enter the URL of a web page. It processes that page’s content and lets users ask a wide range of questions about it. It also generates summaries based on the content.
Google’s tool can also generate an interactive mind map that organizes the topics of a website and extracts key takeaways from it.
User-Triggered Fetchers Ignore Robots.txt
Google’s user-triggered fetchers are web agents that are activated by users rather than by an automated crawl. By design, they ignore robots.txt rules.
According to Google’s user-triggered fetchers documentation:
“Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules.”
Google-NotebookLM Ignores Robots.txt
The purpose of robots.txt is to give publishers control over bots that crawl the web for content. However, agents such as the Google-NotebookLM fetcher are not crawling the web to index content. They act on behalf of users who are interacting with a website’s content through Google NotebookLM.
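robots.txt is advisory. A rule only has an effect when the fetcher chooses to read and honor it. The minimal sketch below uses Python’s standard `urllib.robotparser` to illustrate this. A `Disallow` rule aimed at Google-NotebookLM is respected by a fetcher that checks it. A user-triggered fetcher that never checks is unaffected. The sample robots.txt and both fetch functions are hypothetical illustrations, not Google’s code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tries to shut NotebookLM out of the whole site.
ROBOTS_TXT = """\
User-agent: Google-NotebookLM
Disallow: /
"""

def robots_allows(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

def compliant_crawler_fetch(url: str) -> str:
    # Hypothetical crawler: consults robots.txt before fetching.
    if not robots_allows("Google-NotebookLM", url):
        return "skipped (disallowed by robots.txt)"
    return "fetched"

def user_triggered_fetch(url: str) -> str:
    # Hypothetical user-triggered fetcher: acts on a user's request
    # and never consults robots.txt, so the Disallow rule has no effect.
    return "fetched"

if __name__ == "__main__":
    url = "https://example.com/article"
    print(compliant_crawler_fetch(url))
    print(user_triggered_fetch(url))
```

This is why a robots.txt entry alone cannot keep NotebookLM out. Blocking has to happen on the server, at the point where the request arrives.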
How To Block NotebookLM
Google uses the Google-NotebookLM user agent when it fetches web content. Publishers who want to prevent NotebookLM from accessing their content can therefore create rules that block that specific user agent. A simple approach for WordPress publishers is to use Wordfence to create a custom rule that blocks all visitors using the Google-NotebookLM user agent.
Another way to do this is with a rule in the site’s .htaccess file, such as the following:
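# Return 403 Forbidden to any request whose User-Agent contains "Google-NotebookLM"
<IfModule mod_rewrite.c>
RewriteEngine On
# Match the user agent case-insensitively ([NC])
RewriteCond %{HTTP_USER_AGENT} Google-NotebookLM [NC]
# Forbid every URL ([F]) and stop processing further rewrite rules ([L])
RewriteRule .* - [F,L]
</IfModule>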
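Sites that are not served by Apache can apply the same check in application code. The minimal sketch below is a Python WSGI middleware that mirrors the .htaccess rule. It returns 403 Forbidden when the User-Agent header contains "Google-NotebookLM", compared case-insensitively like the `[NC]` flag. The names `block_notebooklm`, `hello_app` and `call`, and the sample user-agent strings, are hypothetical. The exact header NotebookLM sends is assumed to contain that token, as the .htaccess rule also assumes.

```python
from typing import Callable, Iterable, Tuple
from wsgiref.util import setup_testing_defaults

# Lowercase token so the comparison is case-insensitive, like [NC] in mod_rewrite.
BLOCKED_AGENT = "google-notebooklm"

def block_notebooklm(app: Callable) -> Callable:
    """Wrap a WSGI app and return 403 to requests from the Google-NotebookLM user agent."""
    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENT in user_agent.lower():
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Any other user agent passes through to the wrapped app.
        return app(environ, start_response)
    return middleware

def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Hypothetical site application used only for this demo."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]

application = block_notebooklm(hello_app)

def call(app: Callable, user_agent: str) -> Tuple[str, bytes]:
    """Invoke a WSGI app with the given User-Agent and return (status, body)."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fill in the other required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body

if __name__ == "__main__":
    print(call(application, "Mozilla/5.0 (compatible; Google-NotebookLM)"))
    print(call(application, "Mozilla/5.0"))
```

A substring match is used rather than an exact comparison because user-agent headers often wrap the product token in a longer string. The .htaccess pattern behaves the same way.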