Wednesday, March 3, 2021

How to prevent web searches using robots.txt

I have a question on how to prevent our development documentation web site from being included in search results.

We've been researching this and found a possible way to do it with a robots.txt file, but it's not entirely clear to us how it actually works.

The best information I found was on the Dummies and robotstxt.org sites, which explain that you can block compliant crawlers from your entire site by adding just these two lines to a robots.txt file and placing that file at the root level of your site:

User-agent: *
Disallow: /

Our dev documentation site is set up so that the wwwroot folder contains all of our development documentation in folders A-P:

[Screenshot: current dev site structure]

If we add a robots.txt with those two lines inside the wwwroot folder, would that prevent search engines from indexing everything in folders A-P?
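As a way to verify that, Python's built-in robots.txt parser can test the rule offline. A minimal sketch, assuming our dev domain docs-dev.OurSite.com (the folder and page names below are made up):

from urllib.robotparser import RobotFileParser

# Parse the proposed rules directly; no network access needed.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])

# A compliant crawler should be refused everywhere on the site,
# including pages under the documentation folders A-P.
print(rp.can_fetch("Googlebot", "https://docs-dev.OurSite.com/A/index.html"))  # False
print(rp.can_fetch("Googlebot", "https://docs-dev.OurSite.com/P/guide.html"))  # False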

Also, at the end of a development cycle, we "switch" this dev site and it becomes our production site, so the domain name changes from "docs-dev.OurSite.com" to "docs.OurSite.com".

Is there a way to "Allow" the production version of the site to be searched with the same robots.txt file? Maybe something like:

User-agent: *
Disallow: /docs-dev.OurSite.com/

I know we could just delete the robots.txt file after the "switch", but I was wondering if coding the robots.txt this way would also do the trick.
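Since Disallow lines match URL paths rather than host names, another idea we've been toying with is serving robots.txt dynamically based on which domain the request comes in on. A rough sketch using Flask (the framework choice and route are our own illustration, not part of our current setup):

from flask import Flask, Response, request

app = Flask(__name__)

BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # dev: shut crawlers out
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # prod: empty Disallow allows everything

@app.route("/robots.txt")
def robots():
    # request.host carries the Host header, e.g. "docs-dev.OurSite.com"
    body = BLOCK_ALL if request.host.startswith("docs-dev.") else ALLOW_ALL
    return Response(body, mimetype="text/plain")

That way nothing would need to be deleted at switch time; the same deployment would answer with the right rules on either domain.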

Thanks.



