Configuring a web proxy robots.txt file
Creating and configuring a robots.txt file is an essential step in setting up a web proxy, and not one you want to skip. Without it, search engine spiders can crawl every page fetched through your proxy, draining your server resources and wasting your bandwidth. It also opens you up to a host of other problems, including duplicate content, copyright infringement and the potential for your web site to be flagged as malware.
The primary purpose of robots.txt is to exclude content from being crawled by robots, typically the ones search engines use to index content. It has since been extended with directives that were not part of the original standard, such as Sitemap, Crawl-delay and Allow. For further information, take a look at the Web Robots Pages or read the following quote from Wikipedia:
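To see how a cooperating crawler interprets these directives, the sketch below feeds a hypothetical robots.txt to Python's standard-library urllib.robotparser (Python 3.8 or later is needed for site_maps()). The /proxy/ paths, the ExampleBot user agent and the sitemap URL are made up for illustration. Crawlers can differ in how they resolve overlapping rules: Python applies the first rule that matches, while Google picks the most specific one, so listing the narrower Allow line first gives the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the original Disallow rule plus the later
# extensions (Allow, Crawl-delay and Sitemap).
ROBOTS_TXT = """\
User-agent: *
Allow: /proxy/about.html
Disallow: /proxy/
Crawl-delay: 10
Sitemap: http://www.yourdomain.com/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    base = "http://www.yourdomain.com"
    # Python checks rules in file order and stops at the first match,
    # so the Allow line must come before the broader Disallow line.
    print(rp.can_fetch("ExampleBot", base + "/proxy/about.html"))  # True
    print(rp.can_fetch("ExampleBot", base + "/proxy/page.php"))    # False
    print(rp.can_fetch("ExampleBot", base + "/index.html"))        # True
    print(rp.crawl_delay("ExampleBot"))                            # 10
    print(rp.site_maps())  # ['http://www.yourdomain.com/sitemap.xml']
```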
The robot exclusion standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention to prevent cooperating web spiders and other web robots from accessing all or part of a website which is otherwise publicly viewable. Robots are often used by search engines to categorize and archive web sites, or by webmasters to proofread source code. The standard complements Sitemaps, a robot inclusion standard for websites.
Source: Robots Exclusion Standard
To create a robots.txt file for your web proxy, open a text editor such as Notepad, paste in the snippet below that corresponds to your proxy software, and save it as a plain text file named robots.txt. Upload the file to the root of your hosting account; you should then be able to view it in your web browser at a URL of this form: http://www.yourdomain.com/robots.txt
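If the file is displayed in your browser, you are all set. If not, make sure you uploaded it to the correct location. Crawlers only request robots.txt from the root of your domain, which is usually the same directory as your proxy script; if the script lives in a subdirectory, keep robots.txt at the root and add the subdirectory to the start of each Disallow path below.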
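If you would rather check from a script than a browser, the sketch below derives the one location crawlers look for robots.txt (the root of the host) from any page URL and requests it. Both function names are hypothetical helpers, not part of any proxy package, and the check simply treats any response other than HTTP 200 as a failure.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """Return the robots.txt location crawlers use: the root of the page's host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_txt_is_served(page_url: str, timeout: float = 10.0) -> bool:
    """Fetch robots.txt for the page's host and report whether it returned HTTP 200."""
    try:
        with urlopen(robots_url(page_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # HTTPError (e.g. 404 Not Found), URLError and socket timeouts
        # are all subclasses of OSError.
        return False


if __name__ == "__main__":
    # A proxy installed in a subdirectory still needs robots.txt at the root.
    print(robots_url("http://www.yourdomain.com/proxy/index.php"))
    # -> http://www.yourdomain.com/robots.txt
```

After uploading, calling robots_txt_is_served("http://www.yourdomain.com/") should return True; False means the file is missing, is not at the root, or the server could not be reached. Note that a host which serves a custom error page with status 200 would still pass this simple check.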
CGIProxy
User-agent: *
Disallow: /nph-proxy.pl/
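The sketch below checks the CGIProxy rule with Python's urllib.robotparser. It assumes nph-proxy.pl sits at the web root; the ExampleBot user agent and the path after the script name are hypothetical. Because robots.txt rules match by prefix, the trailing slash means only URLs below the script are blocked, while the script's own URL stays crawlable.

```python
from urllib.robotparser import RobotFileParser

# The CGIProxy rules, assuming the script sits at the web root.
CGIPROXY_ROBOTS = """\
User-agent: *
Disallow: /nph-proxy.pl/
"""

parser = RobotFileParser()
parser.parse(CGIPROXY_ROBOTS.splitlines())


def is_blocked(path: str) -> bool:
    """True if a generic crawler should skip this path on the proxy host."""
    return not parser.can_fetch("ExampleBot", "http://www.yourdomain.com" + path)


if __name__ == "__main__":
    # Hypothetical proxied URL: everything after the script name belongs to CGIProxy.
    print(is_blocked("/nph-proxy.pl/some/proxied/page"))  # True
    # The rule ends in "/", so the bare script URL is not matched.
    print(is_blocked("/nph-proxy.pl"))                    # False
    print(is_blocked("/index.html"))                      # False
```

If you also want the bare script URL excluded, dropping the trailing slash (Disallow: /nph-proxy.pl) would match both forms.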
Glype
Versions of Glype 0.4 and above come with the robots.txt file already included.
User-agent: *
Disallow: /browse.php
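Proxied pages in Glype go through browse.php with the target encoded in the query string, and a Disallow path is matched as a prefix of the full path plus query. The sketch below confirms this with Python's urllib.robotparser; the query string shown and the ExampleBot user agent are hypothetical, and the check assumes browse.php sits at the web root.

```python
from urllib.robotparser import RobotFileParser

# The Glype rules, assuming browse.php sits at the web root.
GLYPE_ROBOTS = """\
User-agent: *
Disallow: /browse.php
"""

parser = RobotFileParser()
parser.parse(GLYPE_ROBOTS.splitlines())

SITE = "http://www.yourdomain.com"

if __name__ == "__main__":
    # Hypothetical proxied page: the query string does not stop the prefix match.
    print(parser.can_fetch("ExampleBot", SITE + "/browse.php?u=encoded-url"))  # False
    # The proxy's front page stays crawlable.
    print(parser.can_fetch("ExampleBot", SITE + "/index.php"))  # True
```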
PHProxy
User-agent: *
Disallow: /index.php?q
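Whether a trailing * on this rule matters depends on the crawler. Google and RFC 9309 treat * as a wildcard, but because rules already match by prefix, /index.php?q* and /index.php?q block the same URLs for those crawlers. Parsers that follow only the original standard read * as a literal character; Python's urllib.robotparser is one of them, and the sketch below uses it to compare the two forms. The proxied URL and the ExampleBot user agent are hypothetical, and the check assumes PHProxy's index.php sits at the web root.

```python
from urllib.robotparser import RobotFileParser


def parser_for(disallow_path: str) -> RobotFileParser:
    """Build a parser for a one-rule robots.txt that blocks disallow_path."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + disallow_path])
    return parser


SITE = "http://www.yourdomain.com"
PROXIED = SITE + "/index.php?q=encoded-url"  # hypothetical proxied URL

if __name__ == "__main__":
    # This parser does not expand wildcards, so "*" is taken literally
    # and the rule fails to match the proxied URL.
    print(parser_for("/index.php?q*").can_fetch("ExampleBot", PROXIED))  # True
    # The plain prefix form blocks it.
    print(parser_for("/index.php?q").can_fetch("ExampleBot", PROXIED))   # False
    # The proxy's front page is not blocked by the prefix rule.
    print(parser_for("/index.php?q").can_fetch("ExampleBot", SITE + "/index.php"))  # True
```

The plain prefix form is the safer choice since it behaves the same with or without wildcard support.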

