robots.txt question

whiterabit

Regular Member
Hi all,

I have come across a potential problem and I wanted to check my thinking against the opinions of others.

I currently have one of my websites hosted with a third-party developer, and they have set my robots.txt file to:

User-agent: *
Disallow: /

My understanding is that this prevents all robots from crawling my site, yeah? If so, that would be a reasonable explanation for why my Google Webmaster Tools account is showing crawl errors.

Are there any benefits to this or am I potentially hamstringing my website and SEO efforts?

Long story short, if I want to be indexed and ranked accordingly should "User-agent: * Disallow:/" stay or go?

I think go...
 

petermeadit

Top Contributor
... Are there any benefits to this or am I potentially hamstringing my website and SEO efforts?

Long story short, if I want to be indexed and ranked accordingly should "User-agent: * Disallow:/" stay or go?

I think go...

Hi Whiterabit, if you want to ask search engines not to crawl any page on your website, then use only these lines in your robots.txt file. It is up to the search engines whether they obey, but the major ones usually do.


User-agent: *
Disallow: /
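
If you want to see for yourself what those two lines do, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URLs are just placeholders I made up for illustration, not anything from your site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two lines from the robots.txt under discussion.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Placeholder URLs: every path on the site is refused for every crawler.
print(is_allowed(BLOCK_ALL, "Googlebot", "http://www.example.com/"))
print(is_allowed(BLOCK_ALL, "Bingbot", "http://www.example.com/blog/post.html"))
```

Both checks come back False, which is consistent with a crawler reporting that it is blocked from the whole site.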

My bet is you are going to want search engines to index the pages on your site, perhaps with the help of a sitemap.xml file, so yeah, I suggest changing your robots.txt file.

Depending on what kind of site you have, you will still want to stop search engines from crawling certain resources such as admin, temp or cache directories.

If you have a WordPress website, there are some pretty good out-of-the-box examples you can use. Some people go crazy and tweak the heck out of their robots.txt file to get more SEO juice out of it. Of course, this can also work against you if you get it wrong.

Here is a bit more reading for you: http://yoast.com/example-robots-txt-wordpress/

Below is a sample of a good WordPress robots.txt:

# robots.txt
User-agent: *
Disallow: /wp-
Disallow: /wp-*
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /feed/
Disallow: /cgi-bin/
Disallow: /archive/
Disallow: /wp-admin/
Disallow: /trackback/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /wp-login.php
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-content/plugins/
Disallow: */trackback/
Disallow: */comments/
Disallow: /*/feed
Disallow: */feed/
Disallow: /*?
#the rest
Allow: /
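
If you want to test which URLs a pattern like the ones above would catch, here is a rough sketch. It assumes the wildcard conventions Google documents ("*" matches any run of characters, a trailing "$" anchors the end of the URL), which not every crawler supports, and as far as I know Python's built-in urllib.robotparser only does plain prefix matching. The function names and sample paths are made up for illustration, and the sketch does not try to resolve Allow versus Disallow precedence.

```python
import re


def rule_to_regex(path_pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Assumes Google-style wildcards: '*' is any run of characters and a
    trailing '$' anchors the end of the URL. Everything else is literal.
    """
    anchored = path_pattern.endswith("$")
    core = path_pattern[:-1] if anchored else path_pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def matches(path_pattern: str, url_path: str) -> bool:
    """Return True if url_path falls under path_pattern (matched from the start)."""
    return rule_to_regex(path_pattern).match(url_path) is not None


# Hypothetical paths checked against rules from the sample above.
print(matches("/wp-", "/wp-admin/"))                # prefix rule
print(matches("/*.css$", "/theme/style.css"))       # ends in .css
print(matches("/*.css$", "/theme/style.css?v=1"))   # does not end in .css
print(matches("/*?", "/page?id=3"))                 # any URL with a query string
```

One thing this makes visible is that the short "/wp-" rule already catches the longer "/wp-admin/", "/wp-content/" and "/wp-includes/" entries, so those extra lines do not block anything new.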


HTH :)

James

Top Contributor
You can't really tweak a robots.txt much for SEO benefit, but on a large site you can use it to steer crawlers so that specific sections get crawled faster.

To be honest, a robots.txt like this will be fine from Google's point of view:
_____________________________

User-Agent: *
Allow: /

Sitemap: http://www.website.com/sitemap.xml

______________________
Include the sitemap in the robots.txt.
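
As a quick sanity check, here is a small sketch showing that a parser picks up both the Allow rule and the Sitemap line from a file like that. It uses Python's standard urllib.robotparser; site_maps() needs Python 3.8 or newer, and www.website.com is the same placeholder domain as in the example.

```python
from urllib.robotparser import RobotFileParser

# Same content as the simple robots.txt above (placeholder domain).
ALLOW_ALL = """\
User-Agent: *
Allow: /

Sitemap: http://www.website.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Any path is crawlable for any user agent.
crawlable = parser.can_fetch("Googlebot", "http://www.website.com/some/page.html")
# The Sitemap directive is exposed as a list of URLs.
sitemaps = parser.site_maps()

print(crawlable)
print(sitemaps)
```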

Also a cool one to check out here: http://explicitly.me/robots.txt
 

johno69

Top Contributor
There's also an argument for keeping your robots.txt as simple as you can.

If you run WP then it's not as much of an issue, but if you have a custom solution, adding locations you want to hide in the robots.txt can actually highlight them to unwanted users (hackers).
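
To show how little effort that takes, here is a short sketch that pulls every Disallow path out of a robots.txt. The file content and paths below are invented for the example, and disallowed_paths is just a name I picked. Anyone can fetch the real file from a site and run something similar.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect the path of every non-empty Disallow line in robots.txt text."""
    paths = []
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt for a custom-built site.
EXAMPLE = """\
User-agent: *
Disallow: /secret-admin/   # meant to be hidden
Disallow: /backups/
Allow: /
"""

print(disallowed_paths(EXAMPLE))
```

The output is a ready-made list of the places the site owner would rather nobody looked at.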
 

petermeadit

Top Contributor
Yeah I agree. Funny, I have seen sites ranking really well and they don't have robots.txt or a sitemap.xml and their onsite pages have very little optimization.

Not sure if this is a fluke, or just a case of having naturally great content that people really genuinely like and want more of.

If I really wanted to have hidden or completely private content I would secure it with a .htaccess entry, and not put it in the robots.txt file at all.

More than one way to skin a cat?
 

neddy

Top Contributor
I know this sounds like heresy, but I'm so glad I don't develop my sites personally.

I started reading this thread, and it was all "double dutch" to me.

The only "robots" I know are "The Robot" from Lost in Space; and C-3PO / R2-D2 from Star Wars. :D

But in all seriousness, I do admire you guys for knowing this "stuff" - and being able to help out other members.
 

CyberClick

Top Contributor
I rarely use a robots.txt file. IMO the only reason to use the file is to limit indexing, which I try to avoid. I've never had an issue not using it.
 

James

Top Contributor
I rarely use a robots.txt file. IMO the only reason to use the file is to limit indexing, which I try to avoid. I've never had an issue not using it.

The thing is, most CMSs come out of the box with robots.txt files built by the developers or by the CMS creator. I have lost count of the number of times I have investigated a website and found a huge section of it blocked by someone who did not know what they were doing.
 

whiterabit

Regular Member
Thanks again for your comments and help here, people. Now that I have the information I need, I have the challenge of getting my web dev host to explain to me why they won't remove that line from my robots.txt file.
 
