My understanding (which may be better or worse than the suggestions above, as I'm not an absolute expert on this) is that you were close to the money with your original suggestion of using robots.txt - with a couple of buyer-bewares.
Basically, the three files you could edit are robots.txt, .htaccess and .htpasswd (there are MANY other ways to restrict content, of course). I won't discuss .htpasswd, as I have never really used it and I presume you don't want everyone to need a password to access the site.
By far the easiest to work with in this case is robots.txt. This file tells robots whether or not they are allowed to crawl your site. You can set it to disallow nothing, disallow everything, or disallow specific directories. (The original standard only defines Disallow - anything not disallowed is presumed allowed - though major crawlers such as Googlebot also honour a non-standard Allow directive.)
The following allows everything:
User-agent: *
Disallow:
The following disallows everything (note the slash):
User-agent: *
Disallow: /
And various combinations are possible, blocking specific bots (i.e. user-agents), specific directories, or both. The only catches are that you need to know the name of the bot doing the crawling you don't want (your access logs may tell you), and that robots.txt is advice rather than a restriction - a "bad" bot that hasn't been programmed to play nice will simply ignore it and do what it wants.
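For example, the following would block one bot from one directory while leaving everything else open to everyone (the user-agent name "ExampleBot" is made up - substitute the name you find in your access logs):

User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: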
With .htaccess you can restrict access to the whole site or to individual directories, but it's more complicated. AND it's worth noting that the restriction will apply to anyone and everyone, unless you know the IP address of the "bad bot" and that address doesn't keep changing. PLUS, you really should do the tricky stuff needed to make sure the bot can still see robots.txt (blocking everything blocks robots.txt as well), as otherwise it will presume robots.txt doesn't exist, try to crawl your other pages, fail, and keep trying - no real harm done, except a LOT of extra server traffic.
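As a rough sketch only (Apache 2.2-style syntax; the bot name and the robots.txt exception are my assumptions, so check this against your server's documentation before using it), an .htaccess that blocks one user-agent everywhere except robots.txt might look like:

# Tag requests whose User-Agent contains "ExampleBot" (name is made up)
SetEnvIfNoCase User-Agent "ExampleBot" bad_bot

# Let everyone in except tagged requests
Order allow,deny
Allow from all
Deny from env=bad_bot

# But still let the bot fetch robots.txt, so it can read the rules
<Files "robots.txt">
  Order allow,deny
  Allow from all
</Files>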
If you then want the pages removed from Google as well, just use Webmaster Tools as suggested above - I'm not sure about other engines, though over time your pages should disappear from their results if you're using robots.txt.
For more info try this link:
http://www.webmasterworld.com/robots_txt/3523560.htm
Regarding your specific problem with 123people, I'm guessing that if they already have your info they aren't giving it up, and removing pages from the web after the horse has bolted won't help.
SORRY ALL for being so wordy - hopefully this helps a little rather than just causing more confusion!
Matt
Marketing Web