
Hide from search engines

johno69

Top Contributor
It was asked recently in another thread how to hide from search engines.

I have been asked the same thing lately elsewhere, because this site is extracting everyone's details:

http://www.123people.com/

If anyone has some pointers on how to hide all or part of a site from being crawled, that would be much appreciated.

Something that has apparently made a difference was adding this line to robots.txt:

Disallow: /directory_name/

Then requesting removal of the directory via Google Webmaster Tools.
 

soj

Founder
Yeah, I had to go through and do this on a few of my sites after they were quietly hacked and had malicious content uploaded. There were hidden links and pages, and nearly 5000 pages of adult/pharmacy crap got indexed in Google. Google's Webmaster Tools is very handy for that type of stuff; they were all excluded in a matter of days.
 

johno69

Top Contributor
The problem my mate is having with 123people is that they use Yahoo APIs, so I'm not really sure whether Yahoo has an equivalent exclude option.
 

marketingweb

Top Contributor
My understanding (which may be better or worse than the suggestions above, as I'm not an absolute expert on this) is that you are close to the money with your original suggestion of using robots.txt, with a couple of caveats.

Basically the three possible files you could edit are robots.txt, .htaccess or .htpasswd (there are MANY other ways to restrict content, of course). I won't discuss .htpasswd, which holds the passwords for password-protected access, as I have never really used it and I presume you don't want everyone to need a password to access the site.

By far the easiest to work with in this case is robots.txt. This file basically tells robots whether they are or are not allowed to crawl your site. You can set it to disallow nothing, disallow all, or disallow specific directories. The original standard has no "Allow", only "Disallow", and anything not disallowed is presumed allowed; some crawlers, Googlebot among them, also honor an "Allow" line as an extension.

The following allows everything:
User-agent: *
Disallow:

The following disallows everything (note the slash):
User-agent: *
Disallow: /
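
To see how a well-behaved crawler would read these two files, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot", the example.com URL and the helper function can_fetch are made up for illustration; they are not from this thread.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two files from the post above: an empty Disallow allows everything,
# while "Disallow: /" blocks the whole site.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical URL, used only for this check.
URL = "http://example.com/private/page.html"

print(can_fetch(ALLOW_ALL, "ExampleBot", URL))  # True
print(can_fetch(BLOCK_ALL, "ExampleBot", URL))  # False
```

This only shows what a polite crawler would decide; it does not stop a crawler that never reads robots.txt.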

Various other versions are possible, blocking specific bots (aka user-agents), specific directories, or combinations of the two. The only problems are that you need to know the name of the bot doing the crawling you don't want (maybe from your access logs), and that robots.txt isn't a restriction, just advice: a "bad" bot that hasn't been programmed to play nice will just ignore it and do what it wants.
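
As a rough illustration of combining the two, the sketch below blocks one named bot from the whole site and keeps every other bot out of a single directory, then checks the result with Python's standard urllib.robotparser. The names "BadBot" and "OtherBot", the example.com host and the helper allowed are assumptions for the example; /directory_name/ is the placeholder used earlier in the thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" is shut out of the whole site,
# every other bot is only kept out of /directory_name/.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /directory_name/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(user_agent: str, path: str) -> bool:
    """Return True if a polite crawler with this name may fetch the path."""
    return parser.can_fetch(user_agent, "http://example.com" + path)


print(allowed("BadBot", "/index.html"))                 # False
print(allowed("OtherBot", "/index.html"))               # True
print(allowed("OtherBot", "/directory_name/page.html"))  # False
```

The user-agent name has to match what the bot actually sends, which is why the access logs are the place to look.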

With .htaccess you can restrict access to the whole site or to individual directories, but it's more complicated. It's also worth noting that the restriction will apply to anyone and everyone, unless you know the IP address of the "bad bot" and it doesn't keep changing IP. Plus, you really should then do some tricky stuff to ensure bots can still see robots.txt (blocking everything blocks robots.txt as well). Otherwise a bot will not find robots.txt, will presume it doesn't exist, will try to crawl other pages, fail, and keep going. That's no real problem except for a LOT of extra server traffic.
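
The policy described above can be modelled in a few lines. To be clear, this is not .htaccess syntax, only a Python sketch of the decision the server would make: robots.txt stays readable for everyone, and everything else is refused for a known bad address range. The function name status_for and the 203.0.113.0/24 range (a reserved documentation range) are assumptions for illustration.

```python
from ipaddress import ip_address, ip_network

# Hypothetical address range of the unwanted crawler.
BLOCKED_NETWORKS = [ip_network("203.0.113.0/24")]


def status_for(client_ip: str, path: str) -> int:
    """Return the HTTP status a request would get under this policy."""
    if path == "/robots.txt":
        # Always readable, so even blocked crawlers can learn the rules.
        return 200
    addr = ip_address(client_ip)
    if any(addr in network for network in BLOCKED_NETWORKS):
        return 403  # Forbidden for the blocked range
    return 200


print(status_for("203.0.113.7", "/robots.txt"))   # 200
print(status_for("203.0.113.7", "/index.html"))   # 403
print(status_for("198.51.100.1", "/index.html"))  # 200
```

As noted above, this only works while the bot keeps using addresses you know about.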

If you then want the pages removed from Google as well, just use Webmaster Tools as suggested above. I'm not sure about other engines, though in time you should disappear from them if you are using robots.txt.

For more info try this link:
http://www.webmasterworld.com/robots_txt/3523560.htm

Regarding your specific problem with 123people, I'm guessing that if they already have your info they aren't giving it up, and removing pages from the web after the horse has bolted won't help.

SORRY ALL for being so wordy. Hopefully this helps a little bit and doesn't just cause more confusion!

Matt
Marketing Web
 

johno69

Top Contributor
marketingweb said: "Regarding your specific problem with 123people, I'm guessing that if they already have your info they aren't giving it up, and removing pages from the web after the horse has bolted won't help."

From my understanding they search live, so removing the pages should help eventually.
 
