DN Trade
Domain Names

  #1  
07-02-10, 08:29 AM
johno69
Verified User
Join Date: Nov 2008
Posts: 852
Hide from search engines

It was asked recently in another thread how to hide from search engines.

I have been asked the same thing elsewhere lately, because this site is extracting everyone's details:

http://www.123people.com/

If anyone has some pointers on how to hide all of or part of your site from being crawled, that would be much appreciated.

Something that has apparently made a difference was adding this line to robots.txt (it needs to sit under a User-agent line):

Disallow: /directory_name/

and then requesting removal of the directory via Google Webmaster Tools.
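
A quick way to sanity-check a rule like this before uploading it is to run it through a robots.txt parser. The following is a minimal sketch using Python's standard-library urllib.robotparser. The User-agent line, the example.com URLs and the bot name SomeBot are assumptions added for illustration, and it only shows how a well-behaved parser reads the rule, not how any particular crawler will act.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the Disallow line from the post,
# placed under a catch-all User-agent line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory_name/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if a compliant bot may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "SomeBot" and example.com are placeholders.
    for url in ("http://example.com/directory_name/page.html",
                "http://example.com/other/page.html"):
        print(url, is_allowed(ROBOTS_TXT, "SomeBot", url))
```

Pages under /directory_name/ should come back as not allowed, and everything else as allowed.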
__________________
Poker Tournaments | Web Design & Development

Coming Soon: Play Dough | Pizza Tonight | Poker Wear | Poker Talk | More soon...
  #2  
07-02-10, 03:41 PM
italiandragon
Senior Member
Join Date: Dec 2009
Location: In the land of the devils
Posts: 577

thank you
  #3  
07-02-10, 04:33 PM
soj
Founder
Join Date: Jun 2008
Location: Melbourne
Posts: 666

Yeah, I had to go through and do this to a few of my sites after they were quietly hacked and had malicious content uploaded. There were hidden links and pages, and nearly 5000 pages of adult/pharmacy crap got indexed in Google. Google's Webmaster Tools is very handy for that type of stuff; they were all excluded in a matter of days.
  #4  
07-02-10, 04:50 PM
johno69
Verified User
Join Date: Nov 2008
Posts: 852

The problem my mate is having with 123people is that they use Yahoo APIs, so I'm not really sure if Yahoo has an equivalent exclude option.
  #5  
08-02-10, 10:48 AM
FirstPageResults
Administrator
Join Date: May 2009
Posts: 1,207

You can also use robots meta tags. It's a good idea to use both robots.txt and meta tags.

You can also use an .htaccess file to restrict access to directories.
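
The meta tag in question is the robots meta tag, e.g. <meta name="robots" content="noindex, nofollow"> in the page head. As a rough sketch (the sample page and the helper name robots_directives are made up for illustration), Python's standard-library html.parser can be used to check which directives a page actually carries:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so normalise them.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Hypothetical helper: return the set of robots directives in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Sample page, invented for this example.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head><body>Hello</body></html>"""

if __name__ == "__main__":
    print(sorted(robots_directives(SAMPLE_PAGE)))
```

Note that a crawler has to be able to fetch the page to see the tag, so this works differently from a robots.txt block, which asks the crawler not to fetch the page at all.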
__________________
Index > Rank > Monetise
  #6  
08-02-10, 09:42 PM
marketingweb
Senior Member
Join Date: Dec 2009
Location: Albury, NSW
Posts: 145

My understanding (which may be better or worse than the suggestions above, as I'm not an absolute expert on this) is that you are close to the money with your original suggestion of using robots.txt, with a couple of caveats.

Basically the three possible files you could edit are robots.txt, .htaccess or .htpasswd (there are MANY other ways to restrict content of course). I won't discuss .htpasswd as I have never really used it, and I presume you don't want everyone to need a password to access the site.

By far the easiest to work with in this case is robots.txt. This file basically tells robots whether or not they are allowed to crawl your site. You can set it to disallow nothing, disallow everything, or disallow specific directories (the original standard has no "allow", only "disallow", so anything not disallowed is presumed allowed; some crawlers, Google included, also accept an "Allow" line as an extension).

The following allows everything:
User-agent: *
Disallow:

The following disallows everything (note the slash):
User-agent: *
Disallow: /
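
To see the difference that one slash makes, here is a minimal sketch that feeds both versions to Python's standard-library urllib.robotparser. The helper name can_crawl, the bot name AnyBot and the test path are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants from the post.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DISALLOW_ALL = "User-agent: *\nDisallow: /\n"


def can_crawl(robots_txt, path, user_agent="AnyBot"):
    """Return True if a compliant bot may fetch path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "/some/page.html" is just a placeholder path.
    print("empty Disallow:", can_crawl(ALLOW_ALL, "/some/page.html"))
    print("Disallow: /   :", can_crawl(DISALLOW_ALL, "/some/page.html"))
```

An empty Disallow value blocks nothing, while a lone slash matches every path on the site.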

Various other versions are possible, blocking specific bots (aka user-agents), specific directories, or combinations of the two. The only problems are that you need to know the name of the bot that is doing the crawling you don't want (maybe from access logs), and that robots.txt is only advice, not a restriction - a "bad" bot that hasn't been programmed to play nice will just ignore it and do what it wants.
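
As a sketch of such a combination, the rules below shut one named bot out of the whole site and keep everyone else out of a single directory. The bot name ExampleBot and the /private/ directory are hypothetical, and the check uses Python's standard-library urllib.robotparser, so it only reflects how a compliant bot would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "ExampleBot" and "/private/" are made-up names.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a compliant bot with this name may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for path in ("/index.html", "/private/data.html"):
            print(agent, path, allowed(agent, path))
```

ExampleBot is refused everywhere, while other bots are only refused under /private/. A bot that ignores robots.txt is unaffected by any of this.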

With .htaccess you can restrict access to the whole site or to directories, but it's more complicated. It's also worth noting that the restriction will apply to anyone and everyone, unless you know the IP address of the "bad bot" and it doesn't keep changing IP. PLUS, you then really should do some tricky stuff to ensure the bot can still see robots.txt (blocking everything blocks robots.txt as well). Otherwise it will not find robots.txt, will presume it doesn't exist, will then try to crawl other pages, fail, and keep going - no real problem except a LOT of extra server traffic.

If you then want the pages removed from Google as well, just use Webmaster Tools as suggested above. I'm not sure about other engines, though in time you should disappear if using robots.txt.

For more info try this link:
http://www.webmasterworld.com/robots_txt/3523560.htm

Regarding your specific problem with 123people, I'm guessing if they already have your info they aren't giving it up, and removing pages from the web after the horse has bolted won't help.

SORRY ALL for being so wordy, hopefully this helps a little bit and doesn't just cause more confusion!

Matt
Marketing Web
___________________________________
Anti Aging Medicine
Cattle Crushes
Promotional Bags
Free Christian Dating Australia
  #7  
08-02-10, 11:12 PM
johno69
Verified User
Join Date: Nov 2008
Posts: 852

Quote:
Originally Posted by marketingweb
Regarding your specific problem with 123people, I'm guessing if they already have your info they aren't giving it up, and removing pages from the web after the horse has bolted won't help.
From my understanding they search live, so removing the pages should help eventually.
All times are GMT +10.