
You know you're committed to your project when...

Discussion in 'Web Development' started by nt81, Apr 17, 2014.

  1. nt81

    nt81 Membership: Community

    Joined:
    Jul 22, 2013
    Messages:
    440
    Likes Received:
    42
    You know you're committed to your project when...

    Your server has been running a script to glean information on 300,000 domains and it's still only halfway done after three weeks of running 24x7.

    But the reward will be worth it. Ask me in a month, haha
     
  2. m8e

    m8e Membership: Trader

    Joined:
    Feb 16, 2011
    Messages:
    106
    Likes Received:
    14
    Nice! What combination of language/database/OS do you use to write your bots?
     
  3. nt81

    nt81 Membership: Community

    Joined:
    Jul 22, 2013
    Messages:
    440
    Likes Received:
    42
    I'm learning bash scripting at the moment, but the script doing the lookups right now is PHP/MySQL on CentOS 6 x64.

    But I'm rate-limited by the APIs I'm working with, and I'd rather not flood them or get banned for it.

    One worker thread checking about 10 domains per minute. Slow going!
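
    The core of it is roughly like this (a heavily simplified sketch - the table, column and API endpoint names are placeholders, not my actual script):

    Code:
    <?php
    // Rate-limited lookup worker (placeholder names throughout).
    // Pulls unchecked domains from MySQL, queries a hypothetical lookup API,
    // stores the raw response and sleeps ~6 seconds to stay around 10 lookups/minute.
    $db = new mysqli('localhost', 'dbuser', 'dbpass', 'domains_db');
    if ($db->connect_error) {
        die('DB connection failed: ' . $db->connect_error);
    }

    $result = $db->query("SELECT id, name FROM domains WHERE checked = 0 ORDER BY id");

    while ($row = $result->fetch_assoc()) {
        // Hypothetical API endpoint - the real ones are rate-limited, hence the sleep.
        $url  = 'https://api.example.com/lookup?domain=' . urlencode($row['name']);
        $json = file_get_contents($url);

        if ($json !== false) {
            $stmt = $db->prepare("UPDATE domains SET info = ?, checked = 1 WHERE id = ?");
            $stmt->bind_param('si', $json, $row['id']);
            $stmt->execute();
            $stmt->close();
        }

        sleep(6); // roughly 10 requests per minute
    }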

    I got sick of my VentraIP hosting stalling all the time, so I signed up for a VPS with another supplier and haven't had an issue since.

    My current love is working with APIs and PHP/MySQL - so many interesting APIs out there :)
     
  4. m8e

    m8e Membership: Trader

    Joined:
    Feb 16, 2011
    Messages:
    106
    Likes Received:
    14
    Nice! Bash+API = powerful

    Good not to flood the service you're grabbing data from TOO much.

    The trick is finding the balance between speed and not getting banned.
    So you're doing one request every six seconds then?

    You could also throw in a randomized wait between requests so it's not just a constant jackhammer rhythm.

    I've done that on a few web scraper projects.
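
    Something like this instead of a fixed pause (just a sketch - plain rand() is fine for jitter here):

    Code:
    // Instead of a fixed sleep(6), wait a random 5-12 seconds between requests
    // so the traffic isn't a perfectly regular 6-second beat.
    sleep(rand(5, 12));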


    Since you're grabbing such a huge data set, if you're interested in speeding things up I'd consider splitting it over a few virtual machines on different IPs to share the load.

    Depends on your API access though... if it's tied to the one account then it won't help. But if it's public data with anonymous open access for all, split it up across a few cloned VMs.
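
    The split itself can be dead simple - give each clone a node number and have it only take the domains whose id falls in its slice (placeholder names again):

    Code:
    <?php
    // Sketch: each cloned VM gets its own NODE_ID (0, 1, 2, ...) and only processes
    // domains whose id modulo NODE_COUNT matches it, so the clones never overlap
    // and together they cover the whole table.
    define('NODE_ID', 0);    // set per VM: 0 on the first clone, 1 on the second, ...
    define('NODE_COUNT', 3); // total number of clones

    $db  = new mysqli('localhost', 'dbuser', 'dbpass', 'domains_db');
    $sql = sprintf(
        "SELECT id, name FROM domains WHERE checked = 0 AND id %% %d = %d ORDER BY id",
        NODE_COUNT,
        NODE_ID
    );
    $result = $db->query($sql);

    while ($row = $result->fetch_assoc()) {
        // ... same lookup + wait loop as above, just on this node's slice ...
    }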

    You could even get fancy with a shared backend MySQL on another server.

    Or just combine the results at the end via one-way replication: make each VM node a MySQL master and have it publish back to your central data collector, which is a slave to all the bots.

    http://dev.mysql.com/doc/refman/5.0/en/replication.html
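
    Bare bones, per those docs, it would be binary logging plus a unique server-id on each bot VM, with the collector pointed at them - something along these lines (hostnames, credentials and log coordinates are placeholders, and note that one slave following several masters needs multi-source support, which only newer MySQL/MariaDB versions have):

    Code:
    # my.cnf on each bot VM - binary logging on, unique server-id per node
    [mysqld]
    server-id = 2
    log-bin   = mysql-bin

    -- On each bot VM: account the collector will replicate with
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'collector.example.com' IDENTIFIED BY 'secret';

    -- On the central collector: point a slave at the bot and start it
    CHANGE MASTER TO
        MASTER_HOST='bot1.example.com',
        MASTER_USER='repl',
        MASTER_PASSWORD='secret',
        MASTER_LOG_FILE='mysql-bin.000001',
        MASTER_LOG_POS=4;
    START SLAVE;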

    Assuming you have unlimited time and budget to set this all up ;)
     
  5. nt81

    nt81 Membership: Community

    Joined:
    Jul 22, 2013
    Messages:
    440
    Likes Received:
    42
    Yeah, I looked at slaving a couple of cheap hosting accounts spread around the place... but I'm not going to push it.

    The script does 5 domains at a time, with a random wait of 5-12 seconds in between. It would be easy to spot, but I'm not leeching :p

    I go live in August, so I have plenty of time to catch up on the data I need to capture.

    VPS #2 is on the way, probably tomorrow. This one is going to be in the USA. I've spent the last 6 hours researching it - so many VPS hosts out there :S
     
  6. Rhythm

    Rhythm Membership: VIP

    Joined:
    Sep 25, 2008
    Messages:
    1,162
    Likes Received:
    118
    Sounds cool.