python - How to find out programmatically if a domain name is registered or not


I use pywhois to determine whether a domain name is registered or not. Here is the source code (it tries all permutations from a.net to zzz.net).

    #!/usr/bin/env python
    import whois  # pip install python-whois
    import string
    import itertools


    def main():
        characters = list(string.ascii_lowercase)
        # Domain name generator: from 'a.net' to 'zzz.net'.
        # Note: permutations() never repeats a letter, so names such as
        # 'aa.net' are skipped; itertools.product(characters, repeat=r)
        # would include them.
        for r in range(1, 4):
            for name in itertools.permutations(characters, r):
                url = ''.join(name) + '.net'
                # Check whether the domain name is registered or not.
                try:
                    w = whois.whois(url)
                except whois.parser.PywhoisError:  # not found
                    print(url)  # unregistered domain name?


    if __name__ == '__main__':
        main()

I got the following results:

    jv.net
    uli.net
    vno.net
    xni.net

However, the domain names above have in fact been registered, so the result is not accurate. Can anyone explain it? There are also a lot of errors:

    fgets: Connection reset by peer
    connect: No route to host
    connect: Network is unreachable
    connect: Connection refused
    Timeout.

There is an alternative way, reported here.

    import socket

    try:
        socket.gethostbyname_ex(url)
    except OSError:  # covers socket.gaierror and socket.herror
        print(url)  # unregistered domain name?

Speaking of speed, I use map for parallel processing.

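    from multiprocessing.pool import ThreadPool


    def select_unregisteredd_domain_names(self, domain_names):
        # Parallelism using map
        pool = ThreadPool(16)  # sets the pool size
        # query_method() must return the lookup callable that map applies to each name
        results = pool.map(query_method(), domain_names)
        pool.close()  # close the pool and wait for the work to finish
        pool.join()
        return results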

This is a tricky problem to solve, trickier than most people realize. The reason is that some people don't want you to find out. Most domain registrars apply lots of black magic (i.e. lots of TLD-specific hacks) to produce the nice listings they provide, and they often get it wrong. Of course, in the end they know for sure, since they have EPP access, which holds the authoritative answer (but that check is usually only done when you click "order").

Your first method (whois) used to be a good one, and I did this on a large scale back in the 90s when everything was more open. Nowadays, many TLDs protect this information behind captchas, obstructive web interfaces, and whatnot. If nothing else, there will be quotas on the number of queries per IP. (And that may be for good reason too: I used to get ridiculous amounts of spam to the email addresses I used for registering domains.) Also note that spamming WHOIS databases with queries is usually in breach of their terms of use, and you might get rate limited, blocked, or even get an abuse report sent to your ISP.

Your second method (DNS) is usually a lot quicker (but don't use gethostbyname; use Twisted or some other async DNS for efficiency). You need to figure out what the responses for taken and free domains look like for each TLD. Just because a domain doesn't resolve doesn't mean it is free (it could simply be unused). And conversely, some TLDs have landing pages for all nonexisting domains. In some cases it will be impossible to determine using DNS alone.
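As a rough illustration of such a DNS pass, here is a minimal sketch that uses the standard library's asyncio instead of Twisted. That is a swapped-in choice: loop.getaddrinfo runs the blocking system resolver in worker threads, so it is concurrent but not a true async DNS client. The labels RESOLVES, NO_ADDRESS and UNKNOWN and the function names are made up for this example. The point is only that a lookup that failed because of the network is kept apart from a name that was not found, and that neither outcome is taken as proof that the domain is free.

```python
import asyncio
import socket

# Hypothetical labels for this sketch; they are not part of any library.
RESOLVES = "resolves"      # an address came back: taken, or a TLD-wide landing page
NO_ADDRESS = "no-address"  # name not found: only a candidate, it may be registered but unused
UNKNOWN = "unknown"        # lookup failed for another reason: retry later, do not call it free

# Resolver error codes that mean "this name has no address".
_NOT_FOUND = {socket.EAI_NONAME}
if hasattr(socket, "EAI_NODATA"):  # not defined on every platform
    _NOT_FOUND.add(socket.EAI_NODATA)


async def system_resolve(name):
    """Resolve through the system resolver (run in a worker thread by asyncio)."""
    loop = asyncio.get_running_loop()
    return await loop.getaddrinfo(name, None)


async def classify(name, resolve, limit, timeout):
    """Resolve one name and report what happened, without guessing."""
    async with limit:  # cap the number of lookups in flight
        try:
            await asyncio.wait_for(resolve(name), timeout)
        except socket.gaierror as exc:
            if exc.errno in _NOT_FOUND:
                return name, NO_ADDRESS
            return name, UNKNOWN  # e.g. temporary resolver failure
        except (asyncio.TimeoutError, OSError):
            return name, UNKNOWN  # timeout, connection refused, network unreachable
        return name, RESOLVES


async def first_pass(names, resolve=system_resolve, concurrency=16, timeout=5.0):
    """Classify every name concurrently and return {name: label}."""
    limit = asyncio.Semaphore(concurrency)
    pairs = await asyncio.gather(
        *(classify(name, resolve, limit, timeout) for name in names)
    )
    return dict(pairs)
```

It could be run with asyncio.run(first_pass(candidate_names)), where candidate_names is your own list. Only the names labelled NO_ADDRESS are worth checking further; the UNKNOWN ones should be retried rather than reported as unregistered.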

So, how do you solve it? Not with ease, I'm afraid. For each TLD, you need to figure out how to make clever use of DNS and the whois databases, starting with DNS and resorting to other means in the tricky cases. Make sure not to flood the whois databases with queries.
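The "DNS first, whois only for the tricky cases" idea could be sketched roughly as follows. Everything here is an assumption made for illustration: dns_status and whois_registered are hypothetical callables that you would write per TLD, the status strings and verdict names are invented, and the two-second pause is an arbitrary placeholder rather than any registry's actual quota.

```python
import time


def check_domains(names, dns_status, whois_registered, min_interval=2.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Return {name: verdict}, sending only unsettled names to whois.

    dns_status(name) is a hypothetical callable returning "resolves",
    "no-address" or anything else for "lookup failed".
    whois_registered(name) is a hypothetical callable returning True,
    False, or None when the whois answer could not be interpreted.
    """
    verdicts = {}
    last_query = None
    for name in names:
        status = dns_status(name)
        if status == "resolves":
            # Could still be a TLD-wide landing page, hence "probably".
            verdicts[name] = "probably-taken"
            continue
        if status != "no-address":
            verdicts[name] = "unknown"  # network trouble is not evidence
            continue

        # Throttle: leave at least min_interval seconds between whois queries.
        if last_query is not None:
            wait = min_interval - (clock() - last_query)
            if wait > 0:
                sleep(wait)
        last_query = clock()

        try:
            registered = whois_registered(name)
        except OSError:  # connection reset, refused, unreachable, timeout
            registered = None

        if registered is None:
            verdicts[name] = "unknown"
        elif registered:
            verdicts[name] = "registered-unused"
        else:
            verdicts[name] = "possibly-free"
    return verdicts
```

The design choice worth noting is that every failure path ends in "unknown" instead of "free", and that whois is queried sequentially with a pause instead of from a 16-thread pool.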

Another option is to get API access from one of the registrars; they might offer programmatic access to their domain search.

