python - How to find out programmatically if a domain name is registered or not
I use pywhois to determine whether a domain name is registered or not. Here is my source code (all permutations from a.net to zzz.net):
    #!/usr/bin/env python
    import whois  # pip install python-whois
    import string
    import itertools

    def main():
        characters = list(string.ascii_lowercase)
        # domain names generator
        # (permutations never repeats a letter, so names such as
        # 'aa.net' or 'zzz.net' are not actually generated)
        for r in range(1, 4):
            for name in itertools.permutations(characters, r):  # from 'a.net' to 'zzz.net'
                url = ''.join(name) + '.net'
                # check if the domain name is registered or not
                try:
                    w = whois.whois(url)
                except whois.parser.PywhoisError:  # not found
                    print(url)  # unregistered domain names?

    if __name__ == '__main__':
        main()
I got the following results:
jv.net uli.net vno.net xni.net
However, all of the above domain names have been registered, so the result is not accurate. Can anyone explain it? There are also a lot of errors:
    fgets: Connection reset by peer
    connect: No route to host
    connect: Network is unreachable
    connect: Connection refused
    Timeout.
There is an alternative way, reported here.

    import socket

    try:
        socket.gethostbyname_ex(url)
    except (socket.gaierror, socket.herror):
        print(url)  # unregistered domain names?
Speaking of speed, I use map for parallel processing.

    from multiprocessing.pool import ThreadPool

    def select_unregisteredd_domain_names(self, domain_names):
        # parallelism using map
        pool = ThreadPool(16)  # sets the pool size
        results = pool.map(query_method(), domain_names)
        pool.close()  # close the pool and wait for the work to finish
        pool.join()
        return results
This is a tricky problem to solve, trickier than most people realize. The reason is that some people don't want you to find out. Most domain registrars apply lots of black magic (i.e. lots of TLD-specific hacks) to get the nice listings they provide, and they often get it wrong. Of course, in the end they will know for sure, since they have EPP access, which holds the authoritative answer (but that check is usually done only when you click "order").
Your first method (whois) used to be a good one, and I did this on a large scale back in the 90s when everything was more open. Nowadays, many TLDs protect this information behind captchas, obstructive web interfaces, and whatnot. If nothing else, there will be quotas on the number of queries per IP. (And that may be for good reason too: I used to get ridiculous amounts of spam to the email addresses I used for registering domains.) Also note that spamming WHOIS databases with queries is usually in breach of their terms of use, and you might get rate limited, blocked, or even get an abuse report sent to your ISP.
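To make the point about quotas concrete, here is a minimal sketch of spacing out WHOIS queries. The `Throttle` class, the `throttled_lookups` helper and the interval are my own illustrative names and values, not part of python-whois; `lookup` stands in for whatever call you use (for example `whois.whois`). Real limits differ per registry, so treat the interval as a guess to tune.

```python
import time


class Throttle:
    """Allow at most one call every `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first one

    def wait(self):
        """Block until at least `min_interval` has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def throttled_lookups(domains, lookup, throttle):
    """Call `lookup(domain)` for each domain, pausing between calls."""
    results = {}
    for domain in domains:
        throttle.wait()
        results[domain] = lookup(domain)
    return results
```

With something like `throttled_lookups(names, whois.whois, Throttle(5.0))` the queries go out one at a time, which is slow but much less likely to get your IP blocked than a 16-thread pool.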
Your second method (DNS) is usually a lot quicker (but don't use gethostbyname; use Twisted or some other async DNS library for efficiency). You need to figure out what the responses for taken and free domains look like for each TLD. Just because a domain doesn't resolve doesn't mean it is free (it could just be unused). And conversely, some TLDs have landing pages for all nonexistent domains. In some cases it will be impossible to determine the answer using DNS alone.
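The two DNS pitfalls above can be sketched roughly as follows. This is only an illustration under my own assumptions: `resolve` and `classify` are made-up names, the blocking `socket.getaddrinfo` call is used for clarity (a bulk scan would want an async resolver), and the wildcard test simply resolves a random label under the same TLD and compares addresses.

```python
import socket
import uuid


def resolve(name):
    """Return the set of addresses for `name`, or an empty set if the lookup fails.

    Note that a network failure looks the same as a missing name here.
    """
    try:
        infos = socket.getaddrinfo(name, None)
    except (socket.gaierror, UnicodeError):
        return set()
    return {info[4][0] for info in infos}


def classify(domain, resolver=resolve):
    """Return 'resolves', 'no-address' or 'wildcard' for `domain`.

    Only 'resolves' is real evidence (that the name is taken);
    the other two outcomes are inconclusive.
    """
    addresses = resolver(domain)
    if not addresses:
        # Could be free, or registered but without any address records.
        return "no-address"
    # Probe a random name that almost certainly does not exist in this TLD.
    tld = domain.rsplit(".", 1)[-1]
    probe = "%s.%s" % (uuid.uuid4().hex, tld)
    if addresses <= resolver(probe):
        # The TLD answers for nonexistent names too (landing page).
        return "wildcard"
    return "resolves"
```

Anything that comes back as `no-address` or `wildcard` still needs another source of truth; DNS alone cannot tell you those names are free.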
So, how do you solve it? Not with ease, I'm afraid. For each TLD, you need to figure out how to make clever use of DNS and WHOIS databases, starting with DNS and resorting to other means in the tricky cases. Make sure not to flood the WHOIS databases with queries.
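As a rough sketch of that "DNS first, WHOIS only when needed" order, assuming two callables you supply yourself: `dns_check(domain)` returns True when DNS shows the name is in use, and `whois_check(domain)` returns True (record found), False (no record) or None (query failed). The function name, the verdict strings and the `max_whois` budget are all hypothetical.

```python
def check_domains(domains, dns_check, whois_check, max_whois=10):
    """Return {domain: verdict}, spending at most `max_whois` WHOIS queries."""
    verdicts = {}
    whois_used = 0
    for domain in domains:
        if dns_check(domain):
            # Cheap path: DNS already proves the name is taken.
            verdicts[domain] = "registered"
        elif whois_used < max_whois:
            whois_used += 1
            answer = whois_check(domain)
            if answer is None:
                # A failed query is not evidence that the name is free.
                verdicts[domain] = "unknown"
            elif answer:
                verdicts[domain] = "registered"
            else:
                verdicts[domain] = "possibly free"
        else:
            # Budget exhausted: leave the name for a later run.
            verdicts[domain] = "unchecked"
    return verdicts
```

Keeping failed queries as `unknown` matters for the question above: connection resets and timeouts should not be reported as unregistered names.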
Another option is to get API access to one of the registrars; they might offer programmatic access to their domain search.