Domain Madness
OK, I admit it, I may have too much time on my hands,
but I was looking for a new DNS domain name, and decided to shove all of
the 234,937 words in /usr/share/dict/words through Whois, collecting those
not having a .com entry. The script is attached below. Currently the words
file lists the words in Webster’s Second International, whose 1934 copyright
has expired.
Oddly enough, the result is that 63% of the words are NOT
registered names!! That’s right, 147,886 words are not taken. That’s the
good news. The bad news is that many of them are pretty weird. I’ve stashed
these guys, both compressed and clear, on
https://backspaces.net/files/NonDNSWords
https://backspaces.net/files/NonDNSWords.gz
For example, here’s the list of all 43 4-letter words not taken:
grep ^....$ NonDNSWords
bikh fowk hawm koae odso shlu waup yeuk yirn
dird frib hewt kuar oime suld wusp yigh yirr
dowf gawm jaob mowt paut syrt wype yilt yuft
dowp ghuz jewy munj phoh uily yalb yirk
emyd gype jhow niog rynt wauf ycie yirm
I can’t see one that calls out to me, really. A lot of these do not appear in my dictionary, but the Second International was known for stretching!
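If you want to poke at other lengths without counting dots, here is a minimal sketch of the same grep with the length as a parameter. The function name `words_of_length` is my own invention for illustration, and it assumes a grep that supports `-E` interval expressions.

```sh
#!/bin/sh
# words_of_length N [FILE]
# Print the lines of FILE that are exactly N characters long.
# FILE defaults to NonDNSWords (an assumption: the list lives in the
# current directory under that name).
words_of_length() {
    grep -E "^.{$1}\$" "${2:-NonDNSWords}"
}
```

Something like `words_of_length 4 NonDNSWords | column` should give the same kind of listing as above, and piping it through `wc -l` gives the count.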
Here’s a random sampling of 100 of the 6 letter critters:
grep ^......$ NonDNSWords | ran 100 | column -x
diaene evener tummer burdie palpon coccid taxwax chanst madefy buntal
haggly masted untone dutied unmiry cynips psetta otitic gawcie beflag
midpit orgyia tutory amylic begnaw punlet adigei scrank bedrop lusory
dorize repale unmold snurly scotic unsing uplead hemine unnose stibic
funori cobcab yengee cahita rutuli menkib uptend sassak beflap crants
ocyroe rugose avowry mogdad coecal elleck ptotic kommos amusgo lemosi
avitic amorua cacara ideist reswim napaea reshut egeran lechea emboly
korait uplick baeria kurvey ureido tuchit beroll adroop degged twisel
kechel solate unbare hardim upwaft sullan tineal uramil ovinia pappox
forrad jacami unlean byrlaw thymyl scrobe lyncid crenic bepity anoine
..where “ran” is a simple awk script, below, to randomly select n lines
from a file. It’s kinda spooky doing all this on Mac OS X .. it really IS
Unix.
Again, not a lot of love. By the way, there were 5,166 6-letter
words, so I likely have not shown some real winners in this sampling. Let
me know if you find some real winners.
This got me a bit curious .. how do the words work out by size?
I.e. how many words are 6 letters long etc? Time for another script, also
attached below:
/usr/share/dict/words    NonDNSWords
len  count  percent      len  count  percent
  1     52   0.0221        .
  2    155   0.0660        .
  3   1351   0.5750        .
  4   5110   2.1751        4     43   0.0291
  5   9987   4.2509        5   1219   0.8243
  6  17477   7.4390        6   5166   3.4932
  7  23734  10.1023        7  10725   7.2522
  8  29926  12.7379        8  16593  11.2201
  9  32380  13.7824        9  20861  14.1061
 10  30867  13.1384       10  22254  15.0481
 11  26011  11.0715       11  20415  13.8046
 12  20460   8.7087       12  17065  11.5393
 13  14937   6.3579       13  12935   8.7466
 14   9763   4.1556       14   8811   5.9580
 15   5924   2.5215       15   5433   3.6738
 16   3377   1.4374       16   3146   2.1273
 17   1813   0.7717       17   1681   1.1367
 18    842   0.3584       18    804   0.5437
 19    428   0.1822       19    408   0.2759
 20    198   0.0843       20    189   0.1278
 21     82   0.0349       21     79   0.0534
 22     41   0.0175       22     38   0.0257
 23     17   0.0072       23     16   0.0108
 24      5   0.0021       24      5   0.0034
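For anyone wanting to rebuild a side-by-side table like this in one pass, here is a rough sketch. It is not the script I actually ran: the function name `length_table` and the column widths are made up for illustration, and it assumes the two inputs are different, non-empty files with one word per line.

```sh
#!/bin/sh
# length_table FILE1 FILE2
# Tally the lines of each file by length and print the two tallies side by
# side: length, count, and percent of that file's total. A "." marks a
# length with no entries in that file. Empty lines are ignored.
length_table() {
    awk '
        {
            n = length($0)
            if (n == 0) next
            f = (FILENAME == ARGV[1]) ? 1 : 2   # which of the two inputs is this?
            count[f, n]++
            total[f]++
            if (n > max) max = n
        }
        END {
            for (n = 1; n <= max; n++) {
                if (!((1, n) in count) && !((2, n) in count)) continue
                for (f = 1; f <= 2; f++) {
                    if ((f, n) in count)
                        cell[f] = sprintf("%3d %6d %8.4f", n, count[f, n],
                                          100 * count[f, n] / total[f])
                    else
                        cell[f] = "  ."
                }
                printf "%-19s      %s\n", cell[1], cell[2]
            }
        }
    ' "$1" "$2"
}
```

Run as `length_table /usr/share/dict/words NonDNSWords`, it should print rows in the same shape as the table above, minus the header lines.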
Well, the most populous word length in NonDNSWords is 10; here’s a sample:
dramseller floriation tractional clanswoman periphrase
cyrtometer symphytize convolvuli mucigenous clamminess
hyperacute myrtlelike unharbored ergonovine undertided
digressory preclosure parnassism habilatory boycottism
nilometric paralgesic trimacular annelidian breeziness
prelegatee admiringly scatophagy bonebinder morphinism
endosteoma ranivorous undistinct solenodont scathingly
unfreckled unpanelled impalpably unemphatic staverwort
gradientia cystospasm xenocratic cogredient rubescence
neurolytic unrebutted saponacity brachyoura depatriate
OK, I know you want to know the 5 24-letter words, so here they are:
formaldehydesulphoxylate
pathologicopsychological
scientificophilosophical
tetraiodophenolphthalein
thyroparathyroidectomize
..and, yup, antidisestablishmentarianism wasn’t there.
This did make it easy to search for substrings of interest. For example, I wanted to
find all the words with “plex” in them. There were 54:
grep plex NonDNSWords | column -x
amplexation amplexicaudate amplexicaul amplexicauline amplexifoliate
autocomplexes cerviciplex complexedness complexionably complexional
complexionally complexioned complexionless complexively complexly
decemplex diaplexal diaplexus epiplexis euplexoptera
ganglioplexus holoplexia intercomplexity kataplexy myelapoplexy
nulliplex overcomplex overcomplexity perplexable perplexedly
perplexedness perplexingly perplexment phantoplex plexicose
pleximeter pleximetric plexodont plexometer plexure
pseudoapoplexy reperplex retroplexed semiamplexicaul semiduplex
sextuplex simplexed supercomplex triplexity ultracomplex
unimultiplex unperplexed unperplexing veniplex
This is a bit more interesting: holoplexia.com sounds nifty, as does nulliplex.com.
So, I guess you’re wondering which one I took, right? Well, sadly,
none of them. While groveling around, I thought of a two-word critter I kinda
like: ComplexityWorks.com, so hmm..all this was a waste? I think not, but…
Scripts:
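Check a list of words w/ whois:
#!/bin/sh
# Usage: script [pattern] [start] [file]
pat=${1:-"^...*"}                 # grep pattern to match (default: two or more characters)
start=${2:-a}                     # skip ahead to the first word beginning with this
file=${3:-/usr/share/dict/words}
# $(...) rather than backticks: inside backticks the shell strips the
# backslash and expands $p itself, so sed never sees "$p" (print to last line).
words=$(sed -n "/^$start/,\$p" "$file" | grep -i "$pat")
for w in $words ; do
  # Keep only the "No match for" line, trim it to the bare word, lowercase it
  whois "$w.com" | sed -n '/No match for/{s:.*for .::;s:......$::p;}' | tr A-Z a-z
done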
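The sed one-liner in that script is pretty terse, so here is the extraction step pulled out on its own as a sketch. The function name `free_word` is hypothetical, and the reply format is an assumption inferred from the script itself: a line of the form `No match for "WORD.COM".`, which may not be what a whois server sends today.

```sh
#!/bin/sh
# free_word: read whois output on stdin. If it contains a "No match for"
# line, print the bare word in lowercase; otherwise print nothing.
# Assumed input line:   No match for "HOLOPLEXIA.COM".
free_word() {
    sed -n '/No match for/{
        s:.*for .::
        s:......$::p
    }' | tr A-Z a-z
}
```

The first substitution eats everything through `for "`, and the second lops off the six trailing characters `.COM".` and prints what is left. A quick check without touching the network: `printf 'No match for "HOLOPLEXIA.COM".\n' | free_word`.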

For choosing N random samples from a stream:
#!/bin/sh
samples=$1
awk -v samples="$samples" '
BEGIN { srand() }                 # seed from the clock so each run differs
{ a[NR] = $0 }                    # Read in file
END {
  for (len = NR; samples > 0 && len > 0; samples--) {
    i = int(rand() * len) + 1     # random index in 1..len
    print a[i]
    a[i] = a[len]                 # fill the hole with the last live entry
    delete a[len]
    len--
  }
}'

For counting the lines of a stream by length:
#!/bin/sh
awk '
{ a[length]++ }
END {
  for (i in a)
    printf "%2i %10i %10.4f\n", i, a[i], 100*a[i]/NR
}' | sort -n
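The sampling script reads the whole stream into memory before picking anything. That is fine for a words file, but for bigger streams a different technique, reservoir sampling (Algorithm R), keeps only N lines at a time. The sketch below is that swapped-in technique, not the script I used. The name `ran_reservoir` and the optional seed argument are hypothetical.

```sh
#!/bin/sh
# ran_reservoir N [SEED]
# Print N lines chosen uniformly at random from stdin, holding at most N
# lines in memory. If the stream has fewer than N lines, print them all.
# Output keeps reservoir order; it is not shuffled.
ran_reservoir() {
    awk -v k="$1" -v seed="${2:-}" '
        BEGIN { if (seed != "") srand(seed); else srand() }
        NR <= k { r[NR] = $0; next }          # fill the reservoir first
        {
            j = int(rand() * NR) + 1          # uniform in 1..NR
            if (j <= k) r[j] = $0             # keep this line with probability k/NR
        }
        END {
            n = (NR < k) ? NR : k
            for (i = 1; i <= n; i++) print r[i]
        }
    '
}
```

It drops into the earlier pipeline the same way, e.g. `grep ^......$ NonDNSWords | ran_reservoir 100 | column -x`.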
I’m curious: How did you pick *your* domain?!