Word frequency scripts
Table of Contents
- About
- Similarities between the scripts
- Differences in the scripts
- What these scripts don’t do
- Example source file used in the screenshots below
- Bash Yad main interface
- Bash Yad example result alphabetically sorted (case insensitive)
- Bash Yad example result alphabetically sorted (case sensitive)
- Bash Yad example result numerically sorted (case insensitive)
- Bash Yad example result numerically sorted (case sensitive)
- Bash Zenity main interface
- Bash Zenity example result alphabetically sorted (case insensitive)
- Bash Zenity example result alphabetically sorted (case sensitive)
- Bash Zenity example result numerically sorted (case insensitive)
- Bash Zenity example result numerically sorted (case sensitive)
- Python main interface
- Python example result alphabetically sorted (case insensitive)
- Python example result alphabetically sorted (case sensitive)
- Python example result numerically sorted (case insensitive)
- Python example result numerically sorted (case sensitive)
- Get the scripts
- See the Bash Yad script
- See the Bash Zenity script
- See the Python script
- Some extra goodies
- Obligatory Happy Ending
About
Since there aren’t a whole lot of word frequency programs easily found for GNU/Linux, perhaps some scripts would be useful for some of you out there. Below are a couple of Bash scripts and a Python script, each of which can take pretty much any text as input and will display a list of all the words (and numbers, which are treated as words) in the text, and how many times each was used.
The Bash scripts were written by Frank Pirrone and the Python script was written by me, but uses the framework of a Python Tkinter script Frank and I collaborated on a while back as its foundation. The Bash Yad script has been tested on Linux Mint Cinnamon. The Bash Zenity script has been tested on Linux Mint Cinnamon and Linux Mint MATE. The Python script has been tested on Linux Mint Cinnamon and Linux Mint MATE.
Please excuse the purple title bars and the magenta scrollbar in some of the screenshots. Your system’s title bar and scroll bar colors will be used for those when you run the scripts.
Similarities between the scripts
- All three scripts launch a GUI and let you browse to a file or paste the input in.
- All three scripts are fully commented to let you know what each line of code does.
- All three scripts offer the option to save a copy of the result to a file in the location you specify.
- All three scripts allow you to choose whether to display case sensitive or case insensitive results.
- All three scripts display a word count in addition to a word frequency count.
- All three scripts offer the option to sort the results numerically.
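The two sort orders mentioned above can be sketched in a few lines of Python 3. This is only an illustration and not code from any of the three scripts: the function names are made up for the example, and breaking ties alphabetically in the numeric sort is an assumption, not something the scripts promise.

```python
from collections import Counter

def sort_alphabetically(counts):
    """Return (word, count) pairs ordered by word."""
    return sorted(counts.items())

def sort_numerically(counts):
    """Return (word, count) pairs ordered by count, highest first.

    Ties are broken alphabetically here; that tie-break is an assumption
    of this sketch, not documented behavior of the scripts.
    """
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))

counts = Counter("the cat and the hat and the bat".split())

# Alphabetic listing: and, bat, cat, hat, the
for word, count in sort_alphabetically(counts):
    print(count, word, sep="\t")
print()
# Numeric listing: the (3), and (2), then the words used once
for word, count in sort_numerically(counts):
    print(count, word, sep="\t")
```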
Differences in the scripts
- The Bash Yad script uses Yad, which you can get from the https://sourceforge.net/projects/yad-dialog/ page. The https://launchpad.net/~webupd8team/+archive/ubuntu/y-ppa-manager/+packages page has the Yad PPA, and you can find instructions for installing the PPA on the https://www.webupd8.org/2010/12/yad-zenity-on-steroids-display.html page if you don’t have it installed already.
- The Python script offers a help button that displays a description and some guidance on how to use the GUI.
- The Python script uses Python 2 and you’ll need to install the python-tk package if you don’t have it installed already.
What these scripts don’t do
These scripts are intended for use as light-weight word frequency utilities whose focus is on words and numbers. Although they handle telephone numbers, times, URLs, IPs, email addresses, and American dollar amounts, they were not intended to elegantly handle code or text with a lot of non-alphanumeric characters. As a result, when using that sort of input, you may see some not so pretty results. They may also fail to elegantly handle some common types of text that we hadn’t thought to test them with. My apologies in advance if that occurs.
Example source file used in the screenshots below
An example of how these scripts work can be seen by creating a file named pie.txt with these contents:
I love apple pie. I love banana pie. I love coconut pie.
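If you'd like to check the counts for pie.txt without running any of the GUIs, here is a minimal Python 3 sketch. It is not one of the three scripts: `word_frequency` is a hypothetical helper, and it only trims a few punctuation marks from the ends of each word, which is much simpler than what the scripts do.

```python
from collections import Counter

SAMPLE = "I love apple pie. I love banana pie. I love coconut pie."

def word_frequency(text, case_sensitive=True):
    """Count whitespace-separated words after trimming simple punctuation."""
    if not case_sensitive:
        text = text.lower()
    # Trim a handful of punctuation marks from both ends of every word.
    words = [word.strip(".,:;!?") for word in text.split()]
    # Skip anything that was pure punctuation and is now empty.
    return Counter(word for word in words if word)

counts = word_frequency(SAMPLE, case_sensitive=False)
for word in sorted(counts):
    print(counts[word], word, sep="\t")
print("Total words:", sum(counts.values()))
```

For the sample text this gives 12 words in total: "i", "love", and "pie" three times each, and "apple", "banana", and "coconut" once each. With `case_sensitive=True` the only difference is that "I" stays upper case.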
Bash Yad main interface

Bash Yad example result alphabetically sorted (case insensitive)
If you choose File or Paste to insert text into the text box, choose Alphabetic, choose Insensitive, and choose Save or Don’t save, you will get this result:

Bash Yad example result alphabetically sorted (case sensitive)
If you choose File or Paste to insert text into the text box, choose Alphabetic, choose Sensitive, and choose Save or Don’t save, you will get this result:

Bash Yad example result numerically sorted (case insensitive)
If you choose File or Paste to insert text into the text box, choose Numeric, choose Insensitive, and choose Save or Don’t save, you will get this result:

Bash Yad example result numerically sorted (case sensitive)
If you choose File or Paste to insert text into the text box, choose Numeric, choose Sensitive, and choose Save or Don’t save, you will get this result:

Bash Zenity main interface

Bash Zenity example result alphabetically sorted (case insensitive)
If you choose File or Paste to insert text into the text box, choose Alphabetic, choose Insensitive, and choose Save or Don’t save, you will get this result:

Bash Zenity example result alphabetically sorted (case sensitive)
If you choose File or Paste to insert text into the text box, choose Alphabetic, choose Sensitive, and choose Save or Don’t save, you will get this result:

Bash Zenity example result numerically sorted (case insensitive)
If you choose File or Paste to insert text into the text box, choose Numeric, choose Insensitive, and choose Save or Don’t save, you will get this result:

Bash Zenity example result numerically sorted (case sensitive)
If you choose File or Paste to insert text into the text box, choose Numeric, choose Sensitive, and choose Save or Don’t save, you will get this result:

Python main interface

Python example result alphabetically sorted (case insensitive)
If you insert text into the text box, process the text, say yes to Insensitive, and say no to Numeric, you will get this result:

Python example result alphabetically sorted (case sensitive)
If you insert text into the text box, process the text, say no to Insensitive, and say no to Numeric, you will get this result:

Python example result numerically sorted (case insensitive)
If you insert text into the text box, process the text, say yes to Insensitive, and say yes to Numeric, you will get this result:

Python example result numerically sorted (case sensitive)
If you insert text into the text box, process the text, say no to Insensitive, and say yes to Numeric, you will get this result:

Get the scripts
The Python script has now been updated and is at version 2, which contains an Export button that saves the result to a file in the current directory and a Help button that displays some basic help for running the GUI.
- Download the Bash YAD script here: https://app.box.com/s/xn1gwd6m0fcz6idiwjyyp4onkz267gbo
- Download the Bash Zenity script here: https://app.box.com/s/zvf55u8tsqe7hbt5k8nwc338zz8b3xhx
- Download the Python script here: https://app.box.com/s/ibwuicar4vnphu8vdymfw2ka97qpckq6
See the Bash Yad script
#!/bin/bash
# Make program settings choices using YAD and format the output for capture:
for i in `yad --list --print-all --title "Word Frequency Application Settings"\
--text "<b>Click</b> to select words processing item from each first and second row pairs:\n"\
--column=col1 --column=1:rd --column=col2 --column=2:rd\
--column=col3 --column=3:rd --column=col4 --column=4:rd\
"File Text" true "Sensitive Case" true "Alphabetic Sort" true "Save File" false\
"Paste Text" false "Insensitive Case" false "Numeric Sort" false "No Save" true\
--no-headers --height=150 --width=450\
| sed 's/|FALSE|/\n/g' | sed 's/|TRUE|/-\n/g' | grep - | cut -d' ' -f1 | tr '\n' ' '`
# Capture formatted output of program setting choices by setting variables with Case:
do
case "$i" in
"File") file=TRUE;;
"Sensitive") sensitive=TRUE;;
"Alphabetic") alphabetic=TRUE;;
"Save") save=TRUE;;
*)
esac
done
# Set file or paste text as input source and process paste output with Printf:
if [ $file ]; then
file=$(yad --file-selection --title "File Source Text" --height=400 --width=600)
text=$(cat "$file")
else
text=$(yad --form --text "<b>Paste</b> text into box:" --field "":txt --title "Paste Source Text" \
--height=400 --width=400)
text=$(printf %b "$text")
fi
# Set sensitive or insensitive case of words using Translate:
if [ -z $sensitive ]; then
text=$(echo "$text" | tr "[:upper:]" "[:lower:]")
fi
# Create text processing function with common text and I/O utilities:
# Note, each command is followed by a pipe to send its output to the next command:
process(){
# Use echo to output contents of text to command stream
echo "$text" |
# Use tr to preserve desired alphanum and punct characters:
tr -cd [[:alnum:]"\n@\'-/' ',:\$"] |
# Use tr to remove specific extra punctuation not handled above:
tr -d [\(\)+*] |
# Use sed to replace spaces with newlines using global:
sed "s/[[:space:]]/\n/g" |
# Use sed to remove trailing commas colons and periods:
sed "s/[,:.]*$//" |
# Use sed to remove any remaining blank lines:
sed "/^$/d" |
# Use sed to remove lines containing just valid punctuation:
sed "/^[@,./\$\'-]*$/d" |
# Use sort to alphabetize word list in natural dictionary order:
sort |
# Use uniq to generate word counts:
uniq -c |
# Use awk to print tab separated count and word columns:
awk '{print $1" \t"$2}'
}
# Store alphabetic or numeric output of text processing function in results variable:
if [ $alphabetic ]; then
results=$(process)
else
results=$(process | sort -nr)
fi
# Cut word frequencies column and get word total and fill display variable:
total=$(echo "$results" | cut -d" " -f1 | paste -sd+ | bc)
spacer="---------------"
display="Word Count:\n$spacer\n$total\n$spacer\nWord Frequency\n$spacer\n$results"
# Display alphabetic or numeric results and save file if option chosen:
if [ $alphabetic ]; then
printf "$display" | yad --text-info --title "Word Counts Sorted Alphabetically" \
--height=400 --width=435
else
printf "$display" | yad --text-info --title "Word Counts Sorted Numerically" \
--height=400 --width=435
fi
if [ $save ]; then
outfile=$(yad --file-selection --save --confirm-overwrite="File Exists, Overwrite?" \
--file-filter="*.txt" --title "Save Results File" --height=400 --width=500)
# Strip a trailing .txt (if any) so the extension is not doubled when it is added back below:
outfile="${outfile%.txt}"
printf "$display" > "$outfile.txt"
fi
See the Bash Zenity script
#!/bin/bash
# Originally by Frank Pirrone on 04/22/2016 with very slight contributions by Little Girl.
# This script launches a GUI that lets you provide text for it to process, either by
# importing a file or pasting in its contents so they can be processed for display of
# all of the words and how frequently each was used, and the option to save the results
# to your hard drive in a text file.
# Choose File or Paste as the source of words to be counted:
choice=$(zenity --list --radiolist --title "Choose Text Source" --text "" --hide-header \
--column "choice" --column "Item" FALSE "File" FALSE "Paste");
# If File was chosen:
if [ "$choice" == "File" ];then
# Load variable with the selected file:
file=$(zenity --file-selection --title "File Source Text" );
# Load variable with the file read in:
text=$(cat "$file")
# Else if Paste was chosen:
else
# Load variable with text pasted in:
text=$(zenity --text-info --editable "Paste text into box:" --title "Paste Source Text" \
--height=400 --width=435)
text=$(printf %b "$text")
fi
# Choose whether alphabetic or numeric sort:
sort=$(zenity --list --radiolist --title "Choose sort type" --text "" --hide-header \
--column "Choice" --column "Item" FALSE "Alphabetic" FALSE "Numeric");
# If alphabetic was chosen then set the variable:
if [ "$sort" == "Alphabetic" ];then
# Load variable with boolean:
alphabetic=TRUE
fi
# Choose whether case insensitive or case sensitive:
case=$(zenity --list --radiolist --title "Choose case type" --text "" --hide-header \
--column "Choice" --column "Item" FALSE "Insensitive" FALSE "Sensitive");
# If Insensitive was chosen, else Sensitive:
if [ "$case" == "Insensitive" ];then
# Load text variable with text piped to the tr command to convert case:
text=$(echo "$text" | tr "[:upper:]" "[:lower:]");
else
# Load text variable with text unconverted:
text=$(echo "$text");
fi
# Choose whether or not to save file:
saveit=$(zenity --list --radiolist --title "Choose save file" --text "" --hide-header \
--column "Choice" --column "Item" FALSE "Save" FALSE "Don't Save");
# If Save was chosen then set the variable:
if [ "$saveit" == "Save" ];then
# Load variable with boolean:
save=TRUE
fi
# pasted from here to the end with the code of the YAD version:
# Create text processing function with common text and I/O utilities:
# Note, each command is followed by a pipe to send its output to the next command:
process(){
# Use echo to output contents of text to command stream
echo "$text" |
# Use tr to preserve desired alphanum and punct characters:
tr -cd [[:alnum:]"\n@\'-/' ',:\$"] |
# Use tr to remove specific extra punctuation not handled above:
tr -d [\(\)+*] |
# Use sed to replace spaces with newlines using global:
sed "s/[[:space:]]/\n/g" |
# Use sed to remove trailing commas colons and periods:
sed "s/[,:.]*$//" |
# Use sed to remove any remaining blank lines:
sed "/^$/d" |
# Use sed to remove lines containing just valid punctuation:
sed "/^[@,./\$\'-]*$/d" |
# Use sort to alphabetize word list in natural dictionary order:
sort |
# Use uniq to generate word counts:
uniq -c |
# Use awk to print tab separated count and word columns:
awk '{print $1" \t"$2}'
}
# Store alphabetic or numeric output of text processing function in results variable:
if [ $alphabetic ]; then
results=$(process)
else
results=$(process | sort -nr)
fi
# Cut word frequencies column and get word total and fill display variable:
total=$(echo "$results" | cut -d" " -f1 | paste -sd+ | bc)
spacer="---------------"
display="Word Count:\n$spacer\n$total\n$spacer\nWord Frequency\n$spacer\n$results"
# Display alphabetic or numeric results and save file if option chosen:
if [ $alphabetic ]; then
printf "$display" | zenity --text-info --title "Word Counts Sorted Alphabetically" \
--height=400 --width=435
else
printf "$display" | zenity --text-info --title "Word Counts Sorted Numerically" \
--height=400 --width=435
fi
if [ $save ]; then
outfile=$(zenity --file-selection --save --confirm-overwrite="File Exists, Overwrite?" \
--file-filter="*.txt" --title "Save Results File" --height=400 --width=500)
# Strip a trailing .txt (if any) so the extension is not doubled when it is added back below:
outfile="${outfile%.txt}"
printf "$display" > "$outfile.txt"
fi
See the Python script
#!/usr/bin/env python
# This script was written by Little Girl, is based on the foundation of a script that she and Frank Pirrone collaborated on a while back, got a bit of code added to it from some nice guys in IRC, and Frank contributed the finishing touches to it with the case insensitive/case sensitive and alphabetic/numeric sorting.
import ScrolledText as st
import string
import Tkinter as tk
from Tkinter import *
import tkFileDialog, tkMessageBox
from collections import Counter
helptext = """
This script launches a GUI that processes your text for display
of all of the words and how frequently each was used.
The Import button can be used to import the contents of a
file into the text box, or you can type or paste text into the
text box.
The Process button offers case insensitive or case sensitive
processing and numeric or alphabetic sort and displays the
word frequency of the words in the text box.
The Export button exports the results to a text file you
specify the name of in the current directory, creating it if
it doesn't exist, and overwriting its contents if it does.
The Reset button clears the text box.
The Help button displays this help text.
"""
###############
# ROOT WINDOW #
###############
# Define the window:
root = tk.Tk()
# Give the window a title:
root.title("Word-Frequency-O-Matic")
# Define the window size and center function:
def size_and_center_window(target, w=0, h=0):
    # Define the variable to hold the screen width:
    ws = target.winfo_screenwidth()
    # Define the variable to hold the screen height:
    hs = target.winfo_screenheight()
    # Define the variable to hold the horizontal offset that centers the window:
    x = (ws/2) - (w/2)
    # Define the variable to hold the vertical offset that centers the window:
    y = (hs/2) - (h/2)
    # Use the combined measurements to size and center the window:
    target.geometry('%dx%d+%d+%d' % (w, h, x, y))
# Run the window size and center function to create the root window at the specified size:
size_and_center_window(root, 500, 440)
#####################
# ROOT WINDOW FRAME #
#####################
# Create and configure a frame in the window:
frame = tk.Frame(root)
# Color the frame's background:
frame.configure(bg = "lightgreen")
# Position the frame and determine how much space it takes up:
frame.pack(fill='both', expand=True)
###########################
# ROOT WINDOW FRAME LABEL #
###########################
# Create a label in the frame:
label = Label(frame)
# Define the label's text and color its background:
label.configure(text = "Import text from a file or paste text into the text box.", bg = "lightgreen")
# Position the label and determine how much white space surrounds it:
label.pack(side=BOTTOM, pady=2)
########################
# ROOT WINDOW TEXT BOX #
########################
# Create a scrolled text box in the frame:
textbox = st.ScrolledText(master = frame, bg = "lightyellow")
# Give the text box focus:
textbox.focus()
# Position the text box and determine how much white space surrounds it:
textbox.pack(fill='both', expand=True, padx=8, pady=8)
#####################################
# ROOT WINDOW TEXT BOX CONTEXT MENU #
#####################################
# Create the context menu:
menu = Menu(textbox, tearoff=0, cursor="top_left_arrow", activebackground="lavender", background="lightgreen")
# Define the context menu open function:
def open_context_menu(event):
    menu.post(event.x_root, event.y_root)
# Define the context menu close function:
def close_context_menu(event):
    menu.unpost()
# Define the context menu select all function:
def selectall():
    textbox.tag_add("sel","1.0","end")
# Add the Select all entry to the context menu:
menu.add_command(label="Select all", command=lambda: selectall())
# Add the Cut entry to the context menu:
menu.add_command(label="Cut", command=lambda: textbox.event_generate("<<Cut>>"))
# Add the Copy entry to the context menu:
menu.add_command(label="Copy", command=lambda: textbox.event_generate("<<Copy>>"))
# Add the Paste entry to the context menu:
menu.add_command(label="Paste", command=lambda: textbox.event_generate("<<Paste>>"))
# Bind the text box widget and context menu together so the menu opens on right click:
textbox.bind("<Button-3>", lambda event: open_context_menu(event))
# Bind the text box widget and context menu together so the menu closes on left click:
textbox.bind("<Button-1>", lambda event: close_context_menu(event))
###############
# FILE IMPORT #
###############
# Define the file import function:
def readfile():
    # Open a browsing interface to open a file in read only mode:
    file = tkFileDialog.askopenfile(parent=root,mode='rb',title='Choose a file')
    # Determine what to do if a file is opened:
    if file is not None:
        # Put the contents of the file into the data variable:
        data = file.read()
        # Insert the data into the text box:
        textbox.insert(INSERT, data)
        # Close the file:
        file.close()
#########################
# PROCESS TEXT FUNCTION #
#########################
# Define the text processing function:
def process():
    # Read data from text box into the mydata variable:
    mydata = textbox.get(1.0, END)
    # Ask if case insensitive sort and if True convert to lower:
    case = tkMessageBox.askyesno("Sort Option", "Case insensitive?")
    if case == True: mydata = mydata.lower()
    # Ask if numerical sort and hold result for text box display:
    num = tkMessageBox.askyesno("Sort Option", "Sort numerically?")
    # Split string into a list at whitespace:
    mylist = mydata.split()
    # Remove punctuation from the beginning of each list item (a leading $ is kept for dollar amounts):
    mylist = [x.lstrip('`~!@#%^&*()-_=+[{]}\\|;:",? ') for x in mylist]
    # Remove punctuation from the end of each list item:
    mylist = [x.rstrip('`~!@#$%^&*()-_=+[{]}\\|;:",? ') for x in mylist]
    # Remove pure punctuation items:
    mycleanlist = filter (lambda string:any([x.isalnum() for x in string]), mylist)
    # Remove duplicates by converting the list to a set:
    myset=set(mycleanlist)
    # Define the function used to sort the list (thanks to 2 nice guys in IRC):
    def inverse_case_key(s):
        # Define the result variable as a list:
        result = []
        # For x in each Unicode character:
        for x in s:
            # If the character is lower case:
            if x.islower():
                # Add it as an upper case character to the result:
                result.append(x.upper())
            # If the character is upper case:
            elif x.isupper():
                # Add it as a lower case character to the result:
                result.append(x.lower())
            # Otherwise:
            else:
                # Add it as is to the result:
                result.append(x)
        # Return the lower case word as the main sort key and the case-swapped characters as the tie-breaker:
        return (s.lower(), result)
    # Do an inverse case sort of the list:
    mylist = sorted(set(myset), key=inverse_case_key)
    # Clear the text box display:
    textbox.delete(1.0, END)
    # Determine what to do for each word in the list:
    if num == False:
        # Display results alphabetically:
        for word in mylist:
            # Insert the number of occurrences followed by a tab followed by the word into the text box:
            textbox.insert(INSERT, mycleanlist.count(word), INSERT, '\t', INSERT, word, INSERT, '\n')
        # Insert the total words above the counts and sorted words:
        textbox.insert(1.0, 'Total Words\n------------\n', INSERT, sum(Counter(mycleanlist).values()), INSERT, '\n------------\n' )
    else:
        # Display results numerically:
        for value, count in Counter(mycleanlist).most_common():
            # Insert the number of occurrences followed by a tab followed by the word into the text box:
            textbox.insert(INSERT, count, INSERT, '\t', INSERT, value, INSERT, '\n')
        # Insert the total words above the counts and sorted words:
        textbox.insert(1.0, 'Total Words\n------------\n', INSERT, sum(Counter(mycleanlist).values()), INSERT, '\n------------\n' )
######################
# SAVE FILE FUNCTION #
######################
def savefile():
    # Open a save dialog and load the myfile variable with the chosen text file in write mode:
    myfile = tkFileDialog.asksaveasfile(mode='w', filetypes=[("text files", "*.txt")])
    # If the Cancel button was pressed or the window was closed:
    if myfile is None:
        # Exit the savefile function:
        return
    # Load the result variable with the contents of the text box:
    result=textbox.get(1.0, END)
    # Write the result variable contents to myfile:
    myfile.write(result)
    # Close myfile:
    myfile.close()
#################
# HELP FUNCTION #
#################
# Define the help window function:
def help():
    # Define the top window as a TopLevel window:
    top = Toplevel()
    # Define the title for the help window:
    top.title("HELP")
    # Run the function to create the centered help window at the specified size:
    size_and_center_window(top, 400, 340)
    # Create the label for the help window, filling it with the helptext variable contents:
    label = Label(top, anchor=CENTER, justify=LEFT, text=helptext, wraplength=375)
    # Define the background color for the label:
    label.configure(bg = "orchid1")
    # Set the label into place and get it to fill the entire top window:
    label.pack(fill='both', expand=True)
    # Run the top window:
    top.mainloop()
########################
# WINDOW FRAME BUTTONS #
########################
# Create an Import button in the frame:
Import = tk.Button(frame, text='Import', command = lambda: readfile())
# Define the Import button's background color, border width, and hide the highlight around it:
Import.configure(bg = "lightblue", bd = 5, highlightthickness = 0)
# Position the Import button and determine how much white space surrounds it on either side:
Import.pack(side=LEFT, fill=BOTH, expand=True, padx=(15,10))
# Create a Process button in the frame:
Process = tk.Button(frame, text='Process', command = lambda: process())
# Define the Process button's background color, border width, and hide the highlight around it:
Process.configure(bg = "lightblue", bd = 5, highlightthickness = 0)
# Position the Process button and determine how much white space surrounds it on either side:
Process.pack(side=LEFT, fill=BOTH, expand=True, padx=(10,10))
# Create an Export button in the frame:
Export = tk.Button(frame, text='Export', command = lambda: savefile())
# Define the Export button's background color, border width, and hide the highlight around it:
Export.configure(bg = "lightblue", bd = 5, highlightthickness = 0)
# Position the Export button and determine how much white space surrounds it on either side:
Export.pack(side=LEFT, fill=BOTH, expand=True, padx=(10,10))
# Create a Reset button in the frame:
Reset = tk.Button(frame, text='Reset', command = lambda: textbox.delete(1.0,END))
# Define the Reset button's background color, border width, and hide the highlight around it:
Reset.configure(bg = "lightblue", bd = 5, highlightthickness = 0)
# Position the Reset button and determine how much white space surrounds it on either side:
Reset.pack(side=LEFT, fill=BOTH, expand=True, padx=(10,10))
# Create a Help button in the frame:
Help = tk.Button(frame, text='Help', command = help)
# Define the Help button's background color, border width, and hide the highlight around it:
Help.configure(bg = "lightblue", bd = 5, highlightthickness = 0)
# Position the Help button and determine how much white space surrounds it on either side:
Help.pack(side=LEFT, fill=BOTH, expand=True, padx=(10,15))
#############
# LAUNCH IT #
#############
# Run the whole script:
root.mainloop()
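The `inverse_case_key` function in the Python script is what makes the alphabetic, case sensitive listing put "apple" right next to "Apple" instead of listing every capitalized word first. Here is a small Python 3 sketch of the same idea. It is an illustration only: it uses `str.swapcase()` as a compact stand-in for the character-by-character loop in the script, which I'm assuming is equivalent for plain ASCII text.

```python
def inverse_case_key(word):
    # Primary key: the word with case ignored, so "Apple" and "apple" group together.
    # Secondary key: the word with its case swapped, so the lower case spelling
    # sorts ahead of the capitalized one within each group.
    return (word.lower(), word.swapcase())

words = ["pie", "Apple", "apple", "Pie", "banana"]

# Plain sort: every capitalized word comes before every lower case word.
print(sorted(words))
# Inverse case sort: words are grouped, lower case spelling first.
print(sorted(words, key=inverse_case_key))
```

The plain sort gives Apple, Pie, apple, banana, pie, while the inverse case sort gives apple, Apple, banana, pie, Pie.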
Some extra goodies
Here’s the algorithm from the Bash scripts above in the form of a “one-liner”. Paste it into a terminal window inside any directory that contains text files (files ending in the .txt extension) and it will display the word frequency of all the words in all the text files in that directory:
cat *.txt | tr -cd [[:alnum:]"\n@\'-/' ',\$"] | tr -d [\(\)+*] | sed "s/[[:space:]]/\n/g" | sed "s/\.$//g" | sed "s/[,:.]*$//" | sed "/^$/d" | sed "/^[@,./\$\'-]*$/d" | sort | uniq -c | awk '{print $1" \t"$2}'
Here’s the same algorithm with one space removed before the \t near the end in order to compact the tab between the word frequency count and the words:
cat *.txt | tr -cd [[:alnum:]"\n@\'-/' ',\$"] | tr -d [\(\)+*] | sed "s/[[:space:]]/\n/g" | sed "s/\.$//g" | sed "s/[,:.]*$//" | sed "/^$/d" | sed "/^[@,./\$\'-]*$/d" | sort | uniq -c | awk '{print $1"\t"$2}'
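If the pipeline is hard to read, here is my rough reading of its stages as a Python 3 sketch. Treat it as an approximation and not a drop-in replacement: `pipeline_counts` is a hypothetical name, the kept character set follows the `process` function in the scripts (which also keeps colons), letters are assumed to be ASCII only, and Python's sort order may differ from what `sort` gives you in your locale.

```python
import re
from collections import Counter

# Stage 1 (the two tr commands): delete every character except letters,
# digits, newlines, spaces and the punctuation @ ' , - . / : $
NOT_KEPT = re.compile(r"[^A-Za-z0-9\n @',\-./:$]")
# Stage 3 (sed): trailing commas, colons and periods are removed from each word.
TRAILING = re.compile(r"[,:.]+$")
# Stage 5 (sed): a word made of nothing but the kept punctuation is dropped.
ONLY_PUNCT = re.compile(r"^[@,./$'\\-]+$")

def pipeline_counts(text):
    """Return sorted (word, count) pairs, roughly as sort | uniq -c would."""
    text = NOT_KEPT.sub("", text)
    words = []
    # Stage 2 (sed): one word per line, which is a whitespace split here.
    for token in text.split():
        token = TRAILING.sub("", token)
        # Stage 4 (sed): blank lines are removed.
        if token and not ONLY_PUNCT.match(token):
            words.append(token)
    # Stages 6 and 7 (sort, uniq -c): count each word and list alphabetically.
    return sorted(Counter(words).items())

for word, count in pipeline_counts("Mail me@example.com, cost: $5.00 (approx.) -- ok!"):
    print(count, word, sep="\t")
```

Note that the email address and the dollar amount survive intact, while the parentheses, the exclamation mark, and the stray "--" disappear.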
You can get a detailed explanation of the algorithm written by Frank Pirrone here: https://app.box.com/s/0nr5l0eqczlhdog0srs0t56bt3kpes99
Last, but far from least, you can get a Word Frequency Diatribe written by Frank Pirrone about learning from example with the Bash scripts above here: https://app.box.com/s/6ic13hwaahifhxftks7s15rpncp17rwi
Obligatory Happy Ending
And they all lived happily ever after. The end.
