betterlivingthroughpython


I guess I'm learning, a python script is easier than using formulas in a spreadsheet!

November 14, 2011

This past Friday I analyzed our voting data from the Kickstarter naming poll. I used to work in Excel a ton, but when I started looking at the results in the Google Docs spreadsheet I realized that I could get the information I needed just as easily, if not more so, by writing a small Python script. So I copied and pasted the three things I needed into text (.txt) files and got to it.

There were five main things I needed to accomplish before I could analyze my data. First, I needed to create text files for my data, three in all. The first was a list of all the acceptable ‘unique-identifiers’. These ‘unique-identifiers’ were a tool we used to verify that those who voted were eligible to do so. They were entered into the form automatically once the ‘voter’ followed the link provided. Robey had created a list of these identifiers beforehand and sent them out to all participating voters. The next text file was a list of all the ‘unique-identifiers’ that our Google Docs form received. I simply copy-pasted them into a file and saved it as text. The final text file was a list of all of the names voted for. Each ‘voter’ was allowed to select more than one name, so although I could copy and paste the column, afterwards I had to go back through and set it up so that there was only one name per line and each line had no spaces at the beginning or end (so that none of the names were miscounted).
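The one-name-per-line cleanup could also be scripted rather than done by hand. Here is a minimal sketch, not what I actually ran: it assumes multiple selections in a row are separated by commas (I haven't confirmed that against the spreadsheet), the sample rows are made up, and it is written for Python 3, unlike the Python 2 snippets in the rest of this post. The name `split_votes` is just for illustration.

```python
def split_votes(raw_lines, separator=","):
    """Turn raw poll rows into a flat list with one cleaned name per entry."""
    names = []
    for row in raw_lines:
        # Assumption: several selections in one row are joined by `separator`.
        for name in row.split(separator):
            cleaned = name.strip()  # drop spaces at the beginning and end
            if cleaned:  # skip blanks left by empty rows or trailing separators
                names.append(cleaned)
    return names


# Made-up rows standing in for the pasted spreadsheet column.
sample_rows = ["  Name A, Name B ", "Name B", "", "Name C,  Name A"]
one_per_line = split_votes(sample_rows)
print("\n".join(one_per_line))
```

Writing `one_per_line` back out with one entry per line would give a file in the same shape as the hand-cleaned one.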

Once I had all of these text files complete, I started writing my Python script. Second, I stored the names of the text files in variables in my code.

tokens_FILENAME = "tokens.txt"
usedtokens_FILENAME = "usedtokens.txt"
kickstarterpoll_FILENAME = "kickstarterpoll.txt"

Next, I borrowed a little function from my MIT Programming course to read each file and turn it into usable data. Each text file was read the same way. As an example, here is one of my functions:

def load_tokens():
   print "Loading token list from file..."
   # inFile: file
   inFile = open(tokens_FILENAME, 'r', 0)
   # wordlist: list of strings
   wordlist = []
   for line in inFile:
       wordlist.append(line.strip().lower())
   print " ", len(wordlist), "words loaded."
   return wordlist

The function returns a list of valid tokens. Each token in the list is a lowercased string. These lists helped me analyze my data. Before I could use them, however, I had to call the functions to get their results. So I created new variables that simply hold the results of the functions.

tokenlist = load_tokens()
usedtokenlist = load_usedtokens()
pollresults = load_kickstarterpoll()
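Since each file is read the same way, the three loaders could probably be folded into one function that takes the file name as a parameter. The sketch below is one way that might look, not the course's function: `load_lines` is a hypothetical name, it is written for Python 3 (so `print` is a function), and it uses a `with` block so the file is closed afterwards. The throwaway demo file is only there so the example runs on its own.

```python
import os
import tempfile


def load_lines(filename):
    """Return the file's lines as a list of stripped, lowercased strings."""
    with open(filename, "r") as in_file:
        return [line.strip().lower() for line in in_file]


# Demo with a throwaway file holding made-up tokens.
demo_path = os.path.join(tempfile.mkdtemp(), "tokens.txt")
with open(demo_path, "w") as out_file:
    out_file.write("ABC123\n  def456  \n")

demo_tokens = load_lines(demo_path)
print(len(demo_tokens), "lines loaded.")
```

With something like this, the three assignments above would become calls such as `load_lines(tokens_FILENAME)`, one per file.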

Two tasks remained. The first was to verify that all ‘unique identifiers’ (in my code I call them tokens) submitted in the Google Docs form were valid ones that Robey had sent out to the ‘voters’. The last and most important task was to count the names submitted and find the winning one.
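To make those two tasks concrete, here is a rough sketch of how they could be approached. These are not my actual functions, and the tokens and names are made up. It assumes Python 3 and uses a set for the validity check and `collections.Counter` for the tally; `find_invalid_tokens` and `count_votes` are hypothetical names.

```python
from collections import Counter


def find_invalid_tokens(valid_tokens, used_tokens):
    """Return the submitted tokens that are not on the list of valid ones."""
    valid = set(valid_tokens)  # set membership checks are fast
    return [token for token in used_tokens if token not in valid]


def count_votes(names):
    """Return (name, votes) pairs, most-voted first."""
    return Counter(names).most_common()


# Made-up data standing in for the three loaded lists.
valid = ["aaa111", "bbb222", "ccc333"]
used = ["aaa111", "zzz999", "ccc333"]
votes = ["name a", "name b", "name a", "name c"]

invalid = find_invalid_tokens(valid, used)
tally = count_votes(votes)
print("Invalid tokens:", invalid)
print("Top name:", tally[0])
```

An empty `invalid` list would mean every submitted token was one that had been sent out.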

Up Next Time: My final two functions and the winning name from our Kickstarter poll.
