Word Prediction Project

Due Sunday, May 1

For this project you may work with a partner, or you may work alone. Either way you are responsible for getting the project finished and in on time. If you choose to work with a partner, make sure both of your names are on the lab.

For this project, you are creating a word prediction system that predicts the most common words with the prefix (i.e., the first letter or letters) that you have “typed” in (we’re clicking rather than typing because we’re using turtle). Most of your phones have word prediction: as you type a word, a predicted word pops up, and selecting it saves typing time. A person with poor muscle control may use a communication device, entering a word character by character so that it can be spoken with a synthetic voice. This can be a slow, tedious process both for the person entering the word and for the communication partner who is waiting for an answer. Using word prediction to reduce the amount of typing can speed the process up considerably. In this project, we reduce the typing by showing the 5 most common words with the same prefix as what has been typed so far, in order of their frequency (with more than 5, the list gets cumbersome to read through).

So for this project, you will create a board using turtle. On the board there will be a keyboard (similar to the one you type on), a space where the letters you select will show up, a space where the words you’ve selected will show up (for the sentence you are creating), and an area in which the predicted words will show up (see picture, below).

The keyboard will consist of letters, numbers, and spaces. We are not going to worry about punctuation for now. As you click on each letter, it will appear above the keyboard, and an ordered list of the 5 most commonly occurring words that start with the letters you’ve typed so far will appear below the keyboard. You can choose to continue selecting letters, or you can choose among the list of predicted words.
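
Working out which key was clicked comes down to a little arithmetic on the click coordinates. The sketch below is one possible approach, not the required one. It borrows the names keylist, squaresize, boardleft, and boardtop from the outline at the end of this handout; the function name key_at is hypothetical, and the code assumes the keys are drawn as 4 rows of 10 squares, left to right and top to bottom, starting at (boardleft, boardtop).

```python
# Assumed layout: 4 rows of 10 square keys, with the top-left corner of the
# first key at (boardleft, boardtop). key_at is a hypothetical helper name.
keylist = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0',
           'q', 'w', 'e', 'r', 't', 'y', 'u', 'i', 'o', 'p',
           'a', 's', 'd', 'f', 'g', 'h', 'j', 'k', 'l', ' ',
           'z', 'x', 'c', 'v', 'b', 'n', 'm', ',', '.', 'clr']
squaresize = 40
boardleft = 0 - squaresize * 5
boardtop = squaresize * 2
keysperrow = 10


def key_at(x, y):
    """Return the key under the click at (x, y), or None if the click missed."""
    col = int((x - boardleft) // squaresize)   # columns count rightward from the left edge
    row = int((boardtop - y) // squaresize)    # rows count downward from the top edge
    rows = len(keylist) // keysperrow
    if 0 <= col < keysperrow and 0 <= row < rows:
        return keylist[row * keysperrow + col]
    return None


print(key_at(-195, 75))     # a click just inside the top-left key: 1
print(key_at(-60.0, 30.0))  # second row, fourth column: r
print(key_at(300, 0))       # to the right of the keyboard: None
```

Turtle reports click positions as floats, which is why the row and column are converted with int() before being used as list indices.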

Each time a letter is selected, the old set of predicted words is cleared (I filled the area with white) and a new set of predicted words is generated. Only the top 5 words are ever displayed. If there are fewer than 5 matching words, then only that many predicted words are displayed. If the space key is selected (next to the ‘l’), the keys typed in are assumed to make a word, and that word is added to the sentence area and the key area is cleared.
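
As a rough illustration of the "top 5" rule, here is a minimal sketch that picks the most frequent matches from a list of words and a matching list of counts. The function name top_predictions and its limit parameter are made up for this example; it assumes the words are already in alphabetical order, and it is not the makesorted function from the outline.

```python
def top_predictions(prefix, words, counts, limit=5):
    """Return up to `limit` (word, count) pairs whose word starts with
    `prefix`, most frequent first."""
    matches = [(w, c) for w, c in zip(words, counts) if w.startswith(prefix)]
    # Python's sort is stable, even with reverse=True, so words with equal
    # counts stay in their original (alphabetical) order.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:limit]


# Words and counts for the 'pi' example used later in this handout.
words = ['picking', 'piece', 'pint', 'pip', 'pirate', 'pirrip']
counts = [1, 1, 1, 5, 2, 3]

print(top_predictions('pi', words, counts))
# [('pip', 5), ('pirrip', 3), ('pirate', 2), ('picking', 1), ('piece', 1)]
print(top_predictions('pir', words, counts))
# [('pirrip', 3), ('pirate', 2)]
```

When fewer than 5 words match, the slice simply returns however many there are, so no special case is needed.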

Each time a word is selected, it is written in the sentence area, and the predicted words and the chosen letter area are cleared (by drawing a white square over them).
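
The "draw a white square over it" trick might look something like the sketch below. The name clear_area and its pen parameter are hypothetical. The function takes the drawing object as an argument so that it can be tried out without opening a turtle window, and it assumes that later text should be drawn in black.

```python
def clear_area(pen, x, y, x2, y2):
    """Paint a filled white rectangle with top-left corner (x, y) and
    bottom-right corner (x2, y2)."""
    pen.penup()                     # travel to the corner without leaving a line
    pen.goto(x, y)
    pen.pendown()
    pen.color('white', 'white')     # white outline, white fill
    pen.begin_fill()
    for corner in [(x2, y), (x2, y2), (x, y2), (x, y)]:
        pen.goto(*corner)           # walk around the four corners
    pen.end_fill()
    pen.penup()
    pen.color('black', 'black')     # assumption: everything else is drawn in black
```

Because the turtle module itself provides penup, pendown, goto, color, begin_fill, and end_fill as functions, a call such as clear_area(turtle, -220, -100, 220, -260) after import turtle should work. The coordinates in that call are placeholders, not values from my board.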

In order to create the prediction lists, I first read a text file into a list. I then took the text, stripped out all punctuation, and made a new list of words in alphabetical order. Each word in my list occurs only once. I also created a corresponding list of word counts. So, for instance, if the wordlist is ['all','and','as'] and the wordcount list is [7,4,8], that means that 'all' occurred 7 times in the document I read in, 'and' occurred 4 times, and 'as' occurred 8 times.
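
To make the idea of two matching lists concrete, here is a small sketch that builds them from a string. It is not the approach used in the outline at the end of this handout (which inserts each word in order as it goes). The function name build_counts is made up for this example, and it only trims punctuation from the ends of each word, which is a simplification.

```python
import string


def build_counts(text):
    """Return (words, counts): the unique words in alphabetical order and a
    parallel list saying how many times each one occurred."""
    tally = {}
    for raw in text.lower().split():
        word = raw.strip(string.punctuation)   # trims punctuation at the ends only
        if word:
            tally[word] = tally.get(word, 0) + 1
    words = sorted(tally)                      # alphabetical order, each word once
    counts = [tally[w] for w in words]         # counts[i] belongs to words[i]
    return words, counts


words, counts = build_counts("As all and as, all as. And all!")
print(words)    # ['all', 'and', 'as']
print(counts)   # [3, 2, 3]
```

The important property is that the two lists line up: the count for words[i] is always at counts[i].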

Note: For this project, the easiest way to code the entire board was to use global variables.

Global variables are variables that apply to the entire program we’re writing. I placed all global variables at the very top of my code. Then every function can use and modify those variables without passing them into the function as parameters, or returning them. In this case, it allows the entire project to work with the same set of variables. To use global variables within a function, you should specify that the variables are global. I’ve included that within the function descriptions for you. Once you’ve specified that they’re global, you can just use them as if they were regular variables.

For example:

k = []

def modk():
    global k
    k.append('cat')
    k.append('dog')
    return

def printk():
    global k
    x = 0
    while x < len(k):
        print(k[x], end=",")
        x += 1 #move on to the next item (without this the loop never ends)
    return

modk()
printk() #will print out: cat,dog,

Extra Credit (5 pts):

Using Great Expectations to come up with word counts is not the best way to predict words in our word prediction system. At the very least, we’d probably want to use a text that is more current, and more likely to be about current topics. There are many other ways to improve the accuracy of our predicted words.

List 3 creative ways in which we could improve on the words being predicted in this system. (At this point, do you think you could implement any of them?)

**********************************************************************************

The following code is the outline of the code I used for this project. You may use it as is, or you can modify it as needed to create the project.

import turtle
from random import *

#GLOBAL VARIABLES #############################################################

#qwerty keyboard keys (essentially)
keylist = ['1','2','3','4','5','6','7','8','9','0',
           'q','w','e','r','t','y','u','i','o','p',
           'a','s','d','f','g','h','j','k','l',' ',
           'z','x','c','v','b','n','m',',','.','clr']

squaresize = 40 #the size of each keyboard key square
fontsize = 18 #fontsize used throughout

wordlist = [] #the list of (unique) words read in from a file
wordcount = [] #the list of each word's count, or number of times it occurred.
               #This list corresponds directly with wordlist,
               #so, for instance, if wordlist[22] holds 'and', the
               #number of times 'and' occurred in the document will
               #be located at wordcount[22]

boardleft = 0 - squaresize * 5 #the left-most coordinate of the keyboard
boardtop = squaresize * 2 #the top-most coordinate of the keyboard

predictls = [] #the list of currently predicted words, based on what has been typed
               #in so far (like on your phone). So, for instance, if you've typed in
               #'b' and 'e', the predictls might hold ['be','become','before','behind','beneath']
               #or something similar
predictcts = [] #the corresponding list of number of times each word in the prediction
                #list occurred in the document

predictx = -200 #the x coordinate of where the prediction list will be printed
predicty = -110 #the y coordinate of where the prediction list will be printed

currword = "" #the current word being typed in
currsentence = "" #the current sentence being typed in

typex = -220 #the x coordinate of where the sentence you've typed will start showing up
typey = 200 #the y coordinate of where the sentence you've typed will start showing up
wordx = -150 #the x coordinate of where you'll print the word you're currently typing in
wordy = 120 #the y coordinate of where you'll print the word you're currently typing in

#####END OF GLOBAL VARIABLES ##################################################

#FUNCTION OUTLINE#############################################################

#drawkey
#This function takes three input parameters:
#the letter to be printed (k), the y coordinate of the top of where the key will be
#printed (top), and the x coordinate of the left side of where the key will be printed
#(left) on the turtle board.
#Note: to write the key, I used turtle.write(k, font=('Arial', fontsize, 'normal')), where
#k is the parameter holding the letter being printed.
def drawkey(k,top,left):
    pass #replace this line with your code

#makeboard
#This function draws the board in turtle. It uses the global variables boardtop and
#boardleft as the starting positions for the keyboard, and prints out 4 rows, each
#with 10 characters. Mine looked like:
#
def makeboard():
    pass #replace this line with your code

#The following functions involve reading in a file, and adding each word in the file into a list in
#alphabetical order, also keeping track of how often each word occurs

#addinorder(z)
#This function takes a word, and adds it to a list, in alphabetical order.
#The list is the wordlist, used in other functions.
#If the word is already in the list, its count in wordcount goes up by 1 instead.
def addinorder(z):
    global wordlist
    global wordcount
    if len(wordlist) == 0:
        wordlist.append(z)
        wordcount.append(1)
    else:
        x = 0
        while (x < len(wordlist) and z > wordlist[x]):
            x += 1
        if (x >= len(wordlist)):
            wordlist.append(z)
            wordcount.append(1)
        elif wordlist[x] == z:
            wordcount[x] += 1
        else:
            wordlist.insert(x,z)
            wordcount.insert(x,1)
    return

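#stripchar(ls):
#This function takes a list of strings, and strips out spaces and punctuation, effectively
#dividing each string into individual words, which are then added to the wordlist
def stripchar(ls):
    for x in ls:
        z = ""
        for y in x:
            c = y.lower()
            if c == " ":
                #a space ends the current word (if there is one)
                if len(z) > 0:
                    addinorder(z)
                    z = ""
            elif c in keylist and c.isalnum():
                #keep only the letters and numbers on the keyboard
                z += c
        if len(z) > 0:
            #the last word on a line has no space after it, so add it here
            addinorder(z)
    return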

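#readlist(doc)
#This function reads in the content of a document into a list of lines, sends the
#lines to stripchar to build the wordlist and wordcount lists, and then prints out
#each word with its count
def readlist(doc):
    global wordlist
    global wordcount
    ls = []
    with open(doc,'r') as f: #the file is closed automatically at the end of this block
        for line in f:
            ls.append(line.strip())
    stripchar(ls)
    for x in range(len(wordlist)):
        print(wordlist[x] + " " + str(wordcount[x]))
    return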

#makesorted(x,y)
#Premise: when a word predictor predicts words (based on what you've typed in so
#far), it will predict the most commonly occurring word (starting with the typed letters)
#first, followed by the next most commonly occurring word, etc.
#
#This function takes two integers as input parameters: x is the index of the first
#word in the wordlist that has the same beginning letters as the keys that you've
#typed in, and y is the index right after the last word in the wordlist that has
#the typed in letters.
#It creates the predictls with those words in the wordlist with the same first letters
#as those typed in, in the order of the most commonly occurring words first, down
#to the least commonly occurring word.
#It also creates a predictcts list that is the corresponding numbers of the occurrences
#of the words in predictls.
#So, for instance, if you've typed in "pi", the x and y indices in the list
#might be: 354,360
#the words before sorting might be:
#['picking', 'piece', 'pint', 'pip', 'pirate', 'pirrip']
#then the sorted predictls would be:
#['pip','pirrip','pirate','picking','piece','pint']
#and the predictcts list would be:
#[5,3,2,1,1,1]
#
#Note: This function is a bit of a brain teaser. If you want to do it last, and
#just make predictls be the unsorted list of words starting with 'pi' (or whatever
#you've typed in) and predictcts the unsorted list of wordcounts associated with
#those words, go for it.
def makesorted(x,y):
    global wordlist
    global wordcount
    global predictls
    global predictcts

#writeprediction()
#This function writes out the predictls list in the bottom left corner of turtle. It
#also writes out the accompanying counts.
#So if this function is being tested on its own, with predictls set to
#['pip','pirrip','pirate','picking','piece','pint']
#and the predictcts list set to
#[5,3,2,1,1,1]
#your board would look like:
#
def writeprediction():
    global predictx
    global predicty
    global predictcts
    global predictls

#makewordlist(currtyped):
#This function takes a string as input parameter and finds the index of the first word
#in the wordlist that starts with the input string (currtyped), and the index right after
#the last word that starts with it. So, for instance, if the currtyped string is 3
#characters long, you want to find the range of indices covering
#every word whose first 3 letters match currtyped.
#It calls makesorted with the x and y index found.
#So, if the input was "pi", then the x and y indices found would be: 354,360
#and then it calls writeprediction() to write out the (sorted) prediction list and their counts
def makewordlist(currtyped):
    pass #replace this line with your code

#clearlist(x,y,x2,y2)
#This function draws a big white rectangle over the predicted list (in the bottom left). As
#you type, the prediction list will change. You can't just write over it - you have to
#erase it first. The easiest way to erase it in turtle is to draw a big white rectangle,
#with x,y and x2,y2 being the outside top left and bottom right coordinates respectively, around
#the area on the turtle board where you wrote the predicted list. Note: you'll want to set
#turtle.color to white using
#turtle.color('white','white')
#and you will want to use
#turtle.begin_fill()
#right before you start drawing your rectangle. When you're done drawing your rectangle, use
#turtle.end_fill()
#so that the rectangle you draw will be filled with white.
def clearlist(x,y,x2,y2):
    pass #replace this line with your code

#findletter(x,y)
#This is the heart of the program.
#findletter takes the x and y coordinates of where you clicked.
#This function does a lot. It checks to see if the x and y coordinates are within the words
#printed as the predicted words (by using each of their x and y coordinates - do the math to
#find the edges of each word).
#If a word is found, it prints it up at the top, at the typex and typey coordinates, and clears
#the word being typed, the prediction list, and the prediction count list, and wipes the
#prediction list off the board (using clearlist).
#If the x and y coordinates do not fall within the word prediction list, it then checks to
#see if the coordinates are within the keyboard and, if so, on which key (again, do the math
#to figure out if the x and y coordinates are within the boundaries of where a particular key
#was printed on the keyboard). If a key is found and it is not the space key or the clr key,
#the character at the x,y coordinate is concatenated to the current word and the current sentence,
#the current word so far is printed out, and the prediction list is updated and reprinted.
#If the x,y coordinate falls within the space key, the character is added to the sentence, the
#updated sentence is printed out at the top, the typed word is cleared, the prediction list is
#cleared, the prediction count list is cleared, and the prediction list is wiped off the screen.
#If the x,y coordinate falls within the clr button, however, the sentence is cleared to an empty
#string as well.
#Here are some screenshots of what happens:
#
def findletter(x,y):
    global predictls
    global squaresize
    global predictx
    global predicty
    global typex
    global typey
    global typed
    global boardtop
    global boardleft
    global currword
    global currsentence

#This function gets the x and y coordinates of where the user clicked on the turtle board, and sends them
#into the findletter function
def writeletters(x,y):
    findletter(x,y)

#This function reads where you clicked on the board
def enterword():
    turtle.onscreenclick(writeletters)

#your main function that gets everything going…
def main():
    turtle.speed(0)
    makeboard()
    readlist("GEChap1.txt")
    enterword()
    turtle.mainloop() #keeps the window open and waiting for clicks
    return

main()