CSC 401 Lecture #7 Yosef Mendelsohn

CSC 401– Lecture #7
Yosef Mendelsohn

Dictionaries review

Suppose you want to store employee records in a collection and access the employee records using the employees’ social security numbers.

The list collection is not really useful for this problem because list items are accessed using an index, i.e. a position in a collection.

For this problem we need a collection type that allows access to stored items using user-defined “indices”, which are referred to as keys.

> employees = {'022-83-8795': 'Allison Andover', '987-65-4321': 'Benjamin Benton', '123-45-6789': 'Cassidy Cantor', '874-74-1532': 'Daniel Danton'}

> employees['123-45-6789']

'Cassidy Cantor'

A dictionary, also called a map, is a collection type that stores key-value pairs.

In the employees dictionary above, ‘022-83-8795’: ‘Allison Andover’ is a key-value pair.

The key ‘022-83-8795’ is the “index” of the stored object and is used to access the object. The key must be an object of an immutable type (like a string or number or tuple).

The following are the dictionary collection type properties:

Items in the collection are accessed by key, not index (e.g. offset)
Items in the collection are not ordered, and no ordering assumption can be made
Dictionaries are mutable, like lists, meaning they can be modified to contain different items.
Dictionaries have dynamic size, meaning they can grow or shrink

Dictionary problems

Problem: Write a function wordFreq() that takes as a parameter a file name (i.e. a string object). (For purposes of time, you don’t have to worry about exception handling). The function should compute the frequency of each word in the file and then output the list of words and their frequencies as shown below. The function should also return the dictionary created.

Use the file example.txt to test your code.

Incidentally, this is a very good example of a situation in which you should think through the problem ahead of time using a pen and paper with pseudocode and/or diagrams.

Hints:

Remove all punctuation and capitalization from the words before you process them to ensure accurate results. Perhaps the easiest way to do this would be to replace any of the following characters: . , ! ? (period, comma, exclamation point, question mark) with a space.
Can you think of any other pre-processing you might want to do? Hint: 'the' and 'The' would be considered as two different words…!
To remove punctuation, recall the use of the ‘in’ operator: e.g.

for character in '!.,?':

string = string.replace(character, " ")

The function would be used as follows:

> wordFreq('sample.txt')

a 2

no 1

this 3

text 4

is 2

contains 1

but 1

sample 3

file 3

in 1

or 1

Variation: Modify the function into one called duplicates()that accepts a file name as the argument. Ths function should return True if there are any duplicate words in the file and False if there are no duplicates in the file. Test your code with the file 'duplicates.txt' and then again with the file 'no_duplicates.txt'

HINT: To save a LOT of extra (completely redundant) coding, inside this function, use the wordFreq function we created in the previous step!

The random module

Let's begin by reviewing the module random.

Practice problem: Implement a function guess() that takes an integer n as a parameter and implements a simple, interactive guessing game.

The function should start by choosing a random integer number in the range from 1 up to and including n.

The function should then repeatedly ask the user to enter to guess the chosen number. When the user guesses correctly, the function should print a ‘You got it’ message and terminate. Each time the user guesses incorrectly, the function should indicate whether the guess was too high or too low.

If the user types something other than an integer, the function should recover gracefully.

> guess(100)

Enter your guess: fifty

That was not a valid number.

Enter your guess: 50

Too high

Enter your guess: 25

Too low

Enter your guess: 38

Too high

Enter your guess: 31

Too high

Enter your guess: 28.5

That was not a valid number.

Enter your guess: 28

Too high

Enter your guess: 26

You got it!

See the solution in this week's file.

More collection types

The standard Python library supports two more collection types: tuples and sets.

Let’s start with sets.

The set collection type

The set type is a collection type used to store an unordered collection of immutable values.

Because this collection is unordered, we can not do things like:

print( set_name[0] )

Perhaps the key principal of sets – and probably the main thing that makes sets different from other types of sequences, is that sets do not contain any duplicates.

Therefore, one use of sets is to remove duplicates from other types of collections.

For example, if you take an existing list that has duplicates, and create a set out of it, the set will look exactly like the list – but without any of the duplicate values.

Creating a set:

Herer are two ways of creating a Set:

grains = {'rice','wheat','oat'} #Not the perferred way
grains = set()

As you can see, the first option looks almost exactly like a dictionary. Not surprisingly, this can lead to confusion. The ONLY time you should create a set using this syntax is when you are creating a brand new set and already know some of the values you want to place inside it.

However, if you are creating an empty set, then do not use the first syntax. Instead, you should always create your new set using the set() "constructor". (We'll discuss what we mean by construcors in a later lecture).

Once you have a new set, you can easily start adding items to it.

Here is an example where we create a set and we already know some of the values we want to put in it. Again, the one time it’s okay to create the set using braces, is when you know the initial values you want to put inside the set:

> grains = {'rice', 'wheat', 'corn', 'rye', 'oat', 'wheat', 'millet'}

In the above example, Python is smart enough to figure out that we are trying to create a set as opposed to a dictionary. As you can probably surmise, Python figures this out by noting that the values we entered are not pairs. If Python saw Key:Value syntax, Python would instead create a dictionary.

> grains

{'wheat', 'corn', 'rye', 'millet', 'oat', 'rice'}

> type(grains)

Now—compare what we just did with trying to create an empty set using braces:

> fruit = {}

In this case, Python can't tell if we want a set or a dictionary. If the Python interpreter can't tell, it's always going to guess dictionary!

To prove this, note what happens when we check the type of our "set":

> type(fruit)

Therefore, remember that when creating an empty set, you should always use the set() constructor:

> fruit = set()

In any case, back to sets…

To add an item to a set, use the method add():

> fruit.add("apple")

> fruit.add("pear")

> fruit.add("apple") #will be ignored

print(fruit)

{'apple','pear'}

#Note that ‘apple’ only appears once since this is a set

Set Techniques:

Some of the techniques we use when working with sets are: Membership, Union, Intersection, Difference.

For purposes of our discussion, consider these two sets:

fruit1 = {'apple','pear','banana'}

fruit2 ={'apple','pear','blueberry'}

Membership: When you are trying to determine if a certain item is present in the set.
item in(setName) Returns True if the item is in the set and False otherwise.

 returns true if item is present in the set

> 'banana' in fruit1

True

> 'banana' in fruit2

False

union(setName) This function returnsa set. The set is produced by combining the elements of the two sets. However, because the result must be a set, any items that overlap (i.e. would be duplicates) are removed. As with the membership() function, you can invoke the union() function in two ways:
set1.union(set2) returns a set containing a combination of all elements in set1 and all elements in set2. However, any duplicates are removed.
(set1 | set2) another way of invoking the same function. The | symbol is known as a “pipe”.

> fruit1 | fruit2

{'banana', 'apple', 'pear', 'blueberry'}

Intersection: When you are searching for items that are in both sets.
set1.intersection(set2)
(set1 set2) another way of invoking the same function

> fruit1 & fruit2

{'pear', 'apple'}

Symmetric Difference: This is executed by the ^ operator. The symmetric difference gives you a combination of two sets, but excludes any items that happen to be in both sets.
set1 ^ set2

> fruit1 ^ fruit2

{'blueberry', 'banana'}

Difference: The – operator gives you everything that is in the first set that you specify, but is not present in the second set that you specify. In other words, unlike the previous examples, when typing the code for ‘difference’, the order matters.

set1-set2 will return everything in set1 minus those items that also appear in set2.

> fruit1-fruit2

{'banana'}

> fruit2-fruit1

{'blueberry'}

A few more examples:

> s = {1, 2, 3}

> t = {2, 3, 4}

> s | t

{1, 2, 3, 4}

> s&t

{2, 3}

> s^t

{1, 4}

> s-t

{1}

> t-s

{4}

> (s-t) | (t-s)

{1, 4}

Converting a list to a set:

Imagine that you have a list, but you want to get rid of any duplicates from that list. For example:

lstFruit = ['apple','pear','banana','pear',

'pear','apple']

You can easily convert this by using the set “constructor” function like so:

setFruit = set(lstFruit)

setFruit will contain: {‘apple’,’pear’,’banana’}

To learn more, read: or you can always do a quick check (albeit in a less easy-to-read format) by typing at the console:

> help(set)

Help on class set in module builtins:

[…]

Question: Can you think of a problem we did today that would have been MUCH easier using sets? …

Answer:

The duplictes() function!

I've intentionally made the text of the answer small. Select the text and increase the font size to see the answer.

Problem: Recall the function we wrote in the dictionary lecture: duplicates() that takes as a parameter a string representing the name of a text file. The functionreturnsTrue if the file contains duplicate words and False if the file doesn't contain duplicate words. Make sure to remove all punctuation (i.e. all commas, question marks, periods, colons, and semicolons) before determining whether any words are duplicated so that the punctuation won't interfere with the process. Your function should ignore case (e.g. APPLE = Apple = apple).

The above version used a dictionary to determine if there were duplicates. Rewrite the function as duplicates_set() so that it uses a set instead of a dictionary. (It should be much simpler).

See the code in the solutions file for this lecture.

The tuple collection type

The tuple type is essentially the same as a list with one important distinction: A tuple is immutable, i.e. a tuple cannot be modified.

You'll also note that we create a tuple using parentheses as opposed to square brackets:

> t = (3,-7, 92)

> t

(3, -7)

> type(t)

> t[0]

> t[1] = 3  trying to modify the tuple!

Traceback (most recent call last):

File "<pyshell#4>", line 1, in <module>

t[1] = 3

TypeError: 'tuple' object does not support item assignment

Collection type properties for tuples include:

Items in the collection are ordered
Items in the collection are accessed using an index (offset)
Like strings – but unlike lists -- tuples are immutable
Tuples have a fixed length, since they cannot change. For example, whereas lists can grow and shrink, tuples can not.

One important use of tuples: Because tuple objects are immutable, a tuple can be used as a dictionary key.

More about class tuple

> help(tuple)

Problem: Implement a function lookup() that provides a phonebook lookup feature.

The function takes as a parameter a dictionary representing a phonebook. In the dictionary, tuples containing first and last names of individuals (the keys) are mapped to strings containing phone numbers (the values).

Your function should provide a simple user interface through which a user can enter the first and last name of an individual and obtain the phone number assigned to that individual. It should indefinitely prompt the user for first and last names, stopping only when the user does a keyboard interrupt (e.g. control-c).

For example, it would be used as follows:

> phonebook = {('Luca', 'Elam'): '(312) 123-4567',\

('Djengo', 'Thomas'): '(773) 987-6543',\

('Devon', 'Reilly'): '(520) 454-6677'}

> lookup(phonebook)

Enter the first name: Luca

Enter the last name: Elam

(312) 123-4567

Enter the first name: Devon

Enter the last name: Reilly

(520) 454-6677

Enter the first name:

See the solution in the solutions file for this week.

Review of loops and collections

To review loops and two-dimensional lists, we will do a practice problem.

Implement a function csv_to_list() that transforms a .csv spreadsheet into a Python two-dimensional list. Your function will take as input the name of the .csv file and return the corresponding two-dimensional list.

Here is what the cities.csv file looks like when opened up in a text editor:

The following is an example of how the function would be used:

csv_to_list(‘cities.csv’)

[[‘Paris’, ‘Sao Paulo’, ‘Tokyo’, ‘Kinshasa’], [‘London’, ‘Mexico City’, ‘Seoul’, ‘Lagos’], [‘Moscow’, ‘New York’, ‘Mumbai’, ‘Cairo’]]

Here is the same function invoked on the file data.csv

csv_to_list(‘data.csv’)

[[‘3’, ‘5’, ‘6’, ‘4’], [‘8’, ‘5’, ‘2’, ‘3’], [‘2’, ‘8’, ‘9’, ‘1’]]

The solution is found in this week's code solutions.

A note about the upcoming topics

A few of the concepts from this point forward through the end of the course may become a little bit confusing. Therefore, be sure to really try and understand the definitions and follow the code examples as you move through. However—it is very likely that you will end up confused at times. Don't be discouraged. These topic will likely need a few ‘passes’ and mental review before the concepts discussed start to sink in.

You will also find the readings from the textbook to be very helpful in getting a handle on these concepts. The reason is that the notes for a course can never encompass the length and detail of an entire textbook. (Which is why we require you to get one!)

Some of the upcoming topics are a situation where I particularly recommend that you read (and reread) the text.

Review: Functions

Functions are a way to package a group of statements so that those statements can be executed within a program by typing a single statement.

The purpose of functions includes:

Code reuse
Encapsulation or information hiding
Modularity or procedural decomposition

So date in this course we have been using:

"predefined" a.k.a. "built-in" functions that work on their own such as open(), print(), etc, etc
Predefined functions that are meant to work with specific data types such as replace(), sort(), append(), etc.
Functions that we have defined on our own such as the ones we have been creating throughout the course.

We are now ready to start a discussion of how functions work "under the hood". Doing so will allow us to further harness the power and flexibility of functions.

We will use the following simple function as we continue our discussion:

def add_numbers(num1, num2):

total = num1 + num2;

return total

Terminology

Calling / Invoking v.s. Defining functions

When we execute a function by typing out its identifier, we are said to be "invoking" or "calling" that function. In the example below, we invoke the function len().

> lst = [3,4,5,6]

len(lst)

When we create a function, we are said to be "defining" a function:

> def calc_average(x,y):

return (x+y)/2.0

Formal parameters, parameters, arguments

This is important terminology. So be sure to review as needed until you get comfortable with it.

'Formal parameter' is the term that refers to information required by the function when the function is invoked. For example, the function add_numbers has 2 formal parameters.

‘Argument’ refers to the information we provide when we invoke (execute) a function. Each argument must therefore match up with a corresponding formal parameter. Parameters are also known as "arguments".

Example: When we invoke the add_numbers function, we must provide two arguments. If we neglect to provide these 2 arguments, the function will not work, and our program may crash.

Say we invoked it with: add_numbers(3, 7)

The first argument, 3, will be assigned to the first parameter, num1.

The second argument, 7, will be assigned to the second parameter, num2.

Note: Don't confuse " parameters" which refer to the information required by the function when we create it, with "arguments" which refers to the information passed when we invoke (execute) the function.

CONFUSION TIME:

All that being said, you may sometimes hear people refer to parameters as “formal parameters” and arguments as simply “parameters”. Confusing? Yes. Inconsistent? Yes. Unfortunately, it just is that way.

The main thing is for you to recognize the distinction between the variables we create when we are defining a function (what we typically call ‘parameters’), v.s. the information we pass tothe function when we are invoking it (what we typically call ‘arguments’).

"Calling Program" a function must always be called from somewhere. For example, we nearly always invoke a function from within another function. If this is the case (and it almost always is!!), the first function (i.e. the one that invoked the other function) is known as the "calling program".

The Process

The following steps are carried out by Python when a function is called:

The calling program (by 'calling program' we typically simply mean a function) suspends execution. For example, if you are writing a function called do_something() and from inside it you then invoke a function do_something_else(), then the do_something() is the "calling program". At this point, the calling program, ie.. do_something() temporarily stops execution.
Note: The Python interpreter keeps track of precisely where inside do_something() we were at when we invoked do_something_else().As a result, when we are ready to return to our original function, we know exactly where to pick up.
As discussed a few minutes ago, any formal parameters that are present will get assigned the input information as supplied by the parameters (aka arguments).
E.g. When we invoke add_numbers(3,7) the formal parameter num1 is assigned the parameter (aka "argument") 3 and the formal parameter num2 is assigned the argument (aka "parameter") 7.
The body of the function is executed
The executions of the calling program (i.e. the program/function from where add_numbers() was originally invoked) gets resumed. Recall that Python has kept track of exactly where we were in the calling program when we invoked the outside function. Therefore, the calling program can continue from the point immediately after the location where add_numbers() was originally invoked.

Example 1: See the code in the file invoke1.py.