Another Use for the 'In' Operator: Searching

CSC 401 – Lecture #6
Yosef Mendelsohn

Another use for the 'in' operator: searching

We have used this operator many times such as when iterating through a list:

for n in lstNumbers:

Or when using the range() function:

for i in range(10):

However, this very convenient operator also allows you to check if a value is present in some collection (e.g. list, or string, or other):

For example, you might want to see if the number 3 is present in a list.
Or you might want to see if the letter 'q' is present in a string (which is a collection of characters).

> lstNums = [1, 2, 3, 5, 7, 11, 13] #a list with 7 integers

> lstPets = ['rabbit','goldfish','emu']

> s = "Doh!"

The 'in' operator evaluates to True if the value on its left is present in the collection on its right, and to False otherwise:

> 2 in lstNums

True

> 4 in lstNums

False

> 'emu' in lstPets

True

> 'Emu' in lstPets

False

> '!' in s

True

> 'oh' in s

True

# Example:

numbers = [3, 4, 27, 96, 18, 43]

searchNumber = input("Enter a number to search for: ")

searchNumber = int(searchNumber)   # input() returns a string, so convert it

if searchNumber in numbers:
    print("It's there!")

Loop patterns: Accumulator loop

A common pattern in loops is to accumulate some value in every iteration of the loop. This value can be a numeric value such as a count of something, or it might refer to a list that we keep appending to, etc.

For example, given a list of numbers we might want to determine the sum of those numbers.

To do this we need to introduce a variable that will hold the sum. This variable is initialized to 0, and then a for loop adds every number in the list to it.

> lst = [3, 2, 7, -1, 9]

> sum = 0

> for number in lst:
      sum = sum + number

> print(sum)

20

In this example, sum is the accumulator. The assignment sum = sum + number increases the value of sum by number.

This type of operation, in which we add a value to an existing variable, is so common that there is a shortcut for it:

sum += number

Note: This shortcut also works for other common mathematical operations such as subtraction, multiplication, and division. For example, suppose you had a variable called total. If you wanted to double its value each time through the loop, you could write either of the following:

total = total * 2

total *= 2

Many (perhaps most) programmers tend to use the second, shorter version.
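As a quick illustration of these shortcuts, the sketch below applies each one in turn to a single variable. The name total and the starting value are chosen only for this example.

```python
# Each augmented assignment is shorthand for "total = total <operator> value".
total = 10
total += 5    # same as total = total + 5, so total is now 15
total -= 3    # same as total = total - 3, so total is now 12
total *= 2    # same as total = total * 2, so total is now 24
total /= 4    # same as total = total / 4, so total is now 6.0
print(total)  # the / operator always produces a float in Python 3
```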

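Recall from the textbook that there is a built-in function sum that can be used to add up the values in a list. (Be careful: if you created a variable named sum earlier in the same session, as in the loop above, that name now refers to your number rather than to the built-in function. Run del sum first to get the function back.)

> lst = [3, 2, 7, -1, 9]

> sum(lst)

20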

So in this case a solution using a loop wasn’t necessary.

However, most of the time, a convenient built-in function that just so happens to do exactly what you want is not going to be available!

Example: Suppose we want to multiply all the numbers in the list.

An approach similar to the one we used for the sum would seem to work:

> lst = [3, 2, 7, -1, 9]

> product = 0

> for i in lst:
      product = product * i   # Or: product *= i

> print(product)

0

Clearly something went wrong. What is the problem?

Note that we initialized our variable product to 0. Since anything times 0 equals 0, that's what our calculation will end up being.

Instead, we need to initialize product to 1, which is the neutral value (the identity) for multiplication.

> lst = [3, 2, 7, -1, 9]

> product = 1

> for i in lst:
      product *= i

> product

-378

In the previous two examples the accumulators were of a numerical type.

There can be other types of accumulators. In the following examples, the accumulators are strings and lists.

Practice problem 1: Write a function called upAbbrev that takes a string as an argument and returns a string consisting of all the upper-case characters in that string, in their original order.

It would be used as follows:

> upAbbrev("Cubs Win the Pennant!")

'CWP'
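One possible approach, offered as a sketch rather than the official solution, uses a string as the accumulator. It assumes the string method isupper() is an acceptable way to test each character; the names text, result and ch are chosen for this example.

```python
def upAbbrev(text):
    """Return a string made of the upper-case characters in text."""
    result = ''              # the string accumulator starts out empty
    for ch in text:
        if ch.isupper():     # keep only the upper-case letters
            result += ch
    return result

print(upAbbrev("Cubs Win the Pennant!"))   # CWP
```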

Practice problem 2: Write a function divisors that takes a positive integer n as an argument and returns the list of all positive divisors of n.

Question: Do you remember what we mean by a positive divisor? It's a question you must be able to answer in the future!

Answer: A positive divisor of n is a positive integer d that divides n with a remainder of 0. In programming we test this with the modulus operator %: d is a divisor of n when n % d == 0. For example, the positive divisors of 10 are 1, 2, 5, and 10.

It would be used as follows:

> divisors(6)

[1, 2, 3, 6]

> divisors(11)

[1, 11]

> divisors(12)

[1, 2, 3, 4, 6, 12]
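A minimal sketch of one way to write divisors, using a list as the accumulator. It simply tries every candidate from 1 to n, which is easy to follow but not the fastest possible method; the names result and d are chosen for this example.

```python
def divisors(n):
    """Return the list of all positive divisors of the positive integer n."""
    result = []                   # the list accumulator starts out empty
    for d in range(1, n + 1):     # candidate divisors 1, 2, ..., n
        if n % d == 0:            # a remainder of 0 means d divides n
            result.append(d)
    return result

print(divisors(12))   # [1, 2, 3, 4, 6, 12]
```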

Practice problem 3: An acronym is a word formed by taking the first letters of the words in a phrase and making a word from them. For example, RAM is an acronym for “random access memory”.

Write a function makeAcronym that takes a phrase as a parameter and returns the acronym for that phrase. Note: The acronym should be all uppercase, even if the words in the phrase are not capitalized.

It would be used as follows:

> makeAcronym('random access memory')

'RAM'

> makeAcronym('internal revenue service')

'IRS'

Hint: You'll need to break your phrase up into separate words. There is a string method you definitely want to become familiar with that does this.

Answer: The split() method. For example, 'random access memory'.split() returns ['random', 'access', 'memory'].
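Putting the hint to work, here is one possible sketch of makeAcronym. It assumes the words of the phrase are separated by whitespace; the names acronym and word are chosen for this example.

```python
def makeAcronym(phrase):
    """Return the upper-case acronym formed from the first letter of each word."""
    acronym = ''                      # string accumulator
    for word in phrase.split():       # split() breaks the phrase at whitespace
        acronym += word[0].upper()    # first letter of the word, capitalized
    return acronym

print(makeAcronym('random access memory'))   # RAM
```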

NEW & Important Loop pattern: 'while' loop

This loop consists of the keyword 'while' followed by a condition and an indented body. The condition is checked before each iteration; as long as it evaluates to True, the body runs again.

repeat = ''

while repeat != 'No':
    print("Hello!")
    repeat = input("Would you like another greeting? Type Yes or No: ")

print("Goodbye!")

Loop pattern: Infinite loop

It is also possible to create a so-called “infinite” loop, i.e. a loop that runs forever. Most of the time, infinite loops are the result of sloppy programming.

It is very common to accidentally create infinite loops when first learning how to program. When this happens, the most common way to exit the loop is to press Control-C. This works on both Windows and Mac; on a Mac the key is still Control, not Command (Command-C only copies text).

Note: There are occasionally situations where infinite loops are useful. For example, consider a service that needs to run indefinitely. A web server, i.e. a program that serves web pages, is an example of a program that provides a service with no known end point.

Perhaps the easiest way to create an infinite loop intentionally is:

while True:
    <body>

Practice problem: Write a function greeting() that, repeatedly without end, requests that the user input their name and then greets the user using that name.

Recall that an infinite loop happens when the condition after the 'while' keyword always evaluates to True, i.e. it never becomes False.

The following is an example of how the function would be used:

greeting()

What’s your name? Amber

Hello Amber

What’s your name? Marcus

Hello Marcus

What’s your name?

Loop pattern: Interactive loop

A while loop is useful when a program needs to repeatedly request input from the user until the user enters a flag, which is a value indicating the end of the input.

Example: Write a function called ‘betterGreeting()’ which fixes the function above, so that if the user types ‘stop’ (in lower case), the loop ends.

Example: Write a function interact1() that repeatedly requests positive integers from the user and appends them to a list. The user will indicate the end of the input with the value 0 or a negative number, at which point the function should return the list of all positive integers entered by the user.

The following is an example of how the function would be used:

> interact1()

Enter value: 3

Enter value: 1

Enter value: 8

Enter value: 0

[3, 1, 8]

> interact1()

Enter value: 0

[]

The solution uses the accumulator loop pattern in addition to the interactive loop pattern.
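The two patterns can be combined roughly as follows. This is a sketch of one possible solution, not necessarily the one in the examples file; it assumes the user always types a valid integer, and the names values and number are chosen for this example.

```python
def interact1():
    """Collect positive integers from the user until the flag is entered."""
    values = []                               # accumulator: the list we build
    number = int(input("Enter value: "))      # first request, before the loop
    while number > 0:                         # 0 or a negative number is the flag
        values.append(number)
        number = int(input("Enter value: "))  # next request, at the end of the body
    return values
```

Note that the request for input appears twice: once before the loop so that the condition has something to test, and once at the end of the body to fetch the next value.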

Specialized loop statements

There are certain statements that can be used to construct more elegant loops (and other statements).

These are: break, continue, and pass.

The break statement and the loop-and-a-half pattern

A break statement immediately terminates the loop. A break statement can be placed anywhere inside the body of the loop.

As with any situation in which a loop has completed, execution then resumes at the first statement after the body of the loop.

To see how it works, consider writing a variation of the interact1() function, called interact2() in which we use a break statement to exit the loop when the user gives us our 'flag' value.

> interact2()

Enter value: 3

Enter value: 1

Enter value: 8

Enter value: 0

[3, 1, 8]

> interact2()

Enter value: 0

[]

Instead of returning from inside the loop, we can use a break statement to exit the loop and then return the list after the loop has ended.
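A sketch of how interact2() might look is shown here. It assumes, as before, that the user always types a valid integer; the names values and number are chosen for this example. The input request sits in the middle of the loop body, which is why this arrangement is called the loop-and-a-half pattern.

```python
def interact2():
    """Like interact1(), but leaves the loop with a break statement."""
    values = []
    while True:                               # the loop has no exit condition of its own
        number = int(input("Enter value: "))
        if number <= 0:                       # flag value: stop asking
            break                             # jump to the first statement after the loop
        values.append(number)
    return values                             # execution resumes here after the break
```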

The continue statement

The continue statement is used to terminate the current iteration of the loop.

That is, whereas the break statement exits the loop completely, without even checking the condition again, the continue statement merely ends the current iteration. The loop itself is not over: the interpreter returns to the condition and evaluates it to decide whether to run another iteration.

Again: Unlike the break statement, execution is NOT resumed with the first statement after the loop body. Instead, execution resumes at the loop's conditional.

If the loop is a while loop, execution is resumed at the condition checking statement. If the condition is still true, then execution moves to the next iteration of the while loop.
If the loop is a for loop, then execution resumes with the next iteration (provided, of course, that there are additional elements to iterate over.)
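To illustrate the for loop case described above, the following sketch adds up only the non-negative numbers in a list. The list contents and the variable names are made up for this example.

```python
numbers = [4, -2, 7, 0, -9, 5]
total = 0
for n in numbers:
    if n < 0:
        continue      # skip the rest of this iteration and move on to the next element
    total += n        # reached only when n is not negative
print(total)          # 16
```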

For example, consider the startsWithVowel() function:

def startsWithVowel():
    while True:
        word = input("Enter a word: ")
        if word[0] not in 'aeiouAEIOU':
            continue
        else:
            print("Starts with a vowel.")

The print statement is only executed when the condition in the if statement evaluates to False.

This is because the continue statement in the body of the if statement makes the execution skip to the next iteration of the while loop.

Also note that this code contains an infinite loop. You'll have to press Control-C to end it. In the real world, infinite loops should not happen (or at least should be limited to certain highly specific circumstances).

The pass statement

In Python every body of a function, if statement, for or while loop must contain at least one statement.

Sometimes we just don’t need to do anything inside the body.

When that’s the case, we can use the pass statement.

For example:

if n % 2 == 0:
    pass
else:
    print(n)

In the case where n is even, we don’t do anything. When n is odd, we print the number.

The pass statement is especially useful when the body of a particular construct has not yet been written, for example a function that we haven’t implemented yet. In such a case, it serves as a temporary placeholder until we are ready to implement that section. Basically, this means that you should pretty much only use it while writing and testing code.

However, “production” code should rarely if ever have ‘pass’ in it. (For that matter, even break and continue should be used with care.)

Warning: Be careful how you use break, continue, and pass

The specialized statements break, continue, and pass are sometimes used by sloppy programmers because they haven't thought about the problem in an efficient way.

For example, consider this example from earlier:

if n % 2 == 0:
    pass
else:
    print(n)

How can we rewrite the if statement so that it doesn’t use a pass?
Was it necessary to use a break statement in the interact2() function?

Sometimes break, continue, and (rarely) pass can make your code more elegant and better. But careless programmers frequently use them inappropriately.

As you become a more experienced programmer, you’ll learn the difference.

Rule of thumb: You shouldn't need to use them very often. The pass statement is fine as a temporary measure when you are developing your code. However, it too shouldn't be seen very often in the "finished" product (i.e. I shouldn't encounter them on assignments very often, or perhaps ever!).
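As one possible answer to the question about the pass example earlier in this section: negating the condition removes the need for both the pass and the else. The function name printIfOdd is hypothetical and is used only to wrap the statement for this sketch.

```python
def printIfOdd(n):
    """Print n only when it is odd; no pass statement is needed."""
    if n % 2 != 0:        # the original condition, negated
        print(n)

for n in [1, 2, 3, 4, 5]:
    printIfOdd(n)         # prints 1, 3 and 5, each on its own line
```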

Loop patterns: Nested loops

Some problems can only be solved using multiple loops together.

To see how to use multiple loops, consider the following (somewhat artificial) problem: Suppose we want to write a function nested() that takes a positive integer n and prints to the screen the following n lines:

0 1 2 3 … n-1

…

0 1 2 3 … n-1

For example:

> nested(5)

0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4

We’ve already seen that in order to print one line we can write the following (here with n = 5):

> for i in range(n):
      print(i, end=" ")

0 1 2 3 4

In order to get n such lines (5 lines in this case), we need to repeat the above loop n times (5 times in this case). We can do that with an additional outer for loop which will repeatedly execute the above inner for loop:

> for j in range(n):
      for i in range(n):
          print(i, end=" ")

0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4

Great – we're close!!

Still, this doesn’t produce exactly what we wanted. Setting end to a space keeps all of the numbers on a single line.

We would like to start a new line after each line printed by the inner for loop. We can do that by calling print() with no arguments once the inner loop has finished:

> for j in range(n):
      for i in range(n):
          print(i, end=" ")
      print()

0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
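Wrapped up as the function nested() described at the start of this section, the nested loop might look like the sketch below. Pay attention to the indentation of the final print(): it belongs to the outer loop, not to the inner one.

```python
def nested(n):
    """Print n lines, each containing the numbers 0 through n-1."""
    for j in range(n):            # outer loop: one pass per line
        for i in range(n):        # inner loop: the numbers on that line
            print(i, end=" ")
        print()                   # end the current line after the inner loop finishes

nested(5)
```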


Practice problem: Write a function multmult that takes two lists of integers as parameters and returns a list containing the products of integers from the first list with the integers from the second list.

For example, it would be used as follows:

> multmult([3, 4, 1], [2, 0])

[6, 0, 8, 0, 2, 0]

Important: This is a classic example of a situation where you should not, not, not start by typing out code! Rather, you should spend a few minutes with a pen and paper and think (yuck) through the problem and come up with a strategy for solving it.

As always, solutions are in this week's examples file.
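For comparison once you have worked out your own strategy, here is a sketch of one possible approach (not necessarily the one in the examples file). It combines a nested loop with a list accumulator; the names products, a and b are chosen for this example.

```python
def multmult(lst1, lst2):
    """Return the products of each integer in lst1 with each integer in lst2."""
    products = []                     # list accumulator
    for a in lst1:                    # outer loop: each number in the first list
        for b in lst2:                # inner loop: each number in the second list
            products.append(a * b)
    return products

print(multmult([3, 4, 1], [2, 0]))    # [6, 0, 8, 0, 2, 0]
```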

Another Example:

Imagine a function lines() that takes one positive integer n as a parameter and prints on the screen the following n lines:

0
0 1
0 1 2
0 1 2 3
…
0 1 2 3 … n-1

For example:

> lines(5)

0
0 1
0 1 2
0 1 2 3
0 1 2 3 4

This is a situation where a nested loop can do the job nicely with minimal code.

def lines(n):
    for j in range(1, n + 1):   # j is the count of numbers on the current line
        for i in range(j):
            print(i, end=" ")
        print()

More on lists: Multi-dimensional lists

A 2-dimensional table has many real-world representations. For example, think of just about any table or spreadsheet you may have seen:

- Sales of 5 people in your department over 12 months

- Statistics of 10 basketball players (shooting percentage, free throw percentage, fouls committed, etc.)

Here is an example of a table showing the scores (out of 40) on 5 homework assignments for a group of students:

Tables such as these are frequently analyzed in data science and other fields using programming code. In Python, 2-dimensional tables can be easily stored as a list of lists.

Prior to our current discussion, the lists we’ve seen have been one-dimensional. We might think of them as a one-dimensional table.

For example, a list such as:

lst1 = [3, 5, 7] can be viewed as the table:

3 / 5 / 7

A two-dimensional table like the following:

4 / 7 / 2
5 / 1 / 9
8 / 3 / 6

can be viewed as a list of three rows, with each row being a one-dimensional list:

4 / 7 / 2
5 / 1 / 9
8 / 3 / 6

Can you see what I'm getting at? We can use Python to represent such a two-dimensional table by making it a list of lists:

table = [ [4, 7, 2] , [5, 1, 9] , [8, 3, 6] ]

Note: In the above line of code I added several (unnecessary) extra spaces to help visualize what is happening. In the real world, we shouldn't space things out this much.

In this example:

table[0] is holding the list [4,7,2]
table[1] is holding the list [5,1,9]
table[2] is holding the list [8,3,6]

To see how to use nested loops to process a two-dimensional list, we need to think about how to access elements in a two-dimensional list.

#Recall our original list:

> lst = [[4, 7, 2], [5, 1, 9], [8, 3, 6]]

If you are working with a list of lists, then accessing that object with a single index gives you an entire sub-list:

> lst[0]

[4, 7, 2]

> lst[1]

[5, 1, 9]

Accessing a list of lists with two indices gives you a specific element in the specified row.

Here are some examples using the object 'table' we created above:

> table[0][0] #first row, first element

> table[1][2] #second row, third element

> table[2][1] #third row, second element

To summarize:

To access an element in row i and column j, we would write: table[i][j]

Example: If we wanted to get the average of the first number in each row we could do something like:

avg = (table[0][0]+table[1][0]+table[2][0]) / 3.0

But imagine if instead of 3 rows we had 30, or perhaps 3,000,000. Clearly, writing out this code by hand would be impractical. However, now that we are more familiar with loops and nested loops, we can use them to analyze this kind of data.
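As a sketch of how loops handle a table of any size, the code below first computes the average of the first number in each row with a single loop, and then the average of every entry with a nested loop. All variable names other than table are chosen for this example.

```python
table = [[4, 7, 2], [5, 1, 9], [8, 3, 6]]

# Average of the first number in each row: one loop over the rows.
firstColumnTotal = 0
for row in table:
    firstColumnTotal += row[0]
firstColumnAvg = firstColumnTotal / len(table)
print(firstColumnAvg)

# Average of every entry: a nested loop visits each row, then each column.
total = 0
count = 0
for i in range(len(table)):
    for j in range(len(table[i])):
        total += table[i][j]
        count += 1
print(total / count)      # 5.0
```

The same code works unchanged whether the table has 3 rows or 3,000,000, because the loops take their bounds from the table itself.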

Aside: For those of you who have been hearing all the buzz about the field of "data science", this is one of the bread-and-butter techniques used by data scientists to analyze tabular data, which can easily and commonly reach into millions and millions of records.

Practice problem: Write a function add2D that takes a 2-dimensional list of integers, adds 1 to each entry in the list, and returns the resulting modified list.

For example, it would be used as follows:

> lst = [[4, 7, 2], [5, 1, 9], [8, 3, 6]]

add2D(lst)

print(lst)

[[5, 8, 3], [6, 2, 10], [9, 4, 7]]

Hint: The row indices run from 0 to len(lst) - 1, and the column indices within row i run from 0 to len(lst[i]) - 1. These are exactly the values produced by range(len(lst)) and range(len(lst[i])).
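Following the hint, one possible sketch of add2D is shown below. It changes the list in place, which is why printing lst after the call shows the updated values, and it also returns the same list as the problem statement asks.

```python
def add2D(lst):
    """Add 1 to every entry of the 2-dimensional list lst and return it."""
    for i in range(len(lst)):             # row indices
        for j in range(len(lst[i])):      # column indices within row i
            lst[i][j] += 1                # modifies the original list in place
    return lst

lst = [[4, 7, 2], [5, 1, 9], [8, 3, 6]]
add2D(lst)
print(lst)    # [[5, 8, 3], [6, 2, 10], [9, 4, 7]]
```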

Dictionaries

Suppose you want to store employee records with information such as social security number and name. You'd like to store these in a collection and access the employee records (e.g. name, birthdate, salary, etc.) using the employees’ social security numbers.

The list collection is not really useful for this problem because list items must be accessed using an index, i.e. by their position in the collection.

But what if we want a way to access an item in a collection by, say, some kind of key? For example, say you wanted to access a bunch of information about a student based on their Student ID number? Dictionaries are a type of collection that will allow us to do that.

For this problem we need a collection type that allows access to stored items, but instead of using a position as our index, we use a "key". With dictionaries, we must decide what the key will be. The one requirement is that the key we choose must be unique for every record. For an employee, social security number would be a good choice since no two people can have the same value.
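As a first look at the idea, the sketch below stores a few employee names in a Python dictionary keyed by social security number. The numbers and names are entirely made up for illustration, and a real record would hold more than just a name.

```python
# Hypothetical employee records: each key is a made-up social security number.
employees = {
    '123-45-6789': 'Amber Jones',
    '987-65-4321': 'Marcus Smith',
}

# Look up a record by its key instead of by its position.
print(employees['123-45-6789'])       # Amber Jones

# The in operator checks whether a key is present.
print('555-55-5555' in employees)     # False

# Assigning to a new key adds a record.
employees['555-55-5555'] = 'Dana Lee'
print(len(employees))                 # 3
```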