Session Three: Sequences, Iteration and String Formatting¶

Review/Questions¶

Review of Previous Session¶

Functions
Booleans
Modules

Corrections¶

line breaks don’t end a block
squirrel party example
unicode hello world
stepping through code
linter

Course Logistics¶

Attendance, grades, and homework due dates on Canvas
Course Notes
Use of Slack

Survey Feedback¶

Homework Review¶

Any questions that are nagging?

Sequences¶

Ordered collections of objects

What is a Sequence?¶

Remember Duck Typing? A sequence can be considered as anything that supports at least these operations:

Indexing
Slicing
Membership
Concatenation
Length
Iteration

Sequence Types¶

There are seven builtin types in Python that are sequences:

strings
Unicode strings
lists
tuples
bytearrays
buffers
array.arrays
xrange objects (almost)

For this class, you won’t see much beyond the string types, lists, tuples – the rest are pretty special purpose.

But what we say today applies to all sequences (with minor caveats)

Indexing¶

Items in a sequence may be looked up by index using the subscription operator: []

Indexing in Python always starts at zero.

In [98]: s = u"this is a string"
In [99]: s[0]
Out[99]: u't'
In [100]: s[5]
Out[100]: u'i'

You can use negative indexes to count from the end:

In [105]: s = u"this is a string"
In [106]: s[-1]
Out[106]: u'g'
In [107]: s[-6]
Out[107]: u's'

Indexing beyond the end of a sequence causes an IndexError:

In [4]: s = [0, 1, 2, 3]
In [5]: s[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-42efaba84d8b> in <module>()
----> 1 s[4]

IndexError: list index out of range

Slicing¶

Slicing a sequence creates a new sequence with a range of objects from the original sequence.

It also uses the subscription operator ([]), but with a twist.

sequence[start:finish] returns all sequence[i] for which start <= i < finish:

In [121]: s = u"a bunch of words"
In [122]: s[2]
Out[122]: u'b'
In [123]: s[6]
Out[123]: u'h'
In [124]: s[2:6]
Out[124]: u'bunc'
In [125]: s[2:7]
Out[125]: u'bunch'

Think of the indexes as pointing to the spaces between the items:

  a       b   u   n   c   h       o   f
|   |   |   |   |   |   |   |   |   |
0   1   2   3   4   5   6   7   8   9

You do not have to provide both start and finish:

In [6]: s = u"a bunch of words"
In [7]: s[:5]
Out[7]: u'a bun'
In [8]: s[5:]
Out[8]: u'ch of words'

Either 0 or len(s) will be assumed, respectively.

You can combine this with the negative index to get the end of a sequence:

In [4]: s = u'this_could_be_a_filename.txt'
In [5]: s[:-4]
Out[5]: u'this_could_be_a_filename'
In [6]: s[-4:]
Out[6]: u'.txt'

Why start from zero?¶

Python indexing feels ‘weird’ to some folks – particularly those that don’t come with a background in the C family of languages.

Why is the “first” item indexed with zero?

Why is the last item in the slice not included?

Because these lead to some nifty properties:

len(seq[a:b]) == b-a

seq[:b] + seq[b:] == seq

len(seq[:b]) == b

len(seq[-b:]) == b

There are very many fewer “off by one” errors as a result.

Slicing takes a third argument, step which controls which items are returned:

In [289]: string = u"a fairly long string"
In [290]: string[0:15]
Out[290]: u'a fairly long s'
In [291]: string[0:15:2]
Out[291]: u'afil ogs'
In [292]: string[0:15:3]
Out[292]: u'aallg'
In [293]: string[::-1]
Out[293]: u'gnirts gnol ylriaf a'

Though they share an operator, slicing and indexing have a few important differences:

Indexing will always return one object, slicing will return a sequence of objects.

Indexing past the end of a sequence will raise an error, slicing will not:

In [129]: s = "a bunch of words"
In [130]: s[17]
----> 1 s[17]
IndexError: string index out of range
In [131]: s[10:20]
Out[131]: ' words'
In [132]: s[20:30]
Out[132]: "

(demo)

Membership¶

All sequences support the in and not in membership operators:

In [15]: s = [1, 2, 3, 4, 5, 6]
In [16]: 5 in s
Out[16]: True
In [17]: 42 in s
Out[17]: False
In [18]: 42 not in s
Out[18]: True

For strings, the membership operations are like substring operations in other languages:

In [20]: s = u"This is a long string"
In [21]: u"long" in s
Out[21]: True

This does not work for sub-sequences of other types (can you think of why?):

In [22]: s = [1, 2, 3, 4]
In [23]: [2, 3] in s
Out[23]: False

Concatenation¶

Using + or * on sequences will concatenate them:

In [25]: s1 = u"left"
In [26]: s2 = u"right"
In [27]: s1 + s2
Out[27]: u'leftright'
In [28]: (s1 + s2) * 3
Out[28]: u'leftrightleftrightleftright'

You can apply this concatenation to slices as well, leading to some nicely concise code:

from CodingBat: Warmup-1 – front3

def front3(str):
  if len(str) < 3:
    return str+str+str
  else:
    return str[:3]+str[:3]+str[:3]

This non-pythonic solution can also be expressed like so:

def front3(str):
    return str[:3] * 3

Length¶

All sequences have a length. You can get it with the len builtin:

In [36]: s = u"how long is this, anyway?"
In [37]: len(s)
Out[37]: 25

Remember, Python sequences are zero-indexed, so the last index in a sequence is len(s) - 1:

In [38]: count = len(s)
In [39]: s[count]
------------------------------------------------------------
IndexError                Traceback (most recent call last)
<ipython-input-39-5a33b9d3e525> in <module>()
----> 1 s[count]
IndexError: string index out of range

Even better: use s[-1]

Miscellaneous¶

There are a more operations supported by all sequences

All sequences also support the min and max builtins:

In [42]: all_letters = u"thequickbrownfoxjumpedoverthelazydog"
In [43]: min(all_letters)
Out[43]: u'a'
In [44]: max(all_letters)
Out[44]: u'z'

Why are those the answers you get? (hint: ord(u'a'))

All sequences also support the index method, which returns the index of the first occurence of an item in the sequence:

In [46]: all_letters.index(u'd')
Out[46]: 21

This causes a ValueError if the item is not in the sequence:

In [47]: all_letters.index(u'A')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-47-2db728a46f78> in <module>()
----> 1 all_letters.index(u'A')

ValueError: substring not found

A sequence can also be queried for the number of times a particular item appears:

In [52]: all_letters.count(u'o')
Out[52]: 4
In [53]: all_letters.count(u'the')
Out[53]: 2

This does not raise an error if the item you seek is not present:

In [54]: all_letters.count(u'A')
Out[54]: 0

Iteration¶

Lists, Tuples...¶

The other sequence types.

Lists¶

Lists can be constructed using list Literals ([]):

In [1]: []
Out[1]: []
In [2]: [1,2,3]
Out[2]: [1, 2, 3]
In [3]: [1, 'a', 7.34]
Out[3]: [1, 'a', 7.34]

Or by using the list type object as a constructor:

In [6]: list()
Out[6]: []
In [7]: list(range(4))
Out[7]: [0, 1, 2, 3]
In [8]: list('abc')
Out[8]: ['a', 'b', 'c']

The elements contained in a list need not be of a single type.

Lists are heterogenous, ordered collections.

Each element in a list is a value, and can be in multiple lists and have multiple names (or no name)

In [9]: name = u'Brian'
In [10]: a = [1, 2, name]
In [11]: b = [3, 4, name]
In [12]: a[2]
Out[12]: u'Brian'
In [13]: b[2]
Out[13]: u'Brian'
In [14]: a[2] is b[2]
Out[14]: True

Tuples¶

Tuples can be constructed using tuple literals (()):

In [15]: ()
Out[15]: ()
In [16]: (1, 2)
Out[16]: (1, 2)
In [17]: (1, 'a', 7.65)
Out[17]: (1, 'a', 7.65)
In [18]: (1,)
Out[18]: (1,)

Tuples don’t NEED parentheses...

In [161]: t = (1,2,3)
In [162]: t
Out[162]: (1, 2, 3)
In [163]: t = 1,2,3
In [164]: t
Out[164]: (1, 2, 3)
In [165]: type(t)
Out[165]: tuple

But they do need commas...!

In [156]: t = ( 3 )
In [157]: type(t)
Out[157]: int
In [158]: t = (3,)
In [160]: type(t)
Out[160]: tuple

You can also use the tuple type object to convert any sequence into a tuple:

In [20]: tuple()
Out[20]: ()
In [21]: tuple(range(4))
Out[21]: (0, 1, 2, 3)
In [22]: tuple('garbanzo')
Out[22]: ('g', 'a', 'r', 'b', 'a', 'n', 'z', 'o')

The elements contained in a tuple need not be of a single type.

Tuples are heterogenous, ordered collections.

Each element in a tuple is a value, and can be in multiple tuples and have multiple names (or no name)

In [23]: name = u'Brian'
In [24]: other = name
In [25]: a = (1, 2, name)
In [26]: b = (3, 4, other)
In [27]: for i in range(3):
   ....:     print(a[i] is b[i], end=' ')
   ....:
False False True

So Why Have Both?

Mutability¶

image from flickr by illuminaut, (CC by-nc-sa)

Mutability in Python¶

All objects in Python fall into one of two camps:

Mutable
Immutable

Objects which are mutable may be changed in place.

Objects which are immutable may not be changed.

Immutable	Mutable
Unicode	List
String
Integer
Float
Tuple

Try this out:

In [28]: food = [u'spam', u'eggs', u'ham']
In [29]: food
Out[29]: [u'spam', u'eggs', u'ham']
In [30]: food[1] = u'raspberries'
In [31]: food
Out[31]: [u'spam', u'raspberries', u'ham']

And repeat the exercise with a Tuple:

In [32]: food = (u'spam', u'eggs', u'ham')
In [33]: food
Out[33]: (u'spam', u'eggs', u'ham')
In [34]: food[1] = u'raspberries'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-0c3401794933> in <module>()
----> 1 food[1] = u'raspberries'

TypeError: 'tuple' object does not support item assignment

This property means you need to be aware of what you are doing with your lists:

In [36]: original = [1, 2, 3]
In [37]: altered = original
In [38]: for i in range(len(original)):
   ....:     if True:
   ....:         altered[i] += 1
   ....:

Perhaps we want to check to see if altered has been updated, as a flag for whatever condition caused it to be updated.

What is the result of this code?

Our altered list has been updated:

In [39]: altered
Out[39]: [2, 3, 4]

But so has the original list:

In [40]: original
Out[40]: [2, 3, 4]

Why?

Easy container setup, or deadly trap?

(note: you can nest lists to make a 2D-ish array)

In [13]: bins = [ [] ] * 5

In [14]: bins
Out[14]: [[], [], [], [], []]

In [15]: words = [u'one', u'three', u'rough', u'sad', u'goof']

In [16]: for word in words:
   ....:     bins[len(word)-1].append(word)
   ....:

So, what is going to be in bins now?

In [65]: bins
Out[65]:
[[u'one', u'three', u'rough', u'sad', u'goof'],
 [u'one', u'three', u'rough', u'sad', u'goof'],
 [u'one', u'three', u'rough', u'sad', u'goof'],
 [u'one', u'three', u'rough', u'sad', u'goof'],
 [u'one', u'three', u'rough', u'sad', u'goof']]

We multiplied a sequence containing a single mutable object.

We got a list containing five pointers to a single mutable object.

Watch out especially for passing mutable objects as default values for function parameters:

In [71]: def accumulator(count, list=[]):
   ....:     for i in range(count):
   ....:         list.append(i)
   ....:     return list
   ....:
In [72]: accumulator(5)
Out[72]: [0, 1, 2, 3, 4]
In [73]: accumulator(7)
Out[73]: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6]

Mutable Sequence Methods¶

In addition to all the methods supported by sequences we’ve seen above, mutable sequences (the List), have a number of other methods that are used to change the list.

You can find all these in the Standard Library Documentation:

http://docs.python.org/2/library/stdtypes.html#mutable-sequence-types

Assignment¶

You’ve already seen changing a single element of a list by assignment.

Pretty much the same as “arrays” in most languages:

In [100]: list = [1, 2, 3]
In [101]: list[2] = 10
In [102]: list
Out[102]: [1, 2, 10]

Growing the List¶

.append(), .insert(), .extend()

In [74]: food = [u'spam', u'eggs', u'ham']
In [75]: food.append(u'sushi')
In [76]: food
Out[76]: [u'spam', u'eggs', u'ham', u'sushi']
In [77]: food.insert(0, u'beans')
In [78]: food
Out[78]: [u'beans', u'spam', u'eggs', u'ham', u'sushi']
In [79]: food.extend([u'bread', u'water'])
In [80]: food
Out[80]: [u'beans', u'spam', u'eggs', u'ham', u'sushi', u'bread', u'water']

You can pass any sequence to .extend():

In [85]: food
Out[85]: [u'beans', u'spam', u'eggs', u'ham', u'sushi', u'bread', u'water']
In [86]: food.extend(u'spaghetti')
In [87]: food
Out[87]:
[u'beans', u'spam', u'eggs', u'ham', u'sushi', u'bread', u'water',
 u's', u'p', u'a', u'g', u'h', u'e', u't', u't', u'i']

Shrinking the List¶

.pop(), .remove()

In [203]: food = ['spam', 'eggs', 'ham', 'toast']
In [204]: food.pop()
Out[204]: 'toast'
In [205]: food.pop(0)
Out[205]: 'spam'
In [206]: food
Out[206]: ['eggs', 'ham']
In [207]: food.remove('ham')
In [208]: food
Out[208]: ['eggs']

You can also delete slices of a list with the del keyword:

In [92]: nums = range(10)
In [93]: nums
Out[93]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [94]: del nums[1:6:2]
In [95]: nums
Out[95]: [0, 2, 4, 6, 7, 8, 9]
In [96]: del nums[-3:]
In [97]: nums
Out[97]: [0, 2, 4, 6]

Copying Lists¶

You can make copies of part of a list using slicing:

In [227]: food = ['spam', 'eggs', 'ham', 'sushi']
In [228]: some_food = food[1:3]
In [229]: some_food[1] = 'bacon'
In [230]: food
Out[230]: ['spam', 'eggs', 'ham', 'sushi']
In [231]: some_food
Out[231]: ['eggs', 'bacon']

If you provide no arguments to the slice, it makes a copy of the entire list:

In [232]: food
Out[232]: ['spam', 'eggs', 'ham', 'sushi']
In [233]: food2 = food[:]
In [234]: food is food2
Out[234]: False

The copy of a list made this way is a shallow copy.

The list is itself a new object, but the objects it contains are not.

Mutable objects in the list can be mutated in both copies:

In [249]: food = ['spam', ['eggs', 'ham']]
In [251]: food_copy = food[:]
In [252]: food[1].pop()
Out[252]: 'ham'
In [253]: food
Out[253]: ['spam', ['eggs']]
In [256]: food.pop(0)
Out[256]: 'spam'
In [257]: food
Out[257]: [['eggs']]
In [258]: food_copy
Out[258]: ['spam', ['eggs']]

Consider this common pattern:

for x in somelist:
    if should_be_removed(x):
        somelist.remove(x)

This looks benign enough, but changing a list while you are iterating over it can be the cause of some pernicious bugs.

For example:

In [121]: list = range(10)
In [122]: list
Out[122]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [123]: for x in list:
   .....:     list.remove(x)
   .....:

What is the expected outcome of this code?

In [124]: list
Out[124]: [1, 3, 5, 7, 9]

Was that what you expected?

Iterate over a copy, and mutate the original:

In [126]: list = range(10)
In [127]: for x in list[:]:
   .....:     list.remove(x)
   .....:
In [128]: list
Out[128]: []

Okay, so we’ve done this a bunch already, but let’s state it out loud.

You can iterate over a sequence.

for element in sequence:
    do_something(element)

Again, we’ll touch more on this in a short while, but first a few more words about Lists and Tuples.

Miscellaneous List Methods¶

These methods change a list in place and are not available on immutable sequence types.

.reverse()

In [129]: food = [u'spam', u'eggs', u'ham']
In [130]: food.reverse()
In [131]: food
Out[131]: [u'ham', u'eggs', u'spam']

.sort()

In [132]: food.sort()
In [133]: food
Out[133]: [u'eggs', u'ham', u'spam']

Because these methods mutate the list in place, they have a return value of None

.sort() can take an optional key parameter.

It should be a function that takes one parameter (list items one at a time) and returns something that can be used for sorting:

In [137]: def third_letter(string):
   .....:     return string[2]
   .....:
In [138]: food.sort(key=third_letter)
In [139]: food
Out[139]: [u'spam', u'eggs', u'ham']

List Performance¶

indexing is fast and constant time: O(1)
x in s proportional to n: O(n)
visiting all is proportional to n: O(n)
operating on the end of list is fast and constant time: O(1)
- append(), pop()
operating on the front (or middle) of the list depends on n: O(n)
- pop(0), insert(0, v)
- But, reversing is fast. Also, collections.deque

http://wiki.python.org/moin/TimeComplexity

Choosing Lists or Tuples¶

Here are a few guidelines on when to choose a list or a tuple:

If it needs to mutable: list
If it needs to be immutable: tuple
- (safety when passing to a function)

Otherwise ... taste and convention

Lists are Collections (homogeneous): – contain values of the same type – simplifies iterating, sorting, etc

tuples are mixed types: – Group multiple values into one logical thing – Kind of like simple C structs.

Do the same operation to each element?
- list
Small collection of values which make a single logical item?
- tuple
To document that these values won’t change?
- tuple
Build it iteratively?
- list
Transform, filter, etc?
- list

Iteration¶

Repetition, Repetition, Repetition, Repe...

For Loops¶

We’ve seen simple iteration over a sequence with for ... in:

In [170]: for x in "a string":
   .....:         print(x)
   .....:
a

s
t
r
i
n
g

Contrast this with other languages, where you must build and use an index:

for(var i=0; i<arr.length; i++) {
    var value = arr[i];
    alert(i + ") " + value);

If you need an index, though you can use enumerate:

In [140]: for idx, letter in enumerate(u'Python'):
   .....:     print(idx, letter, end=' ')
   .....:
0 P 1 y 2 t 3 h 4 o 5 n

The range builtin is useful for looping a known number of times:

In [171]: for i in range(5):
   .....:     print(i)
   .....:
0
1
2
3
4

But you don’t really need to do anything at all with i

Session Three: Sequences, Iteration and String Formatting¶

Review/Questions¶

Review of Previous Session¶

Corrections¶

Course Logistics¶

Survey Feedback¶

Homework Review¶

Sequences¶

What is a Sequence?¶

Sequence Types¶

Indexing¶

Slicing¶

Why start from zero?¶

Membership¶

Concatenation¶

Length¶

Miscellaneous¶

Iteration¶

Lists, Tuples...¶

Lists¶

Tuples¶

Mutability¶

Mutability in Python¶

Mutable Sequence Methods¶

Assignment¶

Growing the List¶

Shrinking the List¶

Copying Lists¶

Miscellaneous List Methods¶

List Performance¶

Choosing Lists or Tuples¶

More Documentation¶

Iteration¶

For Loops¶

Table Of Contents

Previous topic

Next topic

Navigation

Session Three: Sequences, Iteration and String Formatting¶

Review/Questions¶

Review of Previous Session¶

Corrections¶

Course Logistics¶

Survey Feedback¶

Homework Review¶

Sequences¶

What is a Sequence?¶

Sequence Types¶

Indexing¶

Slicing¶

Why start from zero?¶

Membership¶

Concatenation¶

Length¶

Miscellaneous¶

Iteration¶

Lists, Tuples...¶

Lists¶

Tuples¶

Mutability¶

Mutability in Python¶

Mutable Sequence Methods¶

Assignment¶

Growing the List¶

Shrinking the List¶

Copying Lists¶

Miscellaneous List Methods¶

List Performance¶

Choosing Lists or Tuples¶

More Documentation¶

Iteration¶

For Loops¶

Table Of Contents

Previous topic

Next topic

Quick search

Navigation