Any questions that are nagging?
Ordered collections of objects
Remember Duck Typing? A sequence can be considered as anything that supports at least these operations:
There are seven builtin types in Python that are sequences:
For this class, you won’t see much beyond the string types, lists, tuples – the rest are pretty special purpose.
But what we say today applies to all sequences (with minor caveats)
Items in a sequence may be looked up by index using the subscription operator: []
Indexing in Python always starts at zero.
In [98]: s = u"this is a string"
In [99]: s[0]
Out[99]: u't'
In [100]: s[5]
Out[100]: u'i'
You can use negative indexes to count from the end:
In [105]: s = u"this is a string"
In [106]: s[-1]
Out[106]: u'g'
In [107]: s[-6]
Out[107]: u's'
Indexing beyond the end of a sequence causes an IndexError:
In [4]: s = [0, 1, 2, 3]
In [5]: s[4]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-5-42efaba84d8b> in <module>()
----> 1 s[4]
IndexError: list index out of range
Slicing a sequence creates a new sequence with a range of objects from the original sequence.
It also uses the subscription operator ([]), but with a twist.
sequence[start:finish] returns all sequence[i] for which start <= i < finish:
In [121]: s = u"a bunch of words"
In [122]: s[2]
Out[122]: u'b'
In [123]: s[6]
Out[123]: u'h'
In [124]: s[2:6]
Out[124]: u'bunc'
In [125]: s[2:7]
Out[125]: u'bunch'
Think of the indexes as pointing to the spaces between the items:
a b u n c h o f
| | | | | | | | | |
0 1 2 3 4 5 6 7 8 9
You do not have to provide both start and finish:
In [6]: s = u"a bunch of words"
In [7]: s[:5]
Out[7]: u'a bun'
In [8]: s[5:]
Out[8]: u'ch of words'
Either 0 or len(s) will be assumed, respectively.
You can combine this with the negative index to get the end of a sequence:
In [4]: s = u'this_could_be_a_filename.txt'
In [5]: s[:-4]
Out[5]: u'this_could_be_a_filename'
In [6]: s[-4:]
Out[6]: u'.txt'
Python indexing feels ‘weird’ to some folks – particularly those that don’t come with a background in the C family of languages.
Why is the “first” item indexed with zero?
Why is the last item in the slice not included?
Because these lead to some nifty properties:
len(seq[a:b]) == b-a
seq[:b] + seq[b:] == seq
len(seq[:b]) == b
len(seq[-b:]) == b
There are very many fewer “off by one” errors as a result.
Slicing takes a third argument, step which controls which items are returned:
In [289]: string = u"a fairly long string"
In [290]: string[0:15]
Out[290]: u'a fairly long s'
In [291]: string[0:15:2]
Out[291]: u'afil ogs'
In [292]: string[0:15:3]
Out[292]: u'aallg'
In [293]: string[::-1]
Out[293]: u'gnirts gnol ylriaf a'
Though they share an operator, slicing and indexing have a few important differences:
Indexing will always return one object, slicing will return a sequence of objects.
Indexing past the end of a sequence will raise an error, slicing will not:
In [129]: s = "a bunch of words"
In [130]: s[17]
----> 1 s[17]
IndexError: string index out of range
In [131]: s[10:20]
Out[131]: ' words'
In [132]: s[20:30]
Out[132]: "
(demo)
All sequences support the in and not in membership operators:
In [15]: s = [1, 2, 3, 4, 5, 6]
In [16]: 5 in s
Out[16]: True
In [17]: 42 in s
Out[17]: False
In [18]: 42 not in s
Out[18]: True
For strings, the membership operations are like substring operations in other languages:
In [20]: s = u"This is a long string"
In [21]: u"long" in s
Out[21]: True
This does not work for sub-sequences of other types (can you think of why?):
In [22]: s = [1, 2, 3, 4]
In [23]: [2, 3] in s
Out[23]: False
Using + or * on sequences will concatenate them:
In [25]: s1 = u"left"
In [26]: s2 = u"right"
In [27]: s1 + s2
Out[27]: u'leftright'
In [28]: (s1 + s2) * 3
Out[28]: u'leftrightleftrightleftright'
You can apply this concatenation to slices as well, leading to some nicely concise code:
from CodingBat: Warmup-1 – front3
def front3(str):
if len(str) < 3:
return str+str+str
else:
return str[:3]+str[:3]+str[:3]
This non-pythonic solution can also be expressed like so:
def front3(str):
return str[:3] * 3
All sequences have a length. You can get it with the len builtin:
In [36]: s = u"how long is this, anyway?"
In [37]: len(s)
Out[37]: 25
Remember, Python sequences are zero-indexed, so the last index in a sequence is len(s) - 1:
In [38]: count = len(s)
In [39]: s[count]
------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-39-5a33b9d3e525> in <module>()
----> 1 s[count]
IndexError: string index out of range
Even better: use s[-1]
There are a more operations supported by all sequences
All sequences also support the min and max builtins:
In [42]: all_letters = u"thequickbrownfoxjumpedoverthelazydog"
In [43]: min(all_letters)
Out[43]: u'a'
In [44]: max(all_letters)
Out[44]: u'z'
Why are those the answers you get? (hint: ord(u'a'))
All sequences also support the index method, which returns the index of the first occurence of an item in the sequence:
In [46]: all_letters.index(u'd')
Out[46]: 21
This causes a ValueError if the item is not in the sequence:
In [47]: all_letters.index(u'A')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-47-2db728a46f78> in <module>()
----> 1 all_letters.index(u'A')
ValueError: substring not found
A sequence can also be queried for the number of times a particular item appears:
In [52]: all_letters.count(u'o')
Out[52]: 4
In [53]: all_letters.count(u'the')
Out[53]: 2
This does not raise an error if the item you seek is not present:
In [54]: all_letters.count(u'A')
Out[54]: 0
More on this in a while.
The other sequence types.
Lists can be constructed using list Literals ([]):
In [1]: []
Out[1]: []
In [2]: [1,2,3]
Out[2]: [1, 2, 3]
In [3]: [1, 'a', 7.34]
Out[3]: [1, 'a', 7.34]
Or by using the list type object as a constructor:
In [6]: list()
Out[6]: []
In [7]: list(range(4))
Out[7]: [0, 1, 2, 3]
In [8]: list('abc')
Out[8]: ['a', 'b', 'c']
The elements contained in a list need not be of a single type.
Lists are heterogenous, ordered collections.
Each element in a list is a value, and can be in multiple lists and have multiple names (or no name)
In [9]: name = u'Brian'
In [10]: a = [1, 2, name]
In [11]: b = [3, 4, name]
In [12]: a[2]
Out[12]: u'Brian'
In [13]: b[2]
Out[13]: u'Brian'
In [14]: a[2] is b[2]
Out[14]: True
Tuples can be constructed using tuple literals (()):
In [15]: ()
Out[15]: ()
In [16]: (1, 2)
Out[16]: (1, 2)
In [17]: (1, 'a', 7.65)
Out[17]: (1, 'a', 7.65)
In [18]: (1,)
Out[18]: (1,)
Tuples don’t NEED parentheses...
In [161]: t = (1,2,3)
In [162]: t
Out[162]: (1, 2, 3)
In [163]: t = 1,2,3
In [164]: t
Out[164]: (1, 2, 3)
In [165]: type(t)
Out[165]: tuple
But they do need commas...!
In [156]: t = ( 3 )
In [157]: type(t)
Out[157]: int
In [158]: t = (3,)
In [160]: type(t)
Out[160]: tuple
You can also use the tuple type object to convert any sequence into a tuple:
In [20]: tuple()
Out[20]: ()
In [21]: tuple(range(4))
Out[21]: (0, 1, 2, 3)
In [22]: tuple('garbanzo')
Out[22]: ('g', 'a', 'r', 'b', 'a', 'n', 'z', 'o')
The elements contained in a tuple need not be of a single type.
Tuples are heterogenous, ordered collections.
Each element in a tuple is a value, and can be in multiple tuples and have multiple names (or no name)
In [23]: name = u'Brian'
In [24]: other = name
In [25]: a = (1, 2, name)
In [26]: b = (3, 4, other)
In [27]: for i in range(3):
....: print(a[i] is b[i], end=' ')
....:
False False True
So Why Have Both?
image from flickr by illuminaut, (CC by-nc-sa)
All objects in Python fall into one of two camps:
Objects which are mutable may be changed in place.
Objects which are immutable may not be changed.
Immutable | Mutable |
---|---|
Unicode | List |
String | |
Integer | |
Float | |
Tuple |
Try this out:
In [28]: food = [u'spam', u'eggs', u'ham']
In [29]: food
Out[29]: [u'spam', u'eggs', u'ham']
In [30]: food[1] = u'raspberries'
In [31]: food
Out[31]: [u'spam', u'raspberries', u'ham']
And repeat the exercise with a Tuple:
In [32]: food = (u'spam', u'eggs', u'ham')
In [33]: food
Out[33]: (u'spam', u'eggs', u'ham')
In [34]: food[1] = u'raspberries'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-0c3401794933> in <module>()
----> 1 food[1] = u'raspberries'
TypeError: 'tuple' object does not support item assignment
This property means you need to be aware of what you are doing with your lists:
In [36]: original = [1, 2, 3]
In [37]: altered = original
In [38]: for i in range(len(original)):
....: if True:
....: altered[i] += 1
....:
Perhaps we want to check to see if altered has been updated, as a flag for whatever condition caused it to be updated.
What is the result of this code?
Our altered list has been updated:
In [39]: altered
Out[39]: [2, 3, 4]
But so has the original list:
In [40]: original
Out[40]: [2, 3, 4]
Why?
Easy container setup, or deadly trap?
(note: you can nest lists to make a 2D-ish array)
In [13]: bins = [ [] ] * 5
In [14]: bins
Out[14]: [[], [], [], [], []]
In [15]: words = [u'one', u'three', u'rough', u'sad', u'goof']
In [16]: for word in words:
....: bins[len(word)-1].append(word)
....:
So, what is going to be in bins now?
In [65]: bins
Out[65]:
[[u'one', u'three', u'rough', u'sad', u'goof'],
[u'one', u'three', u'rough', u'sad', u'goof'],
[u'one', u'three', u'rough', u'sad', u'goof'],
[u'one', u'three', u'rough', u'sad', u'goof'],
[u'one', u'three', u'rough', u'sad', u'goof']]
We multiplied a sequence containing a single mutable object.
We got a list containing five pointers to a single mutable object.
Watch out especially for passing mutable objects as default values for function parameters:
In [71]: def accumulator(count, list=[]):
....: for i in range(count):
....: list.append(i)
....: return list
....:
In [72]: accumulator(5)
Out[72]: [0, 1, 2, 3, 4]
In [73]: accumulator(7)
Out[73]: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 6]
In addition to all the methods supported by sequences we’ve seen above, mutable sequences (the List), have a number of other methods that are used to change the list.
You can find all these in the Standard Library Documentation:
http://docs.python.org/2/library/stdtypes.html#mutable-sequence-types
You’ve already seen changing a single element of a list by assignment.
Pretty much the same as “arrays” in most languages:
In [100]: list = [1, 2, 3]
In [101]: list[2] = 10
In [102]: list
Out[102]: [1, 2, 10]
.append(), .insert(), .extend()
In [74]: food = [u'spam', u'eggs', u'ham']
In [75]: food.append(u'sushi')
In [76]: food
Out[76]: [u'spam', u'eggs', u'ham', u'sushi']
In [77]: food.insert(0, u'beans')
In [78]: food
Out[78]: [u'beans', u'spam', u'eggs', u'ham', u'sushi']
In [79]: food.extend([u'bread', u'water'])
In [80]: food
Out[80]: [u'beans', u'spam', u'eggs', u'ham', u'sushi', u'bread', u'water']
You can pass any sequence to .extend():
In [85]: food
Out[85]: [u'beans', u'spam', u'eggs', u'ham', u'sushi', u'bread', u'water']
In [86]: food.extend(u'spaghetti')
In [87]: food
Out[87]:
[u'beans', u'spam', u'eggs', u'ham', u'sushi', u'bread', u'water',
u's', u'p', u'a', u'g', u'h', u'e', u't', u't', u'i']
.pop(), .remove()
In [203]: food = ['spam', 'eggs', 'ham', 'toast']
In [204]: food.pop()
Out[204]: 'toast'
In [205]: food.pop(0)
Out[205]: 'spam'
In [206]: food
Out[206]: ['eggs', 'ham']
In [207]: food.remove('ham')
In [208]: food
Out[208]: ['eggs']
You can also delete slices of a list with the del keyword:
In [92]: nums = range(10)
In [93]: nums
Out[93]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [94]: del nums[1:6:2]
In [95]: nums
Out[95]: [0, 2, 4, 6, 7, 8, 9]
In [96]: del nums[-3:]
In [97]: nums
Out[97]: [0, 2, 4, 6]
You can make copies of part of a list using slicing:
In [227]: food = ['spam', 'eggs', 'ham', 'sushi']
In [228]: some_food = food[1:3]
In [229]: some_food[1] = 'bacon'
In [230]: food
Out[230]: ['spam', 'eggs', 'ham', 'sushi']
In [231]: some_food
Out[231]: ['eggs', 'bacon']
If you provide no arguments to the slice, it makes a copy of the entire list:
In [232]: food
Out[232]: ['spam', 'eggs', 'ham', 'sushi']
In [233]: food2 = food[:]
In [234]: food is food2
Out[234]: False
The copy of a list made this way is a shallow copy.
The list is itself a new object, but the objects it contains are not.
Mutable objects in the list can be mutated in both copies:
In [249]: food = ['spam', ['eggs', 'ham']]
In [251]: food_copy = food[:]
In [252]: food[1].pop()
Out[252]: 'ham'
In [253]: food
Out[253]: ['spam', ['eggs']]
In [256]: food.pop(0)
Out[256]: 'spam'
In [257]: food
Out[257]: [['eggs']]
In [258]: food_copy
Out[258]: ['spam', ['eggs']]
Consider this common pattern:
for x in somelist:
if should_be_removed(x):
somelist.remove(x)
This looks benign enough, but changing a list while you are iterating over it can be the cause of some pernicious bugs.
For example:
In [121]: list = range(10)
In [122]: list
Out[122]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [123]: for x in list:
.....: list.remove(x)
.....:
What is the expected outcome of this code?
In [124]: list
Out[124]: [1, 3, 5, 7, 9]
Was that what you expected?
Iterate over a copy, and mutate the original:
In [126]: list = range(10)
In [127]: for x in list[:]:
.....: list.remove(x)
.....:
In [128]: list
Out[128]: []
Okay, so we’ve done this a bunch already, but let’s state it out loud.
You can iterate over a sequence.
for element in sequence:
do_something(element)
Again, we’ll touch more on this in a short while, but first a few more words about Lists and Tuples.
These methods change a list in place and are not available on immutable sequence types.
.reverse()
In [129]: food = [u'spam', u'eggs', u'ham']
In [130]: food.reverse()
In [131]: food
Out[131]: [u'ham', u'eggs', u'spam']
.sort()
In [132]: food.sort()
In [133]: food
Out[133]: [u'eggs', u'ham', u'spam']
Because these methods mutate the list in place, they have a return value of None
.sort() can take an optional key parameter.
It should be a function that takes one parameter (list items one at a time) and returns something that can be used for sorting:
In [137]: def third_letter(string):
.....: return string[2]
.....:
In [138]: food.sort(key=third_letter)
In [139]: food
Out[139]: [u'spam', u'eggs', u'ham']
Here are a few guidelines on when to choose a list or a tuple:
Otherwise ... taste and convention
Lists are Collections (homogeneous): – contain values of the same type – simplifies iterating, sorting, etc
tuples are mixed types: – Group multiple values into one logical thing – Kind of like simple C structs.
For more information, read the list docs:
http://docs.python.org/2/library/stdtypes.html#mutable-sequence-types
(actually any mutable sequence....)
Repetition, Repetition, Repetition, Repe...
We’ve seen simple iteration over a sequence with for ... in:
In [170]: for x in "a string":
.....: print(x)
.....:
a
s
t
r
i
n
g
Contrast this with other languages, where you must build and use an index:
for(var i=0; i<arr.length; i++) {
var value = arr[i];
alert(i + ") " + value);
If you need an index, though you can use enumerate:
In [140]: for idx, letter in enumerate(u'Python'):
.....: print(idx, letter, end=' ')
.....:
0 P 1 y 2 t 3 h 4 o 5 n
The range builtin is useful for looping a known number of times:
In [171]: for i in range(5):
.....: print(i)
.....:
0
1
2
3
4
But you don’t really need to do anything at all with i