A couple of other nifty utilities for use with for loops:
tuple unpacking:
remember this?
x, y = 3, 4
You can do that in a for loop, also:
In [3]: from __future__ import print_function
In [4]: l = [(1, 2), (3, 4), (5, 6)]
In [5]: for i, j in l:
....:     print("i:%i, j:%i" % (i, j))
i:1, j:2
i:3, j:4
i:5, j:6
zip:
In [10]: l1 = [1, 2, 3]
In [11]: l2 = [3, 4, 5]
In [12]: for i, j in zip(l1, l2):
....:     print("i:%i, j:%i" % (i, j))
....:
i:1, j:3
i:2, j:4
i:3, j:5
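Here is a small standalone sketch that combines the two ideas. zip() pairs up items from two lists, and the for loop unpacks each pair. The list contents are made up. The last line shows that zip() stops at the end of the shortest input.

```python
names = [u"a", u"b", u"c"]
scores = [1, 2, 3]

pairs = []
# zip() yields (name, score) tuples; the for loop unpacks each one
for name, score in zip(names, scores):
    pairs.append(u"%s=%i" % (name, score))

print(u", ".join(pairs))  # prints: a=1, b=2, c=3

# zip() stops as soon as the shortest input runs out
short = list(zip([1, 2, 3], [u"x", u"y"]))
```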
Building up a long string.
The obvious thing to do is something like:
msg = u""
for piece in list_of_stuff:
    msg += piece
But strings are immutable: Python needs to create a new string each time you add a piece, which is not efficient. Instead, collect the pieces in a list and join them once at the end:
msg = []
for piece in list_of_stuff:
    msg.append(piece)
msg = u"".join(msg)
Appending to lists is efficient, and so is the join() method of strings.
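This runnable version compares the two approaches. The input list is a made-up stand-in for list_of_stuff. Both approaches produce the same string; only the efficiency differs. The separator passed to join() controls what goes between the pieces.

```python
# Hypothetical input: any iterable of strings
list_of_stuff = [u"spam", u"eggs", u"ham"]

# The slow way: each += builds a brand-new string
msg = u""
for piece in list_of_stuff:
    msg += piece

# The efficient way: append to a list, then join once
pieces = []
for piece in list_of_stuff:
    pieces.append(piece)
joined = u"".join(pieces)

# join() can put any separator string between the pieces
spaced = u" ".join(pieces)
```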
What is assert for?
Testing – NOT for issues expected to happen operationally:
assert m >= 0
In operational code, that should be:
if m < 0:
    raise ValueError(u"m must be non-negative")
I’ll cover Exceptions later this class...
(assert statements are skipped entirely when optimization is turned on, e.g. with python -O!)
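The following sketch contrasts the two roles. The function checked_sqrt is a hypothetical example. It uses an explicit check that raises ValueError, which still runs under -O. The assert is used only as a test of the function's behavior.

```python
import math


def checked_sqrt(m):
    """Hypothetical example: square root that rejects negative input.

    Uses an explicit check rather than `assert m >= 0`, so the check
    still happens when Python runs with optimization (-O) turned on.
    """
    if m < 0:
        raise ValueError(u"m must be non-negative, got %r" % (m,))
    return math.sqrt(m)


# assert belongs in tests: checking that the code does what we expect
assert checked_sqrt(9) == 3.0

message = None
try:
    checked_sqrt(-1)
except ValueError as err:
    message = str(err)
```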
Will someone volunteer to have their homework (Tasks 6 and 7) debugged in class?
Free programming help!
Open up your task 7 files in your text editor.
N-grams are a way to study word associations
https://books.google.com/ngrams
Coding Kata 14 - Dave Thomas
http://codekata.com/kata/kata14-tom-swift-under-the-milkwood/
and in this doc:
http://codefellows.github.io/sea-c34-python/supplements/kata_fourteen.html
Use “The Travels of Marco Polo the Venetian” as input:
http://codefellows.github.io/sea-c34-python/_downloads/marco-polo.txt
Python calls it a dict
Other languages call it a dictionary, associative array, map, hash, or hash table.
>>> {'key1': 3, 'key2': 5}
{'key1': 3, 'key2': 5}
>>> dict([('key1', 3),('key2', 5)])
{'key1': 3, 'key2': 5}
>>> dict(key1=3, key2=5)
{'key1': 3, 'key2': 5}
>>> d = {}
>>> d['key1'] = 3
>>> d['key2'] = 5
>>> d
{'key1': 3, 'key2': 5}
>>> d = {'name': 'Brian', 'score': 42}
>>> d['score']
42
>>> d = {1: 'one', 0: 'zero'}
>>> d[0]
'zero'
>>> d['non-existing key']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'non-existing key'
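This standalone check confirms that the three constructors above build equal dicts; dict equality ignores order. It also shows one way to handle the KeyError from a missing key instead of letting it propagate.

```python
# Three equivalent ways to build the same dict
d1 = {'key1': 3, 'key2': 5}
d2 = dict([('key1', 3), ('key2', 5)])
d3 = dict(key1=3, key2=5)

# Indexing with a missing key raises KeyError, which can be caught
try:
    d1['non-existing key']
    found = True
except KeyError:
    found = False
```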
Keys can be any immutable object:
In [325]: d[3] = 'string'
In [326]: d[3.14] = 'pi'
In [327]: d['pi'] = 3.14
In [328]: d[ (1,2,3) ] = 'a tuple key'
In [329]: d[ [1,2,3] ] = 'a list key'
TypeError: unhashable type: 'list'
Actually – any “hashable” type.
Hash functions convert arbitrarily large data to a small proxy (usually int)
Always return the same proxy for the same input
MD5, SHA, etc
Dictionaries hash the key to an integer proxy and use it to find the key and value.
Key lookup is efficient because the hash function leads directly to a bucket with very few keys (often just one)
What would happen if the proxy changed after storing a key?
Hashability requires immutability
Key lookup is very efficient
Same average time regardless of size
Note: Python name look-ups are implemented with dict – it’s highly optimized
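The bucket idea can be sketched as a toy class. ToyHashTable is a hypothetical teaching example, not how CPython's dict is actually built: the real dict uses open addressing rather than lists of buckets. It only shows that hash(key) picks one bucket directly, so a lookup searches a few entries instead of the whole table.

```python
class ToyHashTable(object):
    """Simplified illustration of hash-based lookup (not CPython's dict)."""

    def __init__(self, n_buckets=8):
        # each bucket is a short list of (key, value) pairs
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # hash() raises TypeError for unhashable keys such as lists
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # replace an existing entry
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        # only the one bucket chosen by the hash is searched
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


table = ToyHashTable()
table['pi'] = 3.14
table[(1, 2, 3)] = 'a tuple key'
table['pi'] = 3.14159  # same key: replaces, does not add

try:
    table['missing']
    missing_raises = False
except KeyError:
    missing_raises = True

try:
    table[[1, 2, 3]] = 'a list key'
    list_key_rejected = False
except TypeError:
    list_key_rejected = True
```

This sketch also suggests why keys must be immutable. If a key changed after it was stored, its hash would point at a different bucket, and the entry could no longer be found.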
Key to value: a direct, fast lookup
Value to key: requires scanning every item in the dict
If you need to check dict values often, create another dict or set
(up to you to keep them in sync)
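Here is a minimal sketch of both directions. The helper key_for is hypothetical. The reverse dict assumes the values are hashable and unique, and it goes stale if the original dict changes later.

```python
d = {'one': 1, 'two': 2, 'three': 3}


def key_for(d, value):
    """Hypothetical helper: value-to-key the slow way, scanning every item."""
    for k, v in d.items():
        if v == value:
            return k
    raise KeyError(value)


# Value-to-key the fast way: build a reverse dict once
# (assumes values are hashable and unique)
reverse = dict((v, k) for k, v in d.items())

# If you only need "is this a value?", a set of the values is enough
value_set = set(d.values())
```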
Dictionaries have no defined order (in Python 2; starting with Python 3.7, dicts do preserve insertion order)
In [352]: d = {'one':1, 'two':2, 'three':3}
In [353]: d
Out[353]: {'one': 1, 'three': 3, 'two': 2}
In [354]: d.keys()
Out[354]: ['three', 'two', 'one']
What you see may fool you into thinking that you can rely on the order of the pairs.
It cannot.
for iterates over the keys
In [15]: d = {'name': 'Brian', 'score': 42}
In [16]: for x in d:
....:     print(x)
....:
score
name
(note the different order...)
In [20]: d = {'name': 'Brian', 'score': 42}
In [21]: d.keys()
Out[21]: ['score', 'name']
In [22]: d.values()
Out[22]: [42, 'Brian']
In [23]: d.items()
Out[23]: [('score', 42), ('name', 'Brian')]
Iterating on everything
In [26]: d = {'name': 'Brian', 'score': 42}
In [27]: for k, v in d.items():
....:     print("%s: %s" % (k, v))
....:
score: 42
name: Brian
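Because the order cannot be relied on, one common approach is to sort the keys whenever the output order matters. This is a minimal sketch using the same example dict.

```python
d = {'name': 'Brian', 'score': 42}

# sorted() returns the keys as a new list in a predictable order,
# no matter how the dict happens to store them
lines = []
for k in sorted(d):
    lines.append("%s: %s" % (k, d[k]))
```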
See them all here:
https://docs.python.org/2/library/stdtypes.html#mapping-types-dict
Is it in there?
In [5]: d
Out[5]: {'that': 7, 'this': 5}
In [6]: 'that' in d
Out[6]: True
In [7]: 'this' not in d
Out[7]: False
Membership is on the keys.
(like indexing)
In [9]: d.get('this')
Out[9]: 5
But you can specify a default
In [11]: d.get(u'something', u'a default')
Out[11]: u'a default'
Never raises a KeyError (the default default is None)
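A typical use of get() with a default is counting things, such as word frequencies. The sample sentence is made up.

```python
words = u"the cat and the hat and the bat".split()

counts = {}
for word in words:
    # get() returns 0 the first time a word is seen, instead of raising KeyError
    counts[word] = counts.get(word, 0) + 1
```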
In [13]: for item in d.iteritems():
....:     print(item)
....:
('this', 5)
('that', 7)
In [15]: for key in d.iterkeys():
....:     print(key)
....:
this
that
In [16]: for val in d.itervalues():
....:     print(val)
....:
5
7
The iter* methods return iterators; they don’t actually build the lists the way keys(), values(), and items() do in Python 2.
pop(key) gets the value at a given key while removing that key from the dict:
In [19]: d.pop('this')
Out[19]: 5
In [20]: d
Out[20]: {'that': 7}
popitem() pops out an arbitrary (key, value) pair:
In [23]: d.popitem()
Out[23]: ('that', 7)
In [24]: d
Out[24]: {}
setdefault(key[, default])
gets the value if it’s there, sets it if it’s not
In [26]: d = {}
In [27]: d.setdefault(u'something', u'a value')
Out[27]: u'a value'
In [28]: d
Out[28]: {u'something': u'a value'}
In [29]: d.setdefault(u'something', u'a different value')
Out[29]: u'a value'
In [30]: d
Out[30]: {u'something': u'a value'}
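A common setdefault() idiom is grouping items into lists. This sketch groups made-up words by their first letter. It also shows that pop() accepts a default, in which case a missing key returns that default instead of raising KeyError.

```python
words = [u"apple", u"avocado", u"banana", u"blueberry", u"cherry"]

by_letter = {}
for word in words:
    # setdefault() stores an empty list the first time a letter is seen,
    # and returns the stored list either way, so append() always works
    by_letter.setdefault(word[0], []).append(word)

# pop() with a default: no KeyError for a missing key
removed = by_letter.pop(u"z", None)
```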
dict View objects:
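Like keys(), values(), and items(), but they stay linked to the original dict, so they reflect later changes (in Python 2.7 these are viewkeys(), viewvalues(), and viewitems()):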
In [47]: d
Out[47]: {u'something': u'a value'}
In [48]: item_view = d.viewitems()
In [49]: item_view
Out[49]: dict_items([(u'something', u'a value')])
In [50]: d['something else'] = u'another value'
In [51]: item_view
Out[51]: dict_items([('something else', u'another value'), (u'something', u'a value')])
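The same behavior can be sketched for Python 3, where keys(), values(), and items() return views directly. Key views also support set operations such as &.

```python
# Python 3: keys() and items() already return live views
d = {u'something': u'a value'}
item_view = d.items()
key_view = d.keys()

d[u'something else'] = u'another value'

# the views were created before the new key was added, but they see it
n_items = len(item_view)
has_new_key = u'something else' in key_view

# key views behave like sets
common = key_view & {u'something', u'missing'}
```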
A set is an unordered collection of distinct values
Essentially a dict with only keys
Set Constructors
>>> set()
set([])
>>> set([1, 2, 3])
set([1, 2, 3])
>>> {1, 2, 3}
set([1, 2, 3])
>>> s = set()
>>> s.update([1, 2, 3])
>>> s
set([1, 2, 3])
Set members must be hashable
Like dictionary keys – and for same reason (efficient lookup)
No indexing (unordered)
>>> s[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'set' object does not support indexing
>>> s = set([1])
>>> s.pop() # an arbitrary member
1
>>> s.pop()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'pop from an empty set'
>>> s = set([1, 2, 3])
>>> s.remove(2)
>>> s.remove(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 2
All the “set” operations from math class...
s.isdisjoint(other)
s.issubset(other)
s.union(other, ...)
s.intersection(other, ...)
s.difference(other, ...)
s.symmetric_difference(other)
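Here is a quick runnable tour of the operations listed above, using two small example sets. Each method also has an operator form.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a.union(b)               # in either set
inter = a.intersection(b)        # in both sets
diff = a.difference(b)           # in a but not in b
sym = a.symmetric_difference(b)  # in exactly one of the two

# the same operations are available as operators
assert union == a | b
assert inter == a & b
assert diff == a - b
assert sym == a ^ b
```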
Another kind of set: frozenset
immutable – for use as a key in a dict (or another set...)
>>> fs = frozenset((3,8,5))
>>> fs.add(9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
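This sketch shows what immutability buys. A frozenset can be a dict key or a member of another set, while a regular set cannot. Set equality ignores order, so a frozenset built in a different order finds the same entry.

```python
fs = frozenset((3, 8, 5))
lookup = {fs: u'three numbers'}

# a frozenset with the same members (in any order) finds the same entry
value = lookup[frozenset([5, 3, 8])]

# duplicates collapse: frozenset([1, 2]) == frozenset([2, 1])
set_of_sets = {frozenset([1, 2]), frozenset([2, 1]), frozenset([3])}

try:
    {set([1, 2]): u'nope'}
    plain_set_ok = True
except TypeError:
    plain_set_ok = False  # a regular set is mutable, hence unhashable
```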
Exceptions: another branching structure:
try:
    do_something()
    f = open('missing.txt')
    process(f)  # never called if file missing
except IOError:
    print(u"couldn't open missing.txt")
Never do this (a bare except: catches every exception, including typos in your own code and KeyboardInterrupt):
try:
    do_something()
    f = open('missing.txt')
    process(f)  # never called if file missing
except:
    print(u"couldn't open missing.txt")
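This sketch shows why a bare except is dangerous. Both loader functions are hypothetical. Passing None instead of a path is a programming bug, which raises TypeError. The bare version silently swallows it, while the version that catches only IOError lets the real bug surface. A genuinely missing file is still handled.

```python
import os
import tempfile


def load_bare(path):
    try:
        with open(path) as f:
            return f.read()
    except:  # catches *everything*, including bugs
        return None


def load_specific(path):
    try:
        with open(path) as f:
            return f.read()
    except IOError:  # only I/O problems (IOError is OSError in Python 3)
        return None


bare_result = load_bare(None)  # the TypeError is hidden: returns None

try:
    load_specific(None)
    specific_raised = False
except TypeError:
    specific_raised = True  # the bug shows up where it happened

missing = os.path.join(tempfile.mkdtemp(), u"missing.txt")
```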
Use Exceptions, rather than your own tests:
Don’t do this:
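import os
do_something()
if os.path.exists('missing.txt'):
    f = open('missing.txt')
    process(f)  # never called if file missing
It will almost always work, but the "almost" will drive you crazy: the file can be deleted between the check and the open(), or it can exist but be unreadable.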
Example from homework
if num_in.isdigit():
    num_in = int(num_in)
But int(num_in) already checks whether the string can be converted to an integer (and isdigit() rejects some valid input, such as "-3"), so you can do:
try:
    num_in = int(num_in)
except ValueError:
    print(u"Input must be an integer, try again.")
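Here is a side-by-side sketch of the two styles. Both parse functions are hypothetical names. The check-first version rejects input that int() would happily accept, while the try/except version lets int() decide what counts as valid.

```python
def parse_int_lbyl(text):
    """Look before you leap: accept only plain digit strings."""
    if text.isdigit():
        return int(text)
    return None


def parse_int_eafp(text):
    """Easier to ask forgiveness: let int() decide what is valid."""
    try:
        return int(text)
    except ValueError:
        return None


results = [(t, parse_int_lbyl(t), parse_int_eafp(t))
           for t in (u"42", u"-3", u" 42 ", u"4.2")]
```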
Or just let the exception be raised...
"it's Easier to Ask Forgiveness than Permission"
-- Grace Hopper
http://www.youtube.com/watch?v=AZDWveIdqjY
(Pycon talk by Alex Martelli)
For simple scripts, let exceptions happen.
Only handle the exception if the code can and will do something about it.
(much better debugging info when an error does occur)
try:
    do_something()
    f = open('missing.txt')
    process(f)  # never called if file missing
except IOError:
    print(u"couldn't open missing.txt")
finally:
    do_some_clean_up()
The finally: clause always runs, whether or not an exception was raised.
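This runnable sketch records the order of events. The function risky is hypothetical, and the raised IOError stands in for open() failing. The finally block runs in both calls, even when the exception propagates out to the caller.

```python
events = []


def risky(fail):
    try:
        events.append(u"try")
        if fail:
            raise IOError(u"simulated failure")  # stands in for open() failing
    finally:
        # runs on success and on failure alike
        events.append(u"finally")


risky(fail=False)
try:
    risky(fail=True)
except IOError:
    events.append(u"caught outside")
```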
try:
    do_something()
    f = open('missing.txt')
except IOError:
    print(u"couldn't open missing.txt")
else:
    process(f)  # only called if there was no exception
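Here is a sketch with all four clauses, run once against a file that exists and once against a file that doesn't. The helper open_and_process and the temporary files are hypothetical. The else clause runs only when open() succeeded, and finally runs both times.

```python
import os
import tempfile


def open_and_process(path):
    log = []
    try:
        f = open(path)
    except IOError:
        log.append(u"except")
    else:
        # only reached if open() did not raise
        log.append(u"else")
        f.close()
    finally:
        log.append(u"finally")
    return log


tmpdir = tempfile.mkdtemp()
existing = os.path.join(tmpdir, u"exists.txt")
with open(existing, "w") as out:
    out.write(u"hello\n")
missing = os.path.join(tmpdir, u"missing.txt")
```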
You can also get at the exception object itself, do something with it, and then re-raise it:
try:
    do_something()
    f = open('missing.txt')
except IOError as the_error:
    print(the_error)
    the_error.extra_info = "some more information"
    raise
Particularly useful if you catch more than one exception:
except (IOError, BufferError, OSError) as the_error:
    do_something_with(the_error)
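This runnable sketch shows catching several exception types, annotating the exception, and re-raising it. The load function and the extra_info attribute name are hypothetical. A bare raise re-raises the same exception object, so the caller sees the attribute that was attached.

```python
import os
import tempfile


def load(path):
    try:
        f = open(path)
    except (IOError, OSError) as the_error:
        # attach some context, then re-raise the *same* exception object
        the_error.extra_info = u"while loading " + path
        raise
    f.close()
    return path


missing_path = os.path.join(tempfile.mkdtemp(), u"missing.txt")

info = None
try:
    load(missing_path)
except IOError as err:
    info = err.extra_info
```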
Raising exceptions:
def divide(a, b):
    if b == 0:
        raise ZeroDivisionError("b can not be zero")
    else:
        return a / b
When you call it:
In [515]: divide(12, 0)
ZeroDivisionError: b can not be zero
You can create your own custom exceptions, but...
exp = [name for name in dir(__builtin__) if "Error" in name]
len(exp)
32
For the most part, you can (and should) use a built-in one.
Choose the built-in exception that best matches the problem you are reporting.
Example (for last week’s Ackermann homework):
if (not isinstance(m, int)) or (not isinstance(n, int)):
    raise ValueError
Is the value of the input the problem here?
Nope: the type is the problem:
if (not isinstance(m, int)) or (not isinstance(n, int)):
    raise TypeError
but should you be checking type anyway? (EAFP)
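Here is a sketch of matching the exception to the problem. check_args is a hypothetical stand-in for the argument checks in an ackermann(m, n) function; the recursion itself is omitted. A wrong type gets TypeError, and the right type with a bad value gets ValueError. On Python 2, large integers (type long) would also fail the isinstance(..., int) test, which is one reason EAFP is often preferred.

```python
def check_args(m, n):
    """Hypothetical argument check for something like ackermann(m, n)."""
    if not isinstance(m, int) or not isinstance(n, int):
        raise TypeError(u"m and n must be integers")      # wrong *type*
    if m < 0 or n < 0:
        raise ValueError(u"m and n must be non-negative")  # bad *value*


def error_type(*args):
    """Return which exception class check_args raises, or None."""
    try:
        check_args(*args)
    except (TypeError, ValueError) as err:
        return type(err)
    return None
```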
Text Files
import io
f = io.open('secrets.txt', encoding='utf-8')
secret_data = f.read()
f.close()
secret_data is a (unicode) string
encoding defaults to a platform-dependent value (whatever locale.getpreferredencoding() returns), which is often NOT what you want.
(There is also the regular open() built-in, but it won’t handle Unicode for you...)
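This sketch makes a round trip through a UTF-8 text file in a temporary directory; the file name is made up. Text mode encodes on write and decodes on read. Reopening the file in binary mode shows the raw encoded bytes.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), u"secrets.txt")

# write Unicode text, encoded as UTF-8 on the way out
f = io.open(path, 'w', encoding='utf-8')
f.write(u"caf\u00e9\n")
f.close()

# read it back, decoded from UTF-8: the result is a unicode string
f = io.open(path, encoding='utf-8')
secret_data = f.read()
f.close()

# binary mode returns the raw bytes ("é" is two bytes in UTF-8)
f = io.open(path, 'rb')
raw = f.read()
f.close()
```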
Binary Files
f = io.open('secrets.bin', 'rb')
secret_data = f.read()
f.close()
secret_data is a byte string
(with arbitrary bytes in it – well, not arbitrary – whatever is in the file.)
(See the struct module to unpack formatted binary data)
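Here is a minimal struct sketch. The "header" layout (a little-endian unsigned 16-bit count followed by a signed 32-bit offset) is a made-up example format. pack() produces the kind of bytes a binary file would contain, and unpack() turns them back into a tuple.

```python
import struct

# '<' = little-endian, 'H' = unsigned 16-bit int, 'i' = signed 32-bit int
FMT = '<Hi'

header = struct.pack(FMT, 7, -2)  # bytes, as they might appear in a file

# unpack() returns a tuple in the same order as the format string
count, offset = struct.unpack(FMT, header)
```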
File Opening Modes
f = io.open('secrets.txt', [mode])
'r', 'w', 'a'
'rb', 'wb', 'ab'
'r+', 'w+', 'a+'
'r+b', 'w+b', 'a+b'
'U'
'U+'
These follow the Unix conventions, and aren’t all that well documented in the Python docs. But these BSD docs make it pretty clear:
http://www.manpagez.com/man/3/fopen/
Gotcha – ‘w’ modes always clear the file
Text mode is the default
Gotcha:
io.open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True)
(https://docs.python.org/2/library/io.html?highlight=io.open#io.open)
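This sketch demonstrates the 'w' gotcha on a temporary file. The helpers write_text and read_text are hypothetical. Appending with 'a' keeps the existing contents, while reopening with 'w' empties the file before anything is written.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), u"log.txt")


def write_text(mode, text):
    f = io.open(path, mode, encoding='utf-8')
    f.write(text)
    f.close()


def read_text():
    f = io.open(path, encoding='utf-8')
    data = f.read()
    f.close()
    return data


write_text('w', u"first\n")
write_text('a', u"second\n")   # 'a' appends to the end
after_append = read_text()
write_text('w', u"third\n")    # 'w' clears the file first
after_rewrite = read_text()
```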
Reading part of a file
header_size = 4096
f = open('secrets.txt')
secret_header = f.read(header_size)
secret_rest = f.read()
f.close()
Common Idioms
for line in io.open('secrets.txt'):
    print(line)
(the file object is an iterator!)
f = io.open('secrets.txt')
while True:
    line = f.readline()
    if not line:
        break
    do_something_with(line)
outfile = io.open('output.txt', 'w')
for i in range(10):
    outfile.write(u"this is line: %i\n" % i)  # text mode requires unicode
outfile.close()
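Here are the write-then-iterate idioms combined into one runnable sketch, using a temporary file. It uses the with statement, a common alternative to calling close() yourself: the file is closed when the block ends, even if an exception is raised inside it.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), u"output.txt")

# 'with' closes the file automatically when the block ends
with io.open(path, 'w', encoding='utf-8') as outfile:
    for i in range(3):
        outfile.write(u"this is line: %i\n" % i)

lines = []
with io.open(path, encoding='utf-8') as infile:
    for line in infile:  # the file object is an iterator over lines
        lines.append(line.rstrip(u"\n"))
```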
Commonly Used Methods
f.read() f.readline() f.readlines()
f.write(str) f.writelines(seq)
f.seek(offset) f.tell()
f.flush()
f.close()
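Several of these methods are sketched below on an in-memory binary file (io.BytesIO). Using BytesIO keeps the example self-contained, and in binary mode tell() is a plain byte offset.

```python
import io

f = io.BytesIO(b"line one\nline two\n")

first = f.readline()  # up to and including the first newline
pos = f.tell()        # current position, in bytes
rest = f.read()       # everything that is left
f.seek(0)             # jump back to the start
again = f.readline()
```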
Many classes implement the file interface:
https://docs.python.org/2/library/stdtypes.html#file-objects
In [417]: import StringIO
In [420]: f = StringIO.StringIO()
In [421]: f.write(u"somestuff")
In [422]: f.seek(0)
In [423]: f.read()
Out[423]: u'somestuff'
(handy for testing file handling code...)
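Here is a sketch of that testing trick. count_lines is a hypothetical function under test. It accepts any file-like object, so an io.StringIO can stand in for a real file and no file on disk is needed.

```python
import io


def count_lines(f):
    """Hypothetical function under test: count non-blank lines in a file object."""
    return sum(1 for line in f if line.strip())


# io.StringIO behaves like an open text file
fake_file = io.StringIO(u"alpha\n\nbeta\ngamma\n")
n = count_lines(fake_file)
```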
Paths are generally handled with simple strings (or Unicode strings)
Relative paths:
u'secret.txt'
u'./secret.txt'
Absolute paths:
u'/home/chris/secret.txt'
Either works with open(), etc.
(working directory only makes sense with command-line programs...)
os.getcwd() -- os.getcwdu() (u for Unicode)
os.chdir(path)
os.path.abspath()
os.path.relpath()
os.path.split()
os.path.splitext()
os.path.basename()
os.path.dirname()
os.path.join()
(all platform independent)
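This sketch runs the os.path helpers on a made-up relative path. Because os.path.join() uses the platform's separator, the expected values are built with os.path.join() too rather than hard-coded with "/".

```python
import os.path

path = os.path.join(u"data", u"reports", u"summary.txt")

head, tail = os.path.split(path)        # (directory part, last component)
root, ext = os.path.splitext(path)      # (everything before, u".txt")
name = os.path.basename(path)           # same as tail
parent = os.path.dirname(path)          # same as head
absolute = os.path.abspath(path)        # relative to the current working directory
relative = os.path.relpath(absolute)    # back to a path relative to the cwd
```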
os.listdir()
os.mkdir()
os.walk()
(higher level stuff in shutil module)
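This sketch builds a small throwaway directory tree and then walks it. The file names are made up. os.walk() yields (dirpath, dirnames, filenames) for each directory, while os.listdir() shows only one level.

```python
import os
import tempfile

# build a tiny, throwaway directory tree
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, u"sub"))
for rel in (u"a.txt", os.path.join(u"sub", u"b.txt")):
    open(os.path.join(top, rel), "w").close()

found = []
for dirpath, dirnames, filenames in os.walk(top):
    for filename in filenames:
        # record each file's path relative to the top of the tree
        found.append(os.path.relpath(os.path.join(dirpath, filename), top))
found.sort()

top_level = sorted(os.listdir(top))  # just one level, files and dirs
```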
pathlib is a new package for handling paths in an OO way:
http://pathlib.readthedocs.org/en/pep428/
It is now part of the Python 3 standard library (since Python 3.4), and has been back-ported for use with Python 2:
$ pip install pathlib
All the stuff in os.path and more:
In [64]: import pathlib
In [65]: pth = pathlib.Path('./')
In [66]: pth.is_dir()
Out[66]: True
In [67]: pth.absolute()
Out[67]: PosixPath('/Users/Chris/PythonStuff/CodeFellowsClass/sea-f2-python-sept14/Examples/Session04')
In [68]: for f in pth.iterdir():
....:     print(f)
junk2.txt
junkfile.txt
...
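Here is a short pathlib sketch for Python 3.5 or later (write_text() and read_text() were added in 3.5, and the Python 2 backport may not have them). It works in a temporary directory with made-up file names. Note that the / operator joins path components.

```python
import pathlib
import tempfile

base = pathlib.Path(tempfile.mkdtemp())
junk = base / u"junk.txt"             # '/' joins path components
junk.write_text(u"some junk\n")
(base / u"sub").mkdir()

names = sorted(p.name for p in base.iterdir())
txt_files = sorted(p.name for p in base.glob(u"*.txt"))
```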
In your student folder, create a subdirectory called session04. Create a new branch called task9 and switch to it (git checkout -b task9).
Within the session04 subdirectory, create a new file called dict_lab.py.
Add the file to your clone of the repository and commit changes frequently while working on the following tasks. When you are done, push your changes to GitHub and issue a pull request.
Improving raw_input:
- Create a new file: safe_input.py – add it to your repo, and submit a pull request. Make sure to make frequent commits with good commit messages.
- The raw_input() function can generate two exceptions:
    - EOFError, or end-of-file (EOF)
    - KeyboardInterrupt, or canceled input
- Create a wrapper function, perhaps safe_input(), that returns None rather than raising these exceptions.
Note:
- ^C causes a KeyboardInterrupt
- ^D (^Z on Windows) causes an EOFError
- ^ is the Control key
The next step should be done in your mailroom.py file:
- Update your mailroom.py program to use exceptions (and EAFP) to handle malformed numeric input (and other malformed input).
Read through the Session 05 slides.
http://codefellows.github.io/sea-c34-python/session05.html
There are three sections. For each one, come up with three questions.
Write some Python code to help you answer them, one function per question.
For each function, write a good docstring describing what question you are trying to answer.
Put the functions in three separate modules (files) called arguments.py, comprehensions.py, and functional.py in the session05 subdirectory of your student directory.
That is, you should have nine questions, and nine functions, total, spread out across three files.
Use everything you’ve learned so far as needed (including lists, tuples, slicing, iteration, functions, booleans, printing, modules, assertions, dictionaries, sets, exceptions, file reading/writing, and paths).
Create a branch in your local repo called task12 and switch to it (git checkout -b task12).
Add your files to that branch, commit and push, then submit a pull request to the main class repo.
Finally, submit your assignment in Canvas by giving the URL of the pull request.