New Iterables in Python 3.0

Posted by adfm, Jun 02 2010 04:43 PM

This excerpt from Mark Lutz's Learning Python introduces the new iterables in Python 3.0: the range, map, zip, and filter iterators and the dictionary view iterators. It also covers the difference between multiple and single iterators.


One of the fundamental changes in Python 3.0 is that it has a stronger emphasis on iterators than 2.X. In addition to the iterators associated with built-in types such as files and dictionaries, the dictionary methods keys, values, and items return iterable objects in Python 3.0, as do the built-in functions range, map, zip, and filter. As shown in the prior section, the last three of these functions both return iterators and process them. All of these tools produce results on demand in Python 3.0, instead of constructing result lists as they do in 2.6.

Although this saves memory, it can affect your coding style in some contexts. At several points in this book so far, for example, we’ve had to wrap function and method call results in a list(...) call to force them to produce all their results at once:

>>> zip('abc', 'xyz') 	# An iterable in Python 3.0 (a list in 2.6)

<zip object at 0x...>



>>> list(zip('abc', 'xyz')) 	# Force list of results in 3.0 to display

[('a', 'x'), ('b', 'y'), ('c', 'z')]

This isn’t required in 2.6, because functions like zip return lists of results. In 3.0, though, they return iterable objects, producing results on demand. This means extra typing is required to display the results at the interactive prompt (and possibly in some other contexts), but it’s an asset in larger programs—delayed evaluation like this conserves memory and avoids pauses while large result lists are computed. Let’s take a quick look at some of the new 3.0 iterables in action.
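
As a rough illustration of the memory point above, the following sketch compares the size of a lazy range object with the list built from it. The count of one million is an arbitrary value chosen for this example, and sys.getsizeof reports only each object's own footprint, so treat the numbers as indicative rather than exact:

```python
import sys

# Arbitrary size picked for illustration; any large count shows the same effect.
count = 1000000

lazy = range(count)     # produces numbers on demand
eager = list(lazy)      # materializes every number up front

lazy_size = sys.getsizeof(lazy)     # small and independent of count
eager_size = sys.getsizeof(eager)   # grows with the number of items

print(lazy_size, eager_size)
```

The exact byte counts vary by Python build, which is why none are shown here; the lazy object should be far smaller on any of them.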

The range Iterator

We studied the range built-in’s basic behavior in the prior chapter. In 3.0, it returns an iterable object that generates numbers in the range on demand, instead of building the result list in memory. This subsumes the older 2.X xrange (see the upcoming version skew note), and you must use list(range(...)) to force an actual list if one is needed (e.g., to display results):

C:\misc> c:\python30\python

>>> R = range(10) 	# range returns an iterable, not a list

>>> R

range(0, 10)



>>> I = iter(R) 	# Make an iterator from the range

>>> next(I) 	# Advance to next result

0 	# What happens in for loops, comprehensions, etc.

>>> next(I)

1

>>> next(I)

2



>>> list(range(10)) 	# To force a list if required

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Unlike the list returned by this call in 2.X, range objects in 3.0 support only iteration, indexing, and the len function. They do not support any other sequence operations (use list(...) if you require more list tools):

>>> len(R) 	# range also does len and indexing, but no others

10

>>> R[0]

0

>>> R[-1]

9



>>> next(I) 	# Continue taking from iterator, where left off

3

>>> I.__next__() 	# .next() becomes .__next__(), but use new next()

4
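
To make the point about len and indexing concrete, here is a small sketch using a range far too large to want as a list. The bounds and the name big are made up for this example; the idea is that length and indexed items are computed from the start, stop, and step values rather than by generating the items:

```python
big = range(0, 10**9, 3)    # no list of hundreds of millions of items is built

size = len(big)             # computed arithmetically
first, sixth, last = big[0], big[5], big[-1]

# iter() gives a fresh iterator that starts at the beginning of the range.
it = iter(big)
head = [next(it) for _ in range(3)]

print(size, first, sixth, last, head)
# 333333334 0 15 999999999 [0, 3, 6]
```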

Note

Version skew note: Python 2.X also has a built-in called xrange, which is like range but produces items on demand instead of building a list of results in memory all at once. Since this is exactly what the new iterator-based range does in Python 3.0, xrange is no longer available in 3.0—it has been subsumed. You may still see it in 2.X code, though, especially since range builds result lists there and so is not as efficient in its memory usage. As noted in a sidebar in the prior chapter, the file.xreadlines() method used to minimize memory use in 2.X has been dropped in Python 3.0 for similar reasons, in favor of file iterators.

The map, zip, and filter Iterators

Like range, the map, zip, and filter built-ins also become iterators in 3.0 to conserve space, rather than producing a result list all at once in memory. All three not only process iterables, as in 2.X, but also return iterable results in 3.0. Unlike range, though, they are their own iterators—after you step through their results once, they are exhausted. In other words, you can’t have multiple iterators on their results that maintain different positions in those results.

Here is the case for the map built-in we met in the prior chapter. As with other iterators, you can force a list with list(...) if you really need one, but the default behavior can save substantial space in memory for large result sets:

>>> M = map(abs, (-1, 0, 1)) 	# map returns an iterator, not a list

>>> M

<map object at 0x...>

>>> next(M) 	# Use iterator manually: exhausts results

1 	# These do not support len() or indexing

>>> next(M)

0

>>> next(M)

1

>>> next(M)

StopIteration



>>> for x in M: print(x) 	# map iterator is now empty: one pass only

...



>>> M = map(abs, (-1, 0, 1)) 	# Make a new iterator to scan again

>>> for x in M: print(x) 	# Iteration contexts auto call next()

...

1

0

1

>>> list(map(abs, (-1, 0, 1))) 	# Can force a real list if needed

[1, 0, 1]
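
The one-pass behavior shown above can silently produce wrong answers in code that scans a result twice. A minimal sketch of the pitfall and one way around it (the variable names are illustrative only):

```python
values = (-1, 0, 1)

lazy = map(abs, values)
first_total = sum(lazy)     # consumes every item in the map iterator
second_total = sum(lazy)    # iterator is exhausted: the sum of no items is 0

# When more than one pass is needed, save the results in a list first.
saved = list(map(abs, values))
totals = (sum(saved), sum(saved))

print(first_total, second_total, totals)
# 2 0 (2, 2)
```

Note that the second sum raises no error; an exhausted iterator simply looks like an empty one.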

The zip built-in, introduced in the prior chapter, returns iterators that work the same way:

>>> Z = zip((1, 2, 3), (10, 20, 30))	# zip is the same: a one-pass iterator

>>> Z

<zip object at 0x...>



>>> list(Z)

[(1, 10), (2, 20), (3, 30)]



>>> for pair in Z: print(pair) 	# Exhausted after one pass

...



>>> Z = zip((1, 2, 3), (10, 20, 30))

>>> for pair in Z: print(pair) 	# Iterator used automatically or manually

...

(1, 10)

(2, 20)

(3, 30)



>>> Z = zip((1, 2, 3), (10, 20, 30))

>>> next(Z)

(1, 10)

>>> next(Z)

(2, 20)

The filter built-in, which we’ll study in the next part of this book, is also analogous. It returns items in an iterable for which a passed-in function returns True (as we’ve learned, in Python True includes nonempty objects):

>>> filter(bool, ['spam', '', 'ni'])

<filter object at 0x...>

>>> list(filter(bool, ['spam', '', 'ni']))

['spam', 'ni']

Like most of the tools discussed in this section, filter both accepts an iterable to process and returns an iterable to generate results in 3.0.
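
Because each of these tools both accepts and returns an iterable, they can be chained into a pipeline that does work only as results are requested. The sketch below assumes an endless source from itertools.count, which is not part of the original text, to show that nothing is computed beyond what the final consumer asks for:

```python
from itertools import count

naturals = count(1)                                    # 1, 2, 3, ... without end
squares = map(lambda n: n * n, naturals)               # nothing computed yet
even_squares = filter(lambda n: n % 2 == 0, squares)   # still nothing computed
labeled = zip('abc', even_squares)                     # stops when 'abc' runs out

result = list(labeled)
print(result)
# [('a', 4), ('b', 16), ('c', 36)]
```

The same pipeline built from 2.X-style list-returning functions would never finish, since the first map call would try to build an endless list.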

Multiple Versus Single Iterators

It’s interesting to see how the range object differs from the built-ins described in this section—it supports len and indexing, it is not its own iterator (you make one with iter when iterating manually), and it supports multiple iterators over its result that remember their positions independently:

>>> R = range(3) 	# range allows multiple iterators

>>> next(R)

TypeError: range object is not an iterator



>>> I1 = iter(R)

>>> next(I1)

0

>>> next(I1)

1

>>> I2 = iter(R) 	# Two iterators on one range

>>> next(I2)

0

>>> next(I1) 	# I1 is at a different spot than I2

2

By contrast, zip, map, and filter do not support multiple active iterators on the same result:

>>> Z = zip((1, 2, 3), (10, 11, 12))

>>> I1 = iter(Z)

>>> I2 = iter(Z) 	# Two iterators on one zip

>>> next(I1)

(1, 10)

>>> next(I1)

(2, 11)

>>> next(I2) 	# I2 is at same spot as I1!

(3, 12)



>>> M = map(abs, (-1, 0, 1)) 	 # Ditto for map (and filter)

>>> I1 = iter(M); I2 = iter(M)

>>> print(next(I1), next(I1), next(I1))

1 0 1

>>> next(I2)

StopIteration



>>> R = range(3) 	# But range allows many iterators

>>> I1, I2 = iter(R), iter(R)

>>> [next(I1), next(I1), next(I1)]

[0, 1, 2]

>>> next(I2)

0

When we code our own iterable objects with classes later in the book (Chapter 29, Operator Overloading), we’ll see that multiple iterators are usually supported by returning new objects for the iter call; a single iterator generally means an object returns itself. In Chapter 20, Iterations and Comprehensions, Part 2, we’ll also find that generator functions and expressions behave like map and zip instead of range in this regard, supporting a single active iteration. In that chapter, we’ll see some subtle implications of one-shot iterators in loops that attempt to scan multiple times.
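
As a preview of that distinction, here is a minimal sketch, not the book's own code. The Countdown and CountdownIterator classes and the is_one_shot helper are hypothetical names invented for this example. Countdown supports multiple iterators because its __iter__ returns a new object on each call, while the iterator objects themselves return self:

```python
class Countdown:
    """Iterable with multiple-iterator support: each iter() call gets a new object."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)   # fresh position every time


class CountdownIterator:
    """Single-pass iterator: it is its own iterator and keeps its own position."""
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self                            # returning self means one shot

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def is_one_shot(obj):
    """True if obj is its own iterator, as map, zip, and filter results are."""
    return iter(obj) is obj


c = Countdown(3)
a, b = iter(c), iter(c)
print(next(a), next(a), next(b))                     # 3 2 3
print(is_one_shot(c), is_one_shot(range(3)))         # False False
print(is_one_shot(map(abs, [1])), is_one_shot(a))    # True True
```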

Dictionary View Iterators

In Python 3.0 the dictionary keys, values, and items methods return iterable view objects that generate result items one at a time, instead of producing result lists all at once in memory. View items maintain the same physical ordering as that of the dictionary and reflect changes made to the underlying dictionary. Now that we know more about iterators, here’s the rest of the story:

>>> D = dict(a=1, b=2, c=3)

>>> D

{'a': 1, 'c': 3, 'b': 2}



>>> K = D.keys() 	# A view object in 3.0, not a list

>>> K

<dict_keys object at 0x...>



>>> next(K) 	# Views are not iterators themselves

TypeError: dict_keys object is not an iterator



>>> I = iter(K) 	# Views have an iterator,

>>> next(I) 	# which can be used manually

'a' 	# but does not support len(), index

>>> next(I)

'c'



>>> for k in D.keys(): print(k, end=' ') 	# All iteration contexts use auto

...

a c b
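
The statement that views reflect changes made to the underlying dictionary can be checked with a short sketch like the following. The dictionary contents are illustrative, and sorted is used only so the output does not depend on key ordering:

```python
D = dict(a=1, b=2)
K = D.keys()            # a view, created once

before = sorted(K)
snapshot = list(K)      # a real list: a frozen copy of the keys at this moment

D['c'] = 3              # change the dictionary after the view was made
after = sorted(K)       # the same view object now sees the new key

del D['a']
final = sorted(K)

print(before, after, final, snapshot)
# ['a', 'b'] ['a', 'b', 'c'] ['b', 'c'] ['a', 'b']
```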

As for all iterators, you can always force a 3.0 dictionary view to build a real list by passing it to the list built-in. However, this usually isn’t required except to display results interactively or to apply list operations like indexing:

>>> K = D.keys()

>>> list(K) 	# Can still force a real list if needed

['a', 'c', 'b']



>>> V = D.values() 	# Ditto for values() and items() views

>>> V

<dict_values object at 0x...>

>>> list(V)

[1, 3, 2]



>>> list(D.items())

[('a', 1), ('c', 3), ('b', 2)]



>>> for (k, v) in D.items(): print(k, v, end=' ')

...

a 1 c 3 b 2

In addition, 3.0 dictionaries still have iterators themselves, which return successive keys. Thus, it’s not often necessary to call keys directly in this context:

>>> D 	# Dictionaries still have own iterator

{'a': 1, 'c': 3, 'b': 2} 	# Returns next key on each iteration

>>> I = iter(D)

>>> next(I)

'a'

>>> next(I)

'c'



>>> for key in D: print(key, end=' ')	# Still no need to call keys() to iterate

... 	# But keys is an iterator in 3.0 too!

a c b

Finally, remember again that because keys no longer returns a list, the traditional coding pattern for scanning a dictionary by sorted keys won’t work in 3.0. Instead, convert keys views first with a list call, or use the sorted call on either a keys view or the dictionary itself, as follows:

>>> D

{'a': 1, 'c': 3, 'b': 2}

>>> for k in sorted(D.keys()): print(k, D[k], end=' ')

...

a 1 b 2 c 3



>>> D

{'a': 1, 'c': 3, 'b': 2}

>>> for k in sorted(D): print(k, D[k], end=' ')	# Best practice key sorting

...

a 1 b 2 c 3
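
For comparison, this sketch shows why the traditional 2.X pattern fails in 3.X and what the two alternatives described above look like side by side. The variable names are illustrative only:

```python
D = dict(a=1, c=3, b=2)

# The 2.X idiom relied on keys() returning a list with a sort method.
try:
    D.keys().sort()
    old_idiom_works = True
except AttributeError:          # views have no sort method in 3.X
    old_idiom_works = False

# Alternative 1: convert the view to a list, then sort it in place.
ks = list(D.keys())
ks.sort()

# Alternative 2: let sorted() iterate the dictionary directly.
pairs = [(k, D[k]) for k in sorted(D)]

print(old_idiom_works, ks, pairs)
# False ['a', 'b', 'c'] [('a', 1), ('b', 2), ('c', 3)]
```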

Learn more about this topic from Learning Python, 4th Edition.

Google and YouTube use Python because it's highly adaptable, easy to maintain, and allows for rapid development. If you want to write high-quality, efficient code that's easily integrated with other languages and tools, this hands-on book will help you be productive with Python 3.0 quickly. Each chapter includes a unique Test Your Knowledge section with practical exercises and quizzes, so you can practice new skills and test your understanding as you go.





1 Reply

 : Oct 18 2012 01:51 PM
What's the deal with the trademark symbols?

>>> I = iter® # Make an iterator from the range

Does it have something to do with Python 3's unicode support? :)