Python's `range`: fancier than I thought
I knew from way back when, in the Python 2 days, that xrange
was better than
range
because it was lazy (and range
was kept around for backwards compatibility). All this
means is that range(1000000)
doesn't allocate a list with a million integers. Python 3 came
around, and range
finally started to behave like I wanted. I could still get a list numbers from
one to ten with list(range(1, 11))
(and force it to allocate the memory all at once). But it
turns out that most of the time, I don't need to!
Ranges aren't just iterators!
Somehow I got it into my head that range
objects were iterators. Iterators are
constructed by calling iter
on an object that supports the iteration protocol or the sequence
protocol. The critical thing about iterators is that you can't just grab a random item by its index,
and they can just be consumed once:
In [1]: ls = [4, 5, 6]
In [2]: it = iter(ls)
In [3]: it[0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-40b9578de2b6> in <module>
----> 1 it[0]
TypeError: 'list_iterator' object is not subscriptable
In [4]: next(it)
Out[4]: 4
In [5]: next(it)
Out[5]: 5
In [6]: next(it)
Out[6]: 6
In [7]: next(it)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-7-bc1ab118995a> in <module>
----> 1 next(it)
StopIteration:
However, the original list, ls
, is left alone. Without getting too fancy, we could naively
implement something similar:
class NaiveIterator:
def __init__(self, sequence):
self._pos = 0
self._seq = sequence
def next(self):
pos = self._pos
self._pos += 1
return self._seq[pos]
In [1]: from naive_iterator import NaiveIterator
In [2]: ls = [4, 5, 6]
In [3]: it = NaiveIterator(ls)
In [4]: it.next()
Out[4]: 4
In [5]: it.next()
Out[5]: 5
In [6]: it.next()
Out[6]: 6
In [7]: it.next()
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-7-24d6ab57088b> in <module>
----> 1 it.next()
~/src/blog-code/python-range/naive_iterator.py in next(self)
9 pos = self._pos
10 self._pos += 1
---> 11 return self._seq[pos]
12
IndexError: list index out of range
If we want to split hairs, this only works for a subset of the objects that iter
works for--only
for sequences, or objects that implement __getitem__
and __len__
. For example, we
can't access the keys of a dictionary by index, but we can call iter
on them:
In [1]: d = {'a': 1, 'b': 2}
In [2]: d.keys()[0]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-8ea809e8b798> in <module>
----> 1 d.keys()[0]
TypeError: 'dict_keys' object is not subscriptable
In [3]: it = iter(d.keys())
In [4]: next(it)
Out[4]: 'a'
What's an iterator, then?
We can iterate over lists--are they iterators? No! A list is not an iterator until we call iter
on
it. But it's an iterable! Basically, an iterable is a sequence or an iterator.
It occured to me the other day--the value at index i of a range can always be calculated based on
its start
, stop
, and step
attributes even if the value at an index greater than i has been
accessed! Another naive implementation:
def range_at_i(start, stop, step, i):
val = start + step * i
if val > stop:
raise IndexError()
else:
return val
What's range
, then?!
I then played around and confirmed it--range
is a sequence type, not an iterator after all! And,
unlike sequence types that occupy more memory as they grow larger, range
objects don't need to.
In [1]: r = range(100)
In [2]: r[30]
Out[2]: 30
In [3]: r[0]
Out[3]: 0
Who cares? Or, range
to the rescue
Me! I was about to write a class that behaved just like range
. Suppose you're writing a
plotting library and you want to represent a 1-dimensional dataset, but you don't
want to make the user provide the values for those little ticks along the x-axis at the bottom of
the plot. By default, you just want to use the index of the datapoint in the provided dataset. You
can simply write that like this:
class DataSeries:
def __init__(ys, xs=None):
self._ys = ys
if xs is None:
self._xs = range(len(ys))
else:
if len(xs) != len(ys):
raise ValueError('Length of xs must match length of ys')
self._xs = xs
@property
def zipped(self):
return zip(self._xs, self._ys)
If ys
is a list or Numpy array, it's already taking up memory proportional to the number of
elements in the sequence. But there's no need to take up twice as much memory to store the x-values!
n [1]: from data_series import DataSeries
In [2]: ls = [4*i for i in range(4)]
In [3]: ls
Out[3]: [0, 4, 8, 12]
In [4]: series = DataSeries(ls)
In [5]: series
Out[5]: <data_series.DataSeries at 0x7fc8436ca278>
In [6]: series.zipped
Out[6]: <zip at 0x7fc843676a48>
In [7]: list(series.zipped)
Out[7]: [(0, 0), (1, 4), (2, 8), (3, 12)]
range
to the rescue.