17.5. Alternative File Reading Methods

Again, recall the contents of the qbdata.txt file.

Colt McCoy QB CLE  135 222 1576    6   9   60.8%   74.5
Josh Freeman QB TB 291 474 3451    25  6   61.4%   95.9
Michael Vick QB PHI    233 372 3018    21  6   62.6%   100.2
Matt Schaub QB HOU 365 574 4370    24  12  63.6%   92.0
Philip Rivers QB SD    357 541 4710    30  13  66.0%   101.8
Matt Hasselbeck QB SEA 266 444 3001    12  17  59.9%   73.2
Jimmy Clausen QB CAR   157 299 1558    3   9   52.5%   58.4
Joe Flacco QB BAL  306 489 3622    25  10  62.6%   93.6
Kyle Orton QB DEN  293 498 3653    20  9   58.8%   87.5
Jason Campbell QB OAK  194 329 2387    13  8   59.0%   84.5
Peyton Manning QB IND  450 679 4700    33  17  66.3%   91.9
Drew Brees QB NO   448 658 4620    33  22  68.1%   90.9
Matt Ryan QB ATL   357 571 3705    28  9   62.5%   91.0
Matt Cassel QB KC  262 450 3116    27  7   58.2%   93.0
Mark Sanchez QB NYJ    278 507 3291    17  13  54.8%   75.3
Brett Favre QB MIN 217 358 2509    11  19  60.6%   69.9
David Garrard QB JAC   236 366 2734    23  15  64.5%   90.8
Eli Manning QB NYG 339 539 4002    31  25  62.9%   85.3
Carson Palmer QB CIN   362 586 3970    26  20  61.8%   82.4
Alex Smith QB SF   204 342 2370    14  10  59.6%   82.1
Chad Henne QB MIA  301 490 3301    15  19  61.4%   75.4
Tony Romo QB DAL   148 213 1605    11  7   69.5%   94.9
Jay Cutler QB CHI  261 432 3274    23  16  60.4%   86.3
Jon Kitna QB DAL   209 318 2365    16  12  65.7%   88.9
Tom Brady QB NE    324 492 3900    36  4   65.9%   111.0
Ben Roethlisberger QB PIT  240 389 3200    17  5   61.7%   97.0
Kerry Collins QB TEN   160 278 1823    14  8   57.6%   82.2
Derek Anderson QB ARI  169 327 2065    7   10  51.7%   65.9
Ryan Fitzpatrick QB BUF    255 441 3000    23  15  57.8%   81.8
Donovan McNabb QB WAS  275 472 3377    14  15  58.3%   77.1
Kevin Kolb QB PHI  115 189 1197    7   7   60.8%   76.1
Aaron Rodgers QB GB    312 475 3922    28  11  65.7%   101.2
Sam Bradford QB STL    354 590 3512    18  15  60.0%   76.5
Shaun Hill QB DET  257 416 2686    16  12  61.8%   81.3

In addition to the for loop, Python provides three methods to read data from the input file. The readline method reads one line from the file and returns it as a string. The string returned by readline will contain the newline character at the end. This method returns the empty string when it reaches the end of the file. The readlines method returns the contents of the entire file as a list of strings, where each item in the list represents one line of the file. It is also possible to read the entire file into a single string with read. Table 2 summarizes these methods and the following session shows them in action.

Note that we reopen the file before each read so that each read starts from the beginning. Each open file has a marker that denotes the current read position. Whenever one of the read methods is called, the marker moves to the character immediately following the last character returned. For readline, this moves the marker to the first character of the next line in the file. For read or readlines (called with no argument), the marker moves to the end of the file.
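The marker behavior described above can be observed with a small sketch. It is separate from the qbdata.txt session that follows and assumes a hypothetical three-line sample file, written to a temporary directory so the example runs on its own.

```python
import os
import tempfile

# Hypothetical three-line sample file standing in for qbdata.txt.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
outfile = open(path, "w")
outfile.write("first line\nsecond line\nthird line\n")
outfile.close()

infile = open(path, "r")
first = infile.readline()     # marker moves to the start of the second line
rest = infile.read()          # reads from the marker to the end of the file
after_end = infile.read()     # marker is already at the end: nothing left
infile.close()

# Reopening the file places the marker back at the beginning.
infile = open(path, "r")
again = infile.readline()
infile.close()

print(repr(first))
print(repr(rest))
print(repr(after_end))
print(repr(again))
```

The second call to read returns the empty string because the first call already moved the marker to the end of the file. Only after reopening does readline return the first line again.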

>>> infile = open("qbdata.txt", "r")
>>> aline = infile.readline()
>>> aline
'Colt McCoy QB CLE\t135\t222\t1576\t6\t9\t60.8%\t74.5\n'
>>>
>>> infile = open("qbdata.txt", "r")
>>> linelist = infile.readlines()
>>> print(len(linelist))
34
>>> print(linelist[0:4])
['Colt McCoy QB CLE\t135\t222\t1576\t6\t9\t60.8%\t74.5\n',
 'Josh Freeman QB TB\t291\t474\t3451\t25\t6\t61.4%\t95.9\n',
 'Michael Vick QB PHI\t233\t372\t3018\t21\t6\t62.6%\t100.2\n',
 'Matt Schaub QB HOU\t365\t574\t4370\t24\t12\t63.6%\t92.0\n']
>>>
>>> infile = open("qbdata.txt", "r")
>>> filestring = infile.read()
>>> print(len(filestring))
1708
>>> print(filestring[:256])
Colt McCoy QB CLE   135     222     1576    6       9       60.8%   74.5
Josh Freeman QB TB  291     474     3451    25      6       61.4%   95.9
Michael Vick QB PHI 233     372     3018    21      6       62.6%   100.2
Matt Schaub QB HOU  365     574     4370    24      12      63.6%   92.0
Philip Rivers QB SD 357     541     4710    30      13      66.0%   101.8
Matt Ha
>>>

| Method Name | Use | Explanation |
|---|---|---|
| write | filevar.write(astring) | Add astring to the end of the file. filevar must refer to a file that has been opened for writing. |
| read(n) | filevar.read() | Reads and returns a string of n characters, or the entire file as a single string if n is not provided. |
| readline(n) | filevar.readline() | Returns the next line of the file with all text up to and including the newline character. If n is provided as a parameter, then only n characters will be returned if the line is longer than n. |
| readlines(n) | filevar.readlines() | Returns a list of strings, each representing a single line of the file. If n is not provided then all lines of the file are returned. If n is provided then n characters are read but n is rounded up so that an entire line is returned. |
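
The optional n argument in the table can be tried out with a short sketch. It assumes a hypothetical four-line sample file, not qbdata.txt, and the variable names are illustrative only. Because the argument to readlines is treated as an approximate size, the example checks only that whole lines come back.

```python
import os
import tempfile

# Hypothetical four-line sample file; any small text file would do.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
outfile = open(path, "w")
outfile.write("alpha\nbeta\ngamma\ndelta\n")
outfile.close()

infile = open(path, "r")
three_chars = infile.read(3)        # first three characters of the file
rest_of_line = infile.readline()    # remainder of the first line, with its newline
two_chars = infile.readline(2)      # at most two characters of the second line
line_tail = infile.readline()       # remainder of the second line
some_lines = infile.readlines(1)    # the size is rounded up to whole lines
remaining = infile.readlines()      # whatever is left, as a list of lines
infile.close()

print(repr(three_chars), repr(rest_of_line))
print(repr(two_chars), repr(line_tail))
print(some_lines + remaining)
```

Each call picks up where the previous one stopped, so a partial read followed by readline returns only the rest of that line.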

Now let’s look at another method of reading our file using a while loop. This is important because many other programming languages do not support the for loop style for reading files but they do support the pattern we’ll show you here.
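
A minimal sketch of this while loop pattern is given here. It is an illustration under stated assumptions, not necessarily the original listing: it first writes a hypothetical stand-in file containing the first three rows of qbdata.txt so that it runs on its own, and the loop body (collecting each name and rating) is only one possible use of each line.

```python
import os
import tempfile

# Hypothetical stand-in for qbdata.txt: the first three rows of the file.
path = os.path.join(tempfile.mkdtemp(), "qbdata_sample.txt")
outfile = open(path, "w")
outfile.write("Colt McCoy QB CLE\t135\t222\t1576\t6\t9\t60.8%\t74.5\n")
outfile.write("Josh Freeman QB TB\t291\t474\t3451\t25\t6\t61.4%\t95.9\n")
outfile.write("Michael Vick QB PHI\t233\t372\t3018\t21\t6\t62.6%\t100.2\n")
outfile.close()

ratings = []

infile = open(path, "r")
line = infile.readline()            # priming read: gives line its first value
while line:                         # an empty string means end of file
    values = line.split()           # split on any whitespace, tabs included
    name = values[0] + " " + values[1]
    ratings.append((name, float(values[-1])))   # the rating is the last field
    line = infile.readline()        # read the next line before testing again
infile.close()

for name, rating in ratings:
    print(name, "had a rating of", rating)
```
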

There are several important things to notice in this code:

Before the loop begins, we have the statement line = infile.readline(). We call this initial read the priming read. It is very important because the while condition needs a value for the line variable before it can be tested for the first time.

The readline method returns the empty string when there is no more data in the file. An empty string is an empty sequence of characters. When Python needs a Boolean condition, as in while line:, it treats an empty sequence as False and a non-empty sequence as True. Remember that a blank line in the file still contains a single character, the \n (newline) character. So the only way readline can return an empty string is when the read position is at the end of the file, at which point the while condition becomes False.

Finally, notice that the last line of the body of the while loop performs another readline. This statement will reassign the variable line to the next line of the file. It represents the change of state that is necessary for the iteration to function correctly. Without it, there would be an infinite loop processing the same line of data over and over.
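
The point about blank lines can be checked with a small sketch. It assumes a hypothetical sample file with an empty line in the middle and simply counts how many times the loop body runs.

```python
import os
import tempfile

# Hypothetical sample file with a blank line between two lines of text.
path = os.path.join(tempfile.mkdtemp(), "blank_line_sample.txt")
outfile = open(path, "w")
outfile.write("first\n\nthird\n")
outfile.close()

print(bool(""))      # the empty string counts as False
print(bool("\n"))    # a blank line still holds one character, so it is True

lines_seen = []
infile = open(path, "r")
line = infile.readline()
while line:
    lines_seen.append(line)
    line = infile.readline()     # change of state: move on to the next line
infile.close()

print(lines_seen)
```

The loop runs three times, once for each line including the blank one, and stops only when readline returns the empty string at the end of the file.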