Spring 2012 CSCI 220 Week 10

File Processing
A critical feature of many applications is the ability to store information in files on disk and retrieve it later.

Conceptually, a file is a sequence of data stored on secondary memory (usually a disk). Files can contain any data type, but the easiest files to work with contain text.

Files of text can be read and understood by humans. You can think of a text file as a long string that happens to be stored on disk. A special character that we've seen before in class is used to mark the end of lines: \n.

For example, consider a file that contains the following:

    Hello
    World

    Goodbye 32

When stored to a file, you get this sequence of characters:

Hello\nWorld\n\nGoodbye 32

This is no different than how we've used newlines before in this class.
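As a quick check of the idea that a text file is just a long string with newline characters in it, the sketch below writes the example text to a scratch file and reads it back as a single string. The file name sample.txt and the use of the tempfile module are assumptions made only for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; the name is only for illustration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Store the example text, with \n marking the end of each line.
outfile = open(path, "w")
outfile.write("Hello\nWorld\n\nGoodbye 32")
outfile.close()

# Read the whole file back as one string.
infile = open(path, "r")
text = infile.read()
infile.close()

print(repr(text))        # repr makes the \n characters visible
print(text.split("\n"))  # the individual lines, including the empty one
```

Printing text directly, instead of repr(text), shows the four lines as they would appear in an editor.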

The exact details of file-processing differ substantially among programming languages. In fact, Python itself has multiple ways to accomplish the same file I/O. But there is a common pattern:
 * 1) Open the file by associating it with an object in our program
 * 2) Manipulate the file object (e.g., read, write, seek)
 * 3) Close the file, which is necessary to maintain correspondence between the file on disk and the file object. For example, changes you make to the file object might not show up on the disk until you close the file.

As you "edit the file," you are really making changes to data in memory, not the file itself. But Python will take care of modifying the file on the disk.
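The open, manipulate, close pattern might look like the following in Python. This is a minimal sketch: the file name notes.txt and the scratch directory are assumptions, and the with statement at the end is one of the alternative ways Python offers to make sure the file gets closed.

```python
import os
import tempfile

# Hypothetical scratch file; the name is only for illustration.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# 1) Open: associate the file on disk with a file object.
outfile = open(path, "w")
# 2) Manipulate: the text may sit in an in-memory buffer for now.
outfile.write("first line\n")
# 3) Close: flushes the buffer so the disk matches the file object.
outfile.close()

# A with statement closes the file automatically when the block ends.
with open(path, "r") as infile:
    contents = infile.read()

print(contents)
print(infile.closed)  # the file object reports whether it has been closed
```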

Working with text files in Python is easy. The first step is to create a file object corresponding to a file on disk:

    <filevar> = open(<name>, <mode>)

Here <name> is a string giving the name of the file on disk.

The mode is a string parameter that is either "r" or "w" depending on whether we intend to read from the file or write to the file.

An example: infile = open("numbers.dat","r")

Reading from a file
Python provides three related operations for reading information from a file:
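 * <filevar>.read() returns the entire remaining contents of the file as a single string
 * <filevar>.readline() returns the next line of the file, including its newline character
 * <filevar>.readlines() returns a list of the remaining lines in the file, each including its newline character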
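To compare the three operations side by side, the sketch below applies each one to the same small file. The file name three.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical three-line scratch file used for all three reads.
path = os.path.join(tempfile.mkdtemp(), "three.txt")
outfile = open(path, "w")
outfile.write("one\ntwo\nthree\n")
outfile.close()

infile = open(path, "r")
whole = infile.read()        # the entire file as one string
infile.close()

infile = open(path, "r")
first = infile.readline()    # the first line, newline included
second = infile.readline()   # the next call continues where the last stopped
infile.close()

infile = open(path, "r")
lines = infile.readlines()   # a list with one string per line
infile.close()

print(repr(whole))
print(repr(first), repr(second))
print(lines)
```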

Here is an example program that reads the entire contents of a file:
    # printfile.py
    #    Prints a file to the screen.

    def main():
        fname = input("Enter filename: ")
        infile = open(fname, "r")
        data = infile.read()   # the whole file as one string
        print(data)
        infile.close()         # done with the file

    main()

The readline operation can be used to read the next line from a file. Successive calls to readline get successive lines from the file. These lines will include the newline character. Example:

    # Print the first five lines of the file.
    infile = open("Some_file.txt", "r")
    for i in range(5):
        line = infile.readline()
        print(line)
    infile.close()

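Because print adds its own newline after the one already at the end of the line, this output is double-spaced. We can remove the line's newline by slicing it off: line[:-1]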
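One caveat with slicing is that line[:-1] removes the last character whatever it is. The sketch below contrasts slicing with the string method rstrip("\n"), which only removes a trailing newline if one is present. The sample strings are made up, and rstrip is offered here as an alternative, not as the approach used in these notes.

```python
line = "Goodbye 32\n"      # a typical line, ending in a newline
last_line = "Goodbye 32"   # final line of a file with no trailing newline

sliced = line[:-1]              # newline removed, as intended
sliced_last = last_line[:-1]    # no newline to remove, so the "2" is lost

stripped = line.rstrip("\n")            # newline removed
stripped_last = last_line.rstrip("\n")  # nothing to remove, string unchanged

print(repr(sliced), repr(sliced_last))
print(repr(stripped), repr(stripped_last))
```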

We can also loop through all of the lines in a file using:

    infile = open("Some_file.txt", "r")
    for line in infile.readlines():
        # process the line here
        pass
    infile.close()

A potential drawback of this approach is that readlines reads every line of the file into a list at once. This can be a problem for very large files, which may take up too much RAM. A simple alternative is to loop over the file object itself, which supplies one line at a time:

    infile = open("Some_file.txt", "r")
    for line in infile:
        # process the line here
        pass
    infile.close()
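As a concrete instance of processing a file one line at a time, the sketch below adds up the numbers in a file that holds one number per line. The function name sum_numbers and the sample data are hypothetical; the file name numbers.dat simply echoes the earlier example.

```python
import os
import tempfile

def sum_numbers(path):
    """Return the sum of the numbers in a file with one number per line."""
    total = 0.0
    infile = open(path, "r")
    for line in infile:       # only one line is held in memory at a time
        line = line.strip()   # drop the newline and surrounding spaces
        if line != "":        # skip blank lines
            total = total + float(line)
    infile.close()
    return total

# Build a small hypothetical data file to try the function on.
path = os.path.join(tempfile.mkdtemp(), "numbers.dat")
outfile = open(path, "w")
outfile.write("10\n20\n\n12.5\n")
outfile.close()

total = sum_numbers(path)
print(total)
```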

Writing to a file
Opening a file for writing prepares that file to receive data. If no file with the given name exists, a new file is created. If a file with that name does exist, its existing contents are erased and you start with an empty file, so be careful not to open a file you want to keep in "w" mode.

outfile = open("mydata.out","w")
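The effect of "w" mode on a missing file and on an existing file can be checked with a short experiment. This is a sketch run against a scratch directory created with tempfile (an assumption, so that no real mydata.out is touched).

```python
import os
import tempfile

# Hypothetical scratch location so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "mydata.out")

existed_before = os.path.exists(path)   # the file is not there yet

outfile = open(path, "w")               # creates the file
outfile.write("old contents\n")
outfile.close()
size_with_data = os.path.getsize(path)

outfile = open(path, "w")               # reopening erases what was there
outfile.close()
size_after_reopen = os.path.getsize(path)

print(existed_before, size_with_data, size_after_reopen)
```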

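The easiest way to write information into a text file is to use the already familiar print function with its file keyword parameter: print(..., file=<outputFile>). This behaves like a normal print, except that the output goes to the file instead of the screen.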
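A short sketch of printing to a file follows. The scratch path and the variable count are assumptions for illustration; the file is read back at the end only to show what was written.

```python
import os
import tempfile

# Hypothetical scratch file; the name echoes the earlier example.
path = os.path.join(tempfile.mkdtemp(), "mydata.out")

outfile = open(path, "w")
count = 32
print("Hello", "World", file=outfile)   # items are separated by spaces
print("Goodbye", count, file=outfile)   # non-strings are converted as usual
outfile.close()                         # make sure the text reaches the disk

# Read the file back to see exactly what print wrote.
infile = open(path, "r")
written = infile.read()
infile.close()
print(repr(written))
```

As with printing to the screen, each call to print ends its output with a newline unless the end parameter says otherwise.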