files (class slides)

CSc 110 - files

Announcements

  • Midterm 2 grades have been published
  • Also check D2L grades
  • November 3 is the Last day for students to drop from a regular session class with a W grade through UAccess
  • No in-person class on November 1 – a video lecture will be posted on D2L

Files and file systems

  • On (at least most of) our computers, there is a file system via which we can create, save, modify, and remove files

    • On Mac: can browse with Finder

    • On Windows: can browse with windows explorer

  • File systems are often hierarchical

Opening a file

  • To open a file in a python program:

    a_file = open(file_name, mode)

  • file_name should be the name of the file to open

    • It can also be a path
  • mode should be the mode in which to open in

    • ‘a’    ‘r’    ‘w’

After opening, use the a_file object to read from or write to the file

Absolute vs Relative path

  • A path describes a location.

    • /home/doc/example.txt
  • An absolute path describes the location from the root directory.

  • A relative path describes the location of a file relative to the current (working) directory.

Opening and closing a file

info.txt

The quick brown fox
jumped over
the lazy
bear
sitting by the tree

read_file.py

info = open('info.txt', 'r')
info.close()

Reading a line

  • file.readline()

    • reads one line from a file, returns a string

    • returns an empty string if it has reached the end of the content

  • file.readlines()

    • reads all of the lines

    • returns a list of strings

  • file.read()

    • returns one string

Reading a line example

info.txt

The quick brown fox
jumped over
the lazy
bear
sitting by the tree

read_line.py

def read_first_line(filename):
  info = open(filename, 'r')
  line = info.readline()
  info.close()
  return line

assert read_first_line('info.txt') == "The quick brown fox\n"

Submit code for attendance

Submit the read_first_line function to Gradescope for attendance (given a file name string, open file in read mode, read first line, return that first line).

Name your file first_read.py

Iterating over a file directly

The type file is iterable so we can use a for x in file loop:

info = open('info.txt', 'r')
for line in info:
    print(line)
info.close()
The quick brown fox

jumped over

the lazy

bear

sitting by the tree

More string methods

We worked with .isnumeric(), a string method to check if a string contains only digit characters

Here are some other useful string methods:

  • string.strip(chars) – removes any of the characters in chars from the beginning or end of string, returns a string
  • string.split(chars) – splits string at the chars, returns a list

More string methods – .strip()

info = open('info.txt', 'r')

first_line = info.readline()
second_line = info.readline()
print(first_line)
print(second_line)


first_line = first_line.strip("\n")
second_line = second_line.strip("\n")
print(first_line)
print(second_line)

info.close()
The quick brown fox

jumped over

The quick brown fox
jumped over

More string methods – .split(" ")

info = open('info.txt', 'r')

first_line = info.readline().strip("\n")
second_line = info.readline().strip("\n")

first_line_words = first_line.split(" ")
second_line_words = second_line.split(" ")

print(first_line_words)
print(second_line_words)

info.close()
['The', 'quick', 'brown', 'fox']
['jumped', 'over']

Write a function

  1. Its name is lines_and_words
  2. It takes a string called file_name
  3. It opens the file with file_name in read mode
  4. It iterates over the file’s lines, splitting each line by space (remember to strip the line breaks)
  5. It returns a list of lists, where each inner-list is a line containing all the words in that line

Test case (download text file):

assert lines_and_words("info.txt") == [ ['The', 'quick', 'brown', 'fox'], 
                                        ['jumped', 'over'], ['the', 'lazy'],
                                        ['bear'], ['sitting', 'by', 'the', 'tree'] ]

Write a function – solution

def lines_and_words(file_name):
  f = open(file_name, "r")
  all_words = []
  for line in f:
    all_words.append(line.strip("\n").split(" "))
  f.close()
  return all_words
  
  
def main():
  assert lines_and_words("info.txt") == [ ['The', 'quick', 'brown', 'fox'], 
                                          ['jumped', 'over'], ['the', 'lazy'],
                                          ['bear'], ['sitting', 'by', 'the', 'tree'] ]
                                     
main()

Write a function

  1. Its name is count_words
  2. It takes a string file_name as argument
  3. It returns a dictionary with word counts

Test case:

assert count_words("info.txt") == {"the": 3, "quick": 1, "brown": 1,
                                   "fox": 1 , "jumped": 1, "over": 1, 
                                   "lazy": 1, "bear": 1, "sitting": 1,
                                   "by": 1, "tree": 1}

Write a function – solution

def count_words(file_name):
  f = open(file_name, "r")
  counts = {}
  for line in f:
    words = line.strip("\n").split(" ")
    for w in words:
      lower_case_w = w.lower()
      if lower_case_w not in counts:
        counts[lower_case_w] = 1
      else:
        counts[lower_case_w] += 1
  return counts

def main():
  assert count_words("info.txt") == {"the": 3, "quick": 1, "brown": 1,
                                     "fox": 1 , "jumped": 1, "over": 1, 
                                     "lazy": 1, "bear": 1, "sitting": 1,
                                     "by": 1, "tree": 1}
                                 
main()