Programming Project 8

Programming Projects are to be submitted to gradescope.

Due date: Nov 7, Thursday at 7pm This programming project is an adaptation of Ben Dicken’s Spell Check Programming Assignment.

You should name the file spellcheck.py and have a function named correct_spelling to write out the ouput file.

Spell Check

Most in this class have probably used a spellchecker at least once in your life (more likely, many times). Spell checkers are built into a number of programs, including Microsoft Word, Apple Pages, gmail, and others. In this assignment, you’ll be implementing a program that can suggest and fix spelling issues in text.

Program behavior

The spell checking program should be named spellcheck.py. The spell check program takes as argument the name of a text file, which will contain the text that the user wants to be spell-checked. The program than writes out an output file with swapping the contents of the input file with the spelling already fixed.

Let’s walk through a really simple example to demonstrate the difference in the two modes. Let’s say that the file words.txt has the below content:

one hudnred dollars
and fiveteen cents

The words hundred (hudnred) and the word fifteen (fiveteen) are spelled wrong. The contents of the file should be written out to an output file with the misspelled words replaced with correctly-spelled ones.

one hundred dollars
and fifteen cents

Notice that the program goes ahead and replaces the misspelled words with correctly-spelled ones.

How was the program able to identify the words that were not spelled correctly? Read the next section to find out.

misspellings.txt

Your program should open up and read a file names misspellings.txt. You can assume that this file already exists in the same directory as spellcheck.py. When testing on your computer, you should create your own misspellings.txt. This file will include information about how words are misspelled. Using this information, you can determine which words are spelled wrong in the input file. You can assume that each line of misspellings.txt is formatted as follows:

incorrect_a->correctly_spelled_word_a
incorrect_b->correctly_spelled_word_b

In other words, the first word on a line of the file is a correctly spelled word, then an arrow, then the correctly-spelled version of the word. For instance:

hudnred->hundred
hundrid->hundred
fifeteen->fifteen
fiveteen->fifteen
fiften->fifteen

This file contains two possible misspellings of the word "hundred" and three possible misspellings of the word "fifteen". Your program should handle a misspellings file of any size.

Your program should open and read the misspellings.txt file into the program and store the mapping between misspelled and correctly-spelled words in a dictionary. The keys of the dictionary should be the misspelled version of the word, and the value the keys map to should be the correctly-spelled version of that word. Given the example content of misspelling.txt shown above, the misspellings dictionary should have the keys and values shown below:

misspellings = { 
    'hudnred' : 'hundred',
    'hundrid' : 'hundred',
    'fifeteen' : 'fifteen',
    'fiveteen' : 'fifteen',
    'fiften' : 'fifteen' }

After you have read in the file and constructed this dictionary, you can use it to determine the correct spelling of an incorrectly-spelled word. How would you do this?

After you’ve opened up the input file, loop through each line of it. You can split each line into individual words. For each word on the line, you can check if that word is one of the keys in the misspellings dictionary. If it is, you can assume it is incorrectly spelled. The correct spelling of that word is the value associated with the key, so you can grab it from the dictionary and either replace the original word (if in replace mode) or use it as a suggestion (if in suggest mode).

Preserving Capitalization

The words in the misspellings.txt file will be all in lower case. However, your program should also be able to replace or suggest fixes for misspelled capitalized words. For instance, it should be able to change Hudnred to Hundred in replace mode.

Before you try to look up a word in the misspellings dictionary, determine if it is or is not capitalized (meaning that the first letter is upper-case). Then, you can convert it to lower case. If the word does need replacing, then you should make sure to capitalize the correctly spelled word if the original word was also capitalized.

You might find the string methods .lower(),.isupper(), and .capitalize() useful.

Ignoring Punctuation

You also should make the spell checker ignore punctuation. In particular, spellcheck.py``` should be able to handle a trailing ,, ., ?, or ! or ;. For instance, it should be able to changehudnred?tohundred?` in replace mode. How can this be accomplished?

Before looking up a word in the misspellings dictionary, check if the last character is one of the four types of punctuation that you should look for (, or . or ? or ! or ;). If it is, create a variable to store that punctuation, and then trim it off of the end of the word. Then, proceed with checking the spelling. After you have determined if it is spelled correctly or not, you can put the punctuation at the end of the word.

Development Strategy

(1) Reading in misspellings.txt

Create a function that will be responsible for opening up the misspellings file and reading in the information. The function can assume that the misspellings.txt file exists. The function should open the file, read the lines, and populate the misspellings dictionary, as described previously in this specification.

(2) Replacing words

Next, work reading the input file and replacing misspelled words. At this point, don’t yet worry about capitalized words, or punctuation. Iterate through the contents of the input file, line-by-line and word-by-word. When a correctly-spelled word is encountered, print it out with a trailing space. When an incorrectly-spelled word is encountered, print out the correctly-spelled version instead.

(3) Handle Punctuation / Handle Capitalization

You might find one or the other of these easier or harder, so choose whichever you think you will be able to accomplish more quickly. Once you have one working, move to the other! Handling punctuation is extra credit.

(4) Write out output file

Your function that writes out the output file should be named correct_spelling and the output file should have the output_ prefix. That means that if the original file was names words_1.txt, the output file should be named output_words_1.txt

Test Cases

There are going to be a number of test cases for this assignment. You should test out your program in chunks. Don’t try to tackle everything at once. If you follow the development strategy, you should be able to progressively pass more and more test cases, as you add more and more functionality.

Below are several examples. Each example includes the contents of misspellings.txt, the contents of the input file, and the results for the output file.

Example 1 (no punctuation or capitalized words)

misspellings.txt

datamic->dramatic
dramati->dramatic
elaphant->elephant
elofent->elephant
elaphent->elephant
zooo->zoo
zo->zoo

words_1.txt

joe and his family went to the zoo the other day
the zooo had many animals including an elofent
the elaphant was being too dramati though
after they walked around joe left the zo
if __name__ == '__main__':
    spell_dict = read_spellings()
    correct_spelling("words_1.txt", spell_dict)

output_words_1.txt

joe and his family went to the zoo the other day
the zoo had many animals including an elephant
the elephant was being too dramatic though
after they walked around joe left the zoo

Example 2 (with punctuation and capitalized words)

misspellings.txt

hereo->hero
heroc->hero
fluwe->flew
jumpedd->jumped
jimped->jumped
jumpped->jumped
savved->saved
sived->saved
sivved->saved
sumprean->superman
sumperan->superman
dayy->day
ayy->day

words_2.txt

There once was a hero, named superman.
Sumperan, being the hero he is, jumped.
After he jumped, he fluwe!
Then, Sumprean savved the day.
if __name__ == '__main__':
    spell_dict = read_spellings()
    correct_spelling("words_2.txt", spell_dict)

output_words_2.txt

There once was a hero, named superman.
Superman, being the hero he is, jumped.
After he jumped, he flew!
Then, Superman saved the day.