Part 1 (40 points): For Part 1 of this assignment, recreate a portion of the van Velzen, Nanetti, and de Deyn paper by calculating a normalized Type-Token Ratio (TTR) of Agatha Christie's novel The Secret Adversary.

To do this, write a program that:

1. Accesses the Project Gutenberg html file that contains the book at

2. Parses the HTML to isolate only the book contents (do not include the title, table of contents, metadata, etc.). In your readme, be sure to explain why you chose your parsing technique and how you confirmed that it gives the text you expect.

3. Calculates and prints the Type-Token Ratio of the piece, using the block TTR method described in the Method section of the van Velzen and Garrard paper ( You only need to calculate the mean of the blocks, not the standard deviation.
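As a starting point for step 2, here is a minimal sketch of one possible parsing technique: collecting only the text inside <p> elements with the standard library's html.parser. The class name ParagraphText, the function extract_paragraphs, and the SAMPLE string are hypothetical and made up for illustration. The real Project Gutenberg file very likely needs extra filtering (for example, license text may also sit in <p> elements), so inspect the file and check your output against it.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False  # True while the parser is inside a <p>
        self.paragraphs = []       # finished paragraphs, one string each
        self._pieces = []          # text fragments of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self._pieces = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            # Join fragments, then collapse runs of whitespace to one space.
            text = " ".join("".join(self._pieces).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Text inside inline tags such as <i> still arrives here.
        if self.in_paragraph:
            self._pieces.append(data)


def extract_paragraphs(html):
    """Return the text of every <p> element in an HTML string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Made-up sample; NOT the structure of the actual Gutenberg file.
SAMPLE = """
<html><head><title>Sample Book</title></head>
<body>
<h1>Sample Book</h1>
<h2>CHAPTER I</h2>
<p>The rain had stopped   by noon.</p>
<p>Nobody was <i>really</i> surprised.</p>
</body></html>
"""

if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE):
        print(paragraph)
```

In this sample the title and headings are dropped because they are not inside <p> elements. Whether that holds for the real book is something to confirm by reading the HTML source and spot-checking the first and last paragraphs of your output.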

You may use any of the libraries or materials used in the lecture videos or in class. Remember to break up your program into reasonable functions.
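For step 3, the sketch below shows one common reading of a block TTR: split the tokens into consecutive, non-overlapping blocks of equal size, compute types divided by tokens for each block, and average the results. This is an assumption, not the paper's definition. The block size, the tokenization rule, and the choice to drop a trailing partial block should all be checked against the Method section. The names tokenize and block_ttr are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its words.

    Assumption: a word is a run of letters, optionally followed by an
    apostrophe and more letters (so "don't" is one token).
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def block_ttr(tokens, block_size):
    """Mean Type-Token Ratio over consecutive non-overlapping blocks.

    Assumption: a final block shorter than block_size is discarded so
    that every block has the same length.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    ratios = []
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        ratios.append(len(set(block)) / block_size)  # types / tokens
    if not ratios:
        raise ValueError("text is shorter than one block")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    words = tokenize("The cat saw the dog and the dog saw the cat")
    print(block_ttr(words, 4))
```

Keeping tokenization and the TTR calculation in separate functions makes each one easy to test on a short string before running it on the whole novel.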

Part 2 (60 points): For Part 2, create a program that grades essays by giving a score based on Orwell's metrics for good writing.

1. Choose at least three metrics of "writing well" (on the level of word, sentence, or paragraph) based on Orwell's essay and write your own explicit rubric for grading a specific type of paper in your readme. Be sure to justify your rubric, describe its limitations, and suggest places for refinement.

For example, since students' philosophy papers often devolve into rambling personal musing, I could write a rubric where the score is:

40%: Ideal paragraph length, calculated as 1 minus the average deviation from a six-sentence paragraph, where each paragraph's deviation is divided by the number of sentences in that paragraph

35%: Thesis supported by evidence, calculated by the average number of quotes per paragraph, with a maximum average of 1

15%: Idea expressed in simplest terms, calculated by 1/(average word length - 6), where negative or undefined values receive a score of 1

10%: Variety of language, calculated using a normalized TTR with a block size of 500 words
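To show how a rubric like this might turn into code, here is a rough sketch of two of the example metrics and the weighted total. All names (WEIGHTS, simplicity_score, evidence_score, total_score) are hypothetical. Capping the simplicity score at 1 and scoring an empty input are assumptions that the example rubric does not spell out. Your own rubric will need its own functions.

```python
# Weights from the example rubric above, as fractions of the final score.
WEIGHTS = {
    "paragraph_length": 0.40,
    "evidence": 0.35,
    "simplicity": 0.15,
    "variety": 0.10,
}


def simplicity_score(words):
    """Score 1 / (average word length - 6), clamped to the range 0..1."""
    if not words:
        return 1.0  # assumption: no words is treated as "undefined"
    average = sum(len(word) for word in words) / len(words)
    denominator = average - 6
    if denominator <= 0:
        return 1.0  # negative or undefined values receive a score of 1
    # Assumption: cap at 1 so an average just above 6 cannot exceed full marks.
    return min(1.0, 1 / denominator)


def evidence_score(quotes_per_paragraph):
    """Average number of quotes per paragraph, with a maximum of 1."""
    if not quotes_per_paragraph:
        return 0.0  # assumption: an essay with no paragraphs earns nothing
    average = sum(quotes_per_paragraph) / len(quotes_per_paragraph)
    return min(1.0, average)


def total_score(components):
    """Combine component scores (each between 0 and 1) into points out of 100."""
    weighted = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return round(100 * weighted)


if __name__ == "__main__":
    example = {
        "paragraph_length": 0.5,
        "evidence": evidence_score([0, 1, 2]),
        "simplicity": simplicity_score(["plain", "words", "help"]),
        "variety": 0.8,
    }
    print(total_score(example))
```

Each metric returns a value between 0 and 1, so the weights alone decide how many of the 100 points a metric can contribute.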

2. Create a program that cleans and grades a given text file according to your rubric and outputs a score out of 100 possible points. Run some of your own essays through your program to test it.

3. Modify your program to accept a variable number of file name inputs and write every file name and its associated score to another text file. Your output should look as follows:

file1.txt 82

myessay.txt 65

apaper.txt 90
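One possible way to produce output in this shape is sketched below. It assumes the file names arrive as command-line arguments and that the results go to a file called scores.txt, with one entry per line. Both are illustrative choices, not requirements. The names write_scores, main, and placeholder_grader are hypothetical, and the placeholder stands in for your real rubric.

```python
import sys


def placeholder_grader(text):
    """Hypothetical stand-in for a real rubric; always returns 0."""
    return 0


def write_scores(paths, out_path, grader):
    """Grade each file in paths and write "<file name> <score>" lines."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as essay:
                text = essay.read()
            out.write(f"{path} {grader(text)}\n")


def main(argv):
    """Treat every command-line argument as an essay file to grade."""
    # "scores.txt" is an assumed output name, not one given by the assignment.
    write_scores(argv, "scores.txt", placeholder_grader)
```

Calling main(sys.argv[1:]) at the bottom of the script would let it run as, for example, python part2.py file1.txt myessay.txt apaper.txt. Passing the grader in as an argument keeps the file handling separate from the scoring, so each part can be tested on its own.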

What to hand in:

part1.py - Your commented program submission for Part 1

part2.py - Your commented program submission for Part 2

README.txt - For each program, include information on how to run the program, a list of required libraries, explanations of decisions you made, and any known errors