Literary Clock


50,000 Books

For the less popular times, we need to look in a much larger body of work. Project Gutenberg is an amazing resource of over 50,000 ebooks (circa late 2016), and between them they contain many of the less common times. So how to get them? Thankfully this is really well covered by Project Gutenberg's own documentation. I was still slightly nervous, so I ran the following during the early morning in US Eastern time, which I guessed was the quietest time for them:

wget -w 2 -m "http://www.gutenberg.org/robot/harvest?filetypes[]=txt&langs[]=en"

which grabbed me around 50,000 txt files of English-language ebooks. Next up: the metadata associated with these books. Once again, Project Gutenberg has a nice webpage describing how to grab the metadata in RDF/XML format. Using gutenberg_metadata.py, I can link the filename of each downloaded book to its author and title.
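To give a feel for the linking step, here is a minimal sketch (not the actual gutenberg_metadata.py). It assumes each book's RDF record has a pgterms:ebook element with a dcterms:title and one or more dcterms:creator names, and that the downloaded text files are named after the book ID (for example 99999.txt or 99999-8.txt). The sample record, the ID 99999, and both function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Namespaces as they are assumed to appear in the Project Gutenberg RDF files.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# A made-up record in the assumed layout, so the sketch runs on its own.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/99999">
    <dcterms:title>Example Title</dcterms:title>
    <dcterms:creator>
      <pgterms:agent rdf:about="2009/agents/1">
        <pgterms:name>Author, Example</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
  </pgterms:ebook>
</rdf:RDF>
"""


def parse_book_metadata(rdf_xml):
    """Return the ID, title and author names from one RDF record, or None."""
    root = ET.fromstring(rdf_xml)
    ebook = root.find("pgterms:ebook", NS)
    if ebook is None:
        return None
    # rdf:about looks like "ebooks/99999"; the book ID is the last path part.
    about = ebook.get("{%s}about" % NS["rdf"], "")
    book_id = about.rsplit("/", 1)[-1]
    title = ebook.findtext("dcterms:title", default="", namespaces=NS)
    names = ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
    authors = [n.text for n in names if n.text]
    return {"id": book_id, "title": title, "authors": authors}


def book_id_from_filename(filename):
    """Pull the book ID out of names like '99999.txt' or '99999-8.txt'."""
    match = re.match(r"^(\d+)(?:-\d+)?\.txt$", filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    meta = parse_book_metadata(SAMPLE_RDF)
    # Link a downloaded file to its record via the shared book ID.
    if book_id_from_filename("99999-8.txt") == meta["id"]:
        print(meta["title"], "by", "; ".join(meta["authors"]))
```

In practice you would build a dictionary keyed on book ID from all the RDF files, then look each downloaded filename up in it.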

Now I can use the tools discussed in the previous post to extract times for the rarest 400 minutes. The older books were less fruitful, but as personal timepieces became common and were increasingly referred to in Victorian literature, the number of quotes found increased. Perhaps my favourite two minutes found in the collection are:

He was thrifty, of Scotch-Irish descent, and at two minutes past three had never had an adventure in his life.

At three minutes past three he began his career as one of the celebrities of the world.

The Man Who Rocked the Earth by Arthur Cheney Train and Robert Williams Wood

I should really get around to reading it, though I am worried that the rest of the book cannot live up to these two minutes.
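Phrases like "two minutes past three" are exactly what a search for rare minutes has to recognise. The following is only a rough sketch of that idea, not the tools from the previous post: it assumes times are written out in words as "N minutes past H", handles minutes 1 to 29 and hours 1 to 12, and ignores a.m./p.m. The function and variable names are my own for illustration.

```python
import re

ONES = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]

# Map number words to values: "one" -> 1 ... "twenty-nine" -> 29.
WORD_TO_NUM = {word: i for i, word in enumerate(ONES + TEENS, start=1)}
WORD_TO_NUM["twenty"] = 20
for i, word in enumerate(ONES, start=1):
    WORD_TO_NUM["twenty-" + word] = 20 + i

# Only one..twelve make sense as the hour on a 12-hour clock.
HOUR_WORDS = [word for word, n in WORD_TO_NUM.items() if n <= 12]


def _alternation(words):
    # Longest words first, so "sixteen" is tried before "six".
    return "|".join(sorted(map(re.escape, words), key=len, reverse=True))


PATTERN = re.compile(
    r"\b(%s) minutes? past (%s)\b"
    % (_alternation(WORD_TO_NUM), _alternation(HOUR_WORDS)),
    re.IGNORECASE,
)


def find_minutes_past(text):
    """Return (hour, minute, matched phrase) for each 'N minutes past H'."""
    results = []
    for m in PATTERN.finditer(text):
        minute = WORD_TO_NUM[m.group(1).lower()]
        hour = WORD_TO_NUM[m.group(2).lower()]
        results.append((hour, minute, m.group(0)))
    return results


if __name__ == "__main__":
    quote = ("He was thrifty, of Scotch-Irish descent, and at two minutes "
             "past three had never had an adventure in his life.")
    print(find_minutes_past(quote))
```

A real search would need many more patterns ("a quarter to", "3.02 p.m.", and so on), plus some context to decide between morning and afternoon.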

Thanks to Project Gutenberg and time-obsessed, turn-of-the-century detectives, amongst others, I was within finishing distance of getting a quote for each of the 1,440 minutes of the day. As discussed in the Crowdsourcing post, with a little bit of help, I managed to finish. So, how to share this with all of you?
