- Men in Black Swan
- Sling Blade Runner
- Law Abiding Citizen Kane
- My Fair Lady and the Tramp
- Batman & Robin Hood
- A Walk to Remember the Titans
I find the answers to be fairly amusing, so I wrote a program that generates before & afters.
You can find the source code here.
I've pumped out a spreadsheet of results (for movies) here.
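I used a word list to discover words that could link the end of one movie name to the start of another. To find those shared words, I built two tries: a forwards trie of the first word of each movie name, and a backwards trie (each word spelled in reverse) of the last word. Tries are hard to beat for simple prefix searches like this, and reversing the words turns a suffix search into a prefix search.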
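Here is a minimal sketch of the two-trie idea in Python. It is not the author's source code: it assumes titles are split on whitespace, and that a "bridge" is any word-list entry that is a suffix of one title's last word and a prefix of another title's first word. The `Trie` class, `build_tries` and `find_bridges` are hypothetical names.

```python
class Trie:
    """A minimal character trie that records which titles pass through each node."""

    def __init__(self):
        self.children = {}
        self.titles = set()  # titles whose key passes through (or ends at) this node

    def insert(self, key, title):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
            node.titles.add(title)

    def lookup(self, key):
        """Return the titles whose key starts with `key` (empty set if none)."""
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return set()
        return node.titles


def build_tries(titles):
    forward, backward = Trie(), Trie()
    for title in titles:
        words = title.lower().split()
        if not words:
            continue
        forward.insert(words[0], title)          # first word, read forwards
        backward.insert(words[-1][::-1], title)  # last word, read backwards
    return forward, backward


def find_bridges(titles, word_list):
    """Join every title ending in a word-list entry to every title starting with it."""
    forward, backward = build_tries(titles)
    results = []
    for word in word_list:
        w = word.lower()
        if not w:
            continue
        enders = backward.lookup(w[::-1])  # titles whose last word ends with w
        starters = forward.lookup(w)       # titles whose first word starts with w
        for a in sorted(enders):
            for b in sorted(starters):
                if a != b:
                    # Drop the shared word from the second title before gluing.
                    results.append(a + b[len(w):])
    return results
```

For example, `find_bridges(["Men in Black", "Black Swan", "Sling Blade", "Blade Runner", "Citizen Kane"], ["black", "blade", "kane"])` returns `["Men in Black Swan", "Sling Blade Runner"]`. Because the backwards trie matches suffixes, `find_bridges(["Annie Hall", "All About Eve"], ["all"])` also returns `["Annie Hall About Eve"]`, which is the "all" in "hall" problem in action.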
Ultimately, the program is general enough that it could work on anything. The main reason I focused on movies was that IMDB is kind enough to dump their ratings data, which helped immensely in sussing out good vs. bad results. I found the best way to rank results was a combination of the # of voters (showing the popularity of both items) and the difference in their ratings (since combining two items with vastly different ratings is hilarious).
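The post doesn't give the exact ranking formula, so the following is only one hypothetical way to combine the two signals. It assumes each movie has an average rating and a vote count; the `Movie` class, `score` and `rank` are illustrative names, and the log scale and `1 + gap` weighting are arbitrary choices rather than the author's.

```python
import math
from dataclasses import dataclass


@dataclass
class Movie:
    title: str
    rating: float  # average user rating, e.g. on a 1-10 scale
    votes: int     # number of users who rated it


def score(a: Movie, b: Movie) -> float:
    """Hypothetical ranking: popularity of the less-popular movie, scaled by the rating gap."""
    # Using the smaller vote count means both movies have to be well known.
    popularity = math.log10(1 + min(a.votes, b.votes))
    # A big gap in ratings (a classic glued to a flop) is what makes it funny.
    gap = abs(a.rating - b.rating)
    return popularity * (1 + gap)


def rank(pairs):
    """Sort (Movie, Movie) pairs from best to worst by score."""
    return sorted(pairs, key=lambda p: score(*p), reverse=True)
```

Taking the minimum of the two vote counts keeps a famous movie from carrying an obscure partner, and the log keeps a handful of blockbusters from drowning out everything else.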
There was one problem I could not overcome (since I was putting relatively minimal effort into this venture), which is determining whether or not a word was part of a compound word. For example, I would want to match "ball" in "basketball" because "basketball" is a compound word, but I wouldn't want to match "all" in "hall". Solving this problem would require a word list with more data than just the words themselves, so I just skipped it.
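One cheap approximation, offered as an assumption rather than a fix, is to accept a suffix match only when the leftover front of the word is itself in the word list. The function name `is_compound_suffix` is hypothetical. It shows why richer data is needed: it accepts "basket" + "ball" and rejects "h" + "all", but it also wrongly accepts "of" + "ten" in "often".

```python
def is_compound_suffix(word: str, suffix: str, vocabulary: set[str]) -> bool:
    """Hypothetical heuristic: treat `suffix` as a real component of `word`
    only if it is the whole word, or the leftover front part is itself a word."""
    word, suffix = word.lower(), suffix.lower()
    if not word.endswith(suffix):
        return False
    head = word[: len(word) - len(suffix)]
    # An exact match is always fine; otherwise the remainder must be a word too.
    return head == "" or head in vocabulary
```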