Fetch Your Netflix Ratings

by Devanshu Mehta

If you use the “Netflix”:http://www.netflix rating system even half as much as I do, your account documents your entire movie-watching life. I have rated 1348 films on Netflix so far, and that number grows by about 15 per month. Since I am both a movie geek and a statistics geek, that information is important (nay, vital) to me! If only there were a simple, friendly way to get at my information…

What use would that be?
Good question; but here are a few answers:

  • You could get information about your ratings per director. You think you like Coppola (Francis or Sofia), but do you really? What is your average Coppola rating?
  • How about 1939? People say it was a great year in cinema, but did you think so? How about the ’60s or 2002? You might be surprised with how many 5 star films you saw last year, considering the fact that the media has dubbed it a failure at the box office. Your Netflix data knows the answers.
  • How about Tom Cruise? You find him annoying (I don’t), but do you really know how you’ve rated his movies on average? Or say you could find out which obscure actor you’ve rated 5 stars every time he’s been in a film?
  • Did you know that on average you rate Family films higher than drama? Or that of the 180 Horror films you’ve seen and rated, your average rating has been 3.0556?

Well, I’ve put all of this information within your reach. All it takes is a little patience, Perl and Python… (Click to skip the discussion and go straight to the Download of getFlix v0.1.)

Aren’t Pythons dangerous?
They are, especially the Monty variety.

“Python”:http://www.python.org and “Perl”:http://www.perl.org are scripting languages that are really good at obeying your commands, especially if you know the right commands. Their skills lie in being able to answer tough questions, like what the square root of 11553201 is (3399) or how many letter ‘a’s there are in this article (you figure it out… using Perl). They are also good at grabbing information off the internet and parsing it to make statistical sense.
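
To make those two questions concrete, here is a minimal sketch in Python (used here purely for illustration, even though the letter count is pitched as a Perl exercise). The sample string is a stand-in, not the text of this article.

```python
import math

# Integer square root; exact for perfect squares such as 11553201.
root = math.isqrt(11553201)
print(root)  # 3399

# Count occurrences of the letter 'a' (either case) in some text.
# `sample` is a made-up stand-in string, not this article.
sample = "A little patience, perl and python"
letter_count = sample.lower().count("a")
print(letter_count)  # 3
```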

You need the following:

I was originally looking for someone else who had written a script to do this, and I found “Net::Netflix”:http://ejohn.org/projects/netflix by a guy named John Resig. Very cool script: it went to the Netflix web site with your username/password and fetched your ratings. The problem was that it only got the film title and the rating. Nothing else. Great, but no cigar.

The Scripts
So I had to write everything else myself. Here is what I have:

  • getflix.pl – A script that calls the main perl module (included, next) called Netflix.pm
  • Netflix.pm – The heart of my effort here is this module, partially borrowed from John Resig’s script but modified to get the following:
    1. The film’s Netflix ID
    2. Film title
    3. Film Year
    4. Film MPAA Rating
    5. Film Genre
    6. Your Film Rating

    For example, it fetches a record like “60000161~Wonder Boys~2000~R~Drama~3” for every single film you have rated and puts them all in a file called ‘nflicks.txt’. Not bad for a little Perl script.

  • Now for all the accessories, starting with nflixHisto.py, a very nifty script that takes all the data in nflicks.txt and generates distributions from it: for example, the average rating for a particular year or decade, or for a particular MPAA rating or genre. Great stuff!
  • getdirectors.py: This script will get the names of the directors for each of the films you have rated (from Netflix.com) and tabulate them with your rating in a file called directors.txt.
  • dirHisto.py: This script will generate meaningful data about the directors whose films you have rated, i.e. the average rating for a particular director. This will live in directors2.txt.
  • getstars.py: This script will fetch the stars (actors and actresses) for each film you have rated and again, tabulate them (similar to the directors). No histogram script exists yet to make sense of this data.
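
The record format described above is easy to work with from Python. What follows is a minimal sketch, not the code that ships in getFlix: the function names `parse_line` and `average_by` are hypothetical, ratings are assumed to be whole stars, and only the first sample line comes from this article (the other two are invented).

```python
from collections import defaultdict

# Field order as described for each line of nflicks.txt.
FIELDS = ("netflix_id", "title", "year", "mpaa", "genre", "rating")

def parse_line(line):
    """Split one 'id~title~year~mpaa~genre~rating' record into a dict."""
    parts = line.rstrip("\n").split("~")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
    record = dict(zip(FIELDS, parts))
    record["year"] = int(record["year"])
    record["rating"] = int(record["rating"])  # assumption: whole-star ratings
    return record

def average_by(records, key):
    """Return {field value: mean rating} for the given field name."""
    totals = defaultdict(lambda: [0, 0])  # value -> [sum of ratings, count]
    for rec in records:
        bucket = totals[rec[key]]
        bucket[0] += rec["rating"]
        bucket[1] += 1
    return {value: total / count for value, (total, count) in totals.items()}

# First line is the example from the article; the other two are invented.
sample_lines = [
    "60000161~Wonder Boys~2000~R~Drama~3",
    "00000001~Hypothetical Film A~1999~PG~Drama~5",
    "00000002~Hypothetical Film B~1999~PG~Family~4",
]
records = [parse_line(line) for line in sample_lines]
print(average_by(records, "genre"))  # {'Drama': 4.0, 'Family': 4.0}
print(average_by(records, "mpaa"))   # {'R': 3.0, 'PG': 4.5}
```

One caveat with this approach: a film title that itself contains a “~” would produce too many fields, which is why the sketch raises an error rather than guessing.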

What kind of information will I get out of it?
For a brief overview, you can “read my own discoveries at WideScreenGlory.com”:http://www.widescreenglory.com/2005/12/09/netflix-ratings/ but suffice it to say, you can get all the information it is possible to fetch from Netflix.com regarding each of the films you have rated. That is,

  • Director
  • Actors/Actresses
  • Year
  • Genre
  • MPAA Rating

And also, a lot of analysis of the information, such as averages for years, decades, ratings and a lot more. And if you find something that is more meaningful, go ahead and add it or suggest it in the comments below.
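
As an illustration of the decade averages mentioned above, here is a minimal sketch of one way to bucket ratings by decade. It is not taken from the getFlix scripts: the function names are hypothetical and the (year, rating) pairs are invented.

```python
from collections import defaultdict

def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1939 -> 1930."""
    return (year // 10) * 10

def average_by_decade(rated_films):
    """Take (year, rating) pairs and return {decade: mean rating}."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for year, rating in rated_films:
        decade = decade_of(year)
        sums[decade] += rating
        counts[decade] += 1
    return {decade: sums[decade] / counts[decade] for decade in sorted(sums)}

# Invented (year, rating) pairs, for illustration only.
films = [(1939, 5), (1962, 4), (1967, 3), (2000, 3), (2002, 5)]
print(average_by_decade(films))  # {1930: 5.0, 1960: 3.5, 2000: 4.0}
```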

What if it doesn’t work?
Let me know, preferably in the comments below. If you can code Perl at all (I can’t, and I wrote these scripts!) or Python, please feel free to modify the scripts. I only ask for credit where it is due.

If you do run into problems, make sure you have read everything on this page AND the README file included with the distribution. Then, be as specific as you can about the problems you run into. I will see what I can do.

Some points to keep in mind when you use these scripts:

  1. This is v0.1 code. No guarantees, no warrantees and no manatees. Certainly no manatees.
  2. Again, this is v0.1 code and while I can write a mean perl/python script when required to, grabbing and parsing web pages is not my strong point.
  3. Make sure you have all the required modules, for both Perl and Python.
  4. Make sure the path to perl in all scripts is correct.
  5. Make sure you have edited your email address and password in the getflix.pl file.

That’s it! Happy mining!