Fetch Your Netflix Ratings
by Devanshu Mehta
If you use the “Netflix”:http://www.netflix rating system even half as much as I do, your account documents your entire movie-watching life. I have rated 1348 films on Netflix so far, and that number grows by about 15 per month. Since I am both a movie geek and a statistics geek, that information is important (nay, vital) to me! If only there were a simple, friendly way to get at my information…
What use would that be?
Good question; but here are a few answers:
- You could get information about your ratings per director. You think you like Coppola (Francis or Sofia), but do you really? What is your average Coppola rating?
- How about 1939? People say it was a great year in cinema, but did you think so? How about the ’60s or 2002? You might be surprised with how many 5 star films you saw last year, considering the fact that the media has dubbed it a failure at the box office. Your Netflix data knows the answers.
- How about Tom Cruise? You find him annoying (I don’t), but do you really know how you’ve rated his movies on average? And what if you could find out which obscure actor you’ve actually rated 5 stars every time he’s been in a film?
- Did you know that on average you rate Family films higher than drama? Or that of the 180 Horror films you’ve seen and rated, your average rating has been 3.0556?
Well, I’ve put all of this information within your reach. All it takes is a little patience, Perl and Python… (Click to skip the discussion and go straight to the Download of getFlix v0.1.)
Aren’t Pythons dangerous?
They are- especially the Monty variety.
“Python”:http://www.python.org and “Perl”:http://www.perl.org are scripting languages that are really good at obeying your commands, especially if you know the right commands. Their skills lie in answering tough questions, like what the square root of 11553201 is (3399) or how many letter ‘a’s there are in this article (you figure it out… using perl). They are also good at grabbing information off the internet and parsing it to make statistical sense.
You need the following:
- Perl v5 (higher should work, don’t know about lower). Get “perl at perl.org”:http://www.perl.org/get.html.
- Python v2.3 (again, higher yes, lower don’t know). Get “python at their web site”:http://www.python.org/download/.
- For Perl, install modules WWW::Mechanize, Crypt::SSLeay and Data::Dumper. “Help installing perl modules”:http://www.cpan.org/misc/cpan-faq.html#How_install_Perl_modules. Mac OS X, most Linux distributions and BSD variants come pre-installed with both perl and python.
- For Python, install module ‘mechanize’. “Help installing mechanize”:http://wwwsearch.sourceforge.net/mechanize/#download.
- Find out your path to Perl (type ‘whereis perl’ at command line) and your path to Python (‘whereis python’). Also, you will need your Netflix user name (email address) and password.
- And finally, the actual scripts: Download v0.1 of getFlix.
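Before running anything, it can help to confirm that the prerequisites above are actually in place. The following is only a minimal sketch in present-day Python 3, not part of getFlix; the function name check_prerequisites is made up for illustration. It looks up the interpreters on your PATH (much like ‘whereis’) and checks whether the ‘mechanize’ module can be found, without importing it.

```python
import importlib.util
import shutil


def check_prerequisites(programs=("perl", "python"), modules=("mechanize",)):
    """Return a dict mapping each prerequisite to where it was found (or None)."""
    found = {}
    for prog in programs:
        # shutil.which searches the PATH, similar to 'whereis' or 'which'
        found[prog] = shutil.which(prog)
    for mod in modules:
        # find_spec returns None when a top-level module is not installed
        spec = importlib.util.find_spec(mod)
        found[mod] = spec.origin if spec is not None else None
    return found


if __name__ == "__main__":
    for name, location in check_prerequisites().items():
        print("%-10s %s" % (name, location or "NOT FOUND"))
```

Anything reported as NOT FOUND needs to be installed (or added to your PATH) before the scripts will run. The Perl modules are not covered by this check.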
Net::Netflix
I was originally looking for someone else who had written a script to do this and had found “Net::Netflix”:http://ejohn.org/projects/netflix by a guy named John Resig. Very cool script- it went to the Netflix web site with your username/password and fetched your ratings. The problem was that it only got the film title and the rating. Nothing else. Great, but no cigar.
The Scripts
So I had to write everything else myself. Here is what I have:
- getflix.pl – A script that calls the main perl module (included, next) called Netflix.pm
- Netflix.pm – The heart of my effort here is this module, partially borrowed from John Resig’s script but modified to get the following:
- The film’s Netflix ID
- Film title
- Film Year
- Film MPAA Rating
- Film Genre
- Your Film Rating
For example, it produces a line like “60000161~Wonder Boys~2000~R~Drama~3” for every single film you have rated and writes it to a file called ‘nflicks.txt’. Not bad for a little perl script.
- Now for all the accessories, starting with nflixHisto.py, a very nifty script that takes all the data in nflicks.txt and generates distributions from it: for example, the average rating for a particular year or decade, or for a particular MPAA rating or genre. Great stuff!
- getdirectors.py: This script will get the names of the directors for each of the films you have rated (from Netflix.com) and tabulate them with their ratings in a file called directors.txt.
- dirHisto.py: This script will generate meaningful data about directors that you have rated highly; i.e. average rating for a particular director. This will live in directors2.txt.
- getstars.py: This script will fetch the stars (actors and actresses) for each film you have rated and again, tabulate them (similar to the directors). No histogram script exists yet to make sense of this data.
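To make the nflicks.txt format concrete, here is a minimal sketch of how a tilde-separated record like the one above could be parsed and averaged by genre or decade. It is written in present-day Python 3 and is not the code from nflixHisto.py; the names parse_line and average_by are made up for illustration, and all sample records except the Wonder Boys line are invented. It also assumes no film title contains a ‘~’.

```python
from collections import defaultdict

FIELDS = ("id", "title", "year", "mpaa", "genre", "rating")


def parse_line(line):
    """Split one 'id~title~year~mpaa~genre~rating' record into a dict."""
    parts = line.rstrip("\n").split("~")
    if len(parts) != len(FIELDS):
        raise ValueError("unexpected record: %r" % line)
    record = dict(zip(FIELDS, parts))
    record["year"] = int(record["year"])
    record["rating"] = int(record["rating"])
    return record


def average_by(records, key):
    """Return {group: (count, average rating)}, grouping by key(record)."""
    totals = defaultdict(lambda: [0, 0])  # group -> [count, sum of ratings]
    for rec in records:
        bucket = totals[key(rec)]
        bucket[0] += 1
        bucket[1] += rec["rating"]
    return {group: (n, total / n) for group, (n, total) in totals.items()}


sample = [
    "60000161~Wonder Boys~2000~R~Drama~3",    # record quoted in the article
    "1~Hypothetical Film A~2004~R~Drama~5",   # made-up record
    "2~Hypothetical Film B~1939~G~Family~4",  # made-up record
]
records = [parse_line(line) for line in sample]
by_genre = average_by(records, lambda r: r["genre"])
by_decade = average_by(records, lambda r: r["year"] // 10 * 10)
print(by_genre)
print(by_decade)
```

Passing a different key function (for example one that returns the MPAA rating) gives the other breakdowns without changing the averaging code.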
What kind of information will I get out of it?
For a brief overview, you can “read my own discoveries at WideScreenGlory.com”:http://www.widescreenglory.com/2005/12/09/netflix-ratings/ but suffice it to say, you can get all the information it is possible to fetch from Netflix.com about each of the films you have rated. That is,
- Director
- Actors/Actresses
- Year
- Genre
- MPAA Rating
And also, a lot of analysis of the information, such as averages for years, decades, ratings and a lot more. And if you find something that is more meaningful, go ahead and add it or suggest it in the comments below.
What if it doesn’t work?
Let me know, preferably in the comments below. If you can code perl at all (I can’t and I wrote these scripts!) or python, please feel free to modify the scripts. I only ask for credit where it is due.
If you do run into problems, make sure you have read everything on this page AND the README file included with the distribution. Then, be as specific as you can about the problems you run into. I will see what I can do.
Caveats
Some points to keep in mind when you use these scripts:
- This is v0.1 code. No guarantees, no warrantees and no manatees. Certainly no manatees.
- Again, this is v0.1 code and while I can write a mean perl/python script when required to, grabbing and parsing web pages is not my strong point.
- Make sure you have all the required modules for both perl and python.
- Make sure the path to perl in all scripts is correct.
- Make sure you have edited your email address and password in the getflix.pl file.
That’s it! Happy mining!
I came across your site from HackingNetflix.com and I love what you’ve done here. I’m an idiot when it comes to this stuff, but I’d love to be able to download my ratings, but I seem to have hit a wall. All the .py files show a python icon next to them, but the Netflix.pm file does not recognize a program. Should that open in Perl? And if so, what might I be missing? I am running XP and downloaded the newest versions of Perl and Python and tried to follow your directions above.
Thanks in advance for any help you can provide.
Pete
Follow the instructions in the README file included. If you have Perl, Python and all the modules installed correctly, (and modified getflix.pl with your login info), all you should have to do is type ‘perl getflix.pl’ on the command prompt (in Windows: Start Menu->Run; then type ‘cmd’)
Okay, now I’ve got Perl icons for the .pl & .pm files. I went into the command prompt and typed ‘perl getflix.pl’ and got an error of ‘can’t open perl script “getflix.pl”: no such file or directory’ Any idea what I’m doing wrong?
Thanks again,
Pete
So it seems Perl cannot find the file. I have two suggestions, which may seem silly:
– Are you running ‘perl getflix.pl’ in the same directory as where the file getflix.pl is?
– Also, try running it with the entire path to the file; e.g. ‘perl c:\perlfiles\getFlix\getflix.pl’.
I’ve run the getflix.pl script and think what you’ve done so far is great. I have a lot of movies that I’ve marked as ‘Not Interested’ and they are showing up in the nflicks.txt file with a rating – which must be either the average or predicted rating for that DVD.
In your regular expression that gathers the info, I think that this part:
Your code doesn’t seem to show up in your comment. Email it to skywalker([at])galaxyfaraway.com if you want and I will put it up on the page if it works! Thanks for the input.
I’ve put together a browser-based script to get the same data (minus the ID number):
http://badsegue.org/archives/2006/03/10/netflix-ratings-grabber
Have you thought about hosting your scripts on your server and letting people submit their raw data for analysis? 99.9% of netflix users aren’t going to have python installed.
I do like your solution a lot. The reason I have not hosted the script on my server is that I do not want to be responsible for having people’s account info (username, password) passing through my server.
Great perl script. I’m running this under Windows XP using ActivePerl and Python. It’s working fine. The ‘getstars.py’ script needs some more functionality. For those of you who want to use ‘getstars.py’, make sure you pass in the url for the movie, i.e. from the command prompt type:
> getstars.py http://www.netflix.com/MovieDisplay?movieid=354611
This returns: ['Humphrey Bogart', 'Ingrid Bergman']
Just a suggestion: it would be nice if you could integrate the getstars option into the nflicks.txt document, so the starring cast is shown along with your ratings, etc.
I found a problem in nflixHisto.py:
This code…..:
mpaas=['G','PG','PG-13','R','NC-17','UR','NR']
print
print "MPAA\t#films\t \tAve. Rating"
for m in mpaas:
    print '%s \t %3d \t %2.2f \t %1.4f' % (m, mpaah[m],decadehr[str(x)]/decadeh[str(x)],mpaahr[m]/mpaah[m])
…is generating a zero-division error. I have not rated any NC-17, UR or NR movies, so the code crashes. If I remove ‘NC-17’, ‘UR’ and ‘NR’ from mpaas, then the script works fine. It looks like there is division by zero because I have not rated movies with certain mpaa values.
Unfortunately, I don’t know enough Python to be able to fix the code, but I hope this helps you.
-Nick
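One way to avoid the crash Nick describes is to skip the division whenever a category has no rated films. The sketch below is an illustration in present-day Python 3, not a patch to nflixHisto.py: the names mpaah and mpaahr come from the quoted code, but treating them as dictionaries of counts and rating sums is an assumption, the decade column from the quoted line is left out, and the sample numbers are invented.

```python
def mpaa_rows(mpaah, mpaahr, mpaas=("G", "PG", "PG-13", "R", "NC-17", "UR", "NR")):
    """Build one output row per MPAA rating, skipping the division when count is 0.

    mpaah  -- assumed: dict of MPAA rating -> number of films rated
    mpaahr -- assumed: dict of MPAA rating -> sum of the stars given
    """
    rows = []
    for m in mpaas:
        count = mpaah.get(m, 0)
        if count == 0:
            # Nothing rated in this category, so there is no average to compute
            rows.append("%s \t %3d \t n/a" % (m, 0))
        else:
            rows.append("%s \t %3d \t %1.4f" % (m, count, mpaahr[m] / count))
    return rows


counts = {"G": 2, "R": 4, "NC-17": 0}  # made-up numbers
sums = {"G": 7, "R": 14, "NC-17": 0}   # made-up numbers
for row in mpaa_rows(counts, sums):
    print(row)
```

Using .get(m, 0) also covers the case where a rating never appears in the data at all, not just the case where its count is zero.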
I found a problem in Netflix.pm:
The variable $cur on this line….:
my $cur = 0;
…. should be 1, not 0. If 0 is used, the first page of your Netflix ratings is loaded twice, so the results for that page are repeated, i.e. if you rated 20 movies and have $cur set to 0, getflix.pl will return 40 results.
The problem is caused because this url…: http://www.netflix.com/MoviesYouveSeen?title_sort=t&pageNum=0
and this url….:
http://www.netflix.com/MoviesYouveSeen?title_sort=t&pageNum=1
…return the same information.
So set $cur=1 and everything works fine.
-Nick
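The paging behaviour Nick describes can be sketched as follows, in present-day Python 3 rather than the Perl of Netflix.pm. Everything here is hypothetical: fetch_all_pages and fake_fetch are made-up names, and the fake site only imitates the reported behaviour that page 0 and page 1 return the same content. Starting at page 1 is the fix; the seen set is an extra safeguard that drops repeats if a page is served twice anyway.

```python
def fetch_all_pages(fetch_page, first_page=1, max_pages=1000):
    """Collect records from numbered pages until a page reports no 'Next' link.

    fetch_page -- hypothetical callable: page number -> (records, has_next)
    """
    seen = set()
    results = []
    page = first_page
    while page < first_page + max_pages:
        records, has_next = fetch_page(page)
        for rec in records:
            if rec not in seen:  # guards against a page being served twice
                seen.add(rec)
                results.append(rec)
        if not has_next:
            break
        page += 1
    return results


# Stand-in for the web site: page 0 and page 1 return the same content.
FAKE_SITE = {1: ["a", "b"], 2: ["c"]}


def fake_fetch(page):
    effective = max(page, 1)
    return FAKE_SITE[effective], effective < len(FAKE_SITE)


print(fetch_all_pages(fake_fetch, first_page=1))
print(fetch_all_pages(fake_fetch, first_page=0))
```

With the duplicate check in place, both starting points give the same list; without it, starting at 0 would repeat the first page, which is the doubling described above.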
I have trouble getting it to work.
It logs in and everything, but when it goes to fetch the movies on the page, I only get $VAR1 = {};
Is the regular expression correct? The regex contains “StarbarInsert” but I can’t find it on the web page…
Any ideas?
/E
Unfortunately, NetFlix seems to have changed its website and I can’t seem to get even getflix.pl to work. Is there any chance that you can edit the script to accommodate the new changes?
It seems to be related to Erik’s comments, namely the script does not successfully access the given movie of interest.
Thanks very much and I appreciate all of your hard work on this.
Adam
Sorry ’bout that- I will take a look at it and modify it as soon as possible. If anyone else wants to take a look at it, you are welcome to.
Check this out: http://www.nickfessel.com/Netflix/ratings.gif
That example chart was automatically created by using a PHP5 script I wrote to grab ratings data. I will be presenting the code on my website shortly. You can e-mail me at nfessel@gmail.com if you have questions or suggestions. Thanks.
Nick
Fixed the code for sub getRatings. Hope this works for everyone.
sub getRatings {
    my ( $self ) = @_;
    my %rat;
    my %year;
    my %id;
    my %mpaa;
    my %genre;
    my $body = 'alt="Next"';
    my $cur = 1;
    my $genre = '';
    while ( $body =~ /alt="Next"/i ) {
        open(FD, ">>nflicks.txt") or die("Couldn't open nflicks.txt");
        $self->{www}->get( "http://www.netflix.com/MoviesYouveSeen?title_sort=t&pageNum=$cur" );
        $body = $self->{www}->content();
        # print $body;
        # This is the main Regular Expression. If Netflix ever changes their web site,
        # this regular expression will need to change as well.
while ( $body =~ /MovieDisplay?movieid=(d+).*?>([^(.*?)(.*?)
while ( $body =~ /MovieDisplay?movieid=(d+).*?>([^(.*?)(.*?)
Apparently I cannot paste an open angle bracket… Any suggestions on how to paste the code?
Email it to me at skywalker [at] galaxyfaraway.com
Any update on this? Can somebody post what the updated regex needs to be? All I need is to fetch all of the movieid’s for the movies I have rated. My ultimate goal is to clear the ratings for all of the movies I have rated so that I can start from scratch.
I can write my own script to do the clearing based on the URL, but I need a way to fetch all of my movieid’s (2700 of them)
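If all you need is the list of movie IDs, a much simpler pattern than the full ratings regex may be enough. The following is a minimal sketch in present-day Python 3; extract_movie_ids is a made-up name and the sample HTML is invented. Only the ‘MovieDisplay?movieid=’ URL shape is taken from this thread, and the real page markup may well differ.

```python
import re

# Matches the numeric ID in links such as MovieDisplay?movieid=354611
MOVIE_ID = re.compile(r"movieid=(\d+)")


def extract_movie_ids(html):
    """Return the distinct movie IDs found in html, in first-seen order."""
    ids = []
    seen = set()
    for movie_id in MOVIE_ID.findall(html):
        if movie_id not in seen:
            seen.add(movie_id)
            ids.append(movie_id)
    return ids


# Invented markup that only imitates the URL shape quoted in this thread.
sample_html = (
    '<a href="http://www.netflix.com/MovieDisplay?movieid=354611">One</a>'
    '<a href="http://www.netflix.com/MovieDisplay?movieid=60000161">Two</a>'
    '<a href="http://www.netflix.com/MovieDisplay?movieid=354611">One again</a>'
)
print(extract_movie_ids(sample_html))
```

Running this over the body of each ratings page and merging the results would give the full ID list, assuming every rated film is linked with that URL shape.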
I fixed the regular expression but it hangs when it is checking it… I never got this to work before they changed the site, so I don’t know whether my setup even works…
Anyone got it working?
I got it working by inserting a ” after mpaa and genre in the regexp:
/movieid=(d+).*?trkid=d+”>([^(.*?)(.*?)
I have rewritten the ratings retrieval code and updated the URLs and regexp used.
I’m a little late to this thread, but figured I’d chime in if anyone else still has this problem.
I created a service called “Save My Ratings” that allows you to copy your movie ratings from sites like Netflix.
You can copy your ratings from Netflix into Amazon, Blockbuster, Yahoo, RottenTomatoes, etc., or vice versa. For example, lots of people copy their Netflix ratings into Amazon to improve the Amazon recommendations. Or you can just copy them back into another Netflix profile.
Check it out and let me know if you have any suggestions. SaveMyRatings
Thanks.
Alan
Email Me : $MyName@savemyratings.com