Visualizing My Netflix Viewing Activity with Python and Matplotlib (Part II)

How to find out your Netflix viewing patterns using your own Netflix data for fun.

What does your Netflix homepage look like? Here’s mine.

Previously, I wrote a simple tutorial on visualizing your Netflix viewing activity by downloading the history file (.CSV) from your own account. The article can be viewed here. Now I want to go further by requesting more data from Netflix.

As I said in my previous article, I requested more complete data on my account on 26 June, and by 20 July I finally got it! Yay! And yes, it took me quite a long time to finish writing this article.

Unluckily, the data sets are not exactly what I expected. I kind of expected a data set detailing whether each video I watched is a movie or a television show and, if it is a show, what the original title is (instead of the combined show title + episode title). Even better would be the genres of the shows/movies, the cast, the directors, etc. Instead, I got files that are more of a report on every click I made on Netflix.

Still, let’s try to gain insights from those data.

After your request is granted, you will get a compressed file called “netflix-report.zip”. If you decompress it, you will see the following folders inside.

If you want to see what’s inside each of those folders, you can see it below.

As you can see above, there are a lot of CSV files here that you can process and try to visualize if you’re curious. For this article, I want to keep it brief and focus on analyzing one file only. I chose “ViewingActivity.csv” for the purpose of this article. It’s like a more detailed version of the “NetflixViewingActivity.csv” that I used in my previous article.

So, here we go.

There are 5 profiles on my account. I have my own profile, my sister has hers, and the three others are used by my friends. I know that I don’t let anyone use my profile (yes, I am selfish about Netflix!), but I have no idea if my friends share theirs with other people (and they probably do). Mostly I don’t care as long as I don’t get the “too many people are using your account right now” notification.

Here are the five profiles on my account. El is mine, Silev is my sister’s.

So let’s begin by importing some libraries and the file itself.
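Here is a minimal sketch of the loading step. Since I can't ship the real export, the two-row sample below is hypothetical (the titles, attribute text, and device names are made up), but the column names are the ones listed in this article. With the real file you would call `pd.read_csv("ViewingActivity.csv")` instead.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the real ViewingActivity.csv.
sample_csv = io.StringIO(
    "Profile Name,Start Time,Duration,Attributes,Title,"
    "Supplemental Video Type,Device Type,Bookmark,Latest Bookmark,Country\n"
    "El,2020-07-01 13:05:10,00:21:30,,Show A: Season 1: Episode 1,,"
    "Chrome PC,00:21:30,00:21:30,ID (Indonesia)\n"
    "Silev,2020-07-01 14:00:00,00:00:45,Autoplayed,"
    "Show B (Trailer),TRAILER,Android Phone,00:00:45,Not latest view,ID (Indonesia)\n"
)

# With the real export: df = pd.read_csv("ViewingActivity.csv")
df = pd.read_csv(sample_csv)

print(df.shape)
print(df.columns.tolist())
```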

What I can observe from the data above are the following columns.

  • Profile Name: clear enough
  • Start Time: the time when you start watching the show/movie
  • Duration: how long you watch it
  • Attributes: I am still not sure what this column covers, but it seems to describe whether the show/movie/trailer you watched was autoplayed
  • Title: clear enough
  • Supplemental Video Type: this column describes whether what you watched is a trailer, hook, preview, teaser trailer, recap, etc. This is my guess, but when it is left empty, it means you actually watched the show/movie itself, not some preview/hook/trailer
  • Device Type: explains what device you watch Netflix on
  • Bookmark: not clear
  • Latest Bookmark: not clear
  • Country: explains the country from which you watch Netflix. Y̶e̶s̶ ̶I̶ ̶d̶o̶ ̶u̶s̶e̶ ̶V̶P̶N̶ ̶s̶o̶m̶e̶t̶i̶m̶e̶s̶,̶ ̶a̶n̶d̶ ̶i̶t̶ ̶w̶i̶l̶l̶ ̶r̶e̶g̶i̶s̶t̶e̶r̶ ̶s̶a̶i̶d̶ ̶c̶o̶u̶n̶t̶r̶y̶ ̶(̶t̶h̶e̶ ̶V̶P̶N̶)̶ ̶t̶o̶ ̶t̶h̶i̶s̶.̶

Data Cleaning

This is important because not all data presented are relevant to our analysis below, e̶s̶p̶e̶c̶i̶a̶l̶l̶y̶ ̶t̶h̶e̶ ̶c̶o̶l̶u̶m̶n̶s̶ ̶t̶h̶a̶t̶ ̶I̶ ̶d̶o̶n̶’̶t̶ ̶u̶n̶d̶e̶r̶s̶t̶a̶n̶d̶.

Dropping unnecessary data

I don’t want to include the times I watched trailers or previews f̶r̶o̶m̶ ̶s̶c̶r̶o̶l̶l̶i̶n̶g̶ ̶N̶e̶t̶f̶l̶i̶x̶ ̶m̶i̶n̶d̶l̶e̶s̶s̶l̶y̶ ̶w̶h̶i̶l̶e̶ ̶n̶o̶t̶ ̶b̶e̶i̶n̶g̶ ̶a̶b̶l̶e̶ ̶t̶o̶ ̶d̶e̶c̶i̶d̶e̶ ̶w̶h̶a̶t̶ ̶t̶o̶ ̶w̶a̶t̶c̶h̶, so I am going to drop all rows that have values in ‘Supplemental Video Type’ column.
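A small sketch of that filter, using a hypothetical three-row frame. It assumes empty cells in ‘Supplemental Video Type’ are read as missing values (NaN), which is what `pd.read_csv` does by default.

```python
import pandas as pd

# Hypothetical sample; only the two columns needed for this step.
df = pd.DataFrame({
    "Title": ["Show A: Episode 1", "Show B (Trailer)", "Movie C"],
    "Supplemental Video Type": [None, "TRAILER", None],
})

# Keep only rows where the column is empty, i.e. actual viewings.
df = df[df["Supplemental Video Type"].isna()].reset_index(drop=True)

print(df["Title"].tolist())
```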

Convert timestamp to your local timezone

I think Netflix records timestamps in GMT (UTC), so I need to convert the ‘Start Time’ column to my local timezone. I live in Western Indonesia, so I set the timezone below to “Asia/Jakarta”, which is GMT+7. Adjust it to your own location. The list of available time zones can be seen on a StackOverflow thread here.
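One way to do the conversion in pandas, sketched on two hypothetical timestamps. It rests on the assumption stated above that the export is in GMT/UTC; if that turns out to be wrong for your data, the parsing line is the one to change.

```python
import pandas as pd

# Hypothetical timestamps in the same text format as the 'Start Time' column.
df = pd.DataFrame({"Start Time": ["2020-07-01 13:05:10", "2020-07-01 18:30:00"]})

# Parse the strings as UTC (assumption), then convert to local time.
df["Start Time"] = pd.to_datetime(df["Start Time"], utc=True)
df["Start Time"] = df["Start Time"].dt.tz_convert("Asia/Jakarta")

print(df["Start Time"])
```

Note that the second viewing moves to the next calendar day after the conversion, which matters later when grouping by date or weekday.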

Convert timestamp to appropriate format

Drop rows with very short durations

Okay, so there are probably times when we open a show/movie on Netflix and change our mind a few seconds later for whatever reason. Does this only happen to me? Well, I don’t want to count those viewings, so we drop the ones shorter than 5 minutes. You can set this threshold to whatever duration fits you though.
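A sketch of this step on hypothetical rows. It assumes ‘Duration’ arrives as `HH:MM:SS` text, and the `Duration (min)` column name and `MIN_MINUTES` constant are names I made up for illustration.

```python
import pandas as pd

# Hypothetical sample with durations as HH:MM:SS strings.
df = pd.DataFrame({
    "Title": ["Show A: Episode 1", "Movie C", "Show D: Episode 3"],
    "Duration": ["00:21:30", "01:45:00", "00:00:40"],
})

# Convert the text to timedeltas, then to minutes as plain floats.
df["Duration"] = pd.to_timedelta(df["Duration"])
df["Duration (min)"] = df["Duration"].dt.total_seconds() / 60

# Drop viewings shorter than the threshold (5 minutes here).
MIN_MINUTES = 5
df = df[df["Duration (min)"] >= MIN_MINUTES].reset_index(drop=True)

print(df[["Title", "Duration (min)"]])
```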

Data Visualization

O̶k̶a̶y̶ ̶t̶h̶a̶t̶ ̶d̶a̶t̶a̶ ̶c̶l̶e̶a̶n̶i̶n̶g̶ ̶p̶a̶r̶t̶ ̶w̶a̶s̶ ̶b̶o̶r̶i̶n̶g̶, now here comes the fun part! The visualization with graphics and such!

Who watches the most Netflix?

Let’s see the viewing frequency of each profile. Who watches the most Netflix?
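A minimal sketch of counting viewings per profile and drawing them as a bar chart. The rows are hypothetical; only the profile names come from my account.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical viewing log: one row per viewing.
df = pd.DataFrame(
    {"Profile Name": ["El", "El", "Silev", "El", "calico cat", "Silev"]}
)

# Count rows per profile, most active first.
views = df["Profile Name"].value_counts()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(views.index, views.values)
ax.set_xlabel("Profile")
ax.set_ylabel("Number of viewings")
ax.set_title("Viewings per profile")
fig.tight_layout()
```

Call `plt.show()` (or `fig.savefig(...)`) afterwards to actually see the chart.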

There’s also another way to do this that gives the same result.
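One common equivalent, which may or may not be the alternative meant here, is `groupby(...).size()`. The sketch below checks on hypothetical rows that it agrees with `value_counts()`.

```python
import pandas as pd

# Hypothetical viewing log: one row per viewing.
df = pd.DataFrame(
    {"Profile Name": ["El", "El", "Silev", "El", "calico cat", "Silev"]}
)

# Two routes to the same per-profile counts.
by_group = df.groupby("Profile Name").size().sort_values(ascending=False)
by_counts = df["Profile Name"].value_counts()

print(by_group)
```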

How long is the duration of shows/movies we watch?

We already converted the duration into minutes before. Let’s try to turn that into a histogram.
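A quick sketch of such a histogram on hypothetical minute values, with 30-minute-wide bins chosen by me for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical viewing durations in minutes.
minutes = pd.Series([21.5, 24.0, 45.0, 52.0, 105.0, 118.0, 22.0])

fig, ax = plt.subplots()
# Bin edges at 0, 30, 60, 90, 120 minutes.
counts, edges, _ = ax.hist(minutes, bins=range(0, 150, 30))
ax.set_xlabel("Duration (minutes)")
ax.set_ylabel("Number of viewings")
```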

T̶h̶e̶ ̶h̶i̶s̶t̶o̶g̶r̶a̶m̶ ̶l̶o̶o̶k̶s̶ ̶a̶w̶f̶u̶l̶,̶ ̶a̶n̶d̶ ̶I̶ ̶d̶o̶n̶’̶t̶ ̶l̶i̶k̶e̶ ̶i̶t̶.̶ So let’s try to do something else. You can categorize the duration (in minutes) into discrete categories, like shows with less than 30 minutes length, shows between 30–60 minutes, and so on.

Then turn that into a graphic.
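Binning and plotting can be sketched with `pd.cut`. The bin edges, labels, and the `Duration Range` column name below are my own choices for illustration, and the minute values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical viewing durations in minutes.
df = pd.DataFrame({"Duration (min)": [21.5, 24.0, 45.0, 60.0, 105.0, 130.0]})

bins = [0, 30, 60, 90, 120, float("inf")]
labels = ["<30", "30-60", "60-90", "90-120", ">120"]

# right=False makes each bin include its left edge: [0, 30), [30, 60), ...
df["Duration Range"] = pd.cut(
    df["Duration (min)"], bins=bins, labels=labels, right=False
)

# sort=False keeps the categories in their natural order.
range_counts = df["Duration Range"].value_counts(sort=False)

fig, ax = plt.subplots()
ax.bar(range_counts.index.astype(str), range_counts.values)
ax.set_xlabel("Duration range (minutes)")
ax.set_ylabel("Number of viewings")
```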

Yeah, apparently most of our viewings last less than 30 minutes. S̶u̶c̶h̶ ̶a̶ ̶s̶h̶o̶r̶t̶ ̶a̶t̶t̶e̶n̶t̶i̶o̶n̶ ̶s̶p̶a̶n̶ ̶i̶n̶d̶e̶e̶d̶!̶

Now if you want to see the viewing durations for each profile, you can do that too!
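A per-profile breakdown can be sketched with `pd.crosstab` and a stacked bar chart. The rows and the `Duration Range` labels are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical viewings, already labelled with a duration range.
df = pd.DataFrame({
    "Profile Name": ["El", "El", "Silev", "Silev", "El"],
    "Duration Range": ["<30", "30-60", "<30", "<30", "<30"],
})

# Rows are profiles, columns are duration ranges, cells are counts.
per_profile = pd.crosstab(df["Profile Name"], df["Duration Range"])

ax = per_profile.plot(kind="bar", stacked=True)
ax.set_ylabel("Number of viewings")
```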

Where do you access Netflix the most?

There used to be a state-owned internet provider in Indonesia that blocked Netflix, and unluckily I use that provider. I̶ ̶c̶a̶n̶ ̶w̶r̶i̶t̶e̶ ̶a̶ ̶w̶h̶o̶l̶e̶ ̶r̶a̶n̶t̶ ̶a̶b̶o̶u̶t̶ ̶t̶h̶e̶ ̶g̶o̶v̶e̶r̶n̶m̶e̶n̶t̶s̶ ̶h̶e̶r̶e̶,̶ ̶b̶u̶t̶ ̶t̶h̶i̶s̶ ̶i̶s̶ ̶n̶o̶t̶ ̶t̶h̶e̶ ̶t̶i̶m̶e̶.̶ At least they are not blocking Netflix anymore. But back when they still did, you know what I did.

If you look at the graphic above and think, “nah, I can’t see that,” then you’re right. Cases like this are where a logarithmic scale is actually a good option.
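A sketch of the log-scale version. The counts are invented to mimic one dominant country, and the `XX (Country Name)` label format is an assumption about how the ‘Country’ column looks.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical counts: one country dwarfs the others.
country_counts = pd.Series(
    {"ID (Indonesia)": 4200, "US (United States)": 35, "JP (Japan)": 6}
)

fig, ax = plt.subplots()
ax.bar(country_counts.index, country_counts.values)
# A log y-axis keeps the small bars visible next to the big one.
ax.set_yscale("log")
ax.set_ylabel("Number of viewings (log scale)")
```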

I have also painstakingly tried another way, which gives the same result.

Now let’s do that to each profile a̶n̶d̶ ̶s̶e̶e̶ ̶w̶h̶o̶ ̶a̶c̶c̶e̶s̶s̶e̶d̶ ̶N̶e̶t̶f̶l̶i̶x̶ ̶u̶s̶i̶n̶g̶ ̶V̶P̶N̶ ̶t̶h̶e̶ ̶m̶o̶s̶t̶.̶

Since all five of us are from Indonesia, you see that we access Netflix mostly from Indonesia.

That’s so obvious! So let’s remove Indonesia, not from reality though, just from the dataset.
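Filtering out the home country can be sketched like this. The rows are hypothetical, and matching on the `ID` prefix assumes the ‘Country’ values start with a two-letter country code.

```python
import pandas as pd

# Hypothetical viewings with the country they were watched from.
df = pd.DataFrame({
    "Profile Name": ["El", "El", "Silev", "El", "calico cat"],
    "Country": [
        "ID (Indonesia)", "US (United States)", "ID (Indonesia)",
        "JP (Japan)", "ID (Indonesia)",
    ],
})

# Keep only rows that are NOT from Indonesia.
abroad = df[~df["Country"].str.startswith("ID")]

# Rows are profiles, columns are countries, cells are counts.
abroad_by_profile = pd.crosstab(abroad["Profile Name"], abroad["Country"])

print(abroad_by_profile)
```

Profiles that never watched from another country simply don't show up in the result.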

Well hello there, it’s me again. I̶ ̶d̶o̶n̶’̶t̶ ̶w̶a̶n̶t̶ ̶t̶o̶ ̶a̶d̶m̶i̶t̶ ̶t̶h̶a̶t̶ ̶I̶ ̶u̶s̶e̶ ̶V̶P̶N̶ ̶a̶ ̶l̶o̶t̶ ̶t̶o̶ ̶a̶c̶c̶e̶s̶s̶ ̶N̶e̶t̶l̶i̶x̶.̶ ̶L̶e̶t̶’̶s̶ ̶j̶u̶s̶t̶ ̶a̶s̶s̶u̶m̶e̶ ̶t̶h̶a̶t̶ ̶I̶ ̶d̶i̶d̶ ̶i̶n̶d̶e̶e̶d̶ ̶g̶o̶ ̶t̶o̶ ̶t̶h̶o̶s̶e̶ ̶c̶o̶u̶n̶t̶r̶i̶e̶s̶ ̶o̶n̶l̶y̶ ̶t̶o̶ ̶w̶a̶t̶c̶h̶ ̶N̶e̶t̶f̶l̶i̶x̶!̶

My sister (Silev) probably used a VPN once or twice. Meanwhile my friend (calico cat) doesn’t appear on the graph at all, which means she never used a VPN. Now that’s a law-abiding citizen.

What device do you use to watch Netflix?

How do you watch Netflix? On a big screen? On a phone screen? On a smart fridge?

Chrome PC is apparently the most common choice to watch Netflix here. And that Linux thing (Chrome and Firefox) is definitely me, because I am the only one who uses Linux here (that may sound like humblebragging, trust me I am not, I̶ ̶a̶m̶ ̶j̶u̶s̶t̶ ̶t̶o̶o̶ ̶c̶h̶e̶a̶p̶ ̶t̶o̶ ̶b̶u̶y̶ ̶o̶r̶i̶g̶i̶n̶a̶l̶ ̶O̶S̶). In case you don’t believe it, let’s prove it here.
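One way to check a claim like that is to filter on the device name and list which profiles remain. The rows and the device strings below are hypothetical; the real ‘Device Type’ values may be worded differently.

```python
import pandas as pd

# Hypothetical viewings with the device they were watched on.
df = pd.DataFrame({
    "Profile Name": ["El", "Silev", "El", "calico cat", "El"],
    "Device Type": [
        "Chrome Linux", "Android Phone", "Firefox Linux",
        "Chrome PC", "Android Phone",
    ],
})

# Which profiles ever watched on a device whose name mentions Linux?
linux_rows = df[df["Device Type"].str.contains("Linux", case=False)]
linux_profiles = sorted(linux_rows["Profile Name"].unique())

# Full device-by-profile count table, handy for a stacked bar chart.
device_by_profile = pd.crosstab(df["Device Type"], df["Profile Name"])

print(linux_profiles)
```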

My sister watches a lot from her phone, and so do I (dark blue), though I also watch a lot from Chrome Linux (light beige) and Opera (salmon). Meanwhile Chrome PC (light green) is more popular among my friends.

Heat Map

Here comes the actual fun part: the heat map! I am still learning about this too, so hopefully this is good enough for you, because I got lost a few times when writing the code. At least it’s working now, or so I guess.

For this part I only want to analyze my own data, from the El profile, so I am going to focus on that one only. I will make a new dataframe called ‘dfl’.

First, let’s count the viewing frequency and sort that by hour.

You see, there are a lot of dates and hours missing, namely the times when I didn’t watch Netflix (well, I don’t watch Netflix 24/7, you know). Meanwhile, we need a continuous time series with 1-hour steps, so we have to fill those missing dates and times with 0s.
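Counting per hour and filling the gaps can be sketched as follows. The four timestamps are hypothetical, and I use timezone-naive values here to keep the example short.

```python
import pandas as pd

# Hypothetical start times for the El profile.
start = pd.to_datetime(
    ["2020-07-01 20:05", "2020-07-01 20:40", "2020-07-01 23:10", "2020-07-02 01:30"]
)
dfl = pd.DataFrame({"Start Time": start})

# Round each start time down to the hour and count viewings per hour.
counts = dfl["Start Time"].dt.floor("h").value_counts().sort_index()

# Build every hour between the first and last viewing, filling gaps with 0.
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="h")
hourly = counts.reindex(full_range, fill_value=0)

print(hourly)
```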

After that, we add the date, hour, month, and year to the dataframe.

Then we make a matrix-like dataframe before we can turn it into a heat map.

Let’s define how many hours there are in a day and how many days there are in a week.
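The reshaping into an hour-by-weekday matrix can be sketched with `pivot_table`. The hourly counts are hypothetical, the names `HOURS_IN_DAY` and `DAYS_IN_WEEK` are mine, and weekdays follow the pandas convention (Monday is 0, Sunday is 6).

```python
import pandas as pd

# Hypothetical hourly counts (5 and 12 July 2020 are Sundays, 6 July a Monday).
hourly = pd.Series(
    [2, 0, 1, 3],
    index=pd.to_datetime(
        ["2020-07-05 20:00", "2020-07-05 21:00", "2020-07-06 20:00", "2020-07-12 20:00"]
    ),
)

table = pd.DataFrame({
    "views": hourly.values,
    "weekday": hourly.index.dayofweek,  # Monday=0 ... Sunday=6
    "hour": hourly.index.hour,
})

HOURS_IN_DAY = 24
DAYS_IN_WEEK = 7

# Sum views per (hour, weekday), then force the full 24 x 7 grid.
matrix = table.pivot_table(
    index="hour", columns="weekday", values="views", aggfunc="sum", fill_value=0
).reindex(index=range(HOURS_IN_DAY), columns=range(DAYS_IN_WEEK), fill_value=0)

print(matrix.shape)
```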

Now here is the heatmap!
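A bare-bones heat map can be drawn with Matplotlib's `imshow`. This is only a sketch: the 24 x 7 matrix is filled with random numbers rather than real counts, and the `viridis` colormap is my choice because lighter cells mean higher values in it.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 24 x 7 matrix of viewing counts (rows: hours, columns: weekdays).
rng = np.random.default_rng(0)
matrix = rng.integers(0, 5, size=(24, 7))

weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

fig, ax = plt.subplots(figsize=(5, 8))
image = ax.imshow(matrix, aspect="auto", cmap="viridis")
ax.set_xticks(range(7))
ax.set_xticklabels(weekdays)
ax.set_yticks(range(24))
ax.set_ylabel("Hour of day")
fig.colorbar(image, ax=ax, label="Number of viewings")
```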

The lighter the color, the more often I watch Netflix; the darker, the less. I watch Netflix mostly at night and very rarely during the day. Sunday seems to be a pretty prominent day, though Thursday at 20.00 stands out as well.

Conclusion

Finally, this article has reached its end! I only made use of one of the files I got from Netflix, but I believe there is still a lot more you can dig out of those files. I wish I could analyze and visualize them all, b̶u̶t̶ ̶I̶ ̶a̶m̶ ̶a̶ ̶b̶i̶t̶ ̶t̶o̶o̶ ̶l̶a̶z̶y̶. While I know my writing isn’t perfect, I hope this little piece can help you or inspire you (or even aggravate you) or whatever. Feel free to comment below or contact me on Twitter (it’s on my profile) about what you think or if you need any help.

I write about python programming and data science | catriscode.com
