Visualizing My Netflix Viewing Activity with Python and Matplotlib
Let me start this post by stating that this is my first post on Medium, s̶̶̶o̶̶̶ ̶̶̶b̶̶̶e̶̶̶ ̶̶̶p̶̶̶r̶̶̶e̶̶̶p̶̶̶a̶̶̶r̶̶̶e̶̶̶d̶̶̶ ̶̶̶f̶̶̶o̶̶̶r̶̶̶ ̶̶̶a̶̶̶n̶̶̶ ̶̶̶a̶̶̶w̶̶̶f̶̶̶u̶̶̶l̶̶̶ ̶̶̶s̶̶̶h̶̶̶i̶̶̶t̶̶̶ ̶p̶̶̶o̶̶̶s̶̶̶t̶̶̶ ̶̶̶b̶̶̶y̶̶̶ ̶m̶̶̶e̶̶̶.̶
Netflix subscribers probably know that they can see the viewing activity of each profile in their account. You can find it under ‘Viewing activity’ for each profile after clicking ‘Account’ on the home page. When I first came across this page, I wondered what I could do with such simple information, which consists only of the date and the title of each show or movie I watched. There’s not much you can do with it, but I was curious anyway, so let’s see what I can do with this data.
I came up with some questions like “What shows do I spend the most time with?”, “Which genres do I like the most?”, “Which actor/actress do I watch the most?”, “What does my viewing behavior look like?”, “How much time did I spend on Netflix?”, etc. T̶h̶e̶n̶ ̶u̶n̶l̶u̶c̶k̶i̶l̶y̶,̶ ̶n̶o̶n̶e̶ ̶o̶f̶ ̶t̶h̶e̶s̶e̶ ̶q̶u̶e̶s̶t̶i̶o̶n̶s̶ ̶g̶o̶t̶ ̶a̶n̶s̶w̶e̶r̶e̶d̶.̶
First, you can download all of your Netflix viewing activity here, which gives you a file called ‘NetflixViewingHistory.csv’ to save on your disk. Then open this CSV file in Python as follows, after importing the library it needs, pandas.
import pandas as pd
df = pd.read_csv('NetflixViewingHistory.csv')
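If you want to follow along without your own export, here is a minimal stand-in. It assumes the file has exactly two columns, Title and Date, with dates written as month/day/two-digit year; check your own file first, since the date format may depend on your account’s region. The rows are made up, not my actual history.

```python
import io

import pandas as pd

# Hypothetical sample rows mimicking the assumed two-column layout (Title, Date);
# these are NOT real viewing records.
SAMPLE_CSV = """Title,Date
"Community: Season 1: Pilot",11/27/19
"Community: Season 1: Spanish 101",11/27/19
"Friends: Season 1: The One Where Monica Gets a Roommate",11/29/19
"Mindhunter: Season 1: Episode 1",11/30/19
"""

# read_csv accepts any file-like object, so StringIO stands in for the file on disk
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(df.columns.tolist())
```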
You see, the problem with this Netflix viewing history data, as shown in the table above, is that there is no separate field for the title of a show and the title of its episode. I came up with a not-so-perfect solution to get only the title of the show: splitting each title at the first colon “:” and keeping the part before it.
I swear the following code is not the best way to do it, but it gets the result I need for now, so I will take it. Maybe someday I will come up with a better solution, or even better, a better data set!
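# partition() returns (before, ':', after); [0] keeps only the show name, e.g. 'Community: Season 1: Pilot' -> 'Community'
df['show_title'] = [s.partition(':')[0] for s in df.Title]
my_titles = list(df['show_title'])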
It gives me what I want: the title of the show, separated from its episode title. So now we have titles like Community, The Blacklist, Forensic Files, etc. What next? I want to find the top ten shows I watched and visualize them. Let’s get the top ten shows first using the value_counts() function.
top_views = pd.Series(my_titles).value_counts().nlargest(10)
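For reference, the same split and count can be written with pandas’ vectorized string methods instead of a list comprehension. This is a sketch of an alternative, not the code I used, and like my version it assumes the show name is everything before the first colon, so it would cut short a show name that itself contains a colon. Note that value_counts() already sorts from most to least frequent, which is why taking the first ten rows gives the top ten.

```python
import pandas as pd

# Hypothetical titles in the "Show: Season: Episode" style of the export
titles = pd.Series([
    "Community: Season 1: Pilot",
    "Community: Season 1: Spanish 101",
    "Friends: Season 1: The One Where Monica Gets a Roommate",
    "Mindhunter: Season 1: Episode 1",
    "Some Movie Without A Colon",
])

# Split at the first colon only (n=1), keep the first piece, trim whitespace.
# Titles without a colon are kept whole.
show_title = titles.str.split(":", n=1).str[0].str.strip()

# value_counts() returns counts sorted in descending order
top_views = show_title.value_counts().head(10)
print(top_views)
```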
Now, let’s visualize it using matplotlib.
import matplotlib.pyplot as plt
import numpy as np
N = len(top_views)
x = np.arange(N)
colors = plt.get_cmap('viridis')
plt.figure(figsize=(10, 5))
# one color per bar, sampled evenly along the colormap
plt.bar(top_views.index, top_views.values, color=colors(x/N))
plt.xlabel("Show titles", fontsize=12)
plt.xticks(rotation=30, ha="right", fontsize=11)
plt.title("My Top 10 Shows on Netflix based on My Viewing Activity", fontsize=16)
plt.savefig("top 10 shows bar.png", dpi=300, bbox_inches='tight')
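If you would rather not rely on pyplot’s global “current figure” state (every plt.figure() call opens a new figure that stays open until closed), here is a sketch of the same colormap-colored bar chart using matplotlib’s object-oriented API. The helper name plot_ranked_bars is my own invention, the counts are made up, and the Agg backend is only there so the snippet runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_ranked_bars(counts, cmap_name="viridis", title=""):
    """Hypothetical helper: one bar per index label, colored along a colormap."""
    cmap = plt.get_cmap(cmap_name)
    n = len(counts)
    fig, ax = plt.subplots(figsize=(10, 5))
    # cmap() maps values in [0, 1) to RGBA colors, one per bar
    ax.bar(counts.index.astype(str), counts.values, color=cmap(np.arange(n) / n))
    ax.set_title(title, fontsize=16)
    ax.tick_params(axis="x", labelrotation=30)
    return fig, ax


# Made-up counts, just to exercise the helper
counts = pd.Series({"Friends": 120, "Community": 95, "Dark": 30})
fig, ax = plot_ranked_bars(counts, title="Top shows (sample data)")

buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100, bbox_inches="tight")
plt.close(fig)  # free the figure once it is saved
```

The same helper would also cover the day-of-week chart further down, e.g. with cmap_name="winter_r" for the reversed winter colormap.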
My top 10 shows include a variety of genres. Friends, Community, Brooklyn Nine-Nine, and The Good Place are what people consider sitcoms, and yes, I do love sitcoms and rewatch them from time to time. Meanwhile, I also watch crime shows like Forensic Files and Mindhunter, a̶n̶d̶ ̶h̶o̶p̶e̶f̶u̶l̶l̶y̶ ̶N̶e̶t̶f̶l̶i̶x̶ ̶d̶o̶e̶s̶n̶’̶t̶ ̶t̶h̶i̶n̶k̶ ̶I̶ ̶a̶m̶ ̶t̶r̶y̶i̶n̶g̶ ̶t̶o̶ ̶c̶o̶m̶m̶i̶t̶ ̶a̶ ̶c̶r̶i̶m̶e̶ ̶b̶e̶c̶a̶u̶s̶e̶ ̶I̶ ̶a̶m̶ ̶f̶a̶s̶c̶i̶n̶a̶t̶e̶d̶ ̶w̶i̶t̶h̶ ̶t̶h̶i̶s̶ ̶k̶i̶n̶d̶ ̶o̶f̶ ̶s̶h̶o̶w̶s̶.̶
There are also sci-fi shows like Dark and Stranger Things (before any die-hard Dark fans attack me for implying that Dark and Stranger Things are similar, let me tell you: they are not similar at all). Then there is Anne with an E, which is not something I usually watch, but the books it is based on were quite special to me as a kid, so I had to watch it anyway. I am not sure how to describe it, but it’s a children’s show, though it’s not childish in any way. It’s a very touching coming-of-age story that everyone should watch!
If you think the bar chart above is awful, I agree with you, so let’s make another awful graphic!
Now let’s convert the dates in the ‘Date’ column into a real date format that Python can actually work with.
df['date'] = pd.to_datetime(df['Date'])
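One caveat: without a format string, pd.to_datetime has to guess whether something like 3/4/20 means March 4 or April 3. A safer sketch passes the format explicitly. I am assuming a month/day/two-digit-year layout here; check a few rows of your own file before copying this, because the export’s date format may differ by region.

```python
import pandas as pd

# Assumed export format: month/day/two-digit year, e.g. "11/28/19"
raw_dates = pd.Series(["11/28/19", "3/4/20", "12/31/19"])

# An explicit format removes the month-first vs day-first ambiguity;
# errors="coerce" turns strings that don't match into NaT instead of raising
dates = pd.to_datetime(raw_dates, format="%m/%d/%y", errors="coerce")
print(dates)
```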
Now the dates are stored in a new column, ‘date’. Next, we convert those dates into days of the week, because we want to visualize how often I watch on each day of the week.
# weekday_name no longer exists in current pandas; day_name() returns names like 'Monday'
df['day'] = df['date'].dt.day_name()
You could visualize the day counts from the ‘day’ column right away, but the days wouldn’t be in order, because value_counts() sorts them by frequency. I want the days sorted from Monday to Sunday, so I add the following code first.
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['day'] = pd.Categorical(df['day'], categories=cats, ordered=True)
# sort_index() follows the category order above, not alphabetical order
by_day = df['day'].value_counts().sort_index()
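Here is a self-contained sketch of the ordering trick on made-up dates. Because the column is an ordered categorical, value_counts() also reports weekdays with no viewing at all (as 0), and sort_index() puts everything in Monday-to-Sunday order.

```python
import pandas as pd

# Made-up viewing dates: two on a Saturday, one Sunday, one Wednesday
dates = pd.to_datetime(pd.Series(["2019-11-30", "2019-11-30", "2019-12-01", "2019-11-27"]))

cats = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# .dt.day_name() gives names like "Saturday"; the ordered categorical
# makes sorting follow the list above instead of the alphabet
day = pd.Categorical(dates.dt.day_name(), categories=cats, ordered=True)

# Unobserved categories still appear, with a count of 0
by_day = pd.Series(day).value_counts().sort_index()
print(by_day)
```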
This gives the number of viewings on each day of the week. Then, let’s visualize it.
# matplotlib 3.6 renamed the 'seaborn-darkgrid' style to 'seaborn-v0_8-darkgrid'; use the old name on older versions
plt.style.use('seaborn-v0_8-darkgrid')
N = len(by_day)
x = np.arange(N)
colors = plt.get_cmap('winter').reversed()
plt.figure(figsize=(10, 5))
plt.bar(by_day.index.astype(str), by_day.values, color=colors(x/N))
plt.title("My Netflix Viewing Activity Pattern by Day", fontsize=20)
plt.xlabel("day of the week", fontsize=15)
plt.ylabel("freq", fontsize=15)
plt.savefig("freq by day.png", dpi=300, bbox_inches='tight')
By now you have probably realized that I am awful with colors in graphics, or with anything that needs even a little bit of art and aesthetics, but whatever. Anyway, we can now see that I apparently spend a lot of my Netflix time on weekends, as expected.
So, let’s make another graphic. A timeline covering the whole date range would be good. Here I count the number of viewings for each date in the ‘date’ column.
by_date = df['date'].value_counts().sort_index()
by_date.index = pd.DatetimeIndex(by_date.index)
Then, I create a data frame that stores the date and the number of viewings on that day.
df_date = by_date.rename_axis('date').reset_index(name='counts')
As you can see above, there are gaps between the dates. For example, there is no data for 2019-11-28, maybe because I had a life that day and didn’t watch Netflix. So I create an index covering my whole date range and use it to fill in all the empty days when I didn’t watch Netflix. Surprise, surprise: I apparently still have a life, because I don’t watch Netflix every single day.
idx = pd.date_range(min(by_date.index), max(by_date.index))
s = by_date.reindex(idx, fill_value=0)
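The count-then-reindex steps can also be collapsed into a single resample call. This is an alternative I am suggesting, not what I used above: resample("D") creates one bucket per calendar day between the first and last date, and size() counts the rows in each bucket, so days without any viewing come out as 0 automatically. The dates are made up, with a deliberate gap on 2019-11-28.

```python
import pandas as pd

# Made-up viewing dates with no viewing on 2019-11-28
dates = pd.to_datetime(pd.Series(["2019-11-27", "2019-11-27", "2019-11-29", "2019-11-30"]))

# Approach from the text: count per date, then reindex over the full range
by_date = dates.value_counts().sort_index()
idx = pd.date_range(by_date.index.min(), by_date.index.max())
s = by_date.reindex(idx, fill_value=0)

# Alternative: one bucket per day; empty days get a size of 0
s_resampled = pd.Series(1, index=pd.DatetimeIndex(dates)).resample("D").size()

print(s_resampled)
```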
As shown above, this fills the days when I didn’t watch Netflix at all with 0s. Then I plot the result with matplotlib.
plt.figure(figsize=(10, 5))
plt.plot(s.index, s.values)
plt.title("My Netflix Viewing Activity Timeline", fontsize=20)
plt.ylabel("freq", fontsize=15)
plt.savefig("timeline.png", dpi=300, bbox_inches='tight')
I have no idea whether the graphic above makes any sense, but it is actually kind of interesting. It visualizes the ups and downs of how often I have watched Netflix since I started subscribing. There were days when I didn’t watch anything at all, and days when I watched a lot. Generally, I probably watch less Netflix nowadays than in the early days when I had just started subscribing, but the difference isn’t that big.
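Daily counts are noisy, which makes a claim like “I watch less nowadays” hard to read off the chart. One way to check it, sketched here on made-up numbers rather than my actual data, is to total the zero-filled daily series per month, or smooth it with a rolling mean, and compare. The "MS" frequency labels each monthly bucket by its first day.

```python
import numpy as np
import pandas as pd

# Made-up zero-filled daily counts: 3 per day in January, 1 per day afterwards
idx = pd.date_range("2020-01-01", "2020-03-31", freq="D")
daily = pd.Series(np.where(idx.month == 1, 3, 1), index=idx)

# Total viewings per calendar month
monthly = daily.resample("MS").sum()

# 30-day rolling mean: the first 29 values are NaN until the window fills
smoothed = daily.rolling(30).mean()

print(monthly)
```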
Anyway, in case you don’t know, I don’t consider myself a binge-watcher. I like to watch 2 or 3 episodes at most in a day.
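That “2 or 3 episodes at most” claim can actually be checked against the data. Here is a sketch with made-up rows shaped like df after the earlier steps: count the rows for each (date, show_title) pair, then take the largest count per show. It assumes one row per episode, which holds for series but says nothing useful about movies.

```python
import pandas as pd

# Made-up rows in the shape of df after the earlier steps
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-11-27", "2019-11-27", "2019-11-27", "2019-11-29", "2019-11-29"]),
    "show_title": ["Dark", "Dark", "Dark", "Friends", "Dark"],
})

# Episodes per show per day, then each show's busiest day
per_day = df.groupby(["date", "show_title"]).size()
max_per_day = per_day.groupby(level="show_title").max()

print(max_per_day)
```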
Now, how about we visualize each of the top 10 shows above on its own timeline?
plt.style.use('seaborn-v0_8-darkgrid')
idx = pd.date_range(min(df['date']), max(df['date']))
palette = plt.get_cmap('tab10')
plt.figure(figsize=(15, 15))
num = 0
for title in top_views.index:
    num += 1
    # one panel per show in a 5 x 2 grid
    plt.subplot(5, 2, num)
    plt.ylim(-1.0, 15)
    show = df[df['show_title'] == title]
    showly = show['date'].value_counts().sort_index()
    s = showly.reindex(idx, fill_value=0)
    # tab10 has 10 distinct colors, one per show
    plt.plot(s.index, s.values, marker='', color=palette(num - 1), linewidth=2.5, alpha=0.9, label=title)
    plt.title(title, loc='left', fontsize=12, fontweight=0)
plt.suptitle("My top 10 viewing activity on Netflix", fontsize=20, fontweight=0, color='black', y=1.05)
plt.tight_layout()
plt.savefig("viewing activity.png", dpi=300, bbox_inches='tight')
plt.show()
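The filter-and-reindex loop above can also be written as one groupby followed by unstack, which builds a date-by-show table of daily counts in one go, paired with plt.subplots() so each panel is an explicit Axes object. This is a sketch on made-up data with two shows; for ten shows the grid would be 5 x 2, and sharex=True keeps all timelines on the same date axis.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows shaped like df after the earlier steps
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-11-27", "2019-11-27", "2019-11-29", "2019-12-02"]),
    "show_title": ["Dark", "Friends", "Dark", "Friends"],
})
shows = ["Dark", "Friends"]  # stands in for top_views.index

# Rows = dates, columns = shows, cells = viewings that day (0 if none)
table = df.groupby(["date", "show_title"]).size().unstack(fill_value=0)
full_range = pd.date_range(df["date"].min(), df["date"].max())
table = table.reindex(index=full_range, columns=shows, fill_value=0)

fig, axes = plt.subplots(len(shows), 1, figsize=(10, 3 * len(shows)), sharex=True, squeeze=False)
palette = plt.get_cmap("tab10")
for i, (ax, title) in enumerate(zip(axes.flat, shows)):
    ax.plot(table.index, table[title], color=palette(i), linewidth=2.5)
    ax.set_title(title, loc="left", fontsize=12)
fig.suptitle("Viewing activity per show (sample data)", fontsize=16)
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```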
I tend to watch Friends a lot, and continuously. For a while after I started watching Community, I neglected Friends a little, but recently I have been watching Friends again. As for Community, the show finally made its way onto Netflix in April, and since then I have kept rewatching it (#sixseasonsandamovie). Brooklyn Nine-Nine was quite different from the others: I binged it in a few weeks and then didn’t touch it again.
There are also other shows that I tend to binge in a few days or weeks, like Stranger Things, Anne with an E, and The Haunting of Hill House. I remember watching The Haunting of Hill House in the span of a few days, which is very rare for me, and I ended up not sleeping well because of the horror images in my head. Meanwhile, the spike you see on Dark is from when I watched the entirety of Season 3 as soon as it came out, because I was afraid I would forget it if I didn’t finish it in a few days.
Forensic Files was something different. I discovered this show after I discovered I̶ ̶h̶a̶v̶e̶ ̶a̶ ̶t̶a̶s̶t̶e̶ ̶i̶n̶ ̶t̶r̶u̶e̶ ̶c̶r̶i̶m̶e̶ ̶a̶n̶d̶ ̶m̶u̶r̶d̶e̶r̶s̶ that I am fascinated by forensic psychology, thanks to Mindhunter. So you may notice that I only started watching Forensic Files after I had started watching Mindhunter. And yes, Mindhunter is so excellent that you should watch it. It’s based on the true story of the people at the FBI’s Behavioral Science Unit who started the study of serial killers and criminal profiling. Plus, it’s made by David Fincher, one of my favorite filmmakers! Maybe if more people watch it, he will give us the promised Season 3 (and 4 and 5 too!).
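Observations like “I binged it in a few weeks” or “I only started one show after another” come down to first and last viewing dates, which a named aggregation can tabulate per show. A sketch on made-up rows shaped like df after the earlier steps:

```python
import pandas as pd

# Made-up rows shaped like df after the earlier steps
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-08-16", "2019-08-20", "2019-09-01", "2019-10-05"]),
    "show_title": ["Mindhunter", "Mindhunter", "Forensic Files", "Forensic Files"],
})

# One row per show: first and last viewing date, and the number of distinct viewing days
spans = df.groupby("show_title")["date"].agg(first_seen="min", last_seen="max", days_watched="nunique")
spans["span_days"] = (spans["last_seen"] - spans["first_seen"]).dt.days

# Order shows by when they were first watched
print(spans.sort_values("first_seen"))
```

A short span_days with many days_watched points to a binge; a long span with few days suggests occasional rewatching.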
The Good Place is a show that I gave up on after one season, which explains the lack of activity on that one. I adore Kristen Bell, but I guess the show just wasn’t for me.
Well, I guess that’s it for the awful graphics and code, combined with my uncalled-for opinions on some Netflix shows. Right now you probably feel confused, angry, or even satisfied, for whatever reason. Feel free to fight me, either over my bug-ridden code or over my taste in television series.
I have also recently requested more complete data for my account from Netflix, but so far I haven’t received an e-mail. I don’t know what kind of data or information I will get. Hopefully, if I get this data, I can do a more thorough analysis of my Netflix viewing activity. Maybe some of the questions I asked above can finally be answered.
Thanks for reading my article (and thanks for bearing with my cheeky sense of humor), folks! Have a nice day!
Edit (26 August 2020):
I have written another article as a continuation to this article, which you can access here.