Exploring Bias in Fandango Movie Ratings
This project explores whether Fandango artificially displays higher review ratings to sell more movie tickets, as compared to other rating sites such as Rotten Tomatoes and Metacritic.
- Problem Definition
- Import Libraries & Data
- Exploratory Data Analysis
- Comparison of Fandango Ratings to Other Sites
- Fandango Scores vs. All Sites
- Conclusion
Sources:
- This project was inspired by an article published on the FiveThirtyEight blog in 2015
- The version of the project below was adapted from a capstone project in Jose Portilla's Udemy course "2022 Python for Machine Learning & Data Science Masterclass"
Skills Demonstrated:
- Exploratory Data Analysis
- Data visualization
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Note: there are 2 primary data sources, both available from FiveThirtyEight's GitHub.
The first dataset contains every film that has a Rotten Tomatoes rating, an RT User rating, a Metacritic score, a Metacritic User score, an IMDb score, and at least 30 fan reviews on Fandango. The Fandango data was pulled on Aug. 24, 2015.
all_sites = pd.read_csv("all_sites_scores.csv")
all_sites.info()
The Fandango dataset contains every film 538 pulled from Fandango:
- Film: the movie
- Stars: the number of stars presented on Fandango.com (0-5)
- Rating: the Fandango ratingValue for the film, as pulled from the HTML of each page (this is the actual average score the movie obtained)
- Votes: the number of people who had reviewed the film at the time it was pulled
fandango = pd.read_csv("fandango_scrape.csv")
First, let's explore the Fandango dataset
fandango.head()
fandango.info()
fandango.describe()
Below, a scatterplot shows the relationship between rating and votes.
plt.figure(figsize=(10,4), dpi=150)
sns.scatterplot(data=fandango, x='RATING', y='VOTES');
Next, the correlation between the columns is shown:
fandango.corr(numeric_only=True)
Now, a new column (YEAR) is created by extracting the year from each title string, and the number of films per year is visualized.
fandango['YEAR'] = fandango['FILM'].apply(lambda title: title.split('(')[-1].replace(')', ''))
fandango['YEAR'].value_counts()
p = sns.countplot(data=fandango, x='YEAR');
p.set_title('Total Number of Movies By Year');
The 10 movies with the highest number of votes are:
fandango.nlargest(10,'VOTES')
And the number of movies with zero votes is:
no_votes = fandango['VOTES'] == 0
no_votes.sum()
Now, create a new DataFrame of only reviewed films by removing any films that have zero votes. A copy is taken so that columns can be added later without pandas' SettingWithCopyWarning:
fan_reviewed = fandango[fandango['VOTES'] > 0].copy()
The article mentioned above discusses the fact that true user ratings may be slightly different than the rating shown to a user (due to HTML and star rating displays).
- A KDE plot can display the distribution of the ratings that are displayed (STARS) versus the true rating from the votes (RATING)
- The KDEs are clipped to the 0-5 range
plt.figure(figsize=(10,4), dpi=150)
sns.kdeplot(data=fan_reviewed, x='RATING',
clip=[0,5],
fill=True,
label = 'True Rating')
sns.kdeplot(data=fan_reviewed, x='STARS',
clip=[0,5],
fill=True,
label = 'Stars Displayed')
plt.legend(loc=(1.05,0.5));
So, it seems the stars displayed are slightly higher than the true ratings (which may just be due to how the stars are calculated and displayed in the HTML).
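One plausible mechanism (an assumption for illustration, not confirmed from Fandango's HTML; the helper name round_up_half_star is ours) is that the displayed stars round the true rating up to the nearest half star. A minimal sketch:

```python
import numpy as np

def round_up_half_star(rating):
    """Round a 0-5 rating UP to the nearest half-star increment."""
    return np.ceil(rating * 2) / 2

# Under this assumed rule, a true rating of 4.1 would display as 4.5 stars
print(round_up_half_star(4.1))  # -> 4.5
print(round_up_half_star(3.6))  # -> 4.0
```

If a rule like this held, the STARS minus RATING difference computed below would always be non-negative and at most half a star, which can be checked against the count plot.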
Below, we quantify the discrepancy. A new column holding the difference between the two is created, rounded to two decimal places:
fan_reviewed['STARS_DIFF'] = fan_reviewed['STARS'] - fan_reviewed['RATING']
fan_reviewed['STARS_DIFF'] = fan_reviewed['STARS_DIFF'].round(2)
fan_reviewed
Next, a countplot is used to display the number of times a certain difference occurs
plt.figure(figsize=(12,4), dpi=150)
sns.countplot(data=fan_reviewed, x='STARS_DIFF', palette='magma');
It seems that one movie was displaying a full 1-star difference from its true rating. That movie is:
fan_reviewed[fan_reviewed['STARS_DIFF'] == 1]
Of course, it only had 2 votes!
Below, the Fandango ratings are compared to the ratings from other sites.
all_sites.head()
all_sites.describe()
First, the data from Rotten Tomatoes (RT) is examined. RT has 2 sets of reviews: their critics reviews & their user reviews.
Below shows a scatterplot exploring the relationship between these 2 reviews:
plt.figure(figsize=(10,4), dpi=150)
sns.scatterplot(data=all_sites, x='RottenTomatoes', y='RottenTomatoes_User')
plt.xlim(0,100)
plt.ylim(0,100);
Next, the difference between the RT critic and user scores is quantified by subtracting the user score from the critic score. A difference of 0 means the two scores match.
all_sites['Rotten_Diff'] = all_sites['RottenTomatoes'] - all_sites['RottenTomatoes_User']
The mean absolute difference between RT critic and user scores is:
all_sites['Rotten_Diff'].apply(abs).mean()
Next, the distribution of the differences between the RT critics and users is displayed (using KDE and histogram).
plt.figure(figsize=(10,4), dpi=200)
sns.histplot(data=all_sites, x='Rotten_Diff', kde=True, bins=25)
plt.title('RT Critics Score minus RT User Score');
Next, the absolute value difference between the RT critics and users scores is shown:
plt.figure(figsize=(10,4),dpi=200)
sns.histplot(x=all_sites['Rotten_Diff'].apply(abs),bins=25,kde=True)
plt.title("Abs Difference between RT Critics Score and RT User Score");
Now, we'll try to find out which movies are creating the largest differences.
First, the top 5 movies with the largest negative difference are shown (a negative difference means users rated the movie much higher, on average, than the critics did):
print('Users Love but Critics Hate')
all_sites.nsmallest(5, 'Rotten_Diff')[['FILM', 'Rotten_Diff']]
Now, the top 5 movies where critics scored the movie higher than users are shown:
print("Critics love, but Users Hate")
all_sites.nlargest(5,'Rotten_Diff')[['FILM','Rotten_Diff']]
Next, the Metacritic ratings are explored. Like Rotten Tomatoes, Metacritic lists both an official critic score and a user score.
plt.figure(figsize=(10,4), dpi=150)
sns.scatterplot(data=all_sites, x='Metacritic', y='Metacritic_User')
plt.xlim(0,100)
plt.ylim(0,10);
Finally, the data for IMDB are explored.
Note that both MetaCritic and IMDB report back vote counts.
Below, a scatterplot shows the relationship between vote counts on MetaCritic versus vote counts on IMDB:
plt.figure(figsize=(10,4), dpi=150)
sns.scatterplot(data=all_sites, x='Metacritic_user_vote_count', y='IMDB_user_vote_count');
Note that there are 2 outliers. The movie with the highest vote count on IMDB only has 500 Metacritic ratings.
That movie is:
all_sites.nlargest(1,'IMDB_user_vote_count')
The movie with the highest Metacritic User Vote count:
all_sites.nlargest(1, 'Metacritic_user_vote_count')
Finally, the question of whether or not Fandango artificially displays higher ratings than warranted is explored.
Below, the Fandango table is combined with the All Sites table. Since some Fandango movies have very few or no reviews, an inner merge is used so that only films present in both tables are kept.
df = pd.merge(fandango, all_sites, on='FILM', how='inner')
df.info()
df.head()
Notice that RT, Metacritic, and IMDB don't use a score between 0-5 stars like Fandango does.
In order to do a fair comparison, we need to normalize these values so they all fall between 0-5 stars, and the relationship between the reviews stays the same.
df['RT_Norm'] = np.round(df['RottenTomatoes']/20,1)
df['RTU_Norm'] = np.round(df['RottenTomatoes_User']/20,1)
df['Meta_Norm'] = np.round(df['Metacritic']/20,1)
df['Meta_U_Norm'] = np.round(df['Metacritic_User']/2,1)
df['IMDB_Norm'] = np.round(df['IMDB']/2,1)
df.head()
Now, a norm_scores DataFrame is created that only contains the normalized ratings. Both STARS and RATING from the original Fandango table are included:
norm_scores = df[['STARS','RATING','RT_Norm','RTU_Norm','Meta_Norm','Meta_U_Norm','IMDB_Norm']]
norm_scores.head()
Now the question of whether or not Fandango displays abnormally high ratings can be answered.
Are Fandango's ratings themselves higher than average?
Below, a plot is created showing the normalized ratings across all sites, using a KDE plot in seaborn:
def move_legend(ax, new_loc, **kws):
    # Helper: seaborn attaches its legend to the Axes; grab the existing
    # legend's contents and redraw it at a new location.
    old_legend = ax.legend_
    handles = old_legend.legendHandles
    labels = [t.get_text() for t in old_legend.get_texts()]
    title = old_legend.get_title().get_text()
    ax.legend(handles, labels, loc=new_loc, title=title, **kws)
fig, ax = plt.subplots(figsize=(15,6),dpi=150)
sns.kdeplot(data=norm_scores, clip=[0,5], fill=True, palette='Set1', ax=ax)
move_legend(ax, "upper left")
The plot makes it clear that Fandango's distribution is skewed toward high ratings.
The RT critic scores appear to have the most uniform distribution.
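One way to back up this visual impression is to compare the mean and median of each normalized column, e.g. with norm_scores.agg(['mean', 'median']). A self-contained sketch on made-up values that reuse two of the column names above (the numbers are illustrative, not the real data):

```python
import pandas as pd

# Made-up values shaped like two columns of norm_scores (not the real data)
demo = pd.DataFrame({
    'STARS':   [4.5, 4.0, 5.0, 3.5],
    'RT_Norm': [3.7, 1.3, 4.4, 0.6],
})

# One row per statistic, one column per rating source
summary = demo.agg(['mean', 'median']).round(2)
print(summary)
```

Running the same aggregation on the real norm_scores DataFrame makes the higher central tendency of the Fandango columns explicit rather than just visual.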
Below, the RT critic ratings are compared against the STARS displayed by Fandango:
fig, ax = plt.subplots(figsize=(15,6),dpi=150)
sns.kdeplot(data=norm_scores[['RT_Norm','STARS']], clip=[0,5], fill=True, palette='Set1', ax=ax)
move_legend(ax, "upper left")
It seems that Fandango rates nearly every film 2.5 stars or higher, while the RT critic ratings are spread much more evenly across the full range.
Below, a clustermap shows all of the normalized scores.
sns.clustermap(norm_scores, cmap='magma', col_cluster=False);
This too shows that Fandango, unlike the other rating sites, has almost no low-starred movies (note the lack of dark shading in the first two columns, representing STARS and RATING from Fandango).
Based on the Rotten Tomatoes critic ratings, what are the 10 lowest-rated movies, and what are the normalized scores across all platforms for these movies?
norm_films = df[['STARS','RATING','RT_Norm','RTU_Norm','Meta_Norm','Meta_U_Norm','IMDB_Norm','FILM']]
norm_films.nsmallest(10, 'RT_Norm')
Finally, the distribution of ratings across all sites for these 10 worst movies is visualized:
plt.figure(figsize=(15,6),dpi=150)
worst_films = norm_films.nsmallest(10,'RT_Norm').drop('FILM',axis=1)
sns.kdeplot(data=worst_films, clip=[0,5], fill=True, palette='Set1')
plt.title("Ratings for RT Critic's 10 Worst Reviewed Films");
Note that Fandango is showing 3-4 star ratings for films that are clearly bad according to the other rating sites.
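That gap can be made concrete by averaging the displayed stars against the normalized RT critic scores for those ten films, e.g. worst_films['STARS'].mean() versus worst_films['RT_Norm'].mean(). A sketch on a made-up frame shaped like worst_films (the values are illustrative only, not the real data):

```python
import pandas as pd

# Made-up frame shaped like worst_films (illustrative values only)
demo_worst = pd.DataFrame({
    'STARS':   [3.5, 4.0, 3.0, 4.5],
    'RT_Norm': [0.5, 1.0, 0.5, 1.0],
})

# Average displayed Fandango stars vs. average RT critic score, both on a 0-5 scale
print(demo_worst['STARS'].mean())    # -> 3.75
print(demo_worst['RT_Norm'].mean())  # -> 0.75
```

A several-star average gap of this kind, computed on the real worst_films frame, turns the "3-4 stars for clearly bad films" observation into a single number.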
Thus, at least when this data was pulled in 2015, Fandango's displayed ratings were not to be trusted!