STAT 430/830: Final Project
DUE: Friday August 14 by 11:59pm EST
PREAMBLE
Netflix, at one time just an online DVD rental service, has become a titan in the entertainment industry.
While predominantly a streaming service, Netflix has also become well-known for its original programming
such as the Stranger Things television series, or the Oscar-nominated film Marriage Story.
The success of Netflix is due, in part, to their well-known data-driven culture. Enmeshed within this
culture is a strong appreciation for, and exploitation of, designed experiments. Netflix’s home-grown ABlaze
experimentation platform is well-known in the industry for its sophistication and the “wins” it has helped
them achieve. It is perhaps unsurprising, then, that Netflix is a leader in online experimentation. Though not
recent, this job ad from 2016 for a Senior Data Scientist illustrates the organization’s experimental maturity.
In this role, you would “design, run, and analyze A/B and multivariate tests”, “analyze experimental data
with statistical rigor”, and “adapt existing methods such as Response Surface Methodology (RSM) to online
A/B testing”.
In this project you will embark on a Netflix-inspired experimental journey with a hypothetical problem and a
web-based response surface simulator.
THE PROBLEM
In this project you will be concerned with optimizing the www.netflix.com homepage by way of minimizing
browsing time. For those unfamiliar with Netflix, a screenshot of the homepage is included above. As is
depicted in the screenshot, the homepage is laid out in a grid system in which movies and TV shows appear
as tiles with rows differing with respect to some categorization. Though not depicted in the screenshot, when
one hovers their mouse over a tile, its size is enlarged and a preview of the show/movie is automatically
played in the enlarged window.
www.netflix.com
https://www.tibco.com/blog/2018/06/27/a-culture-of-analytics-why-amazon-netflix-succeed-while-others-fail
https://netflixtechblog.com/its-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15
https://www.linkedin.com/jobs/view/senior-data-scientist-streaming-experimentation-and-modeling-at-netflix XXXXXXXXXX
When faced with so many viewing options, Netflix users often experience choice-overload and can be overcome
by a psychological phenomenon known as decision paralysis. The problem is that it becomes harder to make
a decision, and it takes longer to make a decision, when faced with a large number of options to choose from.
Decision paralysis negatively impacts Netflix because a user may become overwhelmed by all of the options
and fatigued by the prospect of making a choice, and may ultimately lose interest and not watch anything.
To overcome this, Netflix tries to help you choose what to watch, and by a variety of mechanisms tries to help
you choose quickly. Of relevance is browsing time – the length of time a user spends browsing (as opposed
to watching) Netflix. Ideally, browsing time and, in particular, average browsing time would be small. In this
project you will conduct a series of experiments to learn what influences browsing time and how that may be
exploited in order to minimize average browsing time. There are infinitely many things that likely influence
the amount of time someone spends browsing Netflix, but just four factors will be explored in this project.
Each of these is described below.
• Tile Size: The ratio of a tile’s height to the overall screen height. Note the tile’s aspect ratio is fixed
so changing this factor changes the size of the tile, but not its shape. Smaller values correspond to a
larger number of tiles visible on the screen, and larger values correspond to fewer visible tiles.
• Preview Size: The ratio of the preview window’s height to the overall screen height. Note the preview
window’s aspect ratio is fixed so changing this factor changes the size of the window, but not its shape.
Smaller values correspond to a smaller viewing window, and larger values correspond to a larger viewing
window.
• Preview Length: The duration (in seconds) of a show or movie’s preview.
• Top Row: The viewing category of a user’s first row of tiles.
The table below summarizes the design space for each of these factors, and the default values they take on
when not being experimented with.
Factor          Code Name       Region of Operability        Default Value
Tile Size       Tile.Size       [0.1, 0.5]                   0.2
Preview Size    Preview.Size    [max(Tile.Size, 0.2), 0.8]   0.5
Preview Length  Preview.Length  [30, 120]†                   75
Top Row         Top.Row         TC, NO‡                      TC
† For purposes of experimentation Prev.Length can only be changed in increments of 5 seconds
‡ TC stands for Top 10 in Canada and NO stands for Netflix Originals
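Before submitting a design, it may help to check each planned condition against the table above. The following is a minimal Python sketch of such a check; the function name check_condition and its argument names are illustrative assumptions and are not part of the simulator.

```python
def check_condition(tile_size=0.2, prev_size=0.5, prev_length=75, top_row="TC"):
    """Return a list of constraint violations for one experimental condition.

    The defaults are the default values from the table. An empty list means
    the condition lies inside the region of operability.
    """
    problems = []
    if not 0.1 <= tile_size <= 0.5:
        problems.append("Tile Size must lie in [0.1, 0.5]")
    # The lower bound for Preview Size depends on the chosen Tile Size.
    lower = max(tile_size, 0.2)
    if not lower <= prev_size <= 0.8:
        problems.append(f"Preview Size must lie in [{lower}, 0.8]")
    if not 30 <= prev_length <= 120 or prev_length % 5 != 0:
        problems.append("Preview Length must be a multiple of 5 in [30, 120]")
    if top_row not in ("TC", "NO"):
        problems.append("Top Row must be TC or NO")
    return problems


print(check_condition())                               # default configuration
print(check_condition(tile_size=0.4, prev_size=0.3))   # preview smaller than tile
```

Note that the Preview Size bound is evaluated per condition, so a design that is valid at one Tile Size may become invalid at another.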
Through a series of experiments you will seek to determine which of these factors significantly influences
browsing time, and you will attempt to find an optimal configuration of them that minimizes expected
browsing time. You will do this by interacting with a web-based simulator, into which you will submit
experimental designs and out of which you will receive response observations.
The remainder of this document provides guidelines for using the simulator, an overview of the sequential
experimentation process you will undertake, and a description of the deliverable that you must submit. An
outline of the marking scheme is included as an Appendix to make clear my expectations and to make
transparent the manner in which you will be graded.
https://psmag.com/social-justice/i-have-been-staring-at-this-screen-for-hours-please-help-me-how-did-i-get-here-is-this-what-i-really-want-to-read
THE SIMULATOR
The response surface simulator can be accessed at the following URL:
https://nathaniel-t-stevens.shinyapps.io/Netflix_Simulator
The interface (pictured above) and the manner in which you interact with it is straightforward: you upload a
design matrix and then collect your results. Interaction with the simulator should include three distinct steps:
1. Upload a .csv file containing your design matrix. The .csv file must adhere to the following formatting
guidelines:
• The file name must be your 8-digit student number, i.e., XXXXXXXXXXcsv. Any file name other than
this will result in an error.
• The columns correspond to design factors with headings Tile.Size, Prev.Size, Prev.Length,
Top.Row. Any heading other than these will result in an error. The order of the headings does not
matter. You do not need to experiment with every factor in every experiment.
• Each row corresponds to a distinct experimental condition, and each element indicates the level of
the corresponding factor.
• Factor levels must be in natural units.
2. Click the “Visualize my Design” button. This will render a plot of the design space and indicate the
experimental conditions you plan to run.
• If the design is not the one you intended, you may reset the simulator (by clicking the “Reset”
button) and upload a different design matrix.
• If there is anything amiss with the file you uploaded, an error (instead of a plot) will be returned.
3. Supposing you are happy with the design, click the “Run the Experiment” button. This will generate
n = 250 browsing times (in minutes) for each condition. The results will be automatically downloaded
in a .csv file.
• Remark 1: This mimics the random assignment of n = 250 users to each condition and the
observation of their response variable.
• Remark 2: You may assume without justification that n = 250 is a sufficient sample size in each
condition for the task at hand.
• Remark 3: You may assume that browsing time observations do not include the amount of time
spent watching previews; browsing time is simply the number of minutes spent scrolling and
searching.
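As an illustration of the file format described in step 1, the following Python sketch builds the design matrix for a full two-level factorial in natural units and renders it as CSV text. The Tile.Size and Prev.Size levels are hypothetical placeholders chosen only to lie inside the region of operability, and the helper name design_csv is an assumption; substitute the levels you actually intend to run.

```python
import csv
import io
from itertools import product

# Levels in natural units. Tile.Size and Prev.Size are hypothetical
# placeholders; replace them with the levels for your own experiment.
levels = {
    "Tile.Size": [0.1, 0.3],
    "Prev.Size": [0.3, 0.5],
    "Prev.Length": [30, 90],
}


def design_csv(levels):
    """Return CSV text with one row per combination of the factor levels."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(levels.keys())               # header row: factor headings
    writer.writerows(product(*levels.values()))  # one row per condition
    return buffer.getvalue()


csv_text = design_csv(levels)
print(csv_text)
```

To upload the design, write csv_text to a file whose name follows the student-number rule in step 1. Factors omitted from the levels dictionary are simply left out of the file, which matches the note that not every factor needs to appear in every experiment.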
THE EXPERIMENTS
Your experimental journey will consist of three phases as outlined below. Note that STAT 430 students may
ignore the Top.Row factor for the entirety of this project. The STAT 830 students, however, must consider all
four factors.
PHASE I: Factor Screening
Use a two-level experiment (i.e., 2^K factorial or 2^(K−p) fractional factorial) to determine which factors
significantly influence the response. A factor deemed insignificant can be ignored in all subsequent phases of
experimentation.
STAT 430 Instructions
You will experiment with three factors: Tile.Size, Prev.Size, Prev.Length. The low and high levels of
these factors (for this experiment) are shown below.
Factor Low High
Tile.Size XXXXXXXXXX
Preview.Size XXXXXXXXXX
Preview.Length 30 90
Using the data collected from your two-level experiment, determine which factors significantly influence
browsing time. Be sure to include formal hypothesis tests and main effect plots in your analysis.
STAT 830 Instructions
You will experiment with all four factors: Tile.Size, Prev.Size, Prev.Length, Top.Row. The low and high
levels of these factors (for this experiment) are shown below.
Factor Low High
Tile.Size XXXXXXXXXX
Preview.Size XXXXXXXXXX
Preview.Length 30 90
Top.Row TC NO
Using the data collected from your two-level experiment, determine which factors significantly influence
browsing time. Be sure to include formal hypothesis tests and main effect plots in your analysis.
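The quantity displayed in a main effect plot can be sketched as follows: for each factor, the main effect is the average response at its high level minus the average response at its low level. The Python sketch below assumes a full two-level factorial with equal sample sizes in every condition; the function name main_effects and the condition averages are made up for illustration and are not simulator output. It does not perform the formal hypothesis tests, which require a fitted model.

```python
from itertools import product


def main_effects(factors, mean_response):
    """Estimate main effects from a full two-level factorial.

    factors: list of factor names.
    mean_response: dict mapping a tuple of coded levels (-1 or +1, in the
        order of `factors`) to the average response in that condition.
    Returns {factor: average at +1 minus average at -1}.
    """
    effects = {}
    for i, name in enumerate(factors):
        high = [y for x, y in mean_response.items() if x[i] == 1]
        low = [y for x, y in mean_response.items() if x[i] == -1]
        effects[name] = sum(high) / len(high) - sum(low) / len(low)
    return effects


factors = ["Tile.Size", "Prev.Size", "Prev.Length"]
# Made-up condition averages from an arbitrary formula, NOT simulator output.
means = {x: 20 + 1.5 * x[1] - 2.0 * x[2] for x in product([-1, 1], repeat=3)}
effects = main_effects(factors, means)
print(effects)
```

Because every condition has the same number of observations, averaging the condition averages gives the same result as averaging the raw observations.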
PHASE II: Method of Steepest Descent
Considering only those factors deemed to significantly influence browsing time in Phase I, perform a method
of steepest descent analysis to move from the initial region of experimentation toward the vicinity of the
optimum. Note that this may require intermediate two-level designs to reorient toward the optimum. You
will find tests for curvature and a plot of average browsing time vs. step number useful.
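The mechanics of generating steps along the path can be sketched in Python as follows. The sketch assumes a first-order model has already been fitted in coded units; the gradient values, centres, half-ranges, and step size below are hypothetical placeholders, and the function names descent_path and round_to_5 are illustrative.

```python
def descent_path(centers, half_ranges, gradient, lam, n_steps):
    """List natural-unit conditions along the path of steepest descent.

    centers, half_ranges: natural-unit centre and half-width of each factor
        in the two-level design used to fit the first-order model.
    gradient: fitted first-order coefficients in coded units.
    lam: step size in coded units.
    """
    path = []
    for k in range(n_steps + 1):
        # Move against the gradient because the response is being minimized.
        coded = [-k * lam * g for g in gradient]
        natural = [c + h * x for c, h, x in zip(centers, half_ranges, coded)]
        path.append(natural)
    return path


def round_to_5(seconds):
    """Round a positive preview length to the nearest multiple of 5 seconds."""
    return 5 * int(seconds / 5 + 0.5)


# Hypothetical example with two factors: Prev.Size and Prev.Length.
path = descent_path(centers=[0.4, 60], half_ranges=[0.1, 30],
                    gradient=[1.0, -2.0], lam=0.25, n_steps=2)
for size, length in path:
    print(round(size, 3), round_to_5(length))
```

Each step should still be checked against the region of operability before it is submitted, and Prev.Length must be rounded because it can only change in increments of 5 seconds.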
PHASE III: Response Optimization
Once you are confident that you are in the vicinity of the optimum, conduct a central composite design and
use a second order response surface model to identify the location of the optimum (i.e., the factor levels
that minimize expected browsing time). Report the estimate and a 95% confidence interval for the expected
browsing time at this location.
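As a rough sketch of the two computational pieces in this phase, the Python code below generates the coded-unit conditions of a central composite design and locates the stationary point of a fitted second-order model in two factors. The function names and the coefficient values in the example are hypothetical, and the choice of axial distance alpha is left to you. The confidence interval is not covered here.

```python
from itertools import product


def central_composite(k, alpha):
    """Coded-unit CCD: 2^k factorial points, 2k axial points, one centre point."""
    factorial = [list(p) for p in product([-1.0, 1.0], repeat=k)]
    axial = []
    for i in range(k):
        for a in (-alpha, alpha):
            point = [0.0] * k
            point[i] = a
            axial.append(point)
    return factorial + axial + [[0.0] * k]


def stationary_point_2d(b1, b2, b11, b22, b12):
    """Stationary point of b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1^2 + b22*x2^2.

    Setting both partial derivatives to zero gives the linear system
        2*b11*x1 +   b12*x2 = -b1
          b12*x1 + 2*b22*x2 = -b2
    which is solved here by Cramer's rule.
    """
    det = 4 * b11 * b22 - b12 ** 2
    x1 = (-2 * b22 * b1 + b12 * b2) / det
    x2 = (-2 * b11 * b2 + b12 * b1) / det
    return x1, x2


design = central_composite(k=2, alpha=2 ** 0.5)
print(len(design), "conditions")
# Hypothetical coefficients, not estimates from simulator data.
print(stationary_point_2d(b1=-2.0, b2=1.0, b11=1.0, b22=0.5, b12=0.0))
```

The stationary point is a minimum only when b11 > 0 and 4*b11*b22 − b12^2 > 0. The coded conditions must be converted to natural units, and the axial points in particular should be checked against the region of operability.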
THE DELIVERABLE
You will prepare and submit a report (saved as five separate .pdf files) via Crowdmark by the due date
listed at the top of this document. The five files constituting the report will be based on the following
elements:
• File #1: Executive Summary (1 page max)
– Summary of the problem, your experimental journey, and the ensuing findings.
– Be sure to state the location and value of the optimum.
• File #2: Introduction (2 pages max)
– Describe in your own words the problem you are trying to solve
– Describe in your own words the goals of response surface methodology
• File #3: Factor Screening (2 pages max)
– Explain your factor screening experiment through the lens of QPDAC. State the objective,
explain your design, collect the data, analyze the data, and draw a conclusion.
– Be sure to justify any decisions you made in either the design or the analysis. For instance, why
did you use a 2^K factorial experiment as opposed to a 2^(K−p) fractional factorial experiment (or
vice versa)?
– Be sure to include visual and/or tabular summaries of the experiment.
• File #4: Method of Steepest Descent (2 pages max)
– Explain your MSD experiments through the lens of QPDAC. State the objective, explain your
design, collect the data, analyze the data, and draw a conclusion.
– Be sure to justify any decisions you