Curve Fitting
Film production companies use different techniques to predict movie sales. We’ll
consider predicting the final worldwide box office sales of movies based on the first
weekend box office sales and the rating of the film (by IMDb).
• Why might first weekend sales, as the only variable, be insufficient for predicting
final box office sales?
• Why do you think I’ve used movies from 2017 and not movies from different years?
Let’s assume a model of the form
F = a0 + a1 S + a2 R
where F is the final box office sales, S is the first weekend sales, R is the IMDb rating,
and a0, a1, a2 are coefficients we’ll choose to minimize some measure of error.
• What would a good measure of the error be? What would the error be a function of?
• What would a good set of equations be to solve for the coefficients a0, a1, a2?
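One common answer to these questions, offered as a sketch rather than the only reasonable choice, is to measure the error by the sum of squared residuals. With N = 10 movies indexed by i, this error is a function of the coefficients alone, and asking the model to match every movie exactly gives one linear equation per movie. The symbols E, r_i, N and M are notation introduced here for illustration.

```latex
% Sum of squared residuals (one possible error measure)
r_i = F_i - \left(a_0 + a_1 S_i + a_2 R_i\right), \qquad
E(a_0, a_1, a_2) = \sum_{i=1}^{N} r_i^2

% One equation per movie, written as a matrix system M a = F
\underbrace{\begin{bmatrix}
1 & S_1 & R_1 \\
1 & S_2 & R_2 \\
\vdots & \vdots & \vdots \\
1 & S_N & R_N
\end{bmatrix}}_{M}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}
=
\begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_N \end{bmatrix}
```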
Movies from 2017 (a simpler time)

Movie                             First weekend      Final worldwide   Rating
                              box office ($USD)    box office ($USD)   (IMDb)
Baby Driver                          20,553,320          227,250,102      7.6
Baywatch                             18,503,871          175,863,783      5.5
Beauty and the Beast                174,750,616        1,273,109,220      7.1
Blade Runner 2049                    32,753,122          258,829,058      8.0
Despicable Me 3                      72,434,025        1,032,596,894      6.3
Emoji Movie                          24,531,923          216,564,840      3.3
It                                  123,403,419          701,083,042      7.3
Murder on the Orient Express         28,681,472          351,767,147      6.5
Spiderman: Homecoming               117,027,053          878,346,440      7.4
Star Wars: Episode VIII             220,009,584        1,331,635,141      6.9
Day 3 lecture: More Curve Fitting 2
This set of equations is over-determined; that is, we have more equations than
variables to solve for. An over-determined system generally has no exact solution,
so instead we’ll find the coefficients that come closest to satisfying all of the
equations, in the sense of linear least squares.
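To make "closest" concrete, here is one standard derivation, assuming the error is the sum of squared residuals and writing the system as Ma = F, where M is the 10-by-3 matrix whose i-th row is [1, S_i, R_i] and a = [a0; a1; a2]. The names M, a and E are notation chosen for this sketch. Setting the gradient of the squared error to zero gives the normal equations:

```latex
E(a) = \lVert M a - F \rVert_2^2 = (M a - F)^{T} (M a - F)

\nabla_a E = 2\, M^{T} (M a - F) = 0
\quad\Longrightarrow\quad
M^{T} M \, a = M^{T} F
```

When the columns of M are linearly independent, M^T M is invertible and the least squares solution is unique. The normal equations form a square 3-by-3 system, even though the original system has ten equations.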
Listing 1. Movie Sales
% Final worldwide box office F and first weekend box office S, both in
% millions of USD (rounded from the table above), and IMDb rating R.
F = [227.25; 175.86; 1273.11; 258.83; 1032.60; ...
     216.56; 701.08; 351.77; 878.35; 1331.64];

S = [20.55; 18.50; 174.75; 32.75; 72.43; ...
     24.53; 123.40; 28.68; 117.03; 220.01];

R = [7.6; 5.5; 7.1; 8.0; 6.3; ...
     3.3; 7.3; 6.5; 7.4; 6.9];

% Over-determined system M*a = F; backslash returns the least squares solution.
M = [ones(size(S)) S R];
a = M\F;

% Plot the data and the fitted plane.
scatter3(S,R,F,100,[0 0 0],'filled','o')
hold on
x = 0:25:250; y = 2:.5:9;
[X,Y] = meshgrid(x,y);
Z = a(1)+a(2)*X+a(3)*Y;
surf(X,Y,Z)
hold off

% Error of the linear fit: sum of squared differences from the data.
FPredLin = @(S,R) a(1)+a(2)*S+a(3)*R;
predLin = FPredLin(S,R);
diffLin = abs(predLin-F);
errorLin = sum(diffLin.^2);
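As a sanity check on the backslash solve in Listing 1, the sketch below compares it with an explicit solve of the normal equations and verifies that the residual is orthogonal to the columns of M, which is the defining property of a least squares solution. The variable names aBackslash, aNormal, res and orthCheck are chosen for this example; the data are the same rounded values (millions of USD) taken from the table.

```matlab
% Data in millions of USD (rounded from the 2017 table) and IMDb ratings.
F = [227.25; 175.86; 1273.11; 258.83; 1032.60; ...
     216.56; 701.08; 351.77; 878.35; 1331.64];
S = [20.55; 18.50; 174.75; 32.75; 72.43; ...
     24.53; 123.40; 28.68; 117.03; 220.01];
R = [7.6; 5.5; 7.1; 8.0; 6.3; ...
     3.3; 7.3; 6.5; 7.4; 6.9];

M = [ones(size(S)) S R];          % 10-by-3 design matrix

aBackslash = M \ F;               % least squares solution via backslash
aNormal    = (M' * M) \ (M' * F); % same problem via the normal equations

% At a least squares solution the residual is orthogonal to every column of M.
res       = F - M * aBackslash;
orthCheck = M' * res;             % should be zero up to rounding error

fprintf('Largest coefficient difference: %g\n', max(abs(aBackslash - aNormal)));
fprintf('Largest entry of M''*res:        %g\n', max(abs(orthCheck)));
```

Both printed quantities should be tiny relative to the size of the data. For a small, reasonably conditioned problem like this one the two approaches agree; forming M'*M squares the condition number, which is one reason to prefer backslash on M directly.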
How could we set up a second-order polynomial (in two variables) to fit the data?
• What would the general form of the fit look like?
• What would a system look like to solve for the coefficients in the general form?
• GROUPWORK: Set up a system and find the second-order fit in Matlab.
• Find the error of the second-order fit.
• Use each model to predict the final box office sales for Wonder Woman, a movie
with an IMDb rating of 7.4 and first weekend box office sales of $103,251,471.
(You can compare your solution to the actual final box office sales: $818,058,22.)
• Overall, is your model good? Comment on whether the predicted relationships make
sense.
• Using some of the additional data provided (production budget, opening weekend
theaters, etc.), create a new second-order model in two variables. (You can use
any two of the variables from the box office information or the IMDb ratings, but
you should spend a little time justifying your choices. That is, using ‘domestic
box office’ and ‘inflation adjusted domestic box office’ would not be a good choice
since these two variables are essentially the same thing.)
• Determine if your new model is better than the second-order fit you found above.
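One possible way to approach the first few items in this list is sketched below; it is a starting point under stated assumptions, not the expected groupwork answer. It assumes the general second-order form F = b0 + b1 S + b2 R + b3 S^2 + b4 S R + b5 R^2, uses the sum of squared differences as the error, and works in millions of USD with values rounded from the 2017 table. The names M1, M2, b, FPredQuad, errorQuad, sWW and rWW are chosen for this example.

```matlab
% Data in millions of USD (rounded from the 2017 table) and IMDb ratings.
F = [227.25; 175.86; 1273.11; 258.83; 1032.60; ...
     216.56; 701.08; 351.77; 878.35; 1331.64];
S = [20.55; 18.50; 174.75; 32.75; 72.43; ...
     24.53; 123.40; 28.68; 117.03; 220.01];
R = [7.6; 5.5; 7.1; 8.0; 6.3; ...
     3.3; 7.3; 6.5; 7.4; 6.9];

% Linear model, as in Listing 1, for comparison.
M1 = [ones(size(S)) S R];
a  = M1 \ F;
FPredLin = @(s,r) a(1) + a(2)*s + a(3)*r;
errorLin = sum((FPredLin(S,R) - F).^2);

% Second-order model: one column per term of the assumed general form.
M2 = [ones(size(S)) S R S.^2 S.*R R.^2];
b  = M2 \ F;
FPredQuad = @(s,r) b(1) + b(2)*s + b(3)*r + b(4)*s.^2 + b(5)*s.*r + b(6)*r.^2;
errorQuad = sum((FPredQuad(S,R) - F).^2);

% Wonder Woman: first weekend $103,251,471 (103.25 million), IMDb rating 7.4.
sWW = 103.25;
rWW = 7.4;

fprintf('Linear fit error:        %.2f\n', errorLin);
fprintf('Second-order fit error:  %.2f\n', errorQuad);
fprintf('Linear prediction:       %.2f million USD\n', FPredLin(sWW, rWW));
fprintf('Second-order prediction: %.2f million USD\n', FPredQuad(sWW, rWW));
```

Because the linear model is a special case of the second-order one (set b3 = b4 = b5 = 0), the second-order error on these ten movies cannot exceed the linear error. With six coefficients and only ten data points, a smaller fitting error does not by itself mean better predictions for a new movie, which is worth keeping in mind when judging whether the model is good.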