CS 1103 – Introductory Programming for Engineers and Scientists – Spring 2023
Project
This year the subject of your project is The Movie DB (https://www.themoviedb.org/), which has data
similar to that of IMDb. Download the database.mat file from the class project webpage.
This file contains a subset of the movie database in the form of four MATLAB arrays. Specifically, it has all
reasonably popular, mostly US-affiliated movies made from 2000 through 2021 and all the actors and
directors associated with them. Note that this is user-contributed data, so it is neither complete nor
guaranteed to be absolutely correct. Put the file in a new blank folder and issue the following commands:
clear; load database
If you now type the whos command, this is what you should see:
whos
Name XXXXXXXXXXSize XXXXXXXXXXBytes Class Attributes
actors XXXXXXXXXX135155x XXXXXXXXXX XXXXXXXXXXdouble
directors XXXXXXXXXX6340x XXXXXXXXXX XXXXXXXXXXdouble
movies XXXXXXXXXX1x XXXXXXXXXX XXXXXXXXXXstruct
persons XXXXXXXXXX1x XXXXXXXXXX XXXXXXXXXXstruct
Notice that there are over 80,000 people and over 5,700 movies in the database. The persons array has
a struct for every person in the database. For example:
persons(1)
ans =
struct with fields:
name: 'George Lucas'
DOB: ' XXXXXXXXXX'
If the DOB is not available (and for many people it is not), the DOB field contains an empty string.
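Problems 9 and 17 below work with birth years only, so a common first step is turning each DOB string into a number. The sketch below is a minimal illustration under an assumption the description above does not confirm: that a non-empty DOB begins with a four-digit year (a hypothetical 'YYYY-MM-DD' layout). Check a few real entries before relying on it. The people and dates are made up, and empty DOBs become NaN so they are easy to skip.

```matlab
% Toy persons array; names and DOBs are HYPOTHETICAL.
% ASSUMPTION: a non-empty DOB starts with a four-digit year.
persons = struct('name', {'Person A', 'Person B', 'Person C'}, ...
                 'DOB',  {'1950-03-02', '', '1970-05-01'});

years = nan(1, numel(persons));            % NaN marks a missing DOB
for ii = 1:numel(persons)                  % ii is the person ID
    dob = strtrim(persons(ii).DOB);        % tolerate stray spaces
    if numel(dob) >= 4
        years(ii) = str2double(dob(1:4));  % NaN if not a number
    end
end
valid = ~isnan(years);                     % people with a usable birth year

now_vec = clock();                         % [year month day hour minute second]
ages = now_vec(1) - years;                 % age today from birth year; NaN stays NaN
```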
The movies array has one struct for every movie in the database. For example:
movies(1)
ans =
struct with fields:
title: 'Gladiator'
mpaa: [1×0 char]
release: XXXXXXXXXX
rating: 8.1000
votecount: 10921
countries: 'GB US'
The mpaa field is a string holding the MPAA rating of the movie (G, PG, PG-13, R, NC-17); however, it is not
available for most of the movies. The countries string contains the two-letter country codes separated by
spaces.
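One way to turn the countries string into separate codes is strsplit, which returns a cell array of character vectors. This is a minimal sketch using the value shown for movies(1) above; the empty-string guard is a precaution only, since the description does not say whether any movie has no countries listed.

```matlab
% countries field of movies(1), as shown above
countries = 'GB US';

txt = strtrim(countries);
if isempty(txt)
    codes = {};                      % assumed: empty field means no country listed
else
    codes = strsplit(txt, ' ');      % cell array of codes: {'GB', 'US'}
end
n_countries = numel(codes);          % number of countries credited
has_us = any(strcmp(codes, 'US'));   % exact match against whole codes
```

Working with whole codes rather than the raw string makes both counting (numel) and exact matching (strcmp) straightforward.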
Note that the indices into the persons and movies arrays are very important: the index is the
ID of the given person or movie. This number is used to identify them in the other two arrays of the
database. The actors array has two columns. Each row identifies an “acting” relationship: the first
column is the movie ID, the second column is the person ID. For example, picking a random row:
actors(68611,:)
ans =
XXXXXXXXXX3432
movies(2034).title
ans =
'Hidden Figures'
persons(3432).name
ans =
'Octavia Spencer'
This means that the movie “Hidden Figures” has an ID of 2034, and actress Octavia Spencer (who has the
ID 3432) played in the movie.
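Because each row of actors is a (movie ID, person ID) pair and an ID is simply an index, comparing one column against an ID gives a logical mask that selects every matching row at once, without an explicit loop. The sketch below uses a tiny made-up database; all titles and IDs are hypothetical, not taken from database.mat.

```matlab
% Toy data with HYPOTHETICAL IDs (not the real database).
movies = struct('title', {'Film A', 'Film B'});   % movie IDs 1 and 2
actors = [1 3;                                    % person 3 acted in movie 1
          1 2;                                    % person 2 acted in movie 1
          2 3];                                   % person 3 acted in movie 2

rows = actors(:, 2) == 3;           % logical mask: rows that mention person 3
film_ids = actors(rows, 1)';        % movie IDs from those rows: [1 2]
titles = {movies(film_ids).title};  % IDs index directly into the movies array
```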
Similarly, the directors array specifies directing information. For example:
directors(2746,:)
ans =
XXXXXXXXXX8868
movies(2527)
ans =
struct with fields:
title: 'Avengers: Endgame'
mpaa: 'PG'
release: XXXXXXXXXX
rating: 8.3000
votecount: 11209
countries: 'US'
persons(8868)
ans =
struct with fields:
name: 'Anthony Russo'
DOB: ' XXXXXXXXXX'
That is, Anthony Russo directed “Avengers: Endgame”. Note that a movie may have more than one
director (in this case, Joe Russo also directed it). Also note that the row indices of the directors and actors
arrays carry no meaning of their own: these arrays are simply collections of relationships
between movies and persons. However, the rows are typically stored in order of importance: the
first entry for a particular movie specifies the first-credited actor/actress.
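Since one movie can appear in several rows, find returns all matching row numbers in the order they are stored; given the typical ordering described above, the first of them usually points to the first-credited person. A sketch on hypothetical IDs; the same pattern applies to the directors array, where the result may contain more than one director.

```matlab
% Toy actors array with HYPOTHETICAL IDs: column 1 = movie, column 2 = person.
actors = [7 12;
          5  4;
          5  2;
          5  9];

rows = find(actors(:, 1) == 5);   % matching row numbers, in stored order
people = actors(rows, 2)';        % everyone credited on movie 5: [4 2 9]
if isempty(rows)
    lead = [];                    % no actors recorded for this movie
else
    lead = actors(rows(1), 2);    % first entry: typically the first-credited actor
end
```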
To illustrate how to use the information in the database, consider the following function:
function films = search_for_movie(movies,keyword)
films = [];
keyword = upper(keyword); % to avoid capitalization issues
for ii = 1:length(movies)
    % ii is the movie ID; keep it if the title contains the keyword
    if ~isempty(strfind(upper(movies(ii).title),keyword))
        films(end+1) = ii;
    end
end
end
This function finds all movies in the database whose title contains the keyword provided. For example:
search_for_movie(movies,'Basterd')
ans =
920
movies(920)
ans =
struct with fields:
title: 'Inglourious Basterds'
mpaa: [1×0 char]
release: XXXXXXXXXX
rating: 8.1000
votecount: 13686
countries: 'DE US'
If there is no matching movie, the function returns an empty array. You are free to use this
function in your project.
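The loop in search_for_movie works as written. As an alternative (a swapped-in vectorized technique, not part of the handout), the same search can run on a cell array of all titles at once, since upper and strfind both accept cell arrays. The sketch below also shows the two points made above: the result is a vector of IDs (the structs are looked up only for display), and a keyword with no match yields an empty (1-by-0) vector. The three-movie array is a toy stand-in, so its IDs 1 to 3 are not the real database IDs.

```matlab
% Toy movies array; IDs here are 1..3, NOT the real database IDs.
movies = struct('title', {'Gladiator', 'Inglourious Basterds', 'Hidden Figures'});

keyword = 'basterd';
titles = upper({movies.title});              % upper-case copy of every title
hits = strfind(titles, upper(keyword));      % one cell of match positions per title
films = find(~cellfun(@isempty, hits));      % IDs (indices), not structs

names = {movies(films).title};               % look up structs only when needed
none = find(~cellfun(@isempty, strfind(titles, 'ZZZZ')));  % no match: empty
```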
You will receive an individual assignment of 4 function problems out of the list of 20 below. Some
have a single output argument; some have more than one. When asked for a person or movie, provide
the ID of the person or movie, that is, the index into the persons or movies array, and NOT the struct.
Do NOT load the database inside your functions! Pass whatever array the function needs as an
input argument, just like in the example above.
Note that some of the problems may require iterating through the various arrays multiple times.
However, none of these functions should run for more than 20 seconds. The TAs will not have time to wait
for every function to finish; therefore, if your function runs longer than 20 seconds, it will be
stopped and the solution will count as incorrect. You may also write additional helper functions; just
make sure that you submit all the .m files needed to run your solutions! Before you submit, copy all the files
you think you need into a blank folder and test your solution there. Submit all these files except for the
database.mat file.
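A simple way to check a function against the 20-second limit is to wrap the call in tic and toc. To stay self-contained, the sketch below times the search_for_movie loop inline on synthetic data; the 6,000 made-up titles and the keyword are assumptions standing in for the real database. In practice you would load database.mat and time a call to your own function the same way.

```matlab
% Synthetic stand-in for the movies array: 6000 made-up titles.
titles = arrayfun(@(k) sprintf('Movie %d', k), 1:6000, 'UniformOutput', false);
movies = struct('title', titles);

t = tic;                                   % start a timer
films = [];
for ii = 1:length(movies)                  % same loop as search_for_movie
    if ~isempty(strfind(upper(movies(ii).title), 'MOVIE 42'))
        films(end+1) = ii;
    end
end
elapsed = toc(t);                          % seconds since tic
fprintf('Search took %.3f s\n', elapsed);
if elapsed > 20
    warning('Over the 20-second limit.');
end
```

Here the matches are Movie 42, Movie 420 to 429, and Movie 4200 to 4299, so films holds 111 IDs.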
The difficulty levels of the problems are as follows (note that everybody gets one problem from
each of the groups below):
1- 5: easy
6-10: moderate
11-15: getting harder
16-20: hard
Problems (use the function definition exactly as specified):
1. Return the cast list (i.e., list of actors) of any movie. The movie ID is an input argument and the output
is a vector of person IDs.
function people = movie_cast(actors,movie)
2. Return the list of directors of any movie. The movie ID is an input argument and the output is a vector
of person IDs.
function people = movie_directors(directors,movie)
3. Return the list of movies a given person played in. The person ID is an input, while the output is a
vector of movie IDs.
function films = movies_by_actor(actors,actor)
4. Return the list of movies a given person directed. The person ID is an input, while the output is a
vector of movie IDs.
function films = movies_by_director(directors,director)
5. Which movies are there in the database with an MPAA rating of R? Return a vector of movie IDs.
function films = r_rated(movies)
6. Which movies are rated highest? Return a list of movie IDs that have the highest rating value. Do not
hardcode it to 10 just because in this database it is 10! Note that there may be a single movie or
multiple movies with the same maximum rating.
function films = highest_rated(movies)
7. Return the list of movies credited to a given country. The two-letter country code is an input and the
output is a vector of movie IDs.
function films = movies_by_country(movies,country)
8. Which movies in the database have the most countries associated with them? Return a list of movie IDs.
Note that there may be a single movie or multiple movies with the same number of countries
credited.
function films = most_countries(movies)
9. Who is the oldest person (today) to direct a movie in the database? Only consider the birth year.
Disregard people with missing DOBs. Return a list of person IDs if there are multiple people with the
same age.
function people = oldest_director(directors,persons)
10. Return the list of movies in which a given director directed a given actor. The IDs of the director and the actor
are input arguments and a vector of movie IDs is the output.
function films = actor_directed_by(directors,actors,director,actor)
11. Who played in the highest number of highly rated movies? A movie is highly rated if its rating is greater
than or equal to rating, an input argument. Return a vector of person IDs if there is a tie. A second
output is the number of highly rated movies the person(s) played in.
function [people number] = most_great_movies(movies,persons,actors,rating)
12. Which countries appear in the database? Return a character array where each row contains the
two-letter country code of a unique country. That is, repeated entries are not allowed.
function states = countries(movies)
13. Return the top N most prolific directors. The person who directed the most movies in the database
is the most prolific. N is an input argument, while there are two output vectors: the first contains the
IDs of the people who directed the most movies, in decreasing order; the second contains the corresponding
number of movies each person directed.
function [people num_films] = most_prolific_director(directors,N)
14. Return the top N most prolific actors. The person who played in the most movies in the database is
the most prolific. N is an input argument, while there are two output vectors: the first contains the
IDs of the people who played in the most movies, in decreasing order; the second contains the
corresponding number of movies each person played in.
function [people num_films] = most_prolific_actor(actors,N)
15. Return the list of movies in which two given actors played together. If the two actors provided as input arguments
did not play together in any movie, return an empty array.
function films = common_movies(actors,actor1,actor2)
16. Calculate the number of movies credited to each country. Return a character array of the two-letter
country abbreviations (countries) and a vector of the corresponding number of movies (count).
function [count countries] = movies_by_countries(movies)
17. Which movie in the database has the youngest cast (minimum average age today, based on birth year
only)? Ignore all persons who have no birth year provided. Consider only movies that have at least
N cast members with valid birth dates, where N is an input argument.
function [film age] = youngest_cast(movies, persons, actors, N)
18. Who directed the most movies that they