top of page

IMDB Subset Database: SQL Queries and Analysis


Introduction

Welcome to our new blog post! We're really excited to share something special with you today. In this post, we'll be introducing a fascinating new project requirement: the 'IMDB Subset Database: SQL Queries and Analysis.' We'll walk you through the project requirements and discuss our approach to tackling them. Plus, we'll give you a sneak peek at some of the outputs with screenshots. So, let's jump right in and explore together!


Project Requirement : 

This document describes a database on movies, movie-stars, etc. and asks a range of questions on this database. You must work out how to produce answers to these questions using SQL or PLpgSQL (as specified in the question).


In this assignment, you will work with a very small subset of the Internet Movie Database (aka IMDB). This database has information about movies, TV series, actors, directors, etc. The database for the assignment only contains highly-rated movies, and the people associated with them. All of the data about "lesser" movies, TV series, and other IMDB content has been removed to keep the size manageable; the actual database is over 50GB.


Some of the terminology IMDB uses may require some explanation:


People

The database deals with a wide variety of humans, animals and animated characters that appear in movies. The term "people" isn't broad enough, but we use it anyway since most of the references are to people.


Aliases

Movies (and other media) are released in different forms in different regions of the world. Some versions are cut, to fit with local laws. Others are dubbed or subtitled, to fit the local language. The title is also often changed, and to a phrase with quite a different meaning to the original. The various versions of a movie are called "Aliases".


ER Model

People have the following ER model


Movies have the following ER model


Aliases for movies have the following ER model


The above entities are linked together as follows


Notes

  • In the queries, references to title mean Movies.title

  • In most cases, the order of results doesn't matter; the testing code will use order by to force a specific order

  • Queries should not take more than 3 seconds to run; queries that take longer to run will be penalised


Sample Outputs and Views

To give you an idea of what you're aiming for, there are sample outputs in each question. Note that these assume that you are creating a view for the question and then invoking that view. You are not required to create views, but you will probably find it convenient. 


If you create views, use

create or replace view ViewName(attr1name, attr2name, ...)
SQLstatement;

If you decide to change the view appearance later, you will also need to include

drop view ViewName;

before creating the view.


Questions : 

  1. "Find the titles of movies that have the highest rating."

  2. "Find the titles of movies that have the highest rating, ordered by rating (highest first), then by title (alphabetically)."

  3. "Find the titles of movies that have the highest rating, ordered by title."

  4. "Find the titles of all the movies in which both Johnny Depp and Helena Bonham Carter have appeared."

  5. "Find the titles of all the movies in which either Johnny Depp or Helena Bonham Carter or both have appeared."

  6. "Find the titles of the three highest-rated movies."

  7. "Find the titles of the ten lowest-rated movies."

  8. "Find the name of the director who directed the most number of movies."

  9. "Find the name of the director who directed the highest-rated movie."

  10. "Find the titles of all the movies that have aliases."

  11. "Find the names of all the people who have acted in a movie that also has aliases."

  12. "Find the title of the highest-rated movie that has aliases."



Solution Approach : 


In addressing the requirements of the IMDB Subset Database project, we employed a systematic approach leveraging SQL queries and analysis techniques. Below is a breakdown of the methods and techniques utilized in solving each of the provided questions:


  • Querying Movies with Highest Rating:

  • We employed a SQL query to identify movies with the highest rating. By selecting movies based on their rating, we ensured a focused approach to retrieve the most critically acclaimed films from the database.


  • Sorting Highest-Rated Movies:

  • To fulfill this requirement, we utilized SQL's ORDER BY clause to arrange movies first by their rating (highest first) and then alphabetically by title. This ensured a structured presentation of the highest-rated movies.


  • Ordering Movies by Title with Highest Rating:

  • Similar to the previous question, we utilized SQL's ORDER BY clause to organize movies by title while maintaining the criterion of highest rating. This provided an alternative view of the highest-rated movies.


  • Identifying Movies with Specific Actors:

  • Leveraging SQL's JOIN operation, we crafted a query to find movies in which both Johnny Depp and Helena Bonham Carter have appeared. By joining the Movies and People tables based on the respective roles of the actors, we retrieved the desired movie titles.


  • Searching Movies by Actor:

  • Utilizing SQL's OR logical operator, we constructed a query to identify movies featuring either Johnny Depp, Helena Bonham Carter, or both. This approach facilitated a comprehensive search for movies involving the specified actors.


  • Top Three Highest-Rated Movies:

  • We employed SQL's LIMIT clause to select the top three movies based on their rating. This straightforward approach allowed us to extract the titles of the three highest-rated films efficiently.


  • Bottom Ten Lowest-Rated Movies:

  • Similar to the previous question, we utilized SQL's LIMIT clause, coupled with appropriate ordering, to identify the ten movies with the lowest ratings. This approach ensured a concise presentation of the least-rated movies.


  • Finding Director with Most Movies:

  • Through SQL's GROUP BY and COUNT functions, we determined the director who directed the most number of movies. By aggregating the data on directors and counting the occurrences, we pinpointed the director with the highest movie count.


  • Locating Director of Highest-Rated Movie:

  • Employing SQL's MAX function along with appropriate joins, we identified the director responsible for directing the highest-rated movie. By selecting the director associated with the maximum rating value, we isolated the director of the top-rated film.


  • Movies with Aliases:

  • By utilizing SQL's JOIN operation with the Aliases table, we identified movies that have aliases. This allowed us to capture movies with multiple versions or titles.


  • Actors in Movies with Aliases:

  • Leveraging SQL's JOIN operation between People and Aliases tables, we extracted the names of individuals who have acted in movies with aliases. This approach facilitated the identification of actors involved in movies with alternate titles.


  • Highest-Rated Movie with Aliases:

  • Through SQL's JOIN operation and appropriate selection criteria, we pinpointed the title of the highest-rated movie that has aliases. By combining data from multiple tables, we precisely identified the desired movie title.


Some Outputs : 







At CodersArts, we're excited to unveil our latest project, the "IMDB Subset Database: SQL Queries and Analysis." Delving into the world of cinema data, our team is poised to demonstrate the power of SQL queries in unraveling insightful trends and patterns within the IMDB dataset. With a keen focus on efficient data retrieval and analysis, we aim to provide comprehensive solutions to a range of queries, offering a deeper understanding of movie ratings, actors' roles, and directorial contributions.


From inception to implementation, CodersArts guides you through the intricacies of the IMDB Subset Database project. We meticulously dissect the dataset, identifying crucial entities and relationships that underpin the world of movies and entertainment. Through strategic SQL query formulation and execution, we uncover hidden gems within the data, allowing users to extract valuable insights with ease. Our commitment to efficiency ensures that queries run seamlessly, providing prompt and accurate results to meet your analytical needs.


But our journey doesn't stop at data retrieval. CodersArts is dedicated to empowering users with actionable insights derived from SQL analysis. By leveraging the power of structured query language, we equip you with the tools to make informed decisions in the realm of movie analytics. Whether it's identifying top-rated movies, analyzing actor collaborations, or spotlighting directorial achievements, our expertise in SQL queries and analysis elevates your understanding of the IMDB dataset, paving the way for informed decision-making and enriched cinematic experiences.


If you require any assistance with the project discussed in this blog, or if you find yourself in need of similar support for other projects, please don't hesitate to reach out to us. Our team can be contacted at any time via email at contact@codersarts.com.

bottom of page