Understanding Prefetch related and select related in Django

Django ORM (Object-Relational Mapping) is a powerful feature of Django that allows you to interact with your database using Python code instead of writing raw SQL queries. It enables developers to define database models as Python classes, and these models are automatically mapped to database tables. Django ORM provides a high-level API to perform queries, including filtering, sorting, and aggregating data, all without writing SQL. Django ORM allows you to easily manipulate and query the database while keeping your codebase clean and Pythonic.

In this blog we are going to discuss about two methods of Django ORM that helps optimize data fetching from database.

select_related()

The main difference between select_related() and a normal query (without it) lies in how Django fetches related objects from the database. Here's a breakdown of the key differences:

1. Without select_related() (Normal Queries)

In a normal query without select_related(), Django performs separate queries for each related object. This can lead to the n + 1 query problem, where:

  1. Query is executed for the main model

  2. Additional queries are executed for each related object. For n, related objects, there will be n queries

Consider the following models:

class Author(models.Model):
    name = models.CharField(max_length=100)

class Book(models.Model):
    title = models.CharField(max_length=100)
    author = models.ForeignKey(
                    Author, 
                    on_delete=models.CASCADE
                    )

In a normal query without select_related(), Django performs separate queries for each related object.

Example:

# Get all books
books = Book.objects.all()  
for book in books:
    # Each access to `book.author` causes a new query
    print(book.title, book.author.name)  

This would result in n + 1 queries:

  • query to fetch all books:
SELECT * FROM book;
  • query for each book to fetch its corresponding author:
SELECT * FROM author WHERE id = (book.author_id);

If you have 1000 books, there will be 1001 queries: 1 for books and 1000 for authors.

2. Using select_related()

In contrast, select_related() performs an SQL JOIN to fetch related objects in a single query. This is efficient when dealing with one-to-one and foreign key (many-to-one) relationships.

Example:

# Fetch books with related author in one query
books = Book.objects.select_related('author').all()  
for book in books:
    # No additional query for the author
    print(book.title, book.author.name)  

This would result in 1 query:

SELECT book.*, author.* 
FROM book 
INNER JOIN author ON book.author_id = author.id;

Here’s why:

  • select_related() uses a JOIN to combine the Book table and the Author table.

  • When you access book.author, no additional query is needed because the author data is already included in the result set.

Thus, instead of 1001 queries, you have 1 query to retrieve all the books and their associated authors.

prefetch_related()

The main difference between prefetch_related() and a normal query (without it) lies in how Django fetches related objects from the database. Here's a breakdown of the key differences:

1. Without prefetch_related()

When you access a related field without using prefetch_related(), Django fetches related objects separately for each record, causing the n + 1 query problem.

Consider the given models:

class Course(models.Model):
    title = models.CharField(max_length=100, blank=False)
    description = models.TextField()
    creator = models.ForeignKey(Creator, 
            on_delete=models.CASCADE,
            related_name="course", 
            blank=False
            )


class Student(models.Model):
    name = models.CharField(max_length=100)
    courses = models.ManyToManyField(
                    'Course', 
                    related_name='students'
                    )

Example:

courses = Course.objects.all()  

for course in courses:
    # Fetching students for each course
    students = course.students.all()  
    print(course.name, list(students))

What Happens in SQL?

  • query to fetch all courses:
SELECT * FROM course;
  • queries (one per course) to fetch students:
SELECT * FROM student WHERE course_id = 1;
SELECT * FROM student WHERE course_id = 2;
SELECT * FROM student WHERE course_id = 3;
-- Runs one query per course (N queries)

If there are 100 courses, Django makes 101 queries (1 + 100). This is inefficient and slow.

2. Using prefetch_related()

prefetch_related() solves the n + 1 query problem by fetching related objects in a single additional query and mapping them in Python.

Example:

courses = Course.objects.prefetch_related("students")

for course in courses:
    # No extra query here
    students = course.students.all()
    print(course.name, list(students))

What Happens in SQL?

  • query to fetch all courses:
SELECT * FROM course;
  • query to fetch all related students at once:
SELECT * FROM student 
WHERE course_id IN (1, 2, 3, 4, 5);

Optimization: Instead of 101 queries, Django now runs only 2 queries (1 for courses + 1 for all students).

When to Use What?

  • Use prefetch_related() when dealing with ManyToMany (Many-to-Many) or reverse ForeignKey (One-to-Many) relationships to optimize queries.
  • Use select_related() when dealing with OneToOne (One-to-One) and ForeignKey(Many-to-One) relationships to optimize queries.

Demonstration project on github:

Go to django_queries folder and follow instructions on README.md: rasbin111/django_workshop