Paginating data in SQL databases with OFFSET and LIMIT to manage large datasets

Learn how to paginate data in PostgreSQL and MySQL databases using OFFSET and LIMIT to manage large datasets.

SQL
MySQL
PostgreSQL
Queries
Pagination
Published: 03/26/2024|By: Dashbase Team

When working with large datasets, efficiency and speed become crucial for both the server's performance and the user's experience. One common approach to managing large sets of data is through pagination. Pagination involves dividing the data into manageable chunks or "pages," which allows users to view a subset of results at a time. This method not only enhances performance by reducing the load on the server but also improves the user interface by making data navigation easier.

Pagination is particularly necessary in scenarios like displaying search results, product listings on e-commerce sites, or posts on social media platforms. Without pagination, a query requesting all rows from a large table could overwhelm the server, leading to slow response times and a poor user experience.

Pagination is a common technique in MySQL and PostgreSQL databases

Implementing Pagination Using LIMIT and OFFSET

The essence of pagination in SQL revolves around two clauses: LIMIT and OFFSET. The LIMIT clause controls the number of records returned by a query, acting as the "page size." The OFFSET clause skips a specific number of rows before starting to return rows from the query.

  • The LIMIT Clause: This clause is used to specify the maximum number of records to return from a query. For instance, LIMIT 10 would return the first 10 records from the result set.

  • The OFFSET Clause: This clause is used in conjunction with LIMIT to skip a specified number of records before starting to return records from the query. For example, OFFSET 10 would skip the first 10 records.

A basic pagination query combining both would look like this:


SELECT * FROM table_name
ORDER BY column_name
LIMIT 10 OFFSET 20;

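This query skips the first 20 records (the OFFSET) and then returns the next 10 records (the LIMIT). The ORDER BY clause matters here: without it, the database is free to return rows in any order, so the same row could show up on two pages or on none.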

Example

Imagine you have a database of books and you want to display them in a list that shows 10 books at a time.


SELECT title, author
FROM books
ORDER BY title
LIMIT 10
OFFSET 0;

This query will retrieve the first 10 books ordered by their title. To access the next page (i.e., the next set of 10 books), you would increase the OFFSET:


SELECT title, author
FROM books
ORDER BY title
LIMIT 10
OFFSET 10;
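The offset for any page follows from the page number: OFFSET = (page - 1) * page size. The sketch below wraps that arithmetic in a small helper. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in so it runs without a server (the SQL itself carries over to PostgreSQL and MySQL), and the `fetch_page` function, the `id` column, and the 25 sample books are all assumptions made up for the example.

```python
import sqlite3


def fetch_page(conn, page, page_size=10):
    """Return one page of (title, author) rows; pages are numbered from 1."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    offset = (page - 1) * page_size  # rows to skip before this page starts
    return conn.execute(
        # id is added as a tiebreaker so the order stays stable for equal titles
        "SELECT title, author FROM books ORDER BY title, id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


# Hypothetical sample data: 25 books titled "Book 001" .. "Book 025".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    [(f"Book {i:03d}", f"Author {i}") for i in range(1, 26)],
)

first_page = fetch_page(conn, 1)  # OFFSET 0
third_page = fetch_page(conn, 3)  # OFFSET 20; only 5 rows are left
```

Passing the page size and offset as query parameters, rather than formatting them into the SQL string, keeps user-supplied page numbers from being interpreted as SQL.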

Counting the Number of Pages

To provide a better user experience, it's helpful to show the total number of pages available. This can be achieved by using the COUNT function to calculate the total number of records and then dividing that number by the number of records per page to get the total number of pages.


SELECT COUNT(*) FROM table_name;

This query returns the total number of records in the table. Dividing that total by the page size and rounding up gives the number of pages: 95 records at 10 per page make 10 pages, with the last page holding only 5 records.

Make sure to use the same filters (e.g., WHERE conditions) in the COUNT query as in the main query to ensure that the count is accurate. Otherwise you might end up with a different count than the actual number of records displayed.

With the total page count in hand, you can display it to the user, typically as part of the pagination control.
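As a rough sketch of the page-count calculation, the snippet below counts rows with the same WHERE filter the page query would use and rounds the division up. It runs against an in-memory SQLite database through Python's sqlite3 module purely for convenience; the `total_pages` helper, the author filter, and the 95 sample rows are assumptions for illustration.

```python
import math
import sqlite3

# Hypothetical sample data: 95 books, every third one by "Author B".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    [(f"Book {i:03d}", "Author B" if i % 3 == 0 else "Author A") for i in range(1, 96)],
)


def total_pages(conn, author, page_size=10):
    """Number of pages needed to show every book by the given author."""
    # Same WHERE filter as the page query, so the count matches what is shown.
    (total_rows,) = conn.execute(
        "SELECT COUNT(*) FROM books WHERE author = ?", (author,)
    ).fetchone()
    # Round up so a partially filled last page still counts as a page.
    return math.ceil(total_rows / page_size)


pages_for_a = total_pages(conn, "Author A")  # 64 matching rows
pages_for_b = total_pages(conn, "Author B")  # 31 matching rows
```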

Example pagination UI

PostgreSQL Pagination and MySQL Pagination

LIMIT and OFFSET work the same way in MySQL and PostgreSQL, and most other SQL databases support them too, so the queries in this article run unchanged on both. (Strictly speaking, LIMIT is a widely adopted extension; the SQL standard spells it OFFSET ... FETCH FIRST, which PostgreSQL also accepts.) Where the two databases can differ is in how they optimize these queries, so for more advanced use cases with large datasets, look for optimization guidance specific to MySQL or PostgreSQL.

Performance Implications of Pagination

While pagination is beneficial for usability and efficiency, it does come with performance implications, especially at deeper pages. The OFFSET clause can cause the database to read through N rows (that are then skipped) to retrieve the desired subset, which can be inefficient for large N values. This overhead increases with the page number, potentially leading to performance bottlenecks.

This issue is most pronounced on deep pages, when the user is many pages into the view. For instance, with 10 results per page, showing page 1,000 means OFFSET 9990: the database reads and discards 9,990 rows just to return the 10 that follow.

In many applications this isn't a big issue, since users rarely navigate to such deep pages (when was the last time you made it to page 100 of Google search results?). However, if deep pagination is common in your application (for instance, if you use pagination for a data export feature or to build batches of data), it can be a concern. In such cases, consider an alternative strategy such as keyset (cursor-based) pagination, or use deferred joins (discussed next) to improve performance.

Deferred Joins

The "deferred join" method is an approach for enhancing the efficiency of pagination with OFFSET and LIMIT. It first applies the pagination to a narrow subquery that selects only what is needed to order and identify rows, typically just the primary key. The result of that subquery is then joined back to the main table to fetch the full rows. Because this join happens only after the pagination step, it is "deferred": the more resource-intensive work is postponed until the dataset has been narrowed down to a single page.

In other words, the database still has to walk past the skipped rows, but it does so over a smaller slice of the data (fewer columns, ideally all available from an index) and reads full rows only for the page that is actually returned. This significantly reduces the amount of data that needs to be processed, which improves performance, especially for deep pagination.

Here's how it's implemented in SQL:


SELECT * FROM books
INNER JOIN (
    -- First, paginate using a more focused subquery
    SELECT id FROM books ORDER BY title LIMIT 10 OFFSET 450000
) AS tmp USING (id)
ORDER BY title;

This method has also proven itself in practice: it is implemented in libraries for widely used web frameworks, including Rails (the FastPage gem) and Laravel (Fast Paginate).
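To check that a deferred join returns exactly the same page as the plain query, here is a small sketch. It uses an in-memory SQLite database via Python's sqlite3 module as a stand-in, so it says nothing about timings on PostgreSQL or MySQL; the table contents, the `summary` column, and the index name are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT, summary TEXT)"
)
# Hypothetical data: 1,000 books with a bulky summary column.
conn.executemany(
    "INSERT INTO books (title, author, summary) VALUES (?, ?, ?)",
    [(f"Book {i:04d}", f"Author {i}", "long text " * 20) for i in range(1, 1001)],
)
# Index holding everything the subquery needs: title for ordering, id for the join.
conn.execute("CREATE INDEX idx_books_title ON books (title, id)")

# Plain OFFSET pagination over the full rows.
plain = conn.execute(
    "SELECT id, title, summary FROM books ORDER BY title LIMIT 10 OFFSET 900"
).fetchall()

# Deferred join: paginate over ids only, then fetch full rows for those 10 ids.
deferred = conn.execute(
    """
    SELECT id, title, summary
    FROM books
    INNER JOIN (
        SELECT id FROM books ORDER BY title LIMIT 10 OFFSET 900
    ) AS tmp USING (id)
    ORDER BY title
    """
).fetchall()
```

The outer ORDER BY is repeated on purpose: a join does not guarantee that rows come back in the subquery's order.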

Other Pagination Techniques

While offset/limit pagination is widely used, its performance issues, especially with large datasets or deep pagination, have led to the exploration and adoption of other techniques. These alternatives aim to offer more efficient ways to navigate through large volumes of data.

Keyset Pagination (Cursor-based Pagination)

Keyset pagination, also known as cursor-based pagination, is an alternative that avoids the performance pitfalls of offset-based pagination. This technique relies on using a column (or set of columns) to navigate through the dataset. Instead of skipping a fixed number of rows, it remembers the last point it reached (the "cursor") and fetches the next set of rows based on this point. This method is highly efficient for sequential data access, especially when you always move forward.

SQL Example:


-- last_seen_id is a placeholder for the id of the last row on the previous page
SELECT * FROM people
WHERE id > last_seen_id
ORDER BY id ASC
LIMIT 10;

This approach is particularly suited for real-time data feeds and can significantly improve performance since it doesn't require scanning over skipped rows.
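A minimal sketch of walking through a table with this technique might look as follows. It again uses SQLite through Python's sqlite3 module as a stand-in; the `fetch_after` helper, the 25 sample rows, and the convention that a cursor of 0 means "start from the beginning" (which assumes ids are positive) are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical data: 25 people with ids 1..25.
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    [(i, f"Person {i}") for i in range(1, 26)],
)


def fetch_after(conn, last_seen_id, page_size=10):
    """Return the next page of rows whose id is greater than last_seen_id."""
    return conn.execute(
        "SELECT id, name FROM people WHERE id > ? ORDER BY id ASC LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


pages = []
last_seen_id = 0  # assumes ids are positive, so 0 means "from the beginning"
while True:
    rows = fetch_after(conn, last_seen_id)
    if not rows:
        break  # an empty page means the end of the table was reached
    pages.append(rows)
    last_seen_id = rows[-1][0]  # the cursor is the id of the last row returned
```

The trade-off is that the client can only ask for "the page after this cursor"; jumping straight to, say, page 7 is not possible without fetching the pages before it.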

Window Functions

Window functions perform calculations across a set of rows related to the current row. They are not a pagination method in themselves, but they can support one: ROW_NUMBER() assigns each row its position in the ordered result, which you can filter on to pick out a page, and COUNT(*) OVER () returns the total row count alongside the rows, which can save a separate COUNT query. This keeps more of the work within a single query, potentially reducing the need for multiple queries or processing steps.
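One way this could look is sketched below: ROW_NUMBER() numbers the rows, a filter on that number selects the page, and COUNT(*) OVER () carries the total along. The example runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer; PostgreSQL and MySQL 8.0+ also support them). The sample data and the `row_num` and `total_rows` aliases are assumptions for illustration, and the sketch makes no claim that this is faster than LIMIT and OFFSET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
# Hypothetical data: 35 books titled "Book 001" .. "Book 035".
conn.executemany(
    "INSERT INTO books (title) VALUES (?)",
    [(f"Book {i:03d}",) for i in range(1, 36)],
)

page, page_size = 2, 10
first_row = (page - 1) * page_size + 1  # 11
last_row = page * page_size             # 20

# Requires window-function support (SQLite 3.25+).
rows = conn.execute(
    """
    SELECT title, row_num, total_rows
    FROM (
        SELECT title,
               ROW_NUMBER() OVER (ORDER BY title, id) AS row_num,
               COUNT(*) OVER () AS total_rows
        FROM books
    ) AS numbered
    WHERE row_num BETWEEN ? AND ?
    ORDER BY row_num
    """,
    (first_row, last_row),
).fetchall()
```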

Seek Method

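The seek method, closely related to keyset pagination, optimizes pagination by "seeking" directly to a specific position rather than counting the rows up to that point. It does so with a WHERE condition that starts from the last row already seen. When the sort column is not unique (a timestamp, for example), add a unique column such as id as a tiebreaker and compare both together, so that rows sharing a timestamp are neither skipped nor repeated. The seek method is especially effective when the compared columns are indexed, allowing for quick jumps in the dataset.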

SQL Example:


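-- last_seen_created_at and last_seen_id are placeholders for the values
-- from the last row of the previous page
SELECT * FROM people
WHERE (created_at, id) > (last_seen_created_at, last_seen_id)
ORDER BY created_at ASC, id ASC
LIMIT 10;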
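The row comparison (created_at, id) > (x, y) is shorthand for "created_at > x, or created_at = x and id > y". The sketch below runs both spellings and compares the results. It uses SQLite (3.15 or newer for row values) through Python's sqlite3 module as a stand-in; the sample rows, the index name, and the `last_seen` tuple are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, created_at TEXT)")
# Hypothetical data: three rows share each date, so created_at alone has ties.
conn.executemany(
    "INSERT INTO people (id, name, created_at) VALUES (?, ?, ?)",
    [(i, f"Person {i}", f"2024-01-{(i - 1) // 3 + 1:02d}") for i in range(1, 13)],
)
conn.execute("CREATE INDEX idx_people_created_id ON people (created_at, id)")

# (created_at, id) of the last row on the previous page.
last_seen = ("2024-01-02", 5)

# Row-value form, as in the query above.
row_value = conn.execute(
    """
    SELECT id, created_at FROM people
    WHERE (created_at, id) > (?, ?)
    ORDER BY created_at ASC, id ASC
    LIMIT 4
    """,
    last_seen,
).fetchall()

# Expanded form for databases or versions without row-value comparison.
expanded = conn.execute(
    """
    SELECT id, created_at FROM people
    WHERE created_at > ? OR (created_at = ? AND id > ?)
    ORDER BY created_at ASC, id ASC
    LIMIT 4
    """,
    (last_seen[0], last_seen[0], last_seen[1]),
).fetchall()
```

Note that the page resumes with id 6, which has the same date as the last row seen; comparing on created_at alone would have skipped it.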

Materialized Views

For datasets that don't change frequently, creating materialized views can be an effective way to optimize pagination. A materialized view stores the result of a query and can be indexed for fast access. By paginating over a materialized view instead of the base tables, you can significantly improve performance, especially for complex queries.
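PostgreSQL provides this natively through CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW. SQLite, used here only so the sketch runs as-is, has no materialized views, so the example below swaps in a plain table built from the query as a stand-in for the same idea: store the result once, index it, and paginate over the stored copy. The `author_stats` table, the `refresh_author_stats` helper, and the sample data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
# Hypothetical data: 50 books spread evenly over 5 authors.
conn.executemany(
    "INSERT INTO books (title, author) VALUES (?, ?)",
    [(f"Book {i:03d}", f"Author {i % 5}") for i in range(1, 51)],
)


def refresh_author_stats(conn):
    """Rebuild the stored query result; rerun when the base table has changed."""
    conn.execute("DROP TABLE IF EXISTS author_stats")  # also drops its index
    conn.execute(
        """
        CREATE TABLE author_stats AS
        SELECT author, COUNT(*) AS book_count
        FROM books
        GROUP BY author
        """
    )
    conn.execute("CREATE INDEX idx_author_stats_author ON author_stats (author)")


refresh_author_stats(conn)

# Paginate over the stored result instead of re-running the aggregate query.
page = conn.execute(
    "SELECT author, book_count FROM author_stats ORDER BY author LIMIT 2 OFFSET 2"
).fetchall()
```

As with a real materialized view, the stored result goes stale when the base table changes, so how often to refresh depends on how fresh the pages need to be.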