1. Don’t index every column
If indexes are so useful, why don’t we add them to every column?

There are a few good reasons, but a big one is that indexes add overhead to writes. Not only do indexes need to be kept up to date when we write to the table, but they can also prevent optimizations like HOT (Heap-Only Tuple) updates.

As such, the more write-heavy a table is, the more carefully you should think about the benefit of adding an index.

If you’ve inherited a database with too many indexes, you can start to understand how much each one is used with the view pg_stat_user_indexes. Remember to check any read replicas too!
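For example, a query along these lines lists each index together with how often it has been scanned since the statistics were last reset, and its size. The exact threshold for “unused” is a judgment call, and indexes that enforce uniqueness still earn their keep even with zero scans:

-- Indexes ordered by how rarely they are scanned, with their size
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       idx_scan     AS scans,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;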

2. Index columns that you filter on
The best candidates for indexes are columns that you filter on regularly. Primary keys are indexed by Postgres automatically, but foreign keys, and other columns, are not.

Let’s consider a simple users table with only two columns:

CREATE TABLE users (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email text NOT NULL
);

INSERT INTO users (email)
SELECT n || '@gmail.com' FROM generate_series(1, 100000) AS n;

ANALYZE users;
If we want to look up a user by their email address, here is the query plan:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, email FROM users WHERE email = '[email protected]';
------------------------------------------------------------
Seq Scan on users (cost=0.00..1887.00 rows=1 width=23) (actual time=21.498..34.505 rows=1 loops=1)
  Filter: (email = '[email protected]'::text)
  Rows Removed by Filter: 99999
  Buffers: shared hit=637
Planning Time: 0.113 ms
Execution Time: 34.539 ms
Now let’s add an index on the email column and get a new query plan:

CREATE INDEX users_email ON users(email);
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, email FROM users WHERE email = '[email protected]';
------------------------------------------------------------
Index Scan using users_email on users (cost=0.42..3.44 rows=1 width=23) (actual time=0.076..0.079 rows=1 loops=1)
  Index Cond: (email = '[email protected]'::text)
  Buffers: shared hit=4
Planning Time: 0.219 ms
Execution Time: 0.116 ms
We can see from the query plans that using the index is 150x more efficient (4 blocks of data, instead of 637) and 100x faster (0.335ms total instead of 34.652ms). That’s quite a difference, and this is only on 100k rows of data.

Naturally, the lower the proportion of rows being filtered out, the less difference an index will make. Once a query returns a large enough share of the table, an index scan might not even be faster than a sequential scan – as we saw in our post “Why isn’t Postgres using my index?”.

3. Only index data that you need to look up
If there is a portion of a table that you rarely look up, and almost always filter out, there is little benefit to indexing it. A common example is a table containing soft-deleted data, where queries will normally contain WHERE deleted_at IS NULL.

For these cases, Postgres has partial indexes. These are smaller, faster, and don’t need to be updated as often as full indexes. You do need to be careful, though, as they can only be used for queries where Postgres can guarantee that the index’s WHERE condition is satisfied.
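As a sketch, assuming a hypothetical orders table with customer_id and deleted_at columns, a partial index and a query that can use it might look like this:

-- Index only the rows that haven't been soft-deleted
CREATE INDEX orders_customer_id_active
    ON orders (customer_id)
    WHERE deleted_at IS NULL;

-- This query can use the index, because its WHERE clause
-- guarantees the index predicate (deleted_at IS NULL) holds
SELECT * FROM orders
WHERE customer_id = 42 AND deleted_at IS NULL;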

4. Consider other index types
You can get quite far with the trusty b-tree index – which is the default for good reasons!

But there are some cases where learning what else is available can pay off big-time. There is a good list of the built-in index types in the Postgres documentation, so I’m not going to re-hash (sorry) the details here.

The short version is that some index types can be faster, or much smaller, for specific use cases, while others can enable things that b-trees don’t support, like full-text search, or spatial queries.
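As one illustration (using a hypothetical articles table with title and body columns), a GIN index over a tsvector expression enables full-text search, which a b-tree can’t help with:

-- GIN index on a tsvector expression for full-text search
CREATE INDEX articles_body_fts
    ON articles
    USING gin (to_tsvector('english', body));

-- Queries using the same expression can be served by the index
SELECT id, title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'index & bloat');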

5. Use indexes to pre-sort data
Sorting a large dataset is an expensive operation, in both time and memory.

Sorting the same data over and over can be avoided by adding an index that matches the required sort order. As of PostgreSQL 13 (Sep 2020), it can even do an Incremental Sort, making use of an index that only satisfies the first part(s) of the sort order.
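For a sketch, assume a hypothetical events table: an index on created_at alone lets Postgres read rows already in order, and from PostgreSQL 13 it can use an Incremental Sort for ORDER BY created_at, id, only sorting the ties within each created_at value:

CREATE INDEX events_created_at ON events (created_at);

-- The plan can read rows in index order and incrementally sort by id
EXPLAIN
SELECT * FROM events
ORDER BY created_at, id
LIMIT 100;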

For an added performance boost, it is great to load data in a sorted fashion. This will reduce the number of page reads needed for queries using the sort order, making your queries more efficient, and again, faster.
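For existing data, one way to get that physical ordering is CLUSTER, which rewrites a table in the order of an index; note that it takes an exclusive lock while it runs, and new writes won’t stay in that order afterwards. Using the hypothetical events table and index from above:

-- One-off rewrite of the table in index order (locks the table)
CLUSTER events USING events_created_at;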

6. Use multi-column indexes, but sparingly
Postgres lets you add more than one column to an index. You can order by a second (and third, etc) column, or include the additional column(s) in the payload to enable index-only scans.
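As a sketch (again using a hypothetical orders table), an index ordered by customer_id then created_at, with total_amount carried as payload, can serve this query as an index-only scan:

-- Ordered by (customer_id, created_at); total_amount stored as payload only
CREATE INDEX orders_customer_created
    ON orders (customer_id, created_at)
    INCLUDE (total_amount);

-- Filter, sort, and selected columns are all covered by the index
SELECT created_at, total_amount
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC;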

However, as we noted when we wrote about multi-column indexes, there are downsides to consider. Postgres does have other options, so our advice is to use multi-column indexes, but to do so sparingly.

7. Look after your indexes
Indexes can require a bit of looking after (or maintenance) to stay performant over time.

Vacuum (and autovacuum) will clear out deleted (and updated) entries, but it won’t re-combine pages that are mostly empty. As such, over time your index can grow in size, and reduce in efficiency: an effect often referred to as bloat.

You can monitor for signs of bloat, and you can also see clues in query plans (when you use the BUFFERS option). It is one of the types of issues pointed out in our product, pgMustard.

To completely remove bloat from an index, you can use REINDEX. As of Postgres 12, we have the very useful REINDEX CONCURRENTLY, but you do need to be careful if that ever fails, as it can leave behind invalid indexes. pg_repack and pg_squeeze are other options that involve minimal locking.
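For example, rebuilding the users_email index from earlier, and then checking for invalid indexes left behind by a failed concurrent rebuild, might look like this:

-- Rebuild without blocking reads or writes (Postgres 12+)
REINDEX INDEX CONCURRENTLY users_email;

-- List any indexes marked invalid, e.g. left over from a failed rebuild
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;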

8. Pick your battles
It is easy to spend hours tuning a single query, so it is pretty important to prioritize well.

There are lots of monitoring tools and extensions available to help you find your biggest bottlenecks. A popular tool among pgMustard customers is the trusty pg_stat_statements.
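For instance, once the extension is installed (it also needs to be listed in shared_preload_libraries), a query like the following shows where the most time is going. The column names here are the ones used since PostgreSQL 13 (total_exec_time was called total_time before that):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Queries by total execution time, slowest first
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       left(query, 80)                    AS query_start
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;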

In summary
Here’s a quick recap of the best practices we covered for indexing in PostgreSQL:

Don’t index every column

Index columns that you filter on

Only index data that you need to look up

Consider other index types

Use indexes to pre-sort data

Use multi-column indexes, but sparingly

Look after your indexes

Pick your battles

REF
https://www.pgmustard.com/blog/indexing-best-practices-postgresql
