Optimizing queries. The basics of EXPLAIN in PostgreSQL

ноября 27, 2017

Why the query runs so long? Why not use the indexes?
Probably all heard of EXPLAIN in PostgreSQL. But not so much those who understand how to use it. Himself for a long time could not find a easy to understand tutorial (badly searched?).
I hope this article will help those wishing to understand this wonderful tool.

This is not a translation, but rather the author's processing of materials Understanding EXPLAIN from Guillaume Lelarge. Some of the information is omitted, so I strongly recommend to read the original.

Not so bad

Query optimization is very important to understand the logic of the kernel of PostgreSQL.
I will try to explain. It is actually not so difficult. EXPLAIN displays the information that is essential to understanding what makes the core of every specific request. We will consider the EXPLAIN command, parallel to understanding what is happening inside PostgreSQL. Described applies to PostgreSQL 9.2 and above.

Our objectives:

the

to learn how to read and understand the EXPLAIN command
to understand what happens in PostgreSQL when you run the query

First steps

We will train on ~~cats~~ a test table with a million rows.
the

CREATE TABLE foo (c1 integer, c2 text);
INSERT INTO foo
SELECT i, md5(random()::text)
FROM generate_series(1, 1000000) AS i;

Try to read the data
the

EXPLAIN SELECT * FROM foo;

QUERY PLAN
— Seq Scan on foo (cost=0.00..18334.00 rows=1000000 width=37)
(1 row)

Read data from tables can be performed in several ways. In our case, the EXPLAIN reports that uses Seq Scan — sequential, block by block, reading the data from the table foo.
What is cost? This is not the time, but a spherical vacuum concept, designed to assess the cost of the operation. The first value is 0.00 — the cost of obtaining the first row. The second 18334.00 — the cost of getting all rows.
rows — the approximate number of rows returned when you execute a Seq Scan. This value returns the scheduler. In my case it coincides with the actual number of rows in the table.
width — the average size of single row in bytes.
Let's try to add 10 rows.
the

INSERT INTO foo
SELECT i, md5(random()::text)
FROM generate_series(1, 10) AS i;
EXPLAIN SELECT * FROM foo;

QUERY PLAN
— Seq Scan on foo (cost=0.00..18334.00 rows=1000000 width=37)
(1 row)

The value of rows has not changed. Statistics on the table the old. To update statistics calling the command ANALYZE.
the

ANALYZE foo;
EXPLAIN SELECT * FROM foo;

QUERY PLAN
— Seq Scan on foo (cost=0.00..18334.10 rows=1000010 width=37)
(1 row)

Now rows displays the correct number of rows.

What happens when you run ANALYZE?

the

Read a certain number of rows selected at random

the statistics values for each of the table columns:

The number of lines to read ANALYZE — depending on the parameter default_statistics_target.

Real data

All we have seen above in the output of the EXPLAIN — only the expectations of the planner. Let's try to compare them with the results on real data. Use EXPLAIN (ANALYZE) .
the

EXPLAIN (ANALYZE) SELECT * FROM foo;

QUERY PLAN
— Seq Scan on foo (cost=0.00..18334.10 rows=1000010 width=37) (actual time=0.012..61.524 rows=1000010 loops=1)
Total runtime: 90.944 ms
(2 rows)

This query is really executed. So if you run EXPLAIN (ANALYZE) for INSERT, DELETE, or UPDATE, your data will change. Be careful! In such cases, use the ROLLBACK command.
The output of the command information is added.
actual time — the actual time in milliseconds it took to retrieve the first row and all rows, respectively.
rows — the actual number of lines received if Seq Scan.
loops — the number of times I had to perform the operation Seq Scan.

Поиск по этому блогу

computer express