PostgreSQL is better than other SQL databases open source. Part 2

Friends, we present to your attention the second part of the translation "better Than PostgreSQL?". I hope it will evoke the same hot discussion in the comments, like the first part. And also happy to continue the discussion with you personally PG Day'16 Russia, which left quite a bit!

In the PostgreSQL slogan declares that it is "the Most advanced database of open source in the world." first part of this series, we considered storing the data — model, structures, types, and size limits, to give you a few reasons why Postgres confirms his words. In the second part we talk about data manipulation and searching, including indexing, virtual tables and query capabilities. In this episode we find out that differentiates PostgreSQL from other databases, open source, namely MySQL, MariaDB and Firebird.



the

Index


Postgres provides indexing features that others do not have the database open source. In addition to the standard indexes, it supports partial indexes, functional indexes, GiST and GIN indexes. Let's look at some of them in detail.

Partial indexes

Partial indexes can be created when you want to index only certain subset of a table. For example, only rows where the column values match specific criteria. This advantageous feature gives you the ability to maintain an adequate size of the indexes that will improve performance and reduce the occupied disk space. A key aspect of the partial indexes is that the indexed column may differ from the columns that define conditions. For example, you want to index only accounts paying users, and not those that were created for internal testing:

the
 - create index only for paying users
CREATE INDEX paying_accounts_idx ON accounts (account_id) 
WHERE account_type <> 'test';

It is important to note that sometimes in MySQL the term "partial index" is used for truncation of the indexed values to a certain number of bytes rather than restrict the indexed rows based on a condition. Partial indexes in the described form is not supported in MySQL.

Functional indices

Functional indices (or indexes based on expressions) can be created using any function to pre-calculate the column for indexing. The new values are indexed and treated as constants for query execution instead of calculated whenever a query is run. For example, if you have a web log clicks that collects clicks on a URL in whatever format they were received, you may want to create an index, leading the links to lower case to normalize the data (PostgreSQL is case sensitive: compose.io and Compose.io will be treated as different results):

the
-- create the index for the URL in lowercase
CREATE INDEX webhits_lower_urls_idx ON webhits (lower(url)); 

GIST and GIN (as well as BRIN!)

GiST (Generalized Search Tree Generalized Search Tree) allows you to combine B-tree, R-tree and user-defined index types to create a custom index with advanced queries. GiST is used in PostGIS (which we made standard on all installations of PostgreSQL January) and the OpenFTS (full text search engine open source). Postgres also supports SP-GiST, which allows you to create protezirovanie search indexes for fast search.

GIN (Generalized Inverted Index, the Generalized Inverted Index) index composite data types, which provide an opportunity to combine other types of data in different ways to create something completely individual. A detailed description of composite data types you can find in first part of this series.
The syntax for creating a GIST and GIN indexes would be the following: CREATE INDEX... ON... USING GIST|GIN.... Very simple!

In PostgreSQL 9.5 was presented by BRIN (Block Range Index), which allows you to split a large table into ranges based on the column to be indexed. This means that the query planner to scan only the range specified in the query. Also, if the index ranges are required for indexing, the disk space will be significantly less than the standard B-Tree index.

For comparison

Others before us SQL databases are closing the gap when it comes to functional indexes. In MySQL 5.7.6 were presented generated columns that can be used as functional indices. In MariaDB virtual (also known as "generated" or "calculated") columns were introduced in version 5.2, but only support using built-in functions to create columns (user-defined functions do not exist). In version 2.0 of the Firebird was presented indexing expressions using calculated columns. However, none of these databases support partial, GiST or GIN indexes. In addition, we mentioned in the first part, that native JSON data types cannot be indexed in these databases.

When you set up all the indexes and want to analyze their performance, don't forget to read the article by Matt Barr, mySidewalk "Simple check of indexes in PostgreSQL".

the

virtual tables


The virtual table for the many queries. All of us compare SQL databases allow some functionality of the virtual tables. PostgreSQL can give you more.

CTEs and recursion

Postgres supports Common Table Expressions (CTE) using the expression WITH. We demonstrated this feature in the article PostgreSQL — Series, Random and With. CTE allow you to create a virtual table in your query expressing a logical sequence of operations. Thus, they are much easier to read and test than a virtual table created by using nested queries somewhere in another part of the query. CTE in PostgreSQL can also be used recursively. This handy feature lets you go through the hierarchy repeatedly referring to himself by the query until there are no more levels of data that can be returned. Here is an example of a recursive CTE, which identificeret levels, themes, and parental relationship in the taxonomy topic:

the
-- query with recursion
WITH RECURSIVE CTE topic_taxonomy_recursive 
(level, parent_topic_name, topic_name)
AS ( 
SELECT 1, tt.parent_topic_name, tt.topic_name
FROM topic_taxonomy tt
WHERE tt.parent_topic_name = 'All Topics'
UNION ALL
SELECT ttr.level + 1, tt.parent_topic_name, tt.topic_name
FROM topic_taxonomy_recursive ttr, tt topic_taxonomy
WHERE ttr.topic_name = tt.parent_topic_name
) SELECT level, parent_topic_name, topic_name
FROM topic_taxonomy_recursive;

MySQL and MariaDB do not use the condition WITH and, therefore, technically do not support CTE. Although you can use nested queries to create these databases derived tables, they do not allow recursion. In addition, despite the fact that the query optimizer in MySQL has been refined since the release of version 5.6, nested queries in this database are known for their problematic and can significantly affect performance. Firebird in this matter is ahead of MySQL and MariaDB, and has the same functionality Postgresol: supports CTE using the WITH and provides the possibility of recursion.

Materialized views

Materialized views (Materialized views) is another handy feature is a virtual table that is supported by PostgreSQL. They, like normal views, can you imagine the result of a query that you will often use, but the difference is that the result is stored on disk as a regular table. Materialized views can be indexed. In addition, unlike conventional performances, which re-created every time they are called, represent a stored result is recorded in time. They are not updated, if you do not do it intentionally. This can significantly increase the speed with which the implementation of queries using materialized views. Instead of using conventional ideas or the need to perform complex table joins or to perform grouping function in the query, use of materialized views, where all the necessary data is already prepared and waiting on the disk. When you need to refresh data in materialized view with a stored result, it can be done on demand by using the REFRESH command. Here is an example of materialized view that provides summary income data:
the
-- create a materialized view that contains summary data about income
CREATE MATERIALIZED VIEW aggregatedMonthlyRevenue 
(year, month, total_revenue)
AS ( 
SELECT date_part('year', date) AS year,
date_part('month', date) AS month,
SUM(revenue) AS total_revenue
Revenue FROM
WHERE date >= '2014-01-01'
GROUP BY date_part('year', date),
date_part('month', date)
ORDER BY date_part('year', date),
date_part('month', date)
);

-- update the view, if necessary
REFRESH MATERIALIZED VIEW aggregatedMonthlyRevenue; 

Firebird, MySQL and MariaDB do not support materialized views, although you can use these databases a kind of a workaround by creating a table and using a stored procedure or a trigger to update it as necessary.

the

query Capabilities


Query capabilities in Postgres extensive.

We have already talked a little about WITH in the previous section. Let's look at some additional features that can be used in SELECT queries.

Combining queries

PostgreSQL provides UNION, INTERSECT, and EXCEPT for interaction between SELECT queries. UNION adds the results of the second SELECT query to the results of the first. INTERSECT returns only those rows that match both SELECT queries. EXCEPT returns only rows from the first SELECT query that don't match with rows from the second SELECT query. Let's look at an example using EXCEPT where we want to restore users ' contact information except for those cases when the user received an email in the past week and answered it.

the
/*
the query to select the info
about users who did not receive
letters in the last week
*/
SELECT c.lastName, c.firstName, c.email 
FROM customers c 
EXCEPT 
SELECT e.lastName, e.firstName, e.email 
FROM e in email_log 
WHERE e.email_date > current_date - interval '7 days' 
AND e.email_action_date > current_date - interval '7 days'
AND email_action_type = 'response';

Although MySQL, MariaDB, and Firebird support the UNION, none of them supports neither INTERSECT nor EXCEPT. Nevertheless, using the Union query and the EXISTS, it is possible to obtain the same result in PostgreSQL. But the query will be more complex.

Window functions

Window functions that represent an aggregate function over some rows of the result (providing a "window" in the subset), can be extremely beneficial. In fact, they allow you to iterate through the rows in the partitions that belong to the current row to perform a function. Standard functions include ROW_NUMBER(), RANK(), DENSE_RANK() and PERCENT_RANK(). The key word OVER optionally used with PARTITION BY and ORDER BY, shows that you are using the window function. As an example, in the section "Functions and not only" below, we used a windowing function with ROW_NUMBER() OVER... to determine the median in a series of numeric values. Note that the condition WINDOW queries with window functions is not mandatory, but allows you to create and name window to save the order.

Firebird, MySQL and MariaDB currently do not support window functions, although they were announced a few years ago when planning for Firebird 3.

Lateral subqueries

The keyword LATERAL can be applied to nested queries in the FROM clause to add a cross-reference between a subquery and the other tables or virtual tables that were created before him. This way you can write more simple queries. It works in such a way that each row is evaluated compared to the table that leads to cross-reference, which could mean improvement in speed during execution of the query. Here is an example in which we want to obtain a list of students and information about whether they have read in the last time something on the topic of technology:

the
-- query using LATERAL subquery
SELECT s.firstName, s.LastName, x.topic_name 
FROM students s 
Content_log JOIN c ON c.student_id = s.id 
LEFT OUTER JOIN LATERAL ( 
SELECT t.topic_name
FROM content_topics t
WHERE t.parent_topic_name = 'Technology'

AND c.date > current_date - interval '30 days'
) x ON true;

MySQL, Firebird and MariaDB to date do not support lateral subqueries. Again, you can find workarounds, but it will make the queries more complex.

And one more thing to note: MySQL and MariaDB do not support FULL OUTER JOIN, but you can use a workaround using UNION ALL to merge all the rows from the two tables.

the

Functions and not only


PostgreSQL provides a robust built-in operators and functions, including those that support specialized data types, discussed in detail in first part of this series. In addition, it allows you to create your own operators and functions (including aggregates), as well as stored procedures and triggers. We will not be able to consider in detail all of them because the topic is too broad (!), but let's look at a few simple examples of functions.

Postgres supports 4 kinds of user defined functions: the query language, procedural language, C-language and internal. Each type can take and return both basic and composite data types. Note that in PostgreSQL, the command CREATE FUNCTION is used not only for creating functions and stored procedures.

Let's look at an example of creating a function that returns a composite data type:

the
-- create a new composite type called "datetext"
CREATE TYPE datetext AS ( 
date date,
date_as_text text
);

/*
create a function that takes a date value
and returns the date and the corresponding datetext
*/
CREATE FUNCTION show_date_as_text(date) 
RETURNS datetext -- this is our composite type 
AS 
$$ SELECT $1, to_char($1, 'FMMonth FMDD, YYYY') $$
LANGUAGE SQL;

-- query using a function
SELECT show_date_as_text('2015-01-01');

-- Returns: (2015-01-01,"January 1, 2015")

Here is an example of a real function for finding the median in a series of numeric data:

the
-- create a function that finds the median in a series of numerical data
CREATE FUNCTION median(numeric[]) 
RETURNS numeric 
AS 
$$ SELECT AVG(x.result)
FROM (
SELECT result 
ROW_NUMBER() OVER (ORDER BY val) as ra,
ROW_NUMBER() OVER (ORDER BY val DESC) as rd
FROM unnest($1) result -- notice the use of array "unnest"
) AS x
WHERE x.ra BETWEEN x.rd - 1 AND x.rd + 1;
$$
LANGUAGE SQL;

-- query using a function
SELECT median(ARRAY[1,2,3,4,5,6,7]);

-- Returns: 4

Although other SQL database open source, participates in this comparison also allow you to create your own functions, stored procedures and triggers, they have no such diversity of data types and opportunities for customization as Postgres. Besides, PG allows you to create your own operators. Compare other databases do not support user-defined operators.

The possibility of individualization of Postgres and have no equal among MySQL, MariaDB and Firebird.

the

Language extensions


PostgreSQL has many language extensions. Some of them are part of the distribution, and many others are available through third-party developers.

In Compose we only support trusted (trusted) language extensions for PostgreSQL, to ensure the safety of your installations. We have added support for PL/Perl in February and PL/v8 (procedural language based on JavaScript) — in August. These language extensions that have more built-in functionality than the built-in PL/pgSQL based on SQL (also available in the Compose installations), allow you to create sophisticated scripts for manipulating and processing data on the server.

the

Summing up


PostgreSQL is extremely rich in functionality, with many built-in “features” and countless ways of their customization and extensions to meet your needs. Add to that the universally recognized reliability and maturity, it will become clear why this decision for databases is worth the effort of any large enterprise. It is affordable and effective for small projects.
Despite the fact that we told you only about a small set of features that set Postgres compared to other SQL solutions, open source, actually them much more (and more appeared in version 9.5!). We hope that this series of two articles provided a compelling overview of the reasons why you should choose PostgreSQL.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

Briefly on how to make your Qt geoservice plugin

Database replication PostgreSQL-based SymmetricDS

Yandex.Widget + adjustIFrameHeight + MooTools