Speed up queries on text columns using :gin indices with [Rails & PostgreSQL]

Before we start

This post assumes that you are aware of database indices and performance benefits they can bring. If not, please check out this link for a quick introduction.

Context

Yesterday, I noticed a very slow background process for a rather small task through Appsignal.

Quickly checking the codebase, the problem was because of doing a full text search on a column in the table with over 2+ million records, unindexed. Such as:

MyTable.where(
  "my_unindexed_column LIKE ?", "%some-text%"
)

Unsurprisingly, this query ended up being very slow due to missing indices. As I found our usual suspect, I added the index using the Rails DSL for migrations:


class AddIndexToUnindexedColumnOnMyTable < ActiveRecord::Migration
  # You should disable the migration being wrapped inside a database transaction,
  # to be able to add the index concurrently. Else, you may have a long wait
  # and read/writes to the table will be blocked until the migration is completed,
  # which wouldn't be nice :).

  disable_ddl_transaction!

  def change
    add_index :my_table, :column_to_index, algorithm: :concurrently
  end
end

Which added an index using default :btree strategy:

# db/schema.rb
# ...
add_index "my_table", ["column_to_index"], name: "index_my_table_on_column_to_index", using: :btree
# ...

This has worked fine locally, which was then deployed to test environments with no issues.

However, deploying to production with the table containing millions of records, I have hit the brick with error:

  01 PG::ProgramLimitExceeded: ERROR:  index row requires 28536 bytes, maximum size is 8191
  01 : CREATE  INDEX CONCURRENTLY "index_my_table_on_my_unindexed_column" ON "my_table"  ("my_unindexed_column")

Problem

Adding index on text columns using :btree strategy won’t work out due to huge size of the created index. We are better to use :gin strategy.

Gin stands for generalized inverted indices. They are most useful when we have multiple values stored in a single column, which was indeed the case.

Other examples can be for columns with hstore, array, jsonb and range types.

So, let’s add an index using :gin strategy.

Solution

Make sure you have extension enabled

Check your schema.rb or structure.sql file to see if btree_gin extension enabled.

If not, add it to your migration file as below or you will face issues because of missing extension.

Proceed to add the index concurrently

I have added a comment in migration snippet above ^ on why it’s important to add the index concurrently.

class CreateIndexOnFullText < ActiveRecord::Migration
  disable_ddl_transaction!

  def change
    enable_extension "btree_gin"

    add_index :my_table, :column_to_index, using: :gin, algorithm: :concurrently
  end
end

That’s it, you should see changes on your schema.rb or structure.sql file after running the migration. Note that your DB user should have have the SUPERUSER privileges in order to run enable_extension.

Now, go ahead and check the performance of the query again, which should be improved.

Try checking the query before, and after adding the index

To verify that you may indeed receive performance gains by adding the index, you can use the .explain method of ActiveRecord and see the costs. Also, keep your eye on load time for the query.

Compare with the results that you get before and after adding the index.

MyTable
  .where("my_column LIKE ?", "%test-value%")
  .explain

  MyTable Load (3.3ms)  SELECT "id".* FROM "my_tables" WHERE my_column LIKE '%test%'
=> EXPLAIN for: SELECT "id".* FROM "my_tables" WHERE my_column LIKE '%test%'
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Seq Scan on my_tables (cost=0.00..56.50 rows=1 width=5114)
   Filter: (my_column ~~ '%test%'::text)
(2 rows)

Indexing using :gin strategy to perform a full text search in PostgreSQL can give us a nice boost in performance, however, there is no silver bullet and it may not fit the problem all the time.

Using GIN may not be that beneficial if we are just searching for small substrings.

Thus, it’s important to check the results and be the judge for its benefits.

I have tested this on Postgres 9.4 & Rails 4.2, and Postgres 12.3 & Rails 5.2.