Before we start…⌗
This post assumes that you are aware of database indices and performance benefits they can bring. If not, please check out this link for a quick introduction.
Yesterday, I noticed a very slow background process for a rather small task through Appsignal.
Quickly checking the codebase, the problem was because of doing a full text search on
a column in the table with over 2+ million records,
unindexed. Such as:
MyTable.where( "my_unindexed_column LIKE ?", "%some-text%" )
Unsurprisingly, this query ended up being very slow due to missing indices. As I found our usual suspect, I added the index using the Rails DSL for migrations:
class AddIndexToUnindexedColumnOnMyTable < ActiveRecord::Migration # You should disable the migration being wrapped inside a database transaction, # to be able to add the index concurrently. Else, you may have a long wait # and read/writes to the table will be blocked until the migration is completed, # which wouldn't be nice :). disable_ddl_transaction! def change add_index :my_table, :column_to_index, algorithm: :concurrently end end
Which added an index using default
# db/schema.rb # ... add_index "my_table", ["column_to_index"], name: "index_my_table_on_column_to_index", using: :btree # ...
This has worked fine locally, which was then deployed to test environments with no issues.
However, deploying to production with the table containing millions of records, I have hit the brick with error:
01 PG::ProgramLimitExceeded: ERROR: index row requires 28536 bytes, maximum size is 8191 01 : CREATE INDEX CONCURRENTLY "index_my_table_on_my_unindexed_column" ON "my_table" ("my_unindexed_column")
Adding index on
text columns using
:btree strategy won’t work out due to huge
size of the created index. We are better to use
Gin stands for
generalized inverted indices. They are most useful
when we have multiple values stored in a single column, which was indeed the case.
Other examples can be for columns with
So, let’s add an index using :gin strategy.
Make sure you have extension enabled⌗
structure.sql file to see if
btree_gin extension enabled.
If not, add it to your migration file as below or you will face issues because of missing extension.
Proceed to add the index concurrently⌗
I have added a comment in migration snippet above ^ on why it’s important to add the index concurrently.
class CreateIndexOnFullText < ActiveRecord::Migration disable_ddl_transaction! def change enable_extension "btree_gin" add_index :my_table, :column_to_index, using: :gin, algorithm: :concurrently end end
That’s it, you should see changes on your
structure.sql file after running the migration.
Note that your DB user should have have the SUPERUSER privileges in order to run
Now, go ahead and check the performance of the query again, which should be improved.
Try checking the query before, and after adding the index⌗
To verify that you may indeed receive performance gains by adding the index, you can
.explain method of ActiveRecord and see the
costs. Also, keep your eye on
load time for the query.
Compare with the results that you get before and after adding the index.
MyTable .where("my_column LIKE ?", "%test-value%") .explain MyTable Load (3.3ms) SELECT "id".* FROM "my_tables" WHERE my_column LIKE '%test%' => EXPLAIN for: SELECT "id".* FROM "my_tables" WHERE my_column LIKE '%test%' QUERY PLAN -------------------------------------------------------------------------------------------- Seq Scan on my_tables (cost=0.00..56.50 rows=1 width=5114) Filter: (my_column ~~ '%test%'::text) (2 rows)
Indexing using :gin strategy to perform a full text search in PostgreSQL can give us a nice boost in performance, however, there is no silver bullet and it may not fit the problem all the time.
Using GIN may not be that beneficial if we are just searching for small substrings.
Thus, it’s important to check the results and be the judge for its benefits.
I have tested this on Postgres 9.4 & Rails 4.2, and Postgres 12.3 & Rails 5.2.