Postgres Vector Search

Full-text search in PostgreSQL is implemented using a concept called “text search vectors” (or just “tsvector”). Let’s dive into how it works:

Text Search Vectors (tsvector):
- A tsvector is a sorted list of distinct lexemes, which are words that have been normalized to merge different forms of the same word (e.g., “run” and “running”).
- PostgreSQL provides functions to convert plain text into tsvector format, which typically involves:
  - Parsing the text into tokens.
  - Converting tokens to lexemes.
  - Removing stop words (common words like “and” or “the” that are typically ignored in searches).
- Example: The text “The quick brown fox” might be represented in tsvector as ‘brown’:3 ‘fox’:4 ‘quick’:2.
Text Search Queries (tsquery):
- A tsquery represents a text search query, which includes lexemes and optional operators.
- Operators can be used to combine lexemes in different ways (e.g., AND, OR, NOT).
- Example: The query “quick & fox” would match any tsvector containing both “quick” and “fox”.
Searching:
- PostgreSQL provides the @@ operator to search a tsvector column with a tsquery.
- Example: WHERE column @@ to_tsquery(‘english’, ‘quick & fox’).
Ranking:
- Once you’ve found matches using the @@ operator, you often want to rank them by relevance.
- PostgreSQL provides the ts_rank function to rank results. It returns a number indicating how relevant a tsvector is to a tsquery.
- The ranking is based on various factors, including the frequency of lexemes and their proximity to each other in the text.
Indexes:
- One of the significant advantages of tsvector is that you can create a GiST or GIN index on it.
- These indexes significantly speed up full-text search queries.
- GIN indexes, in particular, are optimized for tsvector and provide very fast lookups.
Normalization and Configuration:
- PostgreSQL supports multiple configurations (e.g., “english”, “french”) that determine how text is tokenized and which stop words are used.
- This allows you to tailor your full-text search to specific languages or requirements.
Highlighting and Snippets:
- In addition to just searching, PostgreSQL provides functions like ts_headline to return snippets of the original text with search terms highlighted.

In summary, PostgreSQL’s full-text search works by converting regular text into a normalized format (tsvector) that is optimized for searching. This combined with powerful query capabilities (tsquery) and indexing options makes it a robust solution for many full-text search needs.

Implementing vector search using E.F Core and Postgres SQL

here are the steps to implement vector search in your dot net project:

Step 1: Add the required nuget packages

   
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="7.0.11" />
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL.Design" Version="1.1.0" />
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL.NetTopologySuite" Version="7.0.11" />

Step 2: Implement a vector in your entities by implementing properties of type NpgsqlTsVector as shown below

public class Blog
{
    public int Id { get; set; }
    public string Title { get; set; }
    public NpgsqlTsVector SearchVector { get; set; }
}

Step 3: add a computed column in your DbContext

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Blog>()
        .Property(b => b.SearchVector)
      .HasComputedColumnSql("to_tsvector('english', \"Blogs\".\"Title\")", stored: true);
}

in this case you are calculating the vector using the value of the title column of the blogs table, you can calculate the vector using a single column or a combination of columns

Now you are ready to use vector search in your queries, please check the example below

var searchTerm = "Jungle"; // Example search term
var searchVector = NpgsqlTsVector.Parse(searchTerm);

var blogs = context.Blogs
    .Where(p => p.SearchVector.Matches(searchTerm))
    .OrderByDescending(td => td.SearchVector.Rank(EF.Functions.ToTsQuery(searchTerm))).ToList();

In real world scenarios its better to create a vector by joining the values of several columns and weight them according to the relevance for your business case, you can check the test project I have created here : https://github.com/egarim/PostgresVectorSearch

and that’s it for this post, until next time, happy coding ))

PostgreSQL Full-text search using “text search vectors”

Postgres Vector Search

Implementing vector search using E.F Core and Postgres SQL

Submit a Comment Cancel reply

Search

Recent Posts

Categories

Archives