PostgreSQL Full-text search using “text search vectors”

PostgreSQL Full-text search using “text search vectors”

Postgres Vector Search

Full-text search in PostgreSQL is implemented using a concept called “text search vectors” (or just “tsvector”). Let’s dive into how it works:

  1. Text Search Vectors (tsvector):
    • A tsvector is a sorted list of distinct lexemes, which are words that have been normalized to merge different forms of the same word (e.g., “run” and “running”).
    • PostgreSQL provides functions to convert plain text into tsvector format, which typically involves:
      • Parsing the text into tokens.
      • Converting tokens to lexemes.
      • Removing stop words (common words like “and” or “the” that are typically ignored in searches).
    • Example: The text “The quick brown fox” might be represented in tsvector as ‘brown’:3 ‘fox’:4 ‘quick’:2.
  2. Text Search Queries (tsquery):
    • A tsquery represents a text search query, which includes lexemes and optional operators.
    • Operators can be used to combine lexemes in different ways (e.g., AND, OR, NOT).
    • Example: The query “quick & fox” would match any tsvector containing both “quick” and “fox”.
  3. Searching:
    • PostgreSQL provides the @@ operator to search a tsvector column with a tsquery.
    • Example: WHERE column @@ to_tsquery(‘english’, ‘quick & fox’).
  4. Ranking:
    • Once you’ve found matches using the @@ operator, you often want to rank them by relevance.
    • PostgreSQL provides the ts_rank function to rank results. It returns a number indicating how relevant a tsvector is to a tsquery.
    • The ranking is based on various factors, including the frequency of lexemes and their proximity to each other in the text.
  5. Indexes:
    • One of the significant advantages of tsvector is that you can create a GiST or GIN index on it.
    • These indexes significantly speed up full-text search queries.
    • GIN indexes, in particular, are optimized for tsvector and provide very fast lookups.
  6. Normalization and Configuration:
    • PostgreSQL supports multiple configurations (e.g., “english”, “french”) that determine how text is tokenized and which stop words are used.
    • This allows you to tailor your full-text search to specific languages or requirements.
  7. Highlighting and Snippets:
    • In addition to just searching, PostgreSQL provides functions like ts_headline to return snippets of the original text with search terms highlighted.

In summary, PostgreSQL’s full-text search works by converting regular text into a normalized format (tsvector) that is optimized for searching. This combined with powerful query capabilities (tsquery) and indexing options makes it a robust solution for many full-text search needs.

Implementing vector search using E.F Core and Postgres SQL

here are the steps to implement vector search in your dot net project:

Step 1: Add the required nuget packages

   
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="7.0.11" />
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL.Design" Version="1.1.0" />
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL.NetTopologySuite" Version="7.0.11" />

Step 2: Implement a vector in your entities by implementing properties of type NpgsqlTsVector as shown below

public class Blog
{
    public int Id { get; set; }
    public string Title { get; set; }
    public NpgsqlTsVector SearchVector { get; set; }
}

Step 3: add a computed column in your DbContext

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Blog>()
        .Property(b => b.SearchVector)
      .HasComputedColumnSql("to_tsvector('english', \"Blogs\".\"Title\")", stored: true);
}

in this case you are calculating the vector using the value of the title column of the blogs table, you can calculate the vector using a single column or a combination of columns

Now you are ready to use vector search in your queries, please check the example below

var searchTerm = "Jungle"; // Example search term
var searchVector = NpgsqlTsVector.Parse(searchTerm);

var blogs = context.Blogs
    .Where(p => p.SearchVector.Matches(searchTerm))
    .OrderByDescending(td => td.SearchVector.Rank(EF.Functions.ToTsQuery(searchTerm))).ToList();

In real world scenarios its better to create a vector by joining the values of several columns and weight them according to the relevance for your business case, you can check the test project I have created here : https://github.com/egarim/PostgresVectorSearch

and that’s it for this post, until next time, happy coding ))

 

 

 

 

P.O.U.N.D stack (Postgres, Oqtane, Ubuntu & DotNet)

P.O.U.N.D stack (Postgres, Oqtane, Ubuntu & DotNet)

A stack in software development refers to a collection of technologies, tools, and frameworks that are used together to build and run a complete application or solution. A typical stack consists of components that handle different aspects of the software development process, including frontend, backend, databases, and sometimes even the hosting environment.

A stack is often categorized into different layers based on the functionality they provide:

  1. Frontend: This layer is responsible for the user interface (UI) and user experience (UX) of an application. It consists of client-side technologies like HTML, CSS, and JavaScript, as well as libraries or frameworks such as React, Angular, or Vue.js.
  2. Backend: This layer handles the server-side logic, processing user requests, and managing interactions with databases and other services. Backend technologies can include programming languages like Python, Ruby, Java, or PHP, and frameworks like Django, Ruby on Rails, or Spring.
  3. Database: This layer is responsible for storing and managing the application’s data. Databases can be relational (e.g., MySQL, PostgreSQL, or Microsoft SQL Server) or NoSQL (e.g., MongoDB, Cassandra, or Redis), depending on the application’s data structure and requirements.
  4. Hosting Environment: This layer refers to the infrastructure where the application is deployed and run. It can include on-premises servers, cloud-based platforms like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure, or container orchestration platforms like Kubernetes or Docker Swarm.

Developers often refer to specific combinations of these technologies as named stacks. Some examples include:

  1. LAMP: Linux (operating system), Apache (web server), MySQL (database), and PHP (backend programming language).
  2. MEAN: MongoDB (database), Express.js (backend framework), Angular (frontend framework), and Node.js (runtime environment).
  3. MERN: MongoDB (database), Express.js (backend framework), React (frontend library), and Node.js (runtime environment).

Selecting a stack depends on factors such as project requirements, team expertise, performance, and scalability needs. By using a well-defined stack, developers can streamline the development process, improve collaboration, and ensure that all components work together efficiently.

The P.O.U.N.D. Stack is an innovative software development stack that combines Postgres, Oqtane, Ubuntu, and DotNet to create powerful, modern, and scalable applications. This stack is designed to leverage the strengths of each technology, providing developers with an integrated and efficient environment for building web applications.

  1. Postgres (P): As the database layer, Postgres offers robust performance, scalability, and support for advanced data types, such as GIS and JSON. Its open-source nature and active community make it a reliable choice for handling the storage and management of application data.
  2. Oqtane (O): Serving as the frontend framework, Oqtane is built on top of the Blazor technology in .NET, allowing for the creation of modern, responsive, and feature-rich user interfaces. With Oqtane, developers can create modular and extensible applications, while also benefiting from built-in features such as authentication, authorization, and multi-tenancy.
  3. Ubuntu (U): As the operating system and hosting environment, Ubuntu provides a stable, secure, and easy-to-use platform for deploying and running applications. It is widely supported and offers excellent compatibility with a variety of hardware and cloud platforms, making it an ideal choice for hosting P.O.U.N.D. Stack applications.
  4. DotNet (D): The backend layer is powered by the .NET framework, which offers a versatile and high-performance environment for server-side development. With support for multiple programming languages (such as C#, F#, and VB.NET), powerful libraries, and a large ecosystem, .NET allows developers to build scalable and efficient backend logic for their applications.

In summary, the P.O.U.N.D. Stack brings together the power of Postgres, Oqtane, Ubuntu, and DotNet to deliver a comprehensive and efficient development stack. By leveraging the unique capabilities of each technology, developers can build modern, scalable, and high-performance web applications that cater to diverse business needs.

Creating an activity stream using Postgres 15

Creating an activity stream using Postgres 15

An activity stream is a data format used to represent a list of recent activities performed by an individual or group on a social network, web application, or other platform. It typically includes information such as the type of activity (e.g., posting a status update, commenting on a post), the person or entity performing the activity, and any associated objects or targets (e.g., a photo or link). Activity streams can be used to track user behavior, personalize content recommendations, and facilitate social interactions between users.

An activity stream typically consists of the following parts:

  1. Actor: The person or entity that initiates the action.
  2. Verb: The action being taken.
  3. Object: The thing on which the action is taken.
  4. Target: The thing to which the action is directed.
  5. Time: The time at which the action occurred.
  6. Context: Any additional information about the action, such as the location or device used to perform it.
  7. Metadata: Additional information about the action, such as the user’s preferences or the permissions required to perform it.

Activity streams can be used to represent data from any system, and there is no direct relationship between the stream of activities and the associated objects.

With a basic understanding of what an activity stream is, we can leverage PostgreSQL as a database storage to implement one. PostgreSQL is particularly suitable for activity streams due to its built-in support for JSON columns, which can store data with flexible schemas, and its GIS functionality, which makes it easy to filter activities based on location.

For this project, I have chosen to use Postgres 15 with GIS extensions, as well as the DBeaver Community Edition for managing the database. The GIS extensions are especially useful for this project since we want to display only activities that occurred around specific geographical points

Let’s begin our coding journey with the creation of an object storage in PostgreSQL. The object storage will have a column to store the object type and a JSON column to store the complete data of the object being stored.

CREATE DATABASE ActivityStream;

After creating the database, the next step is to install the PostGIS extension using the following query.

CREATE EXTENSION IF NOT EXISTS postgis; -- Enable PostGIS extension

 

CREATE TABLE objectstorage (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    latitude DECIMAL(9,6) NOT NULL,
    longitude DECIMAL(9,6) NOT NULL,
    location GEOMETRY(Point, 4326), -- 4326 is the SRID for WGS 84, a common coordinate system for GPS data
    object_type TEXT NOT NULL,
    object_data JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);


CREATE OR REPLACE FUNCTION update_location() RETURNS TRIGGER AS $$
BEGIN
    NEW.location := ST_SetSRID(ST_MakePoint(NEW.longitude, NEW.latitude), 4326);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;


CREATE TRIGGER set_location
    BEFORE INSERT OR UPDATE
    ON objectstorage
    FOR EACH ROW
    EXECUTE FUNCTION update_location();

This query creates a table named objectstorage with columns for id, latitude, longitude, location, object_type, object_data, created_at, and updated_at. The id column is a primary key and generates a random UUID as its default value. The latitude and longitude columns store decimal values for geographic coordinates. The location column stores a geometry object of type Point using the WGS 84 coordinate system with SRID 4326. The object_type column stores the type of the object being stored, and the object_data column stores the complete data for the object in JSONB format. The created_at and updated_at columns store timestamps for when the row was created and last updated, respectively.

Additionally, this query creates a trigger function named update_location() that is triggered when a row is inserted or updated in the objectstorage table. The function updates the location column based on the values in the latitude and longitude columns using the ST_SetSRID() and ST_MakePoint() functions from PostGIS. The ST_SetSRID() function sets the coordinate system for the point, and the ST_MakePoint() function creates a point geometry object from the latitude and longitude values. The function returns the updated row.

To simplify our database interactions, we’ll create UPSERT functions as needed. Here’s an example of an UPSERT function we can use for the objectstorage table.

CREATE OR REPLACE FUNCTION upsert_objectstorage(
    p_id UUID, 
    p_latitude DECIMAL(9,6), 
    p_longitude DECIMAL(9,6),
    p_object_type TEXT,
    p_object_data JSONB
) RETURNS VOID AS $$
BEGIN
    -- Try to update the existing row
    UPDATE objectstorage SET
        latitude = p_latitude,
        longitude = p_longitude,
        location = ST_SetSRID(ST_MakePoint(p_longitude, p_latitude), 4326),
        object_type = p_object_type,
        object_data = p_object_data,
        updated_at = CURRENT_TIMESTAMP
    WHERE id = p_id;
    
    -- If no row was updated, insert a new one
    IF NOT FOUND THEN
        INSERT INTO objectstorage (id, latitude, longitude, location, object_type, object_data)
        VALUES (p_id, p_latitude, p_longitude, ST_SetSRID(ST_MakePoint(p_longitude, p_latitude), 4326), p_object_type, p_object_data);
    END IF;
END;
$$ LANGUAGE plpgsql;

Below is the code for the “activity” table, which is the central piece of an activity stream system. It includes a trigger function that updates the “location” column using PostGIS.

CREATE TABLE activity (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    verb TEXT NOT NULL,
    actor_id UUID NOT NULL REFERENCES objectstorage(id),
    object_id UUID NOT NULL REFERENCES objectstorage(id),
    target_id UUID REFERENCES objectstorage(id),
    latitude DECIMAL(9,6) NOT NULL,
    longitude DECIMAL(9,6) NOT NULL,
    location GEOMETRY(Point, 4326) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE OR REPLACE FUNCTION update_activity_location() RETURNS TRIGGER AS $$
BEGIN
    NEW.location := ST_SetSRID(ST_MakePoint(NEW.longitude, NEW.latitude), 4326);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_activity_location
    BEFORE INSERT OR UPDATE
    ON activity
    FOR EACH ROW
    EXECUTE FUNCTION update_activity_location();

 

Now the UPSERT function for the activity table

 

CREATE OR REPLACE FUNCTION upsert_activity(
    p_id UUID,
    p_verb TEXT,
    p_actor_id UUID,
    p_object_id UUID,
    p_target_id UUID,
    p_latitude DECIMAL(9,6),
    p_longitude DECIMAL(9,6)
) RETURNS VOID AS $$
BEGIN
    -- Try to update the existing row
    UPDATE activity SET
        verb = p_verb,
        actor_id = p_actor_id,
        object_id = p_object_id,
        target_id = p_target_id,
        latitude = p_latitude,
        longitude = p_longitude,
        location = ST_SetSRID(ST_MakePoint(p_longitude, p_latitude), 4326),
        updated_at = CURRENT_TIMESTAMP
    WHERE id = p_id;
    
    -- If no row was updated, insert a new one
    IF NOT FOUND THEN
        INSERT INTO activity (id, verb, actor_id, object_id, target_id, latitude, longitude, location)
        VALUES (p_id, p_verb, p_actor_id, p_object_id, p_target_id, p_latitude, p_longitude, ST_SetSRID(ST_MakePoint(p_longitude, p_latitude), 4326));
    END IF;
END;
$$ LANGUAGE plpgsql;



To avoid serialization issues and redundant code, we’ll modify our queries to return JSON arrays. We’ll add a new column named “self” to the activity table and create a trigger that saves the current activity values in JSON format.

 

ALTER TABLE activity ADD COLUMN self JSON;

CREATE OR REPLACE FUNCTION update_activity_self() RETURNS TRIGGER AS $$
BEGIN
    NEW.self = json_build_object(
        'id', NEW.id,
        'verb', NEW.verb,
        'actor_id',NEW.actor_id,
        'actor', (SELECT object_data FROM objectstorage WHERE id = NEW.actor_id),
        'object_id',NEW.object_id,
        'object', (SELECT object_data FROM objectstorage WHERE id = NEW.object_id),
        'target_id',NEW.target_id,
        'target', (SELECT object_data FROM objectstorage WHERE id = NEW.target_id),
        'latitude', NEW.latitude,
        'longitude', NEW.longitude,
        'created_at', NEW.created_at,
        'updated_at', NEW.updated_at
    )::jsonb;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER activity_self_trigger
    BEFORE INSERT OR UPDATE ON activity
    FOR EACH ROW
    EXECUTE FUNCTION update_activity_self();

CREATE OR REPLACE FUNCTION get_activities_by_distance_as_json(
    p_lat NUMERIC,
    p_long NUMERIC,
    p_distance INTEGER,
    p_page_num INTEGER,
    p_page_size INTEGER
) 
RETURNS JSON
AS $$
DECLARE
    activities_json JSON;
BEGIN
    SELECT json_agg(a.self) INTO activities_json
    FROM (
        SELECT a.self
        FROM activity a
        WHERE ST_DWithin(location::geography, ST_SetSRID(ST_Point(p_long, p_lat), 4326)::geography, p_distance)
        ORDER BY created_at DESC
        LIMIT p_page_size
        OFFSET (p_page_num - 1) * p_page_size
    ) a;
    
    RETURN activities_json;
END;
$$ LANGUAGE plpgsql;

 

An activity stream without a follow functionality would defeat the main purpose of an activity stream, which is to keep track of the activities of other actors without the need to constantly visit their profile page.

So here is the code for the follow functionality

CREATE TABLE follow (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    follower_id UUID NOT NULL REFERENCES objectstorage(id),
    followee_id UUID NOT NULL REFERENCES objectstorage(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

CREATE OR REPLACE FUNCTION follow_user(
    p_follower_id UUID,
    p_followee_id UUID
) RETURNS VOID AS $$
BEGIN
    -- Try to insert a new row into the follow table
    -- If the row already exists, do nothing
    BEGIN
        INSERT INTO follow (follower_id, followee_id)
        VALUES (p_follower_id, p_followee_id);
    EXCEPTION WHEN unique_violation THEN
        RETURN;
    END;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION unfollow_user(
    p_follower_id UUID,
    p_followee_id UUID
) RETURNS VOID AS $$
BEGIN
    -- Delete the row from the follow table where the follower_id and followee_id match
    DELETE FROM follow
    WHERE follower_id = p_follower_id AND followee_id = p_followee_id;
END;
$$ LANGUAGE plpgsql;

 

To create an activity stream, we need to first identify the actors that we are following. To accomplish this, we can define a function that takes an ID of an object from our object storage and retrieves the IDs of all the actors that are being followed by that object.

Here’s the function code:

CREATE OR REPLACE FUNCTION get_following_ids(p_user_id UUID)
RETURNS UUID[] AS $$
DECLARE
  following_ids UUID[];
BEGIN
  SELECT ARRAY_AGG(followee_id) INTO following_ids
  FROM follow
  WHERE follower_id = p_user_id;
  
  RETURN following_ids;
END;
$$ LANGUAGE plpgsql;


 

Now that we have obtained the list of actors we are following, the next step is to retrieve their activities. However, this can be a challenging task due to two reasons: first, using a relational database could result in complex joins that could slow down the data retrieval process; second, the actors we are following might have produced a large number of activities, and retrieving them all at once could potentially overload the server. To address these issues, we will introduce pagination to our queries to ensure efficient and scalable data retrieval.

 

CREATE OR REPLACE FUNCTION get_activities_by_following(p_page_num INTEGER, p_page_size INTEGER, p_following_ids UUID[])
    RETURNS TABLE (
        id UUID,
        verb TEXT,
        actor_id UUID,
        object_id UUID,
        target_id UUID,
        latitude DECIMAL(9,6),
        longitude DECIMAL(9,6),
        location GEOMETRY(Point, 4326),
        self_data JSON,
        created_at TIMESTAMP WITH TIME ZONE,
        updated_at TIMESTAMP WITH TIME ZONE
    ) AS $$
BEGIN
    RETURN QUERY
    SELECT a.id, a.verb, a.actor_id, a.object_id, a.target_id, a.latitude, a.longitude, a.location, a."self" , a.created_at, a.updated_at
    FROM activity a
    WHERE a.actor_id = ANY(p_following_ids)
    ORDER BY a.created_at DESC
    LIMIT p_page_size
    OFFSET (p_page_num - 1) * p_page_size;
END;
$$ LANGUAGE plpgsql;

 

We need a function that takes the result produced by the get_activities_by_following function, and converts it into a JSON array.

 

CREATE OR REPLACE FUNCTION get_activities_by_following_as_json(p_page_num INTEGER, p_page_size INTEGER, p_user_id UUID)
RETURNS JSON AS $$
DECLARE
    following_ids UUID[] := ARRAY(SELECT get_following_ids(p_user_id));
BEGIN
    RETURN (SELECT json_agg(self_data) FROM get_activities_by_following(p_page_num, p_page_size, following_ids));
END;
$$ LANGUAGE plpgsql;

 

To demonstrate our activity stream system, we need to create sample data. Let’s create 5 users and have them post ads on our objectstorage table.

 

--create users and activities

SELECT upsert_objectstorage(
    'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', -- object ID 1
    59.9311, -- latitude
    30.3609, -- longitude
    'user', -- object type
    '{"name": "Alice", "age": 27, "email": "alice@example.com", "picture_url": "https://example.com/pictures/alice.jpg"}' -- object data in JSON format
);
SELECT upsert_objectstorage(
    'cc7ebda2-019c-4387-925c-352f7e1f0b10', -- object ID 2
    59.9428, -- latitude
    30.3071, -- longitude
    'user', -- object type
    '{"name": "Bob", "age": 33, "email": "bob@example.com", "picture_url": "https://example.com/pictures/bob.jpg"}' -- object data in JSON format
);

SELECT upsert_objectstorage(
    '99875f15-49ee-4e6d-b356-cbab4f4e4a4c', -- object ID 3
    59.9375, -- latitude
    30.3086, -- longitude
    'user', -- object type
    '{"name": "Charlie", "age": 42, "email": "charlie@example.com", "picture_url": "https://example.com/pictures/charlie.jpg"}' -- object data in JSON format
);

SELECT upsert_objectstorage(
    '34f6c0a5-5d5e-463f-a2cf-11b7529a92a1', -- object ID 4
    59.9167, -- latitude
    30.25, -- longitude
    'user', -- object type
    '{"name": "Dave", "age": 29, "email": "dave@example.com", "picture_url": "https://example.com/pictures/dave.jpg"}' -- object data in JSON format
);

SELECT upsert_objectstorage(
    '8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', -- object ID 5
    59.9391, -- latitude
    30.3158, -- longitude
    'user', -- object type
    '{"name": "Eve", "age": 25, "email": "eve@example.com", "picture_url": "https://example.com/pictures/eve.jpg"}' -- object data in JSON format
);

--create ads

-- Bob's ad
SELECT upsert_objectstorage(
'f6c7599e-8161-4d54-82ec-faa13bb8cbf7', -- object ID
59.9428, -- latitude (near Saint Petersburg)
30.3071, -- longitude (near Saint Petersburg)
'ad', -- object type
'{"description": "Vintage bicycle, good condition", "ad_type": "sale", "picture_url": "https://example.com/pictures/bicycle.jpg"}' -- object data in JSON format
);

SELECT upsert_activity(
gen_random_uuid(), -- activity ID
'post', -- verb
'cc7ebda2-019c-4387-925c-352f7e1f0b10', -- actor ID (Bob)
'f6c7599e-8161-4d54-82ec-faa13bb8cbf7', -- object ID (Bob's ad)
NULL, -- target ID (no target)
59.9428, -- latitude (near Saint Petersburg)
30.3071 -- longitude (near Saint Petersburg)
);

-- Charlie's ad
SELECT upsert_objectstorage(
'41f7c558-1cf8-4f2b-b4b4-4d4e4df50843', -- object ID
59.9375, -- latitude (near Saint Petersburg)
30.3086, -- longitude (near Saint Petersburg)
'ad', -- object type
'{"description": "Smartphone, unlocked", "ad_type": "sale", "picture_url": "https://example.com/pictures/smartphone.jpg"}' -- object data in JSON format
);

SELECT upsert_activity(
gen_random_uuid(), -- activity ID
'post', -- verb
'99875f15-49ee-4e6d-b356-cbab4f4e4a4c', -- actor ID (Charlie)
'41f7c558-1cf8-4f2b-b4b4-4d4e4df50843', -- object ID (Charlie's ad)
NULL, -- target ID (no target)
59.9375, -- latitude (near Saint Petersburg)
30.3086 -- longitude (near Saint Petersburg)
);


-- Dave's ad
SELECT upsert_objectstorage(
'c3dd7b47-1bba-4916-8a6a-8b5f2b50ba88', -- object ID
59.9139, -- latitude (near Saint Petersburg)
30.3341, -- longitude (near Saint Petersburg)
'ad', -- object type
'{"description": "Vintage camera, working condition", "ad_type": "exchange", "picture_url": "https://example.com/pictures/camera.jpg"}' -- object data in JSON format
);

SELECT upsert_activity(
gen_random_uuid(), -- activity ID
'post', -- verb
'34f6c0a5-5d5e-463f-a2cf-11b7529a92a1', -- actor ID (Dave)
'c3dd7b47-1bba-4916-8a6a-8b5f2b50ba88', -- object ID (Dave's ad)
NULL, -- target ID (no target)
59.9139, -- latitude (near Saint Petersburg)
30.3341 -- longitude (near Saint Petersburg)
);

-- Eve's ad
SELECT upsert_objectstorage(
'3453f3c1-296d-47a5-a5a5-f5db5ed3f3b3', -- object ID
59.9375, -- latitude (near Saint Petersburg)
30.3141, -- longitude (near Saint Petersburg)
'ad', -- object type
'{"description": "Plants, various types", "ad_type": "want", "picture_url": "https://example.com/pictures/plants.jpg"}' -- object data in JSON format
);

SELECT upsert_activity(
gen_random_uuid(), -- activity ID
'post', -- verb
'8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', -- actor ID (Eve)
'3453f3c1-296d-47a5-a5a5-f5db5ed3f3b3', -- object ID (Eve's ad)
NULL, -- target ID (no target)
59.9375, -- latitude (near Saint Petersburg)
30.3141 -- longitude (near Saint Petersburg)
);

-- Alice's ad
SELECT upsert_objectstorage(
    'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c02', -- new object ID for Alice's ad
    59.9311, -- latitude
    30.3609, -- longitude
    'ad', -- object type
    '{"description": "Used bicycle, good condition", "ad_type": "sell", "picture_url": "https://example.com/pictures/bicycle.jpg"}' -- ad data in JSON format
);

SELECT upsert_activity(
    gen_random_uuid(), -- activity ID
    'post', -- verb
    'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', -- actor ID (Alice)
    'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c02', -- object ID (Alice's ad)
    NULL, -- target ID (no target)
    59.9311, -- latitude
    30.3609 -- longitude
);
-- Charly's ad 
SELECT upsert_objectstorage(
    '99875f15-49ee-4e6d-b356-cbab4f4e4a4d', -- new object ID for Charlie's ad
    59.9375, -- latitude
    30.3086, -- longitude
    'ad', -- object type
    '{"description": "Books, various genres", "ad_type": "sell", "picture_url": "https://example.com/pictures/books.jpg"}' -- ad data in JSON format
);

SELECT upsert_activity(
    gen_random_uuid(), -- activity ID
    'post', -- verb
    '99875f15-49ee-4e6d-b356-cbab4f4e4a4c', -- actor ID (Charlie)
    '99875f15-49ee-4e6d-b356-cbab4f4e4a4d', -- object ID (Charlie's ad)
    NULL, -- target ID (no target)
    59.9428, -- latitude
    30.3071 -- longitude
);
-- Bob's ad
SELECT upsert_objectstorage(
    'cc7ebda2-019c-4387-925c-352f7e1f0b12', -- new object ID for Bob's ad
    59.9428, -- latitude
    30.3071, -- longitude
    'ad', -- object type
    '{"description": "Vintage record player, needs repair", "ad_type": "exchange", "picture_url": "https://example.com/pictures/record_player.jpg"}' -- ad data in JSON format
);
SELECT upsert_activity(
    gen_random_uuid(), -- activity ID
    'post', -- verb
    'cc7ebda2-019c-4387-925c-352f7e1f0b10', -- actor ID (Bob)
    'cc7ebda2-019c-4387-925c-352f7e1f0b12', -- object ID (Bob's ad)
    NULL, -- target ID (no target)
    59.9428, -- latitude
    30.3071 -- longitude
);

 

Now that we have created objects and activities, the activity stream will still be empty because actors need to follow each other to generate activity. Therefore, we need to establish follow relationships between users to create a stream showing their activities.

 

-- Follow data

-- Follow Eve and Alice to themselves
SELECT follow_user('8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', '8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54');
SELECT follow_user('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', 'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01');
-- Follow Eve and Alice to Bob, Charlie, and Dave
SELECT follow_user('8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', 'cc7ebda2-019c-4387-925c-352f7e1f0b10');
SELECT follow_user('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', 'cc7ebda2-019c-4387-925c-352f7e1f0b10');
SELECT follow_user('8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', '99875f15-49ee-4e6d-b356-cbab4f4e4a4c');
SELECT follow_user('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', '99875f15-49ee-4e6d-b356-cbab4f4e4a4c');
SELECT follow_user('8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54', '34f6c0a5-5d5e-463f-a2cf-11b7529a92a1');
SELECT follow_user('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01', '34f6c0a5-5d5e-463f-a2cf-11b7529a92a1');

-- Follow data

 

It’s time to test our activity stream first lets try to get the followers for the user Alice

SELECT get_following_ids('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01') -- get the objects that Allice is following

here is the result

get_following_ids                                                                                                                                    
-----------------------------------------------------------------------------------------------------------------------------------------------------+
{
b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01,
cc7ebda2-019c-4387-925c-352f7e1f0b10,
99875f15-49ee-4e6d-b356-cbab4f4e4a4c,
34f6c0a5-5d5e-463f-a2cf-11b7529a92a1
}

Now lets get the activities of the objects that Alice is following, we will get page 1 and how 10 records per page

SELECT * FROM get_activities_by_following(1, 10, ARRAY(SELECT get_following_ids('b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01')));

here is the result

id                                  |verb|actor_id                            |object_id                           |target_id|
------------------------------------+----+------------------------------------+------------------------------------+---------+
f905356f-2fe3-4f51-b6de-d2cd107f46b8|post|cc7ebda2-019c-4387-925c-352f7e1f0b10|f6c7599e-8161-4d54-82ec-faa13bb8cbf7|         |
43a92964-5bcd-4096-93bc-e5e87c76455e|post|99875f15-49ee-4e6d-b356-cbab4f4e4a4c|41f7c558-1cf8-4f2b-b4b4-4d4e4df50843|         |
69ec53ac-bbaa-4c36-81ef-8764647d7914|post|34f6c0a5-5d5e-463f-a2cf-11b7529a92a1|c3dd7b47-1bba-4916-8a6a-8b5f2b50ba88|         |
de6b052f-8a84-4b37-9920-9f76cbb539d4|post|b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01|b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c02|         |
3c35544a-3ee0-47ae-bddc-1017127ff4d6|post|99875f15-49ee-4e6d-b356-cbab4f4e4a4c|99875f15-49ee-4e6d-b356-cbab4f4e4a4d|         |
e76dcbb9-56c4-46d8-bb42-2f67dec4c5aa|post|cc7ebda2-019c-4387-925c-352f7e1f0b10|cc7ebda2-019c-4387-925c-352f7e1f0b12|         |


 

Now lets makes this better and get the activities in JSON format

SELECT * FROM get_activities_by_following_as_json(1, 2, 'b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01');

and here is the result

[
   {
      "id":"43a92964-5bcd-4096-93bc-e5e87c76455e",
      "verb":"post",
      "actor":{
         "age":42,
         "name":"Charlie",
         "email":"charlie@example.com",
         "picture_url":"https://example.com/pictures/charlie.jpg"
      },
      "object":{
         "ad_type":"sale",
         "description":"Smartphone, unlocked",
         "picture_url":"https://example.com/pictures/smartphone.jpg"
      },
      "target":null,
      "actor_id":"99875f15-49ee-4e6d-b356-cbab4f4e4a4c",
      "latitude":59.937500,
      "longitude":30.308600,
      "object_id":"41f7c558-1cf8-4f2b-b4b4-4d4e4df50843",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   },
   {
      "id":"f905356f-2fe3-4f51-b6de-d2cd107f46b8",
      "verb":"post",
      "actor":{
         "age":33,
         "name":"Bob",
         "email":"bob@example.com",
         "picture_url":"https://example.com/pictures/bob.jpg"
      },
      "object":{
         "ad_type":"sale",
         "description":"Vintage bicycle, good condition",
         "picture_url":"https://example.com/pictures/bicycle.jpg"
      },
      "target":null,
      "actor_id":"cc7ebda2-019c-4387-925c-352f7e1f0b10",
      "latitude":59.942800,
      "longitude":30.307100,
      "object_id":"f6c7599e-8161-4d54-82ec-faa13bb8cbf7",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   }
]

 

And now before I go, here is a good , this query will return all the activities that happened around a specific geo location

 

SELECT get_activities_by_distance_as_json(59.9343, 30.3351, 1600, 1, 10);

Here are the results, all those places are near my home ))

         "name":"Charlie",
         "email":"charlie@example.com",
         "picture_url":"https://example.com/pictures/charlie.jpg"
      },
      "object":{
         "ad_type":"sale",
         "description":"Smartphone, unlocked",
         "picture_url":"https://example.com/pictures/smartphone.jpg"
      },
      "target":null,
      "actor_id":"99875f15-49ee-4e6d-b356-cbab4f4e4a4c",
      "latitude":59.937500,
      "longitude":30.308600,
      "object_id":"41f7c558-1cf8-4f2b-b4b4-4d4e4df50843",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   },
   {
      "id":"e5e26df0-e58f-4b25-96c1-5b3460beb0c8",
      "verb":"post",
      "actor":{
         "age":25,
         "name":"Eve",
         "email":"eve@example.com",
         "picture_url":"https://example.com/pictures/eve.jpg"
      },
      "object":{
         "ad_type":"want",
         "description":"Plants, various types",
         "picture_url":"https://example.com/pictures/plants.jpg"
      },
      "target":null,
      "actor_id":"8d7685d5-5b1f-4a7a-835e-b89e7d3a3b54",
      "latitude":59.937500,
      "longitude":30.314100,
      "object_id":"3453f3c1-296d-47a5-a5a5-f5db5ed3f3b3",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   },
   {
      "id":"de6b052f-8a84-4b37-9920-9f76cbb539d4",
      "verb":"post",
      "actor":{
         "age":27,
         "name":"Alice",
         "email":"alice@example.com",
         "picture_url":"https://example.com/pictures/alice.jpg"
      },
      "object":{
         "ad_type":"sell",
         "description":"Used bicycle, good condition",
         "picture_url":"https://example.com/pictures/bicycle.jpg"
      },
      "target":null,
      "actor_id":"b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c01",
      "latitude":59.931100,
      "longitude":30.360900,
      "object_id":"b8dcbf13-cb01-4a35-93d5-5a5f5a2f6c02",
      "target_id":null,
      "created_at":"2023-03-12T17:54:11.636928+03:00",
      "updated_at":"2023-03-12T17:54:11.636928+03:00"
   }
]

 

In conclusion, this article provided a step-by-step guide on how to create an activity stream system using PostgreSQL as the database storage. The article covered various aspects, such as the creation of the object storage table, activity table, follow functionality, and pagination to handle the huge amount of data generated by users. Additionally, the article discussed the use of PostGIS extensions for geographical search and the benefits of using JSON columns in PostgreSQL to store complex data structures. Overall, the article provided a comprehensive guide to building an activity stream system that can handle a large volume of data efficiently. By following this guide, developers can create their own activity stream systems using PostgreSQL and implement them into their applications.

You can find the complete code for this tutorial here :

https://github.com/egarim/PostgresActivityStream