Latest Blog Posts
PostgreSQL Contributor Story: Florin Irion
Posted by Floor Drees in EDB on 2026-01-22 at 12:35
Understanding ALTER TABLE Behavior on Partitioned Tables in PostgreSQL
Posted by Chao Li in Highgo Software on 2026-01-21 at 08:53
Partitioned tables are a core PostgreSQL feature, but one area still causes regular confusion—even for experienced users:
How exactly does ALTER TABLE behave when partitions are involved?
Does an operation propagate to partitions? Does it affect future partitions? Does ONLY do what it claims? Why do some commands work on parents but not on partitions—or vice versa?
Today, PostgreSQL documentation describes individual ALTER TABLE sub-commands well, but it rarely explains their interaction with partitioned tables as a whole. As a result, users often discover the real behavior only through trial and error.
This post summarizes a systematic investigation of ALTER TABLE behavior on partitioned tables, turning scattered rules into a consistent classification model.
The Problem: “Inconsistent” Is Not the Same as “Undocumented”
The PostgreSQL community often describes ALTER TABLE behavior on partitioned tables as inconsistent. In practice, the deeper problem is that:
- The rules do exist, but
- They are spread across code paths, error messages, and historical decisions, and
- They are not documented in a way that lets users predict outcomes.
Without a mental model, even simple questions become hard to answer:
- If I run this on the parent, what happens to existing partitions?
- What about partitions created later?
- Does ONLY prevent propagation—or is it ignored?
- Can I override settings per partition?
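A minimal sketch of the kind of setup these questions apply to (the table names here are made up for illustration, not taken from the post):
-- A hypothetical partitioned table to experiment with the questions above
CREATE TABLE measurements (ts timestamptz NOT NULL, val int)
    PARTITION BY RANGE (ts);
CREATE TABLE measurements_2026 PARTITION OF measurements
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
-- Run on the parent: does it reach measurements_2026, and partitions added later?
ALTER TABLE measurements ADD COLUMN note text;
-- Run on a single partition: can settings be overridden per partition?
ALTER TABLE measurements_2026 SET (fillfactor = 90);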
How I Evaluated Each ALTER TABLE Sub-command
To make sense of this, I tested ALTER TABLE sub-commands against partitioned tables using the same set of questions each time.
Four evaluation criteria
For every sub-command, I asked:
- Propagation: Does the action on a parent partitioned table propagate to existing partitions?
- Inheritance for new partitions: If I create a new partition later, does it inherit
PDXPUG February 19th, 2026: What’s New in PostgreSQL 18
Posted by Mark Wong on 2026-01-20 at 21:58
2026 Thursday February 19th Meeting, 6:30pm-8:30pm
Please note the new meeting location. And please RSVP on MeetUp as space is limited.
Location: Multnomah Arts Center – The front desk can guide you to the meeting room.
7688 SW CAPITOL HWY • PORTLAND, OR 97219
Speaker: Mark Wong
PostgreSQL 18 was released September 25, 2025.
We will review freely available presentations from the internet.
Come learn what’s new, share experiences, or just meet with local peers! Casual, informal.
Postgres Serials Should be BIGINT (and How to Migrate)
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2026-01-20 at 13:00
Lots of us started with a Postgres database that incremented with an id SERIAL PRIMARY KEY. This was the Postgres standard for many years for auto-incrementing columns. SERIAL is shorthand for an integer data type that is automatically incremented. However, as your data grows, SERIALs and INTs run the risk of integer overflow as they approach 2 billion values.
We covered a lot of this in a blog post The Integer at the End of the Universe: Integer Overflow in Postgres a few years ago. Since that was published we’ve helped a number of customers with this problem and I wanted to refresh the ideas and include some troubleshooting steps that can be helpful. I also think that BIGINT is more cost effective than folks realize.
SERIAL and BIGSERIAL are just shorthands and map directly to the INT and BIGINT data types. While something like CREATE TABLE user_events (id SERIAL PRIMARY KEY) would have been common in the past, the best practice now is BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY. SERIAL/BIGSERIAL are not SQL standard, and the GENERATED ALWAYS keyword prevents accidental inserts, guaranteeing the database manages the sequence instead of a manual or application-based addition.
- INT: goes up to 2.1 billion (2,147,483,647), and more if you use negative numbers. INT takes up 4 bytes per row column.
- BIGINT: goes up to 9.22 quintillion (9,223,372,036,854,775,807) and needs 8 bytes for storage.
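A minimal sketch of the recommended pattern, reusing the user_events example from above (the payload column is made up):
-- Identity column managed entirely by the database
CREATE TABLE user_events (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    payload text
);
INSERT INTO user_events (payload) VALUES ('login');      -- id assigned automatically
INSERT INTO user_events (id, payload) VALUES (42, 'x');  -- rejected: GENERATED ALWAYS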
Serials vs UUID
Before I continue talking about serials in Postgres, it is worth noting that Postgres also has robust UUID support, including v7, which was just released. If you decide to go with UUID, great. This makes a ton of sense for things that can be URLs or are shared across systems. However, not all ids need to be UUIDs, so lots of folks still continue with serial / incremented integers.
Cost difference between INT and BIGINT
Postgres does not pack data tightly like a text file. It writes data in aligned tuples / rows, and sta[...]
PostgreSQL on Kubernetes vs VMs: A Technical Decision Guide
Posted by Umair Shahid in Stormatics on 2026-01-20 at 11:01
If your organization is standardizing on Kubernetes, this question shows up fast:
“Should PostgreSQL run on Kubernetes too?”
The worst answers are the confident ones:
- “Yes, because everything else is on Kubernetes.”
- “No, because databases are special.”
Both are lazy. The right answer depends on what you’re optimizing for: delivery velocity, platform consistency, latency predictability, operational risk, compliance constraints, and, most importantly, who is on-call when things go sideways.
I have seen PostgreSQL run very well on Kubernetes. I’ve also seen teams pay a high “complexity tax” for benefits they never actually used. This post is an attempt to give you a technical evaluation you can use to make a decision that fits your environment.
Start with the real question: are you running a database, or building a database platform?
This is the cleanest framing I have found:
- Running a database: You have a small number of production clusters that are business-critical. You want predictable performance, understandable failure modes, straightforward upgrades, and clean runbooks.
- Building a database platform: You want self-service provisioning, standardized guardrails, GitOps workflows, multi-tenancy controls, and a repeatable API so teams can spin up PostgreSQL clusters without opening tickets.
Kubernetes shines in the second world. VMs shine in the first.
Yes, you can do either on either platform. But the default fit differs.
A neutral comparison model: 6 dimensions that actually matter
Here is a practical rubric you can use in architecture reviews.
If you want a quick decision shortcut:
If your main goal is self-service and standardization, Kubernetes is compelling. If your main goal is pre
4 causes of table bloat in PostgreSQL and how to address them
Posted by Shinya Kato on 2026-01-20 at 08:55
What Is Table Bloat?
Table bloat in PostgreSQL refers to the phenomenon where "dead tuples" generated by UPDATE or DELETE operations remain uncollected by VACUUM, causing data files to grow unnecessarily large.
For VACUUM to reclaim dead tuples, it must be guaranteed that those tuples "cannot possibly be referenced by any currently running transaction." If old transactions persist for any reason, VACUUM's garbage collection stops at that point.
This article explains the following four causes of table bloat: how each manifests, how to identify the root cause, and how to resolve it.
- Long-running transactions
- Uncommitted prepared transactions
- Queries on standby servers with hot_standby_feedback enabled
- Logical replication lag
Test Environment
- PostgreSQL 19dev (34740b90bc123d645a3a71231b765b778bdcf049)
Long-Running Transactions
This is probably the most familiar cause. Whether active or idle, a long-running transaction prevents VACUUM from reclaiming dead tuples generated by UPDATE or DELETE operations that occurred after the transaction started. This is because the long-running transaction might need to read the pre-update versions of those tuples.
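Before the setup below, a query along these lines (my own sketch, not from the article) can help spot which sessions are holding transactions open:
-- Sessions with an open transaction, oldest first
SELECT pid, state, now() - xact_start AS xact_age, backend_xid, backend_xmin, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;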
Setup
In Terminal 1, start a transaction and obtain a transaction ID.
Terminal 1:
=# BEGIN;
BEGIN
=*# SELECT txid_current();
txid_current
--------------
782
(1 row)
In a separate Terminal 2, delete data and run VACUUM.
Terminal 2:
=# DELETE FROM t;
DELETE 100
=# VACUUM (VERBOSE) t;
INFO: 00000: vacuuming "postgres.public.t"
LOCATION: heap_vacuum_rel, vacuumlazy.c:848
INFO: 00000: finished vacuuming "postgres.public.t": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total), 0 eagerly scanned
tuples: 0 removed, 100 remain, 100 are dead but not yet removable
removable cutoff: 782, which was 2 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
visibility map: 0 pages set all-visible, 0 pages set all-frozen (0 were all-visible)
Exploration: CNPG Extensions (ImageVolume)
Posted by Umut TEKIN in Cybertec on 2026-01-20 at 05:49
Introduction
PostgreSQL is the most advanced open source database system, and it is widely used across many industries. Among its many strengths, extensibility places PostgreSQL in a unique spot. CNPG has supported extensions for a long time; however, this traditionally required building custom container images that include the necessary extensions.
This has changed with the introduction of PostgreSQL 18 and Kubernetes 1.33. PostgreSQL 18 has introduced an extension_control_path parameter, while Kubernetes 1.33 adds the ImageVolume feature. Together, these features enable CNPG to dynamically load extensions into a cluster at pod startup.
extension_control_path is a search path for extensions. ImageVolume allows mounting an OCI-compliant container image as an immutable, read-only volume in a pod at a specified filesystem path. Previously, PostgreSQL extensions were tightly coupled with the CNPG container images, but this is no longer the case. By leveraging the ImageVolume feature, extensions no longer need to be embedded at image build time. As a result, we can:
- Stick with official images
- Dynamically add extensions to a cluster without rebuilding container images
- Have more flexibility due to decoupling container images from the distribution of the extensions
Requirements
In order to use this feature, we need:
- PostgreSQL 18+
- Kubernetes 1.33+
- Container runtime with ImageVolume support:
- containerd v2.1.0+
- CRI-O v1.31+
- CloudNativePG-compatible extension container images
Bootstrapping A Cluster with the pgvector Extension
Prepare a manifest file like the following:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: physics
spec:
  imageName: ghcr.io/cloudnative-pg/postgresql:18-minimal-trixie
  instances: 1
  storage:
    size: 1Gi
  postgresql:
    extensions:
      - name: pgvector
        image:
          reference: ghcr.io/cloudnative-pg/pgvector:0.8.1-18-trixie
---
apiVersion: postgresql.cnpg.i[...]
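Once the cluster is running, a quick check that the extension image was picked up might look like this (a sketch; the exact path output depends on how CNPG mounts the image):
-- Inside the physics cluster's database
SHOW extension_control_path;
CREATE EXTENSION IF NOT EXISTS vector;
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';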
How Blocking-Lock Brownouts Can Escalate from Row-Level to Complete System Outages
Posted by Jeremy Schneider on 2026-01-20 at 04:23
This article is a shortened version. For the full writeup, go to https://github.com/ardentperf/pg-idle-test/tree/main/conn_exhaustion
This test suite demonstrates a failure mode when application bugs which poison connection pools collide with PgBouncers that are missing peer config and positioned behind a load balancer. PgBouncer’s peering feature (added with v1.19 in 2023) should be configured if multiple PgBouncers are being used with a load balancer – this feature prevents the escalation demonstrated here.
The failures described here are based on real-world experiences. While uncommon, this failure mode has been seen multiple times in the field.
Along the way, we discover unexpected behaviors (bugs?) in Go’s database/sql (or sqlx) connection pooler with the pgx client and in Postgres itself.
Sample output: https://github.com/ardentperf/pg-idle-test/actions/workflows/test.yml
The Problem in Brief
Go’s database/sql allows connection pools to become poisoned by returning connections with open transactions for re-use. Transactions opened with db.BeginTx() will be cleaned up, but – for example – conn.ExecContext(..., "BEGIN") will not. PR #2481 proposes some cleanup logic in pgx for database/sql connection pools (not yet merged); I tested the PR with this test suite. The PR relies on the TxStatus indicator in the ReadyForQuery message which Postgres sends back to the client as part of its network protocol.
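On the database side, this kind of pool poisoning typically shows up as idle-in-transaction backends blocking other writers; a minimal sketch for spotting them (not part of the test suite):
-- Sessions sitting idle inside an open transaction
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE state = 'idle in transaction';
-- Who is blocked, and by whom
SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;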
A poisoned connection pool can cause an application brownout, since other sessions updating the same row wait indefinitely for the blocking transaction to commit or roll back its own update. On a high-activity or critical table, this can quickly lead to significant pile-ups of connections waiting to update the same locked row. With Go this means context deadline timeouts, retries, and connection thrashing by all of the threads and processes that are trying to update the row. Backoff logic is often lacking in these code paths. When there is a currently running SQL (hung – [...]
MERISE: The French Database Modeling Superpower That Could Save Your Data Model
Posted by Lætitia AVROT in EDB on 2026-01-20 at 00:00
Unconventional PostgreSQL Optimizations
Posted by Haki Benita on 2026-01-19 at 22:00
When it comes to database optimization, developers often reach for the same old tools: rewrite the query slightly differently, slap an index on a column, denormalize, analyze, vacuum, cluster, repeat. Conventional techniques are effective, but sometimes being creative can really pay off!
In this article, I present unconventional optimization techniques in PostgreSQL.
Eliminate Full Table Scans Based on Check Constraints
Imagine you have this table of users:
db=# CREATE TABLE users (
id INT PRIMARY KEY,
username TEXT NOT NULL,
plan TEXT NOT NULL,
CONSTRAINT plan_check CHECK (plan IN ('free', 'pro'))
);
CREATE TABLE
For each user you keep their name and which payment plan they're on. There are only two plans, "free" and "pro", so you add a check constraint.
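A sketch of the kind of win such a constraint can enable (an assumption about where this is heading; note that constraint_exclusion defaults to 'partition' and has to be enabled for plain tables):
SET constraint_exclusion = on;
-- The planner can prove the predicate contradicts plan_check and skip the scan
EXPLAIN SELECT * FROM users WHERE plan = 'enterprise';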
Generate some data and analyze the table:
db=# INSERT INTO users
SELECT n, uuidv4(), (ARRAY['free', 'pro'])[ceil(rando[...]
The DATE Data Type in Oracle vs. PostgreSQL
Posted by Akhil Reddy Banappagari in HexaCluster on 2026-01-19 at 15:47
Who Contributed to PostgreSQL Development in 2025?
Posted by Robert Haas in EDB on 2026-01-19 at 15:29
Here is another annual blog post breaking down code contributions to PostgreSQL itself (not ecosystem projects) by principal author. I have mentioned every year that this methodology has many limitations and fails to capture a lot of important work, and I reiterate that this year as usual. Nonetheless, many people seem to find these statistics helpful, so here they are.
The strange case of the underestimated Merge Join node
Posted by Frédéric Yhuel in Dalibo on 2026-01-19 at 07:15
This post appeared first on the Dalibo blog.
Brest, France, 19 January 2026
We recently encountered a strange optimizer behaviour, reported by one of our customers:
Customer: “Hi Dalibo, we have a query that is very slow on the first execution after a batch process, and then very fast. We initially suspected a caching effect, but then we noticed that the execution plan was different.”
Dalibo: “That’s a common issue. Autoanalyze didn’t have the opportunity to process the table after the batch job had finished, and before the first execution of the query. You should run the VACUUM ANALYZE command (or at least ANALYZE) immediately after your batch job.”
Customer: “Yes, it actually solves the problem, but… your hypothesis is wrong. We looked at pg_stat_user_tables, and are certain that the tables were not vacuumed or analyzed between the slow and fast executions. We don’t have a production problem, but we would like to understand.”
Dalibo: “That’s very surprising! We would also like to understand…”
So let’s dive in!
Execution plans
The query is quite basic (table and column names have been anonymized):
SELECT *
FROM bar
LEFT JOIN foo ON (bar.a = foo.a)
WHERE id = 10744501
ORDER BY bar.x DESC, foo.x DESC;
Here’s the plan of the first execution of the query after the batch job:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=17039.22..17042.11 rows=1156 width=786) (actual time=89056.476..89056.480 rows=6 loops=1)
Sort Key: bar.x DESC, foo.x DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=2255368 read=717581 dirtied=71206 written=11997
-> Merge Right Join (cost=2385.37..16980.41 rows=1156 width=786) (actual time=89056.428..89056.432 rows=6 loops=1)
Inner Unique: true
Merge Cond: (foo.a = bar.a)
Buffe[...]
Turbocharging LISTEN/NOTIFY with 40x Boost
Posted by Robins Tharakan on 2026-01-18 at 11:40
Unless you've built a massive real-time notification system with thousands of distinct channels, it is easy to miss the quadratic performance bottleneck that Postgres used to have in its notification queue. A recent commit fixes that with a spectacular throughput improvement.
The commit in question is 282b1cde9d, which landed recently and targets a future release (likely Postgres 19, though as with all master branch commits, there's a standard caveat that it could be rolled back before release).
The Problem
Previously, when a backend sent a notification (via NOTIFY), Postgres had to determine which other backends were listening to that channel. The implementation involved essentially a linear search or walk through a list for each notification, which became costly when the number of unique channels grew large.
If you had a scenario with thousands of unique channels (e.g., a "table change" notification system where each entity has its own channel), the cost of dispatching notifications would scale quadratically. For a transaction sending N notifications to N different channels, the effort was roughly O(N^2).
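The many-distinct-channels pattern is easy to reproduce in outline (channel names here are made up):
-- One listener per entity channel...
LISTEN orders_42;
-- ...and a transaction fanning out notifications across many distinct channels
SELECT pg_notify('orders_' || g, 'changed') FROM generate_series(1, 10000) AS g;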
The Fix
The optimization introduces a shared hash table to track listeners. Instead of iterating through a list to find interested backends, the notifying process can now look up the channel in the hash table to instantly find who (if anyone) is listening. This implementation uses Partitioned Hash Locking, allowing concurrent LISTEN/UNLISTEN commands without global lock contention.
Additionally, the patch includes an optimization to "direct advance" the queue pointer for listeners that are not interested in the current batch of messages. This is coupled with a Wake-Only-Tail strategy that signals only the backend at the tail of the queue, avoiding "thundering herd" wake-ups and significantly reducing CPU context switches.
Finally, the patch helps observability by adding specific Wait Events, making it easier to spot contention in pg_stat_activity.
Benchmarking Methodology
[...]
Illinois Prairie PUG January Edition
Posted by Henrietta Dombrovskaya on 2026-01-17 at 15:14
We just had the first meetup of 2026, and all I can say is a huge thank you to Ryan Booz and all attendees, both in person and virtual!
I was so happy to see many familiar faces, as well as first-timers. We had great attendance (one of those rare situations when I didn’t order enough pizza :)). Ryan Booz, who, as I previously mentioned, is one of the few out-of-towners who dare to face Chicago winter weather, presented a great talk on configuring Postgres for effective logging and query-optimization analysis.
I liked the fact that we had 30 participants that early in the year, when people are just starting to get back to their regular activity level. More importantly, we now have a group of active members who not only keep coming to the meetups but also actively listen, participate in discussions, and stay long after the presentation ends, discussing what they just heard, sharing experiences, suggesting future topics, and talking about how we can make Postgres more appealing to application developers! I always have to remind the last group of people staying late that, as much as I love them all, I need to close the house, but those are my happiest moments!
On days like that, I have this strong feeling of community building happening right here, and that’s the most rewarding thing I could wish for.
Presentation slides and SQL examples are available here, and below is the meetup recording. Please note the quick turnaround of the video! We tried!
Introducing pgEdge Load Generator: Realistic PostgreSQL Workload Simulation
Posted by Dave Page in pgEdge on 2026-01-16 at 05:52
Anyone who has worked with PostgreSQL in production environments knows that testing database performance is rarely straightforward. Synthetic benchmarks like pgbench are useful for stress testing, but they don't reflect how real applications behave. Production workloads have peaks and troughs, complex query patterns, and user behaviour that varies throughout the day. This is why I'm pleased to introduce the pgEdge Load Generator.
The Problem with Traditional Benchmarks
Most database benchmarking tools focus on raw throughput: how many queries per second can the database handle at maximum load? Whilst this is valuable information, it tells us little about how a system will cope with real-world usage patterns.
Consider a typical e-commerce platform. Traffic peaks during lunch breaks and evenings, drops off overnight, and behaves differently at weekends compared to weekdays. A stock trading application has intense activity during market hours and virtually none outside them. These temporal patterns matter enormously for capacity planning, replication testing, and failover validation.
What Is pgEdge Load Generator?
The pgEdge Load Generator (pgedge-loadgen) is a command-line tool that creates realistic PostgreSQL workloads for testing and validation. It's not a benchmarking tool; it's a workload simulator designed to exercise your database in ways that mirror actual application behaviour.
The tool provides seven pre-built applications spanning different use cases:
Transaction Processing (TPC-based):
- wholesale (TPC-C): Classic OLTP with orders, inventory, and payment processing
- analytics (TPC-H): Decision support with 22 complex analytical queries
- brokerage (TPC-E): Mixed read/write stock trading simulation
- retail (TPC-DS): Multi-channel retail decision support
- ecommerce: Product search with vector embeddings
- knowledgebase: FAQ and documentation similarity matching
Hacking Workshop for February 2026
Posted by Robert Haas in EDB on 2026-01-15 at 18:26
Contributions to PostgreSQL by HexaCluster in 2025
Posted by Avi Vallarapu in HexaCluster on 2026-01-14 at 20:54
Announcing the second PostgreSQL Edinburgh meetup
Posted by Jimmy Angelakos on 2026-01-14 at 15:33
The Lister Learning and Teaching Centre at the University of Edinburgh. Photo by Paul Zanre - COPYRIGHT: PAUL ZANRE PHOTOGRAPHY.
I'm thrilled to announce that the PostgreSQL Edinburgh meetup is back! 🐘
After a fantastic launch, we are gathering again on Thursday, February 12th. We'll be meeting at the University of Edinburgh's Lister Learning and Teaching Centre for another evening to talk tech, eat pizza, and get to know each other.
Whether you're a seasoned DBA, an app developer just getting started with databases, or simply curious about high availability and search integrations, this meetup is for you. All levels are welcome!
The Agenda
Here's the schedule for our second outing, featuring two insightful technical talks.
- 18:00: Doors Open, Pizza & Networking 🍕
- 18:55: Introductions & Community Announcements
- 19:00: Follow Your Leader — Alastair Turner (Percona)
- 19:40: Break
- 20:00: Postgres to Elasticsearch Syncing: A "War Story" — Sean Hammond (seanh.cc)
- 20:45: Event End
The Talks
Follow Your Leader
Alastair Turner from Percona joins us to dive deep into high availability. He will be sharing the critical mechanics of how to ensure availability after a failover occurs—a vital topic for anyone running Postgres in production.
Postgres to Elasticsearch Syncing
Sean Hammond will be sharing a "war story" about a legendary outage. He'll walk us through the complexities and challenges encountered when syncing data between Postgres and Elasticsearch, offering lessons learned.
Join Us!
This event is completely free, but registration is required so we can get the numbers right for the food & refreshments!
Register here 👇
The meetup is bound by the PostgreSQL Code of Conduct. We are also looking for volunteer help (greeting attendees, logistics, etc.) for this and future meetups. If you can help, please drop me an email at vyruss000 (at) gmail.com.
Follow our community u[...]
PostgreSQL Contributor Story: Mark Wong
Posted by Floor Drees in EDB on 2026-01-14 at 11:32
Stand Up, Mentor! Help Postgres Shine in GSoC 2026!
Posted by Pavlo Golub in Cybertec on 2026-01-14 at 03:00
Google Summer of Code is back for 2026! We’re celebrating the 22nd year of this incredible program that has brought countless talented developers into the open-source world. Please take a moment to review Google’s announcement and familiarize yourself with what makes this year special.
Now is the time to work on getting together a strong set of projects we’d like to have GSoC contributors work on over the summer. Like previous years, we must have an excellent Ideas list ready before the deadline for mentoring organizations. This list is the primary criterion Google uses to evaluate and accept us into the program.
The deadline for Mentoring organizations to apply is February 3, 2026 at 18:00 UTC. The list of accepted organizations will be published on February 19. Here’s the complete timeline for your planning:
GSoC 2026 Timeline:
- January 19: Organization Applications Open
- February 3 (18:00 UTC): Deadline for Organization Applications
- February 19: Accepted Organizations Announced
- March 16 – 31: Contributor Application Period
- April 2: Deadline for Final Proposals
- May 4: Community Bonding Begins
I’ve already created our GSoC 2026 Ideas page as our central hub for project submissions. The quality of this Ideas list will define the tone of our participation this year, so let’s make it outstanding!
What Makes a Great Project Idea?
Each project on our Ideas page should include these essential elements:
- Project Title and Description (2-5 sentences that clearly explain the project)
- Expected Outcomes (what will be delivered)
- Required/Preferred Skills (programming languages, technologies, domain knowledge)
- A potential mentor (two mentors are highly recommended)
- Project Size: Small (90 hours), Medium (175 hours), or Large (350 hours)
- Difficulty Rating: Easy, Medium, or Hard
Critical Guidelines to Remember:
Never link to just a bug tracker as your project description. T[...]
PgPedia Week, 2025-12-28
Posted by Ian Barwick on 2026-01-13 at 23:47
Dealing with integer overflow in sequence-generated primary keys
Posted by Laurenz Albe in Cybertec on 2026-01-13 at 06:00

© Laurenz Albe 2025
In a previous article, I recommended using bigint for sequence-generated primary keys (but I make an exception for lookup tables!). If you didn't heed that warning, you might experience integer overflow. That causes downtime and pain. So I thought it would be a good idea to show you how to monitor for the problem and how to keep the worst from happening.
The problem of integer overflow in sequence-generated keys
There are two ways people typically create sequence-generated primary keys. The “traditional” way is to use serial:
CREATE TABLE tab ( id serial PRIMARY KEY, ... );
That will actually create a (four-byte) integer column with a DEFAULT value:
\d tab
Table "public.tab"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------------------------------
id | integer | | not null | nextval('tab_id_seq'::regclass)
...
The smarter, SQL standard compliant way of creating an auto-incrementing column is to use an identity column:
CREATE TABLE tab ( id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY, ... );
An identity column will also create a sequence behind the scenes. No matter which of the two techniques you choose: if you insert a lot of rows, you will eventually get the dreaded error message
ERROR: nextval: reached maximum value of sequence "tab_id_seq" (2147483647)
Note that you don't need to have a large table to encounter this problem: if you INSERT and DELETE rows frequently, you will hit it as well.
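For monitoring, a query like this (my own sketch, using the pg_sequences view) shows how close each sequence is to its maximum:
SELECT schemaname, sequencename,
       round(100.0 * last_value / max_value, 2) AS pct_used
FROM pg_sequences
WHERE last_value IS NOT NULL
ORDER BY pct_used DESC;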
The classical solution for integer overflow
The solution to the problem is to convert the column to bigint, which is an eight-byte integer:
ALTER TABLE tab ALTER id TYPE bigint;
That solution is simple enough, but modifying a table like that requires PostgreSQL to rewrite the entire table. The reason for rewriting the table is that the storage format of bigint is different from that of a four-byte integer. However, such a t
The hidden cost of PostgreSQL arrays
Posted by Radim Marek on 2026-01-12 at 20:50
Starting with arrays in PostgreSQL is as simple as declaring a column as integer[], inserting some values, and you are done.
Or building the array on the fly.
SELECT '{1,2,3}'::int[];
 int4
---------
 {1,2,3}
(1 row)
SELECT array[1,2,3];
 array
---------
 {1,2,3}
(1 row)
The official documentation provides a good introduction. But beneath this straightforward interface lies a set of more complex properties than most of us realise. Arrays in PostgreSQL are not just "lists" in a field. They have their own memory management strategy, their own index logic, and a lot of edge-case scenarios.
As it goes with boringSQL deep-dives, this article will explore the corners of array functionality that might break your production.
The document model temptation
Wait? Are we going to talk about JSONB arrays? Not at all. The whole concept of arrays in RDBMSs is actually document storage in disguise.
In database design, locality ensures faster retrieval times by keeping related data close on physical storage. Whether you use a distinct integer[] type or a JSON list [1, 2, 3], you are making the exact same architectural decision: you are prioritising locality over normalisation.
When you store tag_ids in an array, you are embedding related data directly into a row - just like a NoSQL database might embed subdocuments. This is not inherently wrong. Document databases exist for good reasons: they eliminate joins, simplify reads, and map naturally to application objects.
But PostgreSQL is a relational database. It was designed around the relational model, where:
- foreign keys enforce referential integrity
- joins connect normalised tables
- updates modify individual rows, not entire lists
Arrays give you document-model convenience, but you lose relational promises. There are no foreign keys and no ON DELETE referential_action (like CASCADE) for array elements. If you delete a tags entry, the orphaned ID will remain in your array forever.
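A minimal sketch of that orphaned-ID behaviour (table names are illustrative):
CREATE TABLE tags (id int PRIMARY KEY, name text NOT NULL);
CREATE TABLE posts (id int PRIMARY KEY, tag_ids int[] NOT NULL DEFAULT '{}');
INSERT INTO tags VALUES (1, 'postgres'), (2, 'arrays');
INSERT INTO posts VALUES (10, '{1,2}');
DELETE FROM tags WHERE id = 2;            -- succeeds: nothing protects array elements
SELECT tag_ids FROM posts WHERE id = 10;  -- still {1,2}; 2 is now an orphan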
The rule of thumb is tha[...]
pg_statviz 0.9 released with new features
Posted by Jimmy Angelakos on 2026-01-12 at 20:30
Happy New Year! I'm excited to announce release 0.9 of pg_statviz, the minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.
This is a significant feature release that expands the scope of analysis to include several new modules and a visualization update:
- Configuration changes visualization: A major update providing a timeline view of PostgreSQL setting changes, allowing you to correlate performance shifts with configuration updates.
- Session activity age tracking: The connection module now tracks oldest_xact_age, oldest_query_age, and oldest_backend_age, helping you quickly identify long-running transactions and idle connections that may be holding resources or preventing vacuum cleanup.
- Replication statistics: A new module provides visibility into high-availability setups, tracking standby lag from pg_stat_replication and replication slot statistics from pg_stat_replication_slots.
- SLRU cache statistics: New metrics for Simple LRU buffer efficiency allow for better diagnosis of specific internal bottlenecks (like multixact or pg_xact contention).
- Checksum failure tracking: Added tracking for checksum failures via pg_stat_database to assist in early detection of data corruption issues.
This release also includes important maintenance and optimizations:
- Space optimization: snapshot_conf() has been optimized to only store configuration changes rather than saving values for every snapshot. The upgrade path automatically compacts existing duplicate configuration rows to reclaim space.
- Dependency updates: A GitHub Actions CI pipeline has been added, dependencies have been updated to their latest stable versions, and argh has been pinned to <0.30 for CLI argument compatibility.
pg_statviz takes the view that everything should be light and minimal. Unlike commercial observability platforms, it doesn't require invasive agents or open connections to the database —
Optimizing data throughput for Postgres snapshots with batch size auto-tuning
Posted by Esther Minano in Xata on 2026-01-12 at 14:00
Updating CloudNativePG's documentation
Posted by Floor Drees in CloudNativePG on 2026-01-12 at 00:00
Idle Session Triggers a Transaction Wraparound?
Posted by Virender Singla on 2026-01-10 at 13:51
At first glance, the idea that an idle session could lead to a transaction wraparound might seem counterintuitive. Most PostgreSQL users are familiar with how long-running queries or open transactions can cause table bloat and wraparound risks by pinning the xmin horizon, which prevents autovacuum from reclaiming dead tuples and Transaction IDs.
An idle session causing transaction wraparound indeed involves a temporary table; the temp table is the real culprit. If you create a temp table and leave the session idle indefinitely, you risk exhausting your Transaction IDs.
The Root Cause: Why Autovacuum Can’t Help
Temporary tables reside in backend local memory rather than shared_buffers. Because of this isolation, autovacuum cannot access or process them. As other database traffic continues to burn through transaction IDs, the “age” of the temporary table increases. Resolution in this specific case is faster, however, because xmin continues to move forward and the idle session does not obstruct vacuum across the entire instance. You can simply perform a vacuum freeze on the temporary table, drop it entirely, or terminate the backend process to rectify the situation.
While temporary table data is invisible to the autovacuum process, autovacuum will intervene to clean up orphaned temporary tables in the pg_catalog if a backend process or the entire instance crashes before it can clean up the temp table.
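To see how far a temporary table's age has drifted, a catalog query like this (my own sketch) works from any session:
-- Temporary tables and their transaction ID age, oldest first
SELECT n.nspname, c.relname, age(c.relfrozenxid) AS xid_age
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relpersistence = 't' AND c.relkind = 'r'
ORDER BY xid_age DESC;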
The Challenge of Mitigation
This behavior is well-documented in the PostgreSQL manuals, yet it’s an unfortunate edge case where autovacuum is aware of the rising Transaction IDs age but lacks the authority to intervene.
Final Thoughts
While autovacuum cannot be faulted here, the core issue is that DML operations on temporary tables, like those on persistent tables, utilize standard Transaction IDs (xmin, xmax). Recycling these IDs necessitates a vacuum freeze, despite the temporary table’s scope being restricted to a single session. This mechanism is also why temporary tables are incom[...]
PgPedia Week, 2025-12-21
Posted by Ian Barwick on 2026-01-09 at 07:03
Not a Backup Replacement: What PostgreSQL Instant Recovery Actually Solves
Posted by Zhang Chen on 2026-01-09 at 00:00
Top posters
Number of posts in the past two months
Floor Drees (EDB) - 8
Zhang Chen - 7
Dave Page (pgEdge) - 6
Ian Barwick - 6
Robins Tharakan - 6
Cornelia Biacsics (postgres-contrib.org) - 5
Hubert 'depesz' Lubaczewski - 5
Shinya Kato - 4
Stéphane Carton (Data Bene) - 4
Andreas Scherbaum - 3
Top teams
Number of posts in the past two months
- EDB - 20
- pgEdge - 15
- Cybertec - 12
- postgres-contrib.org - 7
- Data Bene - 6
- Crunchy Data - 5
- Stormatics - 4
- CloudNativePG - 3
- HexaCluster - 3
- Percona - 3
Feeds
Planet
- Policy for being listed on Planet PostgreSQL.
- Add your blog to Planet PostgreSQL.
- List of all subscribed blogs.
- Manage your registration.
Contact
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.
