Just a Theory

By David E. Wheeler

Posts about PGConf

Mini Summit 3: APT Extension Packaging

Orange card with large black text reading “APT Extension Packaging”. Smaller text below reads “Christoph Berg, Debian/Cybertec” and “04.09.2025”. A photo of Christoph looking cooly at the camera appears on the right.

Join us this Wednesday, April 9, at noon America/New_York (16:00 UTC) for Extension Mini Summit #3, where Christoph Berg will take us on a tour of the PostgreSQL Global Development Group’s APT repository with a focus on packaging extensions. For those of us foolish enough to consider building our own binary packaging systems for extensions, this will be an essential session. For everyone else, come be amazed by the sheer volume of extensions readily available from the repository. Browse on over to the Meetup to register for this live video conference.

2025 Postgres Extensions Mini Summit Two

Orange card with large black text reading “Implementing an Extension Search Patch”. Smaller text below reads “Peter Eisentraut, EDB” and “03.26.2025”. A photo of Peter speaking into a mic at a conference appears on the right.

Last Wednesday, March 26, we hosted the second of five virtual Extension Mini-Summits in the lead up to the big one at the Postgres Development Conference (PGConf.dev) on May 13 in Montreal, Canada. Peter Eisentraut gave a very nice presentation on the history, design decisions, and problems solved by “Implementing an Extension Search Path”. That talk, plus another 10-15m of discussion, is now available for your viewing pleasure:

If you’d like to attend any of the next three Mini-Summits, join the Meetup!

Once again, with many thanks to Floor Drees for the effort, here’s the transcript from the session.

Introduction

Floor Drees introduced the organizers:

Peter Eisentraut, contributor to PostgreSQL development since 1999, talked about implementing an extension search path.

The stream and the closed captions available for the recording are supported by PGConf.dev and their gold level sponsors, Google, AWS, Huawei, Microsoft, and EDB.

Implementing an extension search path

Peter: Thank you for having me!

I’m gonna talk about a current project by me and a couple of people I have worked with, and that will hopefully ship with Postgres 18 in a few months.

So, what do I know about extensions? I’m a Postgres core developer, but I’ve developed a few extensions in my time; here’s a list of extensions that I’ve built over the years.

Some of those are experiments, or sort of one-offs. Some of those are actually used in production.

I’ve also contributed to well-known extensions: orafce; and back in the day, pglogical, BDR, and pg_failover_slots, at EDB, and previously 2ndQuadrant. Those are obviously used widely and in important production environments.

I also wrote an extension installation manager called pex at one point. The point of pex was to do it in one shell script, so you don’t have any dependencies. It’s just a shell script, and you can say pex install orafce and it installs it. This was a proof of concept, in a sense, but was actually quite useful sometimes for development, when you just need an extension and you don’t know where to get it.

And then I wrote, even more experimental, a follow-on project called autopex, which is a plugin module that you load into Postgres that automatically installs an extension if you need it. If you call CREATE EXTENSION orafce, for example, and you don’t have it installed, autopex downloads and installs it. Obviously highly insecure and dubious in terms of modern software distribution practice, but it does work: you can just run CREATE EXTENSION, and it just installs it if you don’t have it. That kind of works.

So anyway, I’ve worked on these various aspects over time. If you’re interested in any of these projects, they’re all under my GitHub account.

In the context of this presentation…this was essentially not my idea. People came to me and asked me to work on this, and as it worked out, multiple people came to me with their problems or questions, and then it turned out it was all the same question. These are the problems I was approached about.

The first one is extension management in the Kubernetes environment. We’ll hear about this in a future talk in this series. Gabriele Bartolini from the CloudNativePG project approached me and said that the issue in a Kubernetes environment is that if you launch a Postgres service, you don’t install packages; you have a pre-baked disk image that contains the software that you need. There’s a Postgres server and maybe some backup software in that image, and if you want to install an extension, and the extension is not in that image, you need to rebuild the image with the extension. That’s very inconvenient.

The ideal scenario would be that you have additional disk images for the extensions and you just somehow attach them. I’m hand waving through the Kubernetes terminology, and again, there will be a presentation about that in more detail. But I think the idea is clear: you want to have these immutable disk images that contain your pieces of software, and if you want to install more of them, you just wanna have these disk images augment ’em together, and that doesn’t work at the moment.

Problem number two is: I was approached by a maintainer of the Postgres.app project, a nice, user-friendly binary distribution of Postgres for macOS. This is sort of a similar problem: on macOS you have these .app files to distribute software. They’re this sort of weird hybrid between a zip file with files in it and a directory you can look into, so it’s kind of weird. But it’s basically an archive with software in it. And in this case it has Postgres in it, and it integrates nicely into your system. But again, if you want to install an extension, that doesn’t work as easily, because you would need to open up that archive and stick the extension in there somehow, or overwrite files.

And there’s also a tie-in with the way these packages are signed by Apple: if you mess with the files in the package, then the signature becomes invalid. That’s the way it’s been explained to me. I hope this was approximately accurate, but you already get the idea, right? There’s the same problem where you have this base bundle of software that is immutable, or that you want to keep immutable, and you want to add things to it, which doesn’t work.

And then the third problem I was asked to solve came from the Debian package maintainer, who will also speak later in this presentation series. What he wanted to do was to run the tests of an extension while the package is being built. That makes sense. You wanna run the tests of the software that you’re building the package for in general. But in order to do that, you have to install the extension into the normal file system location, right? That seems bad. You don’t want to install the software into the main system while you’re building it. He actually wrote a custom patch to be able to do that, which my work was then inspired by.

Those are the problems I was approached about.

I had some problems I wanted to solve myself based on my experience working with extensions. While I was working on these various extensions over the years, one thing that never worked is that you could never run make check. It wasn’t supported by the PGXS build system. Again, it’s the same issue.

It’s essentially a subset of the Debian problem: you want to run a test of the software before you install it, but Postgres can only load an extension from a fixed location, and so this doesn’t work. It’s very annoying because it makes the software development cycle much more complicated. You always have to run make all, make install, make sure you have a server running, and make installcheck. And then you would want to test it against various different server versions, which usually means running this in some weird loop. I’ve written custom scripts and stuff all around this, but it was never satisfactory. It should just work.
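
A sketch of that kind of loop (the version numbers and paths here are hypothetical; it assumes the extension’s Makefile follows the usual PG_CONFIG convention and that a server for each version is running):

for v in 16 17; do
    PG_CONFIG=/usr/lib/postgresql/$v/bin/pg_config
    make clean all PG_CONFIG=$PG_CONFIG       # build against that server version
    sudo make install PG_CONFIG=$PG_CONFIG    # install into that server's directories
    make installcheck PG_CONFIG=$PG_CONFIG    # run the tests against the running server
done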

That’s the problem I definitely wanted to solve. The next problem — and these are all subsets of each other — is that if you have Postgres installed from a package, like an RPM package for example, and then you build the extension locally, you have to install the extension into the directory locations that are controlled by your operating system. If you have Postgres under /usr, then the extensions also have to be installed under /usr, whereas you probably want to install them under /usr/local or somewhere else. You want to keep those locally built things separate, but that’s not possible.

And finally — this is a bit more complicated to explain — I’m mainly using macOS at the moment, and the Homebrew package manager is widely used there. But it doesn’t support extensions very well at all. It’s really weird because the way it works is that each package is essentially installed into a separate subdirectory, and then it’s all symlinked together. And that works just fine. You have a bunch of bin directories, and it’s just a bunch of symlinks to different subdirectories and that works, because then you can just swap these things out and upgrade packages quite easily. That’s just a design choice and it’s fine.

But again, if you wanna install an extension, the extension would be its own package — PostGIS, for example — and it would go into its own directory. But that’s not the directory where Postgres would look for it. You would have to install it into the directory structure that belongs to the other package, and that just doesn’t work. It just does not fit with that system at all. There are weird hacks at the moment, but it’s not satisfactory. It doesn’t work at all.

It turned out all of these things had sort of come up over the years, people approached me about some of them, and I realized these are essentially all the same problem. The extension file location is hard-coded to be inside the Postgres installation tree. Here’s an example: it’s usually under something like /usr/share/postgresql/extension/, and you can’t install extensions anywhere else. If you want to keep this location managed by the operating system or managed by your package management or in some kind of immutable disk image, you can’t. And so these are essentially all versions of the same problem. So that’s why I got engaged and tried to find a solution that addresses all of ’em.

I had worked on this already before, a long time ago, and then someone broke it along the way. And now I’m fixing it again. If you go way, way back, before extensions as such existed in Postgres in 9.1, when you wanted to install a piece of software that consists of a shared library object and some SQL, you had to install the shared library object into a predetermined location just like you do now. In addition, you had to run that SQL file by hand, basically, like you run psql -f install_orafce.sql or something like that. Extensions made that a little nicer, but it’s the same idea underneath.

In 2001, I realized this problem already and implemented a configuration setting called dynamic_library_path, which allows you to set a different location for your shared library. Then you can say

dynamic_library_path = '/usr/local/my-stuff/something'

And then Postgres would look there. You just know where the SQL file is because you run it manually. You would then run

psql -f /usr/local/my-stuff/something/something.sql

That fixed that problem at the time. And when extensions were implemented, I was essentially not paying attention or, you know, nobody was paying attention. Extension support was a really super nice feature, of course, but it broke this previously available feature: you couldn’t install your extensions anywhere you wanted to; you were tied to this specific file system location. dynamic_library_path still existed: you could still set it somewhere, but you couldn’t really make much use of it. I mean, you could make use of it for things that are not extensions. If you have some kind of plugin module or modules that install hooks, you could still do that. But not for an extension that consists of a set of SQL scripts and a control file.

As I was being approached about these things, I realized that was just the problem and we should just now fix that. The recent history went as follows.

In April, 2024, just about a year ago now, David Wheeler started a hackers thread suggesting Christoph Berg’s Debian patch as a starting point for discussions. Like, “here’s this thing, shouldn’t we do something about this?”

There was a fair amount of discussion. I was not really involved at the time. This was just after feature freeze, and so I wasn’t paying much attention to it. But the discussion was quite lively and a lot of people pitched in and had their ideas and thoughts about it. And so a lot of important filtering work was done at that time.

Later, in September, Gabriele, my colleague from EDB who works on CloudNativePG, approached me about this issue and said like: “hey, this is important, we need this to make extensions useful in the Kubernetes environment.” And he said, “can you work, can you work on this?”

I said, “yeah, sure, in a couple months I might have time.” [Laughs]. But it sort of turned out that at PGConf.EU we had a big brain-trust meeting of various people who basically all came and said, “hey, I heard you’re working on extension_control_path, I also need that!”

Gabriele was there, and Tobias Bussmann from Postgres.app was there, and Christoph, and it was like, yeah, I really need this extension_control_path to make this work. So I made sure to talk to everybody there and ask: if we did this, would it work for you? And then we kind of had a good idea of how it should work.

In November the first patch was posted, and last week it was committed. I think there’s still a little bit of discussion of some details, and we certainly still have some time before the release to fine-tune it, but the main work is hopefully done.

This is the commit I made last week. The fact that this presentation was scheduled gave me additional motivation to get it done. I wanna give some credit to people who reviewed it. Obviously David did a lot of reviews and gave feedback in general. My colleague Matheus, who I think I saw earlier here on the call, helped me quite a bit with sort of finishing the patch. And then Gabriele, Marco and Nicolò, who work on CloudNativePG, did a large amount of testing.

They set up a whole sort of sandbox environment, making test images for extensions and simulating the entire process of attaching these to the main image. Again, I’m butchering the terminology, but I’m just trying to explain it in general terms. They did the whole end-to-end testing of what that would then look like with CloudNativePG. And again, that will, I assume, be discussed when Gabriele presents in a few weeks.

These are the stats from the patch:

commit 4f7f7b03758

doc/src/sgml/config.sgml                                     |  68 +++++
doc/src/sgml/extend.sgml                                     |  19 +-
doc/src/sgml/ref/create_extension.sgml                       |   6 +-
src/Makefile.global.in                                       |  19 +-
src/backend/commands/extension.c                             | 403 +++++++++++++++++----------
src/backend/utils/fmgr/dfmgr.c                               |  77 +++--
src/backend/utils/misc/guc_tables.c                          |  13 +
src/backend/utils/misc/postgresql.conf.sample                |   1 +
src/include/commands/extension.h                             |   2 +
src/include/fmgr.h                                           |   3 +
src/test/modules/test_extensions/Makefile                    |   1 +
src/test/modules/test_extensions/meson.build                 |   5 +
.../modules/test_extensions/t/001_extension_control_path.pl  |  80 ++++++

The reason I show this is that it’s not big! What I did is use the same infrastructure and mechanisms that already existed for dynamic_library_path. That’s the code in dfmgr there in the middle; that’s where this little path search is implemented. And then of course, in extension.c there’s some code that’s basically just a bunch of utility functions, like listing all the extensions and all the versions of all the extensions. Those utility functions exist and they needed to be updated to do the path search. Everything else is pretty straightforward. There’s just a few configuration settings added to the documentation and the sample files and so on. It’s not that much, really.

One thing we also did was add tests for this, down there in test_extensions. We wrote some tests to make sure this works. Well, it’s one thing to make sure it works, but the other thing is that if we wanna make changes or we find problems with it, or we wanna develop this further in the future, we have a record of how it works, which is why you write tests. I just wanted to point that out because we didn’t really have that before, and it was quite helpful to build confidence that we know how this works.

So how does it work? Let’s say you have your Postgres installation in a standard Linux, package-controlled file system location. None of the actual packages look like this, I believe, but it’s a good example. You have your binaries under /usr/bin/, the shared libraries under /usr/lib/something, and the extension control files and SQL files under /usr/share/something. That’s your base installation. And then you wanna install your extension into some other place to keep these things separate. So you have /usr/local/mystuff/, for example.
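
A sketch of that kind of layout (the exact directory names are hypothetical):

/usr/bin/postgres                        -- base installation
/usr/lib/postgresql/                     -- shared libraries
/usr/share/postgresql/extension/         -- control and SQL files
/usr/local/mystuff/lib/                  -- locally installed extension libraries
/usr/local/mystuff/share/extension/      -- locally installed control and SQL files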

Another thing that this patch implemented is that you can now also do this: when you build an extension, you can write make install prefix=something. Before you couldn’t do that, but there was also no point because if you installed it somewhere else, you couldn’t do anything with it there. Now you can load it from somewhere else, but you can also install it there — which obviously are the two important sides of that.
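
So building an extension against a packaged Postgres while keeping it out of the system directories might look something like this (the prefix is hypothetical):

make
make install prefix=/usr/local/mystuff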

And then you set these two settings: dynamic_library_path is an existing configuration setting; you set that to where your lib directory is. And extension_control_path is a new setting, the titular setting of this talk, where you tell it where your extension control files are.

There are these placeholders, $libdir and $system, which mean the system location; the other entries are your other locations, and the list is separated by colons (and semicolons on Windows). We had some arguments about what exactly the extension_control_path placeholder should be called, and people continue to have different opinions. What it does is look in the listed directories for the control file, and then wherever it finds the control file, it loads all the other files from there.
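
Continuing the hypothetical layout from above, the two settings might look like this in postgresql.conf:

dynamic_library_path = '/usr/local/mystuff/lib:$libdir'
extension_control_path = '/usr/local/mystuff/share/extension:$system'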

And there’s a fairly complicated mechanism. There’s obviously the actual SQL files, but there’s also these auxiliary control files, which I didn’t even know existed. So you can have version-specific control files. It’s a fairly complicated system, so we wanted to be clear about what is happening: the main control file is searched for in these directories, and then wherever it’s found, that’s where it looks for the other things. You can’t have the control file in one part of the path and the SQL files in another part of the path; that’s not how it works.
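
So for a hypothetical extension myext, everything is expected together in whichever path entry the main control file turns up in:

/usr/local/mystuff/share/extension/
    myext.control            -- main control file, found via extension_control_path
    myext--1.1.control       -- auxiliary, version-specific control file
    myext--1.1.sql           -- install script, loaded from the same directory
    myext--1.0--1.1.sql      -- upgrade script, loaded from the same directory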

That solves problem number five. Let’s see what problem number five was. I forgot [Chuckles]. This is the basic problem, that you no longer have to install the extensions in the directories that are ostensibly controlled by the operating system or your package manager.

So then how would Debian packaging use this? I got this information from Christoph. He figured out how to do this. He just said, “Oh, I did this, and that’s how it works.” During packaging, the packaging scripts that build the packages just pass these options:

PKGARGS="--pgoption extension_control_path=$PWD/debian/$PACKAGE/usr/share/postgresql/$v/extension:\$system
--pgoption dynamic_library_path=$PWD/debian/$PACKAGE/usr/lib/postgresql/$v/lib:/usr/lib/postgresql/$v/lib"

These options set the extension_control_path and the dynamic_library_path for those versions, and then it works. Christoph confirmed that this addresses his problem; he no longer has to carry his custom patch. This solves problem number three.

The question people ask is, “why do we have two?” Or maybe you’ve asked yourself that. Why do we need two settings? We have dynamic_library_path, we have extension_control_path. Isn’t that kind of the same thing? Kind of, yes! But in general, it is not guaranteed that these two things are in a fixed relative location.

Let’s go back to our fake example. We have the libraries in /usr/lib/postgresql and the SQL and control files in /usr/share/postgresql, for example. Now you could say, why don’t we just set it to /usr? Or why don’t we just set the path to /usr/local/mystuff and let it figure out the subdirectories? That would be nice, but it doesn’t quite work in general, because it’s not guaranteed that those are the subdirectories. There could be lib64, for example, right? Or some other architecture-specific subdirectory names. Or people can just name them whatever they want. So this may be marginal, but it is possible. You need to keep in mind that the subdirectory structure is not necessarily fixed.

So we need two settings. The way I thought about this: if you compile C code, you also have two settings, and if you think about it, it’s exactly the same thing. When you compile C code, you always have to do -I and -L: I for the include files, L for the lib files. This is basically the same thing. The include file is also the text file that describes the interfaces, and the libraries are the libraries. Again, you need two options, because you can’t just tell the compiler, oh, look for it in /usr/local, because the subdirectories could be different. There could be architecture-specific lib directories; that’s a common case. You need those two settings. Usually they go in parallel. If somebody has a plan on how to do it more simply, follow-up patches are welcome.
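
The analogy, roughly (paths and names are hypothetical):

cc -I/usr/local/mystuff/include -L/usr/local/mystuff/lib -o myprog myprog.c -lmylib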

But the main point of why this approach was taken is also to get it done in a few months. I started thinking about this, or I was contacted about this in September and I started thinking about it seriously in the October/November timeframe. That’s quite late in the development cycle to start a feature like this, which I thought would be more controversial! People haven’t really complained that this breaks the security of extensions or anything like that. I was a little bit afraid of that.

So I wanted to really base it on an existing facility that we already had, and that’s why I wanted to make sure it works exactly in parallel to the other path that we already have, and that has existed for a long time, and was designed for this exact purpose. That was also the reason why we chose this path of least resistance, perhaps.

This is the solution progress for the six problems that I described initially. The CloudNativePG folks obviously have accompanied this project actively and have already prototyped the integration solution. And presumably we will hear about some of that at the meeting on May 7th, where Gabriele will talk about this.

Postgres.app I haven’t been in touch with, but one of the maintainers is here, maybe you can give feedback later. Debian is done as I described, and they will also be at the next meeting, maybe there will be some comment on that.

One thing that’s not fully implemented is the make check issue. I did send a follow-up patch about that, which was a really quick prototype hack, and people really liked it. I’m slightly tempted to give it a push and try to get it into Postgres 18. This is a work in progress, but there’s sort of a way forward. The local install problem I said is done.

Homebrew, I haven’t looked into. It’s more complicated, and I’m also not very closely involved in the development of that. I’ll just be an outsider, maybe sending patches or suggestions at some point, maybe when the release is closer and we’ve settled everything.

I have some random other thoughts here. I’m not actively working on these right now, but I have worked on them in the past and I plan to work on them again. Basically the conversion of all the building to Meson is on my mind, and other people’s minds.

Right now we have two build systems: the make build system and the Meson build system, and all the production packages, as far as I know, are built with make. Eventually we wanna move all of that over to Meson, but we want to test all the extensions and see if they still work. As far as I know, they do work; there’s nothing that really needs to be implemented, but we need to go through all the extensions and test them.

Secondly — this is optional; I’m not saying this is a requirement — but you may wish to also build your own extensions with Meson. But that’s in my mind, not a requirement. You can also use cmake or do whatever you want. But there’s been some prototypes of that. Solutions exist if you’re interested.

And to facilitate the second point, there’s been the proposal — which I think was well received, but it just needs to be fully implemented — to provide a pkg-config file to build against the server, and cmake and Meson would work very well with that. Then you can just say here’s a pkg-config file to build against the server. It’s much easier than setting all the directories yourself or extracting them from pg_config. Maybe that’s something coming for the next release cycle.
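
If such a file shipped, a build might look up the flags along these lines (the .pc name here is hypothetical, since this is still just a proposal):

pkg-config --cflags postgresql-server
pkg-config --libs postgresql-server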

That’s what I had. So extension_control_path is coming in Postgres 18. What you can do is test and validate that against your use cases and help integration into the downstream users. Again, if you’re a packager or anything like that, you can make use of it. That is all for me.

Thank you!

Questions, comments

  • Reading the comments, where several audience members suggested Peter follows Conference-Driven Development, he confirmed that that’s definitely a thing.

  • Someone asked for the “requirements gathering document”. Peter said that that’s just a big word for “just some notes I have”. “It’s not like an actual document. I called it the requirements gathering. That sounds very formal, but it’s just chatting to various people and someone at the next table overheard us talking and it’s like, ‘Hey! I need that too!’”

  • Christoph: I tried to get this fixed or implemented or something at least once over the last 10 something-ish years, and was basically shot down on grounds of security issues if people mess up their system. And what happens if you set the extension path to something, install an extension, and then set the path to something else and then you can’t upgrade. And all sorts of weird things that people can do with their system in order to break them. Thanks for ignoring all that bullshit and just getting it done! It’s an administrator-level setting and people can do whatever they want with it.

    So what I then did was just implement that patch and, admittedly, I never got around to even trying to put it upstream. So thanks David for pushing that ahead. It was clear that the Debian version of the patch wasn’t acceptable because it was too limited. It made some assumptions about the directory structure of Debian packages, so it always included the prefix in the path. The feature that Peter implemented solves my problem. It solves a lot more problems, so thanks for that.

  • Peter: Testing all extensions. What we’ve talked about is doing this through the Debian packaging system, because the idea was to maybe make a separate branch or a separate sub-repository of some sort, switch it to build with Meson, and rebuild all the extension packages and see what happens. I guess that’s how far we’ve come. It doesn’t actually mean they all work, but I guess most of them have tests, so we just want to run those and see if it works.

    There are some really subtle problems. Well, the ones I know of have been fixed, but there are some cases where certain compilation options are not substituted into the Makefiles correctly, so then all your extensions are built without any optimizations, for example, without any -O options. I’m not really sure how to detect those automatically, but at least rebuilding everything once might be an option. Or just do it manually. There are not thousands of extensions. There are not even hundreds that are relevant. There are several dozen, and I think that’s good coverage.

  • Christoph: I realize that doing it on the packaging side makes sense because we all have these tests running. So I was looking into it. The first time I tried, I stopped once I realized that Meson doesn’t support LLVM yet; and the second time I tried, I just diffed the generated Makefiles to see if there’s any difference that looks suspicious. At this point I should just continue, do a compilation run, and see what the tests are doing and stuff.

    So my hope would be that I could run diff on the results: compile Postgres with Autoconf once and then with Meson the second time, then see if it has an impact on the extensions compiled. But my idea was that if I’m just running diff on the two compilations and there’s no difference, there’s no point in testing because they’re identical anyway.

  • Peter: Oooh, you want the actual compilation, the Makefile output, to be the same.

  • Christoph: Yeah, then I don’t have to run that test. But the diff was a bit too big to be readable. There was lots of whitespace noise in there, but there were also some actual changes. Some were not really bad; like, in some places variables were using a fully qualified path for the make directory or something, and in some places not. But maybe we can just work on making that difference smaller, and then arguing about correctness is easier.

  • Peter: Yeah, that sounds like a good approach.

  • Jakob: Maybe I can give some feedback from Postgres.app. So, thank you very much. I think this solves a lot of problems that we have had with extensions over the years, especially because it allows us to separate the extensions and the main Postgres distribution. For Postgres.app we basically have to decide which extensions to include and we can’t offer additional extensions when people ask for them without shipping them for everyone. So that’s a big win.

    One question I am wondering about is the use case of people building their own extensions. As far as I understand, you have to provide the prefix. One thing I’m wondering is whether there is some way to give a default value for the prefix, like in pg_config or something like that, so people who just type make install automatically get some path.

  • Peter: That might be an interesting follow-on. I’m making a note of it. I’m not sure how you’d…

  • Jakob: I’m just thinking because a big problem is that a lot of people who try things don’t follow the instructions for the specific Postgres. So for example, if we write documentation on how to build extensions, people on a completely different system — like people who Google stuff and get instructions — will just try random paths. Right now, if you just type make install, it works on most systems because it just builds into the standard directories.

  • Peter: Yeah, David puts it like, “should there be a different default extension location?” I think that’s probably not an unreasonable direction. I think that’s something we should maybe think about once this is stabilized. For your Postgres.app use case, I think you could probably even implement that yourself with a one- or two-line patch, so that at least if you install Postgres.app and somebody tries to build an extension, they get a reasonable location.

  • David: If I could jump in there, Jakob, my assumption was that Postgres.app would do something like designate the Application Support directory and Preferences in ~/Library as where extensions should be installed. And yeah, there could be some patch to PGXS to put stuff there by default.

  • Jakob: Yeah, that would be nice!

  • Peter: Robert asked a big question here: what do we think are the security consequences of this patch? Well, one of the premises is that we already have dynamic_library_path, which works exactly the same way, and there haven’t been any concerns about that. Well, maybe there have been concerns, but nothing that was acted on. If you set the path to somewhere where anybody can write stuff, then yeah, that’s not so good. But that’s the same as anything. Certainly there were concerns as I read through the discussion.

    I assumed somebody would have security questions, so I really wanted to base it on this existing mechanism and not invent something completely new. So far nobody has objected to it [Chuckles]. But yeah, of course you can make a mess of it if you go and set extension_control_path = /tmp! That’s probably not good. But don’t do that.

  • David: I think that’s in part because the xz exploit made people more receptive to this patch: we want to reduce the number of patches that packaging maintainers have to maintain.

  • Peter: Obviously this is something people do. Better we have one solution that people can then use and that at least we understand, as opposed to everybody going out and figuring out their own complicated solutions.

  • David: Peter, I think there are still some issues with the behavior of MODULEDIR from PGXS and the directory setting in the control file, which don’t quite work with this patch. Do you have some thoughts on how to address those issues?

  • Peter: For those who are not following: there’s an existing, I guess rarely used, feature where, in the control file, you can specify a directory option, which then specifies where the other files are located. And this doesn’t work the way you’d think it should; maybe it’s not clear what it should do if the control file is found in a path somewhere. I guess it’s so rarely used that we might maybe just get rid of it; that was one of the options.

    In my mental model of how the C compiler works, it’s like setting an rpath on something: if you set an absolute rpath somewhere, you know it’s not gonna work if you move the thing to a different place in the path. I’m not sure if that’s a good analogy, but it sort of has similar consequences. If you hard-code an absolute path, then path search is not gonna work. But yeah, that’s on the list of things I need to look into.

  • David: For what it’s worth, I discovered last week that the part of this patch where you’re stripping out $libdir in the extension makefile (that was in modules, I think?) also needs to be done when you use rpath to install an extension and point to extensions today with Postgres 17. Happy to see that one go.

  • Christoph: Thanks for fixing that part. I was always wondering why it was broken the way it was broken. It looked very weird, and it turned out it was just broken, not me not understanding it.

  • David: I think it might have been a documentation oversight back when extensions were added in 9.1, to say this is how you list the modules.

    Anyway, this is great! I’m super excited for this patch and where it’s going and the promise for stuff in the future. Just from your list of the six issues it addresses, it’s obviously something that covers a variety of pain points. I appreciate you doing that.

  • Peter: Thank you!

Many thanks and congratulations wrapped up the call.

The next Mini-Summit is on April 9, when Christoph Berg (Debian, and also Cybertec) will join us to talk about APT Extension Packaging.

Mini Summit 2: Extension Search Path Patch

Orange card with large black text reading “Implementing an Extension Search Patch”. Smaller text below reads “Peter Eisentraut, EDB” and “03.26.2025”. A photo of Peter speaking into a mic at a conference appears on the right.

This Wednesday, March 26 at noon America/New_York (16:00 UTC), Peter Eisentraut has graciously agreed to give a talk at the Extension Mini Summit #2 on the extension search path patch he recently committed to PostgreSQL. I’m personally stoked for this topic, as freeing extensions from the legacy of a single directory opens up a number of new patterns for packaging, installation, and testing extensions. Hit the Meetup to register for this live video conference, and to brainstorm novel uses for this new feature, expected to debut in PostgreSQL 18.

2025 Postgres Extensions Mini Summit One

Back on March 12, we hosted the first in a series of PostgreSQL Extensions Mini Summits leading up to the Extension Ecosystem Summit at PGConf.dev on May 13. I once again inaugurated the series with a short talk on the State of the Extension Ecosystem. The talk was followed by 15 minutes or so of discussion. Here are the relevant links:

And now, with many thanks to Floor Drees for the effort, the transcript from the session.

Introduction

Floor Drees introduced the organizers:

David presented a State of the Extension Ecosystem at this first event, and shared some updates from PGXN land.

The stream and the closed captions available for the recording are supported by PGConf.dev and their gold level sponsors, Google, AWS, Huawei, Microsoft, and EDB.

State of the Extensions Ecosystem

So I wanted to give a brief update on the state of the Postgres extension ecosystem: the past, present, and future. Let’s start with a brief history; it’s quite long, actually.

There were originally two approaches back in the day. You could use shared_preload_libraries to preload dynamic shared libraries into the main process. And then you could do pure SQL stuff, including procedural languages like PL/Perl, PL/Tcl, and such.

And there were a few intrepid early adopters, including PostGIS, BioPostgres, PL/R, PL/Proxy, and pgTAP, who all made it work. Beginning with Postgres 9.1, Dimitri Fontaine added explicit support for extensions in the Postgres core itself. The key features included the ability to compile and install extensions. This is, again, pure SQL and shared libraries.

There are CREATE, UPDATE, and DROP EXTENSION commands in SQL that you can use to add extensions to a database, upgrade them to new versions, and remove them. And then there’s pg_dump and pg_restore support, so that extensions can be treated as a single bundle to be backed up and restored with all of their individual objects included as part of the backup.
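
In SQL that looks something like this (note that the upgrade form is spelled ALTER EXTENSION … UPDATE; the extension name is just an example):

CREATE EXTENSION pgtap;
ALTER EXTENSION pgtap UPDATE;
DROP EXTENSION pgtap;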

Back then, a number of us, myself included, saw this as an opportunity to make the extensibility of Postgres itself a fundamental part of the community and distribution. I was a long-time user of Perl and used CPAN, and I thought we should have something like CPAN for Postgres. So I proposed PGXN, the PostgreSQL Extension Network, back in 2010. The idea was to do distribution of source code; you would register namespaces for your extensions.

There was discovery via a website for search, published documentation, tags to help you find different kinds of objects, and support for installation through a command-line interface, using the compile and install stuff that Postgres itself provides via PGXS and configure.

This is what PGXN looks like today. It was launched in 2011. There’s a command-line client, this website, an API, and a registry you can upload your extensions to. The most recent upload was pg_task, a day or so ago.
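
For example, with the command-line client, something like this works (the extension name is just an example):

pgxn search semver          # find distributions on PGXN
pgxn install semver         # download, build, and install
pgxn load -d mydb semver    # run CREATE EXTENSION in a database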

In the interim, since that came out in 2011/2012, the cloud providers have come into their own with Postgres, but their support for extensions tends to be rather limited. For non-core extension counts, as of yesterday, Azure provides 38 extensions, GCP provides 44, and AWS 51. These are the third-party extensions that don’t come with Postgres and its contrib modules. Meanwhile, PGXN has 420 extensions available to download, compile, build, and install.

A GitHub project that tracks random extensions on the internet (joelonsql/PostgreSQL-EXTENSIONs.md), which is pretty comprehensive, has almost 1200 extensions listed. So the question is: why is the support not broader? Why aren’t there a thousand extensions available in every one of these systems?

This has been a fairly common question that’s come up in the last couple years. A number of new projects have tried to fill in the gaps. One is Trusted Language Extensions. They wanted to make it easier to distribute extensions without needing dynamic shared libraries by adding additional features in the database itself.

The idea was to empower app developers by making it easy to install extensions via SQL functions rather than having to access the file system of the database server itself. It can be portable, so there’s no compilation required; it hooks into the CREATE EXTENSION command transparently; it supports custom data types; and there have been plans for foreign data wrappers and background workers. I’m not sure how that’s progressed in the past year. The pg_tle extension itself was created by AWS and Supabase.
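
A rough sketch of what that looks like with pg_tle; the exact function signature and argument details here are assumptions and may differ between pg_tle releases:

SELECT pgtle.install_extension(
    'my_tle',                              -- extension name (example)
    '1.0',                                 -- version
    'Example trusted language extension',  -- description
$_tle_$
    CREATE FUNCTION add_one(i int) RETURNS int
    LANGUAGE sql IMMUTABLE AS $$ SELECT i + 1 $$;
$_tle_$
);
CREATE EXTENSION my_tle;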

Another recent entrant in tooling for extensions is pgrx, which is native Rust extensions in Postgres. You build dynamic shared libraries, but write them in pure Rust. The API for pgrx provides full access to Postgres features, and still provides the developer-friendly tooling that Rust developers are used to. There’s been a lot of community excitement the last couple of years around pgrx, and it remains under active development — version 0.13.0 just came out a week or so ago. It’s sponsored and run out of the PgCentral Foundation.

There have also been several new registries that have come up to try to fill the gap and make extensions available. They have emphasized different things than PGXN. One was ease of use. So, for example, pgxman says it should be really easy to install its client in a single command, and then it downloads and installs a binary version of an extension.

And then there was platform neutrality. They wanted to do binary distribution and support multiple different platforms, to know what binary to install for a given platform. They provide stats. PGXN doesn’t provide any stats, but some of them list stats like how many downloads there have been, and how many in the last 180 days.

And curation. Trunk is another binary extension registry, from my employer, Tembo. They do categorization of all the extensions on Trunk, which is at 237 now. Quite a few people have come forward to tell us that they don’t necessarily use Trunk to install extensions, but to find them, because the categories are really helpful for figuring out what sorts of things are even available and are an option to use.

So here’s the State of the Ecosystem as I see it today.

  • There have been some lost opportunities from the initial excitement around 2010. Extensions remain difficult to find and discover. Some are on PGXN, some are on GitHub, some are on Trunk, some are on GitLab, etc. There’s no one place to go to find them all.

  • They remain under-documented and difficult to understand. It takes effort for developers to write documentation for their extensions, and a lot of them aren’t able to. Some of them do write the documentation, but they might be in a format that something like PGXN doesn’t understand.

  • The maturity of extensions can be difficult to gauge. If you look at that list of 1200 extensions on GitHub, which ones are the good ones? Which ones do people care about? That page in particular shows the number of stars for each extension, but that’s the only metric.

  • They’re difficult to configure and install. This is something TLE really tried to solve, but the uptake on TLE has not been great so far, and it doesn’t support all the use cases. There are a lot of use cases that need to be able to access the internal APIs of Postgres itself, which means compiling stuff into shared libraries, and writing them in C or Rust or a couple of other compiled languages.

    That makes them difficult to configure. You have to ask questions like: Which build system do I use? How do I install the tooling? How do I install and configure the extension? What dependencies does it have? Et cetera.

  • There’s no comprehensive binary packaging. The Postgres community’s own packaging systems for Linux — APT and YUM — do a remarkably good job of packaging extensions. They probably have more extensions packaged for those platforms than any of the others. If they have the extension you need and you’re using the PGDG repositories, then this stuff is there. But even those are still a fraction of all the potential extensions that are out there.

  • Dependency management can be pretty painful. It’s difficult to know what you need to install. I was messing around yesterday with the PgSQL HTTP extension, which is a great extension that depends on libcurl. I thought maybe I could build a package that includes libcurl as part of it. But then I realized that libcurl depends on other packages, other dynamic libraries. So I’d have to figure out what all those are to get them all together.

    A lot of that goes away if you use a system like apt or yum. But if you don’t, or you just want to install stuff on your Mac or Windows, it’s much more difficult.

  • Centralized source distribution, we’ve found, is insufficient. Even if all the extensions were available on PGXN, not everybody has the wherewithal or the expertise to find what they need, download it, compile it, and build it. Moreover, you don’t want to have a compiler on your production system, so you don’t want to be building stuff from source on your production system. So then you have to get into the business of building your own packages, which is a whole thing.

But in this state of the extension ecosystem we see new opportunities too. One I’ve been working on for the past year, which we call “PGXN v2”, is made possible by my employer, Tembo. The idea was to consider the emerging patterns — new registries and new ways of building, releasing, and developing extensions — to figure out the deficiencies, to engage deeply with the community to work up potential solutions, and to design and implement a new architecture. The idea is to serve the community for the next decade and really make PGXN and its infrastructure the source of record for extensions for Postgres.

In the past year, I did a bunch of design work on it. Here’s a high level architectural view. We’d have a root registry, which is still the source code distribution stuff. There’s a web UX over it that would evolve from the current website. And there’s a command line client that knows how to build extensions from the registry.

But in addition to those three parts, which we have today, we would evolve a couple of additional parts.

  1. One is “interactions”, so that when somebody releases a new extension on PGXN, some notifications could go out through webhooks or some sort of queue so that downstream systems like the packaging systems could know something new has come out and maybe automate building and updating their packages.

  2. There could be “stats and reports”, so we can provide data like how many downloads there are, what binary registries make them available, what kinds of reviews and quality metrics rate them. We can develop these stats and display those on the website.

  3. And, ideally, a “packaging registry” for PGXN to provide binary packages for all the major platforms of all the extensions we can, to simplify the installation of extensions for anybody who needs to use them: extensions that aren’t available through PGDG, or cases where you’re not using that system and you want to install extensions. Late last year, I was focused on figuring out how to build the packaging system.

Another change that went down in the past year was the Extension Ecosystem Summit itself. This took place at PGConf.Dev last May. The idea was for a community of people to come together to collaborate, examine ongoing work in the extension distribution, examine challenges, identify questions, propose solutions, and agree on directions for execution. Let’s take a look at the topics that we covered last year at the summit.

  • One was extension metadata, where the topics covered included packaging and discoverability, extension development, and compatibility and taxonomies as being important for representing metadata about extensions — as well as versioning standards. One of the outcomes was an RFC for version two of the PGXN metadata that incorporates a lot of those needs into a new metadata format to describe extensions more broadly.

  • Another topic was the binary distribution format and what it should look like if we were to have a major distribution format. We talked about being able to support multiple versions of an extension at one time. There was some talk about the Python Wheel format as a potential precedent for binary distribution of code.

    There’s also an idea to distribute extensions through Docker containers, also known as the Open Container Initiative. Versioning came up here, as well. One of the outcomes from this session was another PGXN RFC for binary distribution, which was inspired by Python Wheel among other stuff.

    I wanted to give a brief demo built on that format. I hacked some changes into the PGXS Makefile to add a new target, trunk, that builds a binary package called a “trunk” and uploads it to an OCI registry for distribution. Here’s what it looks like.

    • On my Mac I was compiling my semver extension. Then I go into a Linux container and compile it again for Linux using the make trunk command. The result is two .trunk files, one for Postgres 16 on Darwin and one for Postgres 16 on Linux.

    • There are also some JSON files that are annotations specifically for OCI. We have a command where we can push these images to an OCI registry.

    • Then we can then use an install command that knows to download and install the version of the build appropriate for this platform (macOS). And then I go into Linux and do the same thing. It also knows, because of the OCI standard, what the platform is, and so it installs the appropriate binary.

  • Another topic was ABI and API compatibility. There was some talk at the Summit about what is the definition of an ABI and an API and how do we define internal APIs and their use? Maybe there’s some way to categorize APIs in Postgres core for red, green, or in-between, something like that. There was desire to have more hooks available into different parts of the system.

    One of the outcomes of this session was that I worked with Peter Eisentraut on some stability guidance for the API and ABI that is now committed in the docs. You can read it now in the developer docs; it’ll be part of the Postgres 18 release. The idea is that minor version releases should be safe to use with other minor versions: if you compiled your extension against one minor version, it should be perfectly compatible with other minor versions of the same major release.

    Interestingly, there was a release earlier this year, like two weeks after Peter committed this, where there was an ABI break. It’s the first time in like 10 years. Robert Treat and I spent quite a bit of time trying to look for a previous time that happened. I think there was one about 10 years ago, but then this one happened and, notably, it broke the Timescale database. The Core Team decided to release a fix just a week later to restore the ABI compatibility.

    So it’s clear that even though there’s guidance, and you should in general be able to rely on it (it was a motivating factor for a new release to fix an ABI break), there are no guarantees.

    Another thing that might happen is that I proposed a Google Summer of Code project to build an ABI checker service. Peter [embarrassing forgetfulness and misattributed national identity omitted] Geoghegan POC’d an ABI checker in 2023. The project is to take Peter’s POC and build something that could potentially run on every commit or push to the back branches of the project. Maybe it could be integrated into the build farm so that, if there’s a back-patch to an earlier branch and it turns red, they quickly know the ABI was broken. This could potentially provide a higher level of guarantee about the stability of the ABIs and APIs, even if they don’t end up using the word “guarantee”. I’m hoping this happens; a number of people have asked about it, and at least one person has written an application.

  • Another topic at the summit last year was including or excluding extensions in core. They talked about when to add something to core, when to remove something from core, whether items in contrib should actually be moved into core itself, and whether to move metadata about extensions into the catalog. And once again, support for multiple versions came up; this is a perennial challenge! But I’m not aware of much work on these questions. I’m wondering if it’s time for a revisit.

  • As a bonus item — this wasn’t a formal topic at the summit last year, but it came up many times in the mini-summits — there’s the challenge of packaging and lookup. There’s only one path to extensions in SHAREDIR, which creates a number of difficulties. Christoph Berg has a patch for PGDG and Debian that adds a second directory. This allows the PGDG stuff to actually run tests against extensions without changing the core installation of the Postgres service itself. Another is CloudNativePG immutability: if that directory is part of the image for your CloudNativePG cluster, you can’t install extensions into it.

    It’s a similar issue for Postgres.app immutability. Postgres.app is a Mac app, and it’s signed by a certificate provided by Apple. But that means that if you install an extension in its SHAREDIR, it changes the signature of the application and it won’t start. They work around this issue through a number of symlink shenanigans, but these issues could be solved by allowing extensions to be installed in multiple locations.

    Starting with Christoph’s search path patch and a number of discussions we had at PGConf last year, Peter Eisentraut has been working on a search path patch to the core that would work similarly to shared preload libraries, but for finding extension control files. This would allow you to have them in multiple directories, and it will find them in the path.

    Another interesting development along this line: the CloudNativePG project has been using that extension search path patch to prototype a new feature coming to Kubernetes that allows one to mount a volume that’s actually another Docker image. If you have your extension distributed as an OCI image, you can specify that it be mounted and installed via your CNPG cluster configuration. That means when CNPG spins up, it puts the extension in the right place, it updates the search path variables, and stuff just works.

    A lot of the thought about this stuff went into a less formal RFC I wrote up on my blog, rather than on PGXN. The idea is to take these improvements and try to more formally specify the organization of extensions separate from how Postgres organizes shared libraries and shared files.

As I said, we’re bringing the Extension Summit back! There will be another Extension Summit, hosted by our team of organizers: myself, Floor, Keith Fiske from Crunchy Data, and Yurii from Omnigres. That will be on May 13th, in the morning, at PGConf.dev; we appreciate their support.

The idea of these Mini Summits is to bring up a number of topics of interest. Have somebody come and do a 20 or 40 minute talk about it, and then we can have discussion about implications.

Floor mentioned the schedule, but briefly:

So, what are your interests in extensions and how they can be improved? There are a lot of potential topics to talk about at the Summit or at these Mini Summits: development tools, canonical registry, how easy it is to publish, continuous delivery, yada, yada, yada, security scanning — all sorts of stuff that could go into conceiving, designing, developing, and distributing extensions for Postgres.

I hope you all will participate. I appreciate you taking the time to listen to me for half an hour. So I’d like to turn it over to discussion, if people would like to join in and talk about implications of stuff. Also, we can get to any questions here.

Questions, comments, shout-outs

Floor: David, at one point you talked about, metadata taxonomy. If you can elaborate on that a little bit, that’s Peter’s question.

David: One taxonomy that people told me they found useful was the one provided by Trunk. It has a limited number of categories, so if you’re interested in machine learning stuff, you can go to the machine learning category and it shows you what extensions are potentially available. They have 237 extensions on Trunk now.

PGXN itself allows arbitrary tagging of stuff. It builds this little tag cloud. But if I look at this one here, you can see this one has a bunch of tags. These are arbitrary tags applied by the author. The current metadata looks like this: it’s just plain JSON, and it has a list of tags. The PGXN Meta v2 RFC has a bunch of examples. It’s an evolution of that META.json, so the idea is to have a classifications object that includes tags as before, but also adds categories, which are a limited list that would be controlled by the core [he means “root”] registry:

{
  "classifications": {
    "tags": [
      "testing",
      "pair",
      "parameter"
    ],
    "categories": [
      "Machine Learning"
    ]
  }
}

Announcements

Yurii made a number of announcements, summarizing:

  • There is a new library that they’ve been developing at Omnigres that allows you to develop Postgres extensions in C++. For people who are interested in developing extensions in C++ and gaining the benefits of that and not having to do all the tedious things that we have to do with C extensions: look for Cppgres. Yurii thinks that within a couple of months it will reach parity with pgrx.

    David: So it sounds like it would work more like the way PGXS and C work, whereas pgrx has all these additional Rust crates you have to load, slow compile times, and all these dependencies.

    Yurii: This is just like a layer over the C stuff, an evolution of that. It’s essentially a header-only library, which is a very common thing in the C++ world. So you don’t have to build anything; you just include a file. And in fact, the way I use it, I amalgamate all the header files that we have into one. Whenever I include it in a project, I just copy the amalgamation and it’s just one file. You don’t have any other build chain associated with it. It is C++20, which some people consider new, but by the time it’s mature it will already be five years old and most compilers support it. They have decent support of C++20 with a few exceptions, but those are relatively minor. For that reason it’s not C++23, for example, because that’s not very well supported across compilers, but C++20 is.

  • Yurii is giving a talk about PostgresPM at the Postgres Conference in Orlando. He’ll share the slides and recording with this group. The idea behind PostgresPM is that it applies a lot of heuristics: it takes the URLs of packages and of extensions and creates packages for different outputs, like Red Hat, Debian, and perhaps other formats in the future. It focuses on the idea that a lot of things can be figured out automatically.

    For example: do we have a new version? Well, we can look at the list of tags in the Git repo. Very commonly that works, for say 80 percent of extensions. Do we need a C compiler? We can see whether we have C files. We can figure out a lot of stuff without packagers having to specify it manually every time they have a new extension. And they don’t have to repackage every time there is a new release, because we can detect new releases and try to build.

  • Yurii is also running an event that, while not affiliated with PGConf.dev, is strategically scheduled to happen one day before PGConf.dev: Postgres Extensions Day. The Call for Speakers is open until April 1st. There’s also an option for people who cannot or would not come to Montreal this year to submit a prerecorded talk. The point of the event is not just to bring people together, but also to surface content that can be interesting to other people. The event itself is free.

Make sure to join our Meetup group and join us live, March 26, when Peter Eisentraut joins us to talk about implementing an extension search path.

Extension Ecosystem Summit 2025

Logo for PGConf.dev

I’m happy to announce that some PostgreSQL colleagues and I have once again organized the Extension Ecosystem Summit at PGConf.dev in Montréal on May 13. Floor Drees, Yurii Rashkovskii, and Keith Fiske will be on hand to kick off this unconference session:

Participants will collaborate to learn about and explore the ongoing work on PostgreSQL development and distribution, examine challenges, identify questions, propose solutions, and agree on directions for execution.

Going to PGConf.dev? Select it as an “Additional Option” when you register, or update your registration if you’ve already registered. Hope to see you there!


Photo of the summit of Mount Hood

Extension Ecosystem Mini-Summit 2.0

We are also once again hosting a series of virtual gatherings in the lead-up to the Summit, the Postgres Extension Ecosystem Mini-Summit.

Join us for an hour or so every other Wednesday starting March 12 to hear contributors to a variety of community and commercial extension initiatives outline the problems they want to solve, their attempts to do so, challenges discovered along the way, and dreams for an ideal extension ecosystem in the future. Tentative speaker lineup (will post updates as the schedule fills in):

Join the meetup for details. These sessions will be recorded and posted to the PGConf.dev YouTube channel, and we’ll again have detailed transcripts. Many thanks to my co-organizers Floor Drees and Yurii Rashkovskii, as well as the PGConf.dev organizers for making this all happen!

PGConf & Extension Ecosystem Summit EU 2024

The PGConf 2024 logo

Last week I MCed the first Extension Ecosystem Summit EU and attended my first PGConf EU, in Athens, Greece. Despite my former career as an archaeologist — with a focus on Mediterranean cultures, no less! — this was my first visit to Greece. My favorite moment was the evening after the Summit, when I cut out of a networking shindig to walk to Pláka and then circumnavigate the Acropolis. I mean just look at this place!

Nighttime photo of the Acropolis of Athens

The Acropolis of Athens on the evening of October 22, 2024. © 2024 David E. Wheeler

Highlight of the trip for sure. But the Summit and conference were terrific, as well.

Extension Ecosystem Summit

Floor Drees kindly organized The Extension Ecosystem Summit EU, the follow-up to the PGConf.dev original. While the Vancouver Summit focused on developers, we tailored this iteration to users. I started the gathering with a condensed version of my POSETTE talk, “State of the Postgres Extension Ecosystem”, but updated with a Trunk OCI Distribution demo. Links:

We then moved into a lightning round of 10 minute introductions to a variety of extensions:

Quite the whirlwind! There followed open discussion, in which each maintainer went to a corner to talk to attendees about contributing to their extensions. Details to come in a more thorough writeup on the Tembo blog, but I personally enjoyed some fascinating discussions about extension distribution challenges.

PGConf.eu

Following the Summit, I attended several thought-provoking and provocative presentations at PGConf.eu, which took place at the same hotel, conveniently enough.

Floor Drees speaking at a podium, next to a slide reading “Why Postgres?”

Floor Drees speaking at PGConf.eu 2024. © 2024 David E. Wheeler

There were many more talks, but clearly I tend to be drawn to the most technical, core-oriented topics. And also archaeology.

Museums

Speaking of which, I made time to visit two museums while in Athens. First up was the National Archaeological Museum of Athens, where I was delighted to explore the biggest collection of Mycenaean artifacts I’ve ever seen, including massive collections from the excavations of Heinrich Schliemann. So much great Bronze Age stuff here. I mean, just look at this absolute unit:

Photo of a Mycenaean Krater featuring a horse-drawn chariot

From the museum description: “Fragment of a krater depicting a chariot with two occupants. A male figure holding a staff walks in front of the chariot. Much of the Mycenaean Pictorial Style pottery (14th-12th centuries BC) with representations of humans, chariots, horses and bulls on large kraters, was produced at Berbati in the Argolid and exported to Cyprus, where it was widely imitated. Birds, fish, wild goats or imaginary creatures (i.e. sphinxes) occur on other types of vessels, such as jugs and stirrup jars. Usually only fragments of these vases survive in mainland Greece from settlement contexts. In Cyprus, however, complete vases are preserved, placed as grave gifts in tombs.” © Photo 2024 David E. Wheeler

The animal decorations on Mycenaean and Akrotiri pottery are simply delightful. I also enjoyed the Hellenistic stuff, and seeing the famed Antikythera Mechanism filled my nerd heart with joy. A good 3 hours poking around; I’ll have to go back and spend a few days there sometime. Thanks to my pal Evan Stanton for gamely wandering around this fantastic museum with me.

Immediately after the PGConf.eu closing session, I dashed off to the Acropolis Museum, which stays open till 10 on Fridays. Built in 2009, this modern concrete-and-glass building exhibits several millennia of artifacts and sculpture exclusively excavated from the Acropolis or preserved from its building façades. No photography allowed, alas, but I snapped this photo looking out on the Acropolis from the top floor.

Photo of the Acropolis as viewed from inside the Acropolis Museum.

The Acropolis as viewed from inside the Acropolis Museum. Friezes preserved from the Parthenon inside the museum reflect in the glass, as does, yes, your humble photographer. © 2024 David E. Wheeler

I was struck by the beauty and effectiveness of the displays. It easily puts the lie to the assertion that the Elgin Marbles must remain in the British Museum to protect them. I saw quite a few references to the stolen sculptures, particularly empty spots and artfully sloppy casts from the originals, but the building itself makes the strongest case that the marbles should be returned.

But even without them there remains a ton of beautiful sculpture to see. Highly recommended!

Back to Work

Now that my sojourn in Athens has ended, I’m afraid I must return to work. I mean, the event was work, too; I talked to a slew of people about a number of projects in flight. More on those soon.

⛰️ Postgres Ecosystem Summit EU

Given the success of the Extension Ecosystem Summit at PGConf.dev back in May, my colleague Floor Drees has organized a sequel, the Extension Ecosystem Summit EU on Tuesday, October 22, at the Divani Caravel Hotel in Athens. That’s “Day 0” at the same hotel as PGConf.eu. Tembo, Percona, Xata, and Timescale co-sponsor.

While the May event took the form of an open-space technology (OST)-style unconference aimed at extension developers, the EU event aims to inform an audience of Postgres users about the history and some exemplary use cases for extensions. From the invite:

Join us for a gathering to explore the current state and future of Postgres extension development, packaging, and distribution. Bring your skills and your devices and start contributing to tooling underpinning many large Postgres installations.

  • Jimmy Angelakos - pg_statviz: pg_statviz is a minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.
  • Adam Hendel (Tembo) - pgmq: pgmq is a lightweight message queue. Like AWS SQS and RSMQ but on Postgres. Adam is pgmq’s maintainer since 2023, and will present a journey from pure Rust → pgrx → pl/pgsql.
  • Alastair Turner (Percona) - pg_tde: pg_tde offers transparent encryption of table contents at rest, through a Table Access Method extension. Percona has developed pg_tde to deliver the benefits of encryption at rest without requiring intrusive changes to the Postgres core.
  • Gülçin Yıldırım Jelínek (Xata) - pgzx: pgzx is a library for developing PostgreSQL extensions written in Zig.
  • Mats Kindahl (Timescale) - TimescaleDB (C), pgvectorscale (Rust) and pgai (Python): maintaining extensions written in different languages.

I will also deliver the opening remarks, including a brief history of Postgres extensibility. Please join us if you’re in the area or planning to attend PGConf.eu. See you there!

PGConf.dev 2024

In addition to the afore-blogged Extension Summit, I also attended a slew of the regular PGConf.dev sessions, gave a talk on the future of the extension ecosystem, socialized with extension authors and core developers, and joined discussions in a number of unconference sessions. Some notes on selected talks and events:

Sessions

I enjoyed The road to new SQL/JSON features, where Álvaro Herrera gave a brief history of SQL/JSON in Postgres, starting with the JSON type in 9.2 (2012), JSONB in 2014, and SQL standard jsonpath in Postgres 12 (2019). Getting the SQL/JSON syntax finished turned out to be substantially more difficult, thanks to parsing issues. It took many attempts and a couple of reversions before most of the functionality was completed last year and included in Postgres 16. The forthcoming Postgres 17 finishes the work, with the standard fully supported except for “the JSON_TABLE plan param and json simplified accessor.”

It’s a great time to use Postgres for JSON object storage and management.
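
For a quick, hedged taste (my own illustration, not from the talk): jsonpath queries have been available since Postgres 12, and Postgres 17 adds the standard JSON_TABLE for turning a document into rows.

-- jsonpath filter (Postgres 12+): return the array elements greater than 3
SELECT jsonb_path_query('{"qty": [1, 5, 9]}'::jsonb, '$.qty[*] ? (@ > 3)');

-- JSON_TABLE (Postgres 17): project a JSON array into a relational rowset
SELECT *
  FROM JSON_TABLE(
         '[{"name": "widget", "qty": 2}, {"name": "gadget", "qty": 5}]'::jsonb,
         '$[*]' COLUMNS (name text PATH '$.name', qty int PATH '$.qty')
       );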

In Anarchy in the Database, subtitled “A Survey and Evaluation of Database Management System Extensibility”, Abigale Kim described her Master’s thesis work investigating Postgres extension incompatibilities. Installing and running tests for pairs of extensions, she found a number of conflicts and issues, such as a bug when Citus was paired with auto_explain (fixed in May). In all, 17% of pairs failed! Abi also found that 19% of extensions contain code copied from the Postgres core; pageinspect is 75% copied code!

Abi advocates for adding an extension manager into core, with well-defined hooks to manage extension load order and to streamline enabling and disabling extensions. Very interesting research, highlighting the need to think more deeply about how best to enable and empower the extension ecosystem.

Jeff Davis and Jeremy Schneider gave a thorough overview of Collations from A to Z. The problem rose to wide attention about six years ago when a libc upgrade changed a collation, leading to data loss, crashes, and duplicate primary keys. Ideally, sort orders would never change. But humans gotta human, language will evolve, and the order of things will need to be updated. In such situations, one must be aware of the changes and reindex or rebuild all indexes (and replace hot standbys, which can’t be reindexed).

I very much appreciated the context, as the ongoing issue with collations and upgrades has confused me. Should application authors choose collations or should DBAs? The new builtin collation provider in PostgreSQL 17 tries to bridge the gap by supporting unchanging Unicode code-point collation ordering that’s reasonably meaningful to humans. But I also realize that, for some projects with no need for human sort ordering, the C collation is more than sufficient.
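
As a hedged illustration of how the new provider is used (my reading of the Postgres 17 docs, not an example from the talk), it ships a predefined pg_c_utf8 collation and can also back custom collations:

-- Sorts by Unicode code point, but still gets UTF-8 case mapping right:
SELECT lower('ÄBC' COLLATE "pg_c_utf8");

-- A custom collation backed by the builtin provider:
CREATE COLLATION code_point (provider = builtin, locale = 'C.UTF-8');
SELECT 'b' < 'B' COLLATE code_point;  -- false: 'B' (U+0042) sorts before 'b' (U+0062)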

In her keynote, When Hardware and Databases Collide, Margo Seltzer offered a provocation: Could PostgreSQL adopt something like CXL to scale to a virtually infinite pool of memory? Could one build a “complete fabric of CXL switches to turn an entire data center into a database”? I have no idea! It sure sounds like it could enable gigantic in-memory databases.

Tricks from in-memory databases by Andrey Borodin mostly went over my head, but each of the experiments sped things up a few percentage points. Together they might add up to something.

The Making PostgreSQL Hacking More Inclusive panel was terrific, and much-needed. I’m grateful that Amit Langote, Masahiko Sawada, and Melanie Plageman shared their experiences as up-and-coming non-white-male committers. I think the resulting discussion will help drive new inclusion initiatives in the PostgreSQL community, such as session moderator Robert Haas’s recently-announced Mentoring Program for Code Contributors.

Oh, and I gave a talk, The future of the extension ecosystem, in which I expanded on my mini-summit talk to suss out the needs of various members of the extension ecosystem (authors, users, DBAs, industry) and our plans to meet those needs in PGXN v2. Links:

Unconference

I also participated in the Friday Unconference. Abi, Yurii, and I led a discussion on Improving extensions in core. We discussed the need for an ABI stability policy, extension management, smoke testing (including for conflicts between extensions), a coalition to advocate for extensions in core (since launched as the Postgres Extension Developers Coalition), inline extensions, WASM-based extensions, and server installation immutability. Great discussions and a few actionable outcomes, some of which I’ve been working on. More soon in future posts.

In Increase Community Participation, we talked about the challenges for broadening the PostgreSQL contributor community, attracting and retaining contributors, recognizing contributions, and how to address issues of burnout and allow people to “retire”. I joined the discourse on how we could adopt or at least support GitHub workflows, such as pull requests, to encourage more patch review in a familiar environment. Personally, I’ve been creating pull requests in my fork for my patches for this very reason.

We also touched on training and mentoring new contributors (hence the mentoring announcement) and changes to postgresql.org, notably adding dedicated pages for each project governance committee, especially for the Contributors Committee (there’s a Contributors Committee?), as well as information for how to become a contributor and be listed on the contributor page.

Final Thoughts

I attended PGCon from 2009 to 2014, and always enjoyed the camaraderie in Ottawa every year. Most people went to the same pub after sessions every night (or for some part of each evening), where random connections and deep technical nerdery would continue into the small hours, both indoors and out. The Black Oak was a highlight of the conference for me, every year.

In the intervening years I got busy with non-Postgres work and scaled back my participation. I finally returned in 2023 (other than a virtual unconference in 2022), and found it much the same, although the Black Oak had closed, and now there were 2-3 pubs where people went, diluting the social pool a bit — though still a highlight.

As the new iteration of the Postgres Developer Conference, PGConf.dev is a worthy successor. Vancouver was a nice city for it, and people brought the same energy as always. I connected with far more people, and more meaningfully, than at any other conference. But other than the reception and dinner on Wednesday, there was no one (or three) place where people tended to aggregate into the wee hours. Or at least I wasn’t aware of it. The end of PGCon is bittersweet for me, but I’m happy to continue to participate in PGConf.dev.

See you next year!

🏔 Extension Ecosystem Summit 2024

Logo for PGConf.dev

The PostgreSQL Extension Ecosystem Summit took place at PGConf.dev in Vancouver on May 28, 2024 and it was great! Around 35 extension developers, users, and fans gathered for an open-space technology (OST)-style unconference. I opened with a brief presentation (slides) to introduce the Summit Theme:

  • Extension issues, designs and features
  • Development, packaging, installation, discovery, docs, etc.
  • Simplify finding, understanding, and installing
  • Towards ideal ecosystem of the future
  • For authors, packagers, DBAs, and users
  • Lots of problems, challenges, decisions
  • Which do you care about?
  • Collaborate, discover, discuss, document
  • Find answers, make decisions, set directions
  • Inform the PGXN v2 project

Before the Summit my co-organizers and I had put up large sticky notes with potential topics, and after reviewing the four principles and one law of OST, we collectively looked them over and various people offered to lead discussions. Others volunteered to take notes and later published them on the community wiki. Here’s our report.

Extension Metadata

Samay Sharma of Tembo took point on this discussion, while David Wagoner of EDB took notes. The wide-ranging discussion among the five participants covered taxonomies, versioning, system dependencies, packaging & discoverability, development & compatibility, and more.

The discoverability topic particularly engaged the participants, as they brainstormed features such as user comments & ratings, usage insights, and test reporting. They settled on the idea of two types of metadata: developer-provided metadata such as external dependencies (software packages, other extensions the extension depends on etc.) and user metadata such as ratings. I’m gratified how closely this hews to the metadata sketch’s proposed packaging (author) and registry (third party) metadata.

Binary Distribution Format

I led this session, while Andreas “ads” Scherbaum took notes. I proposed to my four colleagues an idea I’d been mulling for a couple months for an extension binary distribution format inspired by Python wheel. It simply includes pre-compiled files in subdirectories named for each pg_config directory config. The other half of the idea, inspired by an Álvaro Hernández blog post, is to distribute these packages via OCI — in other words, just like Docker images. The participants agreed it was an interesting idea to investigate.
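
To make the layout idea a bit more concrete, here’s a hedged sketch of my own (not from the session notes): the directories such a package would key its subdirectories off of are the same ones the server reports in its pg_config system view.

-- The pg_config view mirrors the pg_config CLI; a wheel-style package would
-- ship one subdirectory of pre-compiled files per relevant location.
SELECT name, setting
  FROM pg_config
 WHERE name IN ('BINDIR', 'DOCDIR', 'LIBDIR', 'PKGLIBDIR', 'SHAREDIR');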

We spent much of the rest of the time reviewing and trying to understand the inherent difficulty of upgrading binary extensions: there’s a period between when an extension package is upgraded (from Yum, Apt, etc.) and ALTER EXTENSION UPDATE updates it in the database. If the new binary doesn’t work with old versions, it will break (and potentially crash Postgres!) until they update. This can be difficult in, say, a data analytics environment with uses of the extension in multiple databases and functions, and users may not have the bandwidth to ALTER EXTENSION UPDATE any code that depends on the extension.
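
Here’s a hedged sketch of the SQL side of that window (my own illustration, using pgtap as a stand-in extension): after the OS package upgrade, the catalog shows which extensions in each database still run the old SQL-level version until someone updates them.

-- Extensions whose installed version lags the newly packaged default version:
SELECT name, installed_version, default_version
  FROM pg_available_extensions
 WHERE installed_version IS NOT NULL
   AND installed_version IS DISTINCT FROM default_version;

-- Then, in each affected database:
ALTER EXTENSION pgtap UPDATE;  -- move the SQL objects to the new default version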

This issue is best solved by defensive coding of the C library to keep it working for new and old versions of an extension, but this complicates maintenance.

Other topics included the lack of support for multiple versions of extensions at one time (which could solve the upgrade problem), and determining the upgrade/downgrade order of versions, because the Postgres core enforces no version standard.

ABI/API discussion

Yurii Rashkovskii took point on this session while David Christensen took notes. Around 25 attendees participated. The discussion focused on issues of API and ABI compatibility in the Postgres core. Today virtually the entire code base is open for use by extension developers — anything in header files. Some recent research revealed a few potentially-incompatible changes in minor releases of Postgres, leading some to conclude that extensions must be compiled and distributed separately for every minor release. The group brainstormed improvements for this situation. Ideas included:

  • Spelunking the source to document and categorize APIs for extensions
  • Documenting color-coded safety classifications for APIs: green, yellow, or red
  • Designing and providing a better way to register and call hooks (observability, administration, isolation, etc.), rather than the simple functions Postgres offers today
  • Developing a test farm to regularly build and test extensions, especially ahead of a core release
  • And of course creating more hooks, such as custom relation type handling, per-database background workers, a generic node visitor pattern, and better dependency handling

Including/Excluding Extensions in Core

Keith Fiske led the discussion and took notes for this session, along with 10-15 or so attendees. It joined two topics: when should an extension be brought into core, and when should a contrib extension be removed from core? The central point was the adoption of new features in core that replace the functionality of, and therefore reduce the need for, some extensions.

Replacing an extension with core functionality simplifies things for users. However, the existence of an extension might prevent core from ever adding its features. Extensions can undergo faster, independent development cycles without burdening the committers with more code to maintain. This independence encourages more people to develop extensions, and potentially compels core to better support extensions overall (e.g., through better APIs/ABIs).

Contrib extensions currently serve, in part, to ensure that the extension infrastructure itself is regularly tested. Replacing them with core features would reduce the test coverage, although one participant proposed a patch to add such tests to core itself, rather than as part of contrib extensions.

The participants collaborated on a list of contrib extensions to consider merging into core:

  • amcheck
  • pageinspect
  • pg_buffercache
  • pg_freespacemap
  • pg_visibility
  • pg_walinspect
  • pgstattuple

They also suggested moving extension metadata (SQL scripts and control files) from disk to catalogs and adding support for installing and using multiple versions of an extension at one time (complicated by shared libraries), perhaps by the adoption of more explicit extension namespacing.

Potential core changes for extensions, namespaces, etc.

Yurii Rashkovskii and David Christensen teamed up on this session, as well (notes). 15-20 attendees brainstormed core changes to improve extension development and management. These included:

  • File organization/layout, such as putting all the files for an extension in a single directory and moving some files to the system catalog.
  • Provide a registry of “safe” extensions that can be installed without a superuser.
  • Adding a GUC to configure a second directory for extensions, to enable immutable Postgres images (e.g., Docker, Postgres.app). The attendees consider this a short-term fix, but still useful. (Related: I started a pgsql-hackers thread in April for a patch to do just this.)
  • The ability to support multiple versions of an extension at once, via namespacing, came up in this session, as well.
  • Participants also expressed a desire to support duplicate names through deeper namespacing. Fundamentally, the problem of namespace collision redounds to issues with un-relocatable extensions.

Until Next Time

I found it interesting how many topics cropped up multiple times in separate sessions. By my reading, the most-cited topics were:

  • The need to install and use multiple versions of an extension
  • A desire for deeper namespacing, in part to allow for multiple versions of an extension
  • A pretty strong desire for an ABI compatibility policy and clearer understanding of extension-friendly APIs

I expect to put some time into these topics; indeed, I’ve already started a Hackers thread proposing an ABI policy.

I greatly enjoyed the discussions and attention given to a variety of extension-related topics at the Summit. So much enthusiasm and intelligence in one place just makes my day!

I’m thinking maybe we should plan to do it again next year. What do you think? Join the #extensions channel on the Postgres Slack with your ideas!

Mini Summit Six

Last week, a few members of the community got together for the sixth and final Postgres Extension Ecosystem Mini-Summit. Follow these links for the video and slides:

Or suffer through my interpolation of YouTube’s auto-generated transcript, interspersed with chat activity, if you are so inclined.

Introduction

  • I opened the meeting, welcomed everyone, and introduced myself as host. I explained that today I’d give a brief presentation on the list of issues I’ve dreamed up and jotted down over the last couple of mini-summits as potential topics to take on at the Summit in Vancouver on May 28th.

Presentation

  • These are things that I’ve written down as I’ve been thinking through the whole architecture myself, but also that have come up in these summits. I’m thinking that we could get some sense of the topics that we want to actually cover at the summit. There is room for about 45 people, and I assume we’ll break up “unconference style” into four or five working groups. People can move to corners, hallways, or outdoors to discuss specific topics.

  • Recall that in the first mini-summit I showed a list of potential topics that might come up as we think through the issues in the ecosystem. I left off with the prompt “What’s important to you?” We hope to surface the most important issues to address at the summit and create a hierarchy. To that end, I’ve created this Canva board1 following Open Space Technology2 to set things up, with the rules and an explanation for how it works.

  • I expect one of us (organizers) to give a brief introduction at the start of the summit to outline the principles of Open Space Technology, which are similar to unconferences.

  • Open Space Technology principles are:

    • Whoever comes are the right people
    • Whatever happens is the only thing that could happen
    • Whenever it starts is the right time (but we start at 2 p.m. and we have only three hours, so we’ll try to make the best of it)
    • When it’s over it’s over
    • And wherever it happens is the right place
  • There is also a “Law of Mobility”. If you start out attending a session or discussion about one topic, and you decide you want to do something else, you can wander over to another session. Open Space Technology calls these people “bumblebees” who cross-pollinate between topics. “Butterflies” are the people who hover around a particular topic to make it happen.

  • And “Come to be Surprised” about what will come up.

  • I’ve split potential topics into Post-its. We might have four or five spaces. Pick a space, pick a session; we have two hour-long sessions. I assume we’ll have 15-30 minutes to open the Summit, do intros, and split up the sessions; then have people do an hour on one topic and an hour on a second topic. At the end, we’ll do the readout in which we talk about decisions we came to.

  • If you’re interested in facilitating any of these topics, simply drag it in and stick your name on it.

  • First I thought I’d briefly go over the list of topics as I’ve imagined them. I posted the list on Slack a couple weeks ago and added to it as things have come up in the discussions. But I want to give a high level view of what these brief descriptions mean.

  • This is ad-hoc; I don’t have anything super planned. Please feel free to jump in at any time! I think I’ve turned on “talking permitted” for everybody, or stick your hand up and we’ll be glad to figure out other stuff, especially if you’re thinking of other topics or related things, or if you think things should be merged.

  • Any questions or thoughts or comments?

  • I put the topics in broad categories. There’s some crossover, but the first one I think of is metadata. I’ve thought about metadata a fair bit, and drafted an RFC for the kinds of things to put in an updated metadata standard, like:

    • How do you specify third-party dependencies? For example, PostGIS depends on additional libraries; how can those be specified in an ideally platform neutral way within the metadata?

    • How to specify the different types of extensions there are? Stephen wrote a blog post last year about this: you have CREATE EXTENSION extensions, LOAD command extensions, background workers, applications, and more. You have things that need shared_preload_libraries and things that don’t. How do we describe those things about an extension within a distribution package?

    • Taxonomies have come up a few times. PGXN currently allows extension authors to put an arbitrary number of tags into their META.json file. Maybe in part because of the precedent of the stuff that I released early on, people mostly put stuff in there to describe it, like “fdw”, or “function”, or “JSON”. Some of the newer binary distribution packaging systems, in particular Trunk, have a curated list of categories that they assign. So there might be different ways we want to classify stuff.

      Another approach is crates.io, which has a canonical list of categories (or “slugs”) that authors can assign. These are handy; they group things together in a more useful way, like “these are related to data analytics” or “these are related to vector search” — as opposed to the descriptive tags PGXN has now. So, what ought that to look like? What kind of controls should we have? And who might want to use it?

    • How would we specify system requirements? For example, “this package requires only a subset of OSes”, or the version of an OS, or the version of Postgres, or CPU features. Steven’s mentioned vector-based ones a few times, but there are also things like encryption instructions provided by most chips. Or the CPU architecture, like “this supports aarch64 but not amd64.” How should we specify that?

    • I covered categorization under taxonomies

    • Versioning. I blogged about this a couple months ago. I’m reasonably sure we should just stick to SemVer, but it’s worth bringing up.

  • Thoughts on metadata, or stuff I’ve left out? This is in addition to the stuff that’s in the META.json spec. It leaves room for overlap with core stuff. How do we create one sort of metadata for everything, that might subsume the control file as well as the metadata spec or trunk.toml?

    • Jeremy S in chat: So far this is seeming like a good recap of ground that’s been covered, questions & topics that have been raised. Great to see how broad it’s been
  • The next category is the source registry. This is thinking through how we should evolve the PGXN root registry for distributing extension source code. There are questions like identity, namespacing, and uniqueness.

    • These are broad categories, but identity is how you identify yourself to the system and claim ownership over something.

    • What sort of namespacing should we use? Most systems, including PGXN, just use an arbitrary string, and you own that string from your first release. But other registries, like Go, allow you to use domain-based namespacing for packages. This is really nice because it allows a lot more flexibility, such as the ability to switch between different versions or forks.

    • Then there’s the level of uniqueness of the namespacing. This is kind of an open question. Another approach I thought of is that, rather than the string that names your extension distribution being unique, it could be your username plus the string. That makes it easier when somebody abandons something and somebody else forks it under a new username. Then maybe people can switch more easily. It would account for and handle that sort of evolution in a way that single-string uniqueness makes trickier.

    • Distributed versus centralized publishing. I’ve written about this a couple times. I am quite attracted to the Go model, where packages are not centrally distributed but live in three or four supported version control systems, and as long as they use SemVer and appropriate tags, anybody can use them. The centralized index just indexes a package release the first time it’s pulled. This is where host names come into play as part of the namespacing. It allows the system to be much more distributed. Now Go caches all of them in a number of different regions, so when you download stuff it goes through the Go infrastructure. When you say “give me the XYZ package,” it’ll generally give you the cached version, but will fall back on the repositories as well. So there’s still the centralized stuff.

      I think there’s a lot to that, and it goes along with the namespacing issue. But there are other ideas at play as well. For example, almost all the other source code distribution systems just use a centralized system: crates.io, CPAN, npm, and all the rest.

      And maybe there are other questions to consider, like is there some sort of protocol we should adopt as an abstraction, such as Docker, where Docker is not a centralized repository other than hub.docker.com. Anyone can create a new Docker repository, give it a host name, and then it becomes something that anybody can pull from. It’s much more distributed. So there are a number of ideas to think through.

    • Binary packaging and distribution patterns. I have a separate slide that goes into more detail, but there are implications for source code distribution, particularly with the metadata but perhaps other things. We also might want to think through how it might vary from source distribution.

    • Federated distribution gets at the Docker idea, or the OCI idea that Alvaro proposed a few weeks ago. Stuff like that.

    • What services and tools to improve or build. This goes to the fundamental question of why we’ve had all these new packaging systems pop up in the last year or so. People were saying “there are problems that aren’t solved by PGXN.” How do we as a community collectively decide what the important bits are and what we should build and provide? Features include developer tools, command-line clients, search & browse, and discovery.

    • Stats, reports, and badging. This is another fundamental problem that some of the emerging registries have tried to address: How do you find something? How do you know if it’s any good? How do you know who’s responsible for it? How do you know whether there’s some consensus across the community to use it? The topic, then, is what sort of additional metadata could we provide at the registry level to include some hint about these issues. For example, a system to regularly fetch stars and statistical analysis of a GitHub or a Bitbucket project. Or people have wanted review sites or the ability to comment on systems.

      There’s also badging, in particular for build and test matrices for extensions, which would encourage people to better support broad arrays of Postgres versions and platforms. There could be badges for that, so you can see how well an extension supports various platforms. And any other sort of badging, like quality badging. The idea is to brainstorm what sorts of things might be useful there, and what might be best to build first, the low-hanging fruit.

  • Any questions, comments, thoughts, or additional suggestions on the root registry?

Interlude

  • Steven Miller: So the idea is there are topics on the left and then they get lined up into the schedule? So there are five different rooms, so horizontally aligned items are at the same time?

  • David Wheeler (he/him): Correct. These are session one and these are session two.

  • Jeremy S: I was kind of waiting to jump to that. It seemed like you were just doing a review of all the topics we’ve covered, but I was waiting until till you got through everything to bring that up.

  • Steven Miller: Oh yeah, good call, good call.

  • Jeremy S: I have the same kind of question/concern. This is a great list of topics, now what do we want to do with the time in Vancouver? David, do you think we need to go through everything on the list? How do you want to spend the time today?

  • David Wheeler (he/him): I was trying to do a quick review just so people knew what these words mean. If you all feel like you have a good idea, or you want to add topics of your own, please do!

  • Jeremy S: Like I commented in the chat, it’s amazing to see how much ground we’ve covered, and it’s good to have a quick recap. It’s 9:22 right now Pacific time — 22 after the hour wherever you are — I just want to make sure we don’t run out of time going through everything.

  • David Wheeler (he/him): I agree, I’ll make it work. I can speed up a little. I know I can be verbose about some of this stuff.

  • David G. Johnston: Unless the ones from India, in which case they have a half-hour time zone.

  • David Wheeler (he/him): I was gonna say! [Laughs]

Presentation Continues

  • Binary packaging. This is the problem that PGXMan and trunk have tried to solve with varying degrees of success. I think it’d be worthwhile for us to think through as a community what, ideally, should a community-provided binary packaging system look like?

    • And what’s the format? Do we want to do tarballs, do OCI like Alvaro proposed? Do we want something like RPM or Apt or Python wheels? That’s actually something I’m super interested to get into. There was a question that came up two weeks ago in Yurii’s presentation; I think Daniele suggested that the Python wheel package format allows you to put dynamic libs into the wheel. That’s pretty interesting and worth looking into as well.

    • How do we go about building a community-based binary packaging registry? How do we do the build farming, what platforms and architectures and OSes would it support, and what sort of security, trust, and verification? And the centralization: who runs it, who’s responsible for it, how should it work at a high level?

    • Philippe Noël in chat: Phil from ParadeDB here (pg_search, pg_analytics, pg_lakehouse) — First minisummit I can attend, glad to be here

  • Thanks for coming, Philippe! Again, interrupt me anytime.

  • The next topic is developer tooling. Developer tooling today is kind of all over the place. There’s a PGXN client, there’s the PGXN utils client (which doesn’t compile anymore, as far as I can tell), there’s the pgrx stuff, and maybe a few other things. What sorts of tools would be useful for developers who actually develop and build extensions?

    • CLIs and APIs can do metadata management, or scaffolding your source code and adding new features through some sort of templating system.

    • The packaging and publishing system, based on how we ultimately elect to distribute source code and how we ultimately elect to distribute binary code. How does that get packaged up, with the namespacing and all the decisions we made, to be as easy as possible for the developer?

    • What build pipelines do we support? Today PGXS and pgrx are perhaps the most common, but I’ve seen GNU autoconf configure stuff and stuff that uses Rust or Go or Python-based builds. Do we want to support those? And how do we integrate them with our binary packaging format and where Postgres expects to put stuff?

      I think this is an important topic. One of the things I’ve been dealing with as I’ve talked to the people behind Apache Age and a couple other projects is how they put stuff in /usr/local by default. I suggest that it’d be better if it went where pg_config wants to put it. How do we want to go about integrating those things?

    • Tooling for CI/CD workflows to make it as easy as possible to test across a variety of platforms, Postgres versions, and those pipelines.

  • Kind of a broad community topic here. This gets to where things are hosted. There’s a Postgres identity service that does OAuth 2; is that something we want to plug into? Is there a desire for the community to provide the infrastructure for the systems, or at least for the core systems of PGXN v2? Who would support it? The people who work on the development ideally would also handle the devops, but should work closely with whoever provides the infrastructure to make sure it’s all copacetic. And that there’s a plan for when something happens. People exit the community for whatever reason; how will systems continue to be maintained? I don’t think there’s a plan today for PGXN.

  • Another topic is documentation. How do we help engineers figure out how to build extensions? Tutorials and references for all the things and all the various details. Do we end up writing a book, or do we just have very specifically-focused tutorials like, “So you want to build a foreign data wrapper; here’s a guide for that.” Or, you just need to write a background worker; here’s an example repository to clone. What should those things look like?

    • CREATE EXTENSION
    • Hooks
    • Background workers
    • CLI apps/services
    • Web apps
    • Native apps

    This also kind of covers the variety of different kinds of extensions we might want to package and distribute.

  • Then there’s all the stuff that I filed under “core,” because I think it impacts the core Postgres project and how it may need to evolve or we might want it to evolve over time. One is the second extension directory; there’s a patch pending now, but it’ll probably be deferred until Postgres 17 ships; it’s on hold while we’re in the freeze. This is a patch that Christoph Berg wrote for the Debian distribution; it allows you to have a second destination directory for your extensions where Postgres knows to find stuff, including shared object libraries. This would make it easier for projects like Postgres.app and for immutable Docker containers to mount a new directory and have all the stuff be there.

  • I would love to see some sort of more coherent idea of what an extension package looks like, where, if I install pgTAP, all of its files are in a single subdirectory that Postgres can access. Right now it’s in the pg_config locations, the sharedir and the libdir and the docdir — it’s kind of spread all over.

  • Should there be a documentation standard, in the way you have JavaDoc and rustdoc and Godoc, where docs are integrated into the source code, so it’s easy to use, and there’s tooling to build effective documentation. Today, people mostly just write short READMEs and leave it at that, which is not really sufficient for a lot of projects.

  • There’s the longstanding idea of inline extensions that Dimitri proposed back as far as 2013, something they called “units”. Oracle calls them “packages” or “modules”. Trusted Language Extensions start making a stab at the problem, trying to make something like inline extensions with the tooling we have today. How should that evolve? What sorts of ideas do we want to adapt to make it so that you don’t have to have physical access to the file system to manage your extensions? Where you could do it all over SQL or libpq.

  • How can we minimize restarts? A lot of extensions require loading DSOs in the shared_preload_libraries config, which requires a cluster restart. How can we minimize that need? There are ways to minimize restarts; it’s just a broad category I threw in.

  • What Namespacing is there? I touched on this topic when I wrote about Go Namespacing a while ago. My current assumption is, if we decided to support user/extension_string or hostname/user/extension_string namespacing for package and source distribution, Postgres itself still has to stick to a single string. How would we like to see that evolve in the future?

  • What kind of sandboxing, code signing, security, and trust could be built into the system? Part of the reason they’ve resisted having a second extension directory up to now is to have one place where everything is, where the DBA knows where things are, and it’s a lot easier to manage there. But it’s also because otherwise people will put junk in there. Are there ideas we can borrow from other projects or technologies where anything in some directory is sandboxed? And how is it sandboxed? Is it just for a single database or a single user? Do we have some sort of code signing we can build into the system so that Postgres verifies an extension when you install it? What other kinds of security and trust could we implement?

    This is a high level, future-looking topic that occurred to me, but it comes up especially when I talk to the big cloud vendors.

  • An idea I had is dynamic module loading. It came up during Jonathan’s talk, where there was a question about how one could use Rust crates in PL/Rust, a trusted language. Well, a DBA has to approve a pre-installed list of crates that’s on the file system where PL/Rust can load them. But what if there was a hook where, in PL/Perl for example, you use Thing and a hook in the Perl use command knows to look in a table that the DBA manages and can load it from there. Just a funky idea I had, a way to get away from the file system and more easily let people, through permissions, be able to load modules in a safe way.

  • A topic that came up during Yurii’s talk was binary compatibility of minor releases, or some sort of ABI stability. I’d be curious what to bring up with hackers about formalizing something there. Although it has seemed mostly pretty stable over time to me, that doesn’t mean it’s been fully stable. I’d be curious to hear about the exceptions.

So! That’s my quick review. I did the remainder of them in 11 minutes!

Discussion

  • Jeremy S: Well done.

  • David Wheeler (he/him): What I’d like to do is send an email to all the people who are registered to come to The Summit in two weeks, as well as all of you, to be able to access this board and put stars or icons or something — stickers which you can access —

  • Jeremy S: I do feel like there’s something missing from the board. I don’t know that it’s something we would have wanted to put on sooner, but I kind of feel like one of the next steps is just getting down into the trenches and looking at actual extensions, and seeing how a lot of these topics are going to apply once we start looking at the list. I was looking around a bit.

    It’s funny; I see a mailing list thread from a year or two ago where, right after Joel made his big list of 1,000 extensions, he jumped on the hackers list and said, “hey could we stick this somewhere like on the wiki?” And it looks like nobody quite got around to doing anything like that. But that’s where I was thinking about poking around, maybe starting to work on something like that.

    But I think once we start to look at some of the actual extensions, it’ll help us with a lot of these topics, kind of figure out what we’re talking about. Like when you’re trying to figure out dependencies, once you start to figure out some of the actual extensions where this is a problem and other ones where it’s not, it might help us be a lot more specific about the problem that we’re trying to solve. Or whether it’s versioning, or which platform something is going to build on, all that kind of stuff. That’s where I was thinking there’s a topic, or maybe a next step, that’s missing. Or you were talking about how many extensions even build today: if you go through the extensions on PGXN right now, how many of them even work at all? So, starting to work down that list.

  • David Wheeler (he/him): So, two thoughts on that. One is: create a sticky with the topic you want and stick it in a place that’s appropriate, or create another category if you think that’s relevant.

  • Jeremy S: It’s kind of weird, because what I would envision is what I want to do on the wiki — I’ll see if I can start this off today; I have rights to make a Postgres wiki page — is to make a list of extensions, like a table, where down the left are the extensions and across the top is where each extension is distributed today. So, just extensions that are already distributed in multiple places. I’m not talking about the stuff that’s in core, because it’s a given that that’s everywhere. But something like pg_cron or PGAudit: anybody who has extensions probably has them. That gives some sense of the extensions that everybody already packages. Those are obviously really important extensions, because everybody is including them.

    And then the next thing I wanted to do was the same thing, with the list of those extensions on the left but a column for each of the categories you have here. For, say, PGAudit, for stuff across the top — metadata, registry, packaging, developer stuff — for PGAudit, are there packaging concerns? For PGAudit, go down the list of registry topics like identity: where is the source for PGAudit? Is the definitive upstream GitLab, is it GitHub, is it git.postgresql.org? I could go right down the list of each of these topics for PGAudit, and then go down the list of all of your topics for pg_hint_plan. That’s another big one; pg_hint_plan is all over the place. Each of your topics I could take and apply to each of the top 10 extensions, and there might be different things that rise to the surface for pg_hint_plan than there are for, like, pgvector.

  • David Wheeler (he/him): That sounds like a worthwhile project to me, and it could be a useful reference for any of these topics. Also a lot of work!

  • Jeremy S: Well, another way to think about Vancouver might be, instead of splitting people up by these topics — I’m spitballing here, this might be a terrible idea — but you could take a list of, like, 20 or 30 important extensions, split people up into groups, and say, “here’s five extensions for you, now cover all these topics for your five extensions.” You might have one group that’s looking at pg_hint_plan and pgvector and PGAudit, and then a different group that has pg_cron and whatever else we come up with. That’s just another way you could slice it up.

  • David Wheeler (he/him): Yeah, I think you’re thinking about it the inverse of the way I’ve been thinking of it. I guess mine is perhaps a little more centralized and top-down, and that comes from having worked on PGXN in the past and thinking about what we’d like to build in the future. But there’s no reason it couldn’t be bottom-up from those things. I will say, when I was working on the metadata RFC, I did work through an example of some actually really fussy extension — I don’t remember which one it was — or no, I think it was the ML extension.3 I think that could be a really useful exercise.

    But the idea of Open Space Technology is that you can create a sticky, make a pitch for it, and have people vote by putting a star or something on them. I’m hoping that (a) we can try to figure out which ones we feel are the most important, but ultimately anybody can grab one of these, say “I want to own this, I’m putting it in session one,” and put their name on it. They can be anything, for the most part.

  • Jeremy S: Sure. I think I don’t totally grok the Canva board and how that all maps out, but at the end of the day whatever you say we’re doing in Vancouver I’m behind it 100%.

  • David Wheeler (he/him): I’m trying to make it as open as possible. If there’s something you want to talk about, make a sticky.

  • Jeremy S: I’ll add a little box. I’m not sure how this maps to what you want to do with the time in Vancouver.

  • David Wheeler (he/him): Hopefully this will answer the question. First we’ll do an intro and welcome and talk about the topics, and give people time to look at them; I want to send it in advance so people can have a sense of it beforehand. I know the way they do the Postgres unconference that’s been the last day of PGCon for years: they have people come and put a sticky or star or some sort of sticker on the topics they like, and then they pick the ones that have the most, and those are the ones they line up in here [the agenda]. But the idea of the Open Space Technology stuff is a person can decide on whatever topic they want, they can create their sticky, they can put it in the slot they want and whatever space they want, and —

  • Jeremy S: Ooooh, I think I get it now. Okay, I didn’t realize that’s what you were doing with the Canva board. Now I get it.

  • David Wheeler (he/him): Yeah, I was trying to more or less do an unconference thing, but because we only have three hours, try to have a solid idea of the topics we want to address before we get there.

  • Jeremy S: I don’t know though. Are you hoping a whole bunch of people are going to come in here and put it — okay, it took me five or ten minutes to even realize what you were doing, and I don’t have high hopes that we’ll get 20 people to come in and vote on the Post-it notes in the next seven days.

  • David Wheeler (he/him): Yeah, maybe we need to… These instructions here are meant to help people understand that and if that needs to be tweaked, let’s do it.

    • David G. Johnston in chat: How many people are going to in this summit in Vancouver?
    • David G. Johnston in chat: Is the output of a session just discussions or are action items desired?
  • Steven Miller: I have another question. Are people invited to present at the Summit if they’re not physically present at the Summit? And then the same question for viewership.

  • David Wheeler (he/him): I don’t think they are providing remote stuff at the Summit

  • Steven Miller: okay

  • David Wheeler (he/him): David, last I heard there were 42 people registered. I think we have space for 45. We can maybe get up to 50 with some standing room, and there’s a surprisingly large number of people (laughs).

    • David G. Johnston in chat: So average of 10 in each space?
  • Jeremy S: Have you gone down the list of names and started to figure out who all these people are? Cuz that’s another thing. There might be people who have very little background and just thought “this sounds like an interesting topic.” How those people would contribute and participate would be very different from someone who’s been working with extensions for a long time.

  • David Wheeler (he/him): David, yeah, and we can add more spaces or whatever if it makes sense, or people can just arbitrarily go to a corner. Because it’s an unconference they can elect to do whatever interests them. I’m just hoping to have like the top six things we think are most important to get to ahead of time.

    Jeremy, Melanie sent me the list of participants, and I recognized perhaps a quarter of the names as people who’re pretty involved in the community, and the rest I don’t know at all. So I think it’s going to be all over the map.

  • Steven Miller: So would it work if somebody wanted to do a presentation, they can. They grab stickies from the left and then you could also duplicate stickies because maybe there’d be some overlap, and then you put them in a session. But there’s basically supposed to be only one name per field, and that’s who’s presenting.

  • David Wheeler (he/him): You can put however many names on it as you want. Open technology usually says there’s one person who’s facilitating and another person should take notes.

  • Steven Miller: Okay.

  • David Wheeler (he/him): But whatever works! The way I’m imagining it is, people say, “Okay, I want to talk to other people about, and make some decisions about, I don’t know, documentation standards.” So they go off to a corner and they talk about it for an hour. There are some notes. And in the final half hour we’ll have readouts from those, from whatever was talked about there.

  • Steven Miller: These are small working sessions really; it’s not like a conference presentation. Okay, got it.

  • David Wheeler (he/him): Yeah. I mean, somebody might come prepared with a brief presentation if they want to set the context. [Laughs] Which is what I was trying to do for the overall thing here. But the idea is these are working sessions, like “here’s the thing we want to look at,” and we want to have some recommendations, or figure out the parameters, or have a plan — maybe — at the end of it. My ideal, personally, is that at the end of this we have a good idea of what the most important topics are to address earlier on in the process of building out the ecosystem of the future, so we can start planning for how to execute on that from those proposals and decisions. That’s how I’m thinking about it.

  • Steven Miller: Okay, yeah I see.

  • Jeremy S: This sounds a lot like the CoffeeOps meetups that I’ve been to. They have a similar process where you use physical Post-it notes and vote on topics and then everybody drops off into groups based on what they’re interested in.

  • David Wheeler (he/him): Yeah it’s probably the same thing, the Open Technology stuff.

  • Steven Miller: Maybe we should do one field so we kind of get an idea?

  • David Wheeler (he/him): Sure. Let’s say somebody comes along and there are a bunch of stickers on this one [drops stickers on the sticky labeled “Identity, namespacing, and uniqueness”]. So we know that it’s something people really want to talk about. So if somebody will take ownership of it, they can control-click, select “add your name”, find a slot that makes sense (and we may not use all of these) and drag it there. So “I’m going to take the first session to talk about this.” Then people can put the stickies on it over here [pastes stickers onto the topic sticky in the agenda], so you have some sense of how many people are interested in attending and talking about that topic. But there are no hard and fast rules.

    Whether or not they do that, say, “David wants to talk about identity, namespacing, and uniqueness in the core registry,” we’re going to do that in the first session. We’ll be in the northeast corner of the room — I’m going to try to get access to the room earlier in the day so I can have some idea of how it breaks up, and I’ll tweak the Canva to add stuff as appropriate.

    • David G. Johnston in chat: Same thing multiple times so people don’t miss out on joining their #2 option?
    • David G. Johnston in chat: How about #1, #2, #3 as labels instead of just one per person?
  • Jeremy S: Are you wanting us to put Post-it notes on the agenda now, before we know what’s been voted for?

  • David Wheeler (he/him): Yep! Especially if there’s some idea you had, Jeremy. If there’s stuff you feel is missing or would be a different approach, stick it in here. It may well be that not that many people are interested in what I’ve come up with, but they want to talk about those five extensions.

  • David Wheeler (he/him): (Reading comment from David Johnston): “One, two, and three as labels instead of just one per person?” David, I’m sorry, I don’t follow.

  • David G. Johnston: So basically like ranked choice. If you’re gonna do core one time and binary packaging one time, and they’re running at the same time, well, I want to do both. I want to do core — that’s my first choice — I want to do binary packaging — that’s my second choice. If I had to choose, I’d go to number one. But if you have enough people saying “I want to see this, that’s my number two option,” you run binary packaging twice, not conflicting with core, so you can get more people.

  • David Wheeler (he/him): I see, have people stick numbers on the topics that most interest them. Let’s see here… [pokes around the Canva UX, finds stickers with numbers.] There we go. I’ll stick those somewhere that’s reasonable so people can rank them if they want, their top choices.

    This is all going to be super arbitrary and unscientific. The way I’ve seen it happen before is people just drop stars on stuff and say, okay this one has four and this one has eight so we definitely want to talk about that one, who’s going to own it, that sort of thing. I think what makes sense is to send this email to all the participants in advance; hopefully people will take a look, have some sense of it, and maybe put a few things on. Then, those of us who are organizing it and will be facilitating on the day, we should meet like a day or two before, go over it, and make some decisions about what we definitely think should be covered, what things are open, and get a little more sense of how we want to run things. Does that make sense?

  • Jeremy S: Yeah, I think chatting ahead of time would be a good idea. It’ll be interesting to see how the Canva thing goes and what happens with it.

  • David Wheeler (he/him): It might be a mess! Whatever! But the answer is that whatever happens this is the right place. Whenever it starts is the right time. Whatever happens could only happen here. It’s super arbitrary and free, and we can adapt as much as we want as it goes.

  • David Wheeler (he/him): I think that’s it. Do you all feel like you have some sense of what we want to do?

  • Jeremy S: Well not really, but that’s okay! [Laughs]

  • Steven Miller: Okay, so here’s what we are supposed to do. Are we supposed to go find people who might be interested to present — they will already be in the list of people who are going to Vancouver. Then we talk to them about these Post-its and we say, “Would you like to have a small discussion about one of these things? If you are, then put a sticky note on it.” And then we put the sticky notes in the fields, and we have a list of names associated with the sticky notes. Like, maybe Yurii is interested in binary distribution, and then maybe David is also interested in that. So there’s like three or four people in each section, and we’re trying to make sure that if you’re interested in multiple sections you get to go to everything?

  • David Wheeler (he/him): Yeah you can float and try to organize things. I put sessions in here assuming people would want to spend an hour, but maybe a topic only takes 15 minutes.

  • David G. Johnston: Staying on my earlier thought about what people want to see: for people who are willing to present and can present on multiple things, we could have a gold star for who’s willing to actually present on a topic. So here’s a topic, I’ve got eight people who want to see it but only one possible presenter. Or I’ve got five possible presenters and three possible viewers. But you have that dynamic of ranked choice for both “I’ll present stuff” and “I’m only a viewer.”

  • David Wheeler (he/him): I think that typically these things are self-organizing. Somebody says, “I want to do this, I will facilitate, and I need a note taker.” But they negotiate amongst themselves about how they want to go about doing it. I don’t think it necessarily has to be a formal presentation, and usually these things are not. Usually it’s like somebody saying, “Here’s what this means, this is the topic we’re going to try to cover, these are the decisions we want to make. Go!”

  • Jeremy S: You’re describing the unconference component of PGCon that has been done in the past.

  • David Wheeler (he/him): More or less, yes

  • Jeremy S: So should we just come out and say this is an unconference? Then everybody knows what you’re talking about really fast, right?

  • David Wheeler (he/him): Sure, sure, yeah. I mean —

  • Jeremy S: We’re just doing the same thing as – yeah.

  • David Wheeler (he/him): Yeah, I tried to capture that here, but we can use the word “unconference” for sure. [Edits the Canva to add “an unconference session” to the title.] There we go.

  • Steven Miller: I imagine there are people who might be interested to present but they just aren’t in this meeting right now. So maybe we need to go out and advertise this to people.

  • David Wheeler (he/him): Yeah, I want to draft an email to send to all the attendees. Melanie told me we can send an email to everybody who’s registered.

  • Jeremy S: And to be clear it’s full, right? Nobody new can register at this point?

  • David Wheeler (he/him): As far as I know, but I’m not sure how hard and fast the rules are. I don’t think any more people can register, but it doesn’t mean other people won’t wander in. People might have registered and then not come because they’re in the patch session or something.

    So I volunteer to draft that email today or by tomorrow and share it with the Slack channel for feedback. Especially if you’re giving me notes to clarify what things mean, because it seems like there are more questions and confusions about how it works than I anticipated — in part because it’s kind of unorganized by design [chuckles].

  • David Wheeler (he/him): Oh, that’s a good thing to include, Jeremy, that’s a good call. But to also try to maximize participation of the people who’re planning to be there. It may be that they say, “Oh, this sounds interesting,” or whatever, so I’ll add some different stickers to this for some different meanings, like “I’m interested” or “I want to take ownership of this” or “this is my first, second, third, or fourth choice”. Sound good?

  • Steven Miller: Yes, it sounds good to me!

  • David Wheeler (he/him): Thanks Steven.

  • Jeremy S: Sounds good, yeah.

  • David Wheeler (he/him): All right, great! Thanks everybody for coming!


  1. Hit the #extensions channel on the Postgres Slack for the link! ↩︎

  2. In the meeting I kept saying “open technology” but meant Open Space Technology 🤦🏻‍♂️. ↩︎

  3. But now I can look it up. It was pgml, for which I mocked up a META.json. ↩︎

Extension Summit Topic Review

Boy howdy that went fast.

This Wednesday, May 15, the final Postgres extension ecosystem mini-summit will review topics covered in previous Mini-Summits, various Planet PostgreSQL posts, the #extensions channel on the Postgres Slack and the Postgres Discord. Following a brief description of each, we’ll determine how to reduce the list to the most important topics to take on at the Extension Ecosystem Summit at PGConf.dev in Vancouver on May 28. I’ll post a summary later this week along with details for how to participate in the selection process.

In the meantime, here’s the list as of today:

  • Metadata:
    • Third-party dependencies
    • Types of extensions
    • Taxonomies
    • System requirements (OS, version, CPU, etc.)
    • Categorization
    • Versioning
  • Registry:
    • Identity, namespacing, and uniqueness
    • Distributed vs. centralized publishing
    • Binary packaging and distribution patterns
    • Federated distribution
    • Services and tools to improve or build
    • Stats, Reports, Badging: (stars, reviews, comments, build & test matrices, etc.)
  • Packaging:
    • Formats (e.g., tarball, OCI, RPM, wheel, etc.)
    • Include dynamic libs in binary packaging format? (precedent: Python wheel)
    • Build farming
    • Platforms, architectures, and OSes
    • Security, trust, and verification
  • Developer:
    • Extension developer tools
    • Improving the release process
    • Build pipelines: Supporting PGXS, pgrx, Rust, Go, Python, Ruby, Perl, and more
  • Community:
    • Community integration: identity, infrastructure, and support
    • How-Tos, tutorials, documentation for creating, maintaining, and distributing extensions
    • Docs/references for different types of extensions: CREATE EXTENSION, hooks, background workers, CLI apps/services, web apps, native apps, etc.
  • Core:
    • Second extension directory (a.k.a. variable installation location, search path)
    • Keeping all files in a single directory
    • Documentation standard
    • Inline extensions: UNITs, PACKAGEs, TLEs, etc.
    • Minimizing restarts
    • Namespacing
    • Sandboxing, code signing, security, trust
    • Dynamic module loading (e.g., use Thing in PL/Perl could try to load Thing.pm from a table of acceptable libraries maintained by the DBA)
    • Binary compatibility of minor releases and/or ABI stability

Is your favorite topic missing? Join us at the mini-summit or drop suggestions into the #extensions channel on the Postgres Slack.

Mini Summit Five

The video for Yurii Rashkovskii’s presentation at the fifth Postgres Extension Ecosystem Mini-Summit last week is up. Links:

Here’s my interpolation of YouTube’s auto-generated transcript, interspersed with chat activity.

Introduction

Presentation

  • Yurii: Today I’m going to be talking about universally buildable extensions. This is going to be a shorter presentation, but the point of it is to create some ideas, perhaps some takeaways, and actually provoke a conversation during the call. It would be really amazing to explore what others think, so without further ado…

  • I’m with Omnigres, where we’re building a lot of extensions. Often they push the envelope of what extensions are supposed to do. For example, one of our first extensions is an HTTP server that embeds a web server inside of Postgres. We had to do a lot of unconventional things. We have other extensions uniquely positioned to work both on developer machines and production machines — because we serve the developers and devops market.

  • The point of Omnigres is turning Postgres into an application runtime — or an application server — so we really care how extensions get adopted. When we think about application developers, they need to be able to use extensions while they’re developing, not just in production or on some remote server. They need extensions to work on their machine.

  • The thing is, not everybody is using Linux. Other people use macOS and Windows, and we have to account for that. There are many interesting problems associated with things like dependencies.

  • So there’s a very common approach used by those who try to orchestrate such setups and by some package managers: operating out of a container. The idea is that with a container you can create a stable environment where you bring all the dependencies that your extension would need, and you don’t have to deal with the physical reality of the host machine. Whether it’s a developer machine, CI machine, or production machine, you always have the same environment. That’s definitely a very nice property.

  • However, there are some interesting concerns that we have to be aware of when we operate out of a container. One is specifically mapping resources. When you have a container you have to map how many cores are going there, how much memory, how we map our volumes (especially on Docker Desktop), how we connect networking, how we pass environment variables.

  • That means whenever you’re running your application — especially locally, especially in development — you’re always interacting with that environment and you have to set it up. This is particularly problematic with Docker Desktop on macOS and Windows because these are not the same machines. You’re operating out of a virtual machine instead of your host machine, and obviously containers are Linux-specific, so it’s always Linux.

  • What we found is that often times it really makes a lot of sense to test extensions, especially those written in C, on multiple platforms. Because in certain cases bugs, especially critical memory-related bugs, don’t show up on one platform but show up on another. That’s a good way to catch pretty severe bugs.

  • There are also other interesting, more rare concerns. For example, you cannot access the host GPU through Docker Desktop on macOS or through Colima. If you’re building something that could use the host GPU and would work on that machine, it’s just not accessible. If you’re working on something ML-related, that can be an impediment.

  • This also makes me wonder: what are the other reasons why we’re using containers? One reason that stood out very prominently was that Postgres always has paths embedded during compile time. That makes it very difficult to ship extensions universally across different installations, different distributions. I wonder if that is one of the bigger reasons why we want to ship Postgres as a Docker container: so that we always have the same path regardless of where it’s running.

  • Any questions so far about Docker containers? Also if there’s anybody who is operating a Docker container setup — especially in their development environment — if you have any thoughts, anything to share: what are the primary reasons for you to use a Docker container in your development environment?

    • Jeremy S in chat: When you say it’s important to test on multiple platforms, do you mean in containers on multiple platforms, or directly on them?

    • Jeremy S in chat: That is - I’m curious if you’ve found issues, for example, with a container on Mac/windows that you wouldn’t have found with just container on linux

  • Daniele: Probably similarity with the production deployment environments. That’s one. Being free from whatever is installed on your laptop, because maybe I don’t feel like upgrading the system Python version and potentially breaking the entire Ubuntu, whereas in a Docker container you can have whatever version of Python, whatever version of NodeJS, or whatever other invasive type of service. I guess these are good reasons. These were the motivations that brought me to start developing directly in Docker instead of using the desktop.

  • Yurii: Especially when you go all the way to production, do you find container isolation useful to you?

  • Daniele: Yeah, I would say so; I think the problem is more to break isolation when you’re developing. So just use your editor on your desktop, reload the code, and have direct feedback in the container. So I guess you have to break one barrier or two to get there. At least from the privileged point of having Linux on the desktop there is a smoother path, because it’s not so radically different being in the container. Maybe for Windows and macOS developers it would be a different experience.

  • Yurii: Yeah, I actually wanted to drill down a little bit on this. In my experience, I build a lot on macOS, where you have to break through the isolation layers with the container itself and obviously the VM. I’ve found there are often subtle problems that make the experience way less straightforward.

  • One example I found is that, in certain cases, you’re trying to map a certain port into the container and you already have something running [on that port] on your host machine. Depending on how you map the port — whether you specify or don’t specify the address to bind on — you might not get Docker to complain that this port is actually overridden.

  • So it can be very frustrating to find that I’m trying to connect to the port but it’s not connecting to the right port. There are just very small, intricate details like this, and sometimes I’ve experienced problems like files not perfectly synchronizing into the VM — although that has gotten a little better in the past 2–3 years — but there were definitely some issues. That’s particularly important for the workflows that we’re doing at Omnigres, where you’re running this entire system — not just the database but your back end. To be able to connect to what’s running inside of the container is paramount to the experience.

  • Daniele: Can I ask a question about the setup you describe? When you go towards production, are those containers designed to be orchestrated by Kubernetes? Or is there a different environment where you have your Docker containers in a local network, I assume, so different Docker microservices talking to each other? Are you agnostic about what you run it in, or do you run it on Kubernetes or on Docker Compose or some other form of glue that you set up yourself, or your company has set up?

    • Steven Miller in chat: … container on Mac/windows [versus linux]
    • Steven Miller in chat: Have seen with chip specific optimizations like avx512
  • Yurii: Some of our users are using Docker Compose to run everything together. However, I personally don’t use Docker containers. This is part of the reason why the topic of this presentation is about universally buildable extensions. I try to make sure that all the extensions are easily compilable and easily distributable on any given supported platform. But users do use Docker Compose, it’s quite common.

  • Does anyone else here have a preference for how to move Docker containers into production or a CI environment?

  • Nobody? I’ll move on then.

    • Steven Miller in chat: Since in docker will run under emulation, but on linux will run with real hardware, so the environment has different instruction set support even though the docker —platform config is the same

    • Jeremy S in chat: That makes sense

  • Yurii: I wanted to show just a little bit of a proof of concept tool that we’ve been working on, on and off for the last year—

  • David Wheeler (he/him): Yurii, there are a couple comments and questions in chat, I don’t know if you saw that.

  • Yurii: I didn’t see that sorry.

  • Jeremy is saying, “when you say it’s important to test on multiple platforms do you mean in containers on multiple platforms or directly on them?” In that particular instance I meant on multiple platforms, directly.

  • The other message from Jeremy was, “I’m curious if you found issues, for example, with a container on Mac or Windows that you wouldn’t have found with just a container on Linux?” Yeah, I did see some issues, depending on the type of memory-related bug. Depending on the system allocator, I was either hitting a problem or not. I was not hitting it on Linux, I believe, and it was hidden on macOS. I don’t remember the details right now, unfortunately, but that difference was indicative of a bug.

  • Steven wrote, trying to connect this… “Have * seen chip-specific optimizations for containers?” And, “Docker will run under emulation but on Linux will run with real Hardware.” Yeah that’s an interesting one about ax512. I suppose this relates to the commentary about about GPU support, but this is obviously the other part of supporting specific hardware, chip-specific optimizations That’s an interesting thing to learn; I was not aware of that! Thank you Steven.

  • Let’s move on. postgres.pm is a proof of concept that I was working on for some time. The idea behind it was both ambitious but also kind of simple: Can we try describing Postgres extensions in such a way that they will be almost magically built on any supported platform?

  • The idea was to build an expert system of how to build things from a higher level definition. Here’s an example for pgvector:

    :- package(vector(Version), imports([git_tagged_revision_package(Version)])).
    git_repo("https://github.com/pgvector/pgvector").
    :- end_package.
    

    It’s really tiny! There are only two important things there: the Git tagged revision package and Git repo. There’s nothing else to describe the package.

  • The way this works is by inferring as much information as possible from what’s available. Because it’s specified as a Git-tagged revision package, it knows that it can download the list of version-shaped revisions — the versions — and it can checkout the code and do further inferences. It infers metadata from META.json if it’s available, so it will know the name of the package, the description, authors, license, and everything else included there.

    • David G. Johnston in chat: PG itself has install-check to verify that an installed instance is functioning. What are the conventions/methods that extension authors are using so that a deployed container can be tested at a low level of operation for the installed extensions prior to releasing the image to production?
  • It automatically infers the build system. For example for C extensions, if it sees that there’s a Makefile and C files, it infers that you need make and a C compiler and it tries to find those on the system: it will try to find cc, gcc, Clang — basically all kinds of things.
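
    For reference, here is an editorial sketch (not from the talk) of the kind of “Makefile plus C files” layout such inference keys on: a minimal PGXS makefile, with hypothetical names (myext):

      # Hypothetical minimal PGXS Makefile for a C extension named "myext".
      # The presence of this file plus myext.c is what signals "needs make
      # and a C compiler" to a tool like postgres.pm.
      EXTENSION = myext             # expects a myext.control file
      DATA = myext--1.0.sql         # extension install script
      MODULES = myext               # builds myext.so from myext.c
      PG_CONFIG = pg_config
      PGXS := $(shell $(PG_CONFIG) --pgxs)
      include $(PGXS)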

    • David Wheeler (he/him) in chat: Feel free to raise hands with questions

  • Here’s a slightly more involved example for pg_curl. Ah, there was a question from David Johnston. David says, “PG has install-check to verify that an installed instance is functioning. What are the conventions/methods that extension authors are using so the deployed container can be tested at a low level of operation for the installed extension prior to releasing the image to production?”

  • I guess the question is about general conventions for how extension authors ensure that the extensions work, but I suppose maybe part of this question is whether that’s also testable in a production environment. David, are you talking about the development environment alone or both?

  • David G. Johnston: Basically, the pre-release to production. You go in there in development and you’ve got an extension in source, and then you build your image where you compile it — you compile PG, you compile it, or you deploy packages. But now you have an image, but you’ve never actually tested that image. I can run installcheck on an installed instance of Postgres and know that it’s functioning, but it won’t test my extension. So if I install PostGIS, how do I test that it has been properly installed into my database prior to releasing that image into production?

    • Tobias Bussmann in chat: shouldn’t have the extension a make installcheck as well?
  • Yurii: To my knowledge there’s no absolutely universal method. Of course the PGXS methods are the most standard ones — like installcheck — to run the tests. In our [Omnigres’s] case, we replaced pg_regress with pg_yregress, another tool that we’ve developed. It allows for more structural tests, and tests certain things that pg_regress cannot test because of the way it operates.

  • I can share more about this later if that’s of interest to anybody. So we basically always run pg_yregress on our extensions; it creates a new instance of Postgres — unless told to use a pre-existing instance — and it runs all the tests there as a client. It basically deploys the extension and runs the set of tests on it.
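
    As a point of reference (an editorial sketch, not something shown in the talk), the stock PGXS installcheck path mentioned above is wired up by listing test names in the REGRESS variable; the test name here is hypothetical:

      # Hypothetical PGXS test declaration: `make installcheck` then runs
      # pg_regress against the installed server, executing sql/basic.sql
      # and diffing the output against expected/basic.out.
      REGRESS = basic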

  • David G. Johnston: Okay.

  • Yurii: I guess, you know, it depends on how you ship it. For example, if you look at the pgrx camp, they have their own tooling for that as well. I’ve also seen open-source extensions that could be written in, say, Rust, but still use pg_regress tests to test their behavior. That would often depend on how their build system is integrated with those tests. I guess the really short answer is there’s probably no absolutely universal method.

  • David, thank you for pasting the link to pg_yregress. If there are any questions about it, feel free to ask me. Any other thoughts or questions before I finish this slide? Alright, I’ll carry on then.

    :- package(pg_curl(Version), imports(git_explicit_revision_package(Version))).
    :- inherit(requires/1).
    git_repo("https://github.com/RekGRpth/pg_curl").
    git_revisions([
            '502217c': '2.1.1',
            % ... older versions omitted for now ...
        ]).
    requires(when(D := external_dependency(libcurl), version::match(D, '^7'))).
    :- end_package.
    
  • The difference between this example and the previous one is that here it specifies that there will be an explicit revision map because that project does not happen to have version tags, so they have to be done manually. You can see that in the Git revision specification. But what’s more interesting about this is that it specifies what kind of dependency it needs. In this particular instance it’s libcurl, and the version has to match version 7 — any version 7.

  • These kinds of requirements, as well as compiler dependencies, make dependencies, and others, are always solved by pluggable satisfiers. They look at what’s available depending on the platform — Linux, a particular flavor of Linux, macOS, etc. — and pick the right tools for it. In the future there’s a plan to add features like building these dependencies automatically, but right now it depends on the host system, in a multi-platform way.

    • David Wheeler (he/him) in chat: How does it detect that libcurl is required?
  • The general idea behind this proof of concept is that we want to specify high-level requirements and not how exactly to satisfy them. If you compare this to a Dockerfile, the Dockerfile generally tells you exactly what to do step by step: let’s install this package and that package, let’s copy files, etc., so it becomes a very specific set of instructions.

    • Jeremy S in chat: And how does it handle something with different names in different places?
  • There was a question: “how does it detect that libcurl is required?” There is this line at the bottom that says “requires external dependency libcurl,” so that was the definition.

  • The other question was “how does it handle something with different names in different places?” I’m not sure I understand this question.

  • Jeremy S: I can be more specific. A dependency like libc is called libc on Debian platforms and it’s called glibc on Enterprise Linux. You talked about available satisfiers like Homebrew, APT, and pkg-config, but what if it has a different name in Homebrew than in APT or something like that? Does it handle that, or is that just something you haven’t tackled yet?

  • Yurii: It doesn’t tackle this right now, but it’s part of the vision of where it should go. For certain known libraries there’s an easy way to add a mapping that will kick in for one distribution, and otherwise it will be a satisfier for another one. They’re completely pluggable, small satisfiers looking at all the predicates that describe the system underneath.

    • David G. Johnston in chat: How is the upcoming move to meson in core influencing or impacting this?
  • Just for a point of reference, this is built on top of Prolog, so it’s like a knowledge base and rules for how to apply this knowledge to particular requirements.

    • Tobias Bussmann in chat: Prolog 👍

    • Shaun Thomas in chat: What if there are no satisfiers for the install? If something isn’t in your distro’s repo, how do you know where to find the dependency? And how is precedence handled? If two satisfiers will fulfill a requirement, will the highest version win?

  • Jeremy S: I remember Devrim talking about, if you read through the [RPM] spec files, what you find is all this spaghetti code with #ifdefs and logic branches, and in his case it’s just dealing with differences between Red Hat and SUSE. If this is something that we manually put in, we kind of end up in a similar position where it’s on us to create those mappings, it’s on us to maintain those mappings over time — we kind of own it — versus being able to automate some kind of automatic resolution. I don’t know if there is a good automatic way to do it. David had found something that he posted, which I looked at a little bit, but Devrim talked about how much of a maintenance overhead it becomes in the long run to constantly have to maintain this, which seemed less than ideal.

  • Yurii: It is less than ideal. For now, I do think that would have to be manual, which is less than ideal. But it could be addressed at least on a case-by-case basis. Because we don’t really have thousands of extensions yet — in the ecosystem maybe a thousand total — I think David Wheeler would know best from his observations, and I think he mentioned some numbers in his presentation a couple of weeks ago. But basically handling this on a case-by-case basis where we need this dependency and apparently it’s a different one on a different platform, so let’s address that. But if there can be a method that can at least get us to a certain level of unambiguous resolution automatically or semi-automatically, that would be really great.

    • Samay Sharma in chat: +1 on the meson question.
  • Jeremy S: I think there’s a few more questions in the chat.

  • Yurii: I’m just looking at them now. “How is the upcoming move to meson in core influencing or impacting this?” I don’t think it’s influencing this particular part in any way that I can think of right now. David, do you have thoughts on how it could? I would love to learn.

  • David G. Johnston: No, I literally just started up a new machine yesterday and decided to build it with meson instead of make, and the syntax of the meson file seems similar to this. I was just curious if there are any influences there or if it’s just happenstance.

  • Yurii: Well, from what I can think of right now, there’s just general reliance on either an implicitly found pg_config or an explicitly specified pg_config. That’s just how you discover Postgres itself. There’s no relation to how Postgres itself was built. The packaging system does not handle, say, building Postgres itself or providing it, so it’s external to this proof of concept.

  • David G. Johnston: That’s a good separation of concerns, but there’s also the idea that, if core is doing something (we’re going to build extensions against PostgreSQL), and we’re doing things similar to how core is doing them, there’s less of a learning curve and less of everyone doing their own thing and having 500 different ways of doing testing.

  • Yurii: That’s a good point. That’s something definitely to reflect on.

  • I’ll move on to the next question, from Shaun. “What if there are no satisfiers for the install? If something isn’t in your distro, how do you know where to find the dependency?” And “if two satisfiers will fulfill a requirement, will the highest version win?” If there are no satisfiers right now it will just say it’s not solvable. So we fail to do anything. You would have to go and figure that out. It is a proof of concept; it’s not meant to be absolutely feature complete, but rather an exploration of how we can describe the packages and their requirements.

  • David Wheeler (he/him): I assume the idea is that, as you come upon these you would add more satisfiers.

  • Yurii: Right, you basically just learn. We learn about this particular need in a particular extension and develop a satisfier for it. The same applies to precedence: it’s a question of further evolution. Right now it just finds whatever is available within the specified range.

  • If there are no more pressing questions I’ll move to the next slide. I was just mentioning the problem of highly specific recipes versus high-level requirements. Now I want to shift attention to another topic that has been coming up in different conversations: whether to build and ship your extension against minor versions of Postgres.

  • Different people have different stances on this, and even package managers take different stands on it. Some say just build against the latest major version of Postgres, and others say build extensions against every single minor version. I wanted to research and see what the real answer should be: should we build against minor versions or not?

  • I’ve done a little bit of experimentation and my answer is “perhaps”, and maybe even “test against different minor versions.” In my exploration of version 16 (and also 15, but I didn’t include it) there are multiple changes between minor versions that can potentially be dangerous. One great example is when you have a new field inserted in the middle of a structure that is available through a header file. That definitely changes the layout of the structure.

     typedef struct BTScanOpaqueData
     {
    -    /* these fields are set by _bt_preprocess_keys(): */
    +    /* all fields (except arrayStarted) are set by _bt_preprocess_keys(): */
         bool            qual_ok;                /* false if qual can never be satisfied */
    +    bool            arrayStarted;     /* Started array keys, but have yet to "reach
    +                                                               * past the end" of all arrays? */
         int                     numberOfKeys;   /* number of preprocessed scan keys */
     }
    
  • In this particular case, for example, you will not get numberOfKeys where you’re expecting it. I think that change was from 16.0 to 16.1. If you build against 16.0 and then try to run on 16.1, it might not be great.

    The other concern that I found is that there are new APIs appearing in header files between different versions. Some of them are implemented in header files, either as macros or static inline functions. When you’re building against that particular version, you’ll get that particular implementation embedded.

  • Others are exports of symbols, like in this case try_index_open and contain_mutable_functions_after_planning, if you’re using any of those. But this means that these symbols are not available on some minor versions and become available later on, or vice versa: they may theoretically disappear.

  • There are also changes in inline behavior. There was a change between 16.0 and 16.1 or 16.2 where an algorithm was changed. Instead of just > 0 there’s now >= 0, and that means that particular behavior will be completely different between these implementations. This is important because it’s coming from a header file, not a source file, so you’re embedding this into your extension.

    • David Wheeler (he/him) in chat: That looks like a bug fix

  • Yeah, it is a bug fix. But what I’m saying is, if you build your extension against, say, 16.0, which did not have this bug fix, and then you deploy it on 16.1, then you still have the bug because it’s coming from the header file.

  • David Wheeler (he/him): Presumably they suggest that you build from the latest minor release and that’s backward compatible with the earlier releases.

  • Yurii: Right, and that’s a good middle ground for this particular case. But of course sometimes when you do a minor upgrade you have to remember that you have to rebuild your extensions against that minor version; you can’t just easily transfer them, yeah.

    • Jeremy S in chat: The struct change in a minor is very interesting
  • David Wheeler (he/him): Jeremy points out that the struct change is pretty interesting.

  • Yurii: Yeah, it’s interesting because it’s super dangerous! Like if somebody is expecting a different versioned structure, then it can be pretty nasty.

    • Shaun Thomas in chat: Yeah. It’s a huge no-no to insert components into the middle of a struct.
  • Jeremy S: Is that common? I’m really surprised to see that in a minor version. On the other hand, I don’t know that Postgres makes promises about — some of this seems to come down to, when you’re coding in C and you’re coding directly against structures in Postgres, that’s really interesting. That’s — I’m surprised to see that still.

    • Steven Miller in chat: In the case of trunk, we would have built against minor versions in the past then upgrade the minor version of postgres without reinstalling the binary source of the extension, so this is an issue

    • David G. Johnston in chat: Yeah, either that isn’t a public structure and someone is violating visibility (in which case yes, you should be tracking minor builds)

    • Shaun Thomas in chat: I’m extremely shocked that showed up in 16.2.

  • Yurii: Yeah, I didn’t expect that either, because that’s just a great way to have absolutely undefined behavior. Like if somebody forgot to rebuild their extension against a new minor, then this can be pretty terrible.

  • But my general answer to all of this is: unless you’re going really deep into the guts of Postgres, unless you’re doing something very deep in terms of query planning or query execution, you’re probably okay? But who knows.

    • Jason Petersen in chat: yeah it feels like there’s no stated ABI guarantee across minor updates

    • Jason Petersen in chat: other than “maybe we assume people know not to do this”

    • David Christensen in chat: yeah ABI break in minor versions seems nasty

  • Jeremy S: But it’s not just remembering to rebuild your extension. Let’s suppose somebody is just downloading their extensions from the PGDG repo, because there’s a bunch of them there. They’re not compiling anything! They’re downloading an RPM, and the extension might be in a different RPM from Postgres, and the extension RPMs — I don’t know that there have been any cases with any of the extensions in PGDG, so far, where a particular extension RPM had to have compatibility information at the level of minors.

    • Shaun Thomas in chat: There was actually a huge uproar about this a couple year ago because they broke the replication ABI by doing this.

    • David G. Johnston in chat: I see many discussions about ABI stability on -hackers so it is a goal.

    • Steven Miller in chat: PGDG is the same binaries for each minor version because the postgres package is only major version, right?

  • Yurii: Yeah, that’s definitely a concern, especially when it comes to the scenario where you don’t rebuild your extensions yourself but just get pre-built packages. It’s starting to leak out of the scope of this presentation, but I thought it was a very interesting topic to bring to everybody’s attention.

    • Jason Petersen in chat: “it’s discussed on hackers” isn’t quite the same as “there’s a COMPATIBILITY file in the repo that states a guarantee”

    • Jason Petersen in chat: (sorry)

  • My last item. Going back to how we ship extensions and why we need complex build systems and packaging. Oftentimes you want your extension to depend on some library, say OpenSSL or SQLite or whatever, and the default is to bring in the shared dependency that would come from different packages on different systems.

  • What we have found at Omnigres is that it is increasingly simpler to either statically link with your dependencies — and pay the price of larger libraries — but then you have no questions about where it comes from: what package, which version; you know exactly which version it is and how it’s getting built. But of course you also have a problem where, if you want to change the version of the dependency, it’s harder because it’s statically linked. The question is whether you should be doing that or not, depending on the authors of the extension and their promises for compatibility with particular versions of their dependencies. This one is kind of naive and simple: just use static. Sometimes it’s not possible or very difficult to do so; some libraries don’t have build systems amenable to static library production.

  • What we found works pretty nicely is using rpath in your dynamic libraries. You can use special variables — $ORIGIN or @loader_path on Linux or macOS, respectively — to specify that your dependency is literally in the same folder or directory where your extension is. So you can ship your extension with the dependencies alongside, and it will not try to load them from your system but from the same directory. We find this pretty useful.
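
    As an editorial sketch (not from the talk), assuming a PGXS build that links its module via MODULE_big and ships a hypothetical bundled library libfoo in the same directory as the extension’s shared object, the rpath trick looks roughly like this:

      # $ORIGIN (Linux/ELF) and @loader_path (macOS/Mach-O) both mean "the
      # directory containing the library being loaded", so the bundled copy
      # of libfoo sitting next to the extension .so is found before any
      # system-wide copy.
      SHLIB_LINK += -lfoo -Wl,-rpath,'$$ORIGIN'         # Linux
      # SHLIB_LINK += -lfoo -Wl,-rpath,@loader_path     # macOS

    The dynamic loader resolves those tokens at run time, which is what lets the extension and its bundled dependencies travel together in a single directory.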

  • That’s pretty much it. Just to recap, I talked about the multi-platform experience, the pros and cons of containers, inferring how you build and how you can build extensions with dependencies, static and rpath dependencies, and the problems with PG minor version differences. If anybody has thoughts, questions, or comments, I think that would be great. Thank you.

Discussion

  • David Wheeler (he/him): Thank you, Yurii, already some good discussion. What else do you all have?

  • David G. Johnston: PG doesn’t use semantic versioning. We have a major version and a minor version. The minor versions are new releases; they do change behaviors. There are goals from the hackers to not break things to the extent possible, but they don’t guarantee that this will not change between dot-three and dot-four. When you’re releasing once a year, that’s not practical if things are broken; you can’t wait nine months to fix something. Some things you need to fix in the next update and back-patch.

    • Steven Miller in chat: Thank you, this is very useful info

    • Jeremy S in chat: Dependency management is hard 🙂 it’s been a topic here for awhile

  • David G. Johnston: So we don’t have a compatibility file, but we do have goals and if they get broken there’s either a reason for it or someone just missed it. From an extension standpoint, if you want to be absolutely safe but absolutely cost intensive, you want to update every minor release: compile, test, etc. Depending on what your extension is, you can trade off some of that risk for cost savings. That’s just going to be a personal call. The systems that we build should make it easy enough to do releases every “dot” and back-patching. Then the real cost is do you spend the time testing and coding against it to make sure that the stuff works. So our tool should assume releasing extensions on every minor release, not every major release, because that’s the ideal.

    • Shaun Thomas in chat: It’s good we’re doing all of this though. It would suck to do so much work and just become another pip spaghetti.
  • Yurii: That’s exactly what I wanted to bring to everybody’s attention, because there’s still a lot of conversations about this and there was not enough clarity. So that helps a lot.

  • Jeremy S: Did you say release or did you say build with every minor? I think I would use the word “build”.

  • David G. Johnston: Every minor release, the ones that go out to the public. I mean, every commit you could update your extension if you wanted, but really the ones that matter are the ones that go public. So, 16.3 or 16.4 comes out; automation would ideally build your extension against it, run your tests, and see if anything broke. And then deploy the new [???] of your extension against version 16.3. Plus, that would be your release.

  • Jeremy S: I think there are two things there: There’s rebuilding it — because you can rebuild the same version of the extension, and that would pick up if they added a field in the middle of a struct, which is what happened between 16.0 and 16.1; rebuild the same version. Versus: the extension author… what would they be doing? They could tag a new version, but if they’re not actually changing any code, I don’t think it is a new release of the extension, because you’re not even changing anything in the extension; you’re just running a new build. It’s just a rebuild.

    • David Wheeler (he/him) in chat: It’d be a new binary release of the same version. In RPM it goes from v1.0.1-1 to v1.0.1-2

    It reminds me of what Alvaro did in his OCI blog post, where he said you really have to… Many of us don’t understand how tightly coupled the extensions need to be to the database. And these C extensions that we’re building have risks when we separate them and don’t just build everything together.

  • David G. Johnston: The change there would be metadata. Version four of my extension, I know it works on 16.0 to 16.1. 16.2 broke it, so that’s where it ends and my version 4.1 is known to work on 16.2.

  • Jeremy S: But there is no difference between version 4 and version 4.1. There’s a difference in the build artifact that your build farm spit out, but there’s no difference in the extension, right?

    • Keith Fiske in chat: Still confusing if you don’t bump the release version even with only a library change

    • Keith Fiske in chat: How are people supposed to know what library version is running?

  • David G. Johnston: Right. If the extension still works, then your metadata would just say, “not only do I work through version 16.2, I now work through 16.3.”

  • Jeremy S: But it goes back to the question: is the version referring to a build artifact, or is the version referring to a version of the code? I typically think of versions as a user of something: a version is the thing. It would be the code of the extension. Now we’re getting all meta; I guess there are arguments to be made both ways on that.

    • Jason Petersen in chat: (it’s system-specific)

    • Jason Petersen in chat: no one talks in full version numbers, look at an actual debian apt-cache output

  • David Wheeler (he/him): Other questions? Anybody familiar with the rpath stuff? That seems pretty interesting to me as a potential solution for bundling all the parts of an extension in a single directory — as opposed to what we have now, where it’s scattered around four different directories.

  • Jason Petersen: I’ve played around with this. I think I was trying to do fault injection, but it was some dynamically loaded library at a different point on the rpath. I’m kind of familiar with the mechanics of it.

    I just wanted to ask: In a bigger picture, this talks about building extensions that sort of work everywhere. But the problems being solved are just the duplication across the spec files, the Debian files, etc. You still have to build a different artifact for even the same extension on the same version of Postgres on two different versions of Ubuntu, Right? Am I missing something? It is not an extension that runs everywhere.

  • Yurii: No, you still have to build against the set of attributes that constitute your target, whether that’s architecture, operating system, flavor. It’s not yet something you can build and just have one binary. I would love to have that, actually! I’ve been pondering a lot about this. There’s an interesting project, not really related to plugins, but if you’ve seen A.P.E. and Cosmopolitan libc, they do portable executables. It’s a very interesting hack that allows you to run binaries on any operating system.

  • Jason Petersen: I expected that to be kind of “pie in the sky.”

  • Yurii: It’s more of a work of art.

  • Jason Petersen: Do you know of other prior art for the rpath? Someone on Mastodon the other day was talking about Ruby — I can’t remember the library, maybe it was ssh — and they were asking, “Do I still have to install this dynamic library?” And they said, “No, we vendor that now; whenever you install this it gets installed within the Ruby structure.” I’m not sure what they’re doing; maybe it’s just static linking. But I was curious if you were aware of any prior art or other packaging systems where the system manages its own dynamic libraries and uses rpath to override the loading of them, so we don’t use the system ones and don’t have to conflict with them. Because I think that’s a really good idea! I just was wondering if there’s any sort of prior art.

  • Daniele: There is an example: Python wheel binaries use rpath. A wheel is a ZIP file with the C extension and all the dependent libraries, with the paths modified so that they can refer to each other in the environment where they’re bundled. There is a tool chain to obtain this packaging — this vendoring — of the system libraries. There are three, actually: one for Unix, one for macOS, one for Windows. But they all more or less achieve the same goal of having libraries where they can find each other in the same directory or in a known directory. So you could take a look at the wheel specification for Python and the implementation. That could be a guideline.

  • Jason Petersen: Cool.

  • Yurii: That’s an excellent reference, thank you.

  • David Wheeler (he/him): More questions?

  • Jeremy S: Yeah, I have one more. Yurii, the build inferencing was really interesting. A couple things stood out to me. One that you mentioned was that you look for the META.json file. That’s kind of neat, just that it’s acknowledged as a useful thing; a lot of extensions have it and we want to make use of it. I think everybody knows part of the background of this whole series of meetings — one of the things we’re asking is, how can we improve what’s the next generation of META.json to make all of this better? Maybe I missed this, but what was your high-level takeaway from that whole experience of trying to infer the stuff that wasn’t there, or infer enough information to build something if there isn’t a META.json at all? Do you feel like it worked, that it was successful? That it was an interesting experiment but not really viable long term? How many different extensions did you try it on, and did it work for them? Once you put it together, were you ever able to point it at a brand new extension you’d never seen before and actually have it work? Or was it still where you’d try a new extension and have to add a little bit of extra logic to handle that new extension? What’s your takeaway from that experience?

  • Yurii: The building part is largely unrelated to META.json; that was primarily just for the metadata itself. I haven’t used it on a lot of extensions, because I was looking for different cases — extensions that exhibit slightly different patterns — not a whole ton of them yet. I would say that, so far, this is more of a case-by-case scenario to see, for a particular type or shape of extension, what we need to do. But generally, what I found so far is that it works pretty nicely for C extensions: it just picks up where all the stuff is, downloads all the necessary versions, and allows you to discover new versions — for example, you don’t need to update the specification for a package if you have a new release; it will just automatically pick that up from the list of tags. These were the current findings. I think overall the direction is promising; we just need to continue adjusting the results and see how much further it can be taken and how much more benefit it can bring.

  • Jeremy S: Thank you.

  • Yurii: Any other comments or thoughts?

  • David Wheeler (he/him): Any more questions for Yurii?

  • David Wheeler (he/him): I think this is an interesting space for some research, between Devrim’s presentation talking about how much effort it is to manually maintain all the extensions in the Yum repository and the experiments I’ve been doing trying to build everything from PGXN, where the success rate is much lower than I’d like. I think there are some interesting challenges to automatically figuring out how things work versus convincing authors to specify things in advance.

  • Jeremy S: Yep. Or taking on that maintenance. Kind of like what a spec file maintainer or a Debian package maintainer is doing.

  • Yurii: Yeah, precisely.

Wrap Up

  • David Wheeler (he/him): Thanks, Yurii, for that. I wanted to remind everyone that we have our final Mini-Summit before PGConf on May 15th. That’s two weeks from today at noon Eastern or 4 pm UTC. We’re going to talk about organizing the topics for the Summit itself. I posted a long list of stuff that I’ve extracted from my own brain and lots more topics that I’ve learned in these presentations in the Slack. Please join the community Slack to participate.

    The idea is to winnow down the list to a reasonable size. We’re already full with about 45 attendees, and we can maybe have a few more with standing room and some hallway track stuff. We’ll figure that out, but it’s a pretty good size, so I think we’ll be able to take on a good six or maybe eight topics. I’m going to go over them all and we’ll talk about them and try to make some decisions in advance, so when we get there we don’t have to spend the first hour figuring out what we want to do; we can just dive in.

    And that’s it. Thank you everybody for coming, I really appreciate it. We’ll see you next time!

    • Tobias Bussmann in chat: Thanks for the insights and discussion!

    • Jeremy S: Thank you!

Mini Summit: Universally Buildable Extensions

Well that snuck up on me. Tomorrow, May 1 2024, Yurii Rashkovskii of Omnigres will be giving a presentation at the fifth installment of the Postgres extension ecosystem mini-summit. The talk, “Universally buildable extensions: dev to prod”, should provoke some interesting asymmetrical thinking and discussion. I hope to see you there.

Note! If you reserved a spot at a prior mini-summit, you will need to do so again for each subsequent event or you’ll get no reminders from Eventbrite. If it’s sold out, just email david@ this domain, or ping me on Mastodon or via the #extensions channel on the Postgres Slack or the Postgres Discord, for a link or, if you’d like, a calendar invitation.

Mini Summit Four

My thanks to Jonathan Katz for his presentation, “Trusted Language Extensions for PostgreSQL”, at last week’s Postgres Extension Ecosystem Mini-Summit. As usual I’ve collected the transcript here interspersed with comments from the chat window. First, links!

And now, rivers of text!

Introduction

Presentation

  • Thank you for having me. I’m very excited to talk about this, and extensions in general. I see a lot of folks here I know and recognize and some new folks or folks I’ve not met in person yet.

  • This is borrowed from the original presentation on TLEs from November of 2022, to level-set on why we built it. I know this is a slightly more advanced audience, so some of this might seem relatively introductory to some of you, though there is some material on the internals of extensions.

  • The premise is why we decided to build TLEs and what problems we were trying to solve. Part of it is just understanding Postgres extensions. In general this group is very familiar with extensions, but there are two points I want to hit on.

  • One of the things that excites me most about Postgres is that, when you look back at Postgres as the Berkeley database project researching how to create an object-relational database, an accidental or intentional feature is not just that Postgres is an object-relational database, but that Postgres is an extensible database, built from the get-go to be able to add functionality without necessarily having to fork it.

  • Early on you’d have to fork the database to add additional functionality, but the beauty of the Postgres design was the ability to keep adding functionality without forking.

  • It did require making changes to Postgres to further enhance that capability, which we’ll discuss in a bit, but that’s a really powerful concept.

  • The second point is that there is a large landscape of both open source and commercial extensions. I think a lot of folks in this group are very familiar with the available open source extensions, but there are entire businesses built on, effectively, commercial extensions on top of Postgres. Again, that’s a really powerful notion!

  • It’s kind of like the Postgres economy: we created something that’s so special that it’s able to spawn all these different businesses, whether it’s building things on top of Postgres or using Postgres as the heart of their business. Extensions have a very large role to play in that.

  • Which gets us to the history of extensions. The syntax for Postgres extensions has been around for over a decade, since Postgres 9.1. I know there are folks here well familiar with building extensions prior to that syntax! But we’re very grateful for the extension syntax because, as a recovering application developer, I would say it made things a lot easier.

  • Instead of having to futz around the file system to figure out where to install everything (wink wink nudge nudge on the topic today), you had a very simple syntax. Once an extension is installed, I can CREATE EXTENSION postgis (or whatever extension) and there you go! I have PostGIS installed.

  • Again, that’s really cool! Because anything we can do to make it simpler to install and use extensions further drives their adoption, and ultimately makes it even easier to develop and build applications with Postgres and continues to drive that forward.

  • So what can you build with Postgres, what extensions are available? It’s a whole range of things. For starters, there are extensions that I call “the ones that you take for granted”. If you’re using any monitoring tool, you’re likely running pg_stat_statements and may not even realize it. It helps to aggregate statistics as queries execute and bubbles them up to whatever monitoring tool you use. It’s a great tool for performance tuning.

  • The example I like to give from my personal life was that, back when I was an application developer trying to manage my own databases, I had some query that was blocking my entire logical decoding system, so we weren’t keeping up with transactions. Looking in pg_stat_statements, I saw a recursive query where I should have had a UNION with SELECT DISTINCT instead of a gigantic query that was amassing so many rows. Fixed it: I had a 100x speedup in the query. Thank you, pg_stat_statements!
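
    For illustration, a query along these lines surfaces the most expensive statements (a sketch: it assumes pg_stat_statements is in shared_preload_libraries and has been created in the database, and it uses the column names of recent Postgres versions):

    ```sql
    -- Assumes: shared_preload_libraries = 'pg_stat_statements' and
    -- CREATE EXTENSION pg_stat_statements; columns per Postgres 13 and later.
    SELECT query,
           calls,
           round(total_exec_time::numeric, 2) AS total_ms,
           round(mean_exec_time::numeric, 2)  AS mean_ms,
           rows
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC
    LIMIT 5;
    ```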

  • Side note: I got to say, “hey, I sped up this query by 100x,” even though it was my fault it was slow to begin with.

  • There are utility functions to help with data types. uuid-ossp is very widely used. Newer versions of Postgres have a random UUID function, but previously, anytime you needed a UUID you would CREATE EXTENSION "uuid-ossp".

  • The other fun thing about this extension is that, to install it, developers learned about SQL identifiers that require double quotes.
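
    For example (a quick sketch of both points; gen_random_uuid() is built in as of Postgres 13):

    ```sql
    -- The extension name contains a hyphen, so it must be double-quoted:
    CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
    SELECT uuid_generate_v4();

    -- On Postgres 13 and later, a random UUID needs no extension at all:
    SELECT gen_random_uuid();
    ```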

  • I think I saw Keith on here today. pg_partman! What’s really cool about pg_partman is that a lot of it is PL/pgSQL. This PL/pgSQL code provides a way to manage partitions across all the tables in your entire database. That’s really powerful because, although Postgres added declarative partitioning in version 10, pg_partman is still incredibly useful, because there are all sorts of aspects of partition management not supported in Postgres today. This is another example where Postgres provides the core functionality and an extension packages up additional functionality that makes it easier to use.

  • Foreign data wrappers. Postgres has a whole interface for interfacing with other databases. It could be other Postgres databases, other relational databases, non-relational databases, file systems, etc. Postgres has a C-level interface that allows you to design the calls to optimally use all these different databases. Again, it’s packaged up as an extension, so we’re able to add things as we go on.

  • I’ll wait till the end to answer questions; this will be a relatively short presentation, so we should have some time for discussion.

  • Last but not least, a little bit on PostGIS. I think this is one of the most powerful aspects of Postgres. PostGIS itself is incredibly powerful because you have a geospatial database that happens to be Postgres underneath. A lot of heavy PostGIS users don’t even realize they’re using Postgres! They think they’re using PostGIS. That is really the power of Postgres extensibility in a nutshell: it looks like you have a brand new, domain-specific database, and yet underneath it’s just boring old Postgres doing all the things you expect a database to do. That is also a very powerful notion.

    • Tobias Bussmann in chat: Many PostGIS users don’t know they are using PostgreSQL 🤣

    • nils in chat: 🎉

  • To add a coda to it, you have pgRouting, an extension built on top of PostGIS, which is built on top of Postgres. So you have a cascading effect of extensions building on top of extensions building on top of Postgres.

  • So we’re supposed to talk about trusted language extensions. To really get to TLEs, it’s important to understand the anatomy of an extension. There are certain things that you need in order to have an extension: You need a control file, which is kind of like your packaging manifest; it tells you what’s in the extension. It goes into a directory.

  • You have SQL files, which effectively map out the objects that you’re going to have in your database. If you have functions that need to map to a C function, or if you need to create a table access method in order to build your new storage layer, the SQL files are the building block.

  • If you have C-specific code, it goes in a library file or a shared object file that gets stored in a library directory.

  • It’s a very simple layout. What’s cool is if you go to create an extension, there’s a particular pattern that it forms: You need to know that when you have an extension, the information in the control file goes into the Postgres catalog. Then, if there are any functions or objects or whatever in that extension, we need to install the object itself, but we also need to make sure that there’s a dependency on the extension. That way, if we need to remove the extension or upgrade it, we know all the objects that we’ve collected.
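
    To make that concrete, here’s a sketch of a minimal SQL-only extension (the name, file contents, and function are hypothetical; the layout follows the pattern described above):

    ```sql
    -- hello.control, installed in SHAREDIR/extension/ (shown here as comments):
    --   comment = 'toy example extension'
    --   default_version = '1.0'
    --   relocatable = true
    --
    -- hello--1.0.sql, in the same directory, contains the extension's objects:
    CREATE FUNCTION hello(name text) RETURNS text
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT 'hello, ' || name $$;

    -- CREATE EXTENSION hello;  -- records the control-file metadata in pg_extension
    --                          -- and makes hello(text) a dependent object
    ```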

  • So why this “extension building 101”? This gets at the heart of why we built TLEs. Because the first thing to notice is that we install extensions directly on the file system. There are a lot of domains where that’s restricted — certainly managed service providers.

  • I worked a lot with containers previously, and a container is effectively an immutable file system: once things are installed on it, they’re installed. You typically don’t want to give your app developers access to your production systems, because your app developers are going to install everything under the sun on them, myself included. You certainly want to be able to restrict certain domains.

  • But we also don’t want to inhibit what developers want to build. We want to make it as easy as possible for them to manage their code and be able to install in different environments. That gets to another point beyond restricting the file system. Some extensions may not be universally available, depending on where you’re running them. You might be running an extension on an on-premise environment that might not work in a managed service provider. Or different managed service providers have different extensions available.

  • The final bit — and something that I’ve definitely personally experienced — is that, between major versions of Postgres, the API or the ABI will change. These are the interface points at the C layer. When they change it can break extensions. Ask any of the package managers how much they need to nag people to upgrade their extensions: they always want to make sure that they’re keeping it up-to-date, and ensuring that it’s compatible.

  • But this can also lead to other issues, because as the end user, this makes it challenging to perform major version upgrades — particularly if I’m dependent on an extension that hasn’t been updated to work with the latest version of Postgres. A subtle line of code change in Postgres could end up breaking an extension.

  • Quick story: that actually happened to me while I was managing pg_tle. I think it was a change in Postgres 15.1 that actually broke something in the pg_tle extension, and I had to fix it. I think that’s part of the point: if you’re able to use a language that’s on top of C, and you have the C layer abstracted away, in theory it could make it easier to perform major version upgrades.

  • That leads into TLE.

  • I think there were two notions behind trusted language extensions, or TLEs, when they were initially designed. The first is giving power to app developers to be able to build extensions. It’s actually one thing I noticed as I made the journey from app developer to quasi-DBA to, ultimately, product manager not writing code. Part of that journey was the power of putting some of my business logic in the database.

  • There’s always this tension between how much business logic to put in application code versus the database. But there were certain things that were just clear wins for me when they were in the database. The first was a lot of search functions I wrote, where filtering data down to a very small set in the database and returning that to the application would save on network time, processing time on the app side, etc. There were some very clear wins from encapsulating them in functions.

  • But also solving things that were just much easier to solve in the database. Having specific data types that solve a particular problem — geospatial extensions keep coming to mind, or pgvector dealing with vector data — and being able to store it in a database without having to delegate it out to an application certainly is a clear win.

  • The other thing was installing extensions. I think the notion of portability is very powerful. If I have a way to manage my extensions from a SQL interface, it makes it much easier to move it between different systems.

  • Now, we do need to be careful as soon as we start saying “SQL interface”. I don’t want to suggest that we should have a SQL interface to ship C code everywhere. We know there are some challenges with C code. C is highly performant, and you can effectively build anything under the sun using C, but it’s not memory-safe, and if you’re not familiar with what you’re doing — and even if you are familiar with what you’re doing! — you can easily make mistakes that could lead to crashes or possibly worse.

  • As we were thinking about all this with TLE, there were three things. First, we need an interface to be able to install and manage extension code effectively, regardless of environment. We need a SQL interface to do that. We also need to make sure there’s an appropriate trust boundary. Now, Postgres provides a trust boundary with the notion of a trusted language. But there are other things we need to build with trust as well.

  • For example, you might not want everyone in your environment to be able to install a TLE, so we need to make sure there are appropriate access controls there.

  • Finally, we need a way to package it up — which I think is what we’re going to talk about at the Extension Summit.

  • If there are any takeaways from why we built TLE (I think this is the slide that encapsulates it), it’s that using Postgres’s built-in trusted language interface allows you to write extension code in languages that we know are going to respect Postgres security boundaries.

  • Postgres has this definition of a trusted language which, if you go looking for it, you effectively have to dance around the documentation to find.

  • But effectively I’d summarize it as: if you allow an unprivileged user to write code in a trusted language, they can’t do anything to escalate their privileges, access the file system directly, or do anything else that would violate Postgres’s security boundary.

  • It’s a pretty good definition. Arguably, the easiest way to violate that definition is for you, as the database administrator, to mark an untrusted language as trusted in the catalog. But I strongly advise not doing that!
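
    You can see how each installed language is flagged by looking at the catalog directly, for example:

    ```sql
    -- lanpltrusted shows whether Postgres treats a procedural language as trusted.
    SELECT lanname, lanpltrusted
    FROM pg_language
    ORDER BY lanname;
    ```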

  • What trusted languages are available? There’s a great page called the “PL Matrix” on the Postgres wiki that shows the status of all the known PLs in Postgres and whether they’re trusted or not. I suggest looking at that.

    • David Wheeler (he/him) in chat: https://wiki.postgresql.org/wiki/PL_Matrix

    • Jeremy S in chat: Linux had kernel modules forever, but many people were rightfully hesitant because a kernel module could easily crash your entire system. One of the reasons eBPF is exploding today is because it’s verified and safe and enables code (like entire networking apps) to run directly in the linux kernel.

      I see TLE similarly

  • A big thing regarding a trusted language is performance. There are a variety of trusted languages, and they all have different properties. The ones I’m showing today are the ones available in RDS Postgres. But the reason I want to show them is that part of the idea of trusted language extensions is allowing app developers who may be less familiar with C to write extension code and access some of the same internals as a C extension, but from one of these languages.

  • Here are some of the known trusted languages today that work with TLE. If you’re using the TLE open source project, you can use any available trusted language — or you can use untrusted languages and just use the TLE packaging mechanism. In that case you lose the trusted part, but gain the extension installation aspect of TLE.

  • There are a few things included in TLE to make sure that TLE can be installed safely. It is an opt-in feature: there is a shared preload library called pg_tle, and you do need to have your database superuser install pg_tle initially. This ensures that we’re respecting your security boundary: if you’re going to use trusted language extensions, you have to explicitly opt in.

  • After that, an app developer can create their own trusted language extension.

  • Here’s a simple example from the TLE announcement, with two functions packaged into an extension you can install. You can give it a name like any Postgres extension; this one is called “tle_test”. The code looks like the SQL file in any extension, and it’s effectively packaged up like an extension using the pgtle.install_extension command. If you go to the pg_tle GitHub project you can see the different interface points.

  • Once it’s installed you can use CREATE EXTENSION like any other extension; it follows all the usual Postgres semantics: extension installation, uninstallation, and software life cycle management. pg_tle has its own interface for that functionality, but once you’ve installed it, managing the extension is just like managing any other Postgres extension, and follows those conventions.
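
    A minimal sketch along the lines of that example (the names and function body here are illustrative, not the announcement’s exact code):

    ```sql
    -- Package a one-function extension from SQL, then install it like any other.
    SELECT pgtle.install_extension(
      'tle_test',                          -- extension name
      '1.0',                               -- version
      'Toy TLE with a single function',    -- description
      $_tle_$
        CREATE FUNCTION tle_hello() RETURNS text
        LANGUAGE sql IMMUTABLE
        AS $$ SELECT 'hello from a TLE' $$;
      $_tle_$
    );

    CREATE EXTENSION tle_test;
    SELECT tle_hello();
    ```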

  • Effectively, TLE offers, loosely, a packaging mechanism (I think “packaging” has a lot more connotations): it’s a grouping mechanism for your code. One of the parts that I always found most useful in pg_tle was this part: effectively versioning my stored procedures.

  • When I talked about the example where I was putting business logic into the database, one part I would usually mess up is: what version of my stored procedures was running in a given database. Particularly if you have hundreds or thousands of databases that you’re managing, that can be a challenge.

  • Now, I had far fewer databases to manage; I was mostly dealing with our developer environments, staging, and production. But I was managing the stored procedures within our migration scripts — which is totally fine, because if I know what version of the migration I ran, then I know what version of the stored procedures is on that database. Kind of. Sort of. Maybe. You know: unless someone manually modified it — in which case shame on me for giving access to the database. But there could basically be some gaps in knowing what version of a stored procedure was on a particular server.

  • With pg_tle we can significantly reduce the risk of that problem, because we have a way to version our stored procedures, know exactly what we’re running at any given time, and create a consistent packaging mechanism wherever we’re running our code. And it goes beyond stored procedures, because there’s far more that you can build with your code.
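
    As a hedged sketch of what that versioning looks like (continuing the hypothetical tle_test example; see the pg_tle documentation for the exact update-path API):

    ```sql
    -- Ship a new version of the packaged code as an update path from 1.0 to 1.1.
    SELECT pgtle.install_update_path(
      'tle_test', '1.0', '1.1',
      $_tle_$
        CREATE OR REPLACE FUNCTION tle_hello() RETURNS text
        LANGUAGE sql IMMUTABLE
        AS $$ SELECT 'hello from tle_test 1.1' $$;
      $_tle_$
    );
    SELECT pgtle.set_default_version('tle_test', '1.1');

    -- Each database then reports exactly which version it is running.
    ALTER EXTENSION tle_test UPDATE TO '1.1';
    SELECT extversion FROM pg_extension WHERE extname = 'tle_test';
    ```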

  • What else does TLE add? We discussed the packaging mechanism, but a lot of the power of Postgres extensions is the ability to use the underlying Postgres internals. One of these types of internals is called a “hook”.

    Hooks are the Postgres feature that you’ve never heard of, that are not well documented, and yet are a foundational part of many extensions. Hooks are almost everywhere in Postgres. You particularly see a lot of them during the query execution process. For example, the process utility hook allows you to modify any utility command, anything that’s not a direct SQL statement. There are all sorts of hooks: password check hooks, client authentication hooks, hooks called around shared memory allocation, hooks called at each step of the execution phase.

  • Hooks are very powerful, particularly for enabling extensions to add different semantic behavior to Postgres. We could probably do a whole series of talks just on all the different ways you can extend Postgres. I mean, that’s why David has organized the summit! But hooks are, very simply, a powerful mechanism for defining behavior in Postgres.

  • Because they’re so powerful, for the hooks that we expose in TLE we make sure that there is a superuser opt-in. Remember, an unprivileged user can define this behavior, but you do need someone with privilege to enable something like a hook.

  • For example, a password check hook probably means that you have the ability to evaluate a plain-text password that’s coming through. (Now, on that topic we could have a very long debate, but let’s save that for Vancouver.) With this hook, you do have the ability to do password checks, so when you enable a function that’s called by a password check hook, you want to make sure there’s a certain level of privilege to that function, or that you do your appropriate evaluation to make sure you trust that function.

  • In addition to that check, there’s an additional check from the pg_tle admin role that requires someone with administrative privileges over your TLE to register that hook. The concept of “TLE features” is the way to map hooks into TLE. We’ve been building it up since we launched TLE by adding a few hooks: there’s both the check password hook and the client authentication hook.

  • There’s also the ability to register custom data types — which is pretty cool, because data types are what attracted me to Postgres when I was an app developer: “Oh! There are all these data types! I can do all these rich comparisons against an index? Cool! Oh wait, you can even add custom data types? That’s even cooler!”

  • TLE allows you to create the base data type, so you can really expand the data types that you’re able to add. This is what TLE features does: it enables that safe mapping between trusted language code and the Postgres C internals.

  • In order to create a hook, you need to match the hook function definition. The TLE documentation explains how to create it appropriately, but it doesn’t need all the parameters that you would find in the hook function.

  • In this check password hook — I call this the “delay check password test”, meaning you’re probably trying to avoid someone guessing your password repeatedly, and if they keep failing, so what, because they’re not going to brute-force it anyway. There are actually more practical examples of check password hooks. But what’s cool is that you can define everything around your hook behavior from within the hook function, and then it acts as if you wrote a C-based hook! You just happen to write it in a trusted language.
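
    A hedged sketch of such a hook in PL/pgSQL (the function signature, the pgtle.register_feature call, and the pgtle.enable_password_check setting follow the pg_tle documentation; the rule itself is made up for illustration):

    ```sql
    -- Reject passwords that match the user name; runs whenever a password is set.
    CREATE FUNCTION block_bad_passwords(
      username text, password text, password_type pgtle.password_types,
      valid_until timestamptz, valid_null boolean
    ) RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
      IF password = username THEN
        RAISE EXCEPTION 'password must not match the user name';
      END IF;
    END;
    $$;

    -- Registering the hook requires the pgtle_admin role...
    SELECT pgtle.register_feature('block_bad_passwords', 'passcheck');

    -- ...and it only fires once a superuser opts in:
    --   ALTER SYSTEM SET pgtle.enable_password_check = 'on';
    --   SELECT pg_reload_conf();
    ```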

  • Hooks do execute with elevated privileges, so particularly around authentication you want to be very careful. There are some safeguards built into TLE to make sure that you only enable hooks when you want to.

  • Last but not least: choosing a trusted language. I know this group is more focused on extension building, but I do want to talk about what an app developer goes through when choosing a trusted language.

  • Because everything has trade-offs to consider. The golden rule (I actually took this from Jim Mlodgensky) is: when in doubt, use PL/pgSQL, because it has a lot of access to context that’s already available in Postgres. What’s interesting is that what we see today is based on PL/SQL: PL/pgSQL was developed to try to make it simpler to migrate from Oracle, but at the same time to provide a lot of rich functionality around Postgres.

  • As someone much more familiar with Ruby and Python, I can tell you that PL/pgSQL can be a little bit quirky. But it is very well documented, and it can solve all the problems that you need to in Postgres. And it already has a lot of very simple ways to directly access your data from Postgres. Certainly an easy choice to go with.
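
    A trivial PL/pgSQL illustration of that direct data access (the function is made up, but runs as-is):

    ```sql
    -- Plain SQL embedded in procedural code, with no marshalling layer in between.
    CREATE FUNCTION count_installed_extensions() RETURNS bigint
    LANGUAGE plpgsql STABLE AS $$
    DECLARE
      n bigint;
    BEGIN
      SELECT count(*) INTO n FROM pg_extension;
      RETURN n;
    END;
    $$;

    SELECT count_installed_extensions();
    ```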

  • But wait, there’s more!

  • Like PL/v8, writing JavaScript in your database: this is really cool! I remember when it came out and how mind-blowing it was, in particular for JSON processing. PL/v8 is awesome. PL/v8 came out right around the same time as the document database! So you kind of had a perfect storm of being able to process JSON and write it in JavaScript — both within your Postgres database — and it could be quite powerful.

  • Another really cool feature of PL/v8 is the ability to directly call another function or another PL/v8 function from within PL/v8, and not have to go through Postgres function processing, which adds a lot of additional overhead.

  • And now the one that’s all abuzz right now: PL/Rust, being able to write and execute Rust code within Postgres. This is pretty cool, because Rust is a compiled language! There’s a trusted way to run PL/Rust within Postgres, and there are a few techniques that make it possible. First, whenever you’re running Rust on your server, you want to make sure that you’re guarding against breakouts.

  • There is a library, I believe it’s called postgrestd, that effectively compiles out some of the less trusted parts of Rust, such as unsafe function calls. But you can still get everything that you want in PL/Rust today: you get the Rust standard library and the ability to run crates — and you do want to evaluate crates to make sure that you’re comfortable running them in your environment. But then you get this compiled language that is CPU efficient, memory efficient, and memory safe. (Well, a lot of Rust is memory safe.) It’s pretty cool!
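
    A hedged sketch of what that looks like from SQL (it assumes the plrust extension is installed and enabled; the argument and return conventions follow the PL/Rust documentation):

    ```sql
    CREATE FUNCTION rust_strlen(val text) RETURNS bigint
    LANGUAGE plrust AS $$
        // Arguments arrive as Option<_>; return Ok(Some(..)) for a non-NULL result.
        Ok(Some(val.unwrap().len() as i64))
    $$;

    -- The Rust code is compiled when the function is created, which is also where
    -- any compiler errors surface (see the discussion below).
    SELECT rust_strlen('compiled in the database');
    ```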

    • Steven Miller in chat: In PL/Rust, does it run the compilation when the function is created? Then if there is a compiler issue it just shows up right there?

  • I wrote a blog post last year that compared some function calls between PL/pgSQL, PL/v8, and PL/Rust. First I was doing some array processing, and you could see that the PL/Rust calls were very comparable to the C calls. And then there was some additional vector processing, given that I’ve been obsessing over vectors for the past 14 months. Seeing Rust actually win against PL/pgSQL and PL/v8 (I don’t remember the numbers off the top of my head; I can look up that blog as soon as I switch windows). Pretty cool!

  • This brings us, in some ways, to the best of all worlds: I can take an extension that I would normally write in C, particularly because I’m focused on performance, write it in PL/Rust, package it as a trusted language extension, and run it anywhere that TLE and PL/Rust are supported. Again, that is very powerful, because suddenly I have what I hope is the best of all worlds: I have this portability, and I don’t have to worry as much about major version upgrades, because pg_tle is acting as an abstraction layer between the Postgres C code and the application code that I’m writing.

    • Jeremy S in chat: Versioning of stored procedures is a very interesting use case

    • Darren Baldwin in chat: Agreed! Moving stuff to the database layer seems to be something very foreign and “scary” to most app devs I’ve talked to

    • Anup Sharma in chat: Is TLE a requirement for any PostgreSQL extension, or is it dependent?

    • Steven Miller in chat: So during a major version upgrade, the function declaration stays the same, so that’s why your application doesn’t need to change with respect to the extensions during a major version upgrade. And at some point during the migration, you create the function again, which recompiles. So it all works the same! That’s great

  • Last slide, then I’m certainly looking forward to discussion. pg_tle is open source, and it’s open source for a lot of reasons. A lot of it is because we want to make sure that trusted language extensions are as portable as possible. But in some ways the ideas behind TLE are not original. If you look at other databases, there is this notion of, let’s call it inline extensions, or inline SQL; you can call them modules, you can call them packages. But the idea is that I can take reusable chunks of code, package them together, and have them run anywhere. It doesn’t matter where the database is located or hosted.

  • This is something that I personally want to work with folks on figuring out how we can make possible in Postgres. Because even in Postgres this is not an original idea: Dimitri Fontaine was talking about this as far back as 2012 in terms of his vision of where the extension framework was going.

  • What I’m looking forward to about this Extension Summit — and hopefully I’m not in conflicting meetings while it’s going on — is talking about how we can allow app developers to leverage all the great parts of Postgres around function writing, function building, and ultimately packaging these functions, and making it simpler for them to move them wherever their applications are running.

  • So it is open source, open to feedback, and under active development, and we continue to add more features to support Postgres. Ultimately we want to hear what’ll make it easier for extension writers to use TLE, both as a packaging mechanism and as a development mechanism.

  • So with that, that’s the end of my slides, and I’m happy to get into a discussion about this.

Discussion

  • David Wheeler (he/him): Awesome, thank you Jonathan. There was one question about PL/Rust in the comments: Steven asks whether it compiles when you create the function, so that if there are compiler issues they show up there.

  • Jonathan Katz: Correct. It compiles when you create the function, and that’s where you’ll get compile errors. I have definitely received my fair share of those [chuckles]. There is a Discord. PL/Rust is developed principally by the folks responsible for the pgrx project, the folks at ZomboDB, and they were super helpful in debugging all of my really poor Rust code.

  • David Wheeler (he/him): While people are thinking about their questions, I’ll just jump in here. You mentioned using crates with PL/Rust. How does that work with pg_tle, since they have to be loaded from somewhere?

  • Jonathan Katz: That’s a good question. I kind of call it shifting the problem. TLE solves one problem in that you don’t necessarily need to have everything installed on your local file system outside of pg_tle itself. If you’re using PL/Rust and you need crates, you do need those crates available either within your file system or within whatever package management tools you’re using. So it shifts the problem. I think it’s going to be a good discussion, about what we can do to help ensure that there is a trusted way of loading those.

  • David Wheeler (he/him): Yeah I wonder if they could be vendored and then just included in the upload through the function call.

    Anup Sharma asked if pg_tle is a requirement for any extension or is it dependent.

  • Jonathan Katz: It’s not a requirement. This is a project that makes it possible to write Postgres extensions in trusted languages. There are plenty of extension authors on this call who have written very, very, very good extensions in C that do not use TLE.

  • David Wheeler (he/him): You can use trusted languages to write extensions without TLE as well. It’s just a way of getting it into the database without access to the file system, right?

  • Jonathan Katz: Correct. I think I saw Keith here. pg_partman is PL/pgSQL.

    • Anup Sharma in chat: Understood. Thanks

    • Tobias Bussmann in chat: I think it is important not to confuse Trusted Language Extensions TLE with “trusted extensions” which is a feature of Postgres 13

    • Keith Fiske in chat: Pretty much all of it is. Just the background worker isn’t

    • Jonathan Katz in chat: That’s what I thought but didn’t want to misspeak 🙂

  • David Wheeler (he/him): Right. Any other questions or comments, or any implications that you’re thinking through for extension distribution, extension packaging, or extension development?

    • Steven Miller in chat: Is background worker the main thing that a TLE could not do in comparison to traditional extensions?

  • Jason Petersen: The crates thing kind of raised my interest. I don’t know if Python has this ability to bring in libraries, or if JavaScript has those dependencies as well. But has there been any thought within pg_tle of first-classing the idea of having a local subdirectory or a local file system layout for the “native” dependencies? I’m using “native” in quotes here because it could be JavaScript, it could be Python, whatever of those languages, so they could be installed in a way that’s not operating-system dependent.

    I know this is kind of a complex setup, but what I’m getting at is that a lot of times you’ll see someone say “you need to install this package, which is called this on Red Hat or this on Mac or this on Debian — and then you can install my extension.” Has there been any push toward solving that problem by having your TLE extensions load things from a sort of walled garden that you set up or something? So it’s specific to the database instead of the OS?

  • Jonathan Katz: That’s a good question. There has been thought around this. I think this is going to be something that requires a thorough discussion in Vancouver. Because if you look at the trusted languages that exist in Postgres today, the definition of a trusted language is: thou shalt not access the file system. But if you look at all these different languages, they all have external dependencies in some way, shape, or form. With Perl there’s everything in CPAN; with JavaScript there’s everything in npm. But let’s say you installed the appropriate CPAN libs and npm libs within your database: everything I recall from playing with trusted PL/v8 and PL/Perl is that you still can’t access those libraries. You can’t make the include or the require call to get them.

    Where PL/Rust is unique is that, first off, we just said, “yes, you can use your Cargo crates here.” But I think that also requires some more thinking in terms of how we make that available, whether it’s OS specific, vendor specific, or whether there’s something universal that we can build that helps make that more of a trusted piece. Because I think at the end of the day, we still want to give administrators discretion in terms of what they ultimately install.

    With trusted language extensions themselves, we’re able to say, “here’s the Postgres security boundary; we’re operating within that security boundary.” As soon as we start introducing additional dependencies, it effectively becomes a judgment call: are those dependencies going to operate within that security boundary or not? We need to make sure that administrators still have the ability to make that choice.

  • I think there are some very good discussion topics around this, not just for something like PL/Rust but for extension distribution in general. I think that’ll be one of the key discussions at the Extension Summit.

    • David Wheeler (he/him) in chat: What if the required modules/packages/whatever were in a table. e.g. in Perl I do use Foo::Bar and it has a hook to load a record with the ID Foo::Bar from a table

  • David G. Johnson: Has there been any thought to having the default version of an extension tied to the version of PostgreSQL? Instead of it just being 1.3, whether I’m on version 12 or 15, because 1.3 might not even work on version 12 but it would work on version 15. The versioning of an extension and the versioning of PostgreSQL seem like they’re almost too independent.

  • Jonathan Katz: So David, I think what you need to do is chastise the extension developers and let them know they should be versioning appropriately to the version of Postgres that they’re using. [Chuckles]

    There is a good point in there, though. There is a lot of freedom in terms of how folks can build extensions. For example, just top of mind, pgvector supports all the supported versions of Postgres. Version 0.7.0 is going to be coming out soon, so it’s able to say, “pgvector 0.7.0 works with these versions.” pg_hint_plan, meanwhile, maintains several back releases; I think 1.6.0 is the latest release and it only supports Postgres 16. I don’t believe it supports the earlier versions (I have to double-check). There are effectively things of that nature.

    And then there are all sorts of different things out there, like PostGIS, which has its own life cycles. So there’s something good in that, and maybe the answer is that this becomes part of the control file, saying what versions of Postgres an extension is compatible with. That way we’re not necessarily doing something to break some environment. I’m just brainstorming on live TV.

  • David G. Johnson: The other day I opened a bug report on this. PostgreSQL dump and restore will dump it without the version that’s in the source database, and when you restore it, it’s going to restore to whatever the current version in the control file is, even if you’re upgrading to a different database, versus restoring it to whatever the original version was. That dynamic just seemed problematic.

  • David Wheeler (he/him): I think it’s less problematic for trusted language extensions or extensions that have no C code in them, because pg_dump does dump the extension, so you should be able to load it up. I assume base backup and the others do the same thing.

  • David G. Johnson: I haven’t checked into that. It dumps CREATE EXTENSION and then it dumps any user tables that are marked by the extension. So these code tables are marked as being user tables for TLE?

  • David Wheeler (he/him): What do you mean by code tables?

  • Regina Obe: That’s a good point. For example my Tiger geocoder is all PL/pgSQL, but it’s only the CREATE EXTENSION thing that’s named. So for your TLE table, it would try to reload it from the original source, wouldn’t it? In which case it would be the wrong version.

  • Jonathan Katz: We had to add some things to TLE to make sure it worked appropriately with pg_dump. I know for a fact that if you dump and load the extension, it works fine. If it doesn’t, then there’s a bug and we need to fix it.

  • David G. Johnson: Okay yeah I haven’t played with this. Literally this is new to me for the most part. I found the whole fact that the control file is not updated when you do ALTER EXTENSION to be, at least in my mind, buggy.

  • Jonathan Katz: In the case of TLE, it’s in theory major version-agnostic. When I say “in theory,” it’s because we need to make sure the TLE code and library itself is able to work with every major version. But once that’s abstracted away, the TLEs themselves can just be dumped and reloaded into different versions of Postgres. I think we have a TAP test for that; I have to double-check. But major version upgrades were something we 100% tested for.

  • David Wheeler (he/him): I assume it’d be easier with pg_tle, since there’s no need to make sure the extension is installed on the file system of the new server.

  • Jonathan Katz: Yep. if you look at the internals for pg_tle, effectively the TLEs themselves are in a table. When you do a CREATE EXTENSION it gets loaded from that particular table.

  • David G. Johnson: Right, and when you do a pg_dump you make sure that table is dumped to the dump file.

  • Jonathan Katz: Yes. But this is a key thing that we had to make sure does work: when loading a pg_dump, a lot of the CREATE EXTENSION calls get executed before the tables are loaded. So we needed to make sure that we created the appropriate dependency so that we load the TLE data before the CREATE EXTENSION. Or rather, the CREATE EXTENSION for the TLE itself.

    • Jeremy S in chat, replying to “Is background worker the main…”: doing a background worker today, I think requires working in C, and I don’t think core PG exposes this yet. Maybe it could be possible to create a way to register with a hook to a rust procedure or something, but maybe a better way in many cases is using pg_cron

    • Jonathan Katz in chat: We can add support for BGWs via the TLE API; it’s just not present currently.

    • nils in chat: Creative thinking, if a background worker doesn’t work in TLE, how about create your UDF in tle and schedule with pg_cron 🤡

  • David Wheeler (he/him): You mentioned in the comments that you think that background workers could be added. How would that work?

  • Jonathan Katz: It would be similar to the the other things that we’ve added, the data types and the hooks. It’s effectively creating the interface between the C API and what we’d expose as part of the TLE API. It’s similar to things like pgrx, where it’s binding to Postgres C API but it’s exposing it through a Rust API. We do something similar with the TLE API.

    • Steven Miller in chat: Thank you Jeremy. I like this idea to use TLE, then depend on cron for a recurring function call

    • Steven Miller in chat: Ah yes Nils same idea 😄

    • Jason Petersen in chat: Thumbs up to nils about pgcron. If you need a recurring BGW just write it in plpgsql and schedule it

    • nils in chat: Great hackers think alike

    • Jason Petersen in chat: (I know I do this)

  • David Wheeler (he/him): That makes sense. I just thought background workers were literally applications that are started when the postmaster starts up and shut down when the postmaster shuts down.

  • Jonathan Katz: But there’s dynamic background workers.

  • David Wheeler (he/him): Oh, okay.

  • Jonathan Katz: That’s how a parallel query works.

    • Jeremy S in chat: Threading? 😰

  • David Wheeler (he/him): Gotcha, okay. Sorry, my information’s out of date. [chuckles]

  • Jonathan Katz: Well, maybe one day we’ll have some form of threading, too. I don’t think we’ll get a wholesale replacement with threads, but I think there are certain areas where threads would help and certain areas where workers are the way to go.

  • David Wheeler (he/him): Yeah, yeah that makes sense.

  • Jonathan Katz: Hot take!

  • David Wheeler (he/him): What other questions do you have about TLEs, or extensions more broadly and packaging in relation to TLEs?

  • David G. Johnson: Just a random thought: Have you thought about incorporating foreign servers and pointing the TLE, instead of at a local database, at a master, company-wide foreign table?

  • David Wheeler (he/him): Like a TLE registry?

  • David G. Johnson: Right, yeah, something global would be nice: like, okay, it’s hosted on PGXN and there’s a TLE registry. Because for a company that wants to maintain code internally between projects and wants a shared library, they can publish it on one server, set up a link to it over a foreign server, and then just point at that.

  • Jonathan Katz: Could be!

  • David Wheeler (he/him): I mean, you could just use foreign tables for the tables that TLE uses for its registry, right?

  • David G. Johnson: That’s what I’m thinking.

  • David Wheeler (he/him): Yeah that’s a cute idea.

  • Jonathan Katz: Just to go back a few more minutes: I was asked to talk about the vision. One way to view extensions is as trying things out before they’re in core, or before they’re in Postgres. The aspect that I would ultimately like to see in core someday is the ability to do what’s called “inline modules.” There is a SQL standard syntax, CREATE MODULE, for this purpose. Some folks were trying to see if we could get it into, I believe, Postgres 15. There was some pushback on the design and it died on the vine for the time being.

    But I do think it’s something to consider, because when I talk to folks, whether it’s random Postgres users, RDS customers, etc., and I go through TLE, one of the things that really stands out is something we discussed here and that I saw in the chat: being able to version your stored procedures. This is in part what modules aim to solve. One part is just having a SQL interface to load all these things and group them together. But once you have that grouping, you have the ability to version it. That’s the part that’s very powerful. As soon as I saw this I was like, “man, I could have used that; it would have saved me hours of debugging code in production.” Not that I was ever sloppy with random stored procedures in my production database!

    • David Wheeler (he/him) in chat: I see CREATE MODULE in the db2 docs.

  • Jonathan Katz: But that’s kind of the vision. The fact that Postgres is extensible has led to this very widely adopted database. But I think there are things that we can also learn in our extensions and bring back upstream. There are certainly reasons why we develop things in extensions! pgvector is an example of that; we talked about it at PGCon last year. And part of the thought of not trying to add a vector data type to Postgres was, first, to make sure we could settle on what the binary format would be; once that’s solidified, then we could add it.

    But I had an aside with Tom [Lane] where we talked about the fact that this is something we need to move fast on, the vector space is moving very quickly, extensions are a way to be able to move quickly when something like Postgres moves more deliberately.

    This is in some ways where TLE is: our way to see what kind of interface makes sense for doing inline extension loading, and ultimately how we want that to look in core.

  • David Wheeler (he/him): Can you create data types with a binary representation in TLE?

  • Jonathan Katz: Yes, as of (I want to say) the 1.3 release; I have to double-check the version. The way we’re able to do it safely is that it actually leverages the BYTEA type. When you create that representation, it stores it as a BYTEA. What you get for free today is that, if you create your equality/inequality operators, you can use a b-tree lookup on these data types.

    So there’s a “dot dot dot” there. If we wanted to use GiST and GIN and build data types for our other index interfaces, there’s more work to be done. That would require a TLE interface. I spent a lot of time playing with GiST and GIN, and the interface calls themselves involve pointers, so that will require some thought.

  • David Wheeler (he/him): I assume it’s a similar issue for Rust data types that are basically just serde-serialized.

  • Jonathan Katz: Yeah, we can at least store things in BYTEA, and that’s half the battle. It allows us to do a safe representation on disk, as opposed to just “here’s some random binary; good luck and don’t crash the database!”

    • Jason Petersen in chat: I also wondered about the function interface for things like storage features (table access methods).

      I assume they’re similarly hairy

  • David Wheeler (he/him): Any other last minute questions?

  • Jonathan Katz: Table access methods. Yes table access methods are very hairy as are index access methods. I spent a lot of time the past 14 months looking at the index access method interface, which has a lot of brilliance in it, and certainly some more areas to develop. But it’s amazing! The fact that we can implement vector indexes and get all the best parts of Postgres is a phenomenal advantage.

  • Jeremy S: One last question. We’re leading up to Vancouver and we’re going to be starting to think about some of the topics that we want to make sure to talk about at the Summit. I think you mentioned one earlier (I should have written it down), but any final thoughts about topics that we should make sure to discuss?

  • Jonathan Katz: Just in general or TLE specific?

  • Jeremy S: Both. I mean for sure TLE-specific, but also just generally related to extensions

  • Jonathan Katz: My TLE-specific one dovetails into the general one. The first one is: is there ultimately a path forward to having some kind of inline extension management mechanism in core Postgres? That’s the top one; I spent the past five minutes talking about it.

    But I think the big thing, and why we’re all here today, is how we make it easier for developers to install extensions, manage extensions, etc. I think the notion of package management also shines in a new light thanks to the work of Andres finding the backdoor in xz, because there’s a huge security component to this. I remember, David, some of our earlier chats around this. Again, being app-developer sympathetic, I definitely want to see ways to make it easier to load extensions.

    Having spent a lot of time on the other side, the first thing that comes to mind is security. How do we create a protocol for managing the extension ecosystem that also allows folks to opt into it and apply their own security or operational or whatever requirements on top of it? That’s the thing that’s most top of mind. I don’t expect a full resolution from the Extension Summit on it, but at least the start of one. What is ultimately the universal packaging and distribution protocol for Postgres extensions that we can all agree on?

  • David Wheeler (he/him): Thank you so much! Before we go, I just wanted to tee up that in two weeks Yurii Rashkovskii is going to talk about his idea for universally buildable extensions: dev to prod. That’ll be on May 1st at noon Eastern and 4 pm UTC. Thank you everybody for coming.

Mini Summit: Jonathan Katz on TLEs

Coming up this week: the fourth installment of the Postgres extension ecosystem mini-summit. The topic: Trusted Language Extensions, a.k.a. TLEs. Jonathan Katz of AWS will give a brief presentation on “TLE Vision and Specifics” followed by community discussion of the issues TLEs aim to address, what works and what doesn’t, and the role of TLEs in the future of the extension ecosystem.

Join us! Note! If you reserved a spot at a prior mini-summit, sadly you will need to do so again for each subsequent summit or miss out on reminders from Eventbrite. And if Eventbrite says the event is sold out, rest assured we have plenty more virtual seats! Just email david@ this domain, or ping me on Mastodon or via the #extensions channel on the Postgres Slack or the Postgres Discord.

Mini Summit Three

Terrific presentation and discussion at last week’s Postgres Extension Ecosystem Mini-Summit. In fact, I later learned that some people couldn’t make it because the Eventbrite event page showed it as sold out!

This is a limitation of the free level at Eventbrite, which caps tickets at 25. But this is a fully-remote event and we can host more people than that. We had over 30 last week! So if you’d like to attend one of the remaining three events and Eventbrite shows it as sold out, have a look at the bookmarks for the #extensions channel on the Postgres Slack, or email the username david at this domain and I will send you the link!

Okay, back to business. Last week, Devrim Gündüz kindly gave a super thorough presentation in his capacity as the maintainer of the Postgres community Yum and ZYpp repositories. This topic sparked lots of interesting discussion and commentary, which I detail below. But why read when you can look?

Still reading? Okay then, here we go!

Introduction

  • I opened the meeting and introduced Devrim Gündüz.

Presentation

  • Thank you for the introduction. I’m going to talk about yum.postgresql.org and zypp.postgresql.org, the challenges they present for us, and the challenge of making RPMs from extensions.

  • I work for EDB as a Postgres expert, and I’m a Postgres major contributor responsible for the Postgres Yum repo. If you’re using RPMs, I’m building them. I also contribute to Rocky, SUSE, and Fedora Linux. I’m a proud Postgres community member, I live in London, and I’m also doing some DJing, so if I get fired I have an extra job to do.

  • Agenda: Last year at PGConf.eu in Prague, I had 60 slides for 5 minutes, so expect a huge number of slides for today. I want to leave some time for discussion and questions.

  • I’m going to start with how it started back in the 2000s, then talk about the contents of the repos and which distributions we support, because that’s another challenge; how we do the extension packaging; how to build RPMs of an extension; and how to update an extension RPM.

  • Then the horror story of what happens when a new Postgres beta is out, which is due in the next month or so for Postgres 17.

  • Then we are going to speak about the extension maintainers, a problem for us, and then the problems of relying on external repos.

  • So if you have any questions please ask as soon as you can. I may not be able to follow the chat during the presentation, but I’ll try as much as I can.

  • So let’s talk about how it started. When I started using Linux in 1996 and Postgres in 1998, we had Tom Lane and we had Lamar, who built RPMs for Red Hat upstream. They were just building the distribution packages, nothing from the community. It was only specific to Red Hat — not Red Hat Enterprise Linux, but Red Hat 3, 4, 5, the non-enterprise version of Red Hat back then. That later became Fedora Core; the first version was released in November 2003, which was another challenge for packaging.

  • One of the problems with the distribution packaging was that it was slightly behind the minor Postgres releases, and sometimes the major Postgres releases.

  • So there was one single Postgres version available for a given distro, say Postgres 6.4 or 7.0; multiple versions were not available back then, and the minor version was slightly behind.

  • I started building packages for my laptop because I wanted to use Postgres, and not all versions were available back then. So I started building packages for my laptop and my server. They were based on the packaging of Tom Lane and Lamar.

  • Then I uploaded them to my personal server and emailed the PG mailing lists, saying, “I’m running my own packages; use at your own risk. This is just a rebuild of the upstream packages on the RedHat version that I’m using or the Fedora version that I’m using. Up to you! This is a personal project, nothing serious.”

  • Then people started downloading them and using them. There was no signature, nothing official back then. Then Lamar said he didn’t have enough time for the RPMs. He sent an email to the mailing lists saying Devrim is stepping up to the plate, and I did it. That was in 2004, I think, which is about 20 years ago.

  • So 19 years ago we had the first domain, postgresql.rpm.org, and then we had more packages. In 2007 we had the first repository RPM, and then we had yum.postgresql.org. This means that, starting in 2007, this began to be the official RPM repo of the community, which was a good thing because we could control everything under the community.

  • I was living in Canada back then. We had the first RPM build farm — instead of using my laptop and my VMs, we had an actual machine back then.

  • In 2010 we had what was then called multiple postmaster support, which means parallel installation of the Postgres RPMs. That was a revolution, because even now the current Fedora or RedHat RPMs cannot be installed in parallel. So if you want to install Postgres 13, 14, 15, 16, and 17 or whatever, it is available in the community repo. This is a great feature because you may want to test or do something else. This is how we started; 14 years ago we had this feature in the community repo.

  • Then the next year we moved the repo to the community servers and unified the spec files. Our builds are extremely simple — you can start your own builds in less than 10 minutes: five minutes for pulling the git repo and then a few minutes for a package. Extremely simple builds. And now we are in 2024.

  • Let’s talk about the contents, because we have four different content sets in the repo. The first one is the “common” repo. We call it “common” in the repository config file. It has the RPMs that work across all Postgres major releases: say pg_badger, the ODBC driver, the JDBC driver, or GDAL for PostGIS. There’s lots of stuff: Python packages, which are not extensions, but things like Patroni, which works with all Postgres releases. [A schematic sketch of these content sets appears below.]

  • This was an effort to get rid of duplicates in the Postgres repo. I think we shaved off lots of gigabytes back then, and we still save a lot.

  • Then, to address one of the topics of today’s talk, we have the “non-common” RPMs. (These are the names of the directories, by the way.) They are the RPMs which are Postgres version-specific; basically, they are extensions. Say plpgsql_check for Postgres 15, or whatever. Lots of extensions are there.

  • Then we have extras. They are not actually Postgres packages and they shouldn’t be included in our repo by default, but many people look for these packages because they want to use Patroni and they don’t have the supporting RPMs, or the ones they have are not up to date.

  • I’m building consul, consul-template, etcd, HAProxy, keepalived, and vip-manager. They are all open source; some of them are Go packages — which, actually, I don’t build: I just distribute the precompiled binaries via the repo. That makes it easier for people to deploy the packages.

  • And then we have the “non-free” repo. These are the packages that depend on closed-source software like the Oracle libraries, or that have license restrictions. For example, ora2pg depends on Perl DBD::Oracle, oracle_fdw depends on Oracle, pg_strom depends on NVIDIA CUDA, timescaledb-tsl is TimescaleDB under the TSL license, plus informix_fdw and db2_fdw.

  • So we have some non-free packages which depend on non-free stuff as well. All of them are well-maintained: I’m trying to keep everything up to date — like real up-to-date! That brings some problems, but we will get there.
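Putting those four content sets together, a schematic view of how they surface in the repository configuration might look like the following; the section ids and comments are paraphrased from the talk, not copied from the real repo file on yum.postgresql.org:

```
# Schematic only: the real ids, names, and baseurls differ
[pgdg-common]    # packages shared across all Postgres major versions (pg_badger, drivers, Patroni deps, …)
[pgdg16]         # Postgres 16 itself plus version-specific extension builds (the “non-common” content)
[pgdg-extras]    # supporting non-Postgres packages (etcd, HAProxy, keepalived, vip-manager, …)
[pgdg-nonfree]   # packages with non-free dependencies (oracle_fdw, pg_strom, timescaledb-tsl, …)
```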

  • We support RedHat Enterprise Linux and of course Rocky Linux. This year we started supporting Alma Linux as well. Of course they are more or less identical, but we test, install, and verify the packages on all three distributions.

  • We have x86_64, aarch64, and ppc64le, on RedHat 9, 8, and 7. We also have RedHat 6 for Postgres 12, but it’s going to be killed by the end of this year. We have Fedora, but only the two latest major releases, which matches the Fedora lifecycle. And SUSE — my pain in the neck — I’ll get there.

  • Since you are all here for extension packaging, let’s get to it: what happens with extension packaging.

  • First of all, we have the in-core extensions. They are delivered in the contrib sub-package, which matches the directory name in the Postgres tarball. There are separate packages for each major version: postgresql15-contrib, postgresql13-contrib, etc. These define the installation directories for each extension. We are going to use those directories for the other [non-core] extensions as well.

  • When we add a new extension to the repo, it’s going to use these directories: binaries, extension control files, libraries, and the other files are all installed under them. This magic is done by PGXS, which has been there forever. We just provide the initial stuff, and the rest is done by the PGXS magic. This is the basis for a lot of the core extensions.
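For a concrete sense of where those files land, each PGDG major-version package ships its own pg_config, which reports the version-specific directories (the paths in the comments reflect the usual PGDG layout and are an assumption, not guaranteed output):

```
# Ask Postgres 16’s pg_config where extension files go
/usr/pgsql-16/bin/pg_config --sharedir    # e.g. /usr/pgsql-16/share; control and SQL files go in share/extension
/usr/pgsql-16/bin/pg_config --pkglibdir   # e.g. /usr/pgsql-16/lib; compiled .so files go here
```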

  • So what happens when we build the non-common and non-free packages? First of all, they are distributed separately for each Postgres major version. Let’s go back to one of the extensions, say plpgsql_check. We have a separate package for Postgres 14, a separate package for Postgres 15, and likewise for 16, 13, and 12. If they build against all the supported Postgres versions, we have separate packages for each.

  • Of course, from time to time — as far as I remember TimescaleDB does this — TimescaleDB only supports Postgres 15 and 16 nowadays. So we drop support for the older versions in the Git repo. But they are separate packages; they are all installed in these directories along with the main contrib package.

  • This is the naming convention that I use: $extensionName_PGMajorVersion. Four or six years ago, some of the packages didn’t have an underscore before the PG major version. It was a problem, so someone complained on the hackers mailing list, and then I made the changes.

  • Currently all the previous and all the new packages follow this naming convention, except a few packages like PostGIS — because with PostGIS we have multiple versions of the same extension. Let’s say we have PostGIS 3.0, 3.1, 3.2, 3.3, and 3.4. We have combinations of each — I think we have PostGIS 3.0 in some of the distributions, but mostly PostGIS 3.1, 3.2, 3.3, and 3.4, and then all the Postgres versions they support — a lot of builds! So there are some exceptions where we have the extension name and extension major version before the Postgres major version.
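Under that convention, package names look roughly like this; these examples just illustrate the pattern described above, so check the repository for the exact current names:

```
plpgsql_check_16    # $extensionName_PGMajorVersion
pg_partman_15       # same pattern, different extension and Postgres version
postgis34_16        # exception: the extension’s own major version (3.4) comes before the Postgres version
```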

    • Jeremy S in chat: question: curious if you have thoughts about relocatable rpm packages. ever tried? just best to avoid?
  • I have a question from Jeremy. This is a very easy question to answer: it is actually forbidden by the packaging rules. The RPM packaging guidelines forbid distributing or creating relocatable RPM packages. We stick to the packaging guidelines, so this cannot be done.

    • Jeremy S in chat: Thanks! (Didn’t realize packaging guidelines forbid this)
  • Let’s talk about how we build extensions. Often our devel package is enough: many of our packages just rely on Postgres itself. But of course packages like PostGIS may depend on some other packages, like GDAL, which requires lots of extra dependencies as well. The most problematic one is the GIS stack: it needs EPEL on RHEL and its derivatives.

  • There has been a discussion in the past about whether we should require EPEL by default. The answer is “no,” because not all of our users are installing, for example, the GIS stack or other packages. Most of our users rely on just our repo.

  • On the other hand, in order to provide completeness for our users, I added lots of Python packages in the past to support Patroni — because the distro packages were not enough: the version was missing or too old. So from time to time I add non-Postgres-related packages to the repo just to support a Postgres package. In the past it was pgAdmin, but it’s not in our repo anymore, so it’s not a problem: their upstream is building their own RPMs, which is a good thing for us. We are building extra packages mostly for Patroni.

  • However, this is a potential problem for some enterprise users, because large companies don’t even want to use the EPEL repo: they feel it’s not a community repo so much as a repo controlled by Fedora and RedHat. That’s why from time to time I try to add some of those packages to our repo.

  • If it’s a problem for enterprise users, does that mean we should maintain tons of extra packages from EPEL for the GIS stack ourselves? The answer is “no, definitely no.” Not just because of the people power that we need to maintain those packages — I mean, rebuilding them is easy: I just get the source RPM, commit the spec file into our repo, and rebuild them. But maintaining them is something else.

  • We will have a similar problem when we release our own ICU package in the next few years. Now that we have the in-core collation provider — but just for the C locale — and people are trying to get rid of glibc, maybe we should have an ICU package. But maintaining an ICU library across the lifetime of a single Postgres major version is a really big challenge that I don’t know how to solve, for now at least.

  • And then SLES — my pain in the neck — the GIS stack requires lots of repos on SLES 15. They are well documented on our website.

  • Fedora is safe because Fedora is Fedora, everything is there, it’s easy to get a package there.

    • Jeremy S in chat: “Building them is easy. Maintaining them is something else.”
  • Yeah, that’s the problem, Jeremy.

  • So how do you RPMify an extension?

  • The first thing is to know that the extension exists. This is one of the big problems between developers and users. The developer creates a useful extension and then doesn’t create a tarball, doesn’t release anything. They expect people to install Git on their production databases, git pull, install make, gcc, and all the development libraries, and build a binary, blah blah blah.

  • I’m sorry, that’s not going to work. It also doesn’t work for pip — pip is not a package manager, it just destroys your server. It downloads things to random places and then everything’s gone. That’s why I added lots of Python packages to support Patroni, because most users use the package manager to install Postgres and other packages on their servers. It’s the same for Debian, Ubuntu, RPMs, Windows, and macOS.

  • So first of all we have to know that the extension exists, and we have to have a tarball. If I see an extension that seems good enough, I’ll get there. PGXN is a good place: I go to pgxn.org a few times per day to see if there is a new version of an extension or a new extension altogether. But there’s a problem: we have hundreds of extensions — maybe thousands — but not all of them are on PGXN. They should be!

    • David Wheeler (he/him) in chat: You should follow @pgxn@botsin.space for new releases :-)

    • nils in chat: pg_xz incoming

  • I don’t know how to solve this problem, but we should expect every extension author to announce their extensions on PGXN. I’m not just talking about installing everything through PGXN, but at least have an entry saying there’s a new extension, with the repo, the website, the readme, and a tarball. The tarball doesn’t have to be on PGXN, as long as we have something.

  • And then I check the version. If there is an extension that will kill your database and the version is 0.001, that’s not going to be added to the repo, because we don’t want to distribute an experimental feature.

    • David Wheeler (he/him) in chat: LOL, all my extensions start at 0.0.1

    • David G. Johnston in chat: If it isn’t on PGXN it doesn’t exist.

  • Another problem is that lots of people write extensions but some of them are just garbage. I’m sorry but that’s the truth. I mean they just release a version and then do nothing.

    • David Wheeler (he/him) in chat: LOLOLOL
  • From the chat, “pg_xz incoming”: that’s right! We have blackhole_fdw, which was written by Andrew Dunstan. When you create blackhole_fdw, it throws all of your data into a black hole, so everything is gone.

  • Yeah, “if it’s not on PGXN it doesn’t exist,” that’s what I hope we achieve in the next year or so.

    • Jimmy Angelakos in chat, replying to “If it isn’t on PGXN …”: I would correct that to “If it isn’t on PGXN it isn’t maintained.” Sometimes even ON PGXN…
  • Yeah Jimmy, that’s one of the big problems that we have: maintenance.

  • We create the spec file: just copy one of the existing ones and start editing. It’s easy, but sometimes we have to add patches. We used to carry patches for each Postgres major version to change the Makefiles for that specific Postgres major version, but I realized that was [not a great pattern]. Now we just export the PATH, which fixes the problem.
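The PATH trick amounts to putting the target major version’s pg_config first, so PGXS resolves everything from there instead of needing per-version Makefile patches. A hypothetical spec-file excerpt (the literal path would normally be a macro in the real spec files):

```
# Hypothetical %build section of an extension spec file
%build
export PATH=/usr/pgsql-16/bin:$PATH    # the target major version’s pg_config comes first
%{__make} USE_PGXS=1 %{?_smp_mflags}
```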

    • David G. Johnston in chat: As a policy though, someone who cares and wants responsibility needs to apply to be part of the RPM community.
  • Then I initiate a scratch build for any missing requirements. If there are any missing build requirements, it fails to build. I only do it on the latest Fedora, not on every distro, because it doesn’t always help: some distros may not have the missing dependency.
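A generic way to surface missing build requirements locally looks something like this; the PGDG build farm has its own tooling, so this is only the idea, and the spec file name is hypothetical:

```
# Pull in whatever BuildRequires the spec declares; anything missing shows up immediately
sudo dnf builddep plpgsql_check_16.spec

# Then attempt a local build; it fails fast if a dependency or header is still missing
rpmbuild -bb plpgsql_check_16.spec
```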

    • Alvaro Hernandez in chat: Hi everybody!

    • David G. Johnston in chat: Delegated to PGXN for that directory.

  • Let’s say we rely on some really good feature that comes with the latest version of something, but that latest version may not appear in RedHat 7 or 8. So some distros may have it, but the version may be lower than required. Or some distros may have the dependency under a different name. Now in the spec file we have “if SUSE then this,” “if RedHat then this,” “if Fedora then this,” “if RedHat 9 then this,” etc. That’s okay, it’s expected. As long as we have the package, I don’t care.
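Those per-distro branches use the standard RPM distribution macros. A hypothetical excerpt, with made-up dependency names, just to show the shape:

```
# Hypothetical spec-file excerpt; the package names are invented for illustration
%if 0%{?suse_version}
BuildRequires:  libfoo-devel     # SUSE’s name for the dependency
%else
%if 0%{?fedora} || 0%{?rhel} >= 9
BuildRequires:  foo-devel        # recent Fedora / RHEL 9 name
%else
BuildRequires:  foo12-devel      # older RHEL ships an older, differently named build
%endif
%endif
```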

  • Then I push it to the Git repo, which I use not just for the spec files and patches, but also for carrying the spec files and patches to the build instances.

    • Jorge in chat: How to handle extension versioning properly? I mean, in the control file the version could be anything, like for ex. citus default_version = ‘12.2-1’ where the “published version” is v12.1.2, then the “default version” could remain there forever.

      Also have seen in the wild extensions that the control file have a version 0.1 (forever) and the “released” version is 1.5

  • If something fails, I go back to the drawing board. GCC may fail (GCC 14 has been released on Fedora 40 and is causing lots of issues for packaging nowadays); it could be cmake — too recent or too old; it could be LLVM — LLVM 18 is a problem for Postgres nowadays. I either try to fix it or ping upstream. I often ping upstream, because the issue must be fixed upstream anyway.

  • If everything is okay, just push the packages to the repo.

    • Ruohang Feng (Vonng) in chat: question: how about adding some good extensions written in Rust/pgrx to the repo? like pgml, pg_bm25, pg_analytics, pg_graphql….
  • One issue is that there is no proper announcement. Maybe I have an awesome extension available in the Postgres repo that people crave; we built the extension, it took a lot of time (thank you Jimmy, he helped me a lot), and then I didn’t actually announce it that much. On the other hand, people can just use PG stat base [?] to install and start using it in a few seconds. This is something that we should improve.

    • Steven Miller in chat: How to handle system dependencies like libc versions being updated on the target system? Do extensions need to be complied against exactly a specific libc version?

    • From Nevzat in chat: how can we make sure bugfix or extension is safe before installing it

    • vrmiguel in chat: Interesting approach to identify build/runtime requirements

      Over at Tembo we have a project called trunk-packager which attempts to automatically create .deb packages given the extension’s shared object.

      We try to identify the dynamic libs the extension requires by parsing its ELF and then trying to map the required .so to the Debian package that provides it, saving this info in the .deb’s control file

  • From the chat: How to handle extension versions properly? That’s a good question, but the extension version and the release version don’t have to match. The extension version isn’t the same thing as the release version: it’s the version of the SQL file, that is, the functions, the tables, the views, the stored procedures, or whatever. If it’s 0.1, it means nothing has changed with regard to the control file. They may bump up the package version because they add new features, but if they don’t add new features to the SQL file, then they don’t update the extension version. I hope that answers your question, Jorge.
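In other words, the version in the extension’s control file tracks only the SQL-level objects, while the package version tracks upstream releases. A hypothetical example of the two moving independently:

```
# myext.control (hypothetical); this version only changes when the SQL objects change
default_version = '0.1'

# …while the RPM built from upstream release 1.5 might be named something like:
#   myext_16-1.5-1PGDG.rhel9.x86_64.rpm
```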

  • I have another question from Ruohang. Yaaaaay! I was afraid someone would ask that one. We have no extensions written in Rust in the repo so far. It’s not like Go; there is a ban against Go, because we don’t want to download the world, the whole internet, just to build an extension. If I recall correctly, they’re rewriting pg_anonymizer in Rust. They will let me know when they release it or are ready to release it, and then I’ll build it. It’s not something I don’t like; it just hasn’t happened.

    • Keith Fiske in chat: I still update the control file with my extensions even if it’s only a library change. Makes it easier to know what version is installed from within the database, not just looking at the package version (which may not be accessible)

    • Ruohang Feng (Vonng) in chat: question: How to handle RPM and extension name conflicts, e.g., Hydra’s columnar and Citus’s columnar.

    • David Wheeler (he/him) in chat, replying to “I still update the c…”: Yeah I’ve been shifting to this pattern, it’s too confusing otherwise

  • If you think there are good extensions like these, just create a ticket on redmine.postgresql.org. I’m happy to take a look as long as I know them. That’s one of the problems: I have never heard about pg_analytics or pgml, because they’re not on PGXN. Or maybe they are. This is something that we should improve in the next few months.

    • Jimmy Angelakos in chat: Go is a pretty terrible ecosystem. It has its own package manager in the language, so it’s kinda incompatible with distro packages

    • Jason Petersen in chat: (but that means a build is safe within a single release, it doesn’t mean you can move a built extension from one Fedora version to another, right?)

    • David Wheeler (he/him) in chat, replying to “How to handle system…”: Libc is stable in each major version of the OS, and there are separate RPMs for each.

  • Another question from Steven: how to handle system dependencies like libc version updates? The answer is no, extensions don’t need to be compiled against a specific libc version, mostly because distros don’t update the libc major version across the lifetime of a release. So we don’t need to rebuild the extensions against libc.

    • Steven Miller in chat, replying to “How to handle system…”: Ok I see, TY

    • Jason Petersen in chat, replying to “How to handle system…”: Is that how we deploy in trunk, though?

    • David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Huh? You can build binaries in a sandbox and then you just need the binary in your package.

  • [Addressing Nevzat’s question]: That’s a great question. It’s up to you! It’s no different from installing Postgres or anything else. I just build RPMs. If you’re reading the hackers mailing list nowadays, people rely on me and Christoph and others not to inject any code into the RPMs or Debian packages. You just need to trust us not to add extra code to the packages. But if there’s a feature problem or any bug, then you should complain upstream, not to us. So you should just test.

    • Jimmy Angelakos in chat, replying to “Go is a pretty terri…”: Find me one person that does this.

    • Steven Miller in chat, replying to “How to handle system…”: We don’t have OS version as one of the dimensions of version packaging but should

  • [Addressing vrmiguel’s comment]: Yeah, that could be done, but I don’t like complex things; that’s why I’m an RPM packager.

    • Jason Petersen in chat, replying to “Go is a pretty terri…”: (doesn’t go statically link binaries, or did they drop that philosophy?)

    • vrmiguel in chat: I think citus has changed it to citus_columnar

    • David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Hugo: https://github.com/hugomods/docker/tree/main/docker/hugo

    • David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Jason: Static except for libc, yes

  • Another question from Ruohang: how to handle RPM and extension name conflicts? I think Citus came first, so you should complain to Hydra and ask them to change the name. They shouldn’t be identical. We have something similar with Pgpool: they are conflicting with the PCP library, which has been in Linux for the last 25 years. I think Pgpool has to change their name.

    • Jeremy S in chat, replying to “I still update the c…”: So you think people will run the “alter extension upgrade” eh?
  • [Addressing Keith Fiske’s comment]: I’m not saying I don’t agree with you, but it means that every time, I have to update the extension version in my running database — it’s some extra work, but that’s okay. It’s the user’s problem, not my problem.

  • Question from Jason [on moving an extension from one Fedora to another]: Right, it may not be safe because the GCC version may be different and other stuff may be different. One distro to another is not safe, Jason; sorry about that.

  • [Back to Steven’s question]: Yes, David’s answer is right.

  • [Addressing vrmiguel’s comment about citus_columnar]: You are right.

  • Jimmy, I’m not going to read your comment about Go because I don’t think you can swear enough here.

    • vrmiguel in chat, replying to “Go is a pretty terri…”: Are there known Postgres extensions written in Go? Not sure how Go is relevant here

    • Jason Petersen in chat: you said “gcc” and not “libc” there, are you implying that things like compiler versions and flags need to be identical between postgres and extensions

    • Keith Fiske in chat, replying to “I still update the c…”: I think they should …

    • David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Were some experiments a few years ago. https://pkg.go.dev/github.com/microo8/plgo

  • Let me continue now. First you have to know the extension exists, and then you also need to know that the extension has an update. Unfortunately it’s the same problem: the extension exists, or has an update, and they just don’t let us know.

    • Jimmy Angelakos in chat, replying to “Go is a pretty terri…”: @vrmiguel now you know why :D
  • This is a big challenge. Fedora has an in-house solution: when you add a new package to Fedora, I think they crawl the upstream repo once a day, and if there’s a new release they create a ticket in their bug tracker automatically, so that the maintainer knows there’s a new version. This could be done for us too, but it would need a volunteer to do it. Or maybe the easiest thing is just to add everything to PGXN.

  • When we update an extension, we have to make sure it doesn’t break anything. It requires some testing. As I said earlier, building is one thing; maintaining the extension is a bigger thing. If you want to raise a baby, you are responsible until the end of your life. Consider this like your baby: either let us know if you can’t maintain an extension anymore, or please respond to the tickets that I open.

    • Steven Miller in chat: One other detail about compatibility dimensions. We have noticed some extensions can be complied with chip-specific instructions like AVX512, for example vector does this which optimizes performance in some cases

    • Alvaro Hernandez in chat, replying to “you said “gcc” and n…”: I’d reverse the question: do we have strong guarantees that there are no risks if versions and/or flags may differ?

      I believe extensions are already risky in several ways, and we should diminish any other risks, like packaging ones.

      So I’d say absolutely yes, compile extensions and Postgres in exactly the same way, versions and environments.

  • Sometimes a new minor version of an extension breaks a previous Postgres release. For example, an extension drops support for Postgres 12 even though Postgres 12 is still supported. Or they didn’t do the upgrade path work. I have to make sure everything is safe.

    • nils in chat, replying to “I think citus has ch…”: It was never changed, the extension has always either been embedded in Citus or later moved to a separate extension called citus_columner.

      I think the name conflict comes from the access method being called columnar, which Citus claimed first. (Hydra’s started actually as a fork from Citus’ codebase).

      (disclaimer; I work on Citus and its ecosystem)

    • Jason Petersen in chat, replying to “I think citus has ch…”: hi nils

  • Next month a new beta comes out. Everyone is happy, let’s start testing new features. For packagers, that means it’s time to start building extensions against beta-1. A build might fail, we fix it, and then it may fail against beta-2. I understand that extension authors may want to wait until rc-1; that’s acceptable as long as they let us know. Many of them fail, and then Christoph and I create tickets against them and display them on wiki.postgresql.org. It’s a Hall of Shame!

    • Eric in chat: When you decide to package a new extension do you coordinate with upstream to make that decision?

    • David Wheeler (he/him) in chat, replying to “When you decide to p…”: I learned I had extensions in the yum repo only after the fact

    • Eric in chat, replying to “When you decide to p…”: I see

    • vrmiguel in chat: @Devrim Gündüz I’m curious about how RPM deals with extensions that depend on other Pg extensions

    • David Wheeler (he/him) in chat: You can test Postgres 17 in the pgxn-tools docker image today. Example: https://github.com/theory/pgtap/actions/runs/8502825052

  • This list pisses off the extension authors because they don’t respond to tickets. So what do we do next? It happens again and again and again, because they just don’t respond to us. On Monday I got a response from an extension maintainer. He said, “You are talking like you are my boss!” I said, “I’m talking like I’m your user, I’m sorry. I just asked for a very specific thing.”

    • nils in chat: I love Devrim’s issues against our repo’s! They are like clockwork, every year 😄

    • David Wheeler (he/him) in chat, replying to “You can test Postgre…”: It relies on the community apt repo

    • Eric in chat, replying to “When you decide to p…”: Related: ever had upstream request you stop packaging an extension?

    • Steven Miller in chat, replying to “One other detail abo…”: Even if compiled inside a container, on a specific chip it can get chip-specific instructions inside the binary. For example building vector on linux/amd64 inside docker on a chip with AVX512, the container image will not work on another linux/amd64 system that does not have AVX512

    • David Wheeler (he/him) in chat: :boss:

  • Unresponsive maintainers are a challenge: they don’t respond to tickets, or emails, or they don’t update the extensions for recent Postgres versions.

  • Don’t get me wrong, even the big companies do this, or they don’t update their extensions for new GCC versions. I don’t expect them to test everything against all the GCC versions; that’s my problem. But just respond, please.

  • What’s the responsibility of the packager in this case? Should we fork if they don’t respond at all? No, we are not forking it! But going to conferences helps, because if the extension author is there I can talk to them in person, in a quiet place, in a good way: just “please update the package tomorrow or you’re going to die.” Of course not that, but you see what I mean.

  • [Looking at chat]: I’m going to skip any word about containers; sorry about that.

  • [Addressing Eric’s question]: That’s a good question! No, actually they support us a lot, because that’s the way people use their extensions. And do we coordinate with upstream? No, I coordinate with myself and try to build it. Of course upstream can just create a ticket, send me an email, or find me at a conference. They can say, “Hey, we have an extension, could you package an RPM?” Sure, why not? I don’t coordinate with upstream as long as there is no problem with the builds.

    • Eric in chat, replying to “When you decide to p…”: So you haven’t run into a situation where upstream did not want/appreciate you packaging for them?
  • [Responding to nils’s comment]: Thank you, thanks for responding!

  • [Responding to vrmiguel’s question about depending on other extensions]: We actually add a dependency for that. It’s a bit of work. For example, pgRouting depends on PostGIS. In order to provide a seamless installation of the PostGIS package, in the PostGIS spec file I add an extra line that says it provides PostGIS without the version as part of the name. Then when we install pgRouting, it looks for any PostGIS package — which is fine because it can run against any PostGIS version. So I add dependencies on other extensions if we need them.
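A minimal sketch of that spec-file arrangement, with assumed package names and an assumed virtual-provide name (the real spec files may spell this differently):

```
# In the PostGIS spec file (e.g. postgis34_16), hypothetical excerpt:
Provides:   postgis_16    # a provide without the PostGIS minor version baked into the name

# In the dependent extension’s spec file (e.g. pgrouting_16), hypothetical excerpt:
Requires:   postgis_16    # satisfied by whichever PostGIS build for Postgres 16 is installed
```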

    • David G. Johnston in chat: The tooling ideally would report, say to PGXN or whatever the directory location for the initial application is, the issues and remind them that if the build system cannot build their extension it will not be included in the final RPM. You are an unpaid service provider for them and if they don’t meet their obligations their don’t get the benefit of the service.
  • [Responding to Eric’s upstream follow-up question]: I haven’t seen anything in any upstream where a person didn’t want me to package. But I haven’t seen many appreciations, either; I mean they don’t appreciate you. I’m being appreciated by EDB — money, money, money, must be funny — thanks EDB! But I haven’t had any rejections so far. Good question!

    • Eric in chat, replying to “When you decide to p…”: Fair. Cool. Thanks
  • Relying on external repos is a big problem for SUSE. Some of the maintainers just discontinue their repos. One problem with SUSE is that they don’t have an EPEL-like repo. EPEL is a great thing: the barrier to adding a package to EPEL is not low, but not high either. If you’re an advanced packager you can add a package quickly enough; of course it requires review from others. But this is a big problem for SUSE.

  • Lack of maintenance is a problem. We have a repo, but they don’t update it, so I have to go find another repo on build.opensuse.org, change it, update the website, change our build instance, etc. That’s a big problem.

    • David Wheeler (he/him) in chat, replying to “The tooling ideally …”: I want to add build and test success/fail matrices to extension pages on PGXN

    • Florents Tselai in chat: How do you handle Pl/Python-based extensions + pip dependencies? Especially with virtualenv-based installations. i.e. Pl/Python usually relies on a /usr/bin/python3, but people shouldn’t install dependencies there.

  • And then there are costs! What’s the cost of RPMifying an extension? Hosting a build server? We have a very beefy bare-metal build server hosted by EnterpriseDB, just because I’m working for them and they have a spare machine. Hosting a build server is a cost.

  • I have to use some external resources for architecture reasons: some of our build instances, like the ppc64le one, are hosted somewhere else. There are some admin tasks to keep everything up and running; for example, EDB’s IT team helped me fix an issue today in both of our PPC instances.

    • Jason Petersen in chat, replying to “How do you handle Pl…”: I think early on he said he makes the extensions rely on RPMs that provide those Python dependencies

    • David Wheeler (he/him) in chat, replying to “How do you handle Pl…”: I have used this pattern for RPMifying Perl tools

  • Then, maintaining build instances requires keeping them up to date and making sure that each update doesn’t break anything. It’s not just “dnf update and build a package.” It can be a problem with Fedora, because Fedora can update anything any time they want. It’s less of a problem for SUSE and RedHat, but we still have to take care that updates don’t break anything.

  • RedHat, the company, actually follows our release schedule. We release every three months; unless something bad happens, we know the next release is in May, on a Thursday. And on the Wednesday one day before our minor release, RedHat releases their new maintenance release: RedHat is going to release 9.4 on the Wednesday before our minor release. What does that mean for us as RPM packagers for RedHat?

  • RedHat releases a new version with a new LLVM, for example, and that means we have to rebuild the packages against the new LLVM so that people can use it. That means I have to work until Thursday morning to build the packages. That’s fine, but another problem is for Rocky Linux and Alma Linux users, because they are not going to have the updated LLVM package, or any other updated package like GCC, yet. It’s not like the old RedHat days; they change everything in minor versions now.

  • So I have to rebuild GCC and LLVM on our instances, add them to our special repo “sysupdates”, which is in the config file, and this takes many hours because building GCC and LLVM is a big thing.

  • In the last two years I have not been able to build GCC from the source RPM as shipped; I had to edit the spec files and more to be able to build it. I have no idea how they can break a source RPM.

  • So that’s another cost: in May I’m going to spend lots of cycles to keep up with the latest RedHat release, and also to make the Rocky Linux and Alma Linux users happier. Maintaining build systems is not as easy as running yum or zypper update. It requires employing the packager — because I have bills to pay and beers to drink.

  • [Addressing Florents’s PL/Python question]: I don’t know what the PL/Python-based extensions are, but I try to get rid of everything related to pip. I’m not a developer, a DBA isn’t a developer, a sysadmin isn’t a developer. They’re not supposed to use pip; they are supposed to use the package manager to keep up with everything. My point is, if someone needs pip, then I should fix that. That’s what I did for Patroni: I added lots of packages to our Git repo just to be able to support Patroni.

    • Ian Stanton in chat: Need to drop, thank you Devrim!

    • Jeremy S in chat, replying to “How do you handle Pl…”: A lot of larger companies have inventory management and risk control processes that heavily leverage package management

    • Alvaro Hernandez in chat: Need to go, ttyl!

    • vrmiguel in chat, replying to “you said “gcc” and n…”: Do you think there are no guarantees at all? For instance, Postgres loads up the extension with dlopen, which could fail with version mismatch. If that doesn’t occur and the extension loads ‘fine’, how likely do you think an issue could be?

      Also I’m curious how often you’ve seen problems arise from libc itself (rather than any of the many things that could cause UB in a C program) and how these problems have manifested

    • Ahmet Melih Başbuğ in chat: Thank you

Conclusion

I thanked Devrim and everyone for the discussion, and pitched the next Mini-Summit, where I think Jonathan Katz will talk about the TLE vision and specifics.

Thank you all for coming!