Add file_fdw support for external decompressors #4

jasonmp85 · 2013-12-31T20:40:52Z

This change augments the file foreign data wrapper by adding a new
decompressor option, whose value must be the path to an executable
which will be used to read filename. The contract is that the binary
will receive the filename as its sole argument and must decode it to
standard out.

If no decompressor is specified, file_fdw behaves as before.
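
A minimal sketch of a program that satisfies this contract, assuming the input is gzip-compressed. The PR's own test decompressor is Perl-based; the Python below and the names `decompress_to` and `main` are hypothetical illustrations, not code from the patch.

```python
import gzip
import shutil
import sys


def decompress_to(path, out):
    """Decode the gzip file at `path` and write the raw bytes to `out`."""
    with gzip.open(path, "rb") as src:
        shutil.copyfileobj(src, out)


def main(argv):
    # The contract: the filename is the sole argument, and the decoded
    # contents go to standard output.
    if len(argv) != 2:
        sys.stderr.write("usage: decompressor FILENAME\n")
        return 2
    decompress_to(argv[1], sys.stdout.buffer)
    return 0
```

To use this as an executable, it would need a shebang line and a final call such as `sys.exit(main(sys.argv))`.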

Parse the new option and validate that the file it references actually exists and is executable.

Certain callers of `fileGetOptions` will need the decompressor, so provide it if one was found.

I was planning to concatenate the program name and file name before each `BeginCopyFrom` invocation, but it seems better to do it in the function that parses options. It's not being done yet but this sets up all the callers to expect it.

This requires escaping the filename. I went with wrapping it in single quotes and replacing single quotes with "'\''" whenever they occur. This may not be entirely appropriate for Windows installs, but this is a good-enough solution for now. See: http://stackoverflow.com/a/3669819
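
The quoting rule described above can be sketched as follows. This is an illustration in Python rather than the patch's C code, and the function names `quote_for_shell` and `build_command` are hypothetical.

```python
def quote_for_shell(filename):
    """Wrap `filename` in single quotes for a POSIX shell.

    Each embedded single quote becomes '\\'' : close the quoted string,
    emit an escaped quote, then reopen the quoted string.
    """
    return "'" + filename.replace("'", "'\\''") + "'"


def build_command(program, filename):
    """Concatenate the program path and the quoted filename."""
    # Assumption: the program path itself is passed through unquoted.
    return program + " " + quote_for_shell(filename)
```

For example, `quote_for_shell("it's.csv")` yields `'it'\''s.csv'`, which a POSIX shell reads back as the single word `it's.csv`.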

Found some issues here and there.

Turns out it's unsafe to modify a list while iterating over it, since the delete method actually frees the node (and possibly the list, too!) rather than just updating the next/prev pointers.
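
One common way to avoid this hazard is to build a new list instead of deleting from the one being traversed. The sketch below shows that pattern in Python; it is an analogy only, since Python lists misbehave differently (elements get skipped rather than freed), and `split_options` with its `(name, value)` pairs is a hypothetical stand-in for the C option list.

```python
def split_options(options):
    """Separate the wrapper's own options from the ones passed on unchanged.

    `options` is a list of (name, value) pairs. Nothing is deleted from it
    during the loop; the entries to keep are copied into a new list.
    """
    filename = None
    decompressor = None
    remaining = []
    for name, value in options:
        if name == "filename":
            filename = value
        elif name == "decompressor":
            decompressor = value
        else:
            remaining.append((name, value))
    return filename, decompressor, remaining
```

The input list is left untouched, so the iteration never sees a node that has already been released.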

The compression guess is really only used for finding out the foreign relation size if no `ANALYZE` has yet been performed.

Duplicates the agg.csv-based tests, but using a decompressor. Includes a Perl-based decompressor since the codebase already depends on Perl and I didn't want to hardcode a path to the gunzip executable.

It's OK to have single quotes in filenames.

jasonmp85 · 2013-12-31T20:41:12Z

Oops. Heh. Meant to put this in a personal fork.

guedes · 2014-01-02T22:32:14Z

@jasonmp85 BTW, if you want to submit this, you should read http://wiki.postgresql.org/wiki/Submitting_a_Patch first, since this repo is just a mirror - we don't work with pull requests on GitHub. :)

jasonmp85 · 2014-01-02T22:35:11Z

Nah, it was for an internal exercise. Not intended for a patch.

Kind of annoying that you can disable issues and wikis for an org but not pull requests. This was just a misfired hub command on my part, and there's no way to delete a PR (only a way to close it).

guedes · 2014-01-02T22:58:47Z

Ah, ok. :)

When a subtransaction is aborted in plpython because of an SPI exception, it tries to find a matching Python exception in the hash `PLy_spi_exceptions` and to make the Python VM raise it. That hash is generated during module initialization, but the exception objects are not marked to prevent the garbage collector from collecting them, which can lead to a segmentation fault when processing any SPI exception.

PoC to reproduce the issue:

```sql
CREATE OR REPLACE FUNCTION server_crashes()
RETURNS VOID
AS $$
import gc
gc.collect()
plan = plpy.prepare('SELECT raises_an_spi_exception();', [])
plpy.execute(plan)
$$ LANGUAGE plpythonu;

CREATE OR REPLACE FUNCTION raises_an_spi_exception()
RETURNS VOID
AS $$
DECLARE
  sql TEXT;
BEGIN
  sql = format('%I', NULL); -- NullValueNotAllowed
END
$$ LANGUAGE plpgsql;

SELECT server_crashes(); -- segfault here
```

Stacktrace of the problem (using PostgreSQL `REL9_5_STABLE` and python `2.7.3-0ubuntu3.8` on Ubuntu 12.04):

```
Program received signal SIGSEGV, Segmentation fault.
0x00007f3155c7670b in PyObject_Call (func=0x7f31b7db2a30, arg=0x7f31b87d17d0, kw=0x0) at ../Objects/abstract.c:2525
2525 ../Objects/abstract.c: No such file or directory.
(gdb) bt
#0 0x00007f3155c7670b in PyObject_Call (func=0x7f31b7db2a30, arg=0x7f31b87d17d0, kw=0x0) at ../Objects/abstract.c:2525
#1 0x00007f3155d81ab1 in PyEval_CallObjectWithKeywords (func=0x7f31b7db2a30, arg=0x7f31b87d17d0, kw=0x0) at ../Python/ceval.c:3890
#2 0x00007f3155c766ed in PyObject_CallObject (o=0x7f31b7db2a30, a=0x7f31b87d17d0) at ../Objects/abstract.c:2517
#3 0x00007f31561e112b in PLy_spi_exception_set (edata=0x7f31b8743d78, excclass=0x7f31b7db2a30) at plpy_spi.c:547
#4 PLy_spi_subtransaction_abort (oldcontext=<optimized out>, oldowner=<optimized out>) at plpy_spi.c:527
#5 0x00007f31561e2185 in PLy_spi_execute_plan (ob=0x7f31b87d0cd8, list=0x7f31b7c530d8, limit=0) at plpy_spi.c:307
#6 0x00007f31561e22d4 in PLy_spi_execute (self=<optimized out>, args=0x7f31b87a6d08) at plpy_spi.c:180
#7 0x00007f3155cda4d6 in PyCFunction_Call (func=0x7f31b7d29600, arg=0x7f31b87a6d08, kw=0x0) at ../Objects/methodobject.c:81
#8 0x00007f3155d82383 in call_function (pp_stack=0x7fff9207e710, oparg=2) at ../Python/ceval.c:4021
#9 0x00007f3155d7cda4 in PyEval_EvalFrameEx (f=0x7f31b8805be0, throwflag=0) at ../Python/ceval.c:2666
#10 0x00007f3155d82898 in fast_function (func=0x7f31b88b5ed0, pp_stack=0x7fff9207ea70, n=0, na=0, nk=0) at ../Python/ceval.c:4107
#11 0x00007f3155d82584 in call_function (pp_stack=0x7fff9207ea70, oparg=0) at ../Python/ceval.c:4042
#12 0x00007f3155d7cda4 in PyEval_EvalFrameEx (f=0x7f31b8805a00, throwflag=0) at ../Python/ceval.c:2666
#13 0x00007f3155d7f8a9 in PyEval_EvalCodeEx (co=0x7f31b88aa460, globals=0x7f31b8727ea0, locals=0x7f31b8727ea0, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:3253
#14 0x00007f3155d74ff4 in PyEval_EvalCode (co=0x7f31b88aa460, globals=0x7f31b8727ea0, locals=0x7f31b8727ea0) at ../Python/ceval.c:667
#15 0x00007f31561dc476 in PLy_procedure_call (kargs=kargs@entry=0x7f31561e5690 "args", vargs=<optimized out>, proc=0x7f31b873b2d0, proc=0x7f31b873b2d0) at plpy_exec.c:801
#16 0x00007f31561dd9c6 in PLy_exec_function (fcinfo=fcinfo@entry=0x7f31b7c1f870, proc=0x7f31b873b2d0) at plpy_exec.c:61
#17 0x00007f31561de9f9 in plpython_call_handler (fcinfo=0x7f31b7c1f870) at plpy_main.c:291
```

refresh_by_match_merge() has some issues in the way it builds a SQL query to construct the "diff" table:

1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced by a given index, but just assumes that they must be those of the column datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the intended operator when parsing the query. (This would have been a security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.

The way to fix #4 is to make use of the existing code in ri_triggers.c for generating an arbitrary binary operator clause. I chose to move that to ruleutils.c, since that seems a more reasonable place to be exporting such functionality from than ri_triggers.c.

While #1, #3, and #5 are just latent given existing feature restrictions, and #2 doesn't arise in the core system for lack of alternate opclasses with different equality behaviors, #4 seems like an issue worth back-patching. That's the bulk of the change anyway, so just back-patch the whole thing to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us

repo-lockdown · 2019-06-17T13:24:29Z

Thanks for your Pull Request! 😄 This repo on GitHub is just a mirror of our real git repositories though, and can't really handle PRs. 😦 Hopefully you can redo the PR, and direct it to the git.postgresql.org repos? We have a developer guide, if that helps: https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F. If this was a PR for pgAdmin, please visit https://www.pgadmin.org/docs/pgadmin4/dev/submitting_patches.html.

jasonmp85 added 9 commits December 30, 2013 14:06

Add "decompressor" option to file_fdw

541e463

Parse the new option and validate that the file it references actually exists and is executable.

Populate decompressor in fileGetOptions

5a06bef

Certain callers will need this, so provide it if found.

Bugfixes for decompressor file_fdw option

a5102c9

Found some issues here and there.

Fix population of program, null list issue

51fa91a

Turns out it's unsafe to modify a list while iterating over it, since the delete method actually frees the node (and possibly the list, too!) rather than just updating the next/prev pointers.

Add rudimentary EXPLAIN support for programs

0403974

The compression guess is really only used for finding out the foreign relation size if no `ANALYZE` has yet been performed.

Add decompressor tests to file_fdw suite

32ab3e2

Duplicates the agg.csv-based tests, but using a decompressor. Includes a Perl-based decompressor since the codebase already depends on Perl and I didn't want to hardcode a path to the gunzip executable.

Add filename escaping test for file_fdw

b4f5cee

It's OK to have single quotes in filenames.

jasonmp85 closed this Dec 31, 2013

roman0yurin pushed a commit to roman0yurin/postgres that referenced this pull request Mar 27, 2018

Many many define fixes. issues postgres#4 and postgres#5 fixes too

9033e24

roman0yurin pushed a commit to roman0yurin/postgres that referenced this pull request Mar 27, 2018

Many many define fixes. issues postgres#4 and postgres#5 fixes too

492247e

roman0yurin pushed a commit to roman0yurin/postgres that referenced this pull request Mar 27, 2018

Many many define fixes. issues postgres#4 and postgres#5 fixes too

78a5136

repo-lockdown bot locked and limited conversation to collaborators Jun 17, 2019
