Add file_fdw support for external decompressors #4

Closed
wants to merge 9 commits into from
jasonmp85

This change augments the file foreign data wrapper by adding a new
decompressor option, whose value must be the path to an executable
which will be used to read filename. The contract is that the binary
will receive the filename as its sole argument and must decode it to
standard out.

If no decompressor is specified, file_fdw behaves as before.
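
To make the contract concrete, here is a minimal sketch of a conforming decompressor, assuming gzip-compressed input. The function names `decompress` and `main` are illustrative and not part of the patch, whose own test decompressor is written in Perl.

```python
import gzip
import shutil
import sys


def decompress(path, out):
    # Stream the gzip-compressed file to the given binary output stream.
    with gzip.open(path, "rb") as src:
        shutil.copyfileobj(src, out)


def main(argv):
    # The contract: the filename arrives as the sole argument and the
    # decoded bytes go to standard output.
    if len(argv) != 2:
        sys.stderr.write("usage: %s FILE\n" % argv[0])
        return 2
    decompress(argv[1], sys.stdout.buffer)
    return 0
```

A real executable would finish with `sys.exit(main(sys.argv))` and carry the executable bit, since the option is validated as a path to an executable.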

Parse the new option and validate that the file it references actually
exists and is executable.
Certain callers will need this, so provide it if found.
I was planning to concatenate the program name and file name before
each `BeginCopyFrom` invocation, but it seems better to do it in the
function that parses options. It's not being done yet but this sets up
all the callers to expect it.
This requires escaping the filename. I went with wrapping it in single
quotes and replacing single quotes with "'\''" whenever they occur.
This may not be entirely appropriate for Windows installs, but this is
a good-enough solution for now.

See: http://stackoverflow.com/a/3669819
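
The quoting rule described above can be sketched in a few lines. This is a Python illustration of the same idea, not the C code from the patch. `shell_quote` and `build_command` are hypothetical names, and leaving the program path unquoted is an assumption.

```python
import shlex


def shell_quote(filename):
    # Close the quoted string, emit an escaped quote, reopen: ' becomes '\''
    return "'" + filename.replace("'", "'\\''") + "'"


def build_command(program, filename):
    # Hypothetical helper: program path followed by the quoted filename.
    return program + " " + shell_quote(filename)


command = build_command("/usr/local/bin/decode", "it's data.csv.gz")
print(command)
# A POSIX-style tokenizer should recover the filename as a single argument.
print(shlex.split(command))
```

Wrapping in single quotes keeps the shell from interpreting spaces, `$`, or backticks in the filename. Only the single quote itself needs special handling.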
Found some issues here and there.
Turns out it's unsafe to modify a list while iterating over it, since
the delete method actually frees the node (and possibly the list, too!)
rather than just updating the next/prev pointers.
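
A minimal sketch of the safe pattern, using a toy singly linked list in Python rather than PostgreSQL's `List` API: the successor is captured before the current node is removed, so the loop never reads through a node that has been released. All names here are illustrative.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


def build(values):
    # Build a linked list from a Python sequence and return its head.
    head = None
    for value in reversed(values):
        node = Node(value)
        node.next = head
        head = node
    return head


def delete_matching(head, predicate):
    # Remove every node whose value satisfies predicate; return the new head.
    prev, node = None, head
    while node is not None:
        following = node.next      # capture the successor before touching node
        if predicate(node.value):
            if prev is None:
                head = following
            else:
                prev.next = following
            node.next = None       # stands in for freeing the node
        else:
            prev = node
        node = following
    return head


def to_list(head):
    # Collect the remaining values for inspection.
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

For example, `to_list(delete_matching(build(["filename", "decompressor"]), lambda v: v == "decompressor"))` leaves only `"filename"`.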
The compression guess is really only used for finding out the foreign
relation size if no `ANALYZE` has yet been performed.
Duplicates the agg.csv-based tests, but using a decompressor. Includes
a Perl-based decompressor since the codebase already depends on Perl
and I didn't want to hardcode a path to the gunzip executable.
It's OK to have single quotes in filenames.
@jasonmp85
Author

Oops. Heh. Meant to put this in a personal fork.

@jasonmp85 jasonmp85 closed this Dec 31, 2013
@guedes

guedes commented Jan 2, 2014

@jasonmp85 BTW, if you want to submit this you should see http://wiki.postgresql.org/wiki/Submitting_a_Patch before, since this repo is just a mirror - we don't work with pull requests on github. :)

@jasonmp85
Author

Nah, it was for an internal exercise. Not intended for a patch.

Kind of annoying that you can disable issues and wikis for an org but not pull requests. This was just a misfired hub command on my part, and there's no way to delete a PR (only a way to close it).

@guedes

guedes commented Jan 2, 2014

Ah, ok. :)

rafatower referenced this pull request in CartoDB/postgres Nov 23, 2016
When a subtransaction is aborted in plpython because of an SPI
exception, it tries to find a matching Python exception in a hash,
`PLy_spi_exceptions`, and to make the Python VM raise it.

That hash is generated during module initialization, but the exception
objects are not marked to prevent the garbage collector from collecting
them, which can lead to a segmentation fault when processing any SPI
exception.
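
As a rough analogy only, the hazard can be imitated in pure Python with a registry that holds weak references, standing in for a hash table that stores pointers it does not own. This is a sketch under that assumption and does not reproduce the plpython code. `register`, `registry`, and the class names are hypothetical.

```python
import gc
import weakref

# Registry that does not own its values, like a table of unowned pointers.
registry = weakref.WeakValueDictionary()


def register(name, keep_alive=None):
    # Create an exception class dynamically and record it by name.
    exc_class = type(name, (Exception,), {})
    registry[name] = exc_class
    if keep_alive is not None:
        keep_alive.append(exc_class)   # a strong reference pins the class


pinned = []
register("UnpinnedError")
register("PinnedError", keep_alive=pinned)

# Classes sit in reference cycles, so a collector pass is what frees them.
gc.collect()

print("UnpinnedError" in registry)
print("PinnedError" in registry)
```

In Python the unpinned entry simply disappears from the registry. In C, the equivalent stale entry is a dangling pointer, which is consistent with the crash described here.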

PoC to reproduce the issue:

```sql
CREATE OR REPLACE FUNCTION server_crashes()
RETURNS VOID
AS $$
    import gc
    gc.collect()
    plan = plpy.prepare('SELECT raises_an_spi_exception();', [])
    plpy.execute(plan)
$$ LANGUAGE plpythonu;

CREATE OR REPLACE FUNCTION raises_an_spi_exception()
RETURNS VOID
AS $$
DECLARE
  sql TEXT;
BEGIN
  sql = format('%I', NULL); -- NullValueNotAllowed
END
$$ LANGUAGE plpgsql;

SELECT server_crashes(); -- segfault here
```

Stacktrace of the problem (using PostgreSQL `REL9_5_STABLE` and Python
`2.7.3-0ubuntu3.8` on Ubuntu 12.04):

```
 Program received signal SIGSEGV, Segmentation fault.
 0x00007f3155c7670b in PyObject_Call (func=0x7f31b7db2a30, arg=0x7f31b87d17d0, kw=0x0) at ../Objects/abstract.c:2525
 2525    ../Objects/abstract.c: No such file or directory.
 (gdb) bt
 #0  0x00007f3155c7670b in PyObject_Call (func=0x7f31b7db2a30, arg=0x7f31b87d17d0, kw=0x0) at ../Objects/abstract.c:2525
 #1  0x00007f3155d81ab1 in PyEval_CallObjectWithKeywords (func=0x7f31b7db2a30, arg=0x7f31b87d17d0, kw=0x0) at ../Python/ceval.c:3890
 #2  0x00007f3155c766ed in PyObject_CallObject (o=0x7f31b7db2a30, a=0x7f31b87d17d0) at ../Objects/abstract.c:2517
 #3  0x00007f31561e112b in PLy_spi_exception_set (edata=0x7f31b8743d78, excclass=0x7f31b7db2a30) at plpy_spi.c:547
 #4  PLy_spi_subtransaction_abort (oldcontext=<optimized out>, oldowner=<optimized out>) at plpy_spi.c:527
 #5  0x00007f31561e2185 in PLy_spi_execute_plan (ob=0x7f31b87d0cd8, list=0x7f31b7c530d8, limit=0) at plpy_spi.c:307
 #6  0x00007f31561e22d4 in PLy_spi_execute (self=<optimized out>, args=0x7f31b87a6d08) at plpy_spi.c:180
 #7  0x00007f3155cda4d6 in PyCFunction_Call (func=0x7f31b7d29600, arg=0x7f31b87a6d08, kw=0x0) at ../Objects/methodobject.c:81
 #8  0x00007f3155d82383 in call_function (pp_stack=0x7fff9207e710, oparg=2) at ../Python/ceval.c:4021
 #9  0x00007f3155d7cda4 in PyEval_EvalFrameEx (f=0x7f31b8805be0, throwflag=0) at ../Python/ceval.c:2666
 #10 0x00007f3155d82898 in fast_function (func=0x7f31b88b5ed0, pp_stack=0x7fff9207ea70, n=0, na=0, nk=0) at ../Python/ceval.c:4107
 #11 0x00007f3155d82584 in call_function (pp_stack=0x7fff9207ea70, oparg=0) at ../Python/ceval.c:4042
 #12 0x00007f3155d7cda4 in PyEval_EvalFrameEx (f=0x7f31b8805a00, throwflag=0) at ../Python/ceval.c:2666
 #13 0x00007f3155d7f8a9 in PyEval_EvalCodeEx (co=0x7f31b88aa460, globals=0x7f31b8727ea0, locals=0x7f31b8727ea0, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at ../Python/ceval.c:3253
 #14 0x00007f3155d74ff4 in PyEval_EvalCode (co=0x7f31b88aa460, globals=0x7f31b8727ea0, locals=0x7f31b8727ea0) at ../Python/ceval.c:667
 #15 0x00007f31561dc476 in PLy_procedure_call (kargs=kargs@entry=0x7f31561e5690 "args", vargs=<optimized out>, proc=0x7f31b873b2d0, proc=0x7f31b873b2d0) at plpy_exec.c:801
 #16 0x00007f31561dd9c6 in PLy_exec_function (fcinfo=fcinfo@entry=0x7f31b7c1f870, proc=0x7f31b873b2d0) at plpy_exec.c:61
 #17 0x00007f31561de9f9 in plpython_call_handler (fcinfo=0x7f31b7c1f870) at plpy_main.c:291
```
postgres-mirror pushed a commit that referenced this pull request Mar 19, 2018
refresh_by_match_merge() has some issues in the way it builds a SQL
query to construct the "diff" table:

1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query.  (This would have been a
security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.

The way to fix #4 is to make use of the existing code in ri_triggers.c
for generating an arbitrary binary operator clause.  I chose to move
that to ruleutils.c, since that seems a more reasonable place to be
exporting such functionality from than ri_triggers.c.

While #1, #3, and #5 are just latent given existing feature restrictions,
and #2 doesn't arise in the core system for lack of alternate opclasses
with different equality behaviors, #4 seems like an issue worth
back-patching.  That's the bulk of the change anyway, so just back-patch
the whole thing to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us
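
To illustrate point 4, the sketch below builds a comparison clause that names the operator's schema explicitly through PostgreSQL's `OPERATOR(schema.op)` syntax, so the parser's choice does not depend on `search_path`. It is a Python illustration with hypothetical helper names, not the C routine the commit moves into ruleutils.c.

```python
def quote_ident(name):
    # Double-quote an SQL identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def operator_clause(left, op_schema, op_name, right):
    # Schema-qualify the operator so the parser cannot resolve the bare
    # operator name to a different operator found earlier in search_path.
    return "%s OPERATOR(%s.%s) %s" % (left, quote_ident(op_schema), op_name, right)


print(operator_clause("newdata.id", "pg_catalog", "=", "mv.id"))
```

A plain `newdata.id = mv.id` leaves operator lookup to the session's `search_path`. The qualified form pins it to one schema.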
roman0yurin pushed a commit to roman0yurin/postgres that referenced this pull request Mar 27, 2018
@repo-lockdown

repo-lockdown bot commented Jun 17, 2019

Thanks for your Pull Request! 😄 This repo on GitHub is just a mirror of our real git repositories though, and can't really handle PRs. 😦 Hopefully you can redo the PR, and direct it to the git.postgresql.org repos? We have a developer guide, if that helps: https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F. If this was a PR for pgAdmin, please visit https://www.pgadmin.org/docs/pgadmin4/dev/submitting_patches.html.

@repo-lockdown repo-lockdown bot locked and limited conversation to collaborators Jun 17, 2019
2 participants