Skip to content

Python: Add SqlAlchemy model #5680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jul 22, 2021
Merged

Conversation

mrthankyou
Copy link
Contributor

@mrthankyou mrthankyou commented Apr 15, 2021

This PR is meant to add a module for dealing with SqlAlchemy in Python CodeQL queries. For context, this will be used in an upcoming CodeQL security query so I decided to go ahead and submit this PR to get the ball rolling.

@mrthankyou mrthankyou changed the title Python: Add SqlAlchemy module Python: Add SqlAlchemy model Apr 15, 2021
@mrthankyou mrthankyou marked this pull request as ready for review April 15, 2021 19:24
@mrthankyou mrthankyou requested a review from a team as a code owner April 15, 2021 19:24
this = API::moduleImport("sqlalchemy").getMember("create_engine").getReturn()
}

override string toString() { result = "Use of SqlAlchemy create_engine member" }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% the best wording practices here so if this needs to be changed let me know!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually we don't build classes for this, but instead use a predicate, such as

/** Gets an instance of ... */
getSqlAlchemyEngineInstance() {
  result = API::moduleImport("sqlalchemy").getMember("create_engine").getReturn()
}

and I think doing so in this case would also make things a bit more clean.

@mrthankyou
Copy link
Contributor Author

Small note: I have another model I'm currently building and is related to the security query that will be using the model in this PR. I've decided to wait for this to be merged in before I'll submit the other query. This way, if there are any glaring issues in this PR, I can apply them to upcoming PR as well.

- Replaced classes that look for SqlAlchemy instances with predicates
- General clean-up of code
@mrthankyou
Copy link
Contributor Author

mrthankyou commented Apr 19, 2021

This is ready for re-review.

PS: Are there additional requirements such as tests for supplementary code like this to a query? I know tests matter for the actual queries and I thought I saw tests written for other models but I haven't seen any guidance on models.

@mrthankyou
Copy link
Contributor Author

Friendly ping about this PR 😄

@RasmusWL
Copy link
Member

Are there additional requirements such as tests for supplementary code like this to a query? I know tests matter for the actual queries and I thought I saw tests written for other models but I haven't seen any guidance on models.

Missed this response, but yes, please add tests demonstrating behavior that is covered by the library modeling. Please also include test examples for any interesting behavior that you can do with the library that you currently don't model (that is, if there is some SQL executions that you know can happen with the SQL Alchemy library, that it just hasn't been possible to model yet).

The way I would want you to do this is by creating a new folder python/ql/test/experimental/library-tests/frameworks/sqlalchemy/, and adding a copy of our standadized ConceptsTest.ql file. Then add your examples as .py files, and add annotations inside the Python file -- this uses a special comment syntax, so everything after a $ in a comment means it's an inline annotation.

For an example, you can see the tests we did for SQL execution in django: https://github.com/github/codeql/blob/c8a6e837b57513df5c8ccd081e193be3443987d1/python/ql/test/library-tests/frameworks/django/SqlExecution.py

and this is the ConceptsTest.ql file that you jsut need to have a copy of: https://github.com/github/codeql/blob/c8a6e837b57513df5c8ccd081e193be3443987d1/python/ql/test/library-tests/frameworks/django/ConceptsTest.ql

@RasmusWL
Copy link
Member

I think there might be a misconception about the SqlExecution concept. We want to capture calls that execute SQL provided in textual form, and not just any call that might execute some SQL under the hood. I think the ORM layers sits on a knife edge on this, but we're only interested in the calls where you can provide actual SQL (as a string) to be executed, and not so much in calls that build uses the ORM to interact with a database.

One of the examples I found was (rewritten slightly)

with engine.connect() as conn:
    t = text("select * from table")
    result = conn.execution_options(stream_results=True).execute(t)

As I see it, the current modeling would capture that the t argument could represent SQL (correct). As I see it, there is currently no taint-step from the argument to text and the resulting object, so such a use-case would never be able to be found by our SQL injection query 😞

I'm not so sure about the SqlAlchemyQueryCall, are there any methods on sqlalchemy.orm.Query that accepts SQL to be executed?

Looking forward to seeing your tests, probably they will clear most of this up 👍

@mrthankyou
Copy link
Contributor Author

As I see it, there is currently no taint-step from the argument to text and the resulting object, so such a use-case would never be able to be found by our SQL injection query 😞

I'm a little lost here so I'm going to ask some questions and hopefully this will get cleared up on my end:

When I eventually submit my SqlAlchemy query that relies on this PR, I will have accounted for this taint-step in the query I'm writing. However, as it stands the current SQL execution query can't account for this additional taint-step and will fail.

Is the expectation here that if this PR is merged in it should automatically work with the current queries?

I'm not so sure about the SqlAlchemyQueryCall, are there any methods on sqlalchemy.orm.Query that accepts SQL to be executed?

There are some cases where the taint-step text() is not required. I've attached a link here (see Vulnerable Sinks section) that documents the testing I did on SqlAlchemy.

After researching SqlAlchemy and it's various query methods, I discovered several types of SQL injection possibilities.

The SQLExecution.py file contains these examples and can be broken up into two types of injections. Injections requiring the text() taint-step and injections NOT requiring the text() taint step.
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've talked this over in private a bit. For the record, I've created a few small changes in mrthankyou#1, and with that merged, I think this PR is OK to go, and I'm happy to do the last bit of polishing to get this promoted out of experiemntal.

*/
private class SqlAlchemyExecuteCall extends DataFlow::CallCfgNode, SqlExecution::Range {
SqlAlchemyExecuteCall() {
// new way
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this and I have no idea what I was referencing. This model library points to SqlAlchemy 1.X so I don't think I'm referencing 2.0.

private import semmle.python.ApiGraphs
private import semmle.python.Concepts

private module SqlAlchemy {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any desire on your side Rasmus to version this model? SqlAlchemy came out with 2.0 and I don't know how compatible this model library is for 2.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.0 has still not been released it seems (https://pypi.org/project/SQLAlchemy/#history), so I think it's OK. Thanks for letting me know about this though 👍

RasmusWL
RasmusWL previously approved these changes Jun 30, 2021
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is ok to go now 👍

I assume the MISSING keyword represents a scenario where the test is skippable?

It means that the modeling was not able to find the result we know should be there. (so the expected result is MISSING, if the modeling provides a result where we know there shouldn't be any, we mark that as SPURIOUS)

So if you want to, feel free to fix up those instances (like filter_by and having).

private import semmle.python.ApiGraphs
private import semmle.python.Concepts

private module SqlAlchemy {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.0 has still not been released it seems (https://pypi.org/project/SQLAlchemy/#history), so I think it's OK. Thanks for letting me know about this though 👍

@mrthankyou
Copy link
Contributor Author

Hi,

I have submitted the changes but am still working on getting the tests updated. Unfortunately I'm encountering errors while attempting to codeql test run and am having to debug it.

Please note that I'm slammed right now with work and moving. I have no free time to work on this at least for the next week. If I have some free time today I will try to tackle it. I'm sorry for the delay on my end. This is unfortunately out of my control and will be tied down to at least next Wednesday.

@RasmusWL
Copy link
Member

All the tests do pass though, so I don't think there's anything more for you to do on the tests 😉

@mrthankyou
Copy link
Contributor Author

Yep, you're correct Rasmus. Testing results below:

codeql test run . --search-path /Users/hacker/Programming/hacking/research/codeql/my-codeql/python/ql

Executing 2 tests in 1 directories.
Extracting test database in /Users/hacker/Programming/hacking/research/codeql/my-codeql/python/ql/test/experimental/library-tests/frameworks/sqlalchemy.
Compiling queries in /Users/hacker/Programming/hacking/research/codeql/my-codeql/python/ql/test/experimental/library-tests/frameworks/sqlalchemy.
Executing tests in /Users/hacker/Programming/hacking/research/codeql/my-codeql/python/ql/test/experimental/library-tests/frameworks/sqlalchemy.
[1/2 eval 10.4s] PASSED /Users/hacker/Programming/hacking/research/codeql/my-codeql/python/ql/test/experimental/library-tests/frameworks/sqlalchemy/ConceptsTest.ql
[2/2 eval 1.4s] PASSED /Users/hacker/Programming/hacking/research/codeql/my-codeql/python/ql/test/experimental/library-tests/frameworks/sqlalchemy/InlineTaintTest.ql
All 2 tests passed.

@RasmusWL RasmusWL merged commit 802d9bd into github:main Jul 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants