Skip to content

Python: CWE-943 - Add NoSQL injection query #5612

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 81 commits into from
Aug 24, 2021

Conversation

mrthankyou
Copy link
Contributor

@mrthankyou mrthankyou commented Apr 6, 2021

This PR introduces a new query that searches for NoSQL injections. This query specifically looks for vulnerabilities that utilize any one of these three major Python MongoDB libraries:

The most common scenario this query looks for is for a Python application to accept user input from a HTTP request, convert that HTTP request to a Python object, and then inject that object directly into a sink method provided by one of the three libraries above.

This query is also only currently testing for find related methods as well as MongoEngine's objects method.

PS: Please note that both @mrthankyou and @jorgectf wrote this query. If accepted as a bug bounty we would like to split the bounty 50/50. This will be re-iterated in the bug bounty submission but I thought it would be a good idea to mention here. Any review edits will be done by @mrthankyou.

@jorgectf
Copy link
Contributor

jorgectf commented Apr 6, 2021

🎉

@RasmusWL RasmusWL self-assigned this Apr 7, 2021
Although it is for `json.loads` and the like.
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a few minor fixes to this PR in jty-team#5, but overall it looks ok 👍 (so requesting changes until that is merged)

A few small things to consider for the future:

  • I would have merged all the tests files for mongoengine into one file, since they all contain most of the same setup code anyway, so it would be easier to just have all the different variants one after each other, instead of having them in separate files. All the _good.py variants are also a bit trivial... maybe just having one or two of these would be enough.
  • it would have been nice with QLDoc for all predicates/classes, even the ones that are private.

This PR is going through a bit of internal process, and if everything goes according to plan, it should be merged within the next few days +1

For our own future work on this query (promoting to standard set of queries), I still think parts of these comments about manually adding $eq to the NoSQL query to make it safe would be worthwhile to investigate a bit further. The part about $where also needs more investigation, I still didn't try out whether it would be possible to use it with a string containing a JS function body, such as db.users.find({'$where': 'this.name.first === "John"'}).

This PR also highlights some work that we need to do to make writing queries easier. Right now this query only looks at decoding from JSON, so in the unlikely scenario that the input is encoded as YAML, this query would not be able to catch it. If we provided an easy way to know when a remote user input had been turned into a dictionary, that would probably make this query a bit easier to write. (we did make an internal note about this, but also wanted to recognize it on this PR).


/** Gets a reference to a `Mongo` collection method. */
private DataFlow::Node mongoCollectionMethod() {
mongoCollection() in [result.(DataFlow::AttrRead), result.(DataFlow::AttrRead).getObject()] and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a continuation of #5612 (comment)

I still don't see why we would flag such method calls on databases and not just only on collections.

client = pymongo.MongoClient(...)
client.mydb.find(...)  # <-- I'm surprised if this one is allowed :O
client.mydb.mycollection.find(...)

So I think this code should be

Suggested change
mongoCollection() in [result.(DataFlow::AttrRead), result.(DataFlow::AttrRead).getObject()] and
mongoCollection() = result.(DataFlow::AttrRead).getObject() and

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure, but asked @mrthankyou and you are right, we don't think it's possible to execute a query without having previously specified a collection. Fixed in 68c6831#diff-ca1baaa83ce64bcb0c8f0dca86c8be11cef6a77a6bc422ee1584f1eadf17cc12R141

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just spotted 2 more minor things related to library modeling

@jorgectf
Copy link
Contributor

jorgectf commented Jun 28, 2021

Thank you so much for the review @RasmusWL 😊

I would have merged all the tests files for mongoengine into one file, since they all contain most of the same setup code anyway, so it would be easier to just have all the different variants one after each other, instead of having them in separate files.

Done in 0ca4f24. They are now divided by frameworks.

All the _good.py variants are also a bit trivial... maybe just having one or two of these would be enough.

I've deleted some redundant tests that, although they weren't the same, they would work if the rest do too. 3fd1129

it would have been nice with QLDoc for all predicates/classes, even the ones that are private.

Feel free to suggest any rephrasing needed 😃 68c6831

Would be great if this modeling of xmltodict could be moved to it's own file.

Done in 51395d1.

For our own future work on this query (promoting to standard set of queries), I still think parts of these comments about manually adding $eq to the NoSQL query to make it safe would be worthwhile to investigate a bit further. The part about $where also needs more investigation, I still didn't try out whether it would be possible to use it with a string containing a JS function body, such as db.users.find({'$where': 'this.name.first === "John"'}).

This is an interesting issue. For the first one, I guess using a Sanitizer or SanitizerGuard checking that the lhs (of a dict whose rhs is our dict-RFS) equals '$eq' would work, and regarding $where... perhaps we would need to create another configuration, I can't think of a way to do it using current's chained config.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these updates 💪 I think maybe you were a bit too eager to delete tests though? 😄

I'm not in a rush for this one, so happy to get the last minor bits and pieces sorted out before we merge this 👍

Comment on lines 24 to 39
@app.route("/connection_connect_find")
def connection_connect_find():
unsafe_search = request.args['search']
json_search = json.loads(unsafe_search)

db = connect('mydb')
return db.movie.find({'name': json_search})

@app.route("/get_db_find")
def get_db_find():
unsafe_search = request.args['search']
json_search = json.loads(unsafe_search)

db = me.get_db()
return db.movie.find({'name': json_search})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you highlight why these tests are now trivial? As far as I can tell, none of the other tests highlight that mongoengine.connection.connect() and mongoengine.get_db() are supported. So as far as I can tell, these tests are still useful 😊

Copy link
Contributor

@jorgectf jorgectf Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! My approach was that connect('mydb') is the same function as me.connect('mydb') (although they come from a different class (mongoengine.connection/mongoengine)). If me.connect('mydb') works, connect('mydb') should also get flagged (if it is specified in the codeql query). And the same happens with get_db(). I believed you meant something about that regarding trivial tests, but I will put them back :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks 👍

My main point was that all the tests in the mongoengine_good.py file is just copies of the mongoengine_bad.py file, with a single sanitizer step added. So it would really be enough with a single test that this sanitizer works. (just imagine how many lines of code it would take if there were 15 different sanitizers, and we had to use it for all the different ways to do an exploit)

Copy link
Contributor

@jorgectf jorgectf Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I get what you mean. I will leave only 1 test in *_good.py then :)

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as I see it, all of my required changes have been addressed 👍

@mrthankyou
Copy link
Contributor Author

Some of the CI checks are unsuccessful. Is this something I need to look into? Thanks!

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the CI checks are unsuccessful. Is this something I need to look into? Thanks!

Yes, please do. I think it is just the expected files that needs to be updated after the cleanup in e02a63a 😊 also noticed what I guess is a small typo in the test code

Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
@RasmusWL
Copy link
Member

Nice, I just think we need the .expected files to be updated then 😉

@mrthankyou
Copy link
Contributor Author

Is Python3 Language Tests (GitHub Actions) caused by out-of-date tests? I can take a look when a free moment appears in my schedule this week. Now that I can finally run tests on my local machine I will be able to handle this.

@RasmusWL
Copy link
Member

We've made some changes on how to do type-tracking very recently, so since we run tests on the commit that would arise if this was merged into the main branch, the failure is due to the fact that the code in this PR has not been updated. No need to worry about it for this PR, I can just fix it once it's merged 👍

@mrthankyou
Copy link
Contributor Author

Interesting. Would like to see those new changes sometime ;)

Thanks!

@mrthankyou
Copy link
Contributor Author

mrthankyou commented Aug 6, 2021

@RasmusSemmle There is a conflict with Concepts.qll. I assume you want this fixed? It seems simple and am happy to do so.

@yoff
Copy link
Contributor

yoff commented Aug 9, 2021

@RasmusSemmle There is a conflict with Concepts.qll. I assume you want this fixed? It seems simple and am happy to do so.

Indeed, if you can then please do :-)

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Barring in mind my comment here, this PR LGTM.

@RasmusWL RasmusWL merged commit ca341bd into github:main Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants