Skip to content

Python: CWE-117 Log injection #6182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 12, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 49 additions & 0 deletions python/ql/src/experimental/Security/CWE-117/LogInjection.qhelp
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@

<!DOCTYPE qhelp PUBLIC
"-//Semmle//qhelp//EN"
"qhelp.dtd">
<qhelp>

<overview>

<p>If unsanitized user input is written to a log entry, a malicious user may be able to forge new log entries.</p>

<p>Forgery can occur if a user provides some input creating the appearance of multiple
log entries. This can include unescaped new-line characters, or HTML or other markup.</p>
</overview>

<recommendation>
<p>
User input should be suitably sanitized before it is logged.
</p>
<p>
If the log entries are plain text then line breaks should be removed from user input, using for example
<code>replace(old, new)</code> or similar. Care should also be taken that user input is clearly marked
in log entries, and that a malicious user cannot cause confusion in other ways.
</p>
<p>
For log entries that will be displayed in HTML, user input should be HTML encoded before being logged, to prevent forgery and
other forms of HTML injection.
</p>
</recommendation>

<example>
<p>
In the example, the name provided by the user is recorded using the log output function (<code>logging.info</code> or <code>app.logger.info</code>, etc.).
In these four cases, the name provided by the user is not provided The processing is recorded. If a malicious user provides <code>Guest%0D%0AUser name: Admin</code>
as a parameter, the log entry will be divided into two lines, the first line is <code>User name: Guest</code> code>, the second line is <code>User name: Admin</code>.
</p>
<sample src="LogInjectionBad.py" />

<p>
In a good example, the program uses the <code>replace</code> function to provide parameter processing to the user, and replace <code>\r\n</code> and <code>\n</code>
with empty characters. To a certain extent, the occurrence of log injection vulnerabilities is reduced.
</p>

<sample src="LogInjectionGood.py" />
</example>

<references>
<li>OWASP: <a href="https://owasp.org/www-community/attacks/Log_Injection">Log Injection</a>.</li>
</references>
</qhelp>
20 changes: 20 additions & 0 deletions python/ql/src/experimental/Security/CWE-117/LogInjection.ql
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
/**
* @name Log Injection
* @description Building log entries from user-controlled data is vulnerable to
* insertion of forged log entries by a malicious user.
* @kind path-problem
* @problem.severity error
* @precision high
* @id py/log-injection
* @tags security
* external/cwe/cwe-117
*/

import python
import experimental.semmle.python.security.injection.LogInjection
import DataFlow::PathGraph

from LogInjectionFlowConfig config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "$@ flows to log entry.", source.getNode(),
"User-provided value"
44 changes: 44 additions & 0 deletions python/ql/src/experimental/Security/CWE-117/LogInjectionBad.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
"""
@Desc :Log Injection
"""
from flask import Flask
from flask import request
from django.utils.log import request_logger
import logging

logging.basicConfig(level=logging.DEBUG)

app = Flask(__name__)

@app.route('/bad1')
def bad1():
name = request.args.get('name')
app.logger.info('User name: ' + name) # Bad
return 'bad1'

@app.route('/bad2')
def bad2():
name = request.args.get('name')
logging.info('User name: ' + name) # Bad
return 'bad2'

@app.route('/bad3')
def bad3():
name = request.args.get('name')
request_logger.warn('User name: ' + name) # Bad
return 'bad3'

@app.route('/bad4')
def bad4():
name = request.args.get('name')
logtest = logging.getLogger('test')
logtest.debug('User name: ' + name) # Bad
return 'bad4'

if __name__ == '__main__':
app.debug = True
handler = logging.FileHandler('log')
app.logger.addHandler(handler)
app.run()
25 changes: 25 additions & 0 deletions python/ql/src/experimental/Security/CWE-117/LogInjectionGood.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
"""
@Desc :Log Injection
"""
from flask import Flask
from flask import request
import logging

logging.basicConfig(level=logging.DEBUG)

app = Flask(__name__)

@app.route('/good1')
def good1():
name = request.args.get('name')
name = name.replace('\r\n','').replace('\n','')
logging.info('User name: ' + name) # Good
return 'good1'

if __name__ == '__main__':
app.debug = True
handler = logging.FileHandler('log')
app.logger.addHandler(handler)
app.run()
30 changes: 30 additions & 0 deletions python/ql/src/experimental/semmle/python/Concepts.qll
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,36 @@ private import semmle.python.dataflow.new.RemoteFlowSources
private import semmle.python.dataflow.new.TaintTracking
private import experimental.semmle.python.Frameworks

/** Provides classes for modeling log related APIs. */
module LogOutput {
/**
* A data flow node for log output.
*
* Extend this class to model new APIs. If you want to refine existing API models,
* extend `LogOutput` instead.
*/
abstract class Range extends DataFlow::Node {
/**
* Get the parameter value of the log output function.
*/
abstract DataFlow::Node getAnInput();
}
}

/**
* A data flow node for log output.
*
* Extend this class to refine existing API models. If you want to model new APIs,
* extend `LogOutput::Range` instead.
*/
class LogOutput extends DataFlow::Node {
LogOutput::Range range;

LogOutput() { this = range }

DataFlow::Node getAnInput() { result = range.getAnInput() }
}

/** Provides classes for modeling Regular Expression-related APIs. */
module RegexExecution {
/**
Expand Down
1 change: 1 addition & 0 deletions python/ql/src/experimental/semmle/python/Frameworks.qll
Original file line number Diff line number Diff line change
Expand Up @@ -5,3 +5,4 @@
private import experimental.semmle.python.frameworks.Stdlib
private import experimental.semmle.python.frameworks.LDAP
private import experimental.semmle.python.frameworks.NoSQL
private import experimental.semmle.python.frameworks.Log
118 changes: 118 additions & 0 deletions python/ql/src/experimental/semmle/python/frameworks/Log.qll
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
/**
* Provides classes modeling security-relevant aspects of the log libraries.
*/

private import python
private import semmle.python.dataflow.new.DataFlow
private import semmle.python.dataflow.new.TaintTracking
private import semmle.python.dataflow.new.RemoteFlowSources
private import experimental.semmle.python.Concepts
private import semmle.python.frameworks.Flask
private import semmle.python.ApiGraphs

/**
* Provides models for Python's log-related libraries.
*/
private module log {
/**
* Log output method list.
*
* See https://docs.python.org/3/library/logging.html#logger-objects
*/
private class LogOutputMethods extends string {
LogOutputMethods() {
this in ["info", "error", "warn", "warning", "debug", "critical", "exception", "log"]
}
}

/**
* The class used to find the log output method of the `logging` module.
*
* See `LogOutputMethods`
*/
private class LoggingCall extends DataFlow::CallCfgNode, LogOutput::Range {
LoggingCall() {
this = API::moduleImport("logging").getMember(any(LogOutputMethods m)).getACall()
}

override DataFlow::Node getAnInput() {
this.getFunction().(DataFlow::AttrRead).getAttributeName() != "log" and
result in [this.getArg(_), this.getArgByName(_)] // this includes the arg named "msg"
or
this.getFunction().(DataFlow::AttrRead).getAttributeName() = "log" and
result in [this.getArg(any(int i | i > 0)), this.getArgByName(any(string s | s != "level"))]
}
}

/**
* The class used to find log output methods related to the `logging.getLogger` instance.
*
* See `LogOutputMethods`
*/
private class LoggerCall extends DataFlow::CallCfgNode, LogOutput::Range {
LoggerCall() {
this =
API::moduleImport("logging")
.getMember("getLogger")
.getReturn()
.getMember(any(LogOutputMethods m))
.getACall()
}

override DataFlow::Node getAnInput() {
this.getFunction().(DataFlow::AttrRead).getAttributeName() != "log" and
result in [this.getArg(_), this.getArgByName(_)] // this includes the arg named "msg"
or
this.getFunction().(DataFlow::AttrRead).getAttributeName() = "log" and
result in [this.getArg(any(int i | i > 0)), this.getArgByName(any(string s | s != "level"))]
}
}

/**
* The class used to find the relevant log output method of the `flask.Flask.logger` instance (flask application).
*
* See `LogOutputMethods`
*/
private class FlaskLoggingCall extends DataFlow::CallCfgNode, LogOutput::Range {
FlaskLoggingCall() {
this =
Flask::FlaskApp::instance()
.getMember("logger")
.getMember(any(LogOutputMethods m))
.getACall()
}

override DataFlow::Node getAnInput() {
this.getFunction().(DataFlow::AttrRead).getAttributeName() != "log" and
result in [this.getArg(_), this.getArgByName(_)] // this includes the arg named "msg"
or
this.getFunction().(DataFlow::AttrRead).getAttributeName() = "log" and
result in [this.getArg(any(int i | i > 0)), this.getArgByName(any(string s | s != "level"))]
}
}

/**
* The class used to find the relevant log output method of the `django.utils.log.request_logger` instance (django application).
*
* See `LogOutputMethods`
*/
private class DjangoLoggingCall extends DataFlow::CallCfgNode, LogOutput::Range {
DjangoLoggingCall() {
this =
API::moduleImport("django")
.getMember("utils")
.getMember("log")
.getMember("request_logger")
.getMember(any(LogOutputMethods m))
.getACall()
}

override DataFlow::Node getAnInput() {
this.getFunction().(DataFlow::AttrRead).getAttributeName() != "log" and
result in [this.getArg(_), this.getArgByName(_)] // this includes the arg named "msg"
or
this.getFunction().(DataFlow::AttrRead).getAttributeName() = "log" and
result in [this.getArg(any(int i | i > 0)), this.getArgByName(any(string s | s != "level"))]
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
import python
import semmle.python.Concepts
import experimental.semmle.python.Concepts
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.RemoteFlowSources

/**
* A taint-tracking configuration for tracking untrusted user input used in log entries.
*/
class LogInjectionFlowConfig extends TaintTracking::Configuration {
LogInjectionFlowConfig() { this = "LogInjectionFlowConfig" }

override predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

override predicate isSink(DataFlow::Node sink) { sink = any(LogOutput logoutput).getAnInput() }

override predicate isSanitizer(DataFlow::Node node) {
exists(CallNode call |
node.asCfgNode() = call.getFunction().(AttrNode).getObject("replace") and
call.getArg(0).getNode().(StrConst).getText() in ["\r\n", "\n"]
)
}
Comment on lines +18 to +23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly unsafe as it deems either replacement ("\r\n" or "\n") a sanitizer when really both are required.
I am not sure how to improve that easily, though. Perhaps we will turn it into a concept with a single implementation and improve it over time, but I am not sure you need to do that now.

Further, this seems overly specific, so I would expect some false positives from this. Consider adding at least StringConstCompare as a sanitizer option. (Either directly here in this predicate, or by a construction similar to here.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a specific case here? Because the safety laboratory did not give me advice on the elimination of FP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user does replace("\r\n", ""), then that will match your predicate and be considered a sanitizer. However, that would not do any replacements on a message having only "\n"'s in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other comment is that if the message is compared against a known string, that should be considered a sanitizer (since that is much stronger than doing a replacement).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better if you give me the actual case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yoff Excuse me again. Do you have examples in actual open source projects for me?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I had to do a bit of digging. My comment came from seeing code like this, although in that particular case we do want to alert (because the logged value is outside the known set). I think I have seen also that only values in a known set is logged, which would be safe, and that is the sanitizer I proposed. However, it does not seem to be very common, so we could also add it once we see the first FP...

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
edges
| LogInjectionBad.py:17:12:17:18 | ControlFlowNode for request | LogInjectionBad.py:17:12:17:23 | ControlFlowNode for Attribute |
| LogInjectionBad.py:17:12:17:23 | ControlFlowNode for Attribute | LogInjectionBad.py:18:21:18:40 | ControlFlowNode for BinaryExpr |
| LogInjectionBad.py:23:12:23:18 | ControlFlowNode for request | LogInjectionBad.py:23:12:23:23 | ControlFlowNode for Attribute |
| LogInjectionBad.py:23:12:23:23 | ControlFlowNode for Attribute | LogInjectionBad.py:24:18:24:37 | ControlFlowNode for BinaryExpr |
| LogInjectionBad.py:29:12:29:18 | ControlFlowNode for request | LogInjectionBad.py:29:12:29:23 | ControlFlowNode for Attribute |
| LogInjectionBad.py:29:12:29:23 | ControlFlowNode for Attribute | LogInjectionBad.py:30:25:30:44 | ControlFlowNode for BinaryExpr |
| LogInjectionBad.py:35:12:35:18 | ControlFlowNode for request | LogInjectionBad.py:35:12:35:23 | ControlFlowNode for Attribute |
| LogInjectionBad.py:35:12:35:23 | ControlFlowNode for Attribute | LogInjectionBad.py:37:19:37:38 | ControlFlowNode for BinaryExpr |
nodes
| LogInjectionBad.py:17:12:17:18 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| LogInjectionBad.py:17:12:17:23 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| LogInjectionBad.py:18:21:18:40 | ControlFlowNode for BinaryExpr | semmle.label | ControlFlowNode for BinaryExpr |
| LogInjectionBad.py:23:12:23:18 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| LogInjectionBad.py:23:12:23:23 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| LogInjectionBad.py:24:18:24:37 | ControlFlowNode for BinaryExpr | semmle.label | ControlFlowNode for BinaryExpr |
| LogInjectionBad.py:29:12:29:18 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| LogInjectionBad.py:29:12:29:23 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| LogInjectionBad.py:30:25:30:44 | ControlFlowNode for BinaryExpr | semmle.label | ControlFlowNode for BinaryExpr |
| LogInjectionBad.py:35:12:35:18 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| LogInjectionBad.py:35:12:35:23 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| LogInjectionBad.py:37:19:37:38 | ControlFlowNode for BinaryExpr | semmle.label | ControlFlowNode for BinaryExpr |
subpaths
#select
| LogInjectionBad.py:18:21:18:40 | ControlFlowNode for BinaryExpr | LogInjectionBad.py:17:12:17:18 | ControlFlowNode for request | LogInjectionBad.py:18:21:18:40 | ControlFlowNode for BinaryExpr | $@ flows to log entry. | LogInjectionBad.py:17:12:17:18 | ControlFlowNode for request | User-provided value |
| LogInjectionBad.py:24:18:24:37 | ControlFlowNode for BinaryExpr | LogInjectionBad.py:23:12:23:18 | ControlFlowNode for request | LogInjectionBad.py:24:18:24:37 | ControlFlowNode for BinaryExpr | $@ flows to log entry. | LogInjectionBad.py:23:12:23:18 | ControlFlowNode for request | User-provided value |
| LogInjectionBad.py:30:25:30:44 | ControlFlowNode for BinaryExpr | LogInjectionBad.py:29:12:29:18 | ControlFlowNode for request | LogInjectionBad.py:30:25:30:44 | ControlFlowNode for BinaryExpr | $@ flows to log entry. | LogInjectionBad.py:29:12:29:18 | ControlFlowNode for request | User-provided value |
| LogInjectionBad.py:37:19:37:38 | ControlFlowNode for BinaryExpr | LogInjectionBad.py:35:12:35:18 | ControlFlowNode for request | LogInjectionBad.py:37:19:37:38 | ControlFlowNode for BinaryExpr | $@ flows to log entry. | LogInjectionBad.py:35:12:35:18 | ControlFlowNode for request | User-provided value |
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
experimental/Security/CWE-117/LogInjection.ql
Loading