Skip to content

C#: Improve some existing manual models. #19940

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

michaelnebel
Copy link
Contributor

@michaelnebel michaelnebel commented Jul 1, 2025

In this PR we improve the models as data modelling across multiple classes.

  • Explicitly added summary models for all overloads of System.Xml.XmlDictionaryReader.CreateBinaryReader.
  • Added models for some of the methods and properties in System.Runtime.Serialization.SerializationInfo and System.Runtime.Serialization.SerializationInfoEnumerator.
  • Updated models for System.Text.Encoding.GetBytes, System.Text.Encoding.GetChars and the constructor for System.IO.MemoryStream.

Also convert the deserialization tests to inline expectations.

The new findings in mono/mono for cs/cleartext-storage-of-sensitive-information and
cs/exposure-of-sensitive-information are both related to the improved models for GetBytes (where taint is propagated to another parameter instead of the return).

Copy link
Contributor

github-actions bot commented Jul 1, 2025

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

csharp

Generated file changes for csharp

  • Changes to framework-coverage-csharp.rst:
-    System,"``System.*``, ``System``",47,12139,54,5
+    System,"``System.*``, ``System``",47,12165,54,5
-    Totals,,107,14403,407,9
+    Totals,,107,14429,407,9
  • Changes to framework-coverage-csharp.csv:
- System,54,47,12139,,6,5,5,,,4,1,,33,2,,6,15,17,4,3,,5903,6236
+ System,54,47,12165,,6,5,5,,,4,1,,33,2,,6,15,17,4,3,,5929,6236

@michaelnebel michaelnebel marked this pull request as ready for review July 3, 2025 13:26
@Copilot Copilot AI review requested due to automatic review settings July 3, 2025 13:26
@michaelnebel michaelnebel requested a review from a team as a code owner July 3, 2025 13:26
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances security test queries by converting deserialization tests to inline alert annotations and improves the library’s manual summary models to reduce false negatives.

  • Converted multiple unsafe/unsafe-untrusted-deserialization tests to use inline // $ Alert[...] comments.
  • Renamed and expanded binary‐formatter test classes for broader coverage.
  • Added and refined summary models for XmlDictionaryReader.CreateBinaryReader, Encoding.GetBytes/GetChars, MemoryStream constructors, and serialization APIs.

Reviewed Changes

Copilot reviewed 22 out of 29 changed files in this pull request and generated no comments.

Show a summary per file
File Description
csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserializationUntrustedInputNewtonsoftJson/Test.cs Moved JSON unsafe-deserialization alert to inline comment.
csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserializationUntrustedInput/XmlSerializerUntrustedInputBad.cs Added inline alert annotation after the return call.
csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserializationUntrustedInput/BinaryFormatterUntrustedInputBad.cs Renamed BadBinaryFormatter class and added a second test class.
csharp/ql/src/change-notes/2025-07-01-improve-summary-models.md Documented the new and updated summary models.
csharp/ql/lib/ext/System.Xml.model.yml Added summary models for all XmlDictionaryReader.CreateBinaryReader overloads.
csharp/ql/lib/ext/System.Text.model.yml Updated taint mappings for Encoding.GetBytes/GetChars.
csharp/ql/lib/ext/System.Runtime.Serialization.model.yml Added models for SerializationInfo and SerializationInfoEnumerator.
csharp/ql/lib/ext/System.IO.model.yml Adjusted MemoryStream taint model to use .Element for byte arrays.
Files not reviewed (7)
  • csharp/ql/test/library-tests/dataflow/library/FlowSummaries.expected: Language not supported
  • csharp/ql/test/library-tests/dataflow/library/FlowSummariesFiltered.expected: Language not supported
  • csharp/ql/test/query-tests/Security Features/CWE-502/DeserializedDelegate/DeserializedDelegate.qlref: Language not supported
  • csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserialization/UnsafeDeserialization.qlref: Language not supported
  • csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserializationUntrustedInput/UnsafeDeserializationUntrustedInput.expected: Language not supported
  • csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserializationUntrustedInput/UnsafeDeserializationUntrustedInput.qlref: Language not supported
  • csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserializationUntrustedInputNewtonsoftJson/UnsafeDeserializationUntrustedInput.qlref: Language not supported
Comments suppressed due to low confidence (2)

csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserializationUntrustedInputNewtonsoftJson/Test.cs:17

  • The $ Alert annotation is placed on the JsonSerializerSettings line rather than on the DeserializeObject invocation itself. Move the inline comment to the line starting with return JsonConvert.DeserializeObject(...) to ensure the test framework picks up the alert.
        return JsonConvert.DeserializeObject(data.Text, new JsonSerializerSettings // $ Alert[cs/unsafe-deserialization-untrusted-input]

csharp/ql/test/query-tests/Security Features/CWE-502/UnsafeDeserializationUntrustedInput/BinaryFormatterUntrustedInputBad.cs:7

  • The test harness expects a class named BadBinaryFormatter, but this has been renamed to BadBinaryFormatter1. Rename it back to BadBinaryFormatter to maintain compatibility with the test matcher.
class BadBinaryFormatter1

Copy link
Contributor

@owen-mc owen-mc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions about how the models all work together.

- ["System.Runtime.Serialization", "SerializationInfo", False, "GetEnumerator", "()", "", "Argument[this]", "ReturnValue", "taint", "manual"]
- ["System.Runtime.Serialization", "SerializationInfo", False, "GetString", "(System.String)", "", "Argument[this]", "ReturnValue", "taint", "manual"]
- ["System.Runtime.Serialization", "SerializationInfo", False, "GetValue", "(System.String,System.Type)", "", "Argument[this]", "ReturnValue", "taint", "manual"]
- ["System.Runtime.Serialization", "SerializationInfoEnumerator", False, "get_Current", "()", "", "Argument[this]", "ReturnValue", "taint", "manual"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns a SerializationEntry struct. I'm not familiar with CodeQL for C#, so I just want to check that it's sufficient to taint that struct, and then accessing the fields will propagate the taint, without requiring any extra models.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we would need extra models for that as well. The model itself doesn't bring any value unless models for SerializationEntry is also provided. However, I decided to include it for consistency (such that all properties of SerializationInfoEnumerator are modelled - or at least those where it makes sense to have a model). But I have stop the modelling somehere 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Perhaps a comment on this line saying that it won't work until models for SerializationEntry are added, would save the next person to run into this some time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -47,7 +47,7 @@ extensions:
- ["System.IO", "FileStream", False, "FileStream", "(System.String,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,System.Int32)", "", "Argument[0]", "Argument[this]", "taint", "manual"]
- ["System.IO", "FileStream", False, "FileStream", "(System.String,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,System.Int32,System.Boolean)", "", "Argument[0]", "Argument[this]", "taint", "manual"]
- ["System.IO", "FileStream", False, "FileStream", "(System.String,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,System.Int32,System.IO.FileOptions)", "", "Argument[0]", "Argument[this]", "taint", "manual"]
- ["System.IO", "MemoryStream", False, "MemoryStream", "(System.Byte[])", "", "Argument[0]", "Argument[this]", "taint", "manual"]
- ["System.IO", "MemoryStream", False, "MemoryStream", "(System.Byte[])", "", "Argument[0].Element", "Argument[this]", "taint", "manual"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. We do the opposite in CodeQL for Go - byte arrays are often used synonymously with strings, and we always taint the whole array rather than individual elements, just as we would if it was a string. I guess either way works as long as your consistent, but I seem to recall that for Go it was better to taint the whole array because strings can be considered as byte arrays in some circumstances. Just thought I'd mention it.

Copy link
Contributor

@aschackmull aschackmull Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java, we would also taint a whole byte[] rather than its members - in fact single byte values are typically considered too small to carry taint and used as type-based sanitizers.

Copy link
Contributor Author

@michaelnebel michaelnebel Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for looking into this!
So the change is probably a bit more controversial than advertised in slack. 😄
To address the questions/comments: Yes, I think it makes sense to consider both byte[] and char[] as "bulk" like datatypes like it is done for Java. The change in this PR solves the reported issue (as it is now) and doesn't break the pattern that is already being used for C# in terms of modelling - but we should probably change this.
Will it be acceptable to merge the models as they are and then as a follow up try and convert all the models (at least all the manual ones) to check what the consequences will be? This might also tie into model generator as well.

@@ -3,18 +3,18 @@ extensions:
pack: codeql/csharp-all
extensible: summaryModel
data:
- ["System.Text", "Encoding", True, "GetBytes", "(System.Char*,System.Int32,System.Byte*,System.Int32)", "", "Argument[0]", "ReturnValue", "taint", "manual"]
- ["System.Text", "Encoding", True, "GetBytes", "(System.Char*,System.Int32,System.Byte*,System.Int32)", "", "Argument[0].Element", "Argument[2]", "taint", "manual"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For many of the models in this file, I don't understand why the input has .Element but the output doesn't. Please can you explain how this works? (Obviously the old model was just wrong, though.)

Copy link
Contributor Author

@michaelnebel michaelnebel Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an excellent question - I didn't want to investigate that as a part of adding the models - but now I have checked: The cause is that for C#, support for implicit reads at sinks hasn't been implemented. That is, if the elements of a collection are tainted but the sink is a collection type, then taint isn't propagated. Since we have a few byte[] sinks, it appears that the models are made like this.
This should probably be addressed (but not as a part of this PR) - will open an issue for it (I think it is just a matter of making a simple implementation for defaultImplicitTaintRead - but it will need to be tested separately).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like all the skeletons are falling out of the closet 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine to me, as long as there's a follow-up issue and we don't totally forget about it.

Copy link
Contributor Author

@michaelnebel michaelnebel Jul 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened issues for both ((1) Implicit reads at sinks and (2) Convert byte[] and char[] models)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants