Skip to content

Add C++ security rules for NUL terminators and string_view safety #183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Mar 26, 2025

Conversation

ESS-ENN
Copy link
Collaborator

@ESS-ENN ESS-ENN commented Mar 26, 2025

Summary by CodeRabbit

  • New Features
    • Introduced security rules to enhance C++ string handling by detecting potential issues with missing null terminators in memory copies and risky assignments involving temporary strings.
  • Tests
    • Added comprehensive tests to validate correct string copy operations and safe usage patterns for string views, ensuring robustness against common pitfalls in C++ string manipulation.

Sakshis and others added 30 commits December 16, 2024 13:09
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ESS-ENN
❌ Sakshis


Sakshis seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link

coderabbitai bot commented Mar 26, 2025

Walkthrough

This pull request adds two new C++ security rules: one to detect missing NUL terminators when using memory copy functions and another to flag unsafe constructions of std::string_view from temporary strings. Along with these rules, corresponding snapshot files and YAML test cases have been introduced to validate the proper functioning of the rules using both valid and invalid test cases.

Changes

File(s) Change Summary
rules/cpp/security/missing-nul-cpp-string-memcpy-copy-cpp.yml
rules/cpp/security/string-view-temporary-string-cpp.yml
Introduces two new C++ security rules. One rule checks that memory copy functions account for the NUL terminator, and the other detects the construction of std::string_view from temporary strings.
tests/__snapshots__/missing-nul-cpp-string-memcpy-copy-cpp-snapshot.yml
tests/__snapshots__/string-view-temporary-string-cpp-snapshot.yml
Adds new snapshot files that define test cases (with metadata and labels) demonstrating the proper and improper usages for both rules.
tests/cpp/missing-nul-cpp-string-memcpy-copy-cpp-test.yml
tests/cpp/string-view-temporary-string-cpp-test.yml
Provides new YAML test files with valid and invalid cases to validate the behavior of the newly introduced rules in C++ code.

Sequence Diagram(s)

sequenceDiagram
    participant Code
    participant ASTParser
    participant MemcpyRule
    Code->>ASTParser: Parse code with memcpy usage
    ASTParser->>MemcpyRule: Analyze memcpy parameters
    MemcpyRule-->>ASTParser: Identify missing NUL terminator if present
    ASTParser-->>Code: Emit warning for potential out-of-bounds read
Loading
sequenceDiagram
    participant Code
    participant ASTParser
    participant StringViewRule
    Code->>ASTParser: Parse code with std::string_view usage
    ASTParser->>StringViewRule: Analyze construction from temporary std::string
    StringViewRule-->>ASTParser: Detect potential use-after-free issue
    ASTParser-->>Code: Emit warning for unsafe string_view assignment
Loading

Suggested reviewers

  • ganeshpatro321

Poem

I'm a little rabbit, hopping through the code,
Finding sneaky bugs where strings once strode.
With checks for NUL and safe views true,
Our C++ code now dances anew.
Hoppity cheers to this clever reload!
🐰🌟


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai bot changed the title @coderabbitai Add C++ security rules for NUL terminators and string_view safety Mar 26, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (10)
tests/__snapshots__/missing-nul-cpp-string-memcpy-copy-cpp-snapshot.yml (3)

11-55: Test_001 Labels: Consider streamlining duplicates
The label section for test_001 contains several repeated entries (for example, duplicate entries for char to[20] and the length initialization). If these entries are auto-generated, consider deduplicating or commenting their purpose to improve maintainability and clarity.


64-120: Test_002 Labels: Review for redundancy
Much like test_001, the labels here are very detailed and include repeated segments. While comprehensive labeling may be useful for snapshot diffing, streamlining these entries may improve readability without losing important information.


129-173: Test_003 Labels: Verify accuracy and reduce overlap
The labels for test_003 are comprehensive but exhibit redundant definitions. Consider whether all duplicated label entries (e.g. for char to[20] and the length calculation) are necessary. A cleanup here might reduce noise during snapshot comparison.

rules/cpp/security/missing-nul-cpp-string-memcpy-cpp.yml (3)

14-65: AST Pattern for strlen($STR.c_str()): Structural Verification
This section defines an AST pattern targeting calls where the third argument is computed via strlen($STR.c_str()). The use of nthChild and nested conditions appears well thought out. For future maintainability, adding inline comments explaining each nested condition could help clarify the matching logic for maintainers. Verify that these AST positions reliably match the target code’s structure.


66-80: Handling of Associated Type Declarations in Pattern Matching
The alternative pattern block (lines 66–80) focuses on validating type declarations for the destination buffer context. This extra check is useful for confirming the usage context of the memory copy call. Consider expanding inline documentation to clarify why these particular patterns (e.g. $TYPE $DEST[$DIM] = $$$;) are checked and whether they cover all expected scenarios.


161-405: Overall Rule Complexity and Maintainability
The rule employs multiple alternative branches with deeply nested conditions to catch a range of misuse patterns. While comprehensive, the complexity could make future maintenance challenging. Consider modularizing the rule (or adding detailed inline comments) to clarify the purpose of each branch. This can aid in both debugging and future extensions of the pattern logic.

tests/cpp/string-view-temporary-string-cpp-test.yml (1)

28-33: Invalid Test Case: Concatenation with an Undeclared Identifier
The snippet using foo + "foo" + bar introduces an undeclared symbol (bar). This appears to deliberately test the rule’s behavior when the concatenation involves an unexpected or undefined operand. Consider adding a comment in the YAML (or documentation) clarifying that this case is meant to simulate an unsafe pattern with an external or missing variable reference.

tests/__snapshots__/string-view-temporary-string-cpp-snapshot.yml (1)

1-311: Comprehensive Snapshot File Coverage
The snapshot file provides detailed label mappings for multiple scenarios of unsafe string concatenations. The various snapshot groups (covering cases like returns_std_string() + "bar", returns_std_string() + foo, literal-first concatenations, and chained concatenations) are well structured with clearly specified source ranges and label styles.
To enhance maintainability, consider adding a top-level YAML comment summarizing the purpose of the snapshots (e.g., “Expected outcomes for unsafe std::string_view constructions from temporary std::string objects”).

rules/cpp/security/string-view-temporary-string-cpp.yml (2)

1-13: Metadata and Messaging Accuracy
The rule’s metadata is clear and informative. One nitpick: there is a typo in the security message – “immeadiately” should be corrected to “immediately” for clarity.


16-907: Comprehensive Utility Definitions for Unsafe Constructions
The extensive collection of utility patterns covers a wide range of potential unsafe constructions—from assignments using direct concatenation (+), literal-first operations, to cases involving function returns and substring calls. This comprehensive approach minimizes the risk of missing a dangerous pattern. For long-term maintainability, consider adding inline comments or documentation for each utility block to explain the targeted use-case in brief.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 620d83c and df01215.

📒 Files selected for processing (6)
  • rules/cpp/security/missing-nul-cpp-string-memcpy-cpp.yml (1 hunks)
  • rules/cpp/security/string-view-temporary-string-cpp.yml (1 hunks)
  • tests/__snapshots__/missing-nul-cpp-string-memcpy-copy-cpp-snapshot.yml (1 hunks)
  • tests/__snapshots__/string-view-temporary-string-cpp-snapshot.yml (1 hunks)
  • tests/cpp/missing-nul-cpp-string-memcpy-cpp-test.yml (1 hunks)
  • tests/cpp/string-view-temporary-string-cpp-test.yml (1 hunks)
🔇 Additional comments (24)
tests/__snapshots__/missing-nul-cpp-string-memcpy-copy-cpp-snapshot.yml (3)

4-10: Test_001 Snapshot: Validate detection of missing NUL terminator
This test case demonstrates a common mistake—using strlen(from.c_str()) to determine the length, which does not include the terminating NUL. It appears designed to trigger the new rule. Please confirm that this behavior is intentional for snapshot validation.


57-63: Test_002 Snapshot: Validate alternate misuse using from.size()
This test case computes the length using from.size(), which again omits adding an extra byte for the NUL terminator. It correctly represents an unsafe usage scenario. Please double-check that the snapshot outcome is as expected for the rule validation.


122-128: Test_003 Snapshot: Validate misuse with from.length()
This snapshot uses from.length() to determine the copy length, which, similar to the other cases, does not account for the NUL terminator. Ensure that the inclusion of this variant provides distinct coverage versus the from.size() test.

tests/cpp/missing-nul-cpp-string-memcpy-cpp-test.yml (5)

3-10: Valid Test Case (test_001): Verify correct length handling
Under the “valid” section, test_001 computes the length using

size_t len_001 = strlen(from.c_str()+1);

This approach subtracts one character from what would normally be returned by strlen(from.c_str()). Typically, to correctly include the NUL terminator you would expect a calculation like strlen(from.c_str()) + 1. Please verify if the current formulation is intentional or requires adjustment.


11-18: Valid Test Case (test_002): Correct inclusion of NUL terminator
This test correctly computes the length as from.size()+1, which properly includes the terminating NUL. This case appears to be the ideal example for safe usage.


20-27: Invalid Test Case (test_001): Missing NUL terminator
Here the length is computed as strlen(from.c_str()) with no addition of an extra byte. This unsafe usage is expected to be flagged by the rule. The test case aligns with the intended invalid pattern.


28-35: Invalid Test Case (test_002): Omission of extra byte with from.size()
The usage of from.size() without adding 1 results in the NUL terminator being omitted. This invalid scenario should be captured by the new rule as designed.


36-43: Invalid Test Case (test_003): Unsafe use of from.length()
Using from.length() to determine the number of bytes to copy fails to account for the null terminator and is correctly categorized here as an unsafe pattern.

rules/cpp/security/missing-nul-cpp-string-memcpy-cpp.yml (3)

1-13: Rule Header and Metadata: Clear and Informative
The rule’s header clearly defines its purpose with an appropriate severity and message. The reference to [CWE-125] and the link to external guidelines adds useful context.


81-120: AST Pattern for $STR.size() Usage: Consistency Check
This block addresses cases where the length is determined via $STR.size(). It correctly expects the absence of a +1 and thus a missing NUL terminator. Please verify that the nthChild checks and nested conditions mirror those in the strlen block, ensuring consistent rule behavior.


121-160: AST Pattern for $STR.length() Usage: Consistency Across Variants
The subsequent alternative captures scenarios using $STR.length(). Its structure should be consistent with the $STR.size() handling so that all unsafe patterns are flagged equivalently. Confirm that no edge cases are missed in these variants.

tests/cpp/string-view-temporary-string-cpp-test.yml (6)

1-8: Valid Test Case Definition: Clear and Concise
The valid test snippet correctly assigns the result of string concatenation to a temporary variable of type std::string, ensuring safety. The YAML structure and indentation are clear.


9-15: Invalid Test Case: Direct Assignment from Temporary
This snippet demonstrates an unsafe use—assigning the result of foo + "bar" (a temporary std::string) directly to a std::string_view. This is exactly the scenario the new rule intends to catch.


16-21: Invalid Test Case: Reversed Concatenation Order
The test case with "bar" + foo correctly represents an unsafe assignment where a literal is used on the left side. This helps ensure the rule detects concatenation order issues.


22-27: Invalid Test Case: Chained String Literals and Variables
Here the concatenation involves multiple operands (foo + foo + "bar"). This chained operation, which yields a temporary, is properly modeled as an invalid case for a std::string_view assignment.


34-40: Invalid Test Case: Unsafe External Function Usage
By concatenating the temporary returned from returns_std_string() with foo, this test case simulates a dangerous scenario originating from an external function’s return value. This is well aligned with the rule’s objectives.


41-47: Invalid Test Case: External Function and Literal Combination
This snippet tests the unsafe construction of a std::string_view from the temporary result of returns_std_string() concatenated with a literal. It accurately represents one of the unsafe patterns the rule is targeting.

rules/cpp/security/string-view-temporary-string-cpp.yml (7)

14-14: Ast-Grep Essentials Flag Usage
The ast-grep-essentials: true flag is appropriately set, enabling the tool to use tree-sitter–based matching effectively.


16-39: Utility Pattern for std::to_string(...) Matching
The first utility pattern captures assignments where a std::string_view is being set via a call to std::to_string. The conditions—ensuring the target identifier matches string_view or wstring_view and that the call expression points to std::to_string—are correctly structured.


40-62: Utility Pattern for Substring Method Matching
The rule properly defines a pattern to catch assignments that use the .substr(...) method on a string (or wstring). The usage of regex and structure in the nested matching conditions looks correct.


63-82: Utility Pattern for Binary '+' Concatenation
This pattern ensures that assignments involving a binary + expression—with the left-hand value being an identifier matching wstring or string—are detected when assigned to a string_view. The approach is logical and concise.


83-106: Utility Pattern for Literal-First Concatenations
The construction matching assignments that start with a literal (i.e. "..." + $EXPR) is well defined. It correctly ensures that the string literal and the subsequent identifier conform to the expected types (e.g. string or wstring).


107-130: Utility Pattern for Instance Initialization via Concatenation
This pattern targets cases where the assignment is part of a variable’s initialization using literal concatenation. The nested conditions that check for corresponding declarations and types provide robust matching.


908-944: Rule Aggregation Block
The rule: section successfully aggregates the previously defined utility patterns and specifies conditions under which the rule should trigger. It distinguishes between different kinds of expressions (e.g. expression statements, call expressions, declarations) and excludes any with errors in the surrounding code. This design is both thorough and effective for detecting unsafe std::string_view constructions.

@ganeshpatro321 ganeshpatro321 merged commit aa47937 into coderabbitai:main Mar 26, 2025
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants