@jserv jserv commented Aug 25, 2025

This adds string interning to reduce memory usage by deduplicating identical identifier strings throughout the compilation process. It ensures that each unique identifier string is stored only once in memory, with all references pointing to the single interned copy.

The implementation uses a hashmap-based string pool that checks for existing strings before allocating new ones. String interning is now applied comprehensively across all identifier types for maximum memory efficiency.
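
A minimal sketch of the lookup-before-allocate idea described above. It is not the code from this pull request: `intern_pool_t`, `intern_string`, the FNV-1a hash, and the fixed bucket count are all assumptions chosen for illustration, and it uses `malloc` rather than the project's arena allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define POOL_BUCKETS 512 /* arbitrary size for this sketch */

/* One chained entry holding the single canonical copy of a string. */
typedef struct intern_node {
    struct intern_node *next;
    char *str;
} intern_node_t;

/* Hypothetical pool type; must be zero-initialized before first use. */
typedef struct {
    intern_node_t *buckets[POOL_BUCKETS];
} intern_pool_t;

/* FNV-1a hash over the bytes of a NUL-terminated string. */
static unsigned intern_hash(const char *s)
{
    unsigned h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/* Return the canonical copy of s, allocating only the first time s is seen. */
const char *intern_string(intern_pool_t *pool, const char *s)
{
    unsigned idx = intern_hash(s) % POOL_BUCKETS;

    /* Look for an existing copy before allocating anything. */
    for (intern_node_t *n = pool->buckets[idx]; n; n = n->next) {
        if (!strcmp(n->str, s))
            return n->str;
    }

    /* Not found: store one copy and link it at the head of the chain. */
    intern_node_t *n = malloc(sizeof(*n));
    assert(n);
    size_t len = strlen(s) + 1;
    n->str = malloc(len);
    assert(n->str);
    memcpy(n->str, s, len);
    n->next = pool->buckets[idx];
    pool->buckets[idx] = n;
    return n->str;
}
```

With a pool declared as `intern_pool_t pool = {0};`, two calls to `intern_string` with equal contents return the same pointer, so interned identifiers can be compared by address and each distinct name is stored once.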

Benefits:

  • Reduces memory usage by 3-5% for typical programs with duplicate identifiers (e.g., common parameter names like 'x', 'y', 'width')

Summary by Bito

This pull request implements string interning, which deduplicates identical identifier strings during compilation. It introduces a hashmap-based string pool that reduces memory usage by 3-5% for programs with duplicate identifiers, particularly in 'src/parser.c'.

@jserv jserv force-pushed the string-interning branch from 71563fd to bc5a2f1 Compare August 25, 2025 03:36
@sysprog21 sysprog21 deleted a comment from bito-code-review bot Aug 25, 2025
Comment on lines +1166 to +1205
/* Initialize string pool for identifier deduplication */
string_pool = arena_alloc(GENERAL_ARENA, sizeof(string_pool_t));
string_pool->strings = hashmap_create(512);

/* Initialize string literal pool for deduplicating string constants */
string_literal_pool =
    arena_alloc(GENERAL_ARENA, sizeof(string_literal_pool_t));
string_literal_pool->literals = hashmap_create(256);

Collaborator

It seems the hashmaps won't be freed afterwards in global_release()?

Collaborator Author

> It seems the hashmaps won't be freed afterwards in global_release()?

Yes, the hashmaps for string pools were not being freed in global_release().
Fixed now by adding proper cleanup for both string pool hashmaps.
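
For readers following the thread, the kind of cleanup being discussed might look like the sketch below. Everything here is hypothetical: `map_t`, `map_create`, `map_free`, the `*_sketch_t` pool types, and `release_pools` are stand-ins written for this example, not the project's hashmap API or the actual change to `global_release()`. It assumes the hashmaps own heap memory that is separate from the arena holding the pool structs.

```c
#include <stdlib.h>

/* Hypothetical chained hashmap, standing in for the project's own type. */
typedef struct node {
    struct node *next;
    char *key;
} node_t;

typedef struct {
    node_t **buckets;
    size_t cap;
} map_t;

/* Allocate an empty map with cap buckets; returns NULL on failure. */
map_t *map_create(size_t cap)
{
    map_t *m = malloc(sizeof(*m));
    if (!m)
        return NULL;
    m->buckets = calloc(cap, sizeof(*m->buckets));
    if (!m->buckets) {
        free(m);
        return NULL;
    }
    m->cap = cap;
    return m;
}

/* Free every chain, then the bucket array, then the map itself.
 * Returns the number of entries released; accepts NULL. */
size_t map_free(map_t *m)
{
    size_t freed = 0;
    if (!m)
        return 0;
    for (size_t i = 0; i < m->cap; i++) {
        node_t *n = m->buckets[i];
        while (n) {
            node_t *next = n->next;
            free(n->key);
            free(n);
            n = next;
            freed++;
        }
    }
    free(m->buckets);
    free(m);
    return freed;
}

/* Stand-ins for the two pools, each owning one map. */
typedef struct { map_t *strings; } string_pool_sketch_t;
typedef struct { map_t *literals; } literal_pool_sketch_t;

/* Release both maps and clear the pointers so a second call is harmless. */
void release_pools(string_pool_sketch_t *sp, literal_pool_sketch_t *lp)
{
    if (sp) {
        map_free(sp->strings);
        sp->strings = NULL;
    }
    if (lp) {
        map_free(lp->literals);
        lp->literals = NULL;
    }
}
```

Clearing the pointers after freeing is a design choice of this sketch: it makes a repeated release a no-op instead of a double free.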

@jserv jserv force-pushed the string-interning branch from bc5a2f1 to 2a442d1 Compare August 25, 2025 13:19
@sysprog21 sysprog21 deleted a comment from bito-code-review bot Aug 25, 2025
@jserv jserv requested a review from ChAoSUnItY August 25, 2025 13:36
@jserv jserv force-pushed the string-interning branch from 2a442d1 to 1bc6b04 Compare August 25, 2025 13:43
@sysprog21 sysprog21 deleted a comment from bito-code-review bot Aug 25, 2025
@jserv jserv force-pushed the string-interning branch from 1bc6b04 to d6e1889 Compare August 25, 2025 14:02
@sysprog21 sysprog21 deleted a comment from bito-code-review bot Aug 25, 2025
@jserv jserv merged commit 3ce9b6d into master Aug 25, 2025
12 checks passed
@jserv jserv deleted the string-interning branch August 25, 2025 16:38