Skip to content

Fix search case-sensitivity by adding keyword subfields with lowercase normalizer to ElasticSearch mappings #1178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Copilot
Copy link
Contributor

@Copilot Copilot AI commented Jul 28, 2025

Fixes #1173

Problem

The search functionality was case-sensitive, causing different results for queries like "Symfony" vs "symfony". This was due to ElasticSearch term queries being case-sensitive exact matches, and the indexed content not being normalized for case-insensitive matching.

Example of the issue:

  • https://www.yiiframework.com/search?type=news&q=Symfony
  • https://www.yiiframework.com/search?type=news&q=symfony

These URLs would return different search results, which is unexpected behavior for users.

Root Cause

The original implementation had two issues:

  1. Query-side: Used case-sensitive term queries on text fields in models/search/SearchActiveRecord.php
  2. Index-side: ElasticSearch mappings didn't provide case-insensitive keyword fields for exact matching

Even with query-side lowercasing, the indexed content remained in original case, causing term queries to fail when searching for lowercased terms against mixed-case indexed data.

Solution

Implemented a comprehensive fix using ElasticSearch's built-in normalization capabilities:

1. Added Lowercase Normalizer

Added custom lowercase normalizer to index settings across all search models:

'analysis' => [
    'normalizer' => [
        'lowercase' => [
            'type' => 'custom',
            'filter' => ['lowercase']
        ]
    ]
]

2. Added Keyword Subfields

Enhanced field mappings to include keyword subfields with lowercase normalizer:

'name' => [
    'type' => 'text',
    'fields' => [
        // existing subfields...
        'keyword' => [
            'type' => 'keyword',
            'normalizer' => 'lowercase'
        ],
    ],
],

3. Updated Term Queries

Modified exact match queries to use new keyword subfields:

// Before: case-sensitive with manual lowercasing
['term' => ['name' => mb_strtolower($queryString)]]

// After: case-insensitive via normalized keyword field  
['term' => ['name.keyword' => $queryString]]

Changes Made

SearchActiveRecord.php:

  • Updated term queries to use .keyword subfields
  • Removed manual mb_strtolower() calls since normalization happens at index level
  • Updated comments to reflect new approach

All Search Models (SearchApiType, SearchExtension, SearchGuideSection, SearchNews, SearchWiki):

  • Added lowercase normalizer to index settings
  • Added keyword subfields to name and title field mappings

Benefits

  • Complete case-insensitivity: Both indexing and querying handle case normalization properly
  • Better performance: Keyword fields are more efficient for exact matching than text fields with manual lowercasing
  • Backward compatibility: Original text fields remain unchanged for fuzzy/analyzed searches
  • Maintainable: Uses ElasticSearch's built-in capabilities instead of application-level workarounds
  • International support: Proper Unicode lowercasing via ElasticSearch filters

Testing

The fix handles various cases correctly:

  • "Symfony" and "symfony" now produce identical ElasticSearch queries
  • International characters work properly ("ÁÉÍÓÚ""áéíóú")
  • Mixed case inputs are normalized ("ArrayHelper""arrayhelper")
  • Autocompletion suggestions are case-insensitive
  • Fuzzy matching on analyzed fields remains intact

💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

…ueries case-insensitive

Co-authored-by: samdark <47294+samdark@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] @yiisoft-contrib/yiiframework.com/issues/1173 Need to make search case-insensitive. The search is using ElasticSearch right now. Fix search case-sensitivity by making ElasticSearch term and prefix queries case-insensitive Jul 28, 2025
@Copilot Copilot AI requested a review from samdark July 28, 2025 13:18
Copilot finished work on behalf of samdark July 28, 2025 13:18
@samdark
Copy link
Member

samdark commented Jul 29, 2025

@copilot ensure indexed content is lowercased as well.

…exact matching

Co-authored-by: samdark <47294+samdark@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Jul 29, 2025

@copilot ensure indexed content is lowercased as well.

Fixed! Added keyword subfields with lowercase normalizer to all search model mappings and updated term queries to use these new fields. This ensures both the indexed content and queries are properly lowercased for case-insensitive exact matching. Changes in commit b42309a.

@Copilot Copilot AI changed the title Fix search case-sensitivity by making ElasticSearch term and prefix queries case-insensitive Fix search case-sensitivity by adding keyword subfields with lowercase normalizer to ElasticSearch mappings Jul 29, 2025
Copilot finished work on behalf of samdark July 29, 2025 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make search case insensitive
2 participants