Skip to content

FIX: Stop silently dropping first two rows during load_mapping #33076

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 9, 2025

Conversation

s3lase
Copy link
Contributor

@s3lase s3lase commented Jun 4, 2025

Currently, the first two rows returned by DiscourseDB#query_array are silently dropped during the column size check in DiscourseDB#load_mapping. This happens because the rows object, while an enumerator, isn't fully compliant, it doesn't rewind during introspection. As a result, calls like #first, #peek, or #any? advance the iterator.

Ideally, we’d fix this by updating the query_array enumeration implementation. However, customizing the enumerator to be fully compliant would likely introduce unnecessary perf overhead for all use cases. So, this fix works around that limitation by building the map a little differently.

Currently, the first two rows returned by `DiscourseDB#query_array` are
silently dropped during the column size check in `DiscourseDB#load_mapping`.
This happens because the rows object, while an enumerator, isn't fully compliant,
it doesn't rewind during introspection. As a result, calls like `#first`, `#peek`,
or `#any?` advance the iterator.

Ideally, we’d fix this by updating the `query_array` enumeration implementation.
However, customizing the enumerator to be fully compliant would likely introduce
unnecessary perf overhead for all use cases. So, this fix works around that
limitation by building the map a little differently.
@s3lase s3lase requested a review from a team as a code owner June 4, 2025 21:56
@github-actions github-actions bot added the migrations-tooling PR which includes changes to migrations tooling label Jun 4, 2025
@s3lase s3lase requested a review from gschlager June 4, 2025 22:05
Copy link
Member

@gschlager gschlager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

I guess we should make it more explicit by using a custom enumerator in DiscourseDB to avoid making this mistake in the future.

class NonRewindableEnumerator < Enumerator
  def rewind
    raise RuntimeError, "Rewind not supported on this enumerator"
  end
end

@s3lase
Copy link
Contributor Author

s3lase commented Jun 9, 2025

Thanks!

I explored extending behaviour with a custom enumerator but decided it'd add unneeded overhead at this point. I think having a custom enumerator which simply throws an exception might be simpler

@s3lase s3lase merged commit d304e70 into main Jun 9, 2025
6 checks passed
@s3lase s3lase deleted the fix/mt/load-mapping branch June 9, 2025 21:27
martin-brennan pushed a commit that referenced this pull request Jun 10, 2025
)

Currently, the first two rows returned by `DiscourseDB#query_array` are
silently dropped during the column size check in
`DiscourseDB#load_mapping`. This happens because the rows object, while
an enumerator, isn't fully compliant, it doesn't rewind during
introspection. As a result, calls like `#first`, `#peek`, or `#any?`
advance the iterator.

Ideally, we’d fix this by updating the `query_array` enumeration
implementation. However, customizing the enumerator to be fully
compliant would likely introduce unnecessary perf overhead for all use
cases. So, this fix works around that limitation by building the map a
little differently.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
migrations-tooling PR which includes changes to migrations tooling
Development

Successfully merging this pull request may close these issues.

2 participants