Skip to content

Support CODDTest for DuckDB #1224

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

DerZc
Copy link
Contributor

@DerZc DerZc commented Apr 30, 2025

Support CODDTest for DuckDB. This is a sub-task of the larger implementation tracked in #1054.

Copy link
Contributor

@mrigger mrigger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. I looked through the files and added some comments and suggestions. Let me know once you've had a look at them. In particular, the DuckdbCODDTestOracle is difficult to understand, and I think it will be tough to maintain and extend it in the future. However, I also understand that refactoring this now might be challenging. I'm fine with merging the current state, perhaps after addressing some of the easy issues, but it would also be good to think about what we could improve.

@@ -57,6 +57,14 @@ public static List<String> getExpressionErrors() {
errors.add("Cannot subtract infinite timestamps");
errors.add("Timestamp difference is out of bounds");

// added by CODDTest
// errors.add("must appear in the GROUP BY clause or be used in an aggregate function");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we can remove these commented-out messages?

@@ -77,6 +77,25 @@ public class DuckDBOptions implements DBMSSpecificOptions<DuckDBOracleFactory> {
@Parameter(names = "--max-num-updates", description = "The maximum number of UPDATE statements that are issued for a database", arity = 1)
public int maxNumUpdates = 5;

public enum CODDTestModel {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the same for all CODDTest implementations, perhaps we could extract this to a common enum with the next PR?

@@ -41,9 +41,19 @@ public static class DuckDBCompositeDataType {

private final int size;

// This is used to handle the type that out of the scope of our code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't understand the comment and use of this. Could we perhaps explain or give an example?

}

public void visit(DuckDBTypeCast expr) {
if (expr.getExpression() instanceof DuckDBNullConstant) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we are re-interpreting the AST here (e.g., if it's a NULL, we ignore the cast). In the other parts, I think we treat the AST as a ground truth (i.e., we just directly print its structure). Would it be possible to construct the AST earlier on correctly, by omitting casts if we have null values or if the type is unknown?

innerQuery.setFetchColumns(Arrays.asList(innerQueryFetchColumn));
}

// for (Map.Entry<Integer, DuckDBCompositeDataType> entry: idxTypeMap.entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove such commented-out code?

return true;
} else {
System.out.printf("Wrong option of --coddtest-model, should be one of: RANDOM, EXPRESSION, SUBQUERY");
System.exit(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using a switch-case on the enum would be clearer here, throwing an AssertionError in the default case.

// }
for (int i = 0; i < v1Value.size(); ++i) {
if (!v1Value.get(i).equals(v2Value.get(i))) {
String regx = "[+-]*\\d+\\.?\\d*[Ee]*[+-]*\\d+";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this regex do? We should document this.

String regx = "[+-]*\\d+\\.?\\d*[Ee]*[+-]*\\d+";
Pattern pattern = Pattern.compile(regx);
if (pattern.matcher(v1Value.get(i)).matches() && pattern.matcher(v2Value.get(i)).matches()) {
if (!v1Value.get(i).substring(0, 6).equals(v2Value.get(i).substring(0, 6))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the 6 come from? It would be good to document this.

return joinExpressions;
}

private List<DuckDBExpression> genOrderBys(DuckDBExpression specificCondition) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have similar logic in the expression generator of DuckDB? If so, we can reuse it?


@Override
public void check() throws Exception {
reproducer = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the logic in this class and method is difficult to understand. It would be ideal to refactor it, for example, by extracting methods and reducing code duplication between CODDTest implementations. However, I also understand that this might be risky, as we would no longer be certain that the implementation still works after refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants