Skip to content

fix multi-hop traverse with name path #2743

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 27 commits into from
Jan 23, 2023
Merged

Conversation

nafraf
Copy link
Contributor

@nafraf nafraf commented Dec 1, 2022

This PR is to solve #2739 "Exact path length queries don't seem to respect the path length"

The problem seems to be that QGEdge_VariableLength() can't detect correctly that the edge has variable length when minHops == maxHops, because it returns (e->minHops != e->maxHops).

In queries like the following, there are variable length edges where minHops == maxHops, and ```QGEdge_VariableLength()`` erroneously return false:

GRAPH.QUERY g "MATCH p = ({v:'a'})-[*2]-({v:'c'}) RETURN length(p)"
GRAPH.QUERY g "MATCH p = ({v:'a'})-[*2..2]-({v:'c'}) RETURN length(p)"

Then, the idea is to create a new property varLen that is true when the edge has variable length specified, without consider the values of minHops, maxHops or even if they are missing.

Comments are welcome.

@codecov
Copy link

codecov bot commented Dec 1, 2022

Codecov Report

Base: 91.19% // Head: 91.17% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (cc8251a) compared to base (f0b0c27).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2743      +/-   ##
==========================================
- Coverage   91.19%   91.17%   -0.03%     
==========================================
  Files         267      267              
  Lines       25516    25519       +3     
==========================================
- Hits        23270    23267       -3     
- Misses       2246     2252       +6     
Impacted Files Coverage Δ
...on_plan/execution_plan_build/build_match_op_tree.c 100.00% <100.00%> (ø)
src/graph/entities/qg_edge.c 98.64% <100.00%> (+0.05%) ⬆️
src/graph/rg_matrix/rg_set_element_uint64.c 89.36% <0.00%> (-8.52%) ⬇️
src/query_ctx.c 87.68% <0.00%> (-4.35%) ⬇️
src/procedures/proc_sp_paths.c 95.41% <0.00%> (-0.86%) ⬇️
src/index/index_construct.c 96.70% <0.00%> (+1.09%) ⬆️
src/procedures/proc_ss_paths.c 91.86% <0.00%> (+1.74%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@nafraf nafraf marked this pull request as ready for review December 2, 2022 00:19
Copy link
Contributor

@AviAvni AviAvni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

query = """CREATE (a {v:'a'}), (b {v:'b'}), (c {v:'c'}), (d {v:'d'}), (a)-[:R]->(b), (b)-[:R]->(c), (c)-[:R]->(a), (d)-[:R]->(d)"""
actual_result = redis_graph.query(query)
query_to_expected_result = {
"MATCH p = ({v:'a'})-[*2]-({v:'c'}) RETURN length(p)" : [[2]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is not variable length it fixed length

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it's fixed length.
Should I rename the QGEdge_VariableLength() method and the property varLen? Because with the change of this PR, it detects if a range length was specified, fixed or variable.

And maybe the function NewCondVarLenTraverseOp() should be renamed too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed some functions. Please check if you agree.
QGEdge_RangeLength() to QGEdge_RangeLength()
_AlgebraicExpression_IsVarLen() to _AlgebraicExpression_IsRangeLen()

Property varLen renamed to rangeLen

The test were not removed, just renamed, for all tests rangeLen == true
test11_variable_length_edges() to test11_range_length_edges()

query_to_expected_result = {
"MATCH p = ({v:'a'})-[*2]-({v:'c'}) RETURN length(p)" : [[2]],
"MATCH p = ({v:'a'})-[*2..]-({v:'c'}) RETURN length(p)" : [[2]],
"MATCH p = ({v:'a'})-[*2..2]-({v:'c'}) RETURN length(p)" : [[2]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is not variable length it fixed length

"MATCH p = ({v:'a'})-[*2..2]-({v:'c'}) RETURN length(p)" : [[2]],
"MATCH p = ({v:'a'})-[*]-({v:'c'}) RETURN length(p)" : [[2],[1]],
"MATCH p = ({v:'a'})-[*..]-({v:'c'}) RETURN length(p)" : [[2],[1]],
"MATCH p = ({v:'d'})-[*0]-() RETURN length(p)" : [[0]],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is not variable length it fixed length

@nafraf nafraf changed the title Add varLen property to QGEdge Add rangeLen property to QGEdge Dec 6, 2022
@swilly22
Copy link
Contributor

swilly22 commented Jan 17, 2023

Following up on this, the original issue:

GRAPH.QUERY g "CREATE (a{v:'a'}),(b{v:'b'}),(c{v:'c'}),(a)-[:R]->(b),(b)-[:R]->(c),(c)-[:R]->(a)"
GRAPH.QUERY g "MATCH p = ({v:'a'})-[*2]-({v:'c'}) RETURN length(p)"

Creates a graph where c can be reached from a in multiple ways:
1. a->b->c
2. a<-c

The request in the READ query is to get from a to c via 2 hops ({v:'a'})-[*2]-({v:'c'})
The only path which will get us there is: a->b->c

The appropriate Algebraic expression which denotes this pattern is: THE_ADJ * THE_ADJ applied by a single
Conditional Traverse operation.

GRAPH.explain g "MATCH p = ({v:'a'})-[*2]->({v:'c'}) RETURN p"
1) "Results"
2) "    Project"
3) "        Filter"
4) "            Conditional Traverse | (@anon_0)-[@anon_2*2..2]->(@anon_1)"
5) "                Filter"
6) "                    All Node Scan | (@anon_0)"

The problem here is that Conditional Traverse performs traversal via matrix matrix multiplication which "losses" traversed edges along the path. after the operation finishes we're left with the following information:

  1. src node
  2. dest node
  3. last traversed edge

This lost of path critical information makes the Path object construction not possible.

There are two solutions as we see it:

  1. when dealing with named paths of length > 1 always use DFS (Conditional Variable Length Traverse) (probably less performant compared to BFS via MxM)
  2. use multiple Conditional Traverse operations one for each hop along the constant length path. (We might want to consider introducing a variant to our Conditional Traverse one which performs multiple traversals via MxM, this should yield better performance then multiple Conditional Traverse operations)

This PR seems to have taken the first approach which is OK, but performance wise we might want to use multiple Conditional Traverse operations if the number of hops is say less than 5.

@@ -62,6 +62,12 @@ bool QGEdge_VariableLength
const QGEdge *e
);

// determine whether this is a edge with a range of length defined (variable length [*minHops..maxHops] or fixed length [*Hops])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think comment need update here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was updating the comment I noticed that maybe this function should be replaced by a QGEdge_LengthOne(). See comment below.

@nafraf
Copy link
Contributor Author

nafraf commented Jan 19, 2023

Thanks for the comments and description of the algorithm. Now the code is more clear to me.

With the first approach, to support the case ()-[*0]-() I had to use the Conditional Variable Length Traverse, then I think that QGEdge_RangeLength() should be replaced by a function QGEdge_LengthOne() which returns true if length is one. What do you think about it?

Copy link
Contributor

@AviAvni AviAvni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename function

@AviAvni AviAvni changed the title Add rangeLen property to QGEdge fix multi-hop traverse with name path Jan 23, 2023
@AviAvni AviAvni merged commit cc57c35 into RedisGraph:master Jan 23, 2023
@nafraf nafraf deleted the bug#2739 branch January 25, 2023 01:49
raz-mon pushed a commit that referenced this pull request Jan 25, 2023
* Add varLen property to QGEdge

* Test variable-length edge queries

* Fix spacing

* Fix test, remove direction

* Fix test

* Test zero length edge

* Init varLen in QGEdge_New()

* rename varLen to rangeLen

* Rewrite test11_range_length_edges()

* Use QGEdge_VariableLength() and QGEdge_RangeLength

* Replace QGEdge_RangeLength() by QGEdge_LengthOne()

* rename function

Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
Co-authored-by: Avi Avni <avi.avni@gmail.com>
raz-mon pushed a commit that referenced this pull request Jan 25, 2023
* Add varLen property to QGEdge

* Test variable-length edge queries

* Fix spacing

* Fix test, remove direction

* Fix test

* Test zero length edge

* Init varLen in QGEdge_New()

* rename varLen to rangeLen

* Rewrite test11_range_length_edges()

* Use QGEdge_VariableLength() and QGEdge_RangeLength

* Replace QGEdge_RangeLength() by QGEdge_LengthOne()

* rename function

Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
Co-authored-by: Avi Avni <avi.avni@gmail.com>
AviAvni added a commit that referenced this pull request Jan 25, 2023
* Fix toInteger() and toIntegerOrNull() don't convert Booleans (#2797)

* AR_TOINTEGER: add case for type T_BOOL

* skip [8] toInteger() failing on invalid arguments

* Update TypeConversion2.feature

Co-authored-by: Avi Avni <avi.avni@gmail.com>

* Fix right(), left() length validation (#2801)

* Fix AR_LEFT, AR_RIGHT length validation

* Test mistmatch type and fix error message

* left(), right(): NULL is a valid type for lenght

* Fix argument validation

Co-authored-by: Avi Avni <avi.avni@gmail.com>

* fix using timeout max as default when default not configured (#2808)

* fix using timeout max as default when default not configured

* fix

* do not cache query params (#2803)

* do not cache query params

* address PR comments

* fix leak

* fix test

* convert params to SIValues

* wip

* delay freeing of parsed params to after the query is parsed

* free parse params when freeing AST

* revert changes to ast_validations.c

* revert arithmetic_expression_construct changes

* emit an error when param value eval fails

Co-authored-by: Avi Avni <avi.avni@gmail.com>

* fix multi-hop traverse with name path (#2743)

* Add varLen property to QGEdge

* Test variable-length edge queries

* Fix spacing

* Fix test, remove direction

* Fix test

* Test zero length edge

* Init varLen in QGEdge_New()

* rename varLen to rangeLen

* Rewrite test11_range_length_edges()

* Use QGEdge_VariableLength() and QGEdge_RangeLength

* Replace QGEdge_RangeLength() by QGEdge_LengthOne()

* rename function

Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
Co-authored-by: Avi Avni <avi.avni@gmail.com>

* indegree(), outdegree() - accept array of strings (#2807)

* _AR_NodeDegree() support T_ARRAY argument

* Add tests

* Fix argument validation

* Free array in case of error detected.

* Duplicate comments and use siarray.array[i]

* Update AR_IN() to use SI_ArrayContainsValue()

* Validate comparedNull pointer

* Support only the following two signatures:

* Test indegree() wrong argument number

* Test wrong signature: string and list

* clean

* revert

Co-authored-by: Avi Avni <avi.avni@gmail.com>

* Add ipv6 capability (#2842)

* fix heap remove item (#2827)

* fix heap remove item

* remove utf8 deps

* add test

* wip

* wip

* all unit tests pass

* heap remove element should use pointer cmp instead of cmp cb

* spaces to tabs

Co-authored-by: swilly22 <roi@redislabs.com>
Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>

* bump version 2.10.6

* Improve edge deletion performance (#2758)

* improve edge deletion

* update simple timer

* improve edge deletion perf

* fix build

* fix test

* fix test

* clean

* address review

Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>

* update ramp

Co-authored-by: nafraf <nafraf@users.noreply.github.com>
Co-authored-by: Avi Avni <avi.avni@gmail.com>
Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
Co-authored-by: Guy Korland <gkorland@gmail.com>
Co-authored-by: swilly22 <roi@redislabs.com>
AviAvni added a commit that referenced this pull request Jan 25, 2023
* do not cache query params (#2803)

* do not cache query params

* address PR comments

* fix leak

* fix test

* convert params to SIValues

* wip

* delay freeing of parsed params to after the query is parsed

* free parse params when freeing AST

* revert changes to ast_validations.c

* revert arithmetic_expression_construct changes

* emit an error when param value eval fails

Co-authored-by: Avi Avni <avi.avni@gmail.com>

* fix multi-hop traverse with name path (#2743)

* Add varLen property to QGEdge

* Test variable-length edge queries

* Fix spacing

* Fix test, remove direction

* Fix test

* Test zero length edge

* Init varLen in QGEdge_New()

* rename varLen to rangeLen

* Rewrite test11_range_length_edges()

* Use QGEdge_VariableLength() and QGEdge_RangeLength

* Replace QGEdge_RangeLength() by QGEdge_LengthOne()

* rename function

Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
Co-authored-by: Avi Avni <avi.avni@gmail.com>

* Add ipv6 capability (#2842)

* fix heap remove item (#2827)

* fix heap remove item

* remove utf8 deps

* add test

* wip

* wip

* all unit tests pass

* heap remove element should use pointer cmp instead of cmp cb

* spaces to tabs

Co-authored-by: swilly22 <roi@redislabs.com>
Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>

* bump version 2.8.22

* update ramp

* Improve edge deletion performance (#2758)

* improve edge deletion

* update simple timer

* improve edge deletion perf

* fix build

* fix test

* fix test

* clean

* address review

Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>

* fix build

* fix build

* fix test

* fix

* fix test

* fix

* fix

* fix

Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
Co-authored-by: Avi Avni <avi.avni@gmail.com>
Co-authored-by: nafraf <nafraf@users.noreply.github.com>
Co-authored-by: Guy Korland <gkorland@gmail.com>
Co-authored-by: swilly22 <roi@redislabs.com>
pnxguide pushed a commit to CMU-SPEED/RedisGraph that referenced this pull request Mar 22, 2023
* Add varLen property to QGEdge

* Test variable-length edge queries

* Fix spacing

* Fix test, remove direction

* Fix test

* Test zero length edge

* Init varLen in QGEdge_New()

* rename varLen to rangeLen

* Rewrite test11_range_length_edges()

* Use QGEdge_VariableLength() and QGEdge_RangeLength

* Replace QGEdge_RangeLength() by QGEdge_LengthOne()

* rename function

Co-authored-by: Roi Lipman <swilly22@users.noreply.github.com>
Co-authored-by: Avi Avni <avi.avni@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants