Skip to content

Question: Most efficient way to set or remove the diagonal in large matrices #487

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
q-aaronzolnailucas opened this issue Jul 31, 2023 · 8 comments
Labels
discussion Discussing a topic with no specific actions yet

Comments

@q-aaronzolnailucas
Copy link

Hi, what's the best way to set or remove the diagonal (self-loops) from a graphblas Matrix?

@q-aaronzolnailucas q-aaronzolnailucas changed the title Question: Most efficient way to set the diagonal in large matrices Question: Most efficient way to set or remove the diagonal in large matrices Jul 31, 2023
@eriknw
Copy link
Member

eriknw commented Jul 31, 2023

Good questions, thanks for asking! Here are a few examples:

>>> import graphblas as gb
>>> import numpy as np

>>> A = gb.Matrix.from_dense(np.arange(9).reshape(3, 3))
>>> A
"M_0"      nvals  nrows  ncols  dtype  format
gb.Matrix      9      3      3  INT64   fullr
---------------------------------------------
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8

>>> # Diagonal of a Matrix as a Vector
>>> A.diag()
"v_0"      nvals  size  dtype  format
gb.Vector      3     3  INT64    full
-------------------------------------
index  0  1  2
value  0  4  8

>>> # Off-diagonal of a Matrix as a Vector
>>> A.diag(1)
A"v_1"      nvals  size  dtype  format
gb.Vector      2     2  INT64    full
-------------------------------------
index  0  1
value  1  5

>>> # Select diagonal of a Matrix as a Matrix
>>> A.select("diag")
gb.MatrixExpression                         nrows  ncols  dtype
M_0.select(select.diag[BOOL], thunk=False)      3      3  INT64

"Result"   nvals  nrows  ncols  dtype
gb.Matrix      3      3      3  INT64
-------------------------------------
   0  1  2
0  0
1     4
2        8

Do expr.new() or other << expr to calculate the expression.

>>> # Select off-diagonal of a Matrix as a Matrix (using alternative API)
>>> gb.select.diag(A, 1)
gb.MatrixExpression                     nrows  ncols  dtype
M_0.select(select.diag[BOOL], thunk=1)      3      3  INT64

"Result"   nvals  nrows  ncols  dtype
gb.Matrix      2      3      3  INT64
-------------------------------------
  0  1  2
0    1
1       5
2

Do expr.new() or other << expr to calculate the expression.

>>> # Remove diagonal elements of a Matrix using select
>>> A << gb.select.offdiag(A)
>>> A
"M_0"      nvals  nrows  ncols  dtype   format
gb.Matrix      6      3      3  INT64  bitmapr
----------------------------------------------
   0  1  2
0     1  2
1  3     5
2  6  7

>>> # Assign a Vector to the diagonal of a Matrix
>>> v = gb.Vector.from_dense([1, 2, 3])
>>> v
A(Diag.S) << Diag"v_2"      nvals  size  dtype  format
gb.Vector      3     3  INT64    full
-------------------------------------
index  0  1  2
value  1  2  3

>>> A(gb.op.second) << v.diag()

>>> # or
>>> Diag = v.diag()
>>> A(Diag.S) << Diag
>>> A
"M_0"      nvals  nrows  ncols  dtype  format
gb.Matrix      9      3      3  INT64   fullr
---------------------------------------------
   0  1  2
0  1  1  2
1  3  2  5
2  6  7  3

Does this help? Is there any way we can make this better?

@eriknw eriknw added the discussion Discussing a topic with no specific actions yet label Jul 31, 2023
@eriknw
Copy link
Member

eriknw commented Jul 31, 2023

I would be curious to see which of these is faster in general to assign a Vector to a diagonal:

>>> A(gb.op.second) << v.diag()

or

>>> Diag = v.diag()
>>> A(Diag.S) << Diag

I would expect and hope they are comparable.

@q-aaronzolnailucas
Copy link
Author

q-aaronzolnailucas commented Aug 1, 2023

@eriknw thank you for the detailed response - and apologies if this would have fit better in 'discussions', will keep in mind next time.

I will try both

>>> A(gb.op.second) << v.diag()

and

>>> Diag = v.diag()
>>> A(Diag.S) << Diag

and report back if I see significant differences.

As to improvements - I think adding/removing self loops is quite a common operation - e.g. removing loops before pagerank or adding them before GNN message passing.

Scipy has a setdiag method that is quite useful, maybe wrapping some of your answer into something like that (but that fits the lazy execution model).

I think particularly

A(gb.op.second) << v.diag()

is quite hard to read. It's also more verbose to set the diagonal to a scalar.

I'd be happy to have a go at implementing a setdiag method if it fits in the paradigm

@jim22k
Copy link
Member

jim22k commented Aug 2, 2023

Removing the diagonal is easy.

gb.select.offdiag(A)

Setting the diagonal is necessarily more complex due to the interactions with the existing diagonal.

  • Replace every element of the diagonal with a Scalar value
  • Replace existing elements of the diagonal with a Scalar value
  • Accumulate a Vector into the existing diagonal
  • Replace the existing diagonal with values from a Vector (leaving untouched the values on the diagonal where the Vector has missing values)
  • Fully replace the existing diagonal with a Vector (dropping values on the diagonal where the Vector has missing values)

I worry that a setdiag method would need to be quite complex to capture these different situations.

@q-aaronzolnailucas
Copy link
Author

I see your point @jim22k, but wouldn't the complex nature be a good reason to have an implementations, so that users don't have to write it each time? I think like the nuances in the most efficient approach for each of the above requires a deep understanding of graphblas

@eriknw
Copy link
Member

eriknw commented Aug 12, 2023

I'd like to entertain the possibility of a simpler A.setdiag method.

You both make good points. I agree this may be a common and useful operation. It doesn't need to do every option that Jim listed--and probably shouldn't--but its docstring should discuss how to do things "manually" if one needs accumulation, masking, etc. This is a reasonable method to look for--especially since scipy has it--so we can make it findable, useful, and educational.

Firstly, I don't think A.setdiag needs to be lazy. A.diag and v.diag both return new objects, not expressions. Calling A.setdiag can update A.

Here's what I think it could look like:

def setdiag(self, values, k=0):

where values can be:

  • scalar (Python scalar or gb.Scalar)
    • replace all diagonal elements with the value
  • np.ndarray
    • replace all diagonal element with the values
  • Vector
    • replace diagonal values where the vector has values, and leave other diagonal values alone
  • Matrix (maybe)
    • replaces diagonal values with the diagonal values of the input matrix, and leave missing diagonal elements alone

Right now, I'm leaning towards adding something like this, and am interested in hearing what other people think. Arguments for adding it are:

  • it has a good name
  • it is useful
  • it (probably) reads better (for most people)
  • implementating different cases can be non-obvious / non-trivial for some
    • in particular, the best/fastest way may not be obvious (we should benchmark and choose)
  • the implementation can be simple (for us experts) and fast
  • people may look for it and expect it (as @q-aaronzolnailucas did)
  • it can be educational

Some arguments for not adding it are:

  • API pressure to add mask (vector or matrix?) and accumulator as keywords, which is not a common pattern for us
  • explicit recipes exist to do the operation you want
  • may be ambiguous when reading code whether missing values from vector will remove diagonal elements from A
    • for example, is it reasonable to expect that A.setdiag(v) ; v.isequal(A.diag()) is always true?
    • API pressure to add an option to control this behavior (i.e., clear_missing= or whatever)
    • this will need to be clearly documented
  • should setdiag return self so it can be used in method chaining, or should it return None so there is no confusion about return value (since it doesn't return an expression)?

I'll retort to myself that API pressure isn't necessarily a bad thing. It offers possible future enhancements if there is sufficient need.

Thanks for the suggestion and engaging with us @q-aaronzolnailucas! I really appreciate ideas to improve usability.

@eriknw
Copy link
Member

eriknw commented Sep 5, 2023

We've discussed various ways to set the diagonal during our weekly meetings. Here are some options as I understand them:

# 1
A.setdiag(v)
A.setdiag(1, k=1)
A.setdiag([1, 2, 3])
A('+').setdiag(v)
A(v.S).setdiag(v)

# 2
A.setdiag(v)
A.setdiag(v, accum="+")
A.setdiag(v, mask=v.S)

# 3
A[diag] = v
A[diag(0)] = v
A(...)[diag(0)] = v
A[diag(0)](...) << v

# 4
A[diag, 0] = v
A['diag', 0] = v

# 5
#A.diag[0].new()  # extract diagonal--maybe--not in GraphBLAS (diag is a constructor)
A.diag[0] = v
# Note we already have A.diag() -> Vector
v = A.diag(k)
A.diag[k] = v

I think the top two options are 2 and 3. Maybe we can begin with option 2?

@eriknw
Copy link
Member

eriknw commented Sep 22, 2023

Closing. A.setdiag(k=k, mask=mask, accum=accum) was added in #493, and extracting the diagonal is best done with A.diag(k) or gb.select.diag(A, k).

Thanks again for the questions and suggestions @q-aaronzolnailucas! I think A.setdiag is a nice addition to the GraphBLAS API.

@eriknw eriknw closed this as completed Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussing a topic with no specific actions yet
Projects
None yet
Development

No branches or pull requests

3 participants