How to copy a CSR matrix into a `sparsevec` column? #127

TopCoder2K · 2025-04-21T09:14:41Z

I have a sparse vector, the result of applying sklearn's TfidfVectorizer:

<Compressed Sparse Row sparse matrix of dtype 'float64'
        with 4 stored elements and shape (1, 157541)>
  Coords        Values
  (0, 5051)     0.35521903059198523
  (0, 14956)    0.5566306658037382
  (0, 45152)    0.7328483894186835
  (0, 60738)    0.1640578566196061

which I want to copy into a table with a sparsevec column. As far as I understand from the documentation, the correct way to do this is the following:

with cur.copy(
    "COPY my_table FROM STDIN WITH (FORMAT BINARY)"
) as copy:
    copy.set_types(["sparsevec"])
    copy.write_row((SparseVector(the_sparse_vector),))

but this produces an error:

psycopg.errors.DataException: sparsevec indices must not contain duplicates

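I investigated a bit and found that the line in _from_sparse that builds the indices reads value.coords[0]. For two-dimensional input that is the array of row indices, not value.coords[1], which holds the column indices. Is this a bug? What should I do?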

Additional information about the example:

The code

print(the_sparse_vector)
the_sparse_vector = the_sparse_vector.tocoo()
print(the_sparse_vector.ndim, the_sparse_vector.shape)
print(the_sparse_vector.coords)
print(the_sparse_vector.data)
print(SparseVector(the_sparse_vector))

outputs:

<Compressed Sparse Row sparse matrix of dtype 'float64'
        with 4 stored elements and shape (1, 157541)>
  Coords        Values
  (0, 5051)     0.35521903059198523
  (0, 14956)    0.5566306658037382
  (0, 45152)    0.7328483894186835
  (0, 60738)    0.1640578566196061
2 (1, 157541)
(array([0, 0, 0, 0], dtype=int32), array([ 5051, 14956, 45152, 60738], dtype=int32))
[0.35521903 0.55663067 0.73284839 0.16405786]
SparseVector({0: 0.1640578566196061}, 157541)
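
The collapse to a single element can be reproduced without scipy or pgvector. The sketch below copies the arrays from the output above into plain lists and pairs the values first with the row indices and then with the column indices. It assumes that the printed SparseVector is built by pairing indices with values into a dict, which is consistent with the output shown but is not confirmed here.

```python
# Values copied from the COO output above.
rows = [0, 0, 0, 0]                 # coords[0]: row index of each stored element
cols = [5051, 14956, 45152, 60738]  # coords[1]: column index of each stored element
data = [
    0.35521903059198523,
    0.5566306658037382,
    0.7328483894186835,
    0.1640578566196061,
]

# Pairing values with row indices: every key is 0, so each value
# overwrites the previous one and only the last survives.
by_row = dict(zip(rows, data))

# Pairing values with column indices keeps all four stored elements.
by_col = dict(zip(cols, data))

print(by_row)       # {0: 0.1640578566196061}
print(len(by_col))  # 4
```

The four identical row indices are also what the server rejects as duplicates. The pgvector-python README documents a dict-plus-dimensions constructor, so a mapping like by_col passed together with the dimension 157541 may serve as a workaround that avoids the sparse-matrix code path.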

I have

psycopg           3.2.6
psycopg-binary    3.2.6
pgvector          0.4.0
scipy             1.15.2


TopCoder2K · 2025-04-21T09:43:26Z

I've implemented a quick fix:

class SparseVectorFixed(SparseVector):
    def _from_sparse(self, value):
        value = value.tocoo()

        if value.ndim == 1:
            self._dim = value.shape[0]
        elif value.ndim == 2 and value.shape[0] == 1:
            self._dim = value.shape[1]
        else:
            raise ValueError('expected ndim to be 1')

        if hasattr(value, 'coords'):
            # scipy 1.13+
            ### Start of changes ###
            if value.ndim == 1:
                self._indices = value.coords[0].tolist()
            else:
                self._indices = value.coords[1].tolist()
            ### End of changes ###
        else:
            self._indices = value.col.tolist()
        self._values = value.data.tolist()

and it seems to work fine. A SELECT from the table returns:

                                pos_vector                                 
---------------------------------------------------------------------------
 {5052:0.35521904,14957:0.5566307,45153:0.7328484,60739:0.16405785}/157541
(1 row)

I would still like the change to be confirmed by someone who knows the library.
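
When checking the stored row, note that the SELECT output shows indices 5052, 14957, 45153 and 60739, while the Python side used 5051, 14956, 45152 and 60738. That is expected: the sparsevec text format is {index:value,...}/dimensions with indices starting at 1, whereas scipy column indices start at 0. The values also differ slightly in the last digits because sparsevec stores single-precision floats. The following is a minimal sketch of that mapping. The helper name to_sparsevec_text is hypothetical and is not part of pgvector.

```python
def to_sparsevec_text(indices, values, dimensions):
    """Format 0-based indices and values as sparsevec text with 1-based indices.

    Hypothetical helper for illustration only; not part of pgvector.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    if len(set(indices)) != len(indices):
        # Mirrors the server-side check that produced the error in this issue.
        raise ValueError("indices must not contain duplicates")
    if any(i < 0 or i >= dimensions for i in indices):
        raise ValueError("index out of range for the given dimensions")

    pairs = sorted(zip(indices, values))
    body = ",".join(f"{i + 1}:{v}" for i, v in pairs)  # shift to 1-based
    return "{" + body + "}/" + str(dimensions)


cols = [5051, 14956, 45152, 60738]
data = [
    0.35521903059198523,
    0.5566306658037382,
    0.7328483894186835,
    0.1640578566196061,
]
text = to_sparsevec_text(cols, data, 157541)
print(text)
```

Passing the row indices [0, 0, 0, 0] instead of cols raises the duplicate-index ValueError, which matches the behaviour reported at the top of the issue.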

ankane · 2025-04-21T10:11:09Z

Hi @TopCoder2K, thanks for reporting! Pushed a fix in the commit above.

ankane closed this as completed in 713590a Apr 21, 2025
