ENH: (linalg) add weighted Gram matrix computation #29567
base: main
Conversation
numpy/linalg/_linalg.py
Outdated
```python
X = asanyarray(X)

if X.ndim != 2:
    raise ValueError("X must be a 2-D array")
```
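For context, the quantity the PR computes is the weighted Gram matrix X^T diag(w) X. A minimal sketch of the 2-D case, assuming dense input (this is an illustration, not the PR's actual implementation):

```python
import numpy as np

def weighted_gram(X, weights=None):
    # Sketch: X^T @ diag(w) @ X without materializing diag(w).
    X = np.asanyarray(X)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array")
    if weights is None:
        return X.T @ X
    w = np.asanyarray(weights)
    # Scale each row of X by its weight, then multiply;
    # equivalent to np.einsum('ij,i,ik->jk', X, w, X).
    return (X * w[:, None]).T @ X

X = np.arange(6.0).reshape(3, 2)
w = np.array([1.0, 2.0, 3.0])
G = weighted_gram(X, w)
```

The row-scaling form avoids the O(n^2) cost of building an explicit diagonal matrix and keeps the result symmetric up to floating-point error.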
This could be generalized to higher dimensions, right?
Nice, especially the tests of many of the corner cases. I am not sure we are ready for a PR to implement this, as the issue has two one-line implementations in the comments, and a fast Cython-based implementation is in tabmat.sandwich with specializations for sparse and categorical matrices. Before this gets much further, I think it would be nice to see some benchmarks of this approach vs. tabmat for some larger matrices.
- Extend weighted_gram_matrix to handle arrays with more than 2 dimensions
- Apply computation to the last two axes for higher-dimensional arrays
- Update function signature and docstring to reflect new behavior
- Add tests for 3D and 4D arrays
- Improve error handling for mismatched weights shape
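The higher-dimensional generalization described in that commit could look roughly like this: treat leading axes as a batch and apply X^T diag(w) X over the last two axes. A hedged sketch, assuming 1-D weights matching `X.shape[-2]` (the actual PR code may differ):

```python
import numpy as np

def weighted_gram_nd(X, weights=None):
    # Sketch: batch over leading axes, compute X^T diag(w) X on the last two.
    X = np.asanyarray(X)
    if X.ndim < 2:
        raise ValueError("X must be at least 2-D")
    if weights is None:
        return np.swapaxes(X, -1, -2) @ X
    w = np.asanyarray(weights)
    if w.ndim != 1 or w.shape[0] != X.shape[-2]:
        raise ValueError("weights must be 1-D with length X.shape[-2]")
    # w[:, None] has shape (n, 1), broadcasting against the trailing
    # (n, m) axes of X for any batch shape.
    return np.swapaxes(X, -1, -2) @ (X * w[:, None])

X = np.arange(12.0).reshape(2, 3, 2)
w = np.array([1.0, 2.0, 3.0])
G = weighted_gram_nd(X, w)
```

Using `swapaxes(-1, -2)` rather than `.T` keeps the batch axes in place, so the result has shape `(..., m, m)`.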
Here is a benchmark:

```python
#!/usr/bin/env python3
"""
Benchmark script to compare NumPy's weighted_gram_matrix implementation
with tabmat's sandwich product.
"""
import time

import numpy as np
import tabmat


def benchmark_numpy_implementation(X, weights=None, iterations=5):
    """Benchmark NumPy's weighted_gram_matrix implementation."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        result = np.linalg.weighted_gram_matrix(X, weights=weights)
        end = time.perf_counter()
        times.append(end - start)
    return np.mean(times), result


def benchmark_tabmat_implementation(X, weights=None, iterations=5):
    """Benchmark tabmat's sandwich product implementation."""
    # Convert to tabmat format
    if isinstance(X, np.ndarray):
        X_tabmat = tabmat.DenseMatrix(X)
    else:
        X_tabmat = X
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        if weights is not None:
            result = X_tabmat.sandwich(weights)
        else:
            # For no weights, use sandwich with ones
            result = X_tabmat.sandwich(np.ones(X.shape[0]))
        end = time.perf_counter()
        times.append(end - start)
    return np.mean(times), result


def run_benchmarks():
    """Run benchmarks for different matrix sizes."""
    print("Running benchmarks for weighted Gram matrix computation...")
    print("=" * 60)
    # Test cases with different sizes
    test_cases = [
        {"name": "Small (100x50)", "shape": (100, 50)},
        {"name": "Medium (1000x500)", "shape": (1000, 500)},
        {"name": "Large (5000x2000)", "shape": (5000, 2000)},
    ]
    results = []
    for case in test_cases:
        print(f"\nTesting {case['name']} matrix...")
        print("-" * 30)
        # Generate test data
        np.random.seed(42)  # For reproducibility
        X = np.random.rand(*case["shape"])
        weights = np.random.rand(case["shape"][0])
        # Benchmark NumPy implementation
        numpy_time, numpy_result = benchmark_numpy_implementation(X, weights)
        print(f"NumPy implementation: {numpy_time:.6f} seconds")
        # Benchmark tabmat implementation
        tabmat_time, tabmat_result = benchmark_tabmat_implementation(X, weights)
        print(f"Tabmat implementation: {tabmat_time:.6f} seconds")
        # Calculate speedup
        speedup = numpy_time / tabmat_time if tabmat_time > 0 else float("inf")
        print(f"Speedup (NumPy/tabmat): {speedup:.2f}x")
        # Verify results are similar (within tolerance)
        diff = np.abs(numpy_result - tabmat_result).max()
        print(f"Max difference between implementations: {diff:.2e}")
        results.append({
            "case": case["name"],
            "numpy_time": numpy_time,
            "tabmat_time": tabmat_time,
            "speedup": speedup,
            "max_diff": diff,
        })
    # Print summary
    print("\n" + "=" * 60)
    print("BENCHMARK SUMMARY")
    print("=" * 60)
    print(f"{'Case':<15} {'NumPy (s)':<12} {'Tabmat (s)':<12} {'Speedup':<10} {'Max Diff':<10}")
    print("-" * 60)
    for result in results:
        print(f"{result['case']:<15} {result['numpy_time']:<12.6f} {result['tabmat_time']:<12.6f} "
              f"{result['speedup']:<10.2f}x {result['max_diff']:<10.2e}")


if __name__ == "__main__":
    run_benchmarks()
```
- Add performance tests for medium (1000x500) and large (2000x500) matrices
- Include assertion of result shape and symmetry in performance tests
- Update function docstring to suggest optimized libraries for large matrices
I updated it because NumPy is faster when handling small matrices, while tabmat is faster otherwise.
As has been said before, this PR is a bit premature; this should be discussed on the mailing list first.
But if we assume that there'll be a consensus in favor of this, then this would also require the stubs to be updated. Specifically in https://github.com/numpy/numpy/blob/8b89d0f127eb76c8f84c14b4814a18814f3bdec7/numpy/linalg/linalg.pyi and https://github.com/numpy/numpy/blob/main/numpy/linalg/_linalg.pyi
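If there were consensus, the stub addition might look something like the fragment below. This is a hypothetical signature for `numpy/linalg/_linalg.pyi`, not the actual stub; the real version would follow the overload conventions already used in that file:

```python
# Hypothetical fragment for numpy/linalg/_linalg.pyi (illustration only)
from typing import Any

from numpy import ndarray
from numpy._typing import ArrayLike

def weighted_gram_matrix(
    X: ArrayLike,
    weights: ArrayLike | None = ...,
) -> ndarray[Any, Any]: ...
```

In practice the stub would likely need dtype-parameterized overloads (e.g. separate returns for floating vs. complex input), as the neighboring functions in `_linalg.pyi` have.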
It seems the gitignore is the same as this commit: 0c203ed
Well, almost; you're missing a trailing newline.
ENH: Add weighted_gram_matrix function to numpy.linalg
Implement weighted_gram_matrix function in numpy/linalg/_linalg.py to
compute weighted Gram matrices efficiently. Add corresponding test cases
in numpy/linalg/tests/test_linalg.py to verify correctness. Update
.gitignore to exclude .env/ directory from version control.
See #29559