ENH: (linalg) add weighted Gram matrix computation #29567


Open
wants to merge 6 commits into base: main

Conversation

MengAiDev

ENH: Add weighted_gram_matrix function to numpy.linalg

Implement weighted_gram_matrix function in numpy/linalg/_linalg.py to
compute weighted Gram matrices efficiently. Add corresponding test cases
in numpy/linalg/tests/test_linalg.py to verify correctness. Update
.gitignore to exclude .env/ directory from version control.

See #29559

- Implement weighted_gram_matrix function in numpy/linalg/_linalg.py
- Add corresponding test cases in numpy/linalg/tests/test_linalg.py
- Update .gitignore to exclude .env/ directory
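The proposed function computes a weighted Gram matrix, i.e. `X.T @ diag(w) @ X`. A minimal plain-NumPy sketch of that idea (not the PR's actual code; `weighted_gram_matrix_sketch` is a hypothetical name):

```python
import numpy as np

def weighted_gram_matrix_sketch(X, weights=None):
    """Compute X.T @ diag(weights) @ X without forming the diagonal matrix."""
    X = np.asanyarray(X)
    if weights is None:
        return X.T @ X  # unweighted Gram matrix
    w = np.asanyarray(weights)
    # Scale each row of X by its weight, then contract with X.T.
    return X.T @ (w[:, None] * X)

X = np.arange(6.0).reshape(3, 2)
w = np.array([1.0, 2.0, 3.0])
G = weighted_gram_matrix_sketch(X, w)  # shape (2, 2), symmetric
```

Avoiding the explicit `diag(w)` keeps the extra memory at O(n*p) instead of O(n^2).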
X = asanyarray(X)

if X.ndim != 2:
    raise ValueError("X must be a 2-D array")
Member
This could be generalized to higher dimensions, right?

Member

@mattip mattip left a comment

Nice, especially the tests of many of the corner cases. I am not sure we are ready for a PR to implement this, as the issue has two one-line implementations in the comments, and a fast Cython-based implementation exists in tabmat.sandwich with specializations for sparse and categorical matrices. Before this gets much further, I think it would be nice to see some benchmarks of this approach vs. tabmat for some larger matrices.
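The one-line implementations from the issue are not quoted here, but equivalent one-liners (a sketch, not necessarily the exact code from the issue's comments) would look like:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 5))
w = rng.random(100)

# Row-scaling: weight each sample (row) of X, then contract with X.T.
g1 = X.T @ (w[:, None] * X)
# einsum: contract over the sample axis i, weighting each term by w[i].
g2 = np.einsum('ij,i,ik->jk', X, w, X)
```

Both produce the same p-by-p weighted Gram matrix without materializing `diag(w)`.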

- Extend weighted_gram_matrix to handle arrays with more than 2 dimensions
- Apply computation to the last two axes for higher-dimensional arrays
- Update function signature and docstring to reflect new behavior
- Add tests for 3D and 4D arrays
- Improve error handling for mismatched weights shape
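A sketch of what "apply the computation to the last two axes" could mean (assumed semantics; the PR's actual implementation may differ):

```python
import numpy as np

def weighted_gram_last_axes(X, weights):
    """Compute X^T diag(w) X over the last two axes of a stacked array (sketch)."""
    X = np.asanyarray(X)
    w = np.asanyarray(weights)
    # weights broadcast against the sample axis (second-to-last axis of X);
    # leading batch axes broadcast as in matmul.
    return np.swapaxes(X, -1, -2) @ (w[..., None] * X)

rng = np.random.default_rng(1)
X = rng.random((4, 3, 10, 2))  # a 4x3 batch of 10x2 matrices
w = np.ones(10)                # uniform weights reduce to the plain Gram matrix
G = weighted_gram_last_axes(X, w)  # shape (4, 3, 2, 2)
```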
@MengAiDev MengAiDev requested a review from mattip August 15, 2025 09:31
@MengAiDev
Author

Here is a benchmark:

#!/usr/bin/env python3
"""
Benchmark script to compare NumPy's weighted_gram_matrix implementation 
with tabmat's sandwich product.
"""

import time
import numpy as np
import tabmat


def benchmark_numpy_implementation(X, weights=None, iterations=5):
    """Benchmark NumPy's weighted_gram_matrix implementation."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        result = np.linalg.weighted_gram_matrix(X, weights=weights)
        end = time.perf_counter()
        times.append(end - start)
    return np.mean(times), result


def benchmark_tabmat_implementation(X, weights=None, iterations=5):
    """Benchmark tabmat's sandwich product implementation."""
    # Convert to tabmat format
    if isinstance(X, np.ndarray):
        X_tabmat = tabmat.DenseMatrix(X)
    else:
        X_tabmat = X
    
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        if weights is not None:
            result = X_tabmat.sandwich(weights)
        else:
            # For no weights, use sandwich with ones
            result = X_tabmat.sandwich(np.ones(X.shape[0]))
        end = time.perf_counter()
        times.append(end - start)
    return np.mean(times), result


def run_benchmarks():
    """Run benchmarks for different matrix sizes."""
    print("Running benchmarks for weighted Gram matrix computation...")
    print("=" * 60)
    
    # Test cases with different sizes
    test_cases = [
        {"name": "Small (100x50)", "shape": (100, 50)},
        {"name": "Medium (1000x500)", "shape": (1000, 500)},
        {"name": "Large (5000x2000)", "shape": (5000, 2000)},
    ]
    
    results = []
    
    for case in test_cases:
        print(f"\nTesting {case['name']} matrix...")
        print("-" * 30)
        
        # Generate test data
        np.random.seed(42)  # For reproducibility
        X = np.random.rand(*case['shape'])
        weights = np.random.rand(case['shape'][0])
        
        # Benchmark NumPy implementation
        numpy_time, numpy_result = benchmark_numpy_implementation(X, weights)
        print(f"NumPy implementation: {numpy_time:.6f} seconds")
        
        # Benchmark tabmat implementation
        tabmat_time, tabmat_result = benchmark_tabmat_implementation(X, weights)
        print(f"Tabmat implementation: {tabmat_time:.6f} seconds")
        
        # Calculate speedup
        speedup = numpy_time / tabmat_time if tabmat_time > 0 else float('inf')
        print(f"Speedup (NumPy/tabmat): {speedup:.2f}x")
        
        # Verify results are similar (within tolerance)
        diff = np.abs(numpy_result - tabmat_result).max()
        print(f"Max difference between implementations: {diff:.2e}")
        
        results.append({
            "case": case["name"],
            "numpy_time": numpy_time,
            "tabmat_time": tabmat_time,
            "speedup": speedup,
            "max_diff": diff
        })
    
    # Print summary
    print("\n" + "=" * 60)
    print("BENCHMARK SUMMARY")
    print("=" * 60)
    print(f"{'Case':<15} {'NumPy (s)':<12} {'Tabmat (s)':<12} {'Speedup':<10} {'Max Diff':<10}")
    print("-" * 60)
    
    for result in results:
        print(f"{result['case']:<15} {result['numpy_time']:<12.6f} {result['tabmat_time']:<12.6f} "
              f"{result['speedup']:<10.2f}x {result['max_diff']:<10.2e}")


if __name__ == "__main__":
    run_benchmarks()

- Add performance tests for medium (1000x500) and large (2000x500) matrices
- Include assertion of result shape and symmetry in performance tests
- Update function docstring to suggest optimized libraries for large matrices
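The shape and symmetry assertions could look like this (a sketch; `np.linalg.weighted_gram_matrix` only exists on the PR branch, so a plain-NumPy stand-in is used here):

```python
import numpy as np

def check_gram_properties(G, n_features):
    """Assertions mirroring the performance tests: result shape and symmetry (sketch)."""
    assert G.shape == (n_features, n_features)
    assert np.allclose(G, G.T)  # any (weighted) Gram matrix is symmetric

rng = np.random.default_rng(42)
X = rng.random((1000, 500))
w = rng.random(1000)
G = X.T @ (w[:, None] * X)  # stand-in for the proposed weighted_gram_matrix
check_gram_properties(G, 500)
```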
@MengAiDev
Author

I updated it because NumPy is faster for small matrices, while tabmat is faster otherwise.

Member

@jorenham jorenham left a comment

As has been said before, this PR is a bit premature, as this should be discussed on the mailing list first.
But if we assume that there'll be a consensus in favor of this, then this would also require the stubs to be updated. Specifically in https://github.com/numpy/numpy/blob/8b89d0f127eb76c8f84c14b4814a18814f3bdec7/numpy/linalg/linalg.pyi and https://github.com/numpy/numpy/blob/main/numpy/linalg/_linalg.pyi
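For illustration, a stub entry might look roughly like the following (a hypothetical sketch; the actual annotations would need to match the implementation's signature and overloads):

```python
# Sketch of an addition to numpy/linalg/_linalg.pyi (hypothetical)
from typing import Any
from numpy import ndarray
from numpy._typing import ArrayLike

def weighted_gram_matrix(
    X: ArrayLike,
    weights: ArrayLike | None = ...,
) -> ndarray[Any, Any]: ...
```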

@MengAiDev
Author

It seems the gitignore is the same as this commit: 0c203ed

@jorenham
Member

It seems the gitignore is the same as this commit: 0c203ed

well, almost; you're missing a trailing newline

@jorenham jorenham self-requested a review August 16, 2025 04:08
@jorenham jorenham changed the title feat(linalg): add weighted Gram matrix computation ENH: (linalg) add weighted Gram matrix computation Aug 16, 2025