ENH: (linalg) add weighted Gram matrix computation #29567
base: main
Conversation
numpy/linalg/_linalg.py
Outdated
```python
X = asanyarray(X)

if X.ndim != 2:
    raise ValueError("X must be a 2-D array")
```
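For context, the quantity the PR computes is the weighted Gram matrix X^T diag(w) X. A minimal sketch of the 2-D case, assuming dense input (this is an illustration, not the PR's actual implementation):

```python
import numpy as np

def weighted_gram(X, weights=None):
    # Sketch: X^T @ diag(w) @ X without materializing diag(w).
    X = np.asanyarray(X)
    if X.ndim != 2:
        raise ValueError("X must be a 2-D array")
    if weights is None:
        return X.T @ X
    w = np.asanyarray(weights)
    # Scale each row of X by its weight, then multiply;
    # equivalent to np.einsum('ij,i,ik->jk', X, w, X).
    return (X * w[:, None]).T @ X

X = np.arange(6.0).reshape(3, 2)
w = np.array([1.0, 2.0, 3.0])
G = weighted_gram(X, w)
```

The row-scaling form avoids the O(n^2) cost of building an explicit diagonal matrix and keeps the result symmetric up to floating-point error.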
This could be generalized to higher dimensions, right?
Nice, especially the tests of many of the corner cases. I am not sure we are ready for a PR to implement this, as the issue has two one-line implementations in the comments, and a fast Cython-based implementation is in tabmat.sandwich with specializations for sparse and categorical matrices. Before this gets much further, I think it would be nice to see some benchmarks of this approach vs. tabmat for some larger matrices.
- Extend weighted_gram_matrix to handle arrays with more than 2 dimensions
- Apply computation to the last two axes for higher-dimensional arrays
- Update function signature and docstring to reflect new behavior
- Add tests for 3D and 4D arrays
- Improve error handling for mismatched weights shape
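The higher-dimensional generalization described in that commit could look roughly like this: treat leading axes as a batch and apply X^T diag(w) X over the last two axes. A hedged sketch, assuming 1-D weights matching `X.shape[-2]` (the actual PR code may differ):

```python
import numpy as np

def weighted_gram_nd(X, weights=None):
    # Sketch: batch over leading axes, compute X^T diag(w) X on the last two.
    X = np.asanyarray(X)
    if X.ndim < 2:
        raise ValueError("X must be at least 2-D")
    if weights is None:
        return np.swapaxes(X, -1, -2) @ X
    w = np.asanyarray(weights)
    if w.ndim != 1 or w.shape[0] != X.shape[-2]:
        raise ValueError("weights must be 1-D with length X.shape[-2]")
    # w[:, None] has shape (n, 1), broadcasting against the trailing
    # (n, m) axes of X for any batch shape.
    return np.swapaxes(X, -1, -2) @ (X * w[:, None])

X = np.arange(12.0).reshape(2, 3, 2)
w = np.array([1.0, 2.0, 3.0])
G = weighted_gram_nd(X, w)
```

Using `swapaxes(-1, -2)` rather than `.T` keeps the batch axes in place, so the result has shape `(..., m, m)`.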
Here is a benchmark:

```python
#!/usr/bin/env python3
"""
Benchmark script to compare NumPy's weighted_gram_matrix implementation
with tabmat's sandwich product.
"""
import time

import numpy as np
import tabmat


def benchmark_numpy_implementation(X, weights=None, iterations=5):
    """Benchmark NumPy's weighted_gram_matrix implementation."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        result = np.linalg.weighted_gram_matrix(X, weights=weights)
        end = time.perf_counter()
        times.append(end - start)
    return np.mean(times), result


def benchmark_tabmat_implementation(X, weights=None, iterations=5):
    """Benchmark tabmat's sandwich product implementation."""
    # Convert to tabmat format
    if isinstance(X, np.ndarray):
        X_tabmat = tabmat.DenseMatrix(X)
    else:
        X_tabmat = X
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        if weights is not None:
            result = X_tabmat.sandwich(weights)
        else:
            # For no weights, use sandwich with ones
            result = X_tabmat.sandwich(np.ones(X.shape[0]))
        end = time.perf_counter()
        times.append(end - start)
    return np.mean(times), result


def run_benchmarks():
    """Run benchmarks for different matrix sizes."""
    print("Running benchmarks for weighted Gram matrix computation...")
    print("=" * 60)
    # Test cases with different sizes
    test_cases = [
        {"name": "Small (100x50)", "shape": (100, 50)},
        {"name": "Medium (1000x500)", "shape": (1000, 500)},
        {"name": "Large (5000x2000)", "shape": (5000, 2000)},
    ]
    results = []
    for case in test_cases:
        print(f"\nTesting {case['name']} matrix...")
        print("-" * 30)
        # Generate test data
        np.random.seed(42)  # For reproducibility
        X = np.random.rand(*case["shape"])
        weights = np.random.rand(case["shape"][0])
        # Benchmark NumPy implementation
        numpy_time, numpy_result = benchmark_numpy_implementation(X, weights)
        print(f"NumPy implementation: {numpy_time:.6f} seconds")
        # Benchmark tabmat implementation
        tabmat_time, tabmat_result = benchmark_tabmat_implementation(X, weights)
        print(f"Tabmat implementation: {tabmat_time:.6f} seconds")
        # Calculate speedup
        speedup = numpy_time / tabmat_time if tabmat_time > 0 else float("inf")
        print(f"Speedup (NumPy/tabmat): {speedup:.2f}x")
        # Verify results are similar (within tolerance)
        diff = np.abs(numpy_result - tabmat_result).max()
        print(f"Max difference between implementations: {diff:.2e}")
        results.append({
            "case": case["name"],
            "numpy_time": numpy_time,
            "tabmat_time": tabmat_time,
            "speedup": speedup,
            "max_diff": diff,
        })
    # Print summary
    print("\n" + "=" * 60)
    print("BENCHMARK SUMMARY")
    print("=" * 60)
    print(f"{'Case':<15} {'NumPy (s)':<12} {'Tabmat (s)':<12} {'Speedup':<10} {'Max Diff':<10}")
    print("-" * 60)
    for result in results:
        print(f"{result['case']:<15} {result['numpy_time']:<12.6f} {result['tabmat_time']:<12.6f} "
              f"{result['speedup']:<10.2f}x {result['max_diff']:<10.2e}")


if __name__ == "__main__":
    run_benchmarks()
```
- Add performance tests for medium (1000x500) and large (2000x500) matrices
- Include assertion of result shape and symmetry in performance tests
- Update function docstring to suggest optimized libraries for large matrices
I updated it because NumPy is faster when handling small matrices, while tabmat is faster otherwise.
As has been said before, this PR is a bit premature; this should be discussed on the mailing list first.
But if we assume that there'll be a consensus in favor of this, then this would also require the stubs to be updated. Specifically in https://github.com/numpy/numpy/blob/8b89d0f127eb76c8f84c14b4814a18814f3bdec7/numpy/linalg/linalg.pyi and https://github.com/numpy/numpy/blob/main/numpy/linalg/_linalg.pyi
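If there were consensus, the stub addition might look something like the fragment below. This is a hypothetical signature for `numpy/linalg/_linalg.pyi`, not the actual stub; the real version would follow the overload conventions already used in that file:

```python
# Hypothetical fragment for numpy/linalg/_linalg.pyi (illustration only)
from typing import Any

from numpy import ndarray
from numpy._typing import ArrayLike

def weighted_gram_matrix(
    X: ArrayLike,
    weights: ArrayLike | None = ...,
) -> ndarray[Any, Any]: ...
```

In practice the stub would likely need dtype-parameterized overloads (e.g. separate returns for floating vs. complex input), as the neighboring functions in `_linalg.pyi` have.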
It seems the gitignore is the same as this commit: 0c203ed
Well, almost; you're missing a trailing newline.
ENH: Add weighted_gram_matrix function to numpy.linalg
Implement weighted_gram_matrix function in numpy/linalg/_linalg.py to
compute weighted Gram matrices efficiently. Add corresponding test cases
in numpy/linalg/tests/test_linalg.py to verify correctness. Update
.gitignore to exclude .env/ directory from version control.
See #29559