AttributeError in Birch for StandardScaled values #23269

Description

@nlahaye

I have run into an issue that also shows up in a few older reports, but the currently open one is slightly different.

Using this data and running this code:

import dask.array as da
from sklearn.cluster import Birch
from sklearn.preprocessing import StandardScaler

data = da.from_zarr("bug_data.zarr")
print("Data Shape: ", data.shape)
print("Min, Max, Mean, StDev.: ", data.min().compute(), data.max().compute(), data.mean().compute(), data.std().compute())
scaler = StandardScaler()
scaler.fit(data)
data = scaler.transform(data)
print("Post-Scale - Min, Max, Mean, StDev.: ", data.min(), data.max(), data.mean(), data.std())
clustering = Birch(branching_factor=5, threshold=1e-5, n_clusters=None)
clustering.fit(data)

I run into this error:

Data Shape:  (150000, 2000)
Min, Max, Mean, StDev.:  -1.7028557 5.1015463 0.020574544 0.32617828
/home/nlahaye/.local/lib/python3.8/site-packages/dask/array/core.py:1650: FutureWarning: The `numpy.may_share_memory` function is not implemented by Dask array. You may want to use the da.map_blocks function or something similar to silence this warning. Your code may stop working in a future release.
  warnings.warn(
Post-Scale - Min, Max, Mean, StDev.:  -5.8093686 7.8372993 4.7429404e-11 1.0000027
Traceback (most recent call last):
  File "clustering_bug.py", line 25, in <module>
    clustering.fit(data)
  File "/home/nlahaye/.local/lib/python3.8/site-packages/sklearn/cluster/_birch.py", line 517, in fit
    return self._fit(X, partial=False)
  File "/home/nlahaye/.local/lib/python3.8/site-packages/sklearn/cluster/_birch.py", line 562, in _fit
    split = self.root_.insert_cf_subcluster(subcluster)
  File "/home/nlahaye/.local/lib/python3.8/site-packages/sklearn/cluster/_birch.py", line 200, in insert_cf_subcluster
    split_child = closest_subcluster.child_.insert_cf_subcluster(subcluster)
  File "/home/nlahaye/.local/lib/python3.8/site-packages/sklearn/cluster/_birch.py", line 200, in insert_cf_subcluster
    split_child = closest_subcluster.child_.insert_cf_subcluster(subcluster)
  File "/home/nlahaye/.local/lib/python3.8/site-packages/sklearn/cluster/_birch.py", line 200, in insert_cf_subcluster
    split_child = closest_subcluster.child_.insert_cf_subcluster(subcluster)
  [Previous line repeated 3 more times]
  File "/home/nlahaye/.local/lib/python3.8/site-packages/sklearn/cluster/_birch.py", line 221, in insert_cf_subcluster
    self.update_split_subclusters(
  File "/home/nlahaye/.local/lib/python3.8/site-packages/sklearn/cluster/_birch.py", line 179, in update_split_subclusters
    self.init_sq_norm_[ind] = new_subcluster1.sq_norm_
AttributeError: '_CFSubcluster' object has no attribute 'sq_norm_'

For simplicity, I extracted this code from the clustering software I use and stripped away its dask-ml wrappers; the same code has completed successfully on other datasets. The data attached here is also a reduced subset of a dataset with many more samples.
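In case it helps narrow things down: the same Birch parameters run cleanly for me on a plain float64 NumPy array. The sketch below uses small random data as a hypothetical stand-in (the real `bug_data.zarr` is not reproduced here), and notes where one could materialize the dask array with `.compute()` before `fit` so that sklearn sees a NumPy array rather than a dask array. I have not verified that this avoids the `AttributeError` on the actual data.

```python
import numpy as np
from sklearn.cluster import Birch

# Hypothetical stand-in for bug_data.zarr (illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))  # float64 by default

# If starting from a dask array, it could be materialized first, e.g.:
#   X = data.compute().astype(np.float64)
# so Birch operates on an in-memory NumPy array.

clustering = Birch(branching_factor=5, threshold=1e-5, n_clusters=None)
clustering.fit(X)
print(clustering.subcluster_centers_.shape)
```

On this random data the fit completes and `subcluster_centers_` is populated, which is why I suspect something specific to the scaled zarr data (or the float32 dtype) is involved.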

Environment:
OS - CentOS-7
python - v3.8.2
dask - v2022.04.1
sklearn - v1.0.2

Please let me know if there is any other info you would like, etc.

Thanks!
Nick

Originally posted by @nlahaye in #17966 (comment)
