Open
208 commits
b4208f2
Add per feature max_categories for OrdinalEncoder
Andrew-Wang-IB45 Apr 25, 2023
a3db2b6
Fix formatting
Andrew-Wang-IB45 Apr 26, 2023
39e81ab
Update behaviour of max_categories in OrdinalEncoder
Andrew-Wang-IB45 Apr 27, 2023
bbf9031
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 27, 2023
b06a0d5
Fix errors pertaining to checking for infrequent categories
Andrew-Wang-IB45 Apr 27, 2023
be5242a
Only check max_categories in OrdinalEncoder when it is an array-like …
Andrew-Wang-IB45 Apr 28, 2023
f43456e
Update changelog
Andrew-Wang-IB45 Apr 28, 2023
310f81e
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 28, 2023
d558687
Improve ordering of checking max_categories and add tests for Ordinal…
Andrew-Wang-IB45 Apr 29, 2023
5c6c690
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 29, 2023
dd26ec1
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 6, 2023
84a5e2c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 10, 2023
de713ce
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 13, 2023
cb09cac
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 17, 2023
77e9567
Add _max_categories_per_feature attribute to BaseEncoder and remove o…
Andrew-Wang-IB45 May 17, 2023
20c4489
Update tests
Andrew-Wang-IB45 May 17, 2023
918a7a8
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 24, 2023
ea0a7fb
Update doc/modules/preprocessing.rst
Andrew-Wang-IB45 May 26, 2023
3427924
Update sklearn/preprocessing/_encoders.py
Andrew-Wang-IB45 May 26, 2023
1fa914a
Simplify error message for array-like max_categories
Andrew-Wang-IB45 May 26, 2023
d4ef66d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 26, 2023
1c7cf7f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 27, 2023
25c6017
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 31, 2023
140ce59
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 3, 2023
f20e430
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 10, 2023
89a49aa
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 14, 2023
6488f9b
Fix indentation on changelog
Andrew-Wang-IB45 Jun 14, 2023
5ab8865
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 15, 2023
d8dc3ad
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 16, 2023
98e9dce
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 17, 2023
dcbbe8f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 20, 2023
08c3816
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 21, 2023
1a6e63a
Fix linting issues
Andrew-Wang-IB45 Jun 21, 2023
1c63dba
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 22, 2023
bce9206
Fix linting issues
Andrew-Wang-IB45 Jun 22, 2023
a29427d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 23, 2023
1491db0
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 24, 2023
4d428cb
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 27, 2023
ce001cc
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 28, 2023
2065cf5
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 29, 2023
2e8264d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 30, 2023
9a96682
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 1, 2023
c6a3ce1
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 4, 2023
f97cac6
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 6, 2023
6572de9
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 7, 2023
4f11199
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 8, 2023
c952757
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 11, 2023
85d055a
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 12, 2023
7c7d8f8
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 13, 2023
a946883
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 14, 2023
2eeb6b0
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 15, 2023
ca89cd8
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 18, 2023
6b1bc4e
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 19, 2023
adebab7
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 21, 2023
df75db1
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 21, 2023
a403c63
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 25, 2023
c74473f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 26, 2023
926f10d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 29, 2023
4298516
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 2, 2023
a15ba94
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 3, 2023
2d820ff
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 5, 2023
36e3cb2
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 8, 2023
de66f7c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 9, 2023
c73c024
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 10, 2023
92e5b0d
Migrate changelog from v1.3 to v1.4
Andrew-Wang-IB45 Aug 10, 2023
8605223
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 11, 2023
9197d06
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 12, 2023
9e13394
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 12, 2023
03632c1
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 17, 2023
9953091
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 19, 2023
d0710bd
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 22, 2023
f6bd5ef
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 26, 2023
9070a09
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 31, 2023
06c3db0
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 2, 2023
1d5d74f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 5, 2023
d43c651
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 7, 2023
3b33463
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 8, 2023
1b77d39
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 11, 2023
0dc270c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 12, 2023
f488416
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 12, 2023
d3eb9cf
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 14, 2023
101d7e4
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 16, 2023
b21bed0
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 17, 2023
ba051a0
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 17, 2023
2c9469a
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 20, 2023
5490620
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 21, 2023
b8a2922
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 29, 2023
486bd77
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 5, 2023
627ea6c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 7, 2023
fa2c5aa
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 12, 2023
097d917
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 14, 2023
75e3604
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 17, 2023
27fc15d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 18, 2023
a97d1ab
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 25, 2023
f6c9774
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 29, 2023
8413007
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 11, 2023
b080ae3
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 14, 2023
7fdee29
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 14, 2023
2c7f31b
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 18, 2023
0708111
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 21, 2023
18f9c3c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 22, 2023
87582d6
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
ogrisel Nov 24, 2023
6ceff30
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 24, 2023
513aeaa
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 25, 2023
b65cf15
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 27, 2023
b53bd46
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Nov 28, 2023
3b85a37
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 4, 2023
0cb5ba8
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 6, 2023
6876750
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 9, 2023
e6e2713
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 12, 2023
ddb0942
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 16, 2023
b981f54
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 20, 2023
84f9105
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 22, 2023
d28ab18
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 23, 2023
a426ae5
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 27, 2023
d9c9b13
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Dec 31, 2023
3cfd7eb
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 3, 2024
863de89
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 5, 2024
51737f9
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 11, 2024
badda05
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 14, 2024
7428a4f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 19, 2024
35ba67b
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 23, 2024
729f8f6
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 24, 2024
58ca00d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 27, 2024
5c08ae8
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 1, 2024
90ce34d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 2, 2024
98dfacb
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 7, 2024
8d42d95
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 9, 2024
35a4574
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 10, 2024
77c5a25
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 13, 2024
897d000
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 15, 2024
bd83262
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 17, 2024
a8dd037
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 17, 2024
3f2e48a
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 20, 2024
0cc71a3
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 21, 2024
3cddd6d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 24, 2024
b1c569f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Feb 27, 2024
14fe7b8
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 2, 2024
1530b8f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 4, 2024
4e47c0b
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 6, 2024
074afc8
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 9, 2024
32cdaee
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 12, 2024
3b80888
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 14, 2024
c6c00d5
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 15, 2024
fe20633
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 22, 2024
54c75e4
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 26, 2024
25f788e
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Mar 29, 2024
89f8d06
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 2, 2024
d98a3ca
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 4, 2024
07f08f5
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 6, 2024
1c5cb23
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 9, 2024
7cb326d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 11, 2024
aba6b15
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 13, 2024
a9b90c6
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 16, 2024
8ba36a4
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 16, 2024
1ad98da
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 20, 2024
e2937b4
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 24, 2024
6dac729
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 25, 2024
93926b4
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Apr 26, 2024
5de0a59
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 1, 2024
4050369
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 7, 2024
f41bd8c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 8, 2024
4d20133
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 11, 2024
ed1b25b
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 15, 2024
df6227d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 17, 2024
48a8156
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 21, 2024
4b47648
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 22, 2024
76b14ef
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 25, 2024
6d8000b
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 28, 2024
c72fb61
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 May 29, 2024
656a621
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 1, 2024
50a07e1
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 8, 2024
2ed52ee
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 15, 2024
de34c7d
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 21, 2024
c4fcff1
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 22, 2024
9a0df29
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 25, 2024
336f938
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jun 29, 2024
205d431
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 3, 2024
768d8b2
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 6, 2024
4bc68db
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 9, 2024
c27b98f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 13, 2024
3eab8e7
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 17, 2024
2ef436b
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 20, 2024
39f711f
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 21, 2024
e96d4de
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 23, 2024
9b8df70
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 24, 2024
cbed9ab
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 26, 2024
5541c2b
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 30, 2024
217e7cc
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jul 31, 2024
d92cf8c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 1, 2024
ee8cde0
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 3, 2024
f30113a
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 6, 2024
9f931f1
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 8, 2024
3c31f99
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 13, 2024
4f6c0a5
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 14, 2024
8abe364
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 16, 2024
8017913
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 17, 2024
c3114b4
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 20, 2024
3337745
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 21, 2024
970a013
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 22, 2024
5b57ce1
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 24, 2024
4bb5aaa
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 27, 2024
7bc2d8c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Aug 31, 2024
ea03e99
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 4, 2024
748b876
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 13, 2024
35dd88c
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Sep 14, 2024
0fe87ab
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Oct 5, 2024
2ca25fe
Merge branch 'main' into ordinal_encoder_max_categories_per_feature
Andrew-Wang-IB45 Jan 27, 2025
10 changes: 6 additions & 4 deletions doc/modules/preprocessing.rst
@@ -752,10 +752,12 @@ enable the gathering of infrequent categories are `min_frequency` and
    this fraction of the total number of samples will be considered infrequent.
    The default value is 1, which means every category is encoded separately.

-2. `max_categories` is either `None` or any integer greater than 1. This
-   parameter sets an upper limit to the number of output features for each
-   input feature. `max_categories` includes the feature that combines
-   infrequent categories.
+2. `max_categories` is either `None` or any integer greater than or equal to 1.
+   :class:`OrdinalEncoder` also supports an array-like containing `None` and
+   integers, or a dictionary mapping feature names found in `feature_names_in_`
+   to integers. This parameter sets an upper limit to the
+   number of output categories for each input feature. `max_categories`
+   includes the category that combines infrequent categories.

 In the following example with :class:`OrdinalEncoder`, the categories `'dog'` and
 `'snake'` are considered infrequent::
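As a rough illustration of the per-feature limit described in this documentation change, here is a minimal pure-Python sketch. `fit_ordinal_codes` and the `limits` mapping are hypothetical names, not scikit-learn API. The sketch ignores `min_frequency`, assumes frequent categories are coded in sorted order with the infrequent group sharing the last code, and breaks count ties alphabetically, which may differ from the real encoder. The comparison against `len(categories) + 1` mirrors the `n_current_features` check in `_identify_infrequent`.

```python
from collections import Counter


def fit_ordinal_codes(column, max_categories=None):
    """Map each category to an integer code, grouping rare ones.

    Hypothetical helper for illustration only; not scikit-learn code.
    """
    counts = Counter(column)
    categories = sorted(counts)
    # Count of outputs if one extra infrequent bucket were added.
    n_current = len(categories) + 1
    if max_categories is None or max_categories >= n_current:
        # No limit, or the limit is not exceeded: one code per category.
        return {cat: code for code, cat in enumerate(categories)}
    # The limit includes the shared infrequent bucket, so keep one fewer.
    n_frequent = max_categories - 1
    by_count = sorted(categories, key=lambda cat: counts[cat], reverse=True)
    frequent = sorted(by_count[:n_frequent])
    codes = {cat: code for code, cat in enumerate(frequent)}
    infrequent_code = len(frequent)
    for cat in categories:
        # Every category that was not kept shares the last code.
        codes.setdefault(cat, infrequent_code)
    return codes


pets = ["dog"] * 5 + ["cat"] * 20 + ["rabbit"] * 10 + ["snake"] * 3
sizes = ["small"] * 30 + ["large"] * 8

# Per-feature limits, keyed by (made-up) feature names.
limits = {"pet": 3, "size": None}
pet_codes = fit_ordinal_codes(pets, limits["pet"])
size_codes = fit_ordinal_codes(sizes, limits["size"])
print(pet_codes)   # 'dog' and 'snake' share the last code
print(size_codes)  # no limit, so each category keeps its own code
```

With a limit of 3 on the first column, only the two most frequent categories keep their own code, while the second column is left untouched. That independence between columns is what a single global integer cannot express.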
6 changes: 6 additions & 0 deletions doc/whats_new/v1.4.rst
@@ -925,6 +925,12 @@ Changelog
 - |Fix| :class:`preprocessing.OneHotEncoder` and :class:`preprocessing.OrdinalEncoder`
   raise an exception if the user provided categories contain duplicates.
   :pr:`27328` by :user:`Xuefeng Xu <xuefeng-xu>`.
+
+- |Enhancement| Added support for passing `max_categories` as `array-like` or
+  `dict` in :class:`preprocessing.OrdinalEncoder`. This allows specifying the
+  maximum number of output categories for each input feature instead of being
+  restricted to setting a global maximum number of output categories.
+  :pr:`26284` by :user:`Andrew Wang <Andrew-Wang-IB45>`.

 - |Fix| :class:`preprocessing.FunctionTransformer` raises an error at `transform` if
   the output of `get_feature_names_out` is not consistent with the column names of the
103 changes: 95 additions & 8 deletions sklearn/preprocessing/_encoders.py
@@ -265,15 +265,84 @@ def infrequent_categories_(self):
             for category, indices in zip(self.categories_, infrequent_indices)
         ]

+    def _validate_max_categories(self):
+        """
+        Check max_categories and return the corresponding array.
+        """
+        max_categories = getattr(self, "max_categories", None)
+
+        if isinstance(max_categories, Integral) and max_categories >= 1:
+            return [max_categories] * self.n_features_in_
+
+        elif isinstance(max_categories, dict):
+            if not hasattr(self, "feature_names_in_"):
+                raise ValueError(
+                    f"{self.__class__.__name__} was not fitted on data "
+                    "with feature names. Pass max_categories as an integer "
+                    "array instead."
+                )
+
+            unexpected_feature_names = list(
+                set(self.max_categories) - set(self.feature_names_in_)
+            )
+            if unexpected_feature_names:
+                unexpected_feature_names.sort()  # deterministic error message
+                n_unexpected = len(unexpected_feature_names)
+                if len(unexpected_feature_names) > 5:
+                    unexpected_feature_names = unexpected_feature_names[:5]
+                    unexpected_feature_names.append("...")
+                raise ValueError(
+                    f"max_categories contains {n_unexpected} unexpected feature "
+                    f"names: {unexpected_feature_names}."
+                )
+
+            max_categories_array = [None] * self.n_features_in_
+            for feature_idx, feature_name in enumerate(self.feature_names_in_):
+                if feature_name in max_categories:
+                    max_count = max_categories[feature_name]
+                    if not (isinstance(max_count, Integral) and max_count >= 1):
+                        raise ValueError(
+                            f"max_categories['{feature_name}'] must be an "
+                            f"integer at least 1. Got {max_count!r}."
+                        )
+                    max_categories_array[feature_idx] = max_count
+            return max_categories_array if any(max_categories_array) else None
+
+        elif _is_arraylike_not_scalar(max_categories):
+            max_categories = np.asarray(max_categories)
+            if (
+                max_categories.ndim != 1
+                or max_categories.shape[0] != self.n_features_in_
+            ):
+                raise ValueError(
+                    f"max_categories has shape {max_categories.shape} but the "
+                    f"input data X has {self.n_features_in_} features."
+                )
+
+            if any(
+                max_count is not None
+                and not (isinstance(max_count, Integral) and max_count >= 1)
+                for max_count in max_categories
+            ):
+                raise ValueError(
+                    "max_categories must be an array-like of None or integers "
+                    "at least 1."
+                )
+
+            return max_categories if any(max_categories) else None
+
+        else:
+            return None
+
     def _check_infrequent_enabled(self):
         """
         This functions checks whether _infrequent_enabled is True or False.
         This has to be called after parameter validation in the fit function.
         """
-        max_categories = getattr(self, "max_categories", None)
+        self._max_categories_per_feature = self._validate_max_categories()
         min_frequency = getattr(self, "min_frequency", None)
         self._infrequent_enabled = (
-            max_categories is not None and max_categories >= 1
+            self._max_categories_per_feature is not None
         ) or min_frequency is not None

     def _identify_infrequent(self, category_count, n_samples, col_idx):
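The branching in `_validate_max_categories` can be easier to follow outside the estimator. The following is a minimal stand-alone sketch, assuming plain Python lists instead of NumPy arrays. `normalize_limits` and its arguments are hypothetical names introduced for illustration, and unlike the method in the diff it raises on an invalid integer instead of relying on the estimator's parameter constraints.

```python
from numbers import Integral


def normalize_limits(max_categories, n_features, feature_names=None):
    """Return one limit (int or None) per feature, or None if nothing is limited.

    Hypothetical helper; it only mirrors the int / dict / array-like branches.
    """

    def is_valid(value):
        return isinstance(value, Integral) and value >= 1

    if max_categories is None:
        return None

    if isinstance(max_categories, Integral):
        if not is_valid(max_categories):
            raise ValueError("max_categories must be an integer of at least 1.")
        # A single integer applies to every feature.
        return [max_categories] * n_features

    if isinstance(max_categories, dict):
        if feature_names is None:
            raise ValueError("A dict needs feature names to match against.")
        unexpected = sorted(set(max_categories) - set(feature_names))
        if unexpected:
            raise ValueError(f"Unexpected feature names: {unexpected}.")
        # Features missing from the dict stay unlimited (None).
        limits = [None] * n_features
        for idx, name in enumerate(feature_names):
            if name in max_categories:
                if not is_valid(max_categories[name]):
                    raise ValueError(f"Limit for {name!r} must be an integer >= 1.")
                limits[idx] = max_categories[name]
    else:
        # Anything else is treated as a sequence with one entry per feature.
        limits = list(max_categories)
        if len(limits) != n_features:
            raise ValueError(f"Got {len(limits)} limits for {n_features} features.")
        for limit in limits:
            if limit is not None and not is_valid(limit):
                raise ValueError("Each limit must be None or an integer >= 1.")

    # Collapse an all-None result so callers can test "is anything limited?".
    return limits if any(limit is not None for limit in limits) else None


names = ["pet", "size", "color"]
print(normalize_limits(3, 3))
print(normalize_limits({"size": 2}, 3, names))
print(normalize_limits([4, None, 2], 3))
print(normalize_limits([None, None, None], 3))
```

Returning `None` when no feature is limited keeps the later check simple: infrequent-category handling is enabled only if the normalized value is not `None` or `min_frequency` is set.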
@@ -305,9 +374,14 @@ def _identify_infrequent(self, category_count, n_samples, col_idx):
             infrequent_mask = np.zeros(category_count.shape[0], dtype=bool)

         n_current_features = category_count.size - infrequent_mask.sum() + 1
-        if self.max_categories is not None and self.max_categories < n_current_features:
+        if self._max_categories_per_feature is not None:
+            max_categories = self._max_categories_per_feature[col_idx]
+        else:
+            max_categories = None
+
+        if max_categories is not None and max_categories < n_current_features:
             # max_categories includes the one infrequent category
-            frequent_category_count = self.max_categories - 1
+            frequent_category_count = max_categories - 1
             if frequent_category_count == 0:
                 # All categories are infrequent
                 infrequent_mask[:] = True
@@ -1318,12 +1392,20 @@ class OrdinalEncoder(OneToOneFeatureMixin, _BaseEncoder):
         .. versionadded:: 1.3
             Read more in the :ref:`User Guide <encoder_infrequent_categories>`.

-    max_categories : int, default=None
+    max_categories : int, array-like of int, dict of str or None, default=None
         Specifies an upper limit to the number of output categories for each input
         feature when considering infrequent categories. If there are infrequent
         categories, `max_categories` includes the category representing the
-        infrequent categories along with the frequent categories. If `None`,
-        there is no limit to the number of output features.
+        infrequent categories along with the frequent categories.
+
+        - If int, then `max_categories` is the upper limit of output categories
+          for all input features.
+        - If array-like, then each item in `max_categories` is the upper limit
+          of output categories for the corresponding input feature.
+        - If dict, then its keys should be the feature names occurring in
+          `feature_names_in_` and the corresponding values should be the
+          upper limits of output categories.
+        - If `None`, then there is no limit to the number of output categories.

         `max_categories` do **not** take into account missing or unknown
         categories. Setting `unknown_value` or `encoded_missing_value` to an
@@ -1443,7 +1525,12 @@ class OrdinalEncoder(OneToOneFeatureMixin, _BaseEncoder):
         "encoded_missing_value": [Integral, type(np.nan)],
         "handle_unknown": [StrOptions({"error", "use_encoded_value"})],
         "unknown_value": [Integral, type(np.nan), None],
-        "max_categories": [Interval(Integral, 1, None, closed="left"), None],
+        "max_categories": [
+            Interval(Integral, 1, None, closed="left"),
+            "array-like",
+            dict,
+            None,
+        ],
         "min_frequency": [
             Interval(Integral, 1, None, closed="left"),
             Interval(RealNotInt, 0, 1, closed="neither"),