visualpython
diff --git a/docs/.gitbook/assets/image (322).png b/docs/.gitbook/assets/image (322).png
236 KB
diff --git a/docs/.gitbook/assets/image (323).png b/docs/.gitbook/assets/image (323).png
48.9 KB
diff --git a/docs/.gitbook/assets/image (324).png b/docs/.gitbook/assets/image (324).png
66.2 KB
diff --git a/docs/.gitbook/assets/image (325).png b/docs/.gitbook/assets/image (325).png
63.6 KB
diff --git a/docs/.gitbook/assets/image (326).png b/docs/.gitbook/assets/image (326).png
77.6 KB
diff --git a/docs/machine-learning/3.-data-prep.md b/docs/machine-learning/3.-data-prep.md
Lines changed: 69 additions & 10 deletions
@@ -1,20 +1,79 @@
-# 3. Data Prep
-
-
+---
+description: Tools for preprocessing (Encoding/Scaling)
+---
 
-<figure><img src="../.gitbook/assets/image (148).png" alt="" width="211"><figcaption></figcaption></figure>
-
-1. Click on Data Prep in the Machine Learning category.
+# 3. Data Prep
 
+<figure><img src="../.gitbook/assets/image (322).png" alt="" width="529"><figcaption></figcaption></figure>
 
+1. Click on **Data Prep** in the **Machine Learning** category.
 
-<figure><img src="../.gitbook/assets/image (149).png" alt="" width="563"><figcaption></figcaption></figure>
+<figure><img src="../.gitbook/assets/image (323).png" alt="" width="563"><figcaption></figcaption></figure>
 
 2. _**Model Type**_: You can perform various preprocessing tasks:
-   * Encoding
-   * Scaling
-   * ETC
+   * [**Encoding**](3.-data-prep.md#encoding)
+   * [**Scaling**](3.-data-prep.md#scaling)
+   * [**ETC**](3.-data-prep.md#etc-simpleimputer-smote-makecolumntransformer)
 3. _**Allocate to**_: Assign variable names for the model to perform the selected preprocessing tasks.
 4. _**Code View**_: Preview the code that will be output.
 5. _**Run**_: Execute the code.
 
+
+
+***
+
+## Encoding
+
+<figure><img src="../.gitbook/assets/image (324).png" alt="" width="563"><figcaption></figcaption></figure>
+
+1. _**Sparse (OneHotEncoder)**_: If _**True**_, returns the encoding result as a sparse matrix instead of a dense array.
+2. _**Handle unknown (OneHotEncoder, OrdinalEncoder)**_: Determines what happens when the data being encoded (for example, test data) contains a category that was not seen in the training data. If _**ignore**_ is selected, OneHotEncoder encodes the unknown category as all zeros; if _**error**_ is selected, a ValueError is raised.
+3. _**Unknown values (OrdinalEncoder)**_: Encodes unknown categories as the specified value instead of ignoring them or raising an error.
+4. _**Cols (TargetEncoder)**_: Select the columns to encode.
+5. _**Handle missing (TargetEncoder)**_: Choose how to handle missing values.
+6. _**Smoothing (TargetEncoder)**_: Blends each category's mean target value with the overall mean of the target. The fewer samples a category has, the more its encoding is pulled toward the overall mean, which helps prevent overfitting; higher values mean stronger smoothing.
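
The effect of _**Handle unknown**_ can be illustrated with a minimal one-hot encoding sketch in plain Python and NumPy. This is not the code Visual Python generates (which presumably calls scikit-learn's `OneHotEncoder`); the helper `one_hot` is hypothetical and only mirrors the **ignore**/**error** behavior.

```python
import numpy as np

def one_hot(train_values, values, handle_unknown="error"):
    """Hypothetical minimal one-hot encoder (not scikit-learn's implementation)."""
    # Categories are learned from the training data only, in sorted order.
    categories = sorted(set(train_values))
    index = {cat: i for i, cat in enumerate(categories)}
    out = np.zeros((len(values), len(categories)), dtype=int)
    for row, value in enumerate(values):
        if value in index:
            out[row, index[value]] = 1
        elif handle_unknown == "error":
            raise ValueError(f"Found unknown category {value!r}")
        # handle_unknown == "ignore": the row is left as all zeros
    return categories, out

categories, encoded = one_hot(["red", "green", "blue"], ["green", "purple"], handle_unknown="ignore")
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # "purple" was never seen in training, so its row is all zeros
```

With `handle_unknown="error"`, the same call raises a `ValueError` for `"purple"`.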
+
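A similar sketch for _**Unknown values**_: categories seen in training receive integer codes (sorted alphabetically, which matches scikit-learn's default `categories="auto"` behavior), and any unseen category receives the entered value. The helper `ordinal_encode` is hypothetical and simplified.

```python
def ordinal_encode(train_values, values, unknown_value=None):
    """Hypothetical helper: map each category to an integer code learned from train_values."""
    codes = {cat: i for i, cat in enumerate(sorted(set(train_values)))}
    encoded = []
    for value in values:
        if value in codes:
            encoded.append(codes[value])
        elif unknown_value is not None:
            # Unknown category: use the value the user entered
            encoded.append(unknown_value)
        else:
            # No unknown value given: behave like handle_unknown="error"
            raise ValueError(f"Found unknown category {value!r}")
    return encoded

# Codes: high -> 0, low -> 1, mid -> 2; "extreme" was not seen in training
print(ordinal_encode(["low", "mid", "high"], ["mid", "extreme"], unknown_value=-1))  # [2, -1]
```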
+
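To show what _**Smoothing**_ does, the sketch below uses a simple additive (m-estimate) formula, `(n * category_mean + m * global_mean) / (n + m)`, where `n` is the number of rows in the category and `m` is the smoothing value. This formula is an assumption chosen for clarity: the TargetEncoder behind this tool may weight the two means differently (for example, with a sigmoid of the category count), and the helper name is hypothetical.

```python
from collections import defaultdict

def smoothed_target_encoding(categories, targets, smoothing=1.0):
    """Hypothetical sketch: blend each category's mean target with the global mean."""
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # (n * category_mean + m * global_mean) / (n + m); small n -> closer to global_mean
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }

cats = ["a", "a", "a", "a", "b"]
ys = [1, 1, 1, 1, 0]  # global mean is 0.8
print(smoothed_target_encoding(cats, ys, smoothing=0))  # {'a': 1.0, 'b': 0.0}
print(smoothed_target_encoding(cats, ys, smoothing=2))  # 'b' (one row) moves furthest toward 0.8
```

Category `b` has a single row, so its raw mean (0.0) is not trustworthy; smoothing pulls it toward the overall mean much more than it pulls `a`.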
+
+***
+
+## Scaling
+
+<figure><img src="../.gitbook/assets/image (325).png" alt="" width="563"><figcaption></figcaption></figure>
+
+1. _**With mean (StandardScaler)**_: Centers the data so that each attribute (column) has a mean of zero.
+2. _**With std (StandardScaler)**_: Scales the data so that each attribute has a standard deviation of 1.
+3. _**With centering (RobustScaler)**_: Centers the data by subtracting the median from each attribute (column).
+4. _**With scaling (RobustScaler)**_: Scales each attribute by dividing it by its interquartile range (IQR).
+5. _**Feature range (MinMaxScaler)**_: Sets the minimum and maximum values for the scaled result.
+6. _**Norm (Normalizer)**_: Rescales each sample (row), not each attribute, to unit norm:
+   1. _**L1**_: The sum of the absolute values in each row becomes 1.
+   2. _**L2**_: The Euclidean length of each row becomes 1.
+   3. _**Max Norm**_: Each row is divided by its largest absolute value, so that value becomes 1.
+7. _**N bins (KBinsDiscretizer)**_: Sets the number of bins into which each attribute is divided.
+8. _**Strategy (KBinsDiscretizer)**_:
+   1. _**uniform**_: All bins of an attribute have the same width.
+   2. _**quantile**_: Each bin of an attribute contains approximately the same number of data points.
+9. _**Encode (KBinsDiscretizer)**_: Specifies how the binned result is encoded.
+   1. _**ordinal**_: Encodes each bin as an integer.
+   2. _**onehot**_: Encodes each bin as a one-hot (binary) vector.
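
The StandardScaler and RobustScaler options reduce to simple column-wise formulas. The NumPy sketch below (not the tool's generated code, which would use the scikit-learn classes) applies both to a small array whose second column contains an outlier. It uses the population standard deviation (`ddof=0`), as StandardScaler does.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [4.0, 1000.0]])  # 1000 is an outlier in the second column

# StandardScaler-style (with mean and with std enabled):
# subtract the column mean, then divide by the column standard deviation
standard = (X - X.mean(axis=0)) / X.std(axis=0)

# RobustScaler-style (with centering and with scaling enabled):
# subtract the column median, then divide by the interquartile range
q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
robust = (X - median) / (q3 - q1)

print(standard.round(2))
print(robust.round(2))
```

In the second column, the outlier drags the mean up to 265, while the median stays at 25, which is why RobustScaler is preferred when outliers are present.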
+
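MinMaxScaler maps each column linearly so that its minimum and maximum land on the two ends of _**Feature range**_. Here is a minimal NumPy sketch of that formula; the helper `min_max_scale` is hypothetical, and constant columns (which would divide by zero) are not handled.

```python
import numpy as np

def min_max_scale(X, feature_range=(0.0, 1.0)):
    """Hypothetical sketch: scale each column linearly into feature_range."""
    X = np.asarray(X, dtype=float)
    lo, hi = feature_range
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    scaled01 = (X - col_min) / (col_max - col_min)  # each column now spans [0, 1]
    return scaled01 * (hi - lo) + lo                # stretch and shift to [lo, hi]

print(min_max_scale([[1, 10], [2, 20], [3, 30]], feature_range=(-1, 1)))  # each column becomes -1, 0, 1
```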
+
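Normalizer works row by row: each sample is divided by its own norm. A short NumPy sketch of the three _**Norm**_ choices on two samples:

```python
import numpy as np

X = np.array([[3.0, -4.0],
              [1.0, 1.0]])

# Each norm is computed per row (sample); each row is divided by its own norm.
l1 = X / np.abs(X).sum(axis=1, keepdims=True)         # |3| + |-4| = 7
l2 = X / np.sqrt((X ** 2).sum(axis=1, keepdims=True))  # sqrt(9 + 16) = 5
max_norm = X / np.abs(X).max(axis=1, keepdims=True)    # max(|3|, |-4|) = 4

print(l1)        # rows: [3/7, -4/7] and [0.5, 0.5]
print(l2)        # rows: [0.6, -0.8] and [0.707..., 0.707...]
print(max_norm)  # rows: [0.75, -1.0] and [1.0, 1.0]
```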
+
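The difference between the **uniform** and **quantile** strategies is easiest to see on a skewed feature. The NumPy sketch below approximates how KBinsDiscretizer computes bin edges and assigns bins; it is a simplified illustration, and the helper `to_ordinal` is hypothetical.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # skewed feature
n_bins = 2

# uniform: equal-width bins between the minimum and maximum
uniform_edges = np.linspace(x.min(), x.max(), n_bins + 1)
# quantile: edges at evenly spaced percentiles, so bins hold about the same number of points
quantile_edges = np.percentile(x, np.linspace(0, 100, n_bins + 1))

def to_ordinal(values, edges):
    """Assign each value the index of its bin (the 'ordinal' encoding)."""
    # Only the inner edges matter; a value equal to an edge goes to the upper bin.
    return np.digitize(values, edges[1:-1])

print(uniform_edges, to_ordinal(x, uniform_edges))    # the outlier alone fills the upper bin
print(quantile_edges, to_ordinal(x, quantile_edges))  # the points are split more evenly

# 'onehot' encoding: one binary column per bin
onehot = np.eye(n_bins, dtype=int)[to_ordinal(x, quantile_edges)]
print(onehot)
```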
+***
+
+## ETC (SimpleImputer / SMOTE / MakeColumnTransformer)
+
+<figure><img src="../.gitbook/assets/image (326).png" alt="" width="563"><figcaption></figcaption></figure>
+
+1. _**Missing values (SimpleImputer)**_: Specifies the value that is treated as missing.
+2. _**Fill value (SimpleImputer)**_: Replaces missing values with the entered value (used with the constant strategy).
+3. _**Copy (SimpleImputer)**_: If _**True**_, imputation is performed on a copy of the data, leaving the original unchanged; if _**False**_, the original data may be modified in place.
+4. _**Add indicator (SimpleImputer)**_: Adds indicator columns of 0s and 1s that mark missing entries: 1 where a value was missing and 0 otherwise.
+5. _**K neighbors (SMOTE)**_: Sets the number of nearest neighbors used to generate synthetic samples. Each new minority-class sample is created by interpolating between an existing minority-class sample and one of its k nearest minority-class neighbors.
+6. _**Sampling strategy (SMOTE)**_:
+   1. _**auto**_: Resamples every class except the majority class until it matches the size of the majority class.
+   2. _**minority**_: Resamples only the minority class until it matches the size of the majority class.
+   3. _**float**_: Specifies the desired ratio of minority-class to majority-class samples after resampling (binary classification only). For example, 0.5 makes the minority class half the size of the majority class.
+7. _**Estimator (MakeColumnTransformer)**_: Specifies the transformer (for example, an encoder or a scaler) to apply to the columns selected in _**Columns**_ below. This lets you apply a different transformer to each group of columns.
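
The SimpleImputer options above can be traced step by step in a small NumPy sketch. It illustrates constant filling, `copy`, and `add_indicator`; it is not scikit-learn's implementation. Note that scikit-learn only adds indicator columns for features that had missing values during fitting; in this example, both columns do.

```python
import numpy as np

X = np.array([[1.0, np.nan],
              [np.nan, 3.0],
              [4.0, 5.0]])
missing_values = np.nan  # the value treated as missing
fill_value = 0.0         # what missing entries are replaced with

# NaN != NaN, so a NaN placeholder must be detected with np.isnan instead of ==
mask = np.isnan(X) if np.isnan(missing_values) else (X == missing_values)

imputed = X.copy()            # copy=True: the original X is left unchanged
imputed[mask] = fill_value
indicator = mask.astype(int)  # add_indicator=True: 1 where a value was missing
result = np.hstack([imputed, indicator])
print(result)
```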
+
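The core step of SMOTE is linear interpolation between a minority-class sample and one of its k nearest minority-class neighbors. The NumPy sketch below shows that step and how a float _**Sampling strategy**_ translates into a number of new samples. It is a simplified illustration, not imbalanced-learn's implementation; `smote_sketch` and the toy data are hypothetical.

```python
import numpy as np

def smote_sketch(X_min, n_new, k_neighbors=2, seed=0):
    """Hypothetical sketch: create n_new synthetic minority samples by interpolation."""
    rng = np.random.default_rng(seed)
    # Pairwise distances between minority samples; a point is not its own neighbor.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k_neighbors]  # k nearest per sample

    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))  # pick a random minority sample
        j = rng.choice(neighbors[i])  # pick one of its k nearest neighbors
        gap = rng.random()            # random position on the segment, in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
n_majority = 10

# Float sampling strategy: desired minority/majority ratio after resampling.
ratio = 0.5
n_new = int(ratio * n_majority - len(X_minority))  # 5 - 4 = 1 new sample
# With 'minority' (or 'auto' in the binary case), the target would be 10 - 4 = 6 new samples.
print(smote_sketch(X_minority, n_new, k_neighbors=2))
```

Because each synthetic point lies on a segment between two real minority samples, SMOTE fills in the region the minority class already occupies rather than simply duplicating rows.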