Concerning your first point, I believe there are two paths you can take, depending on the objective of your project. While RFs can technically handle non-stationary data, they won't inherently model trends or seasonal patterns.

1. If you are trying to model within the same period, you can use the raw non-stationary data, as long as you include some time-based features to try to capture these patterns.
2. If you are trying to make forecasts, then you should detrend your data, as you mentioned.

You might want to read this paper to get a grip on how to handle the non-stationary side of your data, depending on how you want to use it.

While scaling is not always necessary for RF, you might want to consider StandardScaler, but remember to apply it appropriately to avoid data leakage. For your splitting, you might want to consider time-based splitting, but there are a few options you can look into on this particular matter.

Let me know if this was helpful.
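A minimal sketch of the advice above, under stated assumptions: synthetic daily data stand in for the real records, and the column names (`precip`, `swe`, `air_temp`, `gwl`) are hypothetical. It adds day-of-year sine/cosine terms as the time-based features, removes a linear trend fit on each training fold only (a straight line is just one simple detrending choice), and evaluates with scikit-learn's `TimeSeriesSplit`, keeping `StandardScaler` inside a `Pipeline` so the scaler never sees test-fold data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic daily records standing in for the real data (column names are hypothetical).
rng = np.random.default_rng(0)
dates = pd.date_range("2017-11-15", "2023-06-06", freq="D")
n = len(dates)
t = np.arange(n)
doy = dates.dayofyear.to_numpy()
winter = np.cos(2 * np.pi * doy / 365.25)  # peaks near 1 January

df = pd.DataFrame(
    {
        "precip": rng.gamma(0.5, 4.0, n),
        "swe": np.clip(200 * winter, 0, None) + rng.normal(0, 5, n),
        "air_temp": 5 - 10 * winter + rng.normal(0, 2, n),
    },
    index=dates,
)
# Target: long-term decline + seasonal cycle + noise.
df["gwl"] = -0.002 * t + 0.5 * winter + rng.normal(0, 0.1, n)

# Time-based features: day-of-year harmonics let the trees see the seasonal phase.
df["doy_sin"] = np.sin(2 * np.pi * doy / 365.25)
df["doy_cos"] = np.cos(2 * np.pi * doy / 365.25)

features = ["precip", "swe", "air_temp", "doy_sin", "doy_cos"]
X = df[features].to_numpy()
y = df["gwl"].to_numpy()

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Detrend: fit a straight line on the training fold only, remove it from both folds.
    slope, intercept = np.polyfit(t[train_idx], y[train_idx], deg=1)
    trend = slope * t + intercept
    y_train = y[train_idx] - trend[train_idx]
    y_test = y[test_idx] - trend[test_idx]

    # The scaler lives inside the pipeline, so it is fit on the training fold only.
    model = make_pipeline(
        StandardScaler(), RandomForestRegressor(n_estimators=100, random_state=0)
    )
    model.fit(X[train_idx], y_train)
    scores.append(r2_score(y_test, model.predict(X[test_idx])))

print("R^2 per fold:", [round(s, 3) for s in scores])
```

With short early folds, the fitted line can absorb part of the seasonal cycle, so early-fold scores tend to be noisier than later ones.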
Thank you for your insightful feedback. The main aim of my analysis is to identify the main factors influencing groundwater level (GWL) fluctuation. The groundwater level data from confined aquifer wells have exhibited a declining trend from 2016 to date. Input variables include daily accumulated precipitation, daily accumulated SWE, daily average air temperature, daily average soil temperature, daily average soil moisture content (%), and stream water levels. All inputs and the output (GWL) are time series from 11/15/2017 to 06/06/2023. All input variables except stream water levels were obtained from one SNOTEL station. I employed `train_test_split` to split the data into training (80%) and testing (20%) sets. Three feature importance methods (Gini importance, permutation importance, and SHAP importance) were used to rank the input variables by their influence on the output (GWL). I am not sure whether the random splitting technique has an effect on the feature importance identification.
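One way to check whether the random split affects the rankings is to compute the importances under both a shuffled split and a chronological 80/20 split (`shuffle=False`, last 20% held out) and compare them. The sketch below is an illustration on synthetic data with hypothetical feature names, using the forest's impurity-based `feature_importances_` (for a regressor this is the mean decrease in squared-error impurity, the regression counterpart of Gini importance) and `sklearn.inspection.permutation_importance`; SHAP is omitted to keep it dependency-free. With autocorrelated daily series, a shuffled split places near-identical neighboring days in both sets, which tends to inflate test scores and can favor features that mainly encode time.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic autocorrelated daily series; names are hypothetical stand-ins.
rng = np.random.default_rng(42)
n = 2000
t = np.arange(n)
precip = rng.gamma(0.5, 4.0, n)
# 30-day moving average of precipitation gives a slowly varying soil-moisture proxy.
soil_moist = np.convolve(precip, np.ones(30) / 30, mode="same") + rng.normal(0, 0.1, n)
air_temp = 10 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 2, n)
time_proxy = t / n + rng.normal(0, 0.01, n)  # a feature that mostly encodes time
X = np.column_stack([precip, soil_moist, air_temp, time_proxy])
names = ["precip", "soil_moist", "air_temp", "time_proxy"]
# Declining target driven by soil moisture plus a long-term decline.
y = -1.5 * t / n + 0.8 * soil_moist + rng.normal(0, 0.1, n)


def importances_for_split(shuffle):
    """Fit on an 80/20 split; return test R^2, impurity (MDI) and permutation importances."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, shuffle=shuffle, random_state=0
    )
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    perm = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
    return model.score(X_te, y_te), model.feature_importances_, perm.importances_mean


for shuffle in (True, False):
    r2, mdi, perm = importances_for_split(shuffle)
    print(f"{'random' if shuffle else 'chronological'} split: test R^2 = {r2:.3f}")
    for name, m, p in zip(names, mdi, perm):
        print(f"  {name:<11} MDI = {m:.3f}  permutation = {p:.3f}")
```

The permutation importances are computed on the held-out set, so they are the ones most directly affected by how that set is chosen; the impurity-based values depend only on the training portion.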
Hi everyone, I am a graduate student in Geophysics, and my research investigates the factors driving groundwater depletion. I am currently working on a sensitivity analysis to identify the main drivers of groundwater depletion using Random Forest (RF). My input dataset includes total precipitation, snow water equivalent (SWE), soil moisture, soil temperature, air temperature, and stream water level. The groundwater level (GWL) and the input features appear to show some seasonal trends. I have the following concerns:

1. Can the RF regression model deal with trended inputs (X) and outputs (Y)? Can RF model non-stationary data, e.g., when Y = GWL of a bedrock well is trending downward? If not, is detrending Y needed as a preprocessing step before input to the RF model? Or, since the RF model is fit to existing data without making a forecast (new inputs X --> new Y), do we need to worry about the non-stationary Y?
2. The X features have different magnitudes. Should they be scaled before input to the RF model, e.g., each feature scaled to 0-1 before training/testing?
3. What splitting technique is recommended for this kind of analysis?
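On the scaling question, a quick check: tree splits depend only on the ordering of each feature's values, so a monotonic rescaling such as `MinMaxScaler` (0-1) should leave the fitted forest, and hence its predictions and impurity-based importances, unchanged up to floating-point precision. The data and column magnitudes below are synthetic assumptions chosen to mimic inputs with very different units.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler

# Synthetic features on very different scales (magnitudes are arbitrary assumptions).
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3)) * np.array([1.0, 10.0, 1000.0])
y = np.sin(X[:, 0]) + 0.05 * X[:, 1] + 0.0005 * X[:, 2] + rng.normal(0, 0.1, n)

# Same forest settings and seed; only the feature scaling differs.
scaler = MinMaxScaler().fit(X)  # maps each column to [0, 1]
rf_raw = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
rf_scaled = RandomForestRegressor(n_estimators=100, random_state=0).fit(
    scaler.transform(X), y
)

pred_raw = rf_raw.predict(X)
pred_scaled = rf_scaled.predict(scaler.transform(X))
print("max |prediction difference|:", np.abs(pred_raw - pred_scaled).max())
print("raw importances:   ", np.round(rf_raw.feature_importances_, 3))
print("scaled importances:", np.round(rf_scaled.feature_importances_, 3))
```

Scaling does no harm here, but for the forest itself it is optional; it matters more for distance- or gradient-based models.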