Utility Functions


Processing Data

Filtering Input Data:

We need to filter the input data to fit our testing needs. _filter_dataframe does this: it takes a pandas DataFrame and a set of filters and returns the filtered DataFrame.

source

_filter_dataframe

 _filter_dataframe (df, filters)

Filter a DataFrame using a dictionary or a list of dictionaries with multiple filter conditions.

Filter examples: pass a single value, such as {"State": "Wisconsin"}, or a list of values, such as {"Cities": ["La Crosse", "Madison", "Eau Claire", "Milwaukee"]}.

Type Details
df A pandas DataFrame
filters dictionary or list of dictionaries
Returns DataFrame
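
The filter formats described above can be sketched as follows. This is a minimal illustration, not the source of _filter_dataframe: the name filter_dataframe_sketch is hypothetical, and we assume that a list value means "match any of these values" and that multiple dictionaries are applied one after another, so a row must satisfy all of them.

```python
import pandas as pd


def filter_dataframe_sketch(df, filters):
    """Illustrative stand-in for _filter_dataframe (hypothetical, not the library code)."""
    # Accept either a single dictionary or a list of dictionaries.
    if isinstance(filters, dict):
        filters = [filters]
    for condition in filters or []:
        for column, wanted in condition.items():
            if isinstance(wanted, (list, tuple, set)):
                # Assumption: a list-like value keeps rows matching any entry.
                df = df[df[column].isin(list(wanted))]
            else:
                # A single value keeps rows equal to that value.
                df = df[df[column] == wanted]
    return df


example = pd.DataFrame(
    {
        "State": ["Wisconsin", "Wisconsin", "Minnesota"],
        "Cities": ["Madison", "Green Bay", "Duluth"],
    }
)
single = filter_dataframe_sketch(example, {"State": "Wisconsin"})
combined = filter_dataframe_sketch(
    example, [{"State": "Wisconsin"}, {"Cities": ["La Crosse", "Madison"]}]
)
```

Here single keeps the two Wisconsin rows, and combined keeps only the Madison row because both conditions must hold under the assumptions above.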

Removing Dimensions with few Observations:

Check Names and Data Types


source

_name_type_check

 _name_type_check (df, dimension, date_col)

Check datatypes and names of columns
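
A check of this kind might look like the sketch below. The page does not show what _name_type_check actually verifies, so everything here is an assumption: the name name_type_check_sketch is hypothetical, and we assume the check confirms that the expected columns exist and that the date column holds datetime values, raising an error otherwise.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def name_type_check_sketch(df, dimension, date_col):
    """Hypothetical column checks; the real _name_type_check may behave differently."""
    # The dimension is optional, so only require it when one is given.
    required = [date_col] if dimension is None else [dimension, date_col]
    missing = [name for name in required if name not in df.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    # Assumption: the date column must already be a datetime dtype.
    if not is_datetime64_any_dtype(df[date_col]):
        raise TypeError(f"Column {date_col!r} is not a datetime column")
    return df


good = pd.DataFrame({"ds": pd.to_datetime(["2024-01-01"]), "store": ["a"]})
checked = name_type_check_sketch(good, "store", "ds")
```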

Process Metric Column:


source

_process_metric_col

 _process_metric_col (df, metric_col)

Putting Everything Together: _process_data


source

_process_data

 _process_data (path:str, dimension:str=None, date_col:str='ds',
                metric_col:Union[str,Callable]='y',
                filters:list[dict]=None, sz_threshold=50)

Filters and aggregates data

              Type                                Default  Details
path          str                                          Path to Feather File
dimension     str                                 None     Independent Variable
date_col      str                                 ds       Date Column
metric_col    typing.Union[str, typing.Callable]  y        Dependent Variable
filters       list                                None     Desired Filters
sz_threshold  int                                 50       Minimum number of observations
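
To illustrate the sz_threshold parameter, the sketch below drops every dimension value that has too few observations. It is not the code inside _process_data: the name drop_small_dimensions_sketch is hypothetical, and we assume that "minimum" is inclusive, so a dimension value with exactly sz_threshold rows is kept.

```python
import pandas as pd


def drop_small_dimensions_sketch(df, dimension, sz_threshold=50):
    """Keep rows whose dimension value has at least sz_threshold observations."""
    # Count the rows for each dimension value, then map the count back to each row.
    counts = df[dimension].map(df[dimension].value_counts())
    # Assumption: the threshold is inclusive.
    return df[counts >= sz_threshold]


example = pd.DataFrame({"store": ["a"] * 3 + ["b"], "y": [1, 2, 3, 4]})
kept = drop_small_dimensions_sketch(example, "store", sz_threshold=2)
```

In this example store "a" has three observations and is kept, while store "b" has only one and is removed.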