sforecast

Submodules

Attributes

__version__

Classes

`sliding_forecast`	Sforecast supports sliding (expanding) window fit and predict operations.
`covarlags`	Transformer for creating lagged variables.
`rolling_transformer`

Functions

`get_dense_nn`(Nlags[, Ncovars, Nexogs, Nendogs])
`get_dense_emb_nn`(df, Nlags, catvars[, Ncovars, ...])

Package Contents

sforecast.__version__

class sforecast.sliding_forecast(y: list, model: None, model_type: None, xscale_parameters=None, swin_parameters=None, tf_parameters=None, cm_parameters=None, scale_parameters=None, verbose=False, debug=False)

Sforecast supports sliding (expanding) window fit and predict operations. The fit operation is an out-of-sample train-test fit, where Ntest (swin_parameters:Ntest) sets the number of test observations and the remaining observations form the initial training set.

The out-of-sample train-test methodology works as follows. The first training and test window (before sliding) has a training set of (total observations - Ntest) observations and a test set of Ntest observations. The model is retrained every Nhorizon steps, where Nhorizon defaults to 1. After each train/test operation, the window slides forward by Nhorizon observations: the training set grows by Nhorizon observations and the remaining test set shrinks by Nhorizon observations.
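The following sketch enumerates the train/test row positions for the expanding-window scheme just described. The helper expanding_window_splits is hypothetical and not part of sforecast; it assumes integer row positions 0..n_obs-1, and Ntest = 0 simply produces no test windows.

```python
def expanding_window_splits(n_obs, Ntest, Nhorizon=1):
    """Yield (train, test) row-position lists for an expanding-window backtest.

    Hypothetical helper illustrating the scheme above; not part of sforecast.
    """
    if not 0 <= Ntest < n_obs:
        raise ValueError("Ntest must satisfy 0 <= Ntest < n_obs")
    first_test = n_obs - Ntest  # initial training set: rows 0 .. n_obs - Ntest - 1
    for start in range(first_test, n_obs, Nhorizon):
        train = list(range(start))                               # all rows before the test block
        test = list(range(start, min(start + Nhorizon, n_obs)))  # the next Nhorizon rows
        yield train, test


splits = list(expanding_window_splits(n_obs=10, Ntest=4, Nhorizon=2))
for train, test in splits:
    print(len(train), test)
# 6 [6, 7]
# 8 [8, 9]
```

With n_obs=10, Ntest=4 and Nhorizon=2, the model is trained twice: on rows 0-5 to predict rows 6-7, then on rows 0-7 to predict rows 8-9.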

The fitted transformers (make_lags, make_derived_attributes, and scaler) transform the data during test and predict operations. They generate the lagged and derived variables, which are scaled together with the exogenous variables and then passed to the fitted model to make a prediction.

The last fit uses all observations in the dfXY input DataFrame. The fitted model and the make_lags, make_derived_attributes, and scaler transforms are returned, together with the test predictions and, for each prediction, an index matching the input DataFrame.

Sforecast supports recursive forecasts (Nhorizon x 1-step, a single model) and, under development, direct forecasts (Nhorizon models) over the Nhorizon interval. The predict operation makes N-step (N x 1-step) recursive forecasts, corresponding to the recursive/single_step fit operation, or Nhorizon direct forecasts (under development), corresponding to a direct forecast fit operation.
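A minimal recursive (single_step) forecast loop is sketched below, assuming a 1-step model that maps the most recent Nlags values (oldest first) to the next value. recursive_forecast and the toy linear_extrapolation "model" are hypothetical illustrations, not sforecast functions. A direct forecast would instead train Nhorizon separate models, one per step ahead, so no prediction is fed back as an input.

```python
from typing import Callable, Sequence


def recursive_forecast(history: Sequence[float],
                       one_step_model: Callable[[list[float]], float],
                       Nlags: int, Nhorizon: int) -> list[float]:
    """Produce Nhorizon forecasts by feeding each 1-step prediction back as a lag."""
    window = list(history[-Nlags:])  # most recent Nlags observations, oldest first
    preds = []
    for _ in range(Nhorizon):
        y_hat = one_step_model(window)  # one 1-step prediction
        preds.append(y_hat)
        window = window[1:] + [y_hat]   # drop the oldest lag, append the prediction
    return preds


def linear_extrapolation(lags: list[float]) -> float:
    """Toy 1-step model: continue the last observed change."""
    return lags[-1] + (lags[-1] - lags[-2])


print(recursive_forecast([1.0, 2.0, 3.0], linear_extrapolation, Nlags=2, Nhorizon=3))
# [4.0, 5.0, 6.0]
```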

__init__(self, y="y", model=None, ts_parameters=None)

Receives the inputs defining the sliding forecast model, including the ML model, the time-series sliding/expanding window hyper-parameters, and ML feature scaling.

Parameters

y (String or List): single target (dependent) variable (str) or list of target variables.

model (ML Model): The ML model used in the forecast operation. It can be a scikit-learn model (model_type = “sk”) or a TensorFlow model (model_type = “tf”). This variable is ignored if model_type = “cm” (classical forecast model).

model_type (String): indicates the type of model: “sk” (scikit-learn), “cm” (statsmodels or pmdarima), or “tf” (TensorFlow).

swin_parameters (Dictionary): sliding window forecast model parameters.

Nlags (Integer): Number of lags for all target variables and covariates. Lagged variables enable accounting for the auto-regressive properties of the timeseries. Defaults to 1.

covars (List): List of covariate variables. If not already present, the y forecast variable(s) will be added to the covars list. Non-lagged covariates (lag = 0) are removed from the training variables to avoid leakage. Lag values > 0 and <= Nlags are included in the training variables.

catvars (List): List of categorical variables. This input is only relevant for TensorFlow models and is ignored otherwise. Default = None.

exenvars (List): List of exogenous (e.g., input) and endogenous (e.g., derived) variables. Exenvars holds continuous variables only; categorical variables belong in catvars, not in exenvars. Exenvars can be a list of lists, in which case it represents groups of variables that can be processed differently, for example by the TensorFlow model: a TensorFlow model can process exenvars with a dense feed-forward network and the lagged covariates with an LSTM.

Ntest (Integer) - number of predictions to make. Defaults to 1. Ntest > 0 causes the sliding forecast to divide the observations into training and test sets. The first training uses the last Ntest observations as the test set and the preceding observations as the training set. Ntest can be set to 0, in which case all observations are used for training and no test statistics are computed.

alpha (Float): A number between 0 and 1 designating the confidence interval spread. Defaults to 0.2 (an 80% interval).

Nhorizon (Integer) - n-step (i.e., Nhorizon) forecast. The sliding/expanding window moves forward by Nhorizon observations after every Nhorizon predictions. Defaults to 1.

minmax (Tuple): Defaults to (None, None). Imposes a lower and an upper limit, respectively, on the forecasts/predictions (and confidence intervals).

ci_method (String):

The method used to estimate the confidence interval. Defaults to “linear” (from the numpy percentile function). Choices are:

“inverted_cdf”, “averaged_inverted_cdf”, “closest_observation”, “interpolated_inverted_cdf”, “hazen”, “weibull”, “linear”, “median_unbiased”, “normal_unbiased” (numpy percentile methods)

“minmax” - the minimum and maximum of the observed errors

“tdistribution” - compute the t-distribution confidence interval

horizon_predict_method (String): “single_step” or “direct”. “single_step” indicates that predictions over the horizon window are recursive, meaning a single ML model makes Nhorizon 1-step recursive predictions. “direct” indicates that predictions over the horizon interval use direct forecasting (Nhorizon models).

derived_attributes_transform: a transform for derived (endogenous/dependent) attributes. See example.ipynb for how to create derived endogenous attributes.

tf_params (Dictionary, optional) - TensorFlow parameters. Defaults to None.

Nepochs_i: Number of training epochs for the first (initial) training. Defaults to 10.

Nepochs_t: Number of training (tuning) epochs for subsequent trainings, after the initial training. Defaults to 5.

batch_size: Defaults to 32.

lstm: defaults to False.

cm_params (Dictionary): parameters for the classical forecast models, when model_type = “cm”

model (str): The supported models are “arima”, “sarimax” and “auto_arima”. Default is None.

order (tuple): SARIMA and ARIMA order tuple contains the (p,d,q) parameters of the ARIMA and SARIMA models. Defaults to None.

seasonal_order (tuple): SARIMA seasonal order (P,D,Q,m). Defaults to (0, 0, 0, 0).

enforce_stationarity (Boolean): SARIMAX. Defaults to True.

enforce_invertibility (Boolean): SARIMAX. Defaults to True.

start_p (int): autoarima, starting p (AR lags). Defaults to 1.

start_q (int): autoarima, starting q (MA error lags).

d (int): autoarima differencing parameter. Defaults to None, in which case it is discovered.

seasonal (Boolean): autoarima. Defaults to True.

max_p: autoarima. Defaults to None.

max_q: autoarima. Defaults to None.

test (string): autoarima stationarity test. Defaults to “adf”, the augmented Dickey-Fuller test.

start_P (int): autoarima, starting seasonal AR order. Defaults to 1.

start_Q (int): autoarima, starting seasonal MA (error) order. Defaults to 1.

max_P: autoarima, Defaults to None.

max_Q: autoarima. Defaults to None.

m (int): autoarima seasonal period, i.e., the number of observations (rows) per season. Defaults to 12.

D (int): autoarima seasonal difference. Defaults to None, in which case it is discovered when seasonal = True.

trace (Boolean): autoarima. Defaults to True. Prints the AIC of each fitted model.

error_action (str): autoarima. Defaults to “ignore”, so orders that fail to fit are skipped without raising an error.

suppress_warnings (Boolean): autoarima. Defaults to True.

stepwise (Boolean): autoarima. Defaults to True (stepwise search).

scale_parameters (Dictionary): input variables are scaled as designated by the parameters in the dictionary.

mms_cols (str or list): Defaults to “all”, in which case all input variables are scaled with the scikit-learn MinMaxScaler. If mms_cols = None, no variables are scaled with the MinMaxScaler. If mms_cols is a list of columns, the corresponding columns are scaled with the MinMaxScaler.

ss_cols (str or list): Takes precedence over mms_cols. Defaults to None. If ss_cols = “all”, all input variables are scaled with the scikit-learn StandardScaler. If ss_cols is a list of columns, the corresponding columns are scaled with the StandardScaler.

minmax (2-tuple): forecast predictions and ci (confidence intervals) are constrained to fall between minmax[0] and minmax[1]. Defaults to (None, None), meaning no lower or upper limits are imposed on the predictions and confidence intervals.
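The ci_method, alpha, and minmax options can be illustrated with a small sketch. It assumes the interval is built by adding quantiles of the observed test errors (truth minus prediction) to the point forecast and then clipping the forecast and both bounds to minmax; the helper names are hypothetical, and this is not necessarily how sforecast computes its intervals. Only the numpy-style “linear” percentile and the “minmax” method are shown; “tdistribution” would additionally need a t quantile function such as scipy.stats.t.ppf.

```python
import math


def linear_percentile(values, q):
    """q-th percentile (0-100) with linear interpolation, like numpy.percentile's "linear" method."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0  # fractional position in the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def clip(value, minmax=(None, None)):
    """Constrain value to [minmax[0], minmax[1]]; None means no limit on that side."""
    lower_limit, upper_limit = minmax
    if lower_limit is not None:
        value = max(value, lower_limit)
    if upper_limit is not None:
        value = min(value, upper_limit)
    return value


def prediction_interval(y_pred, errors, alpha=0.2, ci_method="linear", minmax=(None, None)):
    """Return (pred, lower, upper) from observed errors (assumed scheme, hypothetical helper)."""
    if ci_method == "linear":
        err_lo = linear_percentile(errors, 100 * alpha / 2)        # 10th percentile when alpha=0.2
        err_hi = linear_percentile(errors, 100 * (1 - alpha / 2))  # 90th percentile when alpha=0.2
    elif ci_method == "minmax":
        err_lo, err_hi = min(errors), max(errors)  # widest spread seen in the errors
    else:
        raise ValueError(f"ci_method {ci_method!r} is not covered by this sketch")
    return clip(y_pred, minmax), clip(y_pred + err_lo, minmax), clip(y_pred + err_hi, minmax)


errors = [float(e) for e in range(-5, 6)]  # hypothetical test-set errors: -5.0 ... 5.0
print(prediction_interval(10.0, errors))                      # (10.0, 6.0, 14.0)
print(prediction_interval(10.0, errors, ci_method="minmax"))  # (10.0, 5.0, 15.0)
print(prediction_interval(10.0, errors, minmax=(0.0, 12.0)))  # (10.0, 6.0, 12.0)
```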

State Variables

metrics (dictionary of dictionaries): {y target variable: {MSE: number, MAE: number}, …}
model (ML Model): After fitting, the ML model trained on all observations.
dfXYfit_last (DataFrame): the last row of the fitted DataFrame. It includes the covariate columns (note: unlagged covariates are not used for the fit or predict operations) and all columns (variables) used in the predict operation, including lagged covariates, exogenous variables, derived/endogenous variables, and categorical variables. Continuous variables (not covariates) are scaled according to the scale_parameters input dictionary.
df_pred (DataFrame): DataFrame containing the prediction results. See fit() for additional information.
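A sketch of how the per-target metrics dictionary could be filled from test predictions, assuming it is keyed by target name with “MSE” and “MAE” entries as described above; compute_metrics is a hypothetical helper, not an sforecast method.

```python
def compute_metrics(results):
    """Return {target: {"MSE": ..., "MAE": ...}} from {target: (y_test, y_pred)}.

    Hypothetical helper; the key layout follows the metrics description above.
    """
    metrics = {}
    for target, (y_test, y_pred) in results.items():
        errors = [t - p for t, p in zip(y_test, y_pred)]
        metrics[target] = {
            "MSE": sum(e * e for e in errors) / len(errors),  # mean squared error
            "MAE": sum(abs(e) for e in errors) / len(errors),  # mean absolute error
        }
    return metrics


m = compute_metrics({"y": ([1.0, 2.0, 3.0], [1.0, 3.0, 1.0])})
print(m)  # MSE = 5/3, MAE = 1.0
```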

swin_params

scale_params

tf_params

cm_params

predict_params

y

Nhorizon

minmax

_minmax

covars

ci

model_type

history_i = None

history_t = None

idx_last_obs = None

dfXYfit_last = None

metrics = None

exenvars

fit(dfXY, verbose=False, debug=False)

The fit operation is an out-of-sample train-test fit, where Ntest (swin_parameters:Ntest) sets the number of test observations and the remaining observations form the initial training set. The model is retrained every Nhorizon steps; Nhorizon defaults to 1. After the first train/test operation, the window slides forward: the training set grows by Nhorizon observations and the test set shrinks by Nhorizon observations. The last fit uses all observations in the dfXY input DataFrame.

Parameters:

dfXY (DataFrame) – input DataFrame containing the target variable, covariates (optional), exogenous continuous variables (optional), and categorical variables (optional).
verbose (bool) – If True, print helpful information. Defaults to False.

Returns:

DataFrame with the forecast output including the following columns for each covariate.

y_train: y training value at the corresponding observation for the initial window (before sliding). Values outside the initial window are NaN; after the initial training, y values are shown under the y_test column instead.
y_test: y value (truth) corresponding to the prediction.
y_pred: y predicted (i.e. forecast)
y_lower: lower confidence limit
y_upper: upper confidence limit

predict(Nperiods=1, dfexogs=None, dfcats=None, ts_period=None, verbose=False, debug=False)

Forecast the next Nperiods periods using the model trained during the fit operation.

Parameters:

Nperiods (int) – Indicates how many periods forward to forecast. Defaults to 1.
dfexogs (DataFrame, optional) – Defaults to None. If the fit includes exogenous variables (independent continuous variables), a dfexogs input DataFrame is required. The number of rows of dfexogs must be <= Nperiods, and its columns must be the same as the exogenous variables used in the fit.
dfcats (DataFrame, optional) – Defaults to None. This input applies only to TensorFlow models. It is required if the fit includes categorical variables. The categorical variables must be encoded (e.g., with the scikit-learn LabelEncoder) in the same way as for the fit operation.
ts_period (pandas time offset, optional) – Defaults to None. Not required if the time-series index is an integer; required if the time-series index is a timestamp. The input is ignored (and a warning is issued) if the time-series index is an integer.
verbose (bool, optional) – Print helpful information to standard out when True. Defaults to False.

Returns:

Dataframe including the forecast for each target variable.
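The ts_period rule can be sketched as follows: an integer index simply continues counting, while a timestamp index steps forward by ts_period. This stand-alone example uses datetime.timedelta in place of a pandas offset, and future_index is a hypothetical helper, not part of sforecast.

```python
import warnings
from datetime import datetime, timedelta


def future_index(last_index, Nperiods=1, ts_period=None):
    """Index labels for the next Nperiods forecasts (hypothetical helper).

    Integer index: keep counting from the last label; ts_period is ignored with a warning.
    Timestamp index: step forward by ts_period, which is then required.
    """
    if isinstance(last_index, int):
        if ts_period is not None:
            warnings.warn("ts_period is ignored for an integer index")
        return [last_index + k for k in range(1, Nperiods + 1)]
    if ts_period is None:
        raise ValueError("ts_period is required for a timestamp index")
    return [last_index + k * ts_period for k in range(1, Nperiods + 1)]


print(future_index(99, Nperiods=3))  # [100, 101, 102]
print(future_index(datetime(2024, 1, 31), Nperiods=2, ts_period=timedelta(days=1)))
```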

class sforecast.covarlags(covars=None, Nlags=1)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Transformer for creating lagged variables. The transformer provides fit(), transform(), and fit_transform() operations (the last by extension of TransformerMixin). The transform retains a memory of the last Nlags+1 rows.

__init__

Parameters:

covars (list) – list of column names representing covariates.
Nlags (int) – number of lags to implement for each covariate.

covars

Nlags

Nmemory

dfmemory = None

fit(df)

transform(df=pd.DataFrame(), Nout=None, dfnewrows=pd.DataFrame(), debug=False)

Create lagged covariates.

Parameters:

df (DataFrame, optional) – DataFrame containing the covariates. Defaults to an empty DataFrame.
Nout (int, optional) – Number of output rows. Defaults to None.
dfnewrows (DataFrame, optional) – new rows to append to the DataFrame. The DataFrame memory is updated with the last Nlags+1 rows. Defaults to an empty DataFrame.
debug (bool, optional) – debug flag. Defaults to False.

Returns:

DataFrame with the addition of lagged variables

get_last_dfrow()

Returns: the last row of the saved DataFrame

set_last_y(y, yvalue, debug=False)

Update the value of the column “y” corresponding to the last row of the saved DataFrame.

Parameters:

y (string or list) – column name(s) of the last row to update with the values yvalue
yvalue (Numeric or list of Numerics) – value(s) corresponding to “y”
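The memory behavior of covarlags (keep the last Nlags+1 rows, emit lags 1..Nlags, and update the last y during recursive prediction) can be sketched with plain dictionaries. LagMemory, its get_last_row method, and the “_m1”, “_m2” column suffixes are hypothetical; the real transformer works on pandas DataFrames.

```python
class LagMemory:
    """Hypothetical stand-in for covarlags that stores rows as dicts."""

    def __init__(self, covars, Nlags=1):
        self.covars = list(covars)
        self.Nlags = Nlags
        self.memory = []  # holds at most the last Nlags + 1 rows

    def fit(self, rows):
        self.memory = [dict(r) for r in rows][-(self.Nlags + 1):]
        return self

    def transform(self, newrows):
        """Append each new row to memory and return it with lags 1..Nlags added."""
        out = []
        for row in newrows:
            self.memory = (self.memory + [dict(row)])[-(self.Nlags + 1):]
            if len(self.memory) <= self.Nlags:
                continue  # not enough history yet to fill every lag
            lagged = dict(row)  # lag-0 values kept for reference; exclude them from model inputs
            for c in self.covars:
                for lag in range(1, self.Nlags + 1):
                    lagged[f"{c}_m{lag}"] = self.memory[-1 - lag][c]
            out.append(lagged)
        return out

    def get_last_row(self):
        return dict(self.memory[-1])

    def set_last_y(self, y, yvalue):
        """Overwrite column(s) y of the newest memory row, e.g. with a forecast."""
        names, values = ([y], [yvalue]) if isinstance(y, str) else (list(y), list(yvalue))
        for name, value in zip(names, values):
            self.memory[-1][name] = value


lags = LagMemory(covars=["y"], Nlags=2).fit([{"y": 1.0}, {"y": 2.0}, {"y": 3.0}])
print(lags.transform([{"y": 4.0}]))  # [{'y': 4.0, 'y_m1': 3.0, 'y_m2': 2.0}]

# Recursive step: the next y is unknown, so its lags come from memory ...
step = lags.transform([{"y": None}])[0]  # y_m1 = 4.0, y_m2 = 3.0
# ... and the forecast is written back so it becomes lag 1 for the following step.
lags.set_last_y("y", 5.0)
print(lags.get_last_row())  # {'y': 5.0}
```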

sforecast.get_dense_nn(Nlags: int, Ncovars: int = 1, Nexogs: int = 0, Nendogs: int = 0)

sforecast.get_dense_emb_nn(df: pandas.DataFrame, Nlags: int, catvars: list[str], Ncovars: int = 1, Nexogs: int = 0, Nendogs: int = 0)

class sforecast.rolling_transformer

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

variable_transform_dict: dict[str, Callable]

Nrw: int = 3

dfmemory: pandas.DataFrame

__post_init__()

fit(df)

transform(df: pandas.DataFrame, Nout: int | None = None, dfnewrows: pandas.DataFrame = pd.DataFrame()) → pandas.DataFrame

get_Nclip()

get_derived_attribute_names()
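rolling_transformer is documented above only by its attributes and method names. Assuming Nrw is the rolling-window length, each entry of variable_transform_dict maps a column to a function applied over that window, and get_Nclip reports the Nrw - 1 leading rows that lack a full window, the behavior could look like the following sketch. rolling_attributes and the derived “{var}_{function name}” column names are hypothetical.

```python
from statistics import mean


def rolling_attributes(rows, variable_transform_dict, Nrw=3):
    """Return (derived_rows, Nclip) using trailing windows of Nrw rows (hypothetical helper).

    The first Nclip = Nrw - 1 rows have no full window, so they are dropped.
    """
    Nclip = Nrw - 1
    derived = []
    for i in range(Nclip, len(rows)):
        window = rows[i - Nclip:i + 1]  # rows i - Nrw + 1 .. i, inclusive of the current row
        out = dict(rows[i])
        for var, fn in variable_transform_dict.items():
            out[f"{var}_{fn.__name__}"] = fn([r[var] for r in window])
        derived.append(out)
    return derived, Nclip


rows = [{"x": v} for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
derived, Nclip = rolling_attributes(rows, {"x": mean}, Nrw=3)
print(Nclip)                           # 2
print([d["x_mean"] for d in derived])  # [2.0, 3.0, 4.0]
```

Because each window ends at the current row, a derived attribute of the target itself would need to be lagged before training, for the same leakage reason that lag-0 covariates are excluded.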