gbnet.models.survival.discrete_survival

Classes

BetaSurvivalModel

Gradient Boosting Beta Survival Model.

ThetaSurvivalModel

Gradient Boosting Theta Survival Model.

Functions

loadModule(module)

Load the appropriate gradient boosting module.

create_data_matrix(X, module_type[, enable_categorical])

Create appropriate data matrix based on module type.

log_p_event(t, alpha, beta)

log P(T=t | alpha, beta) = log B(alpha+1, beta + t - 1) - log B(alpha, beta)

log_p_surv(t, alpha, beta)

log P(T > t | alpha, beta) = log B(alpha, beta + t) - log B(alpha, beta)

log_p_event_geometric(t, theta)

log P(T=t | theta) = log(theta) + (t-1) * log(1-theta)

log_p_surv_geometric(t, theta)

log P(T > t | theta) = t * log(1-theta)

Module Contents

gbnet.models.survival.discrete_survival.loadModule(module)

Load the appropriate gradient boosting module.

gbnet.models.survival.discrete_survival.create_data_matrix(X, module_type, enable_categorical=True)

Create appropriate data matrix based on module type.

Parameters:
  • X (array-like) – Input features

  • module_type (str) – Type of module (“XGBModule” or “LGBModule”)

  • enable_categorical (bool, optional) – Whether to enable categorical features (XGBoost only)

Returns:

XGBoost DMatrix or LightGBM Dataset depending on module type

Return type:

data_matrix

gbnet.models.survival.discrete_survival.log_p_event(t, alpha, beta)

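log P(T=t | alpha, beta) = log B(alpha+1, beta + t - 1) - log B(alpha, beta). Expanding B(alpha+1, beta + t - 1) = Gamma(alpha+1) Gamma(beta + t - 1) / Gamma(alpha + beta + t) shows that its denominator is Gamma(alpha + beta + t).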
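A minimal sketch of this formula in plain Python, assuming scalar inputs with t ≥ 1 and alpha, beta > 0. The helper log_beta is hypothetical (built on math.lgamma) and is not necessarily how gbnet computes the term; the library presumably works on tensors.

```python
import math

def log_beta(a: float, b: float) -> float:
    # Hypothetical helper: log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_p_event(t: int, alpha: float, beta: float) -> float:
    # log P(T = t | alpha, beta) for the beta-geometric distribution, t >= 1
    return log_beta(alpha + 1.0, beta + t - 1.0) - log_beta(alpha, beta)

# At t = 1 the ratio reduces to the Beta mean alpha / (alpha + beta).
print(math.exp(log_p_event(1, 2.0, 3.0)))  # ≈ 0.4
```

Working in log space keeps the Beta functions finite for large t, where B(alpha, beta + t) itself would underflow.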

gbnet.models.survival.discrete_survival.log_p_surv(t, alpha, beta)

log P(T > t | alpha, beta) = log B(alpha, beta + t) - log B(alpha, beta)
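The two Beta helpers are linked by the identity B(a, b) = B(a + 1, b) + B(a, b + 1), so the survival mass lost between t - 1 and t equals the event mass at t. The sketch below checks this numerically with illustrative parameters; the function bodies are a plain-Python assumption, not gbnet's code.

```python
import math

def log_beta(a, b):
    # Hypothetical helper: log B(a, b) via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_p_surv(t, alpha, beta):
    # log P(T > t | alpha, beta); t = 0 gives log 1 = 0
    return log_beta(alpha, beta + t) - log_beta(alpha, beta)

def log_p_event(t, alpha, beta):
    # log P(T = t | alpha, beta), t >= 1
    return log_beta(alpha + 1, beta + t - 1) - log_beta(alpha, beta)

alpha, beta = 2.0, 3.0
for t in range(1, 6):
    # Survival mass lost in step t should equal the event mass at t.
    drop = math.exp(log_p_surv(t - 1, alpha, beta)) - math.exp(log_p_surv(t, alpha, beta))
    print(t, round(drop, 6), round(math.exp(log_p_event(t, alpha, beta)), 6))
```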

gbnet.models.survival.discrete_survival.log_p_event_geometric(t, theta)

log P(T=t | theta) = log(theta) + (t-1) * log(1-theta)

gbnet.models.survival.discrete_survival.log_p_surv_geometric(t, theta)

log P(T > t | theta) = t * log(1-theta)
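A plain-Python sketch of the two geometric helpers, assuming scalar t ≥ 1 and 0 < theta < 1. It uses math.log1p(-theta) for log(1 - theta), a common choice for accuracy when theta is small; gbnet may compute the term differently.

```python
import math

def log_p_event_geometric(t, theta):
    # log P(T = t | theta) for a geometric distribution on {1, 2, ...}
    return math.log(theta) + (t - 1) * math.log1p(-theta)

def log_p_surv_geometric(t, theta):
    # log P(T > t | theta) = t * log(1 - theta)
    return t * math.log1p(-theta)

theta = 0.25
# P(T = 1) + P(T = 2) + P(T = 3) + P(T > 3) must sum to 1.
total = sum(math.exp(log_p_event_geometric(t, theta)) for t in (1, 2, 3))
total += math.exp(log_p_surv_geometric(3, theta))
print(total)  # ≈ 1.0
```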

class gbnet.models.survival.discrete_survival.BetaSurvivalModel(nrounds=None, params=None, module_type='XGBModule', min_hess=0.0)

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Gradient Boosting Beta Survival Model.

This model combines gradient boosting with a Beta distribution for discrete survival analysis. It uses either XGBoost or LightGBM as the underlying boosting engine wrapped in a PyTorch module.

Parameters:
  • nrounds (int, optional) – Number of boosting rounds. Defaults to 500 for XGBModule and 1000 for LGBModule.

  • params (dict, optional) – Additional parameters passed to the gradient boosting model.

  • module_type (str, optional) – Type of gradient boosting module to use, either “XGBModule” or “LGBModule”. Defaults to “XGBModule”.

  • min_hess (float, optional) – Minimum hessian value for numerical stability. Defaults to 0.0.

Variables:
  • model_ (XGBModule or LGBModule) – Trained gradient boosting module. Set after fitting.

  • losses_ (list) – List of loss values recorded at each training iteration.

  • n_features_in_ (int) – Number of features seen during fit.

fit(X, y)

Trains the model using input features X and survival data y.

predict_survival(X, times)

Predicts survival probabilities for given times.

predict_hazard(X, times)

Predicts hazard probabilities for given times.

score(X, y)

Returns the negative log likelihood score.

Notes

The model uses a Beta distribution to model discrete survival times. The gradient boosting model learns parameters alpha and beta for each sample, which are then used to compute survival and hazard probabilities.
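Under the log_p_event and log_p_surv definitions, the discrete hazard h(t) = P(T = t) / P(T > t - 1) simplifies to alpha / (alpha + beta + t - 1), which decreases in t for fixed alpha and beta. The sketch below checks that simplification numerically; it is a derivation aid, not the implementation of predict_hazard, and the function names are hypothetical.

```python
import math

def log_beta(a, b):
    # Hypothetical helper: log B(a, b) via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def hazard_from_logs(t, alpha, beta):
    # Hypothetical: h(t) = P(T = t) / P(T > t - 1), computed in log space
    log_event = log_beta(alpha + 1, beta + t - 1) - log_beta(alpha, beta)
    log_at_risk = log_beta(alpha, beta + t - 1) - log_beta(alpha, beta)
    return math.exp(log_event - log_at_risk)

def hazard_closed_form(t, alpha, beta):
    # B(alpha + 1, beta + t - 1) / B(alpha, beta + t - 1) simplifies to this ratio
    return alpha / (alpha + beta + t - 1)

for t in (1, 2, 10):
    print(t, hazard_from_logs(t, 2.0, 3.0), hazard_closed_form(t, 2.0, 3.0))
```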

For survival data, y should be a structured array or DataFrame with columns:
  • ‘time’: observed time (discrete)
  • ‘event’: event indicator (0 = censored, 1 = event)
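A sketch of the two y layouts described for fit, using NumPy with illustrative values. The exact dtypes gbnet accepts (for example, integer versus float times, or boolean events) are an assumption here.

```python
import numpy as np

# Illustrative (hypothetical) survival data for four samples.
times = np.array([3, 5, 2, 8])
events = np.array([1, 0, 1, 1])  # 0 = censored, 1 = event observed

# Option 1: a structured array with named 'time' and 'event' fields.
y_struct = np.zeros(len(times), dtype=[("time", "i8"), ("event", "i8")])
y_struct["time"] = times
y_struct["event"] = events

# Option 2: a plain (n_samples, 2) array with columns [time, event].
y_array = np.column_stack([times, events])

print(y_struct.dtype.names)  # ('time', 'event')
print(y_array.shape)  # (4, 2)
```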

nrounds = None
params = None
module_type = 'XGBModule'
min_hess = 0.0
model_ = None
losses_ = []
Module
fit(X, y)

Fit the Beta survival model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples, 2) or structured array) – Survival data. If array-like, should have columns [time, event]. If structured array, should have ‘time’ and ‘event’ fields. event: 0 for censored, 1 for event observed.

Returns:

self – Returns self.

Return type:

object

predict_survival(X, times)

Predict survival probabilities P(T > t) for given times.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • times (array-like of shape (n_times,)) – Times at which to predict survival probabilities.

Returns:

survival_probs – Survival probabilities for each sample at each time point.

Return type:

array-like of shape (n_samples, n_times)
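A sketch of how per-sample alpha and beta map to the (n_samples, n_times) output, assuming the fitted model has already produced positive alpha and beta for each row of X. The parameter values and the survival_matrix name are hypothetical, and the explicit loops favor clarity over speed (scipy.special.betaln could vectorize the computation).

```python
import math
import numpy as np

def log_beta(a, b):
    # Hypothetical helper: log B(a, b) via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def survival_matrix(alphas, betas, times):
    # Hypothetical: row i holds P(T > t) for sample i at each t in `times`
    out = np.empty((len(alphas), len(times)))
    for i, (a, b) in enumerate(zip(alphas, betas)):
        for j, t in enumerate(times):
            out[i, j] = math.exp(log_beta(a, b + t) - log_beta(a, b))
    return out

# Hypothetical per-sample parameters; in gbnet these would come from the fitted model.
S = survival_matrix([2.0, 5.0], [3.0, 1.0], [0, 1, 2, 5])
print(S.shape)  # (2, 4)
```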

predict(X)

Predict the expected survival time.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input features.

Returns:

expected_times – Expected survival times for each sample.

Return type:

array-like of shape (n_samples,)
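For the Beta model, one way to get an expected time is E[T] = sum over t ≥ 0 of P(T > t), which has the closed form (alpha + beta - 1) / (alpha - 1) when alpha > 1 and diverges otherwise. The sketch below compares the closed form against a truncated sum; how predict itself handles alpha ≤ 1 (for example, by truncating at a horizon) is not stated here, so treat both functions as hypothetical.

```python
import math

def log_beta(a, b):
    # Hypothetical helper: log B(a, b) via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def expected_time_closed_form(alpha, beta):
    # Hypothetical: E[T] = E[1 / theta] for theta ~ Beta(alpha, beta); finite only if alpha > 1
    if alpha <= 1:
        return math.inf
    return (alpha + beta - 1) / (alpha - 1)

def expected_time_truncated(alpha, beta, horizon):
    # Hypothetical: E[T] = sum_{t >= 0} P(T > t), truncated after `horizon` terms
    return sum(math.exp(log_beta(alpha, beta + t) - log_beta(alpha, beta))
               for t in range(horizon))

print(expected_time_closed_form(3.0, 2.0))  # 2.0
print(expected_time_truncated(3.0, 2.0, 10_000))  # close to 2.0
```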

score(X, y)

Return the negative log likelihood score.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • y (array-like of shape (n_samples, 2) or structured array) – Survival data with time and event columns.

Returns:

score – Negative log likelihood score. Lower values indicate better fit.

Return type:

float
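A sketch of a right-censored negative log-likelihood for the Beta model, assuming an observed event at t contributes log P(T = t) and a censored observation at t contributes log P(T > t), consistent with log_p_event and log_p_surv. Whether score returns a mean or a sum is not stated, so this hypothetical beta_survival_nll returns the mean.

```python
import math

def log_beta(a, b):
    # Hypothetical helper: log B(a, b) via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_survival_nll(times, events, alphas, betas):
    # Hypothetical: mean negative log-likelihood of right-censored discrete data
    total = 0.0
    for t, e, a, b in zip(times, events, alphas, betas):
        if e == 1:
            # Event observed at t: log P(T = t)
            total += log_beta(a + 1, b + t - 1) - log_beta(a, b)
        else:
            # Censored at t: log P(T > t)
            total += log_beta(a, b + t) - log_beta(a, b)
    return -total / len(times)

# Hypothetical data and parameters for two samples.
print(beta_survival_nll([1, 2], [1, 0], [2.0, 2.0], [3.0, 3.0]))  # ≈ 0.9163 (= -log 0.4)
```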

class gbnet.models.survival.discrete_survival.ThetaSurvivalModel(nrounds=None, params=None, module_type='XGBModule', min_hess=0.0)

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Gradient Boosting Theta Survival Model.

This model combines gradient boosting with a geometric distribution for discrete survival analysis. It uses either XGBoost or LightGBM as the underlying boosting engine wrapped in a PyTorch module.

The model learns parameters a and b for each sample, then computes theta = a/(a+b), the per-period event probability of a geometric distribution over survival times.
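Because theta = a/(a+b) is the mean of a Beta(a, b) distribution, the geometric curve can be compared with the beta-geometric curve of BetaSurvivalModel for the same a and b. By Jensen's inequality, E[(1 - theta)^t] ≥ (1 - E[theta])^t, so the plug-in geometric survival never exceeds the beta-geometric one (they agree at t = 1). The sketch below checks this with illustrative values; it is not a claim about how either class is fitted.

```python
import math

def log_beta(x, y):
    # Hypothetical helper: log B(x, y) via log-gamma
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

a, b = 2.0, 3.0  # illustrative values
theta = a / (a + b)  # mean of Beta(a, b), used as a single event probability

for t in (1, 2, 5, 20):
    geometric = (1 - theta) ** t  # P(T > t) with theta fixed
    beta_geometric = math.exp(log_beta(a, b + t) - log_beta(a, b))  # theta integrated out
    print(t, round(geometric, 4), round(beta_geometric, 4))
```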

Parameters:
  • nrounds (int, optional) – Number of boosting rounds. Defaults to 100.

  • params (dict, optional) – Additional parameters passed to the gradient boosting model.

  • module_type (str, optional) – Type of gradient boosting module to use, either “XGBModule” or “LGBModule”. Defaults to “XGBModule”.

  • min_hess (float, optional) – Minimum hessian value for numerical stability. Defaults to 0.0.

Variables:
  • model_ (XGBModule or LGBModule) – Trained gradient boosting module. Set after fitting.

  • losses_ (list) – List of loss values recorded at each training iteration.

  • n_features_in_ (int) – Number of features seen during fit.

fit(X, y)

Trains the model using input features X and survival data y.

predict_survival(X, times)

Predicts survival probabilities for given times.

predict(X)

Predicts the expected survival time.

score(X, y)

Returns the negative log likelihood score.

Notes

The model uses a geometric distribution to model discrete survival times. The gradient boosting model learns parameters a and b for each sample, which are used to compute theta = a/(a+b), the success probability.

The event and survival probabilities are:
  • P(T=t) = theta * (1-theta)^(t-1) for an event at time t
  • P(T>t) = (1-theta)^t for survival beyond time t

For survival data, y should be a structured array or DataFrame with columns:
  • ‘time’: observed time (discrete)
  • ‘event’: event indicator (0 = censored, 1 = event)

nrounds = None
params = None
module_type = 'XGBModule'
min_hess = 0.0
model_ = None
losses_ = []
Module
fit(X, y)

Fit the Theta survival model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training features.

  • y (array-like of shape (n_samples, 2) or structured array) – Survival data. If array-like, should have columns [time, event]. If structured array, should have ‘time’ and ‘event’ fields. event: 0 for censored, 1 for event observed.

Returns:

self – Returns self.

Return type:

object

predict_survival(X, times)

Predict survival probabilities P(T > t) for given times.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • times (array-like of shape (n_times,)) – Times at which to predict survival probabilities.

Returns:

survival_probs – Survival probabilities for each sample at each time point.

Return type:

array-like of shape (n_samples, n_times)

predict(X)

Predict the expected survival time.

Parameters:

X (array-like of shape (n_samples, n_features)) – Input features.

Returns:

expected_times – Expected survival times for each sample.

Return type:

array-like of shape (n_samples,)
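For a geometric distribution on {1, 2, ...}, E[T] = 1/theta = (a + b)/a. The sketch below checks the closed form against the truncated sum of P(T > t); whether predict uses the closed form or a truncated sum is an assumption, and both function names are hypothetical.

```python
def expected_time_geometric(a, b):
    # Hypothetical: theta = a / (a + b); for T on {1, 2, ...}, E[T] = 1 / theta
    theta = a / (a + b)
    return 1.0 / theta

def expected_time_by_summation(a, b, horizon=2000):
    # Hypothetical: E[T] = sum_{t >= 0} P(T > t) = sum_{t >= 0} (1 - theta)^t, truncated
    theta = a / (a + b)
    return sum((1.0 - theta) ** t for t in range(horizon))

print(expected_time_geometric(1.0, 4.0))  # 5.0
print(expected_time_by_summation(1.0, 4.0))  # close to 5.0
```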

score(X, y)

Return the negative log likelihood score.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input features.

  • y (array-like of shape (n_samples, 2) or structured array) – Survival data with time and event columns.

Returns:

score – Negative log likelihood score. Lower values indicate better fit.

Return type:

float
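A sketch of a right-censored negative log-likelihood for the geometric model, assuming an observed event at t contributes log P(T = t), a censored observation at t contributes log P(T > t), and the result is averaged over samples. theta_survival_nll is a hypothetical name, not part of gbnet's API.

```python
import math

def theta_survival_nll(times, events, a_vals, b_vals):
    # Hypothetical: mean NLL of right-censored data with theta = a / (a + b)
    total = 0.0
    for t, e, a, b in zip(times, events, a_vals, b_vals):
        theta = a / (a + b)
        if e == 1:
            total += math.log(theta) + (t - 1) * math.log1p(-theta)  # log P(T = t)
        else:
            total += t * math.log1p(-theta)  # log P(T > t)
    return -total / len(times)

# Hypothetical data and parameters for two samples.
print(theta_survival_nll([1, 2], [1, 0], [2.0, 2.0], [3.0, 3.0]))
```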