gbnet.models.survival.discrete_survival
=======================================

.. py:module:: gbnet.models.survival.discrete_survival


Classes
-------

.. autoapisummary::

   gbnet.models.survival.discrete_survival.BetaSurvivalModel
   gbnet.models.survival.discrete_survival.ThetaSurvivalModel


Functions
---------

.. autoapisummary::

   gbnet.models.survival.discrete_survival.loadModule
   gbnet.models.survival.discrete_survival.create_data_matrix
   gbnet.models.survival.discrete_survival.log_p_event
   gbnet.models.survival.discrete_survival.log_p_surv
   gbnet.models.survival.discrete_survival.log_p_event_geometric
   gbnet.models.survival.discrete_survival.log_p_surv_geometric


Module Contents
---------------

.. py:function:: loadModule(module)

   Load the appropriate gradient boosting module.


.. py:function:: create_data_matrix(X, module_type, enable_categorical=True)

   Create appropriate data matrix based on module type.

   :param X: Input features
   :type X: array-like
   :param module_type: Type of module ("XGBModule" or "LGBModule")
   :type module_type: str
   :param enable_categorical: Whether to enable categorical features (XGBoost only)
   :type enable_categorical: bool, optional

   :returns: An XGBoost ``DMatrix`` or a LightGBM ``Dataset``, depending on ``module_type``.
   :rtype: xgboost.DMatrix or lightgbm.Dataset


.. py:function:: log_p_event(t, alpha, beta)

   log P(T=t | alpha, beta) = log B(alpha + 1, beta + t - 1) - log B(alpha, beta)

   Here B is the Beta function. Written with Gamma functions, the first term
   has denominator Gamma(alpha + beta + t).
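
   As an illustration only, the formula can be evaluated with log-gamma
   functions, using B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b). The sketch
   below is a scalar, standard-library version. The names ``log_beta`` and
   ``log_p_event_sketch`` are hypothetical and do not reproduce the library's
   implementation.

   .. code-block:: python

      from math import lgamma, exp

      def log_beta(a, b):
          """log B(a, b) computed from log-gamma values."""
          return lgamma(a) + lgamma(b) - lgamma(a + b)

      def log_p_event_sketch(t, alpha, beta):
          """log P(T = t | alpha, beta) for t = 1, 2, ... (illustrative)."""
          return log_beta(alpha + 1.0, beta + t - 1.0) - log_beta(alpha, beta)

      # For t = 1 the probability reduces to alpha / (alpha + beta).
      p_first = exp(log_p_event_sketch(1, 2.0, 3.0))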


.. py:function:: log_p_surv(t, alpha, beta)

   log P(T > t | alpha, beta) = log B(alpha, beta + t) - log B(alpha, beta)
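
   A minimal sketch of this formula, assuming scalar inputs and using only
   the standard library. The function names are hypothetical. It also checks
   two properties that follow from the definitions: survival at t = 0 is 1,
   and the difference of consecutive survival values gives the event
   probability P(T = t).

   .. code-block:: python

      from math import lgamma, exp

      def log_beta(a, b):
          """log B(a, b) computed from log-gamma values."""
          return lgamma(a) + lgamma(b) - lgamma(a + b)

      def log_p_surv_sketch(t, alpha, beta):
          """log P(T > t | alpha, beta) for t = 0, 1, 2, ... (illustrative)."""
          return log_beta(alpha, beta + t) - log_beta(alpha, beta)

      alpha, beta = 2.0, 3.0
      # Survival probabilities S(0), S(1), S(2), S(3).
      s = [exp(log_p_surv_sketch(t, alpha, beta)) for t in range(4)]
      # P(T = t) = S(t - 1) - S(t) for t = 1, 2, 3.
      p_event = [s[t - 1] - s[t] for t in range(1, 4)]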


.. py:function:: log_p_event_geometric(t, theta)

   log P(T=t | theta) = log(theta) + (t-1) * log(1-theta)


.. py:function:: log_p_surv_geometric(t, theta)

   log P(T > t | theta) = t * log(1-theta)
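
   The two geometric formulas can be sketched together as follows. This is
   an illustrative scalar version with hypothetical function names, not the
   library code. The final lines check that the event probabilities up to a
   time ``t_max`` plus the survival probability beyond ``t_max`` add up to
   one.

   .. code-block:: python

      from math import log, exp

      def log_p_event_geometric_sketch(t, theta):
          """log P(T = t | theta) for t = 1, 2, ... (illustrative)."""
          return log(theta) + (t - 1) * log(1.0 - theta)

      def log_p_surv_geometric_sketch(t, theta):
          """log P(T > t | theta) for t = 0, 1, 2, ... (illustrative)."""
          return t * log(1.0 - theta)

      theta = 0.25
      t_max = 5
      # Probability of an event at any time 1..t_max ...
      total = sum(exp(log_p_event_geometric_sketch(t, theta))
                  for t in range(1, t_max + 1))
      # ... plus the probability of surviving past t_max.
      total += exp(log_p_surv_geometric_sketch(t_max, theta))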


.. py:class:: BetaSurvivalModel(nrounds=None, params=None, module_type='XGBModule', min_hess=0.0)

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.RegressorMixin`


   Gradient Boosting Beta Survival Model.

   This model combines gradient boosting with a Beta distribution for discrete
   survival analysis. It uses either XGBoost or LightGBM as the underlying
   boosting engine wrapped in a PyTorch module.

   :param nrounds: Number of boosting rounds. Defaults to 500 for XGBModule and 1000 for LGBModule.
   :type nrounds: int, optional
   :param params: Additional parameters passed to the gradient boosting model.
   :type params: dict, optional
   :param module_type: Type of gradient boosting module to use, either "XGBModule" or "LGBModule".
                       Defaults to "XGBModule".
   :type module_type: str, optional
   :param min_hess: Minimum hessian value for numerical stability. Defaults to 0.0.
   :type min_hess: float, optional

   :ivar model_: Trained gradient boosting module. Set after fitting.
   :vartype model_: XGBModule or LGBModule
   :ivar losses_: List of loss values recorded at each training iteration.
   :vartype losses_: list
   :ivar n_features_in_: Number of features seen during fit.
   :vartype n_features_in_: int

   .. method:: fit(X, y)

      Trains the model using input features X and survival data y.

   .. method:: predict_survival(X, times)

      Predicts survival probabilities for given times.

   .. method:: predict(X)

      Predicts the expected survival time.

   .. method:: score(X, y)

      Returns the negative log likelihood score.


   .. rubric:: Notes

   The model uses a Beta distribution to model discrete survival times.
   The gradient boosting model learns parameters alpha and beta for each sample,
   which are then used to compute survival and hazard probabilities.

   For survival data, y should be a structured array or DataFrame with columns:

   - 'time': observed time (discrete)
   - 'event': event indicator (0=censored, 1=event)
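
   As a hedged illustration of the two array layouts described for ``y``, the
   sketch below builds a small target with NumPy. The sample values and the
   integer field dtypes are assumptions made for the example. The dtypes the
   library accepts are not stated here.

   .. code-block:: python

      import numpy as np

      times = np.array([3, 1, 5, 2])
      events = np.array([1, 0, 1, 1])  # 0 = censored, 1 = event observed

      # Plain two-column layout: [time, event].
      y_columns = np.column_stack([times, events])

      # Structured-array layout with named 'time' and 'event' fields.
      y_struct = np.zeros(len(times), dtype=[("time", "i8"), ("event", "i8")])
      y_struct["time"] = times
      y_struct["event"] = events

   Either object has one row per sample, matching the rows of ``X``.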


   .. py:attribute:: nrounds
      :value: None


   .. py:attribute:: params
      :value: None


   .. py:attribute:: module_type
      :value: 'XGBModule'


   .. py:attribute:: min_hess
      :value: 0.0


   .. py:attribute:: model_
      :value: None


   .. py:attribute:: losses_
      :value: []


   .. py:attribute:: Module


   .. py:method:: fit(X, y)

      Fit the Beta survival model.

      :param X: Training features.
      :type X: array-like of shape (n_samples, n_features)
      :param y: Survival data. If array-like, should have columns [time, event].
                If structured array, should have 'time' and 'event' fields.
                event: 0 for censored, 1 for event observed.
      :type y: array-like of shape (n_samples, 2) or structured array

      :returns: **self** -- Returns self.
      :rtype: object


   .. py:method:: predict_survival(X, times)

      Predict survival probabilities P(T > t) for given times.

      :param X: Input features.
      :type X: array-like of shape (n_samples, n_features)
      :param times: Times at which to predict survival probabilities.
      :type times: array-like of shape (n_times,)

      :returns: **survival_probs** -- Survival probabilities for each sample at each time point.
      :rtype: array-like of shape (n_samples, n_times)


   .. py:method:: predict(X)

      Predict the expected survival time.

      :param X: Input features.
      :type X: array-like of shape (n_samples, n_features)

      :returns: **expected_times** -- Expected survival times for each sample.
      :rtype: array-like of shape (n_samples,)
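
      How the expectation is computed is not stated here. As a sketch under
      the Beta formulas documented for ``log_p_surv``, the expected time
      equals the sum of P(T > t) over t = 0, 1, 2, ..., which has the closed
      form (alpha + beta - 1) / (alpha - 1) when alpha > 1 and is infinite
      otherwise. The helper ``surv`` is hypothetical.

      .. code-block:: python

         from math import lgamma, exp

         def surv(t, alpha, beta):
             """P(T > t) = B(alpha, beta + t) / B(alpha, beta)."""
             log_s = (lgamma(beta + t) - lgamma(alpha + beta + t)
                      - lgamma(beta) + lgamma(alpha + beta))
             return exp(log_s)

         alpha, beta = 3.0, 2.0
         # Closed-form mean, valid only for alpha > 1.
         closed_form = (alpha + beta - 1.0) / (alpha - 1.0)
         # Numerical check: truncate the infinite sum of survival values.
         truncated_sum = sum(surv(t, alpha, beta) for t in range(10_000))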


   .. py:method:: score(X, y)

      Return the negative log likelihood score.

      :param X: Input features.
      :type X: array-like of shape (n_samples, n_features)
      :param y: Survival data with time and event columns.
      :type y: array-like of shape (n_samples, 2) or structured array

      :returns: **score** -- Negative log likelihood score. Lower values indicate better fit.
      :rtype: float
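
      For intuition, a right-censored negative log likelihood can be
      sketched as below: observed events contribute log P(T = t) and
      censored samples contribute log P(T > t). This is an assumption-laden
      illustration. It uses one shared (alpha, beta) pair instead of
      per-sample parameters, and it averages over samples, whereas the
      library's exact reduction is not stated here.

      .. code-block:: python

         from math import lgamma

         def log_beta(a, b):
             """log B(a, b) computed from log-gamma values."""
             return lgamma(a) + lgamma(b) - lgamma(a + b)

         def neg_log_likelihood(times, events, alpha, beta):
             """Mean negative log likelihood with right censoring."""
             total = 0.0
             for t, e in zip(times, events):
                 if e == 1:
                     # Event observed at time t: log P(T = t).
                     total += (log_beta(alpha + 1.0, beta + t - 1.0)
                               - log_beta(alpha, beta))
                 else:
                     # Censored at time t: only T > t is known.
                     total += log_beta(alpha, beta + t) - log_beta(alpha, beta)
             return -total / len(times)

         nll = neg_log_likelihood([1, 2], [1, 0], alpha=2.0, beta=3.0)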


.. py:class:: ThetaSurvivalModel(nrounds=None, params=None, module_type='XGBModule', min_hess=0.0)

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.RegressorMixin`


   Gradient Boosting Theta Survival Model.

   This model combines gradient boosting with a geometric distribution for discrete
   survival analysis. It uses either XGBoost or LightGBM as the underlying
   boosting engine wrapped in a PyTorch module.

   The model learns parameters a and b for each sample, then computes theta = a/(a+b),
   which is the success probability of a geometric distribution over survival times.

   :param nrounds: Number of boosting rounds. Defaults to 100.
   :type nrounds: int, optional
   :param params: Additional parameters passed to the gradient boosting model.
   :type params: dict, optional
   :param module_type: Type of gradient boosting module to use, either "XGBModule" or "LGBModule".
                       Defaults to "XGBModule".
   :type module_type: str, optional
   :param min_hess: Minimum hessian value for numerical stability. Defaults to 0.0.
   :type min_hess: float, optional

   :ivar model_: Trained gradient boosting module. Set after fitting.
   :vartype model_: XGBModule or LGBModule
   :ivar losses_: List of loss values recorded at each training iteration.
   :vartype losses_: list
   :ivar n_features_in_: Number of features seen during fit.
   :vartype n_features_in_: int

   .. method:: fit(X, y)

      Trains the model using input features X and survival data y.

   .. method:: predict_survival(X, times)

      Predicts survival probabilities for given times.

   .. method:: predict(X)

      Predicts the expected survival time.

   .. method:: score(X, y)

      Returns the negative log likelihood score.


   .. rubric:: Notes

   The model uses a geometric distribution to model discrete survival times.
   The gradient boosting model learns parameters a and b for each sample,
   which are used to compute theta = a/(a+b), the success probability.

   Event and survival probabilities follow:

   - P(T=t) = theta * (1-theta)^(t-1) for event at time t
   - P(T>t) = (1-theta)^t for survival beyond time t

   For survival data, y should be a structured array or DataFrame with columns:

   - 'time': observed time (discrete)
   - 'event': event indicator (0=censored, 1=event)


   .. py:attribute:: nrounds
      :value: None


   .. py:attribute:: params
      :value: None


   .. py:attribute:: module_type
      :value: 'XGBModule'


   .. py:attribute:: min_hess
      :value: 0.0


   .. py:attribute:: model_
      :value: None


   .. py:attribute:: losses_
      :value: []


   .. py:attribute:: Module


   .. py:method:: fit(X, y)

      Fit the Theta survival model.

      :param X: Training features.
      :type X: array-like of shape (n_samples, n_features)
      :param y: Survival data. If array-like, should have columns [time, event].
                If structured array, should have 'time' and 'event' fields.
                event: 0 for censored, 1 for event observed.
      :type y: array-like of shape (n_samples, 2) or structured array

      :returns: **self** -- Returns self.
      :rtype: object


   .. py:method:: predict_survival(X, times)

      Predict survival probabilities P(T > t) for given times.

      :param X: Input features.
      :type X: array-like of shape (n_samples, n_features)
      :param times: Times at which to predict survival probabilities.
      :type times: array-like of shape (n_times,)

      :returns: **survival_probs** -- Survival probabilities for each sample at each time point.
      :rtype: array-like of shape (n_samples, n_times)
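
      The (n_samples, n_times) output shape can be illustrated with NumPy
      broadcasting over the geometric survival formula P(T > t) =
      (1 - theta)^t. The per-sample ``theta`` values below are made up for
      the example. In the fitted model they would come from the learned
      parameters.

      .. code-block:: python

         import numpy as np

         theta = np.array([0.1, 0.5])   # hypothetical per-sample parameters
         times = np.array([1, 2, 3])

         # Broadcast shape (n_samples, 1) against shape (1, n_times).
         survival_probs = (1.0 - theta)[:, None] ** times[None, :]

      Each row holds one sample's survival curve evaluated at ``times``.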


   .. py:method:: predict(X)

      Predict the expected survival time.

      :param X: Input features.
      :type X: array-like of shape (n_samples, n_features)

      :returns: **expected_times** -- Expected survival times for each sample.
      :rtype: array-like of shape (n_samples,)
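
      Under a geometric distribution on t = 1, 2, ..., the mean is 1 / theta,
      which equals (a + b) / a when theta = a / (a + b). Whether ``predict``
      returns exactly this quantity is an assumption. The sketch below only
      checks the identity numerically with made-up values of ``a`` and ``b``.

      .. code-block:: python

         a, b = 1.0, 3.0
         theta = a / (a + b)

         # Closed-form mean of the geometric distribution.
         expected_time = 1.0 / theta

         # Numerical check: truncated sum of t * P(T = t).
         approx = sum(t * theta * (1.0 - theta) ** (t - 1)
                      for t in range(1, 500))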


   .. py:method:: score(X, y)

      Return the negative log likelihood score.

      :param X: Input features.
      :type X: array-like of shape (n_samples, n_features)
      :param y: Survival data with time and event columns.
      :type y: array-like of shape (n_samples, 2) or structured array

      :returns: **score** -- Negative log likelihood score. Lower values indicate better fit.
      :rtype: float