
classifier

Classes:

Name Description
Classifier

Wrapper for PyTorch classification models that automatically handles increases in the number of classes by adding output neurons.

ClassifierInitialized

Wrapper for already-instantiated PyTorch classification modules that automatically handles increases in the number of classes by adding output neurons.

Classifier

Classifier(
    module: Type[Module],
    loss_fn: Union[
        str, Callable
    ] = "binary_cross_entropy_with_logits",
    optimizer_fn: Union[str, Callable] = "sgd",
    lr: float = 0.001,
    output_is_logit: bool = True,
    is_class_incremental: bool = False,
    is_feature_incremental: bool = False,
    device: str = "cpu",
    seed: int = 42,
    **kwargs
)

Bases: DeepEstimator, MiniBatchClassifier

Wrapper for PyTorch classification models that automatically handles increases in the number of classes by adding output neurons when the number of observed classes exceeds the current number of output neurons (this requires is_class_incremental=True).

Parameters:

Name Type Description Default
module Type[Module]

Torch Module class that builds the classifier to be wrapped. The class must accept the parameter n_features so that the model's input shape can be determined from the number of features in the first training example.

required
loss_fn Union[str, Callable]

Loss function used to train the wrapped model. Can be a loss function provided by torch.nn.functional or one of the following: 'mse', 'l1', 'cross_entropy', 'binary_cross_entropy_with_logits', 'binary_cross_entropy', 'smooth_l1', 'kl_div'.

'binary_cross_entropy_with_logits'
optimizer_fn Union[str, Callable]

Optimizer to be used for training the wrapped model. Can be an optimizer class provided by torch.optim or one of the following: "adam", "adam_w", "sgd", "rmsprop", "lbfgs".

'sgd'
lr float

Learning rate of the optimizer.

0.001
output_is_logit bool

Whether the module outputs logits. If True, either softmax or sigmoid is applied to the outputs when predicting.

True
is_class_incremental bool

Whether the classifier should adapt to previously unobserved classes by adding a unit to the output layer of the network. This works only if the last trainable layer is an nn.Linear layer. Note also that output activation functions cannot be adapted, meaning that a binary classifier with a sigmoid output cannot be altered to perform multi-class predictions.

False
is_feature_incremental bool

Whether the model should adapt to previously unobserved features by adding units to the input layer of the network.

False
device str

Device to run the wrapped model on. Can be "cpu" or "cuda".

'cpu'
seed int

Random seed to be used for training the wrapped model.

42
**kwargs

Parameters passed to the module constructor aside from n_features.

{}
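With is_class_incremental=True or is_feature_incremental=True, the wrapper grows the last or first nn.Linear layer of the network. The stdlib-only sketch below illustrates the idea on a weight matrix stored as nested lists (one row per output unit, one column per input unit). The helpers add_output_unit and add_input_unit are hypothetical, and zero-initializing the new weights is an assumption; deep_river works on torch tensors and may initialize new parameters differently.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def add_output_unit(weight: Matrix, bias: List[float]) -> Tuple[Matrix, List[float]]:
    """Append one output neuron (a new row) for a newly observed class.

    Hypothetical helper: the new weights and bias are zero-initialized here.
    """
    n_inputs = len(weight[0])
    new_weight = [row[:] for row in weight] + [[0.0] * n_inputs]
    new_bias = bias[:] + [0.0]
    return new_weight, new_bias


def add_input_unit(weight: Matrix) -> Matrix:
    """Append one input column for a newly observed feature.

    Existing rows keep their learned weights; the new column starts at zero,
    so this layer's outputs are unchanged until the new weights are trained.
    """
    return [row[:] + [0.0] for row in weight]


# A 2-class output layer over 3 features.
W = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]
b = [0.0, 0.1]

W, b = add_output_unit(W, b)  # a third class appears
W = add_input_unit(W)         # a fourth feature appears
print(len(W), len(W[0]))      # 3 4
```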

Examples:

>>> from river import metrics, preprocessing, compose, datasets
>>> from deep_river import classification
>>> from torch import nn
>>> from torch import manual_seed
>>> _ = manual_seed(42)
>>> class MyModule(nn.Module):
...     def __init__(self, n_features):
...         super().__init__()
...         self.dense0 = nn.Linear(n_features, 5)
...         self.nlin = nn.ReLU()
...         self.dense1 = nn.Linear(5, 2)
...         self.softmax = nn.Softmax(dim=-1)
...
...     def forward(self, x, **kwargs):
...         x = self.nlin(self.dense0(x))
...         x = self.nlin(self.dense1(x))
...         x = self.softmax(x)
...         return x
>>> model_pipeline = compose.Pipeline(
...     preprocessing.StandardScaler(),
...     classification.Classifier(module=MyModule,
...                               loss_fn="binary_cross_entropy",
...                               optimizer_fn='adam')
... )
>>> dataset = datasets.Phishing()
>>> metric = metrics.Accuracy()
>>> for x, y in dataset:
...     y_pred = model_pipeline.predict_one(x)  # make a prediction
...     metric.update(y, y_pred)  # update the metric
...     model_pipeline.learn_one(x, y)  # then learn from the same example
>>> print(f'Accuracy: {metric.get()}')
Accuracy: 0.7264

Methods:

Name Description
clone

Clones the estimator.

draw

Draws the wrapped model.

initialize_module

Initializes the wrapped module and its optimizer.

learn_many

Performs one step of training with a batch of examples.

learn_one

Performs one step of training with a single example.

predict_proba_many

Predict the probability of each label given the input.

predict_proba_one

Predict the probability of each label given the input.

Source code in deep_river/classification/classifier.py
def __init__(
    self,
    module: Type[torch.nn.Module],
    loss_fn: Union[str, Callable] = "binary_cross_entropy_with_logits",
    optimizer_fn: Union[str, Callable] = "sgd",
    lr: float = 1e-3,
    output_is_logit: bool = True,
    is_class_incremental: bool = False,
    is_feature_incremental: bool = False,
    device: str = "cpu",
    seed: int = 42,
    **kwargs,
):
    super().__init__(
        module=module,
        loss_fn=loss_fn,
        optimizer_fn=optimizer_fn,
        device=device,
        lr=lr,
        is_feature_incremental=is_feature_incremental,
        seed=seed,
        **kwargs,
    )
    self.observed_classes: SortedSet[base.typing.ClfTarget] = SortedSet([])
    self.output_is_logit = output_is_logit
    self.is_class_incremental = is_class_incremental
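The constructor stores observed_classes in a SortedSet. A minimal stdlib sketch of that bookkeeping: classes are kept in sorted order (presumably the order used when outputs are mapped to classes, since output2proba receives observed_classes), and a new class that pushes the count past the current number of output neurons is the point where a class-incremental model would add a unit. ObservedClasses is a hypothetical stand-in, not deep_river's code.

```python
import bisect
from typing import Any, Iterator, List


class ObservedClasses:
    """Hypothetical stand-in for the SortedSet of observed classes."""

    def __init__(self) -> None:
        self._classes: List[Any] = []

    def update(self, y: Any) -> bool:
        """Insert ``y`` at its sorted position; return True if it is new."""
        i = bisect.bisect_left(self._classes, y)
        if i < len(self._classes) and self._classes[i] == y:
            return False
        self._classes.insert(i, y)
        return True

    def __len__(self) -> int:
        return len(self._classes)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._classes)


classes = ObservedClasses()
n_outputs = 2  # output neurons currently in the network
for label in ["spam", "ham", "spam", "eggs"]:
    if classes.update(label) and len(classes) > n_outputs:
        # With is_class_incremental=True, an output unit would be added here.
        n_outputs += 1

print(list(classes), n_outputs)  # ['eggs', 'ham', 'spam'] 3
```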

clone

clone(
    new_params: dict[Any, Any] | None = None,
    include_attributes=False,
)

Clones the estimator.

Parameters:

Name Type Description Default
new_params dict[Any, Any] | None

New parameters to be passed to the cloned estimator.

None
include_attributes

If True, the attributes of the estimator will be copied to the cloned estimator. This is useful when the estimator is a transformer and the attributes are the learned parameters.

False

Returns:

Type Description
DeepEstimator

The cloned estimator.

Source code in deep_river/base.py
def clone(
    self,
    new_params: dict[Any, Any] | None = None,
    include_attributes=False,
):
    """Clones the estimator.

    Parameters
    ----------
    new_params
        New parameters to be passed to the cloned estimator.
    include_attributes
        If True, the attributes of the estimator will be copied to the
        cloned estimator. This is useful when the estimator is a
        transformer and the attributes are the learned parameters.

    Returns
    -------
    DeepEstimator
        The cloned estimator.
    """
    # Start from the estimator's current configuration, then apply the
    # caller's overrides last so that they take precedence.
    params = {**self.kwargs, **self._get_params(), "module": self.module_cls}
    params.update(new_params or {})

    clone = self.__class__(**params)
    if include_attributes:
        clone.__dict__.update(self.__dict__)
    return clone
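When clone merges parameter dicts, the order of the update calls decides which value wins on a key collision. A minimal sketch of the intended precedence, in which the caller's new_params override the estimator's current parameters without mutating the caller's dict; Estimator is a hypothetical class, not deep_river's DeepEstimator.

```python
from typing import Any, Dict, Optional


class Estimator:
    """Hypothetical estimator used to illustrate parameter merging in clone."""

    def __init__(self, lr: float = 1e-3, seed: int = 42, **kwargs: Any) -> None:
        self.lr = lr
        self.seed = seed
        self.kwargs = kwargs

    def _get_params(self) -> Dict[str, Any]:
        return {"lr": self.lr, "seed": self.seed}

    def clone(self, new_params: Optional[Dict[str, Any]] = None) -> "Estimator":
        # Current configuration first, caller's overrides last (they win).
        params = {**self.kwargs, **self._get_params()}
        params.update(new_params or {})  # the caller's dict is left untouched
        return self.__class__(**params)


base_model = Estimator(lr=0.01, hidden=8)
cloned = base_model.clone({"lr": 0.1})
print(cloned.lr, cloned.seed, cloned.kwargs)  # 0.1 42 {'hidden': 8}
```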

draw

draw() -> Digraph

Draws the wrapped model.

Source code in deep_river/base.py
def draw(self) -> Digraph:
    """Draws the wrapped model."""
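    # Use the shape of the first parameter as a dummy input shape; for a
    # leading nn.Linear layer of shape (out, in) this yields a valid input.
    first_parameter = next(self.module.parameters())
    input_shape = first_parameter.size()
    # Create the dummy input on the same device as the module.
    y_pred = self.module(torch.rand(input_shape, device=first_parameter.device))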
    return make_dot(y_pred.mean(), params=dict(self.module.named_parameters()))

initialize_module

initialize_module(x: dict | DataFrame, **kwargs)

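Initializes the wrapped module from the first example and creates its optimizer.

Parameters:

Name Type Description Default
x dict | DataFrame

A single example or a batch of examples, used to infer n_features, the number of input features of the module.

required
**kwargs

Keyword arguments passed to the module constructor alongside n_features. Arguments the constructor does not accept are filtered out.

{}

Returns:

Type Description
None

The module, its optimizer, and the module_initialized flag are set on the estimator in place.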

Source code in deep_river/base.py
def initialize_module(self, x: dict | pd.DataFrame, **kwargs):
    """
    Parameters
    ----------
    module
      The instance or class or callable to be initialized, e.g.
      ``self.module``.
    kwargs : dict
      The keyword arguments to initialize the instance or class. Can be an
      empty dict.
    Returns
    -------
    instance
      The initialized component.
    """
    torch.manual_seed(self.seed)
    if isinstance(x, Dict):
        n_features = len(x)
    elif isinstance(x, pd.DataFrame):
        n_features = len(x.columns)

    if not isinstance(self.module_cls, torch.nn.Module):
        self.module = self.module_cls(
            n_features=n_features,
            **self._filter_kwargs(self.module_cls, kwargs),
        )

    self.module.to(self.device)
    self.optimizer = self.optimizer_func(self.module.parameters(), lr=self.lr)
    self.module_initialized = True

    self._get_input_output_layers(n_features=n_features)
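initialize_module infers n_features from the first input and instantiates the module class with it, passing along the remaining keyword arguments through _filter_kwargs. A stdlib sketch of those two steps, assuming _filter_kwargs keeps only the keyword arguments that match named parameters of the constructor; infer_n_features, filter_kwargs, and MyModule are hypothetical names, and MyModule stands in for a torch.nn.Module subclass.

```python
import inspect
from typing import Any, Callable, Dict


def infer_n_features(x: Any) -> int:
    """Number of features in a single example (dict) or a batch with .columns."""
    if isinstance(x, dict):
        return len(x)
    if hasattr(x, "columns"):  # e.g. a pandas DataFrame
        return len(x.columns)
    raise TypeError(f"Unsupported input type: {type(x).__name__}")


def filter_kwargs(fn: Callable, kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that ``fn`` accepts by name."""
    params = inspect.signature(fn).parameters
    return {k: v for k, v in kwargs.items() if k in params}


class MyModule:
    """Hypothetical module constructor; a real one would subclass torch.nn.Module."""

    def __init__(self, n_features: int, n_hidden: int = 5) -> None:
        self.n_features = n_features
        self.n_hidden = n_hidden


x = {"age": 31, "income": 52_000.0, "tenure": 4}
kwargs = {"n_hidden": 16, "unused_option": True}
module = MyModule(n_features=infer_n_features(x), **filter_kwargs(MyModule, kwargs))
print(module.n_features, module.n_hidden)  # 3 16
```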

learn_many

learn_many(X: DataFrame, y: Series) -> None

Performs one step of training with a batch of examples.

Parameters:

Name Type Description Default
X DataFrame

Input examples.

required
y Series

Target values.

required

Returns:

Type Description
None

Source code in deep_river/classification/classifier.py
def learn_many(self, X: pd.DataFrame, y: pd.Series) -> None:
    """
    Performs one step of training with a batch of examples.

    Parameters
    ----------
    X
        Input examples.
    y
        Target values.

    Returns
    -------
    None
    """
    # Lazily initialize the module on the first batch.
    if not self.module_initialized:
        self._update_observed_features(X)
        self._update_observed_classes(y)
        self.initialize_module(x=X, **self.kwargs)

    # Adapt the input/output layers to newly observed features or classes
    # (when feature- or class-incremental learning is enabled).
    self._adapt_input_dim(X)
    self._adapt_output_dim(y)

    x_t = df2tensor(X, features=self.observed_features, device=self.device)

    self._learn(x=x_t, y=y)
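learn_many expects a pd.DataFrame and a pd.Series. When examples arrive one at a time, they can be buffered into mini-batches first. A stdlib sketch of such buffering; iter_batches is a hypothetical helper, and filling missing features with 0.0 is a choice made here, not something learn_many requires.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Example = Tuple[Dict[str, float], Any]
Batch = Tuple[Dict[str, List[float]], List[Any]]


def iter_batches(stream: Iterable[Example], batch_size: int) -> Iterator[Batch]:
    """Group (x, y) pairs into column-oriented mini-batches.

    Each batch is a dict of feature -> column values plus a list of targets.
    Features missing from an example are filled with 0.0.
    """
    it = iter(stream)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        features = sorted({k for x, _ in chunk for k in x})
        columns = {f: [float(x.get(f, 0.0)) for x, _ in chunk] for f in features}
        targets = [y for _, y in chunk]
        yield columns, targets


stream = [({"a": 1.0, "b": 2.0}, True), ({"a": 0.5}, False), ({"b": 3.0}, True)]
batches = list(iter_batches(stream, batch_size=2))
print(batches[0])  # ({'a': [1.0, 0.5], 'b': [2.0, 0.0]}, [True, False])
print(batches[1])  # ({'b': [3.0]}, [True])
```

Assuming pandas is installed, each (columns, targets) pair could then be passed as model.learn_many(pd.DataFrame(columns), pd.Series(targets)).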

learn_one

learn_one(x: dict, y: ClfTarget) -> None

Performs one step of training with a single example.

Parameters:

Name Type Description Default
x dict

Input example.

required
y ClfTarget

Target value.

required

Returns:

Type Description
None

Source code in deep_river/classification/classifier.py
def learn_one(self, x: dict, y: base.typing.ClfTarget) -> None:
    """
    Performs one step of training with a single example.

    Parameters
    ----------
    x
        Input example.
    y
        Target value.

    Returns
    -------
    None
    """

    # Lazily initialize the module on the first example.
    if not self.module_initialized:
        self._update_observed_features(x)
        self._update_observed_classes(y)
        self.initialize_module(x=x, **self.kwargs)

    # Adapt the input/output layers to newly observed features or classes
    # (when feature- or class-incremental learning is enabled).
    self._adapt_input_dim(x)
    self._adapt_output_dim(y)

    x_t = dict2tensor(x, features=self.observed_features, device=self.device)

    self._learn(x=x_t, y=y)
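learn_one converts the input dict with dict2tensor(x, features=self.observed_features, ...), so that each value lands at a fixed input position. A stdlib sketch of that ordering step, assuming features absent from x default to 0.0 and keys outside the observed features are ignored; dict_to_row is a hypothetical name and returns a list rather than a tensor.

```python
from typing import Dict, List, Sequence


def dict_to_row(x: Dict[str, float], features: Sequence[str], default: float = 0.0) -> List[float]:
    """Order an example's values by the observed feature list.

    Observed features absent from ``x`` get ``default``; keys of ``x`` that
    are not in ``features`` are ignored.
    """
    return [float(x.get(f, default)) for f in features]


observed_features = ["age", "income", "tenure"]
print(dict_to_row({"tenure": 4, "age": 31}, observed_features))  # [31.0, 0.0, 4.0]
print(dict_to_row({"age": 40, "zip": 1234}, observed_features))  # [40.0, 0.0, 0.0]
```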

predict_proba_many

predict_proba_many(x: DataFrame) -> DataFrame

Predict the probability of each label given the input.

Parameters:

Name Type Description Default
x DataFrame

Input examples.

required

Returns:

Type Description
DataFrame

DataFrame of probabilities for each label.

Source code in deep_river/classification/classifier.py
def predict_proba_many(self, x: pd.DataFrame) -> pd.DataFrame:
    """
    Predict the probability of each label given the input.

    Parameters
    ----------
    x
        Input examples.

    Returns
    -------
    pd.DataFrame
        DataFrame of probabilities for each label.
    """
    if not self.module_initialized:
        self._update_observed_features(x)
        self.initialize_module(x=x, **self.kwargs)

    self._adapt_input_dim(x)
    x_t = df2tensor(x, features=self.observed_features, device=self.device)
    self.module.eval()
    with torch.inference_mode():
        y_preds = self.module(x_t)
    return pd.DataFrame(
        output2proba(y_preds, self.observed_classes, self.output_is_logit)
    )
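predict_proba_many returns one row of class probabilities per input example. Turning those rows into hard predictions amounts to an argmax per row; on a pandas DataFrame, proba.idxmax(axis=1) does this directly. A stdlib sketch on rows stored as dicts; argmax_labels is a hypothetical helper.

```python
from typing import Any, Dict, List


def argmax_labels(proba_rows: List[Dict[Any, float]]) -> List[Any]:
    """Pick the most probable class for each row of per-class probabilities."""
    return [max(row, key=row.get) for row in proba_rows]


rows = [
    {False: 0.8, True: 0.2},
    {False: 0.3, True: 0.7},
]
print(argmax_labels(rows))  # [False, True]
```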

predict_proba_one

predict_proba_one(x: dict) -> Dict[ClfTarget, float]

Predict the probability of each label given the input.

Parameters:

Name Type Description Default
x dict

Input example.

required

Returns:

Type Description
Dict[ClfTarget, float]

Dictionary of probabilities for each label.

Source code in deep_river/classification/classifier.py
def predict_proba_one(self, x: dict) -> Dict[base.typing.ClfTarget, float]:
    """
    Predict the probability of each label given the input.

    Parameters
    ----------
    x
        Input example.

    Returns
    -------
    Dict[ClfTarget, float]
        Dictionary of probabilities for each label.
    """

    if not self.module_initialized:
        self._update_observed_features(x)
        self.initialize_module(x=x, **self.kwargs)

    self._adapt_input_dim(x)

    x_t = dict2tensor(x, features=self.observed_features, device=self.device)

    self.module.eval()
    with torch.inference_mode():
        y_pred = self.module(x_t)
    return output2proba(y_pred, self.observed_classes, self.output_is_logit)[0]
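The final step, output2proba, maps raw network outputs to a {class: probability} dict, applying softmax or sigmoid when output_is_logit is True. A stdlib sketch of that mapping; to_proba is a hypothetical stand-in, and the convention that a single output unit gives the probability of the last (positive) class is an assumption, not necessarily how output2proba handles that case.

```python
import math
from typing import Any, Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def to_proba(outputs: Sequence[float], classes: Sequence[Any],
             output_is_logit: bool = True) -> Dict[Any, float]:
    """Map one row of network outputs to {class: probability}.

    Hypothetical convention: a single output unit is read as the probability
    of classes[-1]; several units are matched to ``classes`` in order.
    """
    if len(outputs) == 1:
        p = sigmoid(outputs[0]) if output_is_logit else outputs[0]
        return {classes[0]: 1.0 - p, classes[-1]: p}
    probs = softmax(outputs) if output_is_logit else list(outputs)
    return dict(zip(classes, probs))


print(to_proba([0.0], [False, True]))        # {False: 0.5, True: 0.5}
print(to_proba([2.0, 2.0], ["cat", "dog"]))  # {'cat': 0.5, 'dog': 0.5}
print(to_proba([0.25, 0.75], ["cat", "dog"], output_is_logit=False))  # {'cat': 0.25, 'dog': 0.75}
```

If a module already ends in nn.Softmax, as the modules in the examples on this page do, its outputs are probabilities rather than logits; by the description of output_is_logit, setting it to False would avoid normalizing twice.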

ClassifierInitialized

ClassifierInitialized(
    module: Module,
    loss_fn: Union[str, Callable],
    optimizer_fn: Union[str, type],
    lr: float = 0.001,
    output_is_logit: bool = True,
    is_class_incremental: bool = False,
    is_feature_incremental: bool = False,
    device: str = "cpu",
    seed: int = 42,
    **kwargs
)

Bases: DeepEstimatorInitialized, MiniBatchClassifier

Wrapper for already-instantiated PyTorch classification modules that automatically handles increases in the number of classes by adding output neurons when the number of observed classes exceeds the current number of output neurons (this requires is_class_incremental=True).

Parameters:

Name Type Description Default
module Module

Instantiated Torch Module (the classifier) to be wrapped.

required
loss_fn Union[str, Callable]

Loss function used to train the wrapped model. Can be a loss function provided by torch.nn.functional or one of the following: 'mse', 'l1', 'cross_entropy', 'binary_cross_entropy_with_logits', 'binary_cross_entropy', 'smooth_l1', 'kl_div'.

required
optimizer_fn Union[str, type]

Optimizer to be used for training the wrapped model. Can be an optimizer class provided by torch.optim or one of the following: "adam", "adam_w", "sgd", "rmsprop", "lbfgs".

required
lr float

Learning rate of the optimizer.

0.001
output_is_logit bool

Whether the module outputs logits. If True, either softmax or sigmoid is applied to the outputs when predicting.

True
is_class_incremental bool

Whether the classifier should adapt to previously unobserved classes by adding a unit to the output layer of the network. This works only if the last trainable layer is an nn.Linear layer. Note also that output activation functions cannot be adapted, meaning that a binary classifier with a sigmoid output cannot be altered to perform multi-class predictions.

False
is_feature_incremental bool

Whether the model should adapt to previously unobserved features by adding units to the input layer of the network.

False
device str

Device to run the wrapped model on. Can be "cpu" or "cuda".

'cpu'
seed int

Random seed to be used for training the wrapped model.

42
**kwargs

Additional keyword arguments passed on to the base estimator.

{}

Examples:

>>> from river import metrics, preprocessing, compose, datasets
>>> from deep_river.classification.classifier import ClassifierInitialized
>>> from torch import nn
>>> from torch import manual_seed
>>> _ = manual_seed(42)
>>> class MyModule(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.dense0 = nn.Linear(9, 5)  # Phishing has 9 features
...         self.nlin = nn.ReLU()
...         self.dense1 = nn.Linear(5, 2)
...         self.softmax = nn.Softmax(dim=-1)
...
...     def forward(self, x, **kwargs):
...         x = self.nlin(self.dense0(x))
...         x = self.nlin(self.dense1(x))
...         x = self.softmax(x)
...         return x
>>> model_pipeline = compose.Pipeline(
...     preprocessing.StandardScaler(),
...     ClassifierInitialized(module=MyModule(),  # pass an instance, not the class
...                           loss_fn="binary_cross_entropy",
...                           optimizer_fn='adam')
... )
>>> dataset = datasets.Phishing()
>>> metric = metrics.Accuracy()
>>> for x, y in dataset:
...     y_pred = model_pipeline.predict_one(x)  # make a prediction
...     metric.update(y, y_pred)  # update the metric
...     model_pipeline.learn_one(x, y)  # then learn from the same example
>>> 0.0 <= metric.get() <= 1.0
True

Methods:

Name Description
learn_many

Updates the model with multiple instances for supervised learning.

learn_one

Learns from a single example.

predict_proba_many

Predicts probabilities for multiple examples.

predict_proba_one

Predicts probabilities for a single example.

Source code in deep_river/classification/classifier.py
def __init__(
    self,
    module: torch.nn.Module,
    loss_fn: Union[str, Callable],
    optimizer_fn: Union[str, type],
    lr: float = 0.001,
    output_is_logit: bool = True,
    is_class_incremental: bool = False,  # todo needs to be tested
    is_feature_incremental: bool = False,
    device: str = "cpu",
    seed: int = 42,
    **kwargs,
):
    super().__init__(
        module=module,
        loss_fn=loss_fn,
        optimizer_fn=optimizer_fn,
        lr=lr,
        device=device,
        seed=seed,
        is_feature_incremental=is_feature_incremental,
        **kwargs,
    )
    self.output_is_logit = output_is_logit
    self.is_class_incremental = is_class_incremental
    self.observed_classes: SortedSet = SortedSet()

learn_many

learn_many(X: DataFrame, y: Series) -> None

Updates the model with multiple instances for supervised learning.

The function updates the observed features and targets based on the input data. It converts the data from a pandas DataFrame to a tensor format before learning occurs. The updates to the model are executed through an internal learning mechanism.

Parameters:

Name Type Description Default
X DataFrame

The data-frame containing instances to be learned by the model. Each row represents a single instance, and each column represents a feature.

required
y Series

The target values corresponding to the instances in X. Each entry in the series represents the target associated with a row in X.

required

Returns:

Type Description
None
Source code in deep_river/classification/classifier.py
def learn_many(self, X: pd.DataFrame, y: pd.Series) -> None:
    """
    Updates the model with multiple instances for supervised learning.

    The function updates the observed features and targets based on the input
    data. It converts the data from a pandas DataFrame to a tensor format before
    learning occurs. The updates to the model are executed through an internal
    learning mechanism.

    Parameters
    ----------
    X : pd.DataFrame
        The data-frame containing instances to be learned by the model. Each
        row represents a single instance, and each column represents a feature.
    y : pd.Series
        The target values corresponding to the instances in `X`. Each entry in
        the series represents the target associated with a row in `X`.

    Returns
    -------
    None
    """
    self._update_observed_features(X)
    self._update_observed_targets(y)
    x_t = self._df2tensor(X)
    self._learn(x_t, y)
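As described above, learn_many first records the batch's features and targets, then converts the DataFrame to a tensor. A stdlib sketch of those two steps, assuming features are laid out in sorted name order and that previously seen features missing from a batch are filled with 0.0; update_and_convert is a hypothetical helper that returns nested lists instead of a tensor.

```python
from typing import Any, Dict, List, Set, Tuple


def update_and_convert(
    columns: Dict[str, List[float]],
    targets: List[Any],
    observed_features: Set[str],
    observed_targets: Set[Any],
) -> Tuple[List[str], List[List[float]]]:
    """Record new feature names and targets, then build a row-major matrix.

    Columns are ordered by the sorted observed feature names; features seen
    earlier but missing from this batch are filled with 0.0.
    """
    observed_features.update(columns)
    observed_targets.update(targets)
    order = sorted(observed_features)
    n_rows = len(targets)
    matrix = [
        [columns[f][i] if f in columns else 0.0 for f in order]
        for i in range(n_rows)
    ]
    return order, matrix


features: Set[str] = {"c"}  # a feature seen in an earlier batch
labels: Set[Any] = set()
order, X = update_and_convert({"b": [1.0, 2.0], "a": [3.0, 4.0]}, [0, 1], features, labels)
print(order)   # ['a', 'b', 'c']
print(X)       # [[3.0, 1.0, 0.0], [4.0, 2.0, 0.0]]
print(labels)  # {0, 1}
```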

learn_one

learn_one(x: dict, y: ClfTarget) -> None

Learns from a single example.

Source code in deep_river/classification/classifier.py
def learn_one(self, x: dict, y: base.typing.ClfTarget) -> None:
    """Learns from a single example."""
    self._update_observed_features(x)
    self._update_observed_targets(y)
    x_t = self._dict2tensor(x)
    self._learn(x_t, y)

predict_proba_many

predict_proba_many(X: DataFrame) -> DataFrame

Predicts probabilities for multiple examples.

Source code in deep_river/classification/classifier.py
def predict_proba_many(self, X: pd.DataFrame) -> pd.DataFrame:
    """Predicts probabilities for multiple examples."""
    self._update_observed_features(X)
    x_t = self._df2tensor(X)
    self.module.eval()
    with torch.inference_mode():
        y_preds = self.module(x_t)
    return pd.DataFrame(
        output2proba(y_preds, self.observed_classes, self.output_is_logit)
    )

predict_proba_one

predict_proba_one(x: dict) -> dict[ClfTarget, float]

Predicts probabilities for a single example.

Source code in deep_river/classification/classifier.py
def predict_proba_one(self, x: dict) -> dict[base.typing.ClfTarget, float]:
    """Predicts probabilities for a single example."""
    self._update_observed_features(x)
    x_t = self._dict2tensor(x)
    self.module.eval()
    with torch.inference_mode():
        y_pred = self.module(x_t)
    return output2proba(y_pred, self.observed_classes, self.output_is_logit)[0]