Benchmarks

Binary classification

Bananas

Elec2

Phishing

Datasets

Bananas
Bananas dataset.

An artificial dataset where instances belong to several banana-shaped clusters.
There are two attributes, which correspond to the x and y axes, respectively.

    Name  Bananas
    Task  Binary classification
 Samples  5,300
Features  2
  Sparse  False
    Path  /Users/cedrickulbach/Documents/Projects/deep-river/.venv/lib/python3.10/site-packages/river/datasets/banana.zip

Elec2
Electricity prices in New South Wales.

This is a binary classification task, where the goal is to predict if the price of electricity
will go up or down.

This data was collected from the Australian New South Wales Electricity Market. In this market,
prices are not fixed and are affected by demand and supply of the market. They are set every
five minutes. Electricity transfers to/from the neighboring state of Victoria were done to
alleviate fluctuations.

      Name  Elec2
      Task  Binary classification
   Samples  45,312
  Features  8
    Sparse  False
      Path  /Users/cedrickulbach/river_data/Elec2/electricity.csv
       URL  https://maxhalford.github.io/files/datasets/electricity.zip
      Size  2.95 MiB
Downloaded  True

Phishing
Phishing websites.

This dataset contains features from web pages that are classified as phishing or not.

    Name  Phishing
    Task  Binary classification
 Samples  1,250
Features  9
  Sparse  False
    Path  /Users/cedrickulbach/Documents/Projects/deep-river/.venv/lib/python3.10/site-packages/river/datasets/phishing.csv.gz

Models

Logistic regression
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LogisticRegression (
    optimizer=SGD (
      lr=Constant (
        learning_rate=0.005
      )
    )
    loss=Log (
      weight_pos=1.
      weight_neg=1.
    )
    l2=0.
    l1=0.
    intercept_init=0.
    intercept_lr=Constant (
      learning_rate=0.01
    )
    clip_gradient=1e+12
    initializer=Zeros ()
  )
)
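The configuration above (standard scaling followed by logistic regression trained with SGD, learning rate 0.005 for the weights and 0.01 for the intercept, zero initialization) can be illustrated with a minimal pure-Python sketch. This is not River's implementation: the class names `RunningScaler` and `OnlineLogisticRegression`, the toy data stream, and the test-then-train loop are assumptions made for illustration only.

```python
import math
import random


class RunningScaler:
    """Standardizes features with running means and variances (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations per feature

    def learn_one(self, x):
        # Assumes every feature is present in every sample.
        self.n += 1
        for k, v in x.items():
            old_mean = self.mean.get(k, 0.0)
            new_mean = old_mean + (v - old_mean) / self.n
            self.m2[k] = self.m2.get(k, 0.0) + (v - old_mean) * (v - new_mean)
            self.mean[k] = new_mean

    def transform_one(self, x):
        out = {}
        for k, v in x.items():
            std = math.sqrt(self.m2.get(k, 0.0) / self.n) if self.n else 0.0
            out[k] = (v - self.mean.get(k, 0.0)) / std if std > 0 else 0.0
        return out


class OnlineLogisticRegression:
    """Binary logistic regression trained by SGD on the log loss."""

    def __init__(self, lr=0.005, intercept_lr=0.01):
        self.lr = lr
        self.intercept_lr = intercept_lr
        self.weights = {}  # zero-initialized on first use
        self.intercept = 0.0

    def predict_proba_one(self, x):
        z = self.intercept + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        grad = self.predict_proba_one(x) - float(y)  # d(log loss)/dz
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * grad * v
        self.intercept -= self.intercept_lr * grad


# Toy stream (made up for illustration): label is True when a + b > 0.
rng = random.Random(42)
scaler, model = RunningScaler(), OnlineLogisticRegression()
n_samples, correct = 2000, 0
for _ in range(n_samples):
    x = {"a": rng.gauss(0, 1), "b": rng.gauss(0, 1)}
    y = x["a"] + x["b"] > 0
    scaler.learn_one(x)
    x_scaled = scaler.transform_one(x)
    correct += (model.predict_proba_one(x_scaled) > 0.5) == y  # test first
    model.learn_one(x_scaled, y)  # then train
accuracy = correct / n_samples
print(f"progressive accuracy: {accuracy:.3f}")
```

Each sample is scored before the model learns from it, so the reported accuracy reflects performance on data the model has not yet seen.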
Deep River Logistic
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LogisticRegressionInitialized (
    n_features=10
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="sgd"
    lr=0.005
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
Deep River MLP
Pipeline (
  StandardScaler (
    with_std=True
  ),
  MultiLayerPerceptronInitialized (
    n_features=10
    n_width=5
    n_layers=5
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="sgd"
    lr=0.005
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
Deep River LSTM
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LSTMClassifier (
    n_features=10
    hidden_size=32
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="adam"
    lr=0.001
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
Deep River RNN
Pipeline (
  StandardScaler (
    with_std=True
  ),
  RNNClassifier (
    n_features=10
    hidden_size=32
    num_layers=1
    nonlinearity="tanh"
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="adam"
    lr=0.001
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
[baseline] Prior class
PriorClassifier ()
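
As a rough illustration of what a prior-class baseline does, the sketch below tracks label frequencies and always predicts the most frequent label seen so far. It is a simplified stand-in written for this page, not River's `PriorClassifier`; the class name `PriorBaseline` and the sample labels are hypothetical.

```python
from collections import Counter


class PriorBaseline:
    """Ignores features and predicts from the label frequencies seen so far."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, y):
        self.counts[y] += 1

    def predict_proba_one(self):
        total = sum(self.counts.values())
        if total == 0:
            return {}  # nothing observed yet
        return {label: n / total for label, n in self.counts.items()}

    def predict_one(self):
        if not self.counts:
            return None  # nothing observed yet
        return self.counts.most_common(1)[0][0]


baseline = PriorBaseline()
for label in [True, False, True]:  # made-up labels
    baseline.learn_one(label)

prediction = baseline.predict_one()
probabilities = baseline.predict_proba_one()
print(prediction, probabilities)
```

Any model worth benchmarking should beat this baseline, since it uses no feature information at all.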

Multiclass classification

Hyperplane (limited 5000)

LED (limited 5000)

RandomRBF (limited 5000)

Datasets

Hyperplane (limited 5000)
Hyperplane(limited n=5000)
LED (limited 5000)
LED(limited n=5000)
RandomRBF (limited 5000)
RandomRBF(limited n=5000)
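
The "limited n=5000" labels suggest that each synthetic generator, which could otherwise produce samples indefinitely, is truncated to its first 5,000 samples. The benchmark's actual truncation mechanism is not shown here; the sketch below illustrates the idea with `itertools.islice` and a hypothetical generator named `endless_stream`.

```python
import itertools
import random


def endless_stream(seed=42):
    """Hypothetical infinite generator of (features, label) pairs."""
    rng = random.Random(seed)
    while True:
        x = {"x0": rng.random(), "x1": rng.random()}
        yield x, int(x["x0"] + x["x1"] > 1.0)


# Keep only the first 5,000 samples of the otherwise unbounded stream.
limited = list(itertools.islice(endless_stream(), 5000))
print(len(limited))
```

Because the generator is seeded, repeating the truncation yields the same 5,000 samples, which keeps runs comparable across models.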

Models

Logistic regression
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LogisticRegression (
    optimizer=SGD (
      lr=Constant (
        learning_rate=0.005
      )
    )
    loss=Log (
      weight_pos=1.
      weight_neg=1.
    )
    l2=0.
    l1=0.
    intercept_init=0.
    intercept_lr=Constant (
      learning_rate=0.01
    )
    clip_gradient=1e+12
    initializer=Zeros ()
  )
)
Deep River Logistic
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LogisticRegressionInitialized (
    n_features=10
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="sgd"
    lr=0.005
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
Deep River MLP
Pipeline (
  StandardScaler (
    with_std=True
  ),
  MultiLayerPerceptronInitialized (
    n_features=10
    n_width=5
    n_layers=5
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="sgd"
    lr=0.005
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
Deep River LSTM
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LSTMClassifier (
    n_features=10
    hidden_size=32
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="adam"
    lr=0.001
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
Deep River RNN
Pipeline (
  StandardScaler (
    with_std=True
  ),
  RNNClassifier (
    n_features=10
    hidden_size=32
    num_layers=1
    nonlinearity="tanh"
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="adam"
    lr=0.001
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
[baseline] Last class
NoChangeClassifier ()
[baseline] Prior class
PriorClassifier ()
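
The last-class baseline listed above can be pictured as a model that simply repeats the most recently observed label. The sketch below is a simplified stand-in, not River's `NoChangeClassifier`; the class name `LastClassBaseline` and the label sequence are made up for illustration.

```python
class LastClassBaseline:
    """Predicts whatever label was observed most recently."""

    def __init__(self):
        self.last = None

    def predict_one(self):
        return self.last

    def learn_one(self, y):
        self.last = y


labels = [0, 0, 0, 1, 1, 1, 2, 2]  # made-up stream with runs of equal labels
baseline = LastClassBaseline()
hits = 0
for y in labels:
    hits += baseline.predict_one() == y  # test first
    baseline.learn_one(y)  # then train
print(f"{hits} of {len(labels)} correct")
```

This baseline does well on streams where labels arrive in long runs, which is why it is a useful sanity check for temporally correlated data.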

Regression

ChickWeights

TrumpApproval

Datasets

ChickWeights
Chick weights over time.

The stream contains 578 items and 3 features. The goal is to predict the weight of each chick
over time, according to the diet the chick is on. The data is ordered by time and then by
chick.

    Name  ChickWeights
    Task  Regression
 Samples  578
Features  3
  Sparse  False
    Path  /Users/cedrickulbach/Documents/Projects/deep-river/.venv/lib/python3.10/site-packages/river/datasets/chick-weights.csv

TrumpApproval
Donald Trump approval ratings.

This dataset was obtained by reshaping the data used by FiveThirtyEight for analyzing Donald
Trump's approval ratings. It contains 5 features, which are approval ratings collected by
5 polling agencies. The target is the approval rating from FiveThirtyEight's model. The goal of
this task is to see if we can reproduce FiveThirtyEight's model.

    Name  TrumpApproval
    Task  Regression
 Samples  1,001
Features  6
  Sparse  False
    Path  /Users/cedrickulbach/Documents/Projects/deep-river/.venv/lib/python3.10/site-packages/river/datasets/trump_approval.csv.gz

Models

Linear regression
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LinearRegression (
    optimizer=SGD (
      lr=Constant (
        learning_rate=0.005
      )
    )
    loss=Squared ()
    l2=0.
    l1=0.
    intercept_init=0.
    intercept_lr=Constant (
      learning_rate=0.01
    )
    clip_gradient=1e+12
    initializer=Zeros ()
  )
)
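The linear regression settings above (SGD on a squared loss, learning rate 0.005 for the weights and 0.01 for the intercept, zero initialization) can be sketched in pure Python as follows. This is an illustration, not River's code: the class name `OnlineLinearRegression`, the toy target `3 * x + 1`, and the choice of `(prediction - y) ** 2` as the loss (hence the factor 2 in the gradient) are assumptions. Inputs are assumed to be already standardized.

```python
import random


class OnlineLinearRegression:
    """Linear regression trained by SGD on the squared loss (pred - y) ** 2."""

    def __init__(self, lr=0.005, intercept_lr=0.01):
        self.lr = lr
        self.intercept_lr = intercept_lr
        self.weights = {}  # zero-initialized on first use
        self.intercept = 0.0

    def predict_one(self, x):
        return self.intercept + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        grad = 2.0 * (self.predict_one(x) - y)  # d(loss)/d(prediction)
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * grad * v
        self.intercept -= self.intercept_lr * grad


# Toy noise-free stream (made up for illustration): y = 3 * x + 1.
rng = random.Random(42)
model = OnlineLinearRegression()
for _ in range(5000):
    x = {"x": rng.gauss(0, 1)}
    model.learn_one(x, 3.0 * x["x"] + 1.0)

print(model.weights["x"], model.intercept)
```

On this toy stream the learned weight and intercept approach 3 and 1; real benchmark streams are noisy, so the fit there is only approximate.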
Deep River Linear
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LinearRegressionInitialized (
    n_features=10
    loss_fn="mse"
    optimizer_fn="sgd"
    lr=0.005
    is_feature_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
Deep River MLP
Pipeline (
  StandardScaler (
    with_std=True
  ),
  MultiLayerPerceptron (
    n_features=10
    n_width=5
    n_layers=5
    loss_fn="mse"
    optimizer_fn="sgd"
    lr=0.005
    is_feature_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
Deep River LSTM
Pipeline (
  StandardScaler (
    with_std=True
  ),
  LSTMRegressor (
    n_features=10
    hidden_size=64
    num_layers=1
    dropout=0.1
    gradient_clip_value=1.
    loss_fn="mse"
    optimizer_fn="adam"
    lr=0.001
    is_feature_incremental=True
    device="cpu"
    seed=42
  )
)
Deep River RNN
Pipeline (
  StandardScaler (
    with_std=True
  ),
  RNNRegressor (
    n_features=10
    hidden_size=64
    num_layers=1
    nonlinearity="tanh"
    dropout=0.1
    gradient_clip_value=1.
    loss_fn="mse"
    optimizer_fn="adam"
    lr=0.001
    is_feature_incremental=True
    device="cpu"
    seed=42
  )
)
River MLP
Pipeline (
  StandardScaler (
    with_std=True
  ),
  MLPRegressor (
    hidden_dims=(10,)
    activations=(<class 'river.neural_net.activations.ReLU'>, <class 'river.neural_net.activations.ReLU'>, <class 'river.neural_net.activations.Identity'>)
    loss=Squared ()
    optimizer=SGD (
      lr=Constant (
        learning_rate=0.005
      )
    )
    seed=42
  )
)
[baseline] Mean predictor
StatisticRegressor (
  statistic=Mean ()
)
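
The mean-predictor baseline amounts to predicting the running average of all targets seen so far. The sketch below shows one way to maintain that average incrementally; it is not River's `StatisticRegressor`, and the class name `MeanBaseline` and the target values are made up for illustration.

```python
class MeanBaseline:
    """Predicts the running mean of the targets observed so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self):
        return self.mean

    def learn_one(self, y):
        # Incremental mean update: no need to store past targets.
        self.n += 1
        self.mean += (y - self.mean) / self.n


baseline = MeanBaseline()
absolute_errors = []
for y in [42.0, 51.0, 59.0, 64.0]:  # made-up targets
    absolute_errors.append(abs(baseline.predict_one() - y))  # test first
    baseline.learn_one(y)  # then train

print(baseline.predict_one())
```

The incremental update keeps memory use constant regardless of stream length, which matters for unbounded data streams.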

Environment

Python implementation: CPython
Python version       : 3.12.11
IPython version      : 9.6.0

river       : 0.22.0
numpy       : 1.26.4
scikit-learn: 1.5.2
pandas      : 2.2.3
scipy       : 1.16.2
plotly      : 6.3.1

Compiler    : Clang 20.1.4 
OS          : Linux
Release     : 6.11.0-1018-azure
Machine     : x86_64
Processor   : x86_64
CPU cores   : 4
Architecture: 64bit
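
A comparable environment summary can be gathered with the standard library alone. The sketch below is an assumption about how such a report could be produced, not the tool that generated the listing above; the function name `environment_report` and the default package list are hypothetical.

```python
import platform
from importlib import metadata


def environment_report(packages=("river", "numpy", "pandas")):
    """Collects interpreter, OS, and package version details into a dict."""
    report = {
        "Python implementation": platform.python_implementation(),
        "Python version": platform.python_version(),
        "OS": platform.system(),
        "Release": platform.release(),
        "Machine": platform.machine(),
        "Architecture": platform.architecture()[0],
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Recording this alongside benchmark results makes it easier to explain differences between runs on different machines or library versions.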