Binary classification

Bananas

Summary

Model	Accuracy	F1	Memory (MB)	Time (s)
Deep River LSTM	0.606604	0.36219	0.0321245	1183.4
Deep River Logistic	0.527925	0.222981	0.019639	151.192
Deep River MLP	0.548113	0.101313	0.0418444	278.597
Deep River RNN	0.541321	0.271938	0.0323954	444.79
Logistic regression	0.543208	0.197015	0.00424099	16.6572
[baseline] Prior class	0.551236	0.00335289	0.000611305	7.11342
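
The Accuracy and F1 columns are easier to compare with their definitions in hand. The sketch below computes both from paired labels. `accuracy_and_f1` is a hypothetical helper, not River's metric API, and it assumes `True` marks the positive class on which F1 is computed. It shows how a predictor that almost always answers with the majority class, like the prior-class baseline, can score a respectable accuracy while its F1 stays close to zero.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1 of the positive class True) for paired labels."""
    tp = fp = fn = correct = 0
    for t, p in zip(y_true, y_pred):
        correct += t == p
        if p and t:
            tp += 1          # predicted positive, actually positive
        elif p and not t:
            fp += 1          # predicted positive, actually negative
        elif t and not p:
            fn += 1          # missed a positive
    accuracy = correct / len(y_true)
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positive labels or predictions
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# A majority-class predictor that always answers False
y_true = [True, False, False, True, False]
y_pred = [False] * 5
print(accuracy_and_f1(y_true, y_pred))  # (0.6, 0.0)
```

Because F1 only rewards true positives, a model that rarely predicts the positive class gets an F1 near zero no matter how many negatives it labels correctly.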

Elec2

Summary

Model	Accuracy	F1	Memory (MB)	Time (s)
Deep River LSTM	0.835751	0.79717	0.0359993	3685.53
Deep River Logistic	0.837372	0.799445	0.0213318	1308.76
Deep River MLP	0.582715	0.291357	0.0435371	2416.23
Deep River RNN	0.833267	0.799565	0.0357666	3638.42
Logistic regression	0.822144	0.777086	0.005373	201.865
[baseline] Prior class	0.575335	0.00248834	0.000611305	86.623

Phishing

Summary

Model	Accuracy	F1	Memory (MB)	Time (s)
Deep River LSTM	0.8696	0.843118	0.0370378	383.293
Deep River Logistic	0.8488	0.832	0.0215311	45.0264
Deep River MLP	0.5528	0.231087	0.0437365	71.7688
Deep River RNN	0.8784	0.86055	0.0373087	110.05
Logistic regression	0.8872	0.871233	0.00556469	6.14968
[baseline] Prior class	0.554844	0.0794702	0.000611305	2.30399

Datasets

Bananas

Bananas dataset.

An artificial dataset in which instances belong to several banana-shaped clusters. It has two attributes, which correspond to the x and y axes, respectively.

Name        Bananas
Task        Binary classification
Samples     5,300
Features    2
Sparse      False
Path        /Users/cedrickulbach/Documents/Projects/deep-river/.venv/lib/python3.10/site-packages/river/datasets/banana.zip

Elec2

Electricity prices in New South Wales.

This is a binary classification task: the goal is to predict whether the price of electricity will go up or down.

The data was collected from the Australian New South Wales Electricity Market. In this market, prices are not fixed; they are driven by supply and demand and are set every five minutes. Electricity was transferred to and from the neighboring state of Victoria to alleviate fluctuations.

Name        Elec2
Task        Binary classification
Samples     45,312
Features    8
Sparse      False
Path        /Users/cedrickulbach/river_data/Elec2/electricity.csv
URL         https://maxhalford.github.io/files/datasets/electricity.zip
Size        2.95 MiB
Downloaded  True

Phishing

Phishing websites.

This dataset contains features from web pages that are classified as phishing or not.

Name        Phishing
Task        Binary classification
Samples     1,250
Features    9
Sparse      False
Path        /Users/cedrickulbach/Documents/Projects/deep-river/.venv/lib/python3.10/site-packages/river/datasets/phishing.csv.gz

Models

Logistic regression

Pipeline (
  StandardScaler (
    with_std=True
  ),
  LogisticRegression (
    optimizer=SGD (
      lr=Constant (
        learning_rate=0.005
      )
    )
    loss=Log (
      weight_pos=1.
      weight_neg=1.
    )
    l2=0.
    l1=0.
    intercept_init=0.
    intercept_lr=Constant (
      learning_rate=0.01
    )
    clip_gradient=1e+12
    initializer=Zeros ()
  )
)
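
As a rough picture of what this pipeline does with each sample, here is a from-scratch sketch, not River's implementation. Features are standardized with running means and variances (Welford's algorithm, an assumption about how the scaler keeps its statistics), then a logistic regression takes one SGD step on the log loss using the learning rates listed above (0.005 for the weights, 0.01 for the intercept) and clips the loss gradient at 1e12. The classes `OnlineScaler` and `OnlineLogReg` and the toy stream are hypothetical.

```python
import math
import random


class OnlineScaler:
    """Running standardization with Welford's algorithm (illustrative sketch)."""

    def __init__(self):
        self.count = {}
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations per feature

    def learn_one(self, x):
        for k, v in x.items():
            n = self.count.get(k, 0) + 1
            mean = self.mean.get(k, 0.0)
            delta = v - mean
            mean += delta / n
            self.m2[k] = self.m2.get(k, 0.0) + delta * (v - mean)
            self.count[k], self.mean[k] = n, mean

    def transform_one(self, x):
        out = {}
        for k, v in x.items():
            n = self.count.get(k, 0)
            std = math.sqrt(self.m2[k] / n) if n else 0.0  # population std
            out[k] = (v - self.mean.get(k, 0.0)) / std if std > 0 else 0.0
        return out


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogReg:
    """Logistic regression trained with one SGD step per sample (illustrative)."""

    def __init__(self, lr=0.005, intercept_lr=0.01, clip=1e12):
        self.w = {}  # weights start at zero, created when a feature first appears
        self.b = 0.0
        self.lr, self.intercept_lr, self.clip = lr, intercept_lr, clip

    def predict_proba_one(self, x):
        z = self.b + sum(self.w.get(k, 0.0) * v for k, v in x.items())
        return sigmoid(z)

    def learn_one(self, x, y):
        # d(log loss)/dz = p - y for labels y in {0, 1}; clipped like clip_gradient
        g = self.predict_proba_one(x) - float(y)
        g = max(-self.clip, min(self.clip, g))
        for k, v in x.items():
            self.w[k] = self.w.get(k, 0.0) - self.lr * g * v
        self.b -= self.intercept_lr * g


# Toy stream (made up): the label is True whenever x1 is positive
random.seed(0)
scaler, model = OnlineScaler(), OnlineLogReg()
for _ in range(2000):
    x = {"x1": random.gauss(0, 1), "x2": random.gauss(0, 1)}
    scaler.learn_one(x)  # this sketch updates the scaler before transforming
    model.learn_one(scaler.transform_one(x), x["x1"] > 0)
```

The whole model state is a few floats per feature, which is consistent with this pipeline being the lightest non-baseline entry in the Memory column.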

Deep River Logistic

Pipeline (
  StandardScaler (
    with_std=True
  ),
  LogisticRegressionInitialized (
    n_features=10
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="sgd"
    lr=0.005
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
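
With `output_is_logit=True`, the module's raw outputs are logits, so a softmax (or sigmoid) has to turn them into class probabilities before they can be read as such, and `loss_fn="cross_entropy"` scores the logits against the true class. The stdlib sketch below shows that arithmetic for `n_init_classes=2`. It illustrates the math only, not how deep-river implements it; `softmax` and `cross_entropy` here are hypothetical helpers.

```python
import math


def softmax(logits):
    # subtract the max logit for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    # negative log-probability assigned to the true class index
    return -math.log(softmax(logits)[target])


logits = [0.0, math.log(3.0)]    # raw outputs for classes 0 and 1
print(softmax(logits))           # approximately [0.25, 0.75]
print(cross_entropy(logits, 1))  # approximately 0.2877, i.e. -log(0.75)
```

For two classes, the softmax probability of class 1 equals the sigmoid of the logit difference `z1 - z0`, so this is the same model as a single-output logistic regression.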

Deep River MLP

Pipeline (
  StandardScaler (
    with_std=True
  ),
  MultiLayerPerceptronInitialized (
    n_features=10
    n_width=5
    n_layers=5
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="sgd"
    lr=0.005
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
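
To get a feel for the size of this network, the sketch below counts trainable parameters for a plain stack of fully connected layers. The layout is an assumption, not taken from deep-river's source: a first layer mapping `n_features` inputs to `n_width` units, `n_layers - 1` further `n_width`-to-`n_width` layers, and an output layer with one unit per class, each layer with a bias. The helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out


def mlp_param_count(n_features, n_width, n_layers, n_classes):
    # assumed layout: n_layers hidden layers of width n_width, then an output layer
    total = dense_params(n_features, n_width)
    total += (n_layers - 1) * dense_params(n_width, n_width)
    total += dense_params(n_width, n_classes)
    return total


print(mlp_param_count(n_features=10, n_width=5, n_layers=5, n_classes=2))  # 187
```

Under this assumed layout, the configuration above (`n_features=10`, `n_width=5`, `n_layers=5`, two classes) has 55 + 4 × 30 + 12 = 187 parameters, spread over a deep but very narrow stack.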

Deep River LSTM

Pipeline (
  StandardScaler (
    with_std=True
  ),
  LSTMClassifier (
    n_features=10
    hidden_size=32
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="adam"
    lr=0.001
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
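
Unlike the logistic and MLP models above, which use `optimizer_fn="sgd"`, the LSTM (and the RNN below) is trained with `optimizer_fn="adam"` at `lr=0.001`. For reference, here is a minimal scalar sketch of the standard Adam update. The betas (0.9, 0.999) and eps (1e-8) are assumptions matching the defaults of PyTorch's `torch.optim.Adam`, since the configuration above does not list them; `adam_step` is a hypothetical helper.

```python
import math


def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds m, v and the step count t."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first-moment estimate
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = state["m"] / (1 - beta1 ** state["t"])             # bias corrections
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


# Minimize f(theta) = theta**2, whose gradient is 2 * theta
theta, state = 1.0, {"m": 0.0, "v": 0.0, "t": 0}
for _ in range(3):
    theta = adam_step(theta, 2 * theta, state)
print(theta)
```

Because of the bias correction, the very first step moves a parameter by roughly `lr` in the direction opposite to the gradient, whatever the gradient's magnitude.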

Deep River RNN

Pipeline (
  StandardScaler (
    with_std=True
  ),
  RNNClassifier (
    n_features=10
    hidden_size=32
    num_layers=1
    nonlinearity="tanh"
    n_init_classes=2
    loss_fn="cross_entropy"
    optimizer_fn="adam"
    lr=0.001
    output_is_logit=True
    is_feature_incremental=True
    is_class_incremental=True
    device="cpu"
    seed=42
    gradient_clip_value=None
  )
)
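
`nonlinearity="tanh"` refers to the recurrence of a plain (Elman) RNN. Assuming `RNNClassifier` wraps a standard recurrent layer such as PyTorch's `torch.nn.RNN`, each step computes h_t = tanh(W_ih x_t + b_ih + W_hh h_(t-1) + b_hh). The pure-Python sketch below runs that recurrence over a short toy sequence with 2 inputs and 3 hidden units instead of the `hidden_size=32` used here; the weights and the helper names are made up for illustration.

```python
import math


def matvec(W, v):
    # multiply a matrix (list of rows) by a vector
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(x, h, W_ih, b_ih, W_hh, b_hh):
    # h_t = tanh(W_ih x_t + b_ih + W_hh h_{t-1} + b_hh), applied element-wise
    a = matvec(W_ih, x)
    r = matvec(W_hh, h)
    return [math.tanh(a[i] + b_ih[i] + r[i] + b_hh[i]) for i in range(len(h))]


# Toy weights (made up): 2 input features, 3 hidden units
W_ih = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
W_hh = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
b_ih = [0.0, 0.0, 0.0]
b_hh = [0.0, 0.0, 0.0]

h = [0.0, 0.0, 0.0]  # initial hidden state
for x in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    h = rnn_step(x, h, W_ih, b_ih, W_hh, b_hh)
print(h)
```

In a classifier, the final hidden state would typically feed a linear layer that produces the class logits.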

[baseline] Prior class

PriorClassifier ()
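
The baseline predicts from class frequencies alone. Below is a minimal sketch of that idea, paired with the test-then-train (progressive validation) loop that streaming benchmarks commonly use: each sample is predicted before the model learns from it. `PriorBaseline` and `progressive_accuracy` are hypothetical, and it is an assumption that the tables above were produced by exactly these steps. The loop also times itself with `time.perf_counter`, which is one way a figure like the Time column could be measured.

```python
import time
from collections import Counter


class PriorBaseline:
    """Predicts the most frequent class seen so far (illustrative)."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        if not self.counts:
            return None  # nothing learned yet, so no prediction
        return self.counts.most_common(1)[0][0]  # ties go to the class seen first

    def learn_one(self, x, y):
        self.counts[y] += 1


def progressive_accuracy(model, stream):
    """Test-then-train: predict each sample before learning from it."""
    correct = n = 0
    start = time.perf_counter()
    for x, y in stream:
        correct += model.predict_one(x) == y
        model.learn_one(x, y)
        n += 1
    elapsed = time.perf_counter() - start
    return correct / n, elapsed


stream = [({}, False), ({}, True), ({}, False), ({}, False), ({}, True)]
acc, seconds = progressive_accuracy(PriorBaseline(), stream)
print(acc)  # 0.4
```

The first prediction is always made before any label has been seen, so even the baseline starts with an error; on a stream with stable class proportions its accuracy then drifts toward the share of the majority class.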

Environment

Python implementation: CPython
Python version       : 3.12.12
IPython version      : 9.6.0

river       : 0.22.0
numpy       : 1.26.4
scikit-learn: 1.5.2
pandas      : 2.2.3
scipy       : 1.16.2

Compiler    : Clang 21.1.4 
OS          : Linux
Release     : 6.11.0-1018-azure
Machine     : x86_64
Processor   : x86_64
CPU cores   : 4
Architecture: 64bit
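
To compare a local setup against the environment above, a small stdlib sketch like the following prints comparable information. It only approximates the listing; the `describe_environment` helper is hypothetical, and its default package list is simply copied from the table above.

```python
import platform
from importlib import metadata


def describe_environment(packages=("river", "numpy", "scikit-learn", "pandas", "scipy")):
    lines = [
        f"Python implementation: {platform.python_implementation()}",
        f"Python version       : {platform.python_version()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)  # installed distribution version
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name:<12}: {version}")
    lines.append(f"OS          : {platform.system()}")
    lines.append(f"Machine     : {platform.machine()}")
    return "\n".join(lines)


print(describe_environment())
```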