╔══════════════════════════╗
║     Summarized Results   ║
╚══════════════════════════╝
# Experiment Plan and Raw Results (plan ebede925)

## Experiment Overview
The experiment investigates whether ensemble methods (Random Forests, Gradient Boosting) are more robust to added noise than linear models such as Logistic Regression when applied to the Breast Cancer Wisconsin dataset for binary classification.

## Experiment Structure
- **Research Question**: Are ensemble methods more robust to added noise in the Breast Cancer Wisconsin dataset compared to linear models like Logistic Regression for a binary classification task?
- **Hypothesis**: Ensemble methods will show a slower rate of performance degradation with increasing noise compared to Logistic Regression, indicating higher robustness to noise.

## Experimental Design
- **Independent Variables**:
  - Model Type: logistic_regression, random_forest, gradient_boosting
  - Noise Level: 0, 5, 15, 25

- **Dependent Variables**:
  - Performance degradation rate (slope of performance vs. noise level)

- **Constant Variables**:
  - Dataset (Breast Cancer Wisconsin)
  - Random seed
  - Performance metrics
  - Noise generation method

## Completed Experiments
The control group experiment (logistic_regression, noise_level=0) has been completed. It was run twice; both runs produced identical metrics and differed only in training time. Results:

### Control Group - Partition 1 (logistic_regression, noise_level=0)
- **Accuracy**: 0.9737 ± 0.0166
- **Precision**: 0.9683 ± 0.0290
- **Recall**: 0.9916 ± 0.0112
- **F1 Score**: 0.9794 ± 0.0127
- **ROC AUC**: 0.9953 ± 0.0055
- **Training Time**: 0.0017 ± 0.0002 seconds (second run; the first run recorded 0.0034 ± 0.0038 seconds)
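The ± figures appear to be the population standard deviation (ddof = 0) of the five per-fold scores, not the sample standard deviation. Below is a minimal stdlib sketch that recomputes the summary from the per-fold accuracies and F1 scores printed in the raw log. Because those fold scores are rounded to four decimals, agreement holds only to the reported precision.

```python
from statistics import mean, pstdev, stdev

# Per-fold scores copied from the control-group log (logistic_regression, noise_level=0).
fold_accuracy = [0.9737, 0.9474, 0.9649, 0.9912, 0.9912]
fold_f1 = [0.9787, 0.9595, 0.9730, 0.9930, 0.9930]

def summarize(scores):
    """Return (mean, population std); the population std matches the reported ± values."""
    return mean(scores), pstdev(scores)

acc_mean, acc_std = summarize(fold_accuracy)
f1_mean, f1_std = summarize(fold_f1)

print(f"Accuracy: {acc_mean:.4f} ± {acc_std:.4f}")  # Accuracy: 0.9737 ± 0.0166
print(f"F1 Score: {f1_mean:.4f} ± {f1_std:.4f}")    # F1 Score: 0.9794 ± 0.0127
# For comparison, the sample std (ddof=1) of the accuracies is larger.
print(f"Sample std of accuracy: {stdev(fold_accuracy):.4f}")  # Sample std of accuracy: 0.0186
```

With only five folds the choice matters. The sample standard deviation would report about 0.0186 rather than 0.0166 for accuracy.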

## Pending Experiments
The following experimental configurations are still pending execution:

### Experimental Group - Partition 1
- random_forest, noise_level=0
- gradient_boosting, noise_level=0
- logistic_regression, noise_level=5
- random_forest, noise_level=5
- gradient_boosting, noise_level=5

### Experimental Group - Partition 2
- logistic_regression, noise_level=15
- random_forest, noise_level=15
- gradient_boosting, noise_level=15
- logistic_regression, noise_level=25
- random_forest, noise_level=25

### Experimental Group - Partition 3
- gradient_boosting, noise_level=25

Once all experiments are completed, the results will be analyzed to compare the performance degradation rates across different model types and noise levels.

# Experimental Setup and Raw Results (plan 8d2fc1e0)

## Control Group
**Partition 1:**
- Independent Variables: 
  - Model Type: Logistic Regression
  - Noise Type: None

**Results:**
- Accuracy: 0.9737 ± 0.0166
- Precision: 0.9683 ± 0.0290
- Recall: 0.9916 ± 0.0112
- F1 Score: 0.9794 ± 0.0127

## Experimental Group
The experimental group runs were planned but have not yet been completed. Their configurations are:

**Partition 1:**
- Random Forest with no noise
- Gradient Boosting with no noise
- Logistic Regression with Gaussian noise
- Random Forest with Gaussian noise
- Gradient Boosting with Gaussian noise

**Partition 2:**
- Logistic Regression with label flipping
- Random Forest with label flipping
- Gradient Boosting with label flipping
- Logistic Regression with feature deletion
- Random Forest with feature deletion

**Partition 3:**
- Gradient Boosting with feature deletion
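The label-flipping and feature-deletion conditions are not specified beyond their names; the plan only says that noise intensity is held constant. Below is a minimal sketch of one plausible implementation of each. The helper names, the `rate` parameter, and the choice to zero out deleted columns rather than drop them are all assumptions.

```python
import random

def flip_labels(labels, rate, seed=0):
    """Flip a random `rate` fraction of binary (0/1) labels.

    `rate` is a hypothetical intensity parameter; the plan does not state the value used.
    """
    rng = random.Random(seed)
    n_flip = round(rate * len(labels))
    flipped = list(labels)
    for idx in rng.sample(range(len(labels)), n_flip):
        flipped[idx] = 1 - flipped[idx]
    return flipped

def delete_features(rows, rate, seed=0):
    """Zero out a random `rate` fraction of feature columns.

    Assumes features are already standardized, so 0.0 equals the column mean;
    dropping the columns entirely would be an equally plausible reading.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    deleted = sorted(rng.sample(range(n_features), round(rate * n_features)))
    deleted_set = set(deleted)
    cleaned = [[0.0 if j in deleted_set else value for j, value in enumerate(row)] for row in rows]
    return cleaned, deleted

# Toy usage: 20 labels and 4 rows of 30 features.
y = [0] * 10 + [1] * 10
y_noisy = flip_labels(y, rate=0.1, seed=42)
X = [[1.0] * 30 for _ in range(4)]
X_noisy, dropped = delete_features(X, rate=0.1, seed=42)
print("labels flipped:", sum(a != b for a, b in zip(y, y_noisy)), "columns deleted:", dropped)
```

Both helpers take an explicit seed, so each noise scenario can be reproduced across model types, as the constant random seed requires.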

## Dataset Information
- Dataset: Breast Cancer Wisconsin
- Samples: 569
- Features: 30
- Classes: 2 (Binary classification)
- Class distribution: [212, 357] (212 malignant, 357 benign)
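The class imbalance sets a floor on accuracy. A classifier that always predicts the majority class (357 of 569 samples, benign in WDBC) would already score about 0.627. The roughly 0.95 to 0.97 accuracies reported here are better read against that baseline than against 0.5. A quick check:

```python
# Class counts from the dataset summary: class 0 -> 212 samples, class 1 -> 357 samples.
class_counts = {0: 212, 1: 357}

n_samples = sum(class_counts.values())
majority_class = max(class_counts, key=class_counts.get)
majority_baseline = class_counts[majority_class] / n_samples

print(n_samples, majority_class, f"{majority_baseline:.4f}")  # 569 1 0.6274
```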

The experiments were designed to compare the robustness of ensemble methods (Random Forests and Gradient Boosting) against Logistic Regression under various noise conditions, but only the control experiment (Logistic Regression with no noise) has been completed.
# Experimental Plan and Results: Model Robustness to Noise in the Breast Cancer Wisconsin Dataset (plan 2a57e818)

## Experiment Setup

This experiment investigates whether ensemble methods (Random Forests, Gradient Boosting) are more robust to added noise than linear models such as Logistic Regression for binary classification on the Breast Cancer Wisconsin dataset.

### Variables
- **Independent Variables**: 
  - Model type: Logistic Regression, Random Forest, Gradient Boosting
  - Noise level: 0%, 10%, 20%, 30% of feature standard deviation
  
- **Dependent Variables**:
  - Accuracy
  - Precision
  - Recall
  - F1-Score
  - AUC-ROC
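The plan describes the noise as Gaussian, with a standard deviation equal to the stated percentage of each feature's standard deviation. Below is a minimal stdlib sketch of that injection step. It assumes per-feature population standard deviation and a fixed seed. The function name is hypothetical, and the raw results do not show whether the real script adds noise before or after the train/test split. Because the plan also standardizes features first, each feature's standard deviation is about 1, so the noise standard deviation is roughly noise_level / 100.

```python
import random
from statistics import pstdev

def add_gaussian_noise(rows, noise_level_pct, seed=0):
    """Return a copy of `rows` with zero-mean Gaussian noise added to every feature.

    The noise std for feature j is (noise_level_pct / 100) * std(feature j),
    following the plan's definition of noise level. Hypothetical helper.
    """
    if noise_level_pct == 0:
        return [list(row) for row in rows]
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Population standard deviation of each feature column.
    feature_std = [pstdev([row[j] for row in rows]) for j in range(n_features)]
    scale = noise_level_pct / 100.0
    return [
        [value + rng.gauss(0.0, scale * feature_std[j]) for j, value in enumerate(row)]
        for row in rows
    ]

# Toy data: 1000 rows, two features on different scales.
X = [[float(i), 2.0 * i] for i in range(1000)]
X_noisy = add_gaussian_noise(X, noise_level_pct=20, seed=0)

# Ratio of added-noise std to feature std; it should be close to 0.20.
residuals = [noisy[0] - clean[0] for noisy, clean in zip(X_noisy, X)]
print(round(pstdev(residuals) / pstdev([row[0] for row in X]), 2))
```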

### Constant Variables
- Dataset (Breast Cancer Wisconsin)
- Random seed
- Evaluation metrics
- Cross-validation folds
- Hyperparameter tuning strategy
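The raw log shows 5-fold cross-validation, and the fold count and random seed are held constant. The log does not say whether the folds are stratified by class; scikit-learn's `StratifiedKFold` would be a typical choice. Below is a stdlib sketch of a stratified split under that assumption. The helper name and the round-robin dealing scheme are illustrative, not scikit-learn's exact algorithm.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k test folds that roughly preserve class proportions.

    Hypothetical helper: shuffle each class with a fixed seed, concatenate the
    classes, then deal positions round-robin into folds.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    ordered = []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        ordered.extend(members)
    folds = [[] for _ in range(k)]
    for position, idx in enumerate(ordered):
        folds[position % k].append(idx)
    return folds

# Labels with the dataset's class counts: 212 in class 0, 357 in class 1.
labels = [0] * 212 + [1] * 357
folds = stratified_kfold(labels, k=5, seed=42)

for fold_idx, test_idx in enumerate(folds):
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    n_class0 = sum(labels[i] == 0 for i in test_idx)
    print(f"Fold {fold_idx + 1}: train={len(train_idx)} test={len(test_idx)} class0_in_test={n_class0}")
```

With 569 samples, each test fold holds 113 or 114 samples, and each contains 42 or 43 class-0 samples. This keeps the 212/357 ratio close to constant across folds.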

## Raw Results

### Control Group (Partition 1)
- **Logistic Regression, Noise Level: 0%**
  - Accuracy: 0.9737 ± 0.0166
  - Precision: 0.9683 ± 0.0290
  - Recall: 0.9916 ± 0.0112
  - F1 Score: 0.9794 ± 0.0127
  - AUC-ROC: 0.9953 ± 0.0055

### Experimental Group (Partition 1)
- **Random Forest, Noise Level: 0%**
  - Accuracy: 0.9561 ± 0.0123
  - Precision: 0.9651 ± 0.0282
  - Recall: 0.9665 ± 0.0272
  - F1 Score: 0.9651 ± 0.0097
  - AUC-ROC: 0.9886 ± 0.0082

- **Gradient Boosting, Noise Level: 0%**
  - Accuracy: 0.9491 ± 0.0237
  - Precision: 0.9471 ± 0.0354
  - Recall: 0.9748 ± 0.0184
  - F1 Score: 0.9602 ± 0.0179
  - AUC-ROC: 0.9927 ± 0.0041

- **Logistic Regression, Noise Level: 10%**
  - Accuracy: 0.9754 ± 0.0129
  - Precision: 0.9704 ± 0.0211
  - Recall: 0.9916 ± 0.0112
  - F1 Score: 0.9807 ± 0.0101
  - AUC-ROC: 0.9953 ± 0.0050

- **Random Forest, Noise Level: 10%**
  - Accuracy: 0.9526 ± 0.0180
  - Precision: 0.9598 ± 0.0287
  - Recall: 0.9665 ± 0.0323
  - F1 Score: 0.9624 ± 0.0143
  - AUC-ROC: 0.9897 ± 0.0067

- **Gradient Boosting, Noise Level: 10%**
  - Accuracy: 0.9631 ± 0.0103
  - Precision: 0.9670 ± 0.0133
  - Recall: 0.9748 ± 0.0206
  - F1 Score: 0.9707 ± 0.0083
  - AUC-ROC: 0.9936 ± 0.0043

### Experimental Group (Partition 2)
- **Logistic Regression, Noise Level: 20%**
  - Accuracy: 0.9719 ± 0.0129
  - Precision: 0.9676 ± 0.0213
  - Recall: 0.9888 ± 0.0105
  - F1 Score: 0.9779 ± 0.0100
  - AUC-ROC: 0.9949 ± 0.0059

- **Random Forest, Noise Level: 20%**
  - Accuracy: 0.9420 ± 0.0204
  - Precision: 0.9442 ± 0.0338
  - Recall: 0.9664 ± 0.0188
  - F1 Score: 0.9546 ± 0.0153
  - AUC-ROC: 0.9869 ± 0.0073

- **Gradient Boosting, Noise Level: 20%**
  - Accuracy: 0.9613 ± 0.0089
  - Precision: 0.9646 ± 0.0194
  - Recall: 0.9748 ± 0.0106
  - F1 Score: 0.9694 ± 0.0067
  - AUC-ROC: 0.9863 ± 0.0101

- **Logistic Regression, Noise Level: 30%**
  - Accuracy: 0.9701 ± 0.0142
  - Precision: 0.9649 ± 0.0215
  - Recall: 0.9888 ± 0.0105
  - F1 Score: 0.9765 ± 0.0112
  - AUC-ROC: 0.9943 ± 0.0052

- **Random Forest, Noise Level: 30%**
  - Accuracy: 0.9385 ± 0.0199
  - Precision: 0.9389 ± 0.0317
  - Recall: 0.9664 ± 0.0211
  - F1 Score: 0.9519 ± 0.0150
  - AUC-ROC: 0.9858 ± 0.0073

### Experimental Group (Partition 3)
- No results are available yet for Gradient Boosting at the 30% noise level; this partition is not yet marked as done in the plan.
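Plan ebede925 (in the raw results) defines the performance degradation rate as the slope of performance against noise level. Applying that definition to the mean accuracies above gives a rough comparison. With only four noise levels and fold standard deviations of about 0.01 to 0.02, these slopes are indicative and carry no significance test. The helper name is hypothetical; the slope is ordinary least squares.

```python
def degradation_slope(noise_levels, scores):
    """Ordinary least-squares slope of a metric against noise level.

    A negative slope means the metric falls as noise increases.
    """
    n = len(noise_levels)
    x_mean = sum(noise_levels) / n
    y_mean = sum(scores) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(noise_levels, scores))
    sxx = sum((x - x_mean) ** 2 for x in noise_levels)
    return sxy / sxx

# Mean accuracies from the results above (noise level in % of feature std).
accuracy = {
    "logistic_regression": ([0, 10, 20, 30], [0.9737, 0.9754, 0.9719, 0.9701]),
    "random_forest": ([0, 10, 20, 30], [0.9561, 0.9526, 0.9420, 0.9385]),
    # Gradient Boosting at 30% is still pending, so only three points are available.
    "gradient_boosting": ([0, 10, 20], [0.9491, 0.9631, 0.9613]),
}

for model, (levels, scores) in accuracy.items():
    slope_per_10 = degradation_slope(levels, scores) * 10
    print(f"{model}: {slope_per_10:+.4f} accuracy per 10 points of noise")
# logistic_regression: -0.0014 accuracy per 10 points of noise
# random_forest: -0.0063 accuracy per 10 points of noise
# gradient_boosting: +0.0061 accuracy per 10 points of noise
```

On these points, Random Forest's accuracy falls roughly four times faster than Logistic Regression's. Gradient Boosting's three-point slope is positive, so its comparison should wait for the pending 30% run.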

## Dataset Information
- Dataset: Breast Cancer Wisconsin
- Number of samples: 569
- Number of features: 30
- Number of classes: 2
- Class distribution: [212, 357]
╔══════════════════════╗
║     Raw Results      ║
╚══════════════════════╝
Here is the experimental plan
{'control_group': {'partition_1': {'independent_vars': [{'model_type': 'logistic_regression', 'noise_level': '0'}], 'control_experiment_filename': '/workspace/default_research_ebede925-2005-498e-bbd3-09686e50b0ab/control_experiment_ebede925-2005-498e-bbd3-09686e50b0ab_control_group_partition_1.sh', 'control_experiment_results_filename': '/workspace/default_research_ebede925-2005-498e-bbd3-09686e50b0ab/results_ebede925-2005-498e-bbd3-09686e50b0ab_control_group_partition_1.json', 'all_control_experiment_results_filename': '/workspace/default_research_ebede925-2005-498e-bbd3-09686e50b0ab/all_results_ebede925-2005-498e-bbd3-09686e50b0ab_control_group_partition_1.txt', 'done': True}}, 'experimental_group': {'partition_1': {'independent_vars': [{'model_type': 'random_forest', 'noise_level': '0'}, {'model_type': 'gradient_boosting', 'noise_level': '0'}, {'model_type': 'logistic_regression', 'noise_level': '5'}, {'model_type': 'random_forest', 'noise_level': '5'}, {'model_type': 'gradient_boosting', 'noise_level': '5'}], 'control_experiment_filename': '', 'control_experiment_results_filename': '', 'all_control_experiment_results_filename': '', 'done': False}, 'partition_2': {'independent_vars': [{'model_type': 'logistic_regression', 'noise_level': '15'}, {'model_type': 'random_forest', 'noise_level': '15'}, {'model_type': 'gradient_boosting', 'noise_level': '15'}, {'model_type': 'logistic_regression', 'noise_level': '25'}, {'model_type': 'random_forest', 'noise_level': '25'}], 'control_experiment_filename': '', 'control_experiment_results_filename': '', 'all_control_experiment_results_filename': '', 'done': False}, 'partition_3': {'independent_vars': [{'model_type': 'gradient_boosting', 'noise_level': '25'}], 'control_experiment_filename': '', 'control_experiment_results_filename': '', 'all_control_experiment_results_filename': '', 'done': False}}, 'question': 'Are ensemble methods (e.g., Random Forests, Gradient Boosting) more robust to added noise in the Breast Cancer 
Wisconsin dataset compared to linear models like Logistic Regression for a binary classification task?', 'workspace_dir': '/workspace/default_research_ebede925-2005-498e-bbd3-09686e50b0ab', 'hypothesis': 'Ensemble methods will show a slower rate of performance degradation with increasing noise compared to Logistic Regression, indicating higher robustness to noise', 'constant_vars': ['dataset', 'random_seed', 'performance_metrics', 'noise_generation_method'], 'independent_vars': ['model_type', 'noise_level'], 'dependent_vars': ['performance_degradation_rate'], 'controlled_experiment_setup_description': '1. Load the Breast Cancer Wisconsin dataset. 2. Preprocess the data. 3. For each model type, train on clean data and record baseline performance. 4. Gradually increase noise levels and measure the rate at which performance metrics degrade. 5. Calculate the performance degradation rate as the slope of performance vs. noise level. 6. Compare degradation rates between model types to determine robustness.', 'priority': 2, 'plan_id': 'ebede925-2005-498e-bbd3-09686e50b0ab', 'dataset_dir': None}

Here are the actual results of the experiments: 



Here are the results from 2 separate runs of this workflow:



Result 1:

Starting Breast Cancer Wisconsin dataset experiment (Control Group - Partition 1)

Tue Jun  3 08:04:34 UTC 2025

----------------------------------------

Setting up Python environment...

Python version:

Python 3.12.10

Environment path: /workspace/default_research_ebede925-2005-498e-bbd3-09686e50b0ab/venv

----------------------------------------

Checking GPU availability:

Traceback (most recent call last):

  File "<string>", line 1, in <module>

ModuleNotFoundError: No module named 'torch'

----------------------------------------

Running breast cancer experiment...

2025-06-03 08:04:36,452 - __main__ - INFO - Starting breast cancer experiment...

2025-06-03 08:04:36,452 - __main__ - INFO - Loading Breast Cancer Wisconsin dataset...

2025-06-03 08:04:36,467 - __main__ - INFO - Dataset loaded successfully. Shape: (569, 30), Classes: [0 1]

2025-06-03 08:04:36,467 - __main__ - INFO - Class distribution: [212 357]

2025-06-03 08:04:36,467 - __main__ - INFO - Standardizing features...

2025-06-03 08:04:36,469 - __main__ - INFO - No noise addition (noise_level=0)

2025-06-03 08:04:36,469 - __main__ - INFO - Starting 5-fold cross-validation with noise_level=0.0...

2025-06-03 08:04:36,491 - __main__ - INFO - Fold 1/5 - Accuracy: 0.9737, F1: 0.9787, AUC: 0.9954

2025-06-03 08:04:36,496 - __main__ - INFO - Fold 2/5 - Accuracy: 0.9474, F1: 0.9595, AUC: 0.9977

2025-06-03 08:04:36,501 - __main__ - INFO - Fold 3/5 - Accuracy: 0.9649, F1: 0.9730, AUC: 0.9848

2025-06-03 08:04:36,507 - __main__ - INFO - Fold 4/5 - Accuracy: 0.9912, F1: 0.9930, AUC: 1.0000

2025-06-03 08:04:36,512 - __main__ - INFO - Fold 5/5 - Accuracy: 0.9912, F1: 0.9930, AUC: 0.9987

2025-06-03 08:04:36,512 - __main__ - INFO - Cross-validation completed. Average metrics:

2025-06-03 08:04:36,512 - __main__ - INFO -   Accuracy: 0.9737 ± 0.0166

2025-06-03 08:04:36,512 - __main__ - INFO -   Precision: 0.9683 ± 0.0290

2025-06-03 08:04:36,512 - __main__ - INFO -   Recall: 0.9916 ± 0.0112

2025-06-03 08:04:36,512 - __main__ - INFO -   F1 Score: 0.9794 ± 0.0127

2025-06-03 08:04:36,512 - __main__ - INFO -   ROC AUC: 0.9953 ± 0.0055

2025-06-03 08:04:36,512 - __main__ - INFO -   Training Time: 0.0034 ± 0.0038 seconds

2025-06-03 08:04:36,512 - __main__ - INFO - Saving results to /workspace/default_research_ebede925-2005-498e-bbd3-09686e50b0ab/results_ebede925-2005-498e-bbd3-09686e50b0ab_control_group_partition_1.json...

2025-06-03 08:04:36,512 - __main__ - INFO - Results saved successfully.

2025-06-03 08:04:36,512 - __main__ - INFO - Experiment completed successfully.

----------------------------------------

Experiment completed successfully!

Results summary:

Model type: logistic_regression

Noise level: 0.0

Accuracy: 0.9737 ± 0.0166

Precision: 0.9683 ± 0.0290

Recall: 0.9916 ± 0.0112

F1 Score: 0.9794 ± 0.0127

ROC AUC: 0.9953 ± 0.0055

----------------------------------------

Experiment finished at: Tue Jun  3 08:04:36 UTC 2025





Result 2:

Starting Breast Cancer Wisconsin dataset experiment (Control Group - Partition 1)

Tue Jun  3 08:07:07 UTC 2025

----------------------------------------

Setting up Python environment...

Python version:

Python 3.12.10

Environment path: /workspace/default_research_ebede925-2005-498e-bbd3-09686e50b0ab/venv

----------------------------------------

Checking GPU availability:

Traceback (most recent call last):

  File "<string>", line 1, in <module>

ModuleNotFoundError: No module named 'torch'

----------------------------------------

Running breast cancer experiment...

2025-06-03 08:07:07,647 - __main__ - INFO - Starting breast cancer experiment...

2025-06-03 08:07:07,647 - __main__ - INFO - Loading Breast Cancer Wisconsin dataset...

2025-06-03 08:07:07,651 - __main__ - INFO - Dataset loaded successfully. Shape: (569, 30), Classes: [0 1]

2025-06-03 08:07:07,651 - __main__ - INFO - Class distribution: [212 357]

2025-06-03 08:07:07,651 - __main__ - INFO - Standardizing features...

2025-06-03 08:07:07,652 - __main__ - INFO - No noise addition (noise_level=0)

2025-06-03 08:07:07,652 - __main__ - INFO - Starting 5-fold cross-validation with noise_level=0.0...

2025-06-03 08:07:07,658 - __main__ - INFO - Fold 1/5 - Accuracy: 0.9737, F1: 0.9787, AUC: 0.9954

2025-06-03 08:07:07,664 - __main__ - INFO - Fold 2/5 - Accuracy: 0.9474, F1: 0.9595, AUC: 0.9977

2025-06-03 08:07:07,669 - __main__ - INFO - Fold 3/5 - Accuracy: 0.9649, F1: 0.9730, AUC: 0.9848

2025-06-03 08:07:07,674 - __main__ - INFO - Fold 4/5 - Accuracy: 0.9912, F1: 0.9930, AUC: 1.0000

2025-06-03 08:07:07,679 - __main__ - INFO - Fold 5/5 - Accuracy: 0.9912, F1: 0.9930, AUC: 0.9987

2025-06-03 08:07:07,679 - __main__ - INFO - Cross-validation completed. Average metrics:

2025-06-03 08:07:07,679 - __main__ - INFO -   Accuracy: 0.9737 ± 0.0166

2025-06-03 08:07:07,679 - __main__ - INFO -   Precision: 0.9683 ± 0.0290

2025-06-03 08:07:07,679 - __main__ - INFO -   Recall: 0.9916 ± 0.0112

2025-06-03 08:07:07,680 - __main__ - INFO -   F1 Score: 0.9794 ± 0.0127

2025-06-03 08:07:07,680 - __main__ - INFO -   ROC AUC: 0.9953 ± 0.0055

2025-06-03 08:07:07,680 - __main__ - INFO -   Training Time: 0.0017 ± 0.0002 seconds

2025-06-03 08:07:07,680 - __main__ - INFO - Saving results to /workspace/default_research_ebede925-2005-498e-bbd3-09686e50b0ab/results_ebede925-2005-498e-bbd3-09686e50b0ab_control_group_partition_1.json...

2025-06-03 08:07:07,680 - __main__ - INFO - Results saved successfully.

2025-06-03 08:07:07,680 - __main__ - INFO - Experiment completed successfully.

----------------------------------------

Experiment completed successfully!

Results summary:

Model type: logistic_regression

Noise level: 0.0

Accuracy: 0.9737 ± 0.0166

Precision: 0.9683 ± 0.0290

Recall: 0.9916 ± 0.0112

F1 Score: 0.9794 ± 0.0127

ROC AUC: 0.9953 ± 0.0055

----------------------------------------

Experiment finished at: Tue Jun  3 08:07:07 UTC 2025




Here is the experimental plan
{'control_group': {'partition_1': {'independent_vars': [{'model_type': 'logistic_regression', 'noise_type': 'none'}], 'control_experiment_filename': '/workspace/default_research_8d2fc1e0-bd98-46bf-ba21-0b01088eba87/control_experiment_8d2fc1e0-bd98-46bf-ba21-0b01088eba87_control_group_partition_1.sh', 'control_experiment_results_filename': '/workspace/default_research_8d2fc1e0-bd98-46bf-ba21-0b01088eba87/results_8d2fc1e0-bd98-46bf-ba21-0b01088eba87_control_group_partition_1.json', 'all_control_experiment_results_filename': '/workspace/default_research_8d2fc1e0-bd98-46bf-ba21-0b01088eba87/all_results_8d2fc1e0-bd98-46bf-ba21-0b01088eba87_control_group_partition_1.txt', 'done': True}}, 'experimental_group': {'partition_1': {'independent_vars': [{'model_type': 'random_forest', 'noise_type': 'none'}, {'model_type': 'gradient_boosting', 'noise_type': 'none'}, {'model_type': 'logistic_regression', 'noise_type': 'gaussian'}, {'model_type': 'random_forest', 'noise_type': 'gaussian'}, {'model_type': 'gradient_boosting', 'noise_type': 'gaussian'}], 'control_experiment_filename': '', 'control_experiment_results_filename': '', 'all_control_experiment_results_filename': '', 'done': False}, 'partition_2': {'independent_vars': [{'model_type': 'logistic_regression', 'noise_type': 'label_flip'}, {'model_type': 'random_forest', 'noise_type': 'label_flip'}, {'model_type': 'gradient_boosting', 'noise_type': 'label_flip'}, {'model_type': 'logistic_regression', 'noise_type': 'feature_deletion'}, {'model_type': 'random_forest', 'noise_type': 'feature_deletion'}], 'control_experiment_filename': '', 'control_experiment_results_filename': '', 'all_control_experiment_results_filename': '', 'done': False}, 'partition_3': {'independent_vars': [{'model_type': 'gradient_boosting', 'noise_type': 'feature_deletion'}], 'control_experiment_filename': '', 'control_experiment_results_filename': '', 'all_control_experiment_results_filename': '', 'done': False}}, 'question': 'Are ensemble methods (e.g., 
Random Forests, Gradient Boosting) more robust to added noise in the Breast Cancer Wisconsin dataset compared to linear models like Logistic Regression for a binary classification task?', 'workspace_dir': '/workspace/default_research_8d2fc1e0-bd98-46bf-ba21-0b01088eba87', 'hypothesis': 'Ensemble methods will demonstrate greater robustness across different types of noise compared to Logistic Regression, with particular strength in handling feature-level corruption', 'constant_vars': ['dataset', 'noise_intensity_level', 'cross_validation_strategy', 'random_seed'], 'independent_vars': ['model_type', 'noise_type'], 'dependent_vars': ['accuracy', 'precision', 'recall', 'f1_score'], 'controlled_experiment_setup_description': '1. Load the Breast Cancer Wisconsin dataset. 2. Apply different types of noise: (a) Gaussian noise to features, (b) Label flipping for a percentage of samples, (c) Random feature deletion. 3. Train each model type on each noise scenario with consistent noise intensity. 4. Compare performance differences to determine which models are most resistant to different noise types.', 'priority': 3, 'plan_id': '8d2fc1e0-bd98-46bf-ba21-0b01088eba87', 'dataset_dir': None}

Here are the actual results of the experiments: 




Here are the results from 2 separate runs of this workflow:



Result 1:

# Experiment Results: Logistic Regression on Breast Cancer Wisconsin Dataset

# Model: Logistic Regression

# Noise Type: none



## Performance Metrics (from JSON file):



- Accuracy: 0.9737 ± 0.0166

- Precision: 0.9683 ± 0.0290

- Recall: 0.9916 ± 0.0112

- F1 Score: 0.9794 ± 0.0127



Dataset: Breast Cancer Wisconsin

Samples: 569

Features: 30

Classes: 2

Class distribution: [212, 357]





Result 2:

# Experiment Results: Logistic Regression on Breast Cancer Wisconsin Dataset

# Model: Logistic Regression

# Noise Type: none



## Performance Metrics (from JSON file):



- Accuracy: 0.9737 ± 0.0166

- Precision: 0.9683 ± 0.0290

- Recall: 0.9916 ± 0.0112

- F1 Score: 0.9794 ± 0.0127



Dataset: Breast Cancer Wisconsin

Samples: 569

Features: 30

Classes: 2

Class distribution: [212, 357]



Here is the experimental plan
{'control_group': {'partition_1': {'independent_vars': [{'model_type': 'logistic_regression', 'noise_level': '0'}], 'control_experiment_filename': '/workspace/default_research_2a57e818-be42-417e-b2da-19c0a25405c6/control_experiment_2a57e818-be42-417e-b2da-19c0a25405c6_control_group_partition_1.sh', 'control_experiment_results_filename': '/workspace/default_research_2a57e818-be42-417e-b2da-19c0a25405c6/results_2a57e818-be42-417e-b2da-19c0a25405c6_control_group_partition_1.txt', 'all_control_experiment_results_filename': '/workspace/default_research_2a57e818-be42-417e-b2da-19c0a25405c6/all_results_2a57e818-be42-417e-b2da-19c0a25405c6_control_group_partition_1.txt', 'done': True}}, 'experimental_group': {'partition_1': {'independent_vars': [{'model_type': 'random_forest', 'noise_level': '0'}, {'model_type': 'gradient_boosting', 'noise_level': '0'}, {'model_type': 'logistic_regression', 'noise_level': '10'}, {'model_type': 'random_forest', 'noise_level': '10'}, {'model_type': 'gradient_boosting', 'noise_level': '10'}], 'control_experiment_filename': '/workspace/default_research_2a57e818-be42-417e-b2da-19c0a25405c6/control_experiment_2a57e818-be42-417e-b2da-19c0a25405c6_experimental_group_partition_1.sh', 'control_experiment_results_filename': '/workspace/default_research_2a57e818-be42-417e-b2da-19c0a25405c6/results_2a57e818-be42-417e-b2da-19c0a25405c6_experimental_group_partition_1.txt', 'all_control_experiment_results_filename': '', 'done': True}, 'partition_2': {'independent_vars': [{'model_type': 'logistic_regression', 'noise_level': '20'}, {'model_type': 'random_forest', 'noise_level': '20'}, {'model_type': 'gradient_boosting', 'noise_level': '20'}, {'model_type': 'logistic_regression', 'noise_level': '30'}, {'model_type': 'random_forest', 'noise_level': '30'}], 'control_experiment_filename': '', 'control_experiment_results_filename': '', 'all_control_experiment_results_filename': '', 'done': False}, 'partition_3': {'independent_vars': [{'model_type': 
'gradient_boosting', 'noise_level': '30'}], 'control_experiment_filename': '', 'control_experiment_results_filename': '', 'all_control_experiment_results_filename': '', 'done': False}}, 'question': 'Are ensemble methods (e.g., Random Forests, Gradient Boosting) more robust to added noise in the Breast Cancer Wisconsin dataset compared to linear models like Logistic Regression for a binary classification task?', 'workspace_dir': '/workspace/default_research_2a57e818-be42-417e-b2da-19c0a25405c6', 'hypothesis': 'Ensemble methods (Random Forests, Gradient Boosting) exhibit greater robustness to added noise in the Breast Cancer Wisconsin dataset compared to linear models (Logistic Regression) for binary classification', 'constant_vars': ['dataset', 'random_seed', 'evaluation_metrics', 'cross_validation_folds', 'hyperparameter_tuning_strategy'], 'independent_vars': ['model_type', 'noise_level'], 'dependent_vars': ['accuracy', 'precision', 'recall', 'f1_score', 'auc_roc'], 'controlled_experiment_setup_description': '1. Load the Breast Cancer Wisconsin dataset. 2. Preprocess the data (standardization, handle missing values if any). 3. Implement a noise injection function that adds Gaussian noise at specified levels (noise_level represents percentage of feature standard deviation). 4. For each model type and noise level combination, train the model using 5-fold cross-validation, record performance metrics, and evaluate statistical significance of differences.', 'priority': 1, 'plan_id': '2a57e818-be42-417e-b2da-19c0a25405c6', 'dataset_dir': None}

Here are the actual results of the experiments: 



Here are the results from 2 separate runs of this workflow:



Result 1:

====================================================================================================

EXPERIMENTAL RESULTS SUMMARY - BREAST CANCER WISCONSIN DATASET

====================================================================================================



EXPERIMENT OBJECTIVE:

To investigate whether ensemble methods (Random Forests, Gradient Boosting) are more robust

to added noise compared to linear models like Logistic Regression for binary classification tasks.



RESULTS TABLE:

----------------------------------------------------------------------------------------------------

Model Type           Noise Level  Accuracy        Precision       Recall          F1-Score        AUC-ROC        

----------------------------------------------------------------------------------------------------

gradient_boosting    0.0%         0.9491±0.0237 0.9471±0.0354 0.9748±0.0184 0.9602±0.0179 0.9927±0.0041

gradient_boosting    10.0%        0.9631±0.0103 0.9670±0.0133 0.9748±0.0206 0.9707±0.0083 0.9936±0.0043

logistic_regression  10.0%        0.9754±0.0129 0.9704±0.0211 0.9916±0.0112 0.9807±0.0101 0.9953±0.0050

random_forest        0.0%         0.9561±0.0123 0.9651±0.0282 0.9665±0.0272 0.9651±0.0097 0.9886±0.0082

random_forest        10.0%        0.9526±0.0180 0.9598±0.0287 0.9665±0.0323 0.9624±0.0143 0.9897±0.0067

----------------------------------------------------------------------------------------------------



COMPARATIVE ANALYSIS:

----------------------------------------------------------------------------------------------------



ACCURACY COMPARISON:



  At noise level 0.0%:

    1. random_forest: 0.9561±0.0123

    2. gradient_boosting: 0.9491±0.0237



  At noise level 10.0%:

    1. logistic_regression: 0.9754±0.0129

    2. gradient_boosting: 0.9631±0.0103

    3. random_forest: 0.9526±0.0180



F1 SCORE COMPARISON:



  At noise level 0.0%:

    1. random_forest: 0.9651±0.0097

    2. gradient_boosting: 0.9602±0.0179



  At noise level 10.0%:

    1. logistic_regression: 0.9807±0.0101

    2. gradient_boosting: 0.9707±0.0083

    3. random_forest: 0.9624±0.0143



AUC ROC COMPARISON:



  At noise level 0.0%:

    1. gradient_boosting: 0.9927±0.0041

    2. random_forest: 0.9886±0.0082



  At noise level 10.0%:

    1. logistic_regression: 0.9953±0.0050

    2. gradient_boosting: 0.9936±0.0043

    3. random_forest: 0.9897±0.0067



----------------------------------------------------------------------------------------------------



PERFORMANCE DEGRADATION ANALYSIS:

----------------------------------------------------------------------------------------------------



  RANDOM_FOREST:

    Accuracy degradation: 0.37%

    Precision degradation: 0.55%

    Recall degradation: -0.01%

    F1 Score degradation: 0.28%

    Auc Roc degradation: -0.11%



  GRADIENT_BOOSTING:

    Accuracy degradation: -1.48%

    Precision degradation: -2.10%

    Recall degradation: 0.00%

    F1 Score degradation: -1.08%

    Auc Roc degradation: -0.09%



====================================================================================================



CONCLUSION:

----------------------------------------------------------------------------------------------------

Based on accuracy degradation when noise is added:

  - gradient_boosting: -1.48% degradation in accuracy

  - random_forest: 0.37% degradation in accuracy



The gradient_boosting model shows the least degradation in performance when noise is added,

while the random_forest model shows the most degradation.



====================================================================================================
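The percentages in the degradation analysis above appear to be the relative change from each model's 0% score, (clean - noisy) / clean × 100. A negative value therefore means the metric improved under noise. This matters for reading the conclusion: Gradient Boosting's "-1.48% degradation" is a 1.48% gain in accuracy. Logistic Regression is absent from the comparison, presumably because its 0% baseline came from the control group rather than this partition. Below is a sketch that reproduces the reported figures from the table's rounded means; the helper name is hypothetical. Small discrepancies, such as Random Forest recall showing -0.01% despite identical rounded means, presumably come from unrounded values.

```python
def relative_degradation_pct(baseline, noisy):
    """Percentage drop from the clean (0% noise) score; negative values mean the score improved."""
    return (baseline - noisy) / baseline * 100.0

# Mean scores from the partition 1 results table: (0% noise, 10% noise).
pairs = {
    ("random_forest", "accuracy"): (0.9561, 0.9526),
    ("random_forest", "precision"): (0.9651, 0.9598),
    ("gradient_boosting", "accuracy"): (0.9491, 0.9631),
    ("gradient_boosting", "precision"): (0.9471, 0.9670),
}

for (model, metric), (clean, noisy) in pairs.items():
    print(f"{model} {metric}: {relative_degradation_pct(clean, noisy):.2f}%")
# random_forest accuracy: 0.37%
# random_forest precision: 0.55%
# gradient_boosting accuracy: -1.48%
# gradient_boosting precision: -2.10%
```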

# Experiment Results: Logistic Regression on Breast Cancer Wisconsin Dataset

# Date: 2025-06-03 07:20:44

# Model: Logistic Regression

# Noise Level: 0% of feature standard deviation

# Cross-validation: 5-fold



## Performance Metrics (Mean ± Std):



- Accuracy: 0.9737 ± 0.0166

- Precision: 0.9683 ± 0.0290

- Recall: 0.9916 ± 0.0112

- F1 Score: 0.9794 ± 0.0127

- AUC-ROC: 0.9953 ± 0.0055



## Dataset Information:



- Dataset: Breast Cancer Wisconsin

- Number of samples: 569

- Number of features: 30

- Number of classes: 2

- Class distribution: [212 357]

========================================================================================================================

EXPERIMENTAL RESULTS SUMMARY - BREAST CANCER WISCONSIN DATASET (PARTITION 2)

========================================================================================================================



EXPERIMENT OBJECTIVE:

To investigate whether ensemble methods (Random Forests, Gradient Boosting) are more robust

to added noise than linear models such as Logistic Regression in binary classification tasks.



RESULTS TABLE:

------------------------------------------------------------------------------------------------------------------------

Model Type           Noise Level  SNR (dB)   Accuracy        Precision       Recall          F1-Score        AUC-ROC
------------------------------------------------------------------------------------------------------------------------
gradient_boosting    20.0%        13.99      0.9613±0.0089   0.9646±0.0194   0.9748±0.0106   0.9694±0.0067   0.9863±0.0101
logistic_regression  20.0%        13.99      0.9719±0.0129   0.9676±0.0213   0.9888±0.0105   0.9779±0.0100   0.9949±0.0059
logistic_regression  30.0%        10.47      0.9701±0.0142   0.9649±0.0215   0.9888±0.0105   0.9765±0.0112   0.9943±0.0052
random_forest        20.0%        13.99      0.9420±0.0204   0.9442±0.0338   0.9664±0.0188   0.9546±0.0153   0.9869±0.0073
random_forest        30.0%        10.47      0.9385±0.0199   0.9389±0.0317   0.9664±0.0211   0.9519±0.0150   0.9858±0.0073

------------------------------------------------------------------------------------------------------------------------
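The SNR column is consistent with noise whose standard deviation equals the stated fraction of each feature's standard deviation. If SNR is defined as the ratio of feature variance to noise variance in decibels (an assumption, since the log does not give the definition), the expected value is -20*log10(noise level). That gives about 13.98 dB at 20% and 10.46 dB at 30%. The table's 13.99 and 10.47 differ only in the second decimal, which would fit an SNR measured on the finite noisy sample rather than computed analytically. The log does not say which method was used. The helper name `expected_snr_db` is hypothetical.

```python
import math


def expected_snr_db(noise_level):
    """Theoretical SNR in dB when noise std = noise_level * feature std.

    SNR = 10*log10(signal variance / noise variance) = -20*log10(noise_level).
    An SNR measured on a finite noisy sample will differ slightly.
    """
    return -20.0 * math.log10(noise_level)


for level in (0.20, 0.30):
    print(f"{level:.0%}: {expected_snr_db(level):.2f} dB")
```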



COMPARATIVE ANALYSIS:

------------------------------------------------------------------------------------------------------------------------



ACCURACY COMPARISON:



  At noise level 20.0%:

    1. logistic_regression: 0.9719±0.0129

    2. gradient_boosting: 0.9613±0.0089

    3. random_forest: 0.9420±0.0204



  At noise level 30.0%:

    1. logistic_regression: 0.9701±0.0142

    2. random_forest: 0.9385±0.0199



F1 SCORE COMPARISON:



  At noise level 20.0%:

    1. logistic_regression: 0.9779±0.0100

    2. gradient_boosting: 0.9694±0.0067

    3. random_forest: 0.9546±0.0153



  At noise level 30.0%:

    1. logistic_regression: 0.9765±0.0112

    2. random_forest: 0.9519±0.0150



AUC ROC COMPARISON:



  At noise level 20.0%:

    1. logistic_regression: 0.9949±0.0059

    2. random_forest: 0.9869±0.0073

    3. gradient_boosting: 0.9863±0.0101



  At noise level 30.0%:

    1. logistic_regression: 0.9943±0.0052

    2. random_forest: 0.9858±0.0073



------------------------------------------------------------------------------------------------------------------------



PERFORMANCE DEGRADATION ANALYSIS (relative change from 20% to 30% noise; positive = worse, negative = better):

------------------------------------------------------------------------------------------------------------------------



  LOGISTIC_REGRESSION:

    Accuracy degradation: 0.18%

    Precision degradation: 0.28%

    Recall degradation: 0.00%

    F1 Score degradation: 0.14%

    AUC-ROC degradation: 0.06%



  RANDOM_FOREST:

    Accuracy degradation: 0.37%

    Precision degradation: 0.56%

    Recall degradation: 0.01%

    F1 Score degradation: 0.29%

    AUC-ROC degradation: 0.11%



ROBUSTNESS RANKING (based on average performance degradation):

  1. logistic_regression: 0.13% average degradation

  2. random_forest: 0.26% average degradation
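The degradation figures are consistent with the relative drop (before - after) / before × 100, taken from the 20% results to the 30% results. The robustness ranking uses the mean over the five metrics. The sketch below recomputes these figures from the four-decimal means in the results table. It reproduces the logged values to within about 0.01 percentage points: for example, 0.19% rather than 0.18% for logistic_regression accuracy, and a 0.27% rather than 0.26% average for random_forest. The small differences presumably arise because the original pipeline worked from unrounded fold means. The names `table` and `degradation_pct` are illustrative.

```python
# Means at (20% noise, 30% noise), copied from the results table.
# Metric order: accuracy, precision, recall, F1, AUC-ROC.
table = {
    "logistic_regression": [(0.9719, 0.9701), (0.9676, 0.9649), (0.9888, 0.9888),
                            (0.9779, 0.9765), (0.9949, 0.9943)],
    "random_forest": [(0.9420, 0.9385), (0.9442, 0.9389), (0.9664, 0.9664),
                      (0.9546, 0.9519), (0.9869, 0.9858)],
}


def degradation_pct(before, after):
    """Relative drop in percent; a negative value means the score improved."""
    return (before - after) / before * 100.0


averages = {}
for model, pairs in table.items():
    per_metric = [degradation_pct(before, after) for before, after in pairs]
    averages[model] = sum(per_metric) / len(per_metric)
    print(model, [f"{d:.2f}%" for d in per_metric], f"average {averages[model]:.2f}%")

# Lower average degradation is read as higher robustness
robustness_ranking = sorted(averages, key=averages.get)
print(robustness_ranking)
```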



========================================================================================================================



CONCLUSION:

------------------------------------------------------------------------------------------------------------------------

Based on the experiments with high noise levels (20% and 30%), logistic_regression degraded less than
random_forest, the only other model run at both levels: 0.13% versus 0.26% average degradation from
20% to 30% noise. gradient_boosting has no 30% result in this partition, so it is not ranked.

Every one of these changes is smaller than the corresponding fold-to-fold standard deviation, so the
difference in robustness may not be meaningful. The results show a mixed pattern of robustness across
model types, suggesting that the relationship between model complexity and noise robustness may depend
on specific characteristics of the dataset and noise levels.

====================================================================================================

EXPERIMENTAL RESULTS SUMMARY - BREAST CANCER WISCONSIN DATASET

====================================================================================================



EXPERIMENT OBJECTIVE:

To investigate whether ensemble methods (Random Forests, Gradient Boosting) are more robust

to added noise than linear models such as Logistic Regression in binary classification tasks.



RESULTS TABLE:

----------------------------------------------------------------------------------------------------

Model Type           Noise Level  Accuracy        Precision       Recall          F1-Score        AUC-ROC
----------------------------------------------------------------------------------------------------
gradient_boosting    0.0%         0.9491±0.0237   0.9471±0.0354   0.9748±0.0184   0.9602±0.0179   0.9927±0.0041
gradient_boosting    10.0%        0.9631±0.0103   0.9670±0.0133   0.9748±0.0206   0.9707±0.0083   0.9936±0.0043
logistic_regression  10.0%        0.9754±0.0129   0.9704±0.0211   0.9916±0.0112   0.9807±0.0101   0.9953±0.0050
random_forest        0.0%         0.9561±0.0123   0.9651±0.0282   0.9665±0.0272   0.9651±0.0097   0.9886±0.0082
random_forest        10.0%        0.9526±0.0180   0.9598±0.0287   0.9665±0.0323   0.9624±0.0143   0.9897±0.0067

----------------------------------------------------------------------------------------------------



COMPARATIVE ANALYSIS:

----------------------------------------------------------------------------------------------------



ACCURACY COMPARISON:



  At noise level 0.0%:

    1. random_forest: 0.9561±0.0123

    2. gradient_boosting: 0.9491±0.0237



  At noise level 10.0%:

    1. logistic_regression: 0.9754±0.0129

    2. gradient_boosting: 0.9631±0.0103

    3. random_forest: 0.9526±0.0180



F1 SCORE COMPARISON:



  At noise level 0.0%:

    1. random_forest: 0.9651±0.0097

    2. gradient_boosting: 0.9602±0.0179



  At noise level 10.0%:

    1. logistic_regression: 0.9807±0.0101

    2. gradient_boosting: 0.9707±0.0083

    3. random_forest: 0.9624±0.0143



AUC ROC COMPARISON:



  At noise level 0.0%:

    1. gradient_boosting: 0.9927±0.0041

    2. random_forest: 0.9886±0.0082



  At noise level 10.0%:

    1. logistic_regression: 0.9953±0.0050

    2. gradient_boosting: 0.9936±0.0043

    3. random_forest: 0.9897±0.0067



----------------------------------------------------------------------------------------------------



PERFORMANCE DEGRADATION ANALYSIS (relative change from 0% to 10% noise; positive = worse, negative = better):

----------------------------------------------------------------------------------------------------



  RANDOM_FOREST:

    Accuracy degradation: 0.37%

    Precision degradation: 0.55%

    Recall degradation: -0.01%

    F1 Score degradation: 0.28%

    AUC-ROC degradation: -0.11%



  GRADIENT_BOOSTING:

    Accuracy degradation: -1.48%

    Precision degradation: -2.10%

    Recall degradation: 0.00%

    F1 Score degradation: -1.08%

    AUC-ROC degradation: -0.09%



====================================================================================================



CONCLUSION:

----------------------------------------------------------------------------------------------------

Based on accuracy degradation when noise is added:

  - gradient_boosting: -1.48% degradation in accuracy

  - random_forest: 0.37% degradation in accuracy



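The gradient_boosting model shows the least degradation; its negative value means accuracy actually

improved with noise (0.9491 to 0.9631). The random_forest model shows the most degradation. The

logistic_regression model is not included because this table has no noise-free (0%) row for it.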
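A negative degradation is an improvement, so the gradient_boosting result means its accuracy rose after 10% noise was added. The sketch below compares that change with the fold-to-fold variability using a rough heuristic, not a formal test: the change in mean accuracy is checked against the larger of the two fold standard deviations. The change (+0.0140) is smaller than 0.0237, so it may not reflect a real effect of the noise. Variable names are illustrative.

```python
# gradient_boosting accuracy (mean, std) from the 0% and 10% rows of the results table
clean_acc = (0.9491, 0.0237)
noisy_acc = (0.9631, 0.0103)

change = noisy_acc[0] - clean_acc[0]  # positive: accuracy went up with noise
degradation = (clean_acc[0] - noisy_acc[0]) / clean_acc[0] * 100.0  # negative = improvement
larger_std = max(clean_acc[1], noisy_acc[1])

print(f"degradation {degradation:.2f}%, change {change:+.4f}, larger fold std {larger_std:.4f}")
print("within fold-to-fold spread" if abs(change) < larger_std else "exceeds fold-to-fold spread")
```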



====================================================================================================


Here are the results from two separate runs of this workflow (the metrics are identical; only the timestamps differ):



Result 1:

# Experiment Results: Logistic Regression on Breast Cancer Wisconsin Dataset

# Date: 2025-06-03 07:20:07

# Model: Logistic Regression

# Noise Level: 0% of feature standard deviation

# Cross-validation: 5-fold



## Performance Metrics (Mean ± Std):



- Accuracy: 0.9737 ± 0.0166

- Precision: 0.9683 ± 0.0290

- Recall: 0.9916 ± 0.0112

- F1 Score: 0.9794 ± 0.0127

- AUC-ROC: 0.9953 ± 0.0055



## Dataset Information:



- Dataset: Breast Cancer Wisconsin

- Number of samples: 569

- Number of features: 30

- Number of classes: 2

- Class distribution: [212 357]





Result 2:

# Experiment Results: Logistic Regression on Breast Cancer Wisconsin Dataset

# Date: 2025-06-03 07:20:44

# Model: Logistic Regression

# Noise Level: 0% of feature standard deviation

# Cross-validation: 5-fold



## Performance Metrics (Mean ± Std):



- Accuracy: 0.9737 ± 0.0166

- Precision: 0.9683 ± 0.0290

- Recall: 0.9916 ± 0.0112

- F1 Score: 0.9794 ± 0.0127

- AUC-ROC: 0.9953 ± 0.0055



## Dataset Information:



- Dataset: Breast Cancer Wisconsin

- Number of samples: 569

- Number of features: 30

- Number of classes: 2

- Class distribution: [212 357]


