
Model Validation Report

Performance Analysis Report

Robustness Score
{% if (robustness_score|default(0)) > 0.9 %} stroke="#28a745" {% elif (robustness_score|default(0)) > 0.7 %} stroke="#ffc107" {% elif (robustness_score|default(0)) > 0.5 %} stroke="#fd7e14" {% else %} stroke="#dc3545" {% endif %} stroke-width="10" stroke-dasharray="{{ (robustness_score|default(0) * 314) }} 314" transform="rotate(-90 60 60)" > {{ (robustness_score|default(0) * 100) | round(1) }}%
{{ (base_score|default(0) * 100) | round(1) }}% Base Score
{{ (raw_impact|default(0) * 100) | round(2) }}% Impact
{% if (robustness_score|default(0)) > 0.9 %} Excellent resistance to perturbations {% elif (robustness_score|default(0)) > 0.7 %} Good resistance to perturbations {% elif (robustness_score|default(0)) > 0.5 %} Moderate resistance to perturbations {% else %} Needs improvement in robustness {% endif %}
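{#
The gauge above maps the robustness score to a stroke colour, a dash length, and a text label. As a rough illustration only, the same thresholds can be mirrored in Python, for example to unit-test them outside the template. The function name `gauge_style` and the returned dictionary keys are hypothetical and not part of this template; the constant 314 is copied from the template and is presumably the circumference of a radius-50 circle (2π·50 ≈ 314), which is an assumption.

```python
def gauge_style(robustness_score=None):
    """Mirror the template's gauge thresholds (hypothetical helper)."""
    # A missing score is treated as 0, similar to `|default(0)` in the template.
    score = robustness_score if robustness_score is not None else 0.0

    if score > 0.9:
        stroke, label = "#28a745", "Excellent resistance to perturbations"
    elif score > 0.7:
        stroke, label = "#ffc107", "Good resistance to perturbations"
    elif score > 0.5:
        stroke, label = "#fd7e14", "Moderate resistance to perturbations"
    else:
        stroke, label = "#dc3545", "Needs improvement in robustness"

    return {
        "stroke": stroke,
        "label": label,
        # Visible arc length followed by the full length used by the template.
        "dasharray": f"{score * 314} 314",
        "percent": round(score * 100, 1),
    }
```

Because every comparison is strict, a score of exactly 0.9 falls into the "Good" band rather than "Excellent".
#}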
Model Information
Type: {{ model_type|default('Unknown') }}
Features: {{ features|default([])|length }}
Primary Metric: {{ metric|default('Score')|upper }}
Critical Features: {{ feature_subset|default([])|length }}
{% if report_data and report_data.alternative_models %}Alternative Models: {{ report_data.alternative_models|length }}{% endif %}
Test Summary
Perturbation Levels: {% if report_data and report_data.raw and report_data.raw.by_level %}{{ report_data.raw.by_level|length }}{% else %}0{% endif %}
Iterations Per Level: {{ iterations|default(10) }}
Max Impact Level:
{% set max_level = {'level': '0', 'impact': 0} %}
{% if report_data and report_data.raw and report_data.raw.by_level %}
  {% for level, level_data in report_data.raw.by_level.items() %}
    {% if level_data.overall_result and level_data.overall_result.all_features and level_data.overall_result.all_features.impact and level_data.overall_result.all_features.impact > max_level.impact %}
      {% set _ = max_level.update({'level': level, 'impact': level_data.overall_result.all_features.impact}) %}
    {% endif %}
  {% endfor %}
{% endif %}
{{ (max_level.impact * 100)|round(2) }}% at {{ max_level.level }}
{% if feature_subset %}
Feature Subset Impact:
{% if report_data and report_data.feature_subset_max_impact and report_data.feature_subset_max_impact.value > 0 %}
  {{ (report_data.feature_subset_max_impact.value * 100)|round(2) }}% at {{ report_data.feature_subset_max_impact.level }}
{% else %}
  {% set max_subset_impact = {'level': '0', 'impact': 0} %}
  {% if report_data and report_data.raw and report_data.raw.by_level %}
    {% for level, level_data in report_data.raw.by_level.items() %}
      {% if level_data.overall_result and level_data.overall_result.feature_subset and level_data.overall_result.feature_subset.impact and level_data.overall_result.feature_subset.impact > max_subset_impact.impact %}
        {% set _ = max_subset_impact.update({'level': level, 'impact': level_data.overall_result.feature_subset.impact}) %}
      {% endif %}
    {% endfor %}
  {% endif %}
  {% if max_subset_impact.impact > 0 %}
    {{ (max_subset_impact.impact * 100)|round(2) }}% at {{ max_subset_impact.level }}
  {% else %}
    0.00%
  {% endif %}
{% endif %}
{% endif %}
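{#
The "Max Impact Level" and "Feature Subset Impact" rows scan `report_data.raw.by_level` for the level with the largest impact. Computing this in the view code and passing the result in would keep the template simpler. The sketch below shows one way that might look; the function names `max_impact_level` and `format_impact` are hypothetical, and the nested dictionary layout is inferred from the attribute paths used in this template.

```python
def max_impact_level(report_data, group="all_features"):
    """Return the level with the largest impact for `group` (hypothetical helper)."""
    best = {"level": "0", "impact": 0}
    by_level = ((report_data or {}).get("raw") or {}).get("by_level") or {}
    for level, level_data in by_level.items():
        overall = (level_data or {}).get("overall_result") or {}
        impact = (overall.get(group) or {}).get("impact")
        # Missing, None, and zero impacts are skipped, as in the template.
        if impact and impact > best["impact"]:
            best = {"level": level, "impact": impact}
    return best


def format_impact(best):
    """Format the result the way the template does: percentage, then level."""
    return f"{round(best['impact'] * 100, 2)}% at {best['level']}"
```

Passing `group="feature_subset"` gives the fallback scan used for the feature subset row when `feature_subset_max_impact` is absent or zero.
#}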

Test Information

Test Type

{{ test_type|capitalize }}
Static report

Model Type

{{ model_type }}
Algorithm

Features

{{ features|length }}
Total features

Iterations

{{ iterations|default(10) }}
Per perturbation

Test Configuration

Generation Time {{ timestamp }}
Feature Subset {{ feature_subset_display }}
Metric {{ metric }}
Report Type Static (non-interactive)

Performance Metrics

Model Metrics Comparison

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Base {{ metric|capitalize }}</th>
      <th>Robustness Score</th>
      <th>Avg. Impact</th>
      {% if report_data.metrics_details %}
        {% for metric_name in report_data.metrics_details|sort %}
      <th>{{ metric_name|title }}</th>
        {% endfor %}
      {% else %}
        {% set primary_metrics = report_data.metrics %}
        {% if primary_metrics %}
          {% for metric_name, metric_value in primary_metrics.items() %}
            {% if metric_name != "base_score" and metric_name != "robustness_score" %}
      <th>{{ metric_name|title }}</th>
            {% endif %}
          {% endfor %}
        {% endif %}
      {% endif %}
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>{{ model_name }}</td>
      <td>{{ "%.4f"|format(base_score) }}</td>
      <td>{{ "%.4f"|format(robustness_score) }}</td>
      <td>{{ "%.4f"|format(raw_impact) }}</td>
      {% if report_data.metrics_details %}
        {% for metric_name in report_data.metrics_details|sort %}
      <td>{{ "%.4f"|format(report_data.metrics_details[metric_name]|default(0)) }}</td>
        {% endfor %}
      {% else %}
        {% set primary_metrics = report_data.metrics %}
        {% if primary_metrics %}
          {% for metric_name, metric_value in primary_metrics.items() %}
            {% if metric_name != "base_score" and metric_name != "robustness_score" %}
      <td>{{ "%.4f"|format(metric_value|default(0)) }}</td>
            {% endif %}
          {% endfor %}
        {% endif %}
      {% endif %}
    </tr>
    {% if report_data and report_data.alternative_models %}
      {% for model_name, model_data in report_data.alternative_models|dictsort %}
    <tr>
      <td>{{ model_name }}</td>
      <td>{{ "%.4f"|format(model_data.base_score|default(0)) }}</td>
      <td>{{ "%.4f"|format(model_data.get('robustness_score', 1.0 - model_data.get('raw_impact', 0))) }}</td>
      <td>{{ "%.4f"|format(model_data.get('raw_impact', 0)) }}</td>
        {% if report_data.metrics_details %}
          {% for metric_name in report_data.metrics_details|sort %}
      <td>{% if model_data.metrics_details and metric_name in model_data.metrics_details %}{{ "%.4f"|format(model_data.metrics_details[metric_name]|default(0)) }}{% elif model_data.metrics and metric_name in model_data.metrics %}{{ "%.4f"|format(model_data.metrics[metric_name]|default(0)) }}{% else %}-{% endif %}</td>
          {% endfor %}
        {% else %}
          {% set primary_metrics = report_data.metrics %}
          {% if primary_metrics %}
            {% for metric_name, metric_value in primary_metrics.items() %}
              {% if metric_name != "base_score" and metric_name != "robustness_score" %}
      <td>{% if model_data.metrics and metric_name in model_data.metrics %}{{ "%.4f"|format(model_data.metrics[metric_name]|default(0)) }}{% else %}-{% endif %}</td>
              {% endif %}
            {% endfor %}
          {% endif %}
        {% endif %}
    </tr>
      {% endfor %}
    {% endif %}
  </tbody>
</table>

Overview

Robustness Score

{{ "%.4f"|format(robustness_score) }}
Higher is better

Base {{ metric|capitalize }}

{{ "%.4f"|format(base_score) }}
Without perturbation

Average Impact

{{ "%.4f"|format(raw_impact) }}
Lower is better
{% if report_data and report_data.alternative_models %}

Models Compared

{{ report_data.alternative_models|length + 1 }}
Including primary model
{% endif %}

Performance by Perturbation Level

Shows the model's average performance at different perturbation levels. The red dotted line represents the base score without perturbations.

{% if charts.overview_chart %}
Model performance by perturbation level
{% else %}

No perturbation data available for visualization.

{% endif %} {% if charts.feature_subset_chart %}

Feature Subset Performance

Compares the impact of perturbing all features with the impact of perturbing only the subset of critical features.

Feature subset performance by perturbation level
{% endif %}

Worst Performance by Perturbation Level

Visualizes the worst-case performance at each perturbation level, showing the most adverse scenarios for the model.

{% if charts.worst_performance_chart %}
Worst model performance by perturbation level
{% else %}

No worst performance data available for visualization.

{% endif %}
{% if report_data and report_data.alternative_models %}

Model Comparison

Compares the performance of different models across perturbation levels, helping to identify which model is more robust.

{% if charts.comparison_chart %}
Model comparison across perturbation levels
{% else %}

No comparison data available for visualization.

{% endif %}
{% endif %}
{% if has_model_feature_importance or feature_importance %}

Feature Importance

{% if has_model_feature_importance %}

Feature Importance Comparison

Compares the feature importance declared by the model with the importance based on robustness analysis, highlighting discrepancies.

{% if charts.feature_comparison_chart %}
Comparison of model-defined vs. robustness-based feature importance
{% else %}

No feature comparison data available for visualization.

{% endif %}
{% endif %} {% if feature_importance %}

Feature Importance Details

<table>
  <thead>
    <tr>
      <th>Feature</th>
      <th>Robustness Impact</th>
      {% if has_model_feature_importance %}
      <th>Model Importance</th>
      <th>Difference</th>
      {% endif %}
    </tr>
  </thead>
  <tbody>
    {% for feature, importance in feature_importance|dictsort(by='value', reverse=true) %}
    <tr>
      <td>{{ feature }}</td>
      <td>{{ "%.4f"|format(importance) }}</td>
      {% if has_model_feature_importance %}
      <td>{{ "%.4f"|format(model_feature_importance.get(feature, 0)|float) }}</td>
        {% if model_feature_importance.get(feature) is not none %}
          {% set diff = (importance|float - model_feature_importance.get(feature, 0)|float) %}
      <td>{{ "%.4f"|format((diff|abs_value)) }}</td>
        {% else %}
      <td>N/A</td>
        {% endif %}
      {% endif %}
    </tr>
    {% endfor %}
  </tbody>
</table>
{% endif %}
{% endif %} {% if charts.individual_feature_impact_chart %}

Individual Feature Impact Analysis

Feature Sensitivity to Perturbation

Shows the impact on model performance when each feature is perturbed individually. Red bars indicate performance degradation (feature is important for robustness), while green bars indicate performance improvement or no impact.

Individual feature impact analysis
{% endif %} {% if charts.method_comparison_chart %}

Perturbation Method Comparison

Raw vs Quantile Perturbation Performance

Compares the effectiveness of raw (Gaussian noise) vs quantile-based perturbation methods. The filled area highlights performance differences between methods, helping identify which perturbation approach is more suitable for this model.

Comparison of raw vs quantile perturbation methods
{% endif %} {% if charts.selected_features_comparison_chart %}

Selected Features Analysis

All Features vs Selected Features Comparison

Compares the robustness impact when perturbing all features versus only a selected subset. This analysis helps identify whether focusing on specific features provides similar insights with reduced testing time, and shows the relative sensitivity of the selected features.

Comparison of all features vs selected features perturbation
{% endif %} {# Detailed Distribution Analysis Section - REMOVED: Enhanced Performance Distribution by Perturbation Level {% if charts.detailed_boxplot_chart %}

Detailed Distribution Analysis

Enhanced Performance Distribution by Perturbation Level

Provides comprehensive statistical analysis of performance distributions across perturbation levels. Features detailed annotations including mean (μ), standard deviation (σ), sample count (n), and coverage percentages (cov). The gradient colors and statistical overlays offer deep insights into model robustness patterns.

Detailed statistical analysis of performance distribution
{% endif %} #} {# Distribution Grid Section - REMOVED {% if charts.distribution_grid_chart %}

Comprehensive Distribution Grid

Model × Perturbation Distribution Matrix

Advanced matrix visualization showing performance distributions across multiple models and perturbation levels. Each cell combines violin plots (density distributions) with boxplots (quartile statistics) to provide comprehensive insights into model behavior under different stress conditions. Statistical annotations include mean (μ), standard deviation (σ), and sample counts (n), while baseline references enable direct performance comparisons across models and perturbation levels.

Comprehensive distribution grid showing model performance across perturbation levels
{% endif %} #}

Performance Distribution

Score Distribution

Shows the complete distribution of performance scores using violin plots (density), boxplots (quartiles), and individual points. The red diamond indicates the base score.

{% if charts.boxplot_chart %}
Distribution visualization of model performance scores
{% else %}

No distribution data available for visualization. Run tests with multiple iterations to generate this chart.

{% endif %}
{% if report_data and report_data.raw and report_data.raw.by_level %}

Performance by Perturbation Level

<table>
  <thead>
    <tr>
      <th>Perturbation Level</th>
      <th>Average {{ metric|capitalize }}</th>
      <th>Worst {{ metric|capitalize }}</th>
      <th>Impact</th>
    </tr>
  </thead>
  <tbody>
    {% for level, level_data in report_data.raw.by_level|dictsort %}
      {% if level_data.overall_result and level_data.overall_result.all_features %}
    <tr>
      <td>{{ level }}</td>
      <td>{{ "%.4f"|format(level_data.overall_result.all_features.mean_score) }}</td>
      <td>{{ "%.4f"|format(level_data.overall_result.all_features.worst_score) }}</td>
      <td>{{ "%.4f"|format(base_score - level_data.overall_result.all_features.mean_score) }}</td>
    </tr>
      {% endif %}
    {% endfor %}
  </tbody>
</table>
{% endif %}
{% if report_data and report_data.alternative_models %}

Alternative Models

Model Comparison

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Base {{ metric|capitalize }}</th>
      <th>Robustness Score</th>
      <th>Average Impact</th>
      <th>Model Type</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>{{ model_name }}</td>
      <td>{{ "%.4f"|format(base_score) }}</td>
      <td>{{ "%.4f"|format(robustness_score) }}</td>
      <td>{{ "%.4f"|format(raw_impact) }}</td>
      <td>{{ model_type }}</td>
    </tr>
    {% for model_name, model_data in report_data.alternative_models|dictsort %}
    <tr>
      <td>{{ model_name }}</td>
      <td>{{ "%.4f"|format(model_data.base_score|default(0)) }}</td>
      <td>{{ "%.4f"|format(model_data.get('robustness_score', 1.0 - model_data.get('raw_impact', 0))) }}</td>
      <td>{{ "%.4f"|format(model_data.get('raw_impact', 0)) }}</td>
      <td>{{ model_data.get('model_type', 'Unknown') }}</td>
    </tr>
    {% endfor %}
  </tbody>
</table>
{% endif %}
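{#
The alternative-model rows fall back to `1.0 - raw_impact` when a model has no explicit `robustness_score`, and to 'Unknown' when it has no `model_type`. A minimal Python sketch of that row-building logic follows, assuming each model entry is a plain dictionary; the function name `comparison_rows` and the row keys are hypothetical. Names are sorted case-insensitively, which is how Jinja's `dictsort` behaves by default.

```python
def comparison_rows(primary_row, alternative_models):
    """Build table rows: the primary model first, then alternatives by name."""
    rows = [primary_row]
    for name, data in sorted(alternative_models.items(), key=lambda kv: kv[0].lower()):
        raw_impact = data.get("raw_impact", 0)
        rows.append({
            "model": name,
            "base_score": data.get("base_score", 0),
            # Fallback used by the template when no robustness score is stored.
            "robustness_score": data.get("robustness_score", 1.0 - raw_impact),
            "raw_impact": raw_impact,
            "model_type": data.get("model_type", "Unknown"),
        })
    return rows
```

Precomputing rows like this would also let the fallback be tested independently of the rendered HTML.
#}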