🚀 Pipeline Results Report

Generated: /Users/jiayizx/Desktop/verbalize-sampling

Experiments: 1

Metrics: response_count

Available Probability Definitions: default, implicit, explicit, relative, percentage, confidence, perplexity, nll

📋 Experiment Configurations

Name                 Task        Method  Model            Responses  Temperature  Probability Definition
direct (samples=50)  state_name  direct  claude-4-sonnet  50         0.7          implicit

📝 Generation Examples

direct (samples=50)

Method: direct
Task: state_name
Prompt:
Name a US State. Only provide the answer without explanation or punctuation.
Response:
California

Method: direct
Task: state_name
Prompt:
Name a US State. Only provide the answer without explanation or punctuation.
Response:
California

Method: direct
Task: state_name
Prompt:
Name a US State. Only provide the answer without explanation or punctuation.
Response:
California

📊 Evaluation Results

direct (samples=50)

Response Count

Metric                      Value
Average KL Divergence       3.6469
Average Precision           1.0000
Average Unique Recall Rate  0.0600
Num Prompts                 1.0000
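
The report does not say how these metrics are computed, so the following is only a minimal sketch under stated assumptions: that KL divergence is measured in nats from the empirical response distribution to a uniform distribution over the 50 US states, and that unique recall is the number of distinct states named divided by 50. The response counts (47/2/1) and the placeholder names `state_b` and `state_c` are hypothetical; the report does not list per-state counts. They were chosen because this split is consistent with the values above.

```python
import math
from collections import Counter

NUM_STATES = 50  # assumption: the reference set is the 50 US states


def kl_from_uniform(counts, support_size):
    """KL(p || uniform) in nats, where p is the empirical distribution of counts."""
    total = sum(counts.values())
    kl = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            # p * log(p / (1 / support_size)) == p * log(p * support_size)
            kl += p * math.log(p * support_size)
    return kl


def unique_recall(counts, support_size):
    """Fraction of the reference set that appears at least once in the responses."""
    distinct = sum(1 for c in counts.values() if c > 0)
    return distinct / support_size


# Hypothetical split of the 50 responses; only "California" appears in the report.
counts = Counter({"California": 47, "state_b": 2, "state_c": 1})

kl = kl_from_uniform(counts, NUM_STATES)
recall = unique_recall(counts, NUM_STATES)
print(f"KL divergence: {kl:.4f}")
print(f"Unique recall: {recall:.4f}")
```

Under these assumptions, a recall of 0.0600 corresponds to only 3 distinct states out of 50, and a KL divergence close to ln(50) ≈ 3.912 indicates that the responses are concentrated almost entirely on one answer.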

📈 Overall Comparison

📈 Comparison Chart Visualizations

[Figure: Comparison Chart]