🚀 Pipeline Results Report

Generated: /Users/jiayizx/Desktop/verbalize-sampling

Experiments: 1

Metrics: response_count

Available Probability Definitions: default, implicit, explicit, relative, percentage, confidence, perplexity, nll

📋 Experiment Configurations

Name | Task | Method | Model | Responses | Temperature | Probability Definition
direct (samples=50) | state_name | direct | gpt-4.1 | 50 | 0.7 | implicit

📝 Generation Examples

direct (samples=50)

Method: direct | Task: state_name
Prompt:
Name a US State. Only provide the answer without explanation or punctuation.
Response:
California
Method: direct | Task: state_name
Prompt:
Name a US State. Only provide the answer without explanation or punctuation.
Response:
Montana
Method: direct | Task: state_name
Prompt:
Name a US State. Only provide the answer without explanation or punctuation.
Response:
Texas

📊 Evaluation Results

direct (samples=50)

Response Count

Metric | Value
Average KL Divergence | 2.5134
Average Precision | 1.0000
Average Unique Recall Rate | 0.1200
Num Prompts | 1.0000
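
The report does not define these metrics, so the following is only a minimal sketch of one plausible reading, not the pipeline's actual implementation. It assumes precision is the fraction of responses found in a reference answer set, unique recall rate is the fraction of that reference set covered by distinct responses, and KL divergence compares the empirical response distribution against a uniform distribution over the reference set (natural log). The function name `evaluate_responses` and its arguments are hypothetical.

```python
import math
from collections import Counter


def evaluate_responses(responses, valid_answers):
    """Hypothetical metric sketch; definitions are assumptions, not the pipeline's.

    responses     -- list of model answers for a single prompt
    valid_answers -- reference set of acceptable answers (e.g. US state names)
    """
    valid = set(valid_answers)
    # Keep only responses that appear in the reference set.
    correct = [r for r in responses if r in valid]

    # Assumed precision: share of responses that are valid answers.
    precision = len(correct) / len(responses) if responses else 0.0

    # Assumed unique recall: share of the reference set that was produced at least once.
    unique_recall = len(set(correct)) / len(valid) if valid else 0.0

    # Assumed KL(empirical || uniform) over the reference set, in nats.
    # Answers that never occur contribute zero to the sum.
    counts = Counter(correct)
    total = sum(counts.values())
    kl = 0.0
    if total and valid:
        uniform = 1.0 / len(valid)
        for count in counts.values():
            p = count / total
            kl += p * math.log(p / uniform)

    return {
        "precision": precision,
        "unique_recall_rate": unique_recall,
        "kl_divergence": kl,
    }
```

Under these assumptions, and if the reference set is the 50 US states, a unique recall rate of 0.1200 would correspond to 6 distinct states across the 50 responses, and a precision of 1.0000 would mean every response was a valid state name.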

📈 Overall Comparison

📈 Comparison Chart Visualizations

[Figure: Comparison Chart]