CaDaR Benchmark Analysis Report
==================================================
Dataset: arbml/darija
Total samples: 500
Average WER: 0.7020
Perfect conversions: 17 (3.4%)

Performance by WER Range:
------------------------------
WER = 0: 17 samples (3.4%)
WER < 0.1: 17 samples (3.4%)
WER < 0.2: 18 samples (3.6%)
WER < 0.5: 57 samples (11.4%)
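
Note on the ranges above: the counts are cumulative, so the 17 perfect conversions are included in every row. A minimal sketch of such a tally follows, assuming the per-sample WER values are available as a plain list. The function name `bucket_counts` and the toy scores are hypothetical and are not taken from CaDaR.

```python
def bucket_counts(wers, thresholds=(0.1, 0.2, 0.5)):
    """Return (label, count, percent) rows: exact zeros first, then cumulative buckets."""
    total = len(wers)
    rows = [("WER = 0", sum(1 for w in wers if w == 0))]
    for t in thresholds:
        # Strictly-below counts are cumulative: each row includes the rows before it.
        rows.append((f"WER < {t}", sum(1 for w in wers if w < t)))
    return [(label, n, 100.0 * n / total) for label, n in rows]


# Toy scores for illustration only (not the benchmark data).
scores = [0.0, 0.0, 0.1429, 0.4, 1.2, 1.3333]
for label, n, pct in bucket_counts(scores):
    print(f"{label}: {n} samples ({pct:.1f}%)")
```

Reading the rows as differences gives the per-range counts: in the report, only 18 - 17 = 1 sample falls in 0 < WER < 0.2.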

Sample Results:
------------------------------
Good Cases (WER = 0):
Original: wakha.
Arabic: واخا.

Original: tl9na!
Arabic: تلقنا!

Original: s7ra ftalaja.
Arabic: سحرا فتالاجا.

Medium Cases (0 < WER < 0.2):
WER: 0.1429
Original: ana matwatar bzaf flkhadma bzaf fhad lwa9t
Arabic: انا ماتواتار بزاف فلخادما بزاف فهاد الواقة
Roundtrip: ana matwatar bzaf flkhadma bzaf fhad alwa9t

Worst Cases (highest WER):
WER: 1.3333
Original: La ana mr2a...!
Arabic: الا انا مرʾا. . !
Roundtrip: ala ana mr2a. . !

WER: 1.3333
Original: wakha-nta at9od tri9.
Arabic: واخا -نتا اتقود تريق.
Roundtrip: wakha -nta at9wd try9.

WER: 1.2000
Original: wakha,w lmchrobat, ach ndiro fihom?
Arabic: واخا ،و المشروباة ،اش نديرو فيهوم ؟
Roundtrip: wakha, w almshrwbat, ash ndyru fyhum?
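
Note on WER above 1.0: WER divides the word-level edit distance by the number of words in the original, so a roundtrip that splits punctuation into extra tokens adds insertions on top of substitutions and can push the score past 1. The sketch below assumes whitespace tokenization and unit-cost Levenshtein distance. The report does not state how CaDaR tokenizes or normalizes text, so `wer` is a hypothetical helper, not the benchmark's own code. Under these assumptions it reproduces the values listed for the samples in this report.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


pairs = [
    ("La ana mr2a...!", "ala ana mr2a. . !"),
    ("wakha-nta at9od tri9.", "wakha -nta at9wd try9."),
    ("wakha,w lmchrobat, ach ndiro fihom?", "wakha, w almshrwbat, ash ndyru fyhum?"),
]
for original, roundtrip in pairs:
    print(f"{wer(original, roundtrip):.4f}")
# prints 1.3333, 1.3333, 1.2000
```

In the first pair, for example, the 3-word original becomes 5 roundtrip tokens: 2 substitutions plus 2 insertions give 4 / 3 = 1.3333.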
