Metadata-Version: 2.4
Name: ConfMatrixCalc
Version: 0.8.0
Summary: Statistical Analysis of Phoneme Discrimination Data
Author-email: Arne Leijon <leijon@kth.se>
License-Expression: MIT
License-File: LICENSE.txt
Keywords: Bayesian,confusion matrix,phoneme discrimination,speech
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.12
Requires-Dist: matplotlib>=3.10
Requires-Dist: numpy>=2.2
Requires-Dist: openpyxl>=3.1
Requires-Dist: pandas>=2.3
Requires-Dist: samppy>=1.3.2
Requires-Dist: scipy>=1.15
Description-Content-Type: text/markdown

Package **ConfMatrixCalc** implements probabilistic Bayesian analysis
of closed-set identification / discrimination tests, 
in which the results may be recorded in the form of confusion matrices.
The package estimates the probability of a correct response
for each test item,
and the probabilities of
all allowed incorrect response alternatives for each stimulus.

The package was developed for analyzing phoneme identification test results,
but it can be used to analyze results from any type
of closed-set classification, performed by either humans or machines.
The analysis approach was presented and validated in (Leijon et al., 2016).

Phoneme identification tests are used, for example,
to evaluate the detailed ("microscopic") speech-recognition ability of
listeners using two or more different hearing aids
or other sound-transmission instruments or algorithms.
Phoneme identification performance can be tested 
with real words with minimal phonemic contrasts
(e.g., Fairbanks, 1958; Witte et al., 2024),
or using nonsense "words" with a fixed structure, 
e.g., CV (Miller & Nicely, 1955), or VCV, or CVCVC, where
C is a consonant and V is a vowel. 

Early speech research showed that the phoneme identification ability
with nonsense syllables is correlated with 
general sentence understanding (Fletcher and Steinberg, 1929, Fig. 11).

## Confusion Matrices
The results from a closed-set test may be recorded 
(e.g., Miller & Nicely, 1955)
as a two-dimensional table of *confusion counts*.
A table element with index (*s*, *r*) shows how many times
the listener responded with the *r*th category when the *s*th stimulus was presented.
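
As a minimal sketch of this definition, the following builds a table of confusion counts from a list of trials.
The three categories and the trial list are made up for illustration; they are not data from the package.

```python
import numpy as np

# Hypothetical closed set with three stimulus/response categories
categories = ["p", "t", "k"]
index = {c: i for i, c in enumerate(categories)}

# Each trial is a (presented stimulus, listener response) pair
trials = [("p", "p"), ("p", "t"), ("t", "t"), ("t", "t"),
          ("k", "k"), ("k", "p"), ("p", "p"), ("k", "k")]

# Element (s, r) counts responses r to stimulus s
counts = np.zeros((len(categories), len(categories)), dtype=int)
for stimulus, response in trials:
    counts[index[stimulus], index[response]] += 1

print(counts)
```

Each row sum equals the number of presentations of that stimulus, and the main diagonal holds the correct responses.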

If the test includes several subgroups of items 
with minimal contrast, and responses are allowed 
only among the alternatives in each such subgroup 
(e.g., "Rhyme Test", Fairbanks, 1958),
a complete confusion matrix would be very sparse, 
with nonzero counts only in block matrices along the main diagonal.
Then it is most convenient to record 
the test results in a "long-format" table, 
with a separate row for each stimulus-response pair.
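
A small sketch of such a long-format table, using Pandas, may make the block structure easier to see.
The column names and the two rhyming subsets are assumptions chosen for illustration, not the headers required by the package.

```python
import pandas as pd

# Hypothetical long-format records: one row per stimulus-response pair.
# Column names are illustrative only.
long_table = pd.DataFrame({
    "Subset":   ["bat-set", "bat-set", "bat-set", "pin-set", "pin-set"],
    "Stimulus": ["bat", "bat", "cat", "pin", "tin"],
    "Response": ["bat", "cat", "cat", "pin", "pin"],
    "Count":    [4, 1, 5, 3, 2],
})

# Expand to a complete square matrix to show the block-diagonal sparsity
items = ["bat", "cat", "pin", "tin"]
full = long_table.pivot_table(index="Stimulus", columns="Response",
                              values="Count", aggfunc="sum", fill_value=0)
full = full.reindex(index=items, columns=items, fill_value=0).astype(int)

print(full)
```

All cells that pair an item from one subset with an item from the other remain zero, which is why the long format is more compact for this kind of test.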

The statistical analysis of closed-set identification
data is non-trivial, because the matrix is usually quite sparse for each listener.
For example, in a consonant-identification test with 16 consonants,
each stimulus type might be presented, say, five times.
Then each matrix row will have between 11 and 15 elements with a zero count. 
Furthermore, the response counts for each stimulus category 
are statistically dependent. 

This makes it difficult to estimate underlying response probabilities and to
quantify the statistical reliability of observed test results.
The Bayesian analysis method handles these problems in a coherent manner.
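
To illustrate the general idea, here is a minimal sketch of a Bayesian estimate for a single sparse matrix row.
It assumes a simple symmetric Dirichlet prior with concentration 0.5 and made-up counts; it is not the hierarchical model implemented in the package.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_categories = 16
# Hypothetical counts for one stimulus presented five times:
# 3 correct responses (index 0) and 2 different confusions.
row_counts = np.zeros(n_categories, dtype=int)
row_counts[0] = 3
row_counts[4] = 1
row_counts[9] = 1

n_zero = int(np.sum(row_counts == 0))  # 13 of the 16 cells are empty here

# Assumption: symmetric Dirichlet prior with concentration 0.5.
# The posterior is then Dirichlet(prior + counts).
prior = np.full(n_categories, 0.5)
posterior_samples = rng.dirichlet(prior + row_counts, size=10_000)

# Every sampled probability vector sums to 1, so the elements are dependent.
p_correct = posterior_samples[:, 0]
low, high = np.quantile(p_correct, [0.025, 0.975])

print(f"zero cells: {n_zero}")
print(f"P(correct): posterior mean {p_correct.mean():.2f}, "
      f"95% credible interval [{low:.2f}, {high:.2f}]")
```

The wide credible interval reflects how little five presentations reveal about 16 response probabilities.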

## Analysis Results
1. **Overall performance** is indicated by two measures,
    each with a *credible range* to indicate the uncertainty of the estimate,
    and *credible differences* between Test Conditions:

    1. **Probability of Correct** identification (PC), across all presented test stimuli.

    1. The **Mutual Information** (MI) between stimulus and response (Miller and Nicely, 1955),
        sometimes called "transmitted information".
        This measure indicates the average amount of information about the stimulus category,
        received by the listener by hearing each presented phoneme.

1. **Detailed performance** is shown by a *credible confusion pattern*, i.e., a set of
    stimulus-response pairs where listeners' response probabilities are
    jointly credibly different between test conditions.
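
For orientation, the two overall measures can be computed from a single table of confusion counts as sketched below.
These are simple plug-in values from relative frequencies, shown only to define the quantities; the package instead reports Bayesian estimates with credible ranges. The example counts are made up.

```python
import numpy as np

# Hypothetical confusion counts: rows = stimuli, columns = responses
counts = np.array([[8, 2, 0],
                   [1, 7, 2],
                   [0, 3, 7]], dtype=float)

joint = counts / counts.sum()            # relative frequency of each (s, r) pair
p_s = joint.sum(axis=1, keepdims=True)   # stimulus marginal, shape (3, 1)
p_r = joint.sum(axis=0, keepdims=True)   # response marginal, shape (1, 3)

# Probability of Correct: total relative frequency on the main diagonal
pc = np.trace(joint)

# Mutual Information in bits; cells with zero count contribute nothing
nonzero = joint > 0
mi_bits = np.sum(joint[nonzero] * np.log2(joint[nonzero] / (p_s @ p_r)[nonzero]))

print(f"PC = {pc:.3f}, MI = {mi_bits:.3f} bits")
```

With sparse counts, the plug-in MI tends to overestimate the true value, which is one reason to prefer a model-based estimate.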

The Bayesian model is hierarchical.
The package can estimate distributions of results for
* an unseen *random individual* in the population from which 
test participants were recruited,
* the *mean* of the population from which participants were recruited,
* an overview of the *participant group* recruited from each population,
* individual *participants* in each group.
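
The levels of such a hierarchy can be illustrated with a generic Dirichlet-multinomial simulation.
This is only a sketch under assumed parameter values (`population_mean`, `concentration`); it shows how the levels relate to each other and does not reproduce the model or the inference used in the package.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Assumed population mean of the response probabilities for one stimulus
population_mean = np.array([0.7, 0.2, 0.1])
# Assumed concentration: larger values mean less variation between individuals
concentration = 20.0

# Participants: each has an individual probability vector near the population mean
n_participants = 4
participant_probs = rng.dirichlet(concentration * population_mean,
                                  size=n_participants)

# Observed data: response counts for 10 presentations per participant
n_presentations = 10
participant_counts = np.array(
    [rng.multinomial(n_presentations, p) for p in participant_probs]
)

# A "random individual" is a new, unseen draw from the same population
random_individual = rng.dirichlet(concentration * population_mean)

print(participant_counts)
print(random_individual)
```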

## Phoneme Identification Experiments
The package can analyze data from simple or rather complex experimental designs,
including the following features:

1. Phoneme identification data may be collected in one or more **Test Conditions**.
Each test condition may be a combination of categories from several *Test Factors*.
For example, the main test factor may be *Hearing Aid*,
with categories *A*, *B*, or *Unaided*.
Another test factor may be, e.g.,
*Background*, with categories *Quiet* or *Noisy*.
The analysis shows *credible differences* 
between categories in user-selected test factor(s).

1. The study may involve one or more distinct **Populations**,
from which separate groups of participants have been recruited.

1. Populations are distinguished by a combination of 
categories from one or more **Group Factors**.
For example, one dimension may be *Age*,
with categories *Young*, *Middle*, or *Old*.
Another dimension may be, e.g., *Hearing Loss*, 
with categories *None*, *Mild*, or *Moderate*.
The analysis shows *credible differences* 
between categories in user-selected group factor(s).

1. The analysis model places no requirements on 
the number of participants or the number of
presentations for each stimulus category.
The analysis estimates the **statistical credibility**
of all results, given the limited amount of collected data.
Of course, the reliability is improved
if there are many participants from each population, 
each tested with a large number of stimulus items.
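
The way test conditions and populations arise from factor categories can be sketched as follows, using the example factors above.
The dictionaries and the helper `combinations` are hypothetical and only illustrate the design; they are not the configuration syntax of the template script.

```python
from itertools import product

# Example factors and categories taken from the text above
test_factors = {
    "Hearing Aid": ["A", "B", "Unaided"],
    "Background": ["Quiet", "Noisy"],
}
group_factors = {
    "Age": ["Young", "Middle", "Old"],
    "Hearing Loss": ["None", "Mild", "Moderate"],
}

def combinations(factors):
    """Return one dict per combination of factor categories."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

test_conditions = combinations(test_factors)   # 3 x 2 = 6 test conditions
populations = combinations(group_factors)      # 3 x 3 = 9 possible populations

for condition in test_conditions:
    print(condition)
```

A real study need not include participants from every possible combination of group-factor categories.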

## Package Documentation
General information is given in the package docstring, which can be accessed with the command
`help(ConfMatrixCalc)`. 
The template script `run_cm.py` includes comments 
explaining the required user input.

Input data can be accessed from files in several 
of the formats that package Pandas can handle, 
e.g., .csv, .xlsx.
The simulation script `run_sim.py` generates data files
that illustrate the preferred style 
of tabulated test results.
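
As a rough sketch of how such a tabulated file can be handled with Pandas, the example below writes a small table to a temporary .csv file and reads it back.
The column names and the file name are assumptions for illustration; the files generated by `run_sim.py` show the layout that the package actually expects.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical records; column names are illustrative only
records = pd.DataFrame({
    "Participant": ["S01", "S01", "S02"],
    "Hearing Aid": ["A", "B", "A"],
    "Stimulus": ["p", "t", "p"],
    "Response": ["p", "k", "p"],
    "Count": [4, 1, 5],
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example_results.csv"
    records.to_csv(csv_path, index=False)
    loaded = pd.read_csv(csv_path)
    # An .xlsx workbook could be read in the same way with pd.read_excel(path),
    # which relies on openpyxl for that format.

print(loaded)
```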

After running an analysis, the logging output briefly explains
the analysis results presented in figures and tables.

## Usage
1. Install the most recent package version:

    `python3 -m pip install --upgrade ConfMatrixCalc`

1. For an introduction to the input data format, 
you may want to study and run the included simulation script:
`python3 run_sim.py`.
For an introduction to the possible analysis results,
you can then analyse the simulated data set 
by running the template script:
`python3 run_cm.py`.

1. To analyse a real data set, copy the template script `run_cm.py` to your work directory, rename it,
    and edit the copy as guided by comments in the template, to specify
    - all stimuli and closed-set response categories,
      and your experimental design,
    - the top input data directory,
    - a directory where all output result files will be stored,
    - your desired set of analysis results.

1. Run your edited script: `python3 run_my_cm.py`.

## Requirements
This package requires Python 3.12 or later, using
NumPy, SciPy, Matplotlib, and Pandas,
as well as the support package samppy,
and the Openpyxl package for reading data from Excel workbook documents.
The pip installer will check and install these required packages if needed.

Pandas can also read input files and write result tables in some other formats, 
but may then need other support packages that must be installed manually.

## New in current version 0.8
1. This version can analyse either complete confusion matrices 
(like Miller & Nicely, 1955), or results from tests using 
several subsets of stimuli, 
each with a closed response set
(Fairbanks, 1958; Witte et al., 2024).
2. Input and output files are handled by the Pandas package.
3. The analysis can use 
either a point- or sample-estimated model of
response probabilities in each population.

## References
A. Leijon, G. E. Henter, and M. Dahlquist (2016).
Bayesian analysis of phoneme confusion matrices.
*IEEE Trans Audio, Speech, and Language Proc* 24(3):469–482.
doi: 10.1109/TASLP.2015.2512039.
[download](https://ieeexplore.ieee.org/document/7364191)

A. Leijon (2026).
ConfMatrixCalc --- Bayesian Analysis of
Phoneme Identification Data. 
*Documentation: Theory, Validation, and Computational Details.* 
Contact the author for a copy.

G. Fairbanks (1958). Test of phonemic differentiation: The rhyme test.
*J Acoust Soc Amer* 30(7):596–600.
doi: 10.1121/1.1909702.

H. Fletcher and J. Steinberg (1929). Articulation testing methods.
*Bell System Technical Journal* 8:806–854.
doi: 10.1002/j.1538-7305.1929.tb01246.x.

G. A. Miller and P. E. Nicely (1955).
An analysis of perceptual confusions among some English consonants.
*J Acoust Soc Amer* 27(2):338–352.
doi: 10.1121/1.1907526.

E. Witte, J. Ekeroot, and S. Köbler (2024).
The development of linguistic stimuli for the Swedish situated
phoneme test. *Nordic Journal of Linguistics* 47(1):73–110.
doi: 10.1017/S0332586521000275.
[download](https://doi.org/10.1017/S0332586521000275)

## Acknowledgment
This Python package is a generalization of a similar MATLAB package,
developed by Arne Leijon for *ORCA Europe, Widex A/S, Stockholm, Sweden*.
The MATLAB development was financially supported by *Widex A/S, Denmark*.

