Statistics package
The module offers methods such as:
call rate calculation for SNP data. The call rate for a given SNP is defined as the proportion of individuals in the study for which the corresponding SNP information is not missing.
calculation of allele frequencies.
testing for Hardy-Weinberg equilibrium (HWE), a principle stating that the genetic variation in a population will remain constant from one generation to the next in the absence of disturbing factors.
- snplib.statistics.allele_freq(data: DataFrame | str, id_col: str = None, seq_col: str = None) → DataFrame | float | None
The allele frequency is the proportion of all copies of a gene in a population that carry a given variant (allele).
- Parameters:
data – DataFrame with SNP data, or a single SNP sequence as a string.
id_col – Name of the column with the SNP names.
seq_col – Name of the column with the SNP values in ucg format (0, 1, 2, 5).
- Returns:
The allele frequency.
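A minimal sketch of the calculation, assuming that each non-missing value counts the copies (0, 1 or 2) of one allele and that 5 marks a missing call. The helper names `allele_freq_sketch` and `allele_freq_from_string` are hypothetical; this is not snplib's own code.

```python
from typing import Iterable, Optional

MISSING = 5  # assumption: 5 marks a missing genotype call


def allele_freq_sketch(genotypes: Iterable[int]) -> Optional[float]:
    """Frequency of the allele counted by the 0/1/2 code (hypothetical helper)."""
    called = []
    for g in genotypes:
        if g == MISSING:
            continue  # missing calls do not contribute to the frequency
        if g not in (0, 1, 2):
            raise ValueError(f"unexpected genotype code: {g!r}")
        called.append(g)
    if not called:
        return None  # no called genotypes: the frequency is undefined
    # each individual carries two copies, so divide by 2 * number of calls
    return sum(called) / (2 * len(called))


def allele_freq_from_string(seq: str) -> Optional[float]:
    """Same calculation for a sequence string such as "0125"."""
    return allele_freq_sketch(int(ch) for ch in seq)


print(allele_freq_from_string("0125"))      # 0.5
print(allele_freq_sketch([2, 2, 1, 0, 5]))  # 0.625
```

Missing calls are dropped before dividing, so the frequency is taken over called genotypes only.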
- snplib.statistics.call_rate(data: DataFrame | str, id_col: str = None, snp_col: str = None) → DataFrame | float | None
The call rate for a given SNP is defined as the proportion of individuals in the study for which the corresponding SNP information is not missing. For example, filtering at a call rate of 95% retains only SNPs with no more than 5% missing data.
Call rate per animal: say that of the 54K markers on the chip, 50K have been genotyped for a particular animal; the “call rate animal” is then 50K/54K ≈ 93%.
Call rate per marker: say that of 900 animals genotyped for marker CL635944_160.1, 600 have actually been read successfully; the “call rate marker” is then 600/900 ≈ 67%.
- Parameters:
data – Pre-processed data on which the call rate is calculated.
id_col – The name of the column with the id of the animals or markers.
snp_col – The name of the column with the snp sequence.
- Returns:
A DataFrame with the call rate for each animal if a DataFrame is passed; a number if a SNP sequence is passed as a string; None if an error occurred.
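Both kinds of call rate, and the 95% marker filter, can be sketched on plain strings. This assumes one sequence string per animal, one character per marker, with '5' marking a missing call; the helpers `seq_call_rate`, `call_rate_per_animal`, `call_rate_per_marker` and `markers_to_keep` are hypothetical and not part of snplib.

```python
from typing import Dict, List

MISSING = "5"  # assumption: '5' marks a missing genotype call


def seq_call_rate(seq: str) -> float:
    """Share of non-missing calls in one sequence string (hypothetical helper)."""
    if not seq:
        raise ValueError("empty sequence")
    return (len(seq) - seq.count(MISSING)) / len(seq)


def call_rate_per_animal(seqs: Dict[str, str]) -> Dict[str, float]:
    """Call rate of each animal over all of its markers."""
    return {animal: seq_call_rate(seq) for animal, seq in seqs.items()}


def call_rate_per_marker(seqs: Dict[str, str]) -> List[float]:
    """Call rate of each marker position over all animals."""
    if len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*...) groups the i-th character of every animal's sequence
    return [seq_call_rate("".join(col)) for col in zip(*seqs.values())]


def markers_to_keep(seqs: Dict[str, str], threshold: float = 0.95) -> List[int]:
    """Indices of markers whose call rate reaches the threshold."""
    return [i for i, rate in enumerate(call_rate_per_marker(seqs)) if rate >= threshold]


genotypes = {"a1": "0125", "a2": "0155", "a3": "2105"}
print(call_rate_per_animal(genotypes))  # {'a1': 0.75, 'a2': 0.5, 'a3': 0.75}
print(markers_to_keep(genotypes))       # [0, 1]
```

Whether the threshold is inclusive (`>=`) is a choice made for this sketch.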
- snplib.statistics.hwe(obs_hets: int | float, obs_hom1: int | float, obs_hom2: int | float) → float
Python implementation of the exact HWE test from snphwe: https://github.com/jeremymcrae/snphwe
- Parameters:
obs_hets – Number of observed heterozygotes (AB, BA)
obs_hom1 – Number of observed homozygotes of the first type (AA)
obs_hom2 – Number of observed homozygotes of the second type (BB)
- Returns:
The HWE p-value.
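For reference, the standard exact HWE test (Wigginton et al., 2005) can be sketched in pure Python as below. It builds the probability of every possible heterozygote count given the observed allele counts, using the usual recurrence, and sums the outcomes no more likely than the observed one. `hwe_exact` is a hypothetical name; this is not the snplib source and may differ from it in edge-case handling.

```python
def hwe_exact(obs_hets: int, obs_hom1: int, obs_hom2: int) -> float:
    """Exact HWE p-value for one biallelic SNP (sketch after Wigginton et al., 2005)."""
    obs_hets, obs_hom1, obs_hom2 = int(obs_hets), int(obs_hom1), int(obs_hom2)
    if min(obs_hets, obs_hom1, obs_hom2) < 0:
        raise ValueError("genotype counts must be non-negative")
    homr = min(obs_hom1, obs_hom2)  # homozygotes for the rarer allele
    homc = max(obs_hom1, obs_hom2)  # homozygotes for the commoner allele
    rare = 2 * homr + obs_hets      # copies of the rarer allele
    n = obs_hets + homr + homc      # genotyped individuals
    if n == 0:
        raise ValueError("no genotypes")

    # probs[h]: relative probability of h heterozygotes given the allele counts;
    # h always has the same parity as `rare`, other entries stay 0
    probs = [0.0] * (rare + 1)
    # start near the expected number of heterozygotes, with matching parity
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0

    # walk downwards from mid using the recurrence P(h - 2) / P(h)
    h, hr = mid, (rare - mid) // 2
    hc = n - h - hr
    while h >= 2:
        probs[h - 2] = probs[h] * h * (h - 1) / (4.0 * (hr + 1) * (hc + 1))
        h, hr, hc = h - 2, hr + 1, hc + 1

    # walk upwards from mid using the recurrence P(h + 2) / P(h)
    h, hr = mid, (rare - mid) // 2
    hc = n - h - hr
    while h <= rare - 2:
        probs[h + 2] = probs[h] * 4.0 * hr * hc / ((h + 2) * (h + 1))
        h, hr, hc = h + 2, hr - 1, hc - 1

    total = sum(probs)
    observed = probs[obs_hets]
    # p-value: total probability of outcomes no more likely than the observed one
    p = sum(x for x in probs if x <= observed) / total
    return min(1.0, p)


print(hwe_exact(50, 25, 25))         # 1.0
print(hwe_exact(0, 50, 50) < 1e-10)  # True
```

Working with probabilities relative to the mode keeps the numbers in floating-point range without computing large factorials.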
- snplib.statistics.hwe_test(seq_snp: Series, freq: float, crit_chi2: float = 3.841) → bool
Chi-square test of whether a SNP deviates from Hardy-Weinberg equilibrium. The Hardy-Weinberg equilibrium is a principle stating that the genetic variation in a population will remain constant from one generation to the next in the absence of disturbing factors. https://www.nature.com/scitable/definition/hardy-weinberg-equilibrium-122/
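The textbook Pearson chi-square form of the test is written below. N is the number of called genotypes, O_g and E_g are the observed and expected counts of genotype g, and p is the allele frequency passed as freq, assumed here to be the frequency of allele B. A SNP whose χ² exceeds crit_chi2 is taken to deviate from HWE. That hwe_test computes exactly this statistic is an assumption.

```latex
\begin{aligned}
E_{AA} &= N q^2, \qquad E_{AB} = 2Npq, \qquad E_{BB} = N p^2, \qquad q = 1 - p,\\
\chi^2 &= \sum_{g \in \{AA,\,AB,\,BB\}} \frac{(O_g - E_g)^2}{E_g}.
\end{aligned}
```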
- Parameters:
seq_snp – SNP sequence
freq – Allele frequency
crit_chi2 – Critical value of the chi-square statistic. A biallelic SNP has three genotype classes and one estimated allele frequency, so the test has df = 3 − 1 − 1 = 1, for which the critical value at p = 0.05 is 3.841 (the default).
- Returns:
A boolean decision on whether to exclude or retain the inspected SNP.
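A minimal sketch of such a chi-square check, assuming 0/1/2 count copies of the allele whose frequency is freq, 5 marks a missing call, and True means "retain". snplib's own boolean convention is not stated here, and `hwe_chi2_keep` is a hypothetical helper. Any iterable of ints works, including a pandas Series.

```python
from typing import Iterable

MISSING = 5  # assumption: 5 marks a missing genotype call


def hwe_chi2_keep(genotypes: Iterable[int], freq: float, crit_chi2: float = 3.841) -> bool:
    """Pearson chi-square HWE check (hypothetical helper).

    Assumes 0/1/2 count copies of the allele whose frequency is `freq`
    and that True means "retain the SNP".
    """
    called = [g for g in genotypes if g != MISSING]
    if not called:
        raise ValueError("no called genotypes")
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("genotype codes must be 0, 1, 2 or 5")
    n = len(called)
    q = 1.0 - freq
    observed = [called.count(0), called.count(1), called.count(2)]
    expected = [n * q * q, 2 * n * freq * q, n * freq * freq]
    # classes with zero expected count (monomorphic SNP) are skipped
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2 <= crit_chi2


in_hwe = [0] * 25 + [1] * 50 + [2] * 25
het_deficit = [0] * 50 + [2] * 50
print(hwe_chi2_keep(in_hwe, 0.5))       # True
print(hwe_chi2_keep(het_deficit, 0.5))  # False
```

Unlike the exact test, the chi-square approximation can be unreliable when expected counts are small (a common rule of thumb is fewer than 5 per class).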