Statistics package

The module offers methods such as:

  • Call rate calculation for SNP data. The call rate for a given SNP is the proportion of individuals in the study for which the corresponding SNP information is not missing.

  • Allele frequency calculation.

  • Hardy-Weinberg equilibrium (HWE) testing. HWE is a principle stating that the genetic variation in a population will remain constant from one generation to the next in the absence of disturbing factors.

snplib.statistics.allele_freq(data: DataFrame | str, id_col: str = None, seq_col: str = None) → DataFrame | float | None

The allele frequency represents the incidence of a gene variant in a population.

Parameters:
  • data – Input data (a DataFrame or a string, per the signature).

  • id_col – Name of the column with the SNP names.

  • seq_col – Name of the column with the SNP values in ucg format: 0, 1, 2, 5.

Returns:

The allele frequency.
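As an illustration of the calculation only, here is a minimal pure-Python sketch. It does not call snplib; the helper name allele_frequency is hypothetical, and it assumes that the codes 0, 1, 2 count the copies of one allele carried by an individual and that 5 marks a missing genotype.

```python
from typing import Iterable, Optional


def allele_frequency(genotypes: Iterable[int], missing: int = 5) -> Optional[float]:
    """Frequency of the counted allele among successfully called genotypes.

    Assumption: 0, 1, 2 are the number of copies of the allele,
    and `missing` (5 by default) marks an uncalled genotype.
    """
    called = [g for g in genotypes if g != missing]
    if not called:
        return None  # nothing was genotyped, so the frequency is undefined
    # Each diploid individual carries two alleles at the locus.
    return sum(called) / (2 * len(called))


freq = allele_frequency([0, 1, 2, 5, 1])
print(freq)
```

Missing genotypes are dropped from both the numerator and the denominator, so they do not bias the estimate toward either allele.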

snplib.statistics.call_rate(data: DataFrame | str, id_col: str = None, snp_col: str = None) → DataFrame | float | None

The call rate for a given SNP is defined as the proportion of individuals in the study for which the corresponding SNP information is not missing. For example, filtering at a call rate of 95% retains only the SNPs with less than 5% missing data.

If, say, 50K of the 54K markers on the chip have been genotyped for a particular animal, the “call rate animal” is 50K/54K = 93%.

If, say, 600 of the 900 animals genotyped for marker CL635944_160.1 have been read successfully, the “call rate marker” is 600/900 = 67%.

Parameters:
  • data – Pre-processed data on which the call rate is calculated.

  • id_col – The name of the column with the IDs of the animals or markers.

  • snp_col – The name of the column with the SNP sequence.

Returns:

A DataFrame with the call rate for each animal if a DataFrame is passed; a single number if the SNP sequence is passed as a string; None if an error occurred.
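The marker example above can be reproduced with a short pure-Python sketch. It does not use snplib itself; the helper name call_rate_of_sequence is hypothetical, and it assumes a genotype sequence given as a string in which the character "5" marks a missing call.

```python
from typing import Optional


def call_rate_of_sequence(seq: str, missing: str = "5") -> Optional[float]:
    """Proportion of positions in `seq` that are not missing.

    Assumption: each character is one genotype call and
    `missing` ("5" by default) marks a call that failed.
    """
    if not seq:
        return None  # an empty sequence has no defined call rate
    called = sum(1 for ch in seq if ch != missing)
    return called / len(seq)


# 900 animals genotyped for one marker, 600 of them read successfully.
marker_calls = "0" * 250 + "1" * 200 + "2" * 150 + "5" * 300
rate = call_rate_of_sequence(marker_calls)
print(f"{rate:.0%}")
```

The same function gives the “call rate animal” when the string holds all markers of one animal instead of all animals for one marker.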

snplib.statistics.hwe(obs_hets: int | float, obs_hom1: int | float, obs_hom2: int | float) → float

Python port of the HWE test from https://github.com/jeremymcrae/snphwe

Parameters:
  • obs_hets – Number of observed heterozygotes (AB, BA)

  • obs_hom1 – Number of observed homozygotes1 (AA)

  • obs_hom2 – Number of observed homozygotes2 (BB)

Returns:

The p-value of the test.
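For orientation, the sketch below shows one way such a p-value can be computed. It assumes that hwe follows the exact test of Wigginton et al. implemented in the linked snphwe project; the function name hwe_exact_sketch is hypothetical and this is not the snplib source.

```python
def hwe_exact_sketch(obs_hets: int, obs_hom1: int, obs_hom2: int) -> float:
    """Exact HWE p-value from genotype counts (illustrative sketch)."""
    if min(obs_hets, obs_hom1, obs_hom2) < 0:
        raise ValueError("genotype counts must be non-negative")
    n = obs_hets + obs_hom1 + obs_hom2
    if n == 0:
        raise ValueError("at least one genotype is required")

    hom_rare = min(obs_hom1, obs_hom2)
    rare = 2 * hom_rare + obs_hets  # copies of the rarer allele

    # Relative probability of each possible heterozygote count.
    probs = [0.0] * (rare + 1)

    # Start near the most likely count; it must share the parity of `rare`.
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    # Walk down: two fewer heterozygotes, one more of each homozygote.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets >= 2:
        probs[hets - 2] = (probs[hets] * hets * (hets - 1)
                           / (4.0 * (homr + 1) * (homc + 1)))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    # Walk up: two more heterozygotes, one fewer of each homozygote.
    hets, homr = mid, (rare - mid) // 2
    homc = n - hets - homr
    while hets <= rare - 2:
        probs[hets + 2] = (probs[hets] * 4.0 * homr * homc
                           / ((hets + 2) * (hets + 1)))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    # p-value: total probability of outcomes no more likely than the observed one.
    target = probs[obs_hets]
    p_value = sum(p for p in probs if p <= target) / total
    return min(1.0, p_value)


print(hwe_exact_sketch(50, 25, 25))  # counts matching HWE proportions
print(hwe_exact_sketch(0, 50, 50))   # no heterozygotes at all
```

The sketch takes integer counts only; the documented signature also accepts floats, which this illustration does not handle.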

snplib.statistics.hwe_test(seq_snp: Series, freq: float, crit_chi2: float = 3.841) → bool

The Hardy-Weinberg equilibrium is a principle stating that the genetic variation in a population will remain constant from one generation to the next in the absence of disturbing factors. https://www.nature.com/scitable/definition/hardy-weinberg-equilibrium-122/

Parameters:
  • seq_snp – SNP sequence

  • freq – Allele frequency

  • crit_chi2 – The critical chi-square value for the test. With one degree of freedom (df = 1), the critical value is 3.84 at p = 0.05.

Returns:

A boolean decision on whether to exclude or retain the inspected SNP.
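The chi-square comparison behind such a decision can be sketched as follows. This is a pure-Python illustration working from genotype counts, not the snplib implementation; the names chi2_hwe and is_in_hwe are hypothetical, and treating True as "consistent with HWE" is an assumption of the sketch.

```python
def chi2_hwe(n_hom1: int, n_het: int, n_hom2: int) -> float:
    """Chi-square statistic of observed genotype counts against HWE expectations."""
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    observed = (n_hom1, n_het, n_hom2)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expectation (monomorphic marker).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def is_in_hwe(n_hom1: int, n_het: int, n_hom2: int, crit_chi2: float = 3.841) -> bool:
    """True when the statistic does not exceed the critical value (assumed meaning)."""
    return chi2_hwe(n_hom1, n_het, n_hom2) <= crit_chi2


print(is_in_hwe(25, 50, 25))  # counts at exact HWE proportions
print(is_in_hwe(50, 0, 50))   # strong heterozygote deficit
```

With the default critical value of 3.841, a marker is flagged as deviating when the statistic is significant at p = 0.05 with one degree of freedom.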

snplib.statistics.minor_allele_freq(value: float) → float

The minor allele frequency is the frequency at which the minor (less common) allele occurs within a population.

Parameters:

value – Allele frequency

Returns:

The minor allele frequency.
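For a biallelic marker the two allele frequencies sum to 1, so the minor allele frequency is the smaller of the two. A minimal sketch of that relationship, with the hypothetical helper name maf_from_freq (not the snplib source):

```python
def maf_from_freq(freq: float) -> float:
    """Minor allele frequency of a biallelic marker, given one allele's frequency."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("an allele frequency must lie in [0, 1]")
    # The other allele has frequency 1 - freq; the minor one is the smaller.
    return min(freq, 1.0 - freq)


print(round(maf_from_freq(0.7), 3))
print(maf_from_freq(0.25))
```

The result never exceeds 0.5, which is why filtering thresholds on this value are usually small fractions.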