TF-MoDISco (Transcription Factor Motif Discovery from Importance Scores) is a method for discovering sequence motifs from neural network-derived importance scores. Unlike traditional motif discovery methods that rely solely on sequence enrichment, TF-MoDISco leverages context-aware importance scores to identify patterns.
About Seqlets: Seqlets are short subsequences with high contribution scores. They are the building blocks that TF-MoDISco clusters into motifs. The seqlet count reported for each pattern covers only the high-confidence instances used to construct the motif, so it is a subset of all binding sites. Additional sites likely exist in the data but were filtered out during clustering.
Position Statistics: Each seqlet's position is displayed relative to the midpoint of its input region (set at position 0), with negative values indicating upstream positions and positive values indicating downstream positions. The "median distance from center" statistic is the median of the absolute distances between seqlets and the midpoints of their regions.
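As a rough illustration of the position statistic, the sketch below computes center-relative positions and their median absolute value. It assumes that a seqlet's position is taken to be its own midpoint and that coordinates are 0-based offsets within a region of known length. The function name and arguments are hypothetical and are not taken from the TF-MoDISco code.

```python
import numpy as np

def median_abs_distance_from_center(seqlet_starts, seqlet_ends, region_length):
    """Return center-relative seqlet positions and their median absolute value.

    Assumption: each seqlet is represented by the midpoint of [start, end).
    """
    starts = np.asarray(seqlet_starts, dtype=float)
    ends = np.asarray(seqlet_ends, dtype=float)
    seqlet_mid = (starts + ends) / 2.0
    # Negative values are upstream of the region midpoint, positive downstream.
    relative = seqlet_mid - region_length / 2.0
    return relative, float(np.median(np.abs(relative)))

relative, med = median_abs_distance_from_center([40, 60, 10], [50, 70, 20], 100)
print(relative, med)
```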
Contribution Score Statistics: The "total contribution" of a seqlet is the sum of the absolute contribution scores across all of its positions. This measures the overall strength of the seqlet's influence on model predictions.
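A minimal sketch of this calculation follows, assuming each seqlet's contribution scores are stored as a (length, 4) array over the four bases. The helper name `total_contributions` is hypothetical, and the mean and standard deviation at the end are shown only as one plausible way to summarize the totals per pattern.

```python
import numpy as np

def total_contributions(seqlet_contribs):
    """Sum of absolute contribution scores for each seqlet.

    Assumption: every item is a (length, 4) array of per-base scores.
    """
    return np.array([np.abs(np.asarray(c, dtype=float)).sum()
                     for c in seqlet_contribs])

seqlets = [
    np.array([[0.5, 0.0, 0.0, 0.0], [0.0, -0.25, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.25]]),
]
totals = total_contributions(seqlets)
# Negative scores count toward the total because absolute values are summed.
print(totals, totals.mean(), totals.std())
```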
Seqlet Selection for Visualization: Representative seqlets are selected from different total contribution quantiles (10th, 20th, 30th, etc. percentiles) to show the range of pattern strength.
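One way to pick such representatives is sketched below: compute the requested percentiles of the total contribution scores, then take the seqlet whose total is closest to each percentile value. The nearest-value rule and the function name are assumptions made for illustration; the report may select seqlets differently.

```python
import numpy as np

def pick_representatives(totals, percentiles=(10, 20, 30, 40, 50, 60, 70, 80, 90)):
    """Return one seqlet index per requested percentile of total contribution."""
    totals = np.asarray(totals, dtype=float)
    # np.percentile interpolates linearly between data points by default.
    targets = np.percentile(totals, percentiles)
    # For each target value, choose the seqlet with the closest total.
    return [int(np.argmin(np.abs(totals - t))) for t in targets]

print(pick_representatives([5.0, 1.0, 3.0, 2.0, 4.0], percentiles=(0, 50, 100)))
```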
Contribution Weight Matrices (CWMs) represent the average contribution scores across aligned seqlets, quantifying each position's contribution to binding predictions.
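The averaging step can be sketched as follows, assuming the seqlets have already been aligned and trimmed to a common length so that they stack into a single (n_seqlets, length, 4) array. The function name is hypothetical.

```python
import numpy as np

def contribution_weight_matrix(aligned_contribs):
    """Average per-base contribution scores across aligned seqlets."""
    stack = np.asarray(aligned_contribs, dtype=float)
    if stack.ndim != 3 or stack.shape[2] != 4:
        raise ValueError("expected an array of shape (n_seqlets, length, 4)")
    # Mean over the seqlet axis gives one (length, 4) matrix.
    return stack.mean(axis=0)

cwm = contribution_weight_matrix([
    [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]],
    [[3.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
])
print(cwm)
```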
Position Probability Matrices (PPMs) show sequence composition frequencies. When scaled by information content (IC), they become information-weighted PPMs that emphasize positions with higher sequence consistency.
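The information-content scaling can be illustrated with the sketch below. It assumes a uniform background (0.25 per base), so the information content at a position is 2 bits minus the Shannon entropy of that position. The function name and the small pseudocount are illustrative choices, not details taken from the report code.

```python
import numpy as np

def ic_scaled_ppm(ppm, pseudocount=1e-9):
    """Scale each PPM row by its information content in bits.

    Assumption: uniform background, so IC = 2 - entropy at each position.
    """
    ppm = np.asarray(ppm, dtype=float)
    # The pseudocount avoids log2(0) for bases with zero frequency.
    entropy = -(ppm * np.log2(ppm + pseudocount)).sum(axis=1)
    ic = 2.0 - entropy
    return ppm * ic[:, None]

scaled = ic_scaled_ppm([[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]])
print(scaled)
```

A fully conserved position keeps close to its full height of 2 bits, while a position with uniform base frequencies shrinks to nearly zero, which is why conserved positions stand out in the logo.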
Tomtom Database Matches: Each discovered motif is compared against the user-provided database of known motifs using Tomtom. Only the top 3 matches by statistical significance are displayed, so other significant matches may exist that are not shown. The factor responsible for binding does not necessarily correspond to the top-ranked match (Match 0); consider biological context and additional evidence when interpreting results.
Pattern Naming: When database matches are available, patterns are given descriptive names constructed from the first 10 characters of each top match, separated by semicolons (e.g., "CTCF_HUMAN;SP1_MOUSE;ZNF143"). The original pattern ID in the H5 file (e.g., "pos_patterns.pattern_0") is also shown and remains the canonical identifier.
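The naming rule can be sketched in a few lines. The function name, its parameters, and the full match name used in the example are hypothetical; only the truncate-and-join behavior comes from the description above.

```python
def descriptive_name(matches, max_chars=10, sep=";"):
    """Join the first max_chars characters of each non-empty match name."""
    return sep.join(m[:max_chars] for m in matches if m)

# Names shorter than the limit are kept whole; longer ones are truncated.
print(descriptive_name(["CTCF_HUMAN.example", "SP1_MOUSE", "ZNF143"]))
```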
| Pattern | Seqlets | Avg. Contribution | Med. Center Dist. | CWM Fwd | CWM Rev | {% if meme_motif_db %}Match 0 | Q-value | Logo | Match 1 | Q-value | Logo | Match 2 | Q-value | Logo | {% endif %}
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| {% if descriptive_names and pattern_tag in descriptive_names %} {{ descriptive_names[pattern_tag] }} {% else %} {{ pattern_tag }} {% endif %} {% if descriptive_names and pattern_tag in descriptive_names %} {{ pattern_tag }} {% endif %} | {{ data.n_seqlets }} | {% if data.std_importance and data.std_importance == data.std_importance %} {{ "%.3f"|format(data.avg_importance) }} ± {{ "%.3f"|format(data.std_importance) }} {% else %} {{ "%.3f"|format(data.avg_importance) }} {% endif %} | {% if data.median_abs_distance_from_center and data.median_abs_distance_from_center == data.median_abs_distance_from_center %} {% if data.std_distance_from_center and data.std_distance_from_center == data.std_distance_from_center %} {{ "%.1f"|format(data.median_abs_distance_from_center) }} ± {{ "%.1f"|format(data.std_distance_from_center) }} {% else %} {{ "%.1f"|format(data.median_abs_distance_from_center) }} {% endif %} {% else %} - {% endif %} |
{% if pattern_tag in logo_paths %}
|
{% if pattern_tag in logo_paths %}
|
{% if meme_motif_db %}
{% for i in range(3) %}
{% if pattern_tag in tomtom_data and 'match_' + i|string in tomtom_data[pattern_tag] and tomtom_data[pattern_tag]['match_' + i|string] %}
{{ tomtom_data[pattern_tag]['match_' + i|string] }} |
{{ "%.2e"|format(tomtom_data[pattern_tag]['pval_' + i|string]) }} |
{% if tomtom_logos and pattern_tag in tomtom_logos and 'match_' + i|string + '_base64' in tomtom_logos[pattern_tag] %}
|
{% else %}
{% endif %} {% endfor %} {% endif %} |
Contribution Weight Matrix: shows actual contribution scores
Hypothetical Contribution Weight Matrix: shows counterfactual contributions
Information-weighted Position Probability Matrix (PPM scaled by information content)
CWM trimmed to core region
| Metric | Value | Std Dev |
|---|---|---|
| Number of seqlets | {{ data.n_seqlets }} | - |
| Average contribution score | {{ "%.3f"|format(data.avg_importance) }} | {% if data.std_importance and data.std_importance == data.std_importance %} {{ "%.3f"|format(data.std_importance) }} {% else %} - {% endif %} |
| GC content | {{ "%.3f"|format(data.gc_content) }} | - |
| Median distance from center | {{ "%.1f"|format(data.median_abs_distance_from_center) }} | {% if data.std_distance_from_center and data.std_distance_from_center == data.std_distance_from_center %} {{ "%.1f"|format(data.std_distance_from_center) }} {% else %} - {% endif %} |
Top matches from motif database comparison:
| Rank | Match | Logo | {% if ttl %}P-value{% else %}Q-value{% endif %} |
|---|---|---|---|
| {{ i + 1 }} | {{ tomtom_data[pattern_tag]['match_' + i|string] }} |
{% if tomtom_logos and pattern_tag in tomtom_logos and 'match_' + i|string + '_base64' in tomtom_logos[pattern_tag] %}
|
{{ "%.2e"|format(tomtom_data[pattern_tag]['pval_' + i|string]) }} |
Distribution of total contribution scores across seqlets
Distribution of seqlet positions within input sequences
Representative seqlets from contribution score quantiles:
{% endif %}
Note: No patterns were found in the provided TF-MoDISco results file. This could indicate that no significant motifs were discovered during the analysis, or there may be an issue with the input file format.