Trait Specificity Analysis
12 Subtypes (6+6)
Trait specificity across Bayesian-derived clusters
Research Use Only: Trait specificity measures statistical association
between phenotypes and gene clusters. High specificity indicates a trait distinguishes
clusters but does not imply the trait is unique to that cluster.
About Mutual Information Analysis
Mutual Information (MI) quantifies how much knowing a phenotype reduces uncertainty about cluster membership. High MI traits are most discriminative for distinguishing gene clusters. Specificity measures the gap between maximum and mean prevalence across clusters.
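As an illustration of the two measures described above, here is a minimal sketch that computes MI and specificity for one binary phenotype. It assumes each gene carries a 0/1 trait indicator and a single cluster label, and it reports MI in bits; the function names, the toy data, and the choice of log base are assumptions for illustration and may differ from what this page computes.

```python
from collections import Counter
from math import log2

def mutual_information(has_trait, clusters):
    """MI (in bits) between a binary trait indicator and cluster labels."""
    n = len(has_trait)
    joint = Counter(zip(has_trait, clusters))
    trait_counts = Counter(has_trait)
    cluster_counts = Counter(clusters)
    mi = 0.0
    for (t, c), count in joint.items():
        p_joint = count / n
        p_independent = (trait_counts[t] / n) * (cluster_counts[c] / n)
        mi += p_joint * log2(p_joint / p_independent)
    return mi

def specificity(has_trait, clusters):
    """Gap between the maximum and the mean per-cluster prevalence."""
    cluster_sizes = Counter(clusters)
    positives = Counter(c for t, c in zip(has_trait, clusters) if t)
    prevalence = [positives[c] / cluster_sizes[c] for c in cluster_sizes]
    return max(prevalence) - sum(prevalence) / len(prevalence)

# Toy data (hypothetical): 8 genes split evenly across two clusters.
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
seizures = [1, 1, 1, 1, 0, 0, 0, 0]   # present only in cluster A
hypotonia = [1, 0, 1, 0, 1, 0, 1, 0]  # same prevalence in both clusters

print(mutual_information(seizures, clusters))   # 1.0
print(mutual_information(hypotonia, clusters))  # 0.0
print(specificity(seizures, clusters))          # 0.5
```

A trait that perfectly separates two equally sized clusters carries one full bit of information, while a trait with identical prevalence in every cluster carries none.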
Total Traits
-
Phenotypes analyzed
Gene Clusters
-
Used for MI calculation
Significant Traits
-
p < 0.05
Max MI Score
-
Most discriminative
What you're seeing: Traits ranked by "Mutual Information" (MI)—a measure of how well each phenotype
distinguishes between gene clusters. Higher MI means knowing whether a gene has that phenotype tells you more about
which cluster it belongs to. "Specificity" shows the gap between the highest and average cluster prevalence.
What it means: High-MI traits are the most useful for distinguishing different gene groups.
If "Seizures" has high MI, genes in some clusters are much more likely to be associated with seizures
than genes in others, which makes it a discriminating feature.
Traits Ranked by Mutual Information
Showing all traits
| Rank | Trait | Mutual Information | Specificity | p-value | Best Cluster |
|---|---|---|---|---|---|
What you're seeing: Permutation testing checks whether each observed MI score is larger than
expected by chance. The volcano plot shows effect size against significance,
and the histogram compares the observed MI for a selected trait with the null distribution from 1000 permutations.
What it means: Traits in the upper-right of the volcano plot are both highly discriminative
and statistically significant. FDR correction adjusts the p-values for the number of traits tested.
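The permutation and FDR steps can be sketched as follows. This is an illustrative outline, not the page's actual pipeline: it assumes cluster labels are shuffled across genes to build the null distribution, that the p-value is the add-one fraction of null MI values at least as large as the observed one, and that FDR correction uses the Benjamini-Hochberg procedure. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from math import log2

def mutual_information(has_trait, clusters):
    """MI (in bits) between a binary trait indicator and cluster labels."""
    n = len(has_trait)
    joint = Counter(zip(has_trait, clusters))
    trait_counts = Counter(has_trait)
    cluster_counts = Counter(clusters)
    return sum(
        (k / n) * log2(k * n / (trait_counts[t] * cluster_counts[c]))
        for (t, c), k in joint.items()
    )

def permutation_test(has_trait, clusters, n_perm=1000, seed=0):
    """Return (observed MI, mean null MI, empirical p-value)."""
    rng = random.Random(seed)
    observed = mutual_information(has_trait, clusters)
    shuffled = list(clusters)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any trait-cluster association
        null.append(mutual_information(has_trait, shuffled))
    # Small tolerance guards against floating-point ties.
    exceed = sum(m >= observed - 1e-12 for m in null)
    p_value = (exceed + 1) / (n_perm + 1)  # add-one keeps p above zero
    return observed, sum(null) / n_perm, p_value

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy data (hypothetical): 20 genes, trait present only in cluster A.
clusters = ["A"] * 10 + ["B"] * 10
seizures = [1] * 10 + [0] * 10
observed, null_mean, p_value = permutation_test(seizures, clusters)
print(observed, null_mean, p_value)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

With 1000 permutations the smallest p-value this scheme can report is 1/1001, which is one reason the number of permutations limits how far up the volcano plot a trait can appear.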
Volcano Plot
Effect size vs -log10(FDR p-value). Upper-right = significant & large effect.
Null Distribution Comparison
Select a trait to see observed MI vs permutation null distribution
Permutation Test Results
| Trait | Observed MI | Null Mean | Effect Size | p-value | FDR p | Significant |
|---|---|---|---|---|---|---|
What you're seeing: Traits arranged in a circle, connected by lines when they frequently co-occur
in the same genes. Line thickness reflects correlation strength—thicker lines mean traits are more likely to appear
together. Node colors indicate the cluster where each trait is most prevalent.
What it means: Connected traits tend to co-occur as part of syndrome patterns. Clusters of connected
traits may represent biological modules—sets of features caused by related gene functions.
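One way to derive such a network is sketched below. The page does not state which correlation measure sets the line thickness, so this example assumes the phi coefficient (Pearson correlation for two binary variables) and an arbitrary edge threshold; the function names, the threshold, and the toy profiles are all hypothetical.

```python
from itertools import combinations
from math import sqrt

def phi_coefficient(x, y):
    """Phi coefficient between two equal-length 0/1 vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant vector has no defined correlation; treat it as 0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

def cooccurrence_edges(profiles, threshold=0.3):
    """Return (trait_a, trait_b, phi) for pairs with phi >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        phi = phi_coefficient(profiles[a], profiles[b])
        if phi >= threshold:
            edges.append((a, b, phi))
    return edges

# Toy profiles (hypothetical): one 0/1 entry per gene, same gene order.
profiles = {
    "Seizures":          [1, 1, 1, 0, 0, 0],
    "Sleep Disturbance": [1, 1, 1, 1, 0, 0],
    "Macrocephaly":      [0, 0, 0, 1, 1, 1],
}
edges = cooccurrence_edges(profiles)
for a, b, phi in edges:
    print(f"{a} -- {b}: phi = {phi:.3f}")
```

In this toy example only "Seizures" and "Sleep Disturbance" are linked, because "Macrocephaly" is negatively correlated with both and falls below the threshold.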
Trait Co-occurrence Network
Traits connected by co-occurrence in gene phenotype profiles
Cluster × Trait Prevalence
Phenotype prevalence within each gene cluster. Darker = higher prevalence.
What you're seeing: Each row is a gene cluster, each column is a phenotype. Color intensity
shows the prevalence, that is, the percentage of genes in that cluster that have that phenotype.
What it means: Look for columns with high variation between rows: these phenotypes differ dramatically
between clusters. Look for rows with distinct patterns: these clusters have unique phenotype profiles.
Uniform columns indicate phenotypes that are equally common across all clusters (not discriminative).
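The prevalence matrix behind a heatmap like this can be sketched as below, assuming a mapping from each gene to its cluster and another from each gene to its set of phenotypes. The gene names, data layout, and function name are hypothetical and only illustrate the calculation.

```python
from collections import defaultdict

def prevalence_matrix(gene_clusters, gene_traits):
    """Return {cluster: {trait: fraction of the cluster's genes with trait}}."""
    cluster_sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for gene, cluster in gene_clusters.items():
        cluster_sizes[cluster] += 1
        for trait in gene_traits.get(gene, ()):
            counts[cluster][trait] += 1
    traits = sorted({t for ts in gene_traits.values() for t in ts})
    return {
        c: {t: counts[c][t] / cluster_sizes[c] for t in traits}
        for c in sorted(cluster_sizes)
    }

# Toy input (hypothetical gene names and annotations).
gene_clusters = {"GENE1": 1, "GENE2": 1, "GENE3": 2, "GENE4": 2}
gene_traits = {
    "GENE1": {"Seizures", "Macrocephaly"},
    "GENE2": {"Seizures"},
    "GENE3": {"Macrocephaly"},
    "GENE4": set(),
}
matrix = prevalence_matrix(gene_clusters, gene_traits)

# Column spread: a large max-min gap marks a discriminative phenotype.
spread = {
    t: max(row[t] for row in matrix.values()) - min(row[t] for row in matrix.values())
    for t in matrix[1]
}
print(matrix)
print(spread)
```

Here "Seizures" varies from 100% to 0% between the two clusters, while "Macrocephaly" sits at 50% in both, so it forms a uniform column.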
Mutual Information vs Specificity
Traits with high MI and specificity are most discriminative. Size = overall prevalence.
What you're seeing: Each dot is a phenotype. Position shows MI (x-axis) and specificity (y-axis).
Dot size indicates how common the phenotype is overall. Colors show which cluster has the highest prevalence
for that trait.
What it means: The upper-right corner contains the "best" discriminating traits:
high MI (statistically useful) and high specificity (one cluster really stands out). Traits in the lower-left
have similar prevalence across all clusters and don't help distinguish gene groups.
What you're seeing: For each gene cluster, this shows the "signature" phenotypes, the traits that are
most characteristic of that cluster (highest prevalence in that cluster compared to others). The percentage
shows how many genes in that cluster have each trait.
What it means: These signatures help characterize what makes each cluster distinct.
If Cluster 1's signature includes "Macrocephaly" and "Sleep Disturbance,"
genes in that cluster tend to be associated with those features. This could inform clinical monitoring and management.
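A rough sketch of how signature phenotypes might be selected from a prevalence table follows. The exact ranking rule is not given here, so the example assumes a trait's score is its prevalence in the cluster minus its mean prevalence in the other clusters; the function name, the `top_n` parameter, and the numbers are hypothetical.

```python
def signature_traits(prevalence, top_n=3):
    """Return {cluster: [(trait, prevalence), ...]} for enriched traits."""
    signatures = {}
    for cluster, row in prevalence.items():
        others = [r for c, r in prevalence.items() if c != cluster]
        scored = []
        for trait, value in row.items():
            other_mean = sum(r[trait] for r in others) / len(others) if others else 0.0
            scored.append((value - other_mean, trait, value))
        # Highest enrichment first; trait name breaks ties deterministically.
        scored.sort(key=lambda s: (-s[0], s[1]))
        signatures[cluster] = [
            (trait, value) for enrichment, trait, value in scored[:top_n] if enrichment > 0
        ]
    return signatures

# Toy prevalence table (hypothetical values).
prevalence = {
    "Cluster 1": {"Macrocephaly": 0.8, "Sleep Disturbance": 0.6, "Seizures": 0.2},
    "Cluster 2": {"Macrocephaly": 0.1, "Sleep Disturbance": 0.5, "Seizures": 0.9},
}
signatures = signature_traits(prevalence, top_n=2)
for cluster, traits in signatures.items():
    print(cluster, [(t, f"{v:.0%}") for t, v in traits])
```

Scoring against the other clusters rather than by raw prevalence keeps a trait that is common everywhere from appearing in every cluster's signature.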