Trait Specificity Analysis

12 Subtypes (6+6)

Trait specificity across Bayesian-derived clusters

Research Use Only: Trait specificity measures statistical association between phenotypes and gene clusters. High specificity indicates that a trait helps distinguish clusters; it does not imply that the trait is unique to any one cluster.

About Mutual Information Analysis

Mutual Information (MI) quantifies how much knowing whether a gene has a phenotype reduces uncertainty about that gene's cluster membership. High-MI traits are the most discriminative for distinguishing gene clusters. Specificity is the gap between a trait's maximum and mean prevalence across clusters.
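
The two scores described above can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual implementation: it assumes each trait is a binary present/absent flag per gene, that MI is reported in bits, and that "mean prevalence" is the unweighted mean of the per-cluster prevalences. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(has_trait, clusters):
    """MI in bits between a binary trait flag and cluster labels (assumed units)."""
    n = len(has_trait)
    joint = Counter(zip(has_trait, clusters))   # counts of (trait, cluster) pairs
    trait_counts = Counter(has_trait)
    cluster_counts = Counter(clusters)
    mi = 0.0
    for (t, c), count in joint.items():
        p_tc = count / n
        p_t = trait_counts[t] / n
        p_c = cluster_counts[c] / n
        mi += p_tc * math.log2(p_tc / (p_t * p_c))
    return mi

def specificity(has_trait, clusters):
    """Maximum per-cluster prevalence minus the mean per-cluster prevalence."""
    sizes = Counter(clusters)
    hits = Counter(c for t, c in zip(has_trait, clusters) if t)
    prevalence = [hits[c] / sizes[c] for c in sizes]
    return max(prevalence) - sum(prevalence) / len(prevalence)

# Toy data (illustrative only): 8 genes in two clusters.
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
enriched = [1, 1, 1, 0, 0, 0, 0, 0]   # present in 3/4 of A, 0/4 of B
uniform = [1, 0, 1, 0, 1, 0, 1, 0]    # present in half of each cluster

mi_enriched = mutual_information(enriched, clusters)
mi_uniform = mutual_information(uniform, clusters)
spec_enriched = specificity(enriched, clusters)
spec_uniform = specificity(uniform, clusters)
```

In this toy case the cluster-enriched trait scores well above zero on both measures, while the trait that is equally common in both clusters scores zero on both.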

Total Traits

Phenotypes analyzed

Gene Clusters

Used for MI calculation

Significant Traits

p < 0.05

Max MI Score

Most discriminative

What you're seeing: Traits ranked by "Mutual Information" (MI), a measure of how well each phenotype distinguishes between gene clusters. Higher MI means that knowing whether a gene has that phenotype tells you more about which cluster it belongs to. "Specificity" shows the gap between the highest and the average cluster prevalence. What it means: High-MI traits are the most useful for distinguishing different gene groups. If "Seizures" has high MI, genes in some clusters are much more likely to be associated with seizures than genes in others, which makes it a discriminating feature.

Traits Ranked by Mutual Information

Showing all traits

Rank	Trait	Mutual Information	Specificity	p-value	Best Cluster

What you're seeing: Permutation testing checks whether each observed MI score is larger than would be expected by chance. The volcano plot shows effect size versus significance, and the histogram compares a trait's observed MI with the null distribution from 1,000 permutations. What it means: Traits in the upper-right of the volcano plot are both highly discriminative and statistically significant. FDR correction adjusts the p-values for the number of traits tested.
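
A permutation test of this kind can be sketched as follows. The sketch makes several assumptions that the page does not confirm: cluster labels are shuffled while trait flags stay fixed, the p-value uses the common add-one estimate, and FDR correction is Benjamini-Hochberg. The "Effect Size" column is not defined on this page, so it is not computed here. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(has_trait, clusters):
    """MI in bits between a binary trait flag and cluster labels."""
    n = len(has_trait)
    joint = Counter(zip(has_trait, clusters))
    pt, pc = Counter(has_trait), Counter(clusters)
    # p(t,c) / (p(t) p(c)) simplifies to k * n / (count_t * count_c)
    return sum((k / n) * math.log2(k * n / (pt[t] * pc[c]))
               for (t, c), k in joint.items())

def permutation_test(has_trait, clusters, n_perm=1000, seed=0):
    """Return (observed MI, null MI values, one-sided permutation p-value)."""
    rng = random.Random(seed)
    observed = mutual_information(has_trait, clusters)
    shuffled = list(clusters)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any trait-cluster association
        null.append(mutual_information(has_trait, shuffled))
    # Small tolerance so floating-point ties count as "at least as extreme".
    exceed = sum(m >= observed - 1e-12 for m in null)
    return observed, null, (exceed + 1) / (n_perm + 1)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy data (illustrative only): 20 genes, two clusters of 10.
clusters = ["A"] * 10 + ["B"] * 10
strong = [1] * 9 + [0] + [1] + [0] * 9   # 9/10 in A, 1/10 in B
flat = [1, 0] * 10                       # 5/10 in each cluster

obs_strong, null_strong, p_strong = permutation_test(strong, clusters)
obs_flat, null_flat, p_flat = permutation_test(flat, clusters)
fdr = benjamini_hochberg([p_strong, p_flat])
neg_log10_fdr = [-math.log10(p) for p in fdr]  # volcano plot y-axis values
```

The null list is what the histogram would draw for a selected trait, with the observed MI marked against it; the mean of that list corresponds to the "Null Mean" column.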

Volcano Plot

Effect size vs -log10(FDR p-value). Upper-right = significant & large effect.

Null Distribution Comparison

Select a trait to see observed MI vs permutation null distribution

Permutation Test Results

Trait	Observed MI	Null Mean	Effect Size	p-value	FDR p	Significant

What you're seeing: Traits arranged in a circle, connected by lines when they frequently co-occur in the same genes. Line thickness reflects correlation strength: thicker lines mean the traits are more likely to appear together. Node colors indicate the cluster where each trait is most prevalent. What it means: Connected traits tend to co-occur as part of syndrome patterns. Groups of connected traits may represent biological modules, that is, sets of features caused by related gene functions.

Trait Co-occurrence Network

Traits connected by co-occurrence in gene phenotype profiles
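
One way to derive such a network is to compute a pairwise correlation between binary trait profiles and keep only the pairs above a cutoff. The sketch below assumes the phi coefficient (Pearson correlation for two binary variables) as the correlation measure and a cutoff of 0.3; the page states neither, and the function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations

def phi_coefficient(x, y):
    """Phi coefficient between two equal-length binary vectors."""
    n = len(x)
    both = sum(1 for a, b in zip(x, y) if a and b)
    x_total = sum(1 for a in x if a)
    y_total = sum(1 for b in y if b)
    denom = math.sqrt(x_total * (n - x_total) * y_total * (n - y_total))
    if denom == 0:
        return 0.0  # a constant vector has no defined correlation
    return (n * both - x_total * y_total) / denom

def cooccurrence_edges(profiles, min_phi=0.3):
    """Return (trait_a, trait_b, phi) for pairs at or above the assumed cutoff."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        phi = phi_coefficient(profiles[a], profiles[b])
        if phi >= min_phi:
            edges.append((a, b, phi))
    return edges

# Toy profiles (illustrative only): one 0/1 flag per gene, 6 genes.
profiles = {
    "Seizures":          [1, 1, 1, 0, 0, 0],
    "Macrocephaly":      [1, 1, 0, 0, 0, 0],
    "Sleep Disturbance": [0, 0, 0, 1, 1, 0],
}
edges = cooccurrence_edges(profiles)
```

The third element of each edge is the value that line thickness would encode.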

Cluster × Trait Prevalence

Phenotype prevalence within each gene cluster. Darker = higher prevalence.

What you're seeing: Each row is a gene cluster, each column is a phenotype. Color intensity shows the prevalence—what percentage of genes in that cluster have that phenotype. What it means: Look for columns with high variation between rows—these phenotypes differ dramatically between clusters. Look for rows with distinct patterns—these clusters have unique phenotype profiles. Uniform columns indicate phenotypes that are equally common across all clusters (not discriminative).
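
The heatmap values can be reproduced with a simple tally. This is a sketch under assumptions: the input shapes (a gene-to-cluster mapping and a gene-to-trait-set mapping), the gene identifiers, and the use of max minus min as a measure of column variation are all hypothetical choices, not the page's documented method.

```python
from collections import defaultdict

def prevalence_matrix(gene_traits, gene_cluster):
    """Percentage of genes in each cluster that carry each trait."""
    sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for gene, cluster in gene_cluster.items():
        sizes[cluster] += 1
        for trait in gene_traits.get(gene, ()):
            counts[cluster][trait] += 1
    traits = sorted({t for ts in gene_traits.values() for t in ts})
    return {c: {t: 100.0 * counts[c][t] / sizes[c] for t in traits}
            for c in sizes}

def column_spread(matrix):
    """Max minus min prevalence per trait; a larger spread is more discriminative."""
    traits = next(iter(matrix.values())).keys()
    return {t: max(row[t] for row in matrix.values())
               - min(row[t] for row in matrix.values())
            for t in traits}

# Toy data (illustrative only): four placeholder genes in two clusters.
gene_cluster = {"g1": 1, "g2": 1, "g3": 2, "g4": 2}
gene_traits = {
    "g1": {"Seizures", "Macrocephaly"},
    "g2": {"Seizures"},
    "g3": {"Macrocephaly"},
    "g4": set(),
}
matrix = prevalence_matrix(gene_traits, gene_cluster)
spread = column_spread(matrix)
```

Here "Seizures" varies from 100% to 0% between the two rows, while "Macrocephaly" is 50% in both, which is the uniform-column case described above.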

Mutual Information vs Specificity

Traits with high MI and specificity are most discriminative. Size = overall prevalence.

What you're seeing: Each dot is a phenotype. Position shows MI (x-axis) and specificity (y-axis). Dot size indicates how common the phenotype is overall. Colors show which cluster has the highest prevalence for that trait. What it means: The upper-right corner contains the "best" discriminating traits, with high MI (informative about cluster membership) and high specificity (one cluster really stands out). Traits in the lower-left have similar prevalence in every cluster and don't help distinguish gene groups.

What you're seeing: For each gene cluster, this shows the "signature" phenotypes: the traits most characteristic of that cluster (highest prevalence in that cluster compared to the others). The percentage shows how many genes in that cluster have each trait. What it means: These signatures help characterize what makes each cluster distinct. If Cluster 1's signature includes "Macrocephaly" and "Sleep Disturbance," genes in that cluster tend to be associated with those features. This could inform clinical monitoring and management.
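
Signature selection can be sketched from a cluster-by-trait prevalence table. The ranking rule used here, prevalence in the cluster minus the mean prevalence in the other clusters, is an assumption, as are the function name, the `top_k` parameter, and the toy percentages; the page only says signatures have the highest prevalence in that cluster compared to others.

```python
def signature_traits(prevalence, top_k=2):
    """Top traits per cluster, ranked by excess prevalence over other clusters.

    prevalence: {cluster: {trait: percent}}; needs at least two clusters.
    Returns {cluster: [(trait, percent_in_cluster), ...]}.
    """
    clusters = list(prevalence)
    signatures = {}
    for c in clusters:
        others = [o for o in clusters if o != c]

        def excess(trait):
            # How far this cluster sits above the average of the other clusters.
            rest = sum(prevalence[o][trait] for o in others) / len(others)
            return prevalence[c][trait] - rest

        ranked = sorted(prevalence[c], key=excess, reverse=True)
        # Keep only traits that are actually enriched in this cluster.
        signatures[c] = [(t, prevalence[c][t])
                         for t in ranked[:top_k] if excess(t) > 0]
    return signatures

# Toy prevalence table (illustrative values only, not results).
prevalence = {
    "Cluster 1": {"Macrocephaly": 80.0, "Sleep Disturbance": 60.0, "Seizures": 30.0},
    "Cluster 2": {"Macrocephaly": 10.0, "Sleep Disturbance": 20.0, "Seizures": 70.0},
}
signatures = signature_traits(prevalence)
```

The percentage reported next to each signature trait is the within-cluster prevalence, not the excess used for ranking.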