Subtype Profiles

12 Subtypes (6 major + 6 minor): Bayesian GMM with Dirichlet Process Prior

About This Analysis

The 12 ASD subtypes (6 major + 6 minor) represent distinct phenotype profiles identified with a Bayesian Gaussian Mixture Model (GMM) analysis that includes a publication bias correction. Major clusters (mixture weight >4%) include Pure ID, Full Syndrome, and GDD Predominant, among others. Minor clusters capture rarer but clinically distinct presentations such as Infantile Epilepsy and Vision+Feeding+Heart.
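The major/minor split described above amounts to a threshold on the fitted cluster weights. The following is a minimal sketch of that step only, not the analysis pipeline itself. `split_by_weight` is a hypothetical helper, and the weights are invented for illustration (the cluster names are the ones mentioned in the text). If the model were fitted with scikit-learn's `BayesianGaussianMixture` using `weight_concentration_prior_type="dirichlet_process"`, the weights would come from its `weights_` attribute, but the page does not say which library was used.

```python
def split_by_weight(weights, threshold=0.04):
    """Partition clusters into major (weight > threshold) and minor (the rest)."""
    major = {name: w for name, w in weights.items() if w > threshold}
    minor = {name: w for name, w in weights.items() if w <= threshold}
    return major, minor


# Illustrative weights for a subset of clusters (so they need not sum to 1).
# These are NOT results from the analysis.
example_weights = {
    "Pure ID": 0.30,
    "Full Syndrome": 0.25,
    "GDD Predominant": 0.20,
    "Infantile Epilepsy": 0.03,
    "Vision+Feeding+Heart": 0.02,
}

major, minor = split_by_weight(example_weights)
print("major:", sorted(major))
print("minor:", sorted(minor))
```

A Dirichlet process prior tends to push the weights of unneeded components toward zero, which is why a weight cutoff is a natural way to separate well-supported clusters from rare ones.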

Subtypes Identified: - (gene phenotype clusters)
Avg Genes/Subtype: - (mean cluster size)
Distinct Profiles: - (unique phenotype signatures)
Key Discriminators: - (top differentiating traits)
Cluster Phenotype Profiles
Trait frequencies by cluster (select clusters to compare)
What you're seeing: A radar chart showing the relative prevalence of key phenotypes within each cluster. Each axis represents a phenotype; distance from center indicates frequency in that cluster.
What it means: Clusters with different shapes have distinct phenotype signatures. Compare clusters by toggling them on/off to see what distinguishes each subtype.
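The per-cluster trait frequencies plotted on the radar axes can be computed along these lines. This is a sketch under assumptions: each gene is taken to carry a set of binary phenotype annotations and a single cluster label, and the gene names, traits, and the `trait_frequencies` helper are hypothetical.

```python
from collections import defaultdict


def trait_frequencies(gene_traits, gene_cluster):
    """Return {cluster: {trait: fraction of the cluster's genes with that trait}}."""
    counts = defaultdict(lambda: defaultdict(int))
    sizes = defaultdict(int)
    for gene, cluster in gene_cluster.items():
        sizes[cluster] += 1
        for trait in gene_traits.get(gene, ()):
            counts[cluster][trait] += 1
    return {
        cluster: {trait: n / sizes[cluster] for trait, n in counts[cluster].items()}
        for cluster in sizes
    }


# Hypothetical toy annotations, not data from the analysis.
gene_traits = {
    "GENE_A": {"intellectual disability", "epilepsy"},
    "GENE_B": {"intellectual disability"},
    "GENE_C": {"epilepsy", "feeding difficulty"},
    "GENE_D": {"feeding difficulty"},
}
gene_cluster = {"GENE_A": 0, "GENE_B": 0, "GENE_C": 1, "GENE_D": 1}

freqs = trait_frequencies(gene_traits, gene_cluster)
for cluster, row in sorted(freqs.items()):
    print(cluster, row)
```

Each inner dictionary corresponds to one polygon on the chart: the trait is the axis and the fraction is the distance from the center.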
Phenotype Flow by Category
How clusters connect to phenotype categories and specific traits
What you're seeing: A three-level Sankey diagram showing the flow from Clusters → Categories → Traits. Flow width represents the weighted contribution of each connection.
What it means: Follow the flows to see which phenotype categories and specific traits characterize each cluster. Wider flows indicate stronger associations.
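One plausible way to derive the two layers of flows is to sum trait prevalences along each path. The page does not state the actual weighting scheme, so the summation below is an assumption, and `sankey_links`, the cluster labels, and the trait-to-category mapping are hypothetical.

```python
from collections import defaultdict


def sankey_links(prevalence, trait_category):
    """Build (source, target) -> width maps for Cluster->Category and Category->Trait."""
    cluster_to_category = defaultdict(float)
    category_to_trait = defaultdict(float)
    for cluster, traits in prevalence.items():
        for trait, value in traits.items():
            category = trait_category[trait]
            # Assumed weighting: a flow's width is the summed prevalence it carries.
            cluster_to_category[(cluster, category)] += value
            category_to_trait[(category, trait)] += value
    return dict(cluster_to_category), dict(category_to_trait)


# Hypothetical prevalence values and category mapping.
prevalence = {
    "Cluster 1": {"seizures": 0.5, "hypotonia": 0.25},
    "Cluster 2": {"seizures": 0.25, "feeding difficulty": 0.75},
}
trait_category = {
    "seizures": "Neurological",
    "hypotonia": "Neurological",
    "feeding difficulty": "Gastrointestinal",
}

first_layer, second_layer = sankey_links(prevalence, trait_category)
print(first_layer)
print(second_layer)
```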
Phenotype Hierarchy Sunburst
Explore Cluster → Category → Trait relationships. Click a segment to drill down; click the center to go back.
What you're seeing: A hierarchical sunburst showing the nested structure: Clusters (inner ring) → Categories (middle) → Traits (outer). Arc size reflects prevalence values.
What it means: This visualization reveals the hierarchical composition of each cluster. Click segments to drill down. Use scroll wheel or buttons to zoom.

Discriminating Traits

Traits with highest variance across clusters
Trait Category
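Ranking traits by their variance across clusters can be sketched as follows. This assumes the input is a table of per-cluster trait frequencies and uses the population variance; whether the dashboard uses population or sample variance is not stated. The trait names, values, and `rank_discriminating_traits` helper are hypothetical.

```python
from statistics import pvariance


def rank_discriminating_traits(prevalence, top_n=3):
    """Rank traits by the variance of their frequency across clusters (highest first)."""
    traits = sorted({trait for row in prevalence.values() for trait in row})
    variances = {
        # A trait absent from a cluster is treated as frequency 0.0.
        trait: pvariance([row.get(trait, 0.0) for row in prevalence.values()])
        for trait in traits
    }
    return sorted(variances.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical frequencies for three clusters.
prevalence = {
    "Cluster 1": {"trait A": 1.0, "trait B": 0.5, "trait C": 0.75},
    "Cluster 2": {"trait A": 0.0, "trait B": 0.5, "trait C": 0.25},
    "Cluster 3": {"trait A": 0.5, "trait B": 0.5, "trait C": 0.5},
}

ranked = rank_discriminating_traits(prevalence)
for trait, variance in ranked:
    print(f"{trait}: {variance:.4f}")
```

A trait that is equally common in every cluster has zero variance and therefore cannot help tell the subtypes apart, which is why it falls to the bottom of the ranking.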
Latent Class Analysis (LCA)
Soft clustering with probabilistic class membership using Gaussian Mixture Models
What you're seeing: Unlike hard clustering, LCA assigns each gene a probability of belonging to each class. Genes whose class probabilities have high entropy (high assignment uncertainty) may represent intermediate phenotypes or mixed subtypes.
What it means: This approach reveals genes that don't fit neatly into discrete categories, potentially identifying novel subtype boundaries or transitional phenotypes.
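The soft assignment and entropy described above can be illustrated with a deliberately simplified one-dimensional Gaussian mixture. The real analysis presumably works on multi-dimensional phenotype vectors, so treat this as a sketch of the idea only. All parameter values and function names here are assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def class_probabilities(x, weights, means, sds):
    """Posterior probability of each mixture component given observation x."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector; 0 means full certainty."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical two-class mixture with well-separated means.
weights, means, sds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

clear = class_probabilities(0.0, weights, means, sds)      # sits on class 0
ambiguous = class_probabilities(2.0, weights, means, sds)  # halfway between classes

print(clear, entropy(clear))
print(ambiguous, entropy(ambiguous))
```

The observation halfway between the two class means gets equal probability for both classes and the maximum possible entropy for two classes (ln 2), which is the pattern that flags a gene as intermediate or mixed.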
Optimal Classes: - (by BIC criterion)
Agreement Rate: - (vs. hard clusters)
Mean Max Probability: - (classification confidence)
Uncertain Genes: - (max probability < 0.5)
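The agreement rate, mean max probability, and uncertain-gene count shown above could be derived from the per-gene class probabilities roughly as follows. This sketch assumes the soft class indices have already been aligned with the hard cluster labels (mixture models can number their classes arbitrarily). The data and `summarize_soft_assignments` helper are hypothetical.

```python
def summarize_soft_assignments(probabilities, hard_labels, threshold=0.5):
    """Summarize soft class memberships against hard cluster labels."""
    n = len(probabilities)
    max_probs = [max(p) for p in probabilities]
    # Assigned class = index of the largest probability (first index on ties).
    soft_labels = [p.index(max(p)) for p in probabilities]
    agreement = sum(s == h for s, h in zip(soft_labels, hard_labels)) / n
    return {
        "agreement_rate": agreement,
        "mean_max_probability": sum(max_probs) / n,
        "uncertain_genes": sum(mp < threshold for mp in max_probs),
    }


# Hypothetical class probabilities for four genes and three classes.
probabilities = [
    [0.75, 0.25, 0.0],
    [0.25, 0.5, 0.25],
    [0.375, 0.375, 0.25],
    [0.0, 0.0, 1.0],
]
hard_labels = [0, 1, 1, 2]

summary = summarize_soft_assignments(probabilities, hard_labels)
print(summary)
```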
Model Selection (BIC/AIC)
Lower is better; BIC penalizes model complexity more heavily than AIC
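For reference, the standard definitions are AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the number of observations, and L the maximized likelihood. The sketch below applies them to invented log-likelihoods. The candidate values and sample size are assumptions, and the parameter counts follow a one-dimensional mixture (3 per class minus 1) purely for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical fits: number of classes -> (log-likelihood, free parameters).
candidates = {2: (-520.0, 5), 3: (-500.0, 8), 4: (-498.0, 11)}
n_samples = 200

bic_scores = {k: bic(ll, p, n_samples) for k, (ll, p) in candidates.items()}
aic_scores = {k: aic(ll, p) for k, (ll, p) in candidates.items()}
best_k = min(bic_scores, key=bic_scores.get)

print("BIC:", bic_scores)
print("AIC:", aic_scores)
print("Optimal classes by BIC:", best_k)
```

Because ln(n) exceeds 2 once n is 8 or more, each extra parameter costs more under BIC than under AIC, so BIC tends to select fewer classes.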
Class Size Distribution
Number of genes assigned to each latent class
Gene Classification Confidence
Maximum class probability per gene (sorted by confidence)
Hard Clusters vs Soft Classes
Sankey flow showing how genes in hard clusters map to soft class assignments
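The flows in such a diagram are a cross-tabulation of the two label sets: each (hard cluster, soft class) pair becomes one link whose width is the number of genes sharing that pair. This is a minimal sketch with hypothetical labels. `hard_to_soft_flows` is not a function from the dashboard.

```python
from collections import Counter


def hard_to_soft_flows(hard_labels, soft_labels):
    """Count genes for every (hard cluster, soft class) combination."""
    return Counter(zip(hard_labels, soft_labels))


# Hypothetical assignments for five genes.
hard = ["H1", "H1", "H1", "H2", "H2"]
soft = ["S1", "S1", "S2", "S2", "S2"]

flows = hard_to_soft_flows(hard, soft)
for (source, target), count in sorted(flows.items()):
    print(f"{source} -> {target}: {count}")
```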

Gene Class Assignments

-
Gene | Assigned Class | Max Probability | Entropy | Hard Cluster | Status