Subtype Profiles

12 Subtypes (6+6) Bayesian GMM with Dirichlet Process Prior

About This Analysis

The 12 ASD subtypes (6 major and 6 minor) are distinct phenotype profiles identified by a Bayesian Gaussian Mixture Model with a Dirichlet process prior, with correction for publication bias. Major clusters (weight above 4%) include Pure ID, Full Syndrome, and GDD Predominant, among others. Minor clusters capture rarer but clinically distinct presentations such as Infantile Epilepsy and Vision+Feeding+Heart.
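The major/minor split can be illustrated as a simple threshold on cluster weights. This is a minimal sketch, not the analysis pipeline itself: the cluster names and weights below are made-up placeholders, and only the 4% cutoff comes from the text above.

```python
# Only the 4% cutoff is taken from the text; everything else is illustrative.
MAJOR_WEIGHT_THRESHOLD = 0.04


def split_by_weight(weights, threshold=MAJOR_WEIGHT_THRESHOLD):
    """Partition cluster names into (major, minor) lists by weight.

    Each list is sorted from the heaviest cluster to the lightest.
    """
    by_weight = sorted(weights, key=lambda name: weights[name], reverse=True)
    major = [name for name in by_weight if weights[name] > threshold]
    minor = [name for name in by_weight if weights[name] <= threshold]
    return major, minor


# Hypothetical weights that sum to 1.0 (not the fitted values).
example_weights = {
    "cluster_a": 0.40,
    "cluster_b": 0.35,
    "cluster_c": 0.20,
    "cluster_d": 0.03,
    "cluster_e": 0.02,
}

major, minor = split_by_weight(example_weights)
print("major:", major)
print("minor:", minor)
```

With a Dirichlet process prior, unused components tend to receive weights near zero, so a weight threshold like this one is a natural way to separate well-supported clusters from rare ones.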

Subtypes Identified

Gene phenotype clusters

Avg Genes/Subtype

Mean cluster size

Distinct Profiles

Unique phenotype signatures

Key Discriminators

Top differentiating traits

Cluster Phenotype Profiles

Trait frequencies by cluster (select clusters to compare)

What you're seeing: A radar chart showing the relative prevalence of key phenotypes within each cluster. Each axis represents a phenotype; distance from center indicates frequency in that cluster.
What it means: Clusters with different shapes have distinct phenotype signatures. Compare clusters by toggling them on/off to see what distinguishes each subtype.

Cluster Phenotype Signature Matrix

Top discriminating traits (color = frequency in cluster)

What you're seeing: A heatmap showing trait frequencies across clusters. Darker blue cells indicate higher prevalence of that trait in that cluster.
What it means: Traits whose frequency varies strongly from cluster to cluster are key discriminators. Look for patterns of high and low values to understand each cluster's phenotype profile.

Phenotype Flow by Category

How clusters connect to phenotype categories and specific traits

What you're seeing: A three-level Sankey diagram showing the flow from Clusters → Categories → Traits. Flow width represents the weighted contribution of each connection.
What it means: Follow the flows to see which phenotype categories and specific traits characterize each cluster. Wider flows indicate stronger associations.

Phenotype Hierarchy Sunburst

Explore Cluster → Category → Trait relationships. Click a segment to drill down; click the center to go back.

What you're seeing: A hierarchical sunburst showing the nested structure: Clusters (inner ring) → Categories (middle) → Traits (outer). Arc size reflects prevalence values.
What it means: This visualization reveals the hierarchical composition of each cluster. Click segments to drill down. Use scroll wheel or buttons to zoom.

Discriminating Traits

Traits with highest variance across clusters

Trait	Category
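Ranking traits by their variance across clusters can be sketched as follows. The trait names and frequencies are hypothetical, and the use of unweighted population variance is an assumption; the dashboard may weight clusters or normalize frequencies differently.

```python
from statistics import pvariance


def rank_discriminating_traits(freqs, top_n=3):
    """Rank traits by the variance of their frequency across clusters.

    freqs maps a trait name to a list of per-cluster frequencies (0 to 1).
    Returns up to top_n (trait, variance) pairs, highest variance first.
    """
    scored = [(trait, pvariance(values)) for trait, values in freqs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical frequencies for three traits in three clusters.
example_freqs = {
    "trait_x": [0.9, 0.1, 0.5],  # differs a lot between clusters
    "trait_y": [0.5, 0.5, 0.5],  # identical everywhere, so not informative
    "trait_z": [0.6, 0.4, 0.5],  # differs slightly
}

ranked = rank_discriminating_traits(example_freqs)
for trait, variance in ranked:
    print(f"{trait}: {variance:.4f}")
```

A trait with the same frequency in every cluster has zero variance and falls to the bottom of the ranking, which matches the intuition that it cannot distinguish subtypes.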

Latent Class Analysis (LCA)

Soft clustering with probabilistic class membership using Gaussian Mixture Models

What you're seeing: Unlike hard clustering, LCA assigns each gene a probability of belonging to each class. Genes with high uncertainty (entropy) may represent intermediate phenotypes or mixed subtypes.
What it means: This approach reveals genes that don't fit neatly into discrete categories, potentially identifying novel subtype boundaries or transitional phenotypes.
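The per-gene confidence measures can be sketched from a single vector of class probabilities. The probabilities below are invented for illustration, the entropy is computed in nats (the base used by the dashboard is not stated), and the 0.5 cutoff mirrors the "Max prob < 0.5" label on this page.

```python
import math

UNCERTAIN_THRESHOLD = 0.5  # mirrors the "Max prob < 0.5" label


def summarize_membership(probs):
    """Summarize one gene's soft class membership.

    probs is a list of class probabilities that sum to 1.
    Entropy is Shannon entropy in nats (an assumption about the base).
    """
    max_prob = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return {
        "assigned_class": probs.index(max_prob),
        "max_prob": max_prob,
        "entropy": entropy,
        "uncertain": max_prob < UNCERTAIN_THRESHOLD,
    }


# Hypothetical genes: one clearly assigned, one spread across classes.
confident = summarize_membership([0.90, 0.05, 0.05])
mixed = summarize_membership([0.40, 0.35, 0.25])

print(confident)
print(mixed)
```

Entropy is zero when all probability sits on one class and reaches its maximum, log(K) for K classes, when the probabilities are uniform, so it complements the maximum probability as a measure of how mixed a gene's membership is.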

Optimal Classes

By BIC criterion

Agreement Rate

vs hard clusters

Mean Max Probability

Classification confidence

Uncertain Genes

Max prob < 0.5

Model Selection (BIC/AIC)

Lower is better; BIC penalizes model complexity more strongly than AIC
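The two criteria use the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of free parameters, n the number of samples, and L the maximized likelihood. The sketch below applies them to hypothetical log-likelihoods and parameter counts; none of the numbers come from this analysis.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical fits: number of classes -> (log-likelihood, free parameters).
candidates = {
    2: (-1200.0, 10),
    3: (-1150.0, 15),
    4: (-1143.0, 20),
}
n_samples = 500  # hypothetical number of genes

best_by_bic = min(candidates, key=lambda k: bic(*candidates[k], n_samples))
best_by_aic = min(candidates, key=lambda k: aic(*candidates[k]))

print("best by BIC:", best_by_bic)
print("best by AIC:", best_by_aic)
```

In this made-up example AIC prefers four classes while BIC prefers three, because each extra parameter costs ln(500), about 6.2, under BIC but only 2 under AIC.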

Class Size Distribution

Number of genes assigned to each latent class

Gene Classification Confidence

Maximum class probability per gene (sorted by confidence)

Hard Clusters vs Soft Classes

Sankey flow showing how genes in hard clusters map to soft class assignments
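The flows in such a diagram amount to a cross-tabulation of hard cluster labels against soft class labels. The sketch below builds those counts and derives an agreement rate by mapping each soft class to the hard cluster it overlaps most. The labels are hypothetical, and this majority-overlap mapping is an assumption; the dashboard's agreement rate may be computed differently.

```python
from collections import Counter


def flow_counts(hard, soft):
    """Count genes for each (hard cluster, soft class) pair."""
    return Counter(zip(hard, soft))


def agreement_rate(hard, soft):
    """Fraction of genes whose soft class maps back to their hard cluster.

    Each soft class is mapped to the hard cluster it shares the most
    genes with, so arbitrary label numbering does not matter.
    """
    best = {}  # soft class -> (hard cluster, shared gene count)
    for (h, s), count in flow_counts(hard, soft).items():
        if s not in best or count > best[s][1]:
            best[s] = (h, count)
    matched = sum(count for _, count in best.values())
    return matched / len(hard)


# Hypothetical assignments for six genes.
hard_labels = ["A", "A", "A", "B", "B", "C"]
soft_labels = [0, 0, 1, 1, 1, 2]

flows = flow_counts(hard_labels, soft_labels)
rate = agreement_rate(hard_labels, soft_labels)

for (h, s), count in sorted(flows.items()):
    print(f"hard {h} -> soft {s}: {count}")
print(f"agreement rate: {rate:.3f}")
```

Each printed pair corresponds to one band in the Sankey diagram, with the count giving the band width.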

Gene Class Assignments

Gene	Assigned Class	Max Probability	Entropy	Hard Cluster	Status