Gene Clusters
✓ Strong Evidence
Perfect assignment confidence (1.000) • Zero entropy • 12 distinct subtypes
About This Analysis
Genes are grouped into 12 phenotype-based clusters (6 major + 6 minor) using a Bayesian Gaussian mixture model with a Dirichlet process prior. The model infers the number of clusters from the data rather than requiring it up front, and square-root weighting reduces the influence of publication bias. The UMAP projection shows gene relationships in 2D, where nearby genes have similar phenotype profiles. Click any point to view the gene's detailed report.
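The page does not spell out how the square-root weighting is applied. One plausible reading, sketched here as an assumption, is that each gene's weight is proportional to the square root of its paper count instead of the raw count, so heavily studied genes dominate less. The function name `sqrt_weights`, the gene names, and the paper counts are hypothetical; only the 5-paper minimum comes from this page.

```python
import math

MIN_PAPERS = 5  # the page reports "5+ papers per gene"


def sqrt_weights(paper_counts):
    """Return normalized sqrt(paper count) weights for genes with enough papers.

    Assumption: "sqrt weighting" means weight proportional to sqrt(papers).
    """
    # Drop genes below the minimum paper count.
    kept = {g: n for g, n in paper_counts.items() if n >= MIN_PAPERS}
    # Square-root damping: a 4x gap in papers becomes a 2x gap in weight.
    raw = {g: math.sqrt(n) for g, n in kept.items()}
    total = sum(raw.values())
    return {g: w / total for g, w in raw.items()}


# Hypothetical paper counts, for illustration only.
counts = {"GENE_A": 100, "GENE_B": 25, "GENE_C": 4}
weights = sqrt_weights(counts)
for gene, w in weights.items():
    print(f"{gene}: {w:.3f}")
```

In this toy input, GENE_A has four times as many papers as GENE_B but only twice the weight, and GENE_C is excluded for having fewer than 5 papers.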
Genes Clustered
-
5+ papers per gene
Subtypes Found
-
Bayesian GMM (6+6)
Assignment Confidence
-
Mean probability
Entropy
-
Assignment uncertainty
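As a rough illustration of the two summary statistics above, the sketch below computes a mean assignment probability and a mean entropy from a matrix of per-gene cluster probabilities. It assumes that "Assignment Confidence" is the mean of each gene's highest cluster probability and that "Entropy" is the mean Shannon entropy (in nats) of those probabilities; the page does not confirm either definition, and the function name and sample probabilities are hypothetical.

```python
import math


def assignment_stats(responsibilities):
    """Return (mean max probability, mean Shannon entropy in nats)."""
    confidences = []
    entropies = []
    for row in responsibilities:
        confidences.append(max(row))
        # 0 * log(0) is taken as 0, so zero entries are skipped.
        entropies.append(sum(-p * math.log(p) for p in row if p > 0))
    n = len(responsibilities)
    return sum(confidences) / n, sum(entropies) / n


# Hypothetical probabilities over 3 clusters (the page's model has 12).
hard = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # every gene fully assigned
soft = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]]  # first gene is ambiguous

hard_conf, hard_entropy = assignment_stats(hard)
soft_conf, soft_entropy = assignment_stats(soft)
print(hard_conf, hard_entropy)
print(soft_conf, soft_entropy)
```

Under these definitions, a confidence of 1.000 with zero entropy corresponds to the `hard` case, where every gene places all of its probability on a single cluster.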
Gene Cluster Map (UMAP Projection)
Each point is a gene, colored by cluster assignment. Genes with similar phenotype profiles are positioned closer together.
What you're seeing: Each dot represents a gene, positioned by its cluster's phenotype profile.
The 12 clusters are spatially separated: 6 major clusters (larger dots) form the core, while 6 minor clusters occupy satellite positions.
Labels show the primary phenotype characteristic of each cluster.
What it means: Genes that cluster together share similar phenotype patterns. Spatial proximity between clusters indicates related phenotype profiles (e.g., seizure-related clusters are positioned near each other).
Cluster Phenotype Flow
How genes flow into clusters and map to their top phenotype traits
What you're seeing: A Sankey diagram showing the flow from gene clusters (left) to their most characteristic phenotypes (right).
The width of each flow represents the number of genes with that trait.
What it means: Each cluster has a distinct phenotype signature. Wider flows indicate phenotypes that are more prevalent within that cluster.
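The flow widths described above can be thought of as counts of genes per (cluster, trait) pair. The following is a minimal sketch of that tally, not the page's actual implementation; the function name `sankey_flows` and the gene, cluster, and trait labels are hypothetical.

```python
from collections import Counter


def sankey_flows(gene_clusters, gene_traits):
    """Count genes per (cluster, trait) pair; each count is one flow width."""
    flows = Counter()
    for gene, cluster in gene_clusters.items():
        # set() ensures a gene counts once per trait even if listed twice.
        for trait in set(gene_traits.get(gene, [])):
            flows[(cluster, trait)] += 1
    return flows


# Hypothetical assignments and traits, for illustration only.
gene_clusters = {"GENE_A": "cluster_1", "GENE_B": "cluster_1", "GENE_C": "cluster_2"}
gene_traits = {
    "GENE_A": ["trait_x", "trait_y"],
    "GENE_B": ["trait_x"],
    "GENE_C": ["trait_y", "trait_y"],
}

flows = sankey_flows(gene_clusters, gene_traits)
for (cluster, trait), width in sorted(flows.items()):
    print(f"{cluster} -> {trait}: {width}")
```

Here the `cluster_1 -> trait_x` flow would be drawn twice as wide as the others because two genes in that cluster carry the trait.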
Cluster Weight Distribution
Bayesian mixture weights (6 Major >4%, 6 Minor <4%)
What you're seeing: The weight of each cluster in the Bayesian mixture model.
The red dashed line marks the 4% threshold separating major from minor clusters.
What it means: Major clusters (green) represent more prevalent phenotype patterns in the literature, while minor clusters (gray) capture rarer but distinct subtypes.
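The major/minor split can be sketched as a simple threshold on the mixture weights. The 4% cutoff comes from this page; the function name, the example weights, and the choice to treat a weight of exactly 4% as minor are assumptions made for illustration.

```python
MAJOR_THRESHOLD = 0.04  # the 4% line shown on the chart


def split_clusters(weights):
    """Return (major, minor) lists of cluster indices by mixture weight."""
    major = [i for i, w in enumerate(weights) if w > MAJOR_THRESHOLD]
    # Assumption: a weight of exactly 4% falls on the minor side.
    minor = [i for i, w in enumerate(weights) if w <= MAJOR_THRESHOLD]
    return major, minor


# Hypothetical mixture weights that sum to 1, for illustration only.
example_weights = [0.30, 0.25, 0.20, 0.15, 0.03, 0.03, 0.02, 0.02]
major, minor = split_clusters(example_weights)
print("major:", major)
print("minor:", minor)
```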
Subtype Gene Counts
Number of genes per subtype (12 clusters)
What you're seeing: The number of genes assigned to each of the 12 autism spectrum disorder (ASD) subtypes.
What it means: Pure ID has the most genes (60), while the minor clusters contain 7-8 genes each. Assignment confidence is 1.000 (perfect) with zero entropy.
Genes by Cluster
| Gene | Cluster | Papers | Traits |
|---|---|---|---|