Gene Clusters

✓ Strong Evidence: perfect assignment confidence (1.000) • zero entropy • 12 distinct subtypes

About This Analysis

Genes are grouped into 12 phenotype-based clusters (6 major + 6 minor) using a Bayesian Gaussian mixture model with a Dirichlet process prior. The prior lets the model infer the number of clusters from the data instead of requiring a fixed count in advance, and square-root weighting reduces the influence of publication bias. The UMAP projection shows gene relationships in 2D space, where nearby genes share similar phenotype profiles. Click any point to view the gene's detailed report.
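To illustrate the square-root weighting mentioned above, here is a minimal sketch. This page does not say exactly which quantity is square-rooted, so the sketch assumes it is the per-gene count of papers mentioning each trait. The function name `sqrt_weighted_profile` and the example counts are hypothetical.

```python
import math

def sqrt_weighted_profile(trait_counts):
    """Turn raw per-trait mention counts into a normalized profile.

    Assumption: square-root weighting is applied to mention counts, so a
    trait reported 100 times weighs 10x (not 100x) a trait reported once.
    """
    damped = {trait: math.sqrt(n) for trait, n in trait_counts.items()}
    total = sum(damped.values())
    if total == 0:
        # No mentions at all: return an all-zero profile instead of dividing by zero.
        return {trait: 0.0 for trait in damped}
    return {trait: value / total for trait, value in damped.items()}

# Hypothetical counts for one heavily studied gene.
profile = sqrt_weighted_profile(
    {"seizures": 100, "intellectual disability": 4, "macrocephaly": 1}
)
```

With raw counts, "seizures" would take 100/105 of the profile. After square-root weighting it takes 10/13, so the less frequently reported traits still contribute to the gene's position.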

Genes Clustered

5+ papers per gene

Subtypes Found

Bayesian GMM (6+6)

Assignment Confidence

Mean probability

Entropy

Assignment uncertainty
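As a rough sketch of how the two summary figures above could be derived from a mixture model's posterior probabilities: the function below takes one row of cluster probabilities per gene and returns the mean of the largest probability and the mean Shannon entropy. The function name, the use of natural logarithms, and the example rows are assumptions; the page does not state the log base or whether entropy is normalized.

```python
import math

def assignment_summary(responsibilities):
    """Return (mean max probability, mean Shannon entropy in nats).

    `responsibilities` holds one row per gene; each row lists the posterior
    probability of that gene belonging to each cluster and sums to 1.
    """
    n = len(responsibilities)
    mean_confidence = sum(max(row) for row in responsibilities) / n
    mean_entropy = sum(
        sum(-p * math.log(p) for p in row if p > 0)  # skip p == 0 terms
        for row in responsibilities
    ) / n
    return mean_confidence, mean_entropy

# Hypothetical posteriors: every gene fully assigned vs. one ambiguous gene.
hard = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
soft = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]]
hard_conf, hard_ent = assignment_summary(hard)
soft_conf, soft_ent = assignment_summary(soft)
```

When every gene has probability 1 for a single cluster, as in `hard`, confidence is 1.0 and entropy is 0, which matches the figures reported on this page. Any gene split between clusters, as in `soft`, lowers confidence and raises entropy.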

Gene Cluster Map (UMAP Projection)

Each point is a gene, colored by cluster assignment. Genes with similar phenotype profiles are positioned closer together.

What you're seeing: Each dot represents a gene, positioned by its cluster's phenotype profile. The 12 clusters are spatially separated: 6 major clusters (larger dots) form the core, while 6 minor clusters occupy satellite positions. Labels show the primary phenotype characteristic of each cluster.
What it means: Genes that cluster together share similar phenotype patterns. Spatial proximity between clusters indicates related phenotype profiles (e.g., seizure-related clusters are positioned near each other).

Cluster Phenotype Flow

How genes flow into clusters and map to their top phenotype traits

What you're seeing: A Sankey diagram showing the flow from gene clusters (left) to their most characteristic phenotypes (right). The width of each flow represents the number of genes with that trait.
What it means: Each cluster has a distinct phenotype signature. Wider flows indicate phenotypes that are more prevalent within that cluster.
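The flow widths described above can be sketched as a count of genes per (cluster, trait) pair, keeping only the top traits for each cluster. This is an illustrative sketch, not the page's actual pipeline: `sankey_links`, the `top_n` cutoff, and the gene, cluster, and trait values are hypothetical.

```python
from collections import Counter

def sankey_links(gene_cluster, gene_traits, top_n=2):
    """Build (cluster, trait, gene_count) links, keeping top_n traits per cluster."""
    counts = Counter()
    for gene, cluster in gene_cluster.items():
        # set() so a trait listed twice for one gene is counted once.
        for trait in set(gene_traits.get(gene, ())):
            counts[(cluster, trait)] += 1
    links = []
    for cluster in sorted(set(gene_cluster.values())):
        ranked = sorted(
            ((trait, n) for (c, trait), n in counts.items() if c == cluster),
            key=lambda item: (-item[1], item[0]),  # widest flow first, ties by name
        )
        links.extend((cluster, trait, n) for trait, n in ranked[:top_n])
    return links

# Hypothetical genes, clusters, and traits, for illustration only.
gene_cluster = {"GENE_A": "Pure ID", "GENE_B": "Pure ID", "GENE_C": "Seizure"}
gene_traits = {
    "GENE_A": ["intellectual disability"],
    "GENE_B": ["intellectual disability", "hypotonia"],
    "GENE_C": ["seizures"],
}
links = sankey_links(gene_cluster, gene_traits)
```

Each resulting tuple corresponds to one flow in the diagram, with the gene count as its width.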

Cluster Weight Distribution

Bayesian mixture weights (6 Major >4%, 6 Minor <4%)

What you're seeing: The weight of each cluster in the Bayesian mixture model. The red dashed line marks the 4% threshold separating major from minor clusters.
What it means: Major clusters (green) represent more prevalent phenotype patterns in the literature, while minor clusters (gray) capture rarer but distinct subtypes.
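The major/minor split described above amounts to a threshold on the mixture weights. A minimal sketch follows; the function name and the example weights are hypothetical. The page does not say how a weight of exactly 4% is classified, so this sketch assumes it counts as minor.

```python
def split_by_weight(weights, threshold=0.04):
    """Split clusters into (major, minor) dicts by mixture weight.

    Assumption: a weight exactly equal to the threshold counts as minor.
    """
    major = {name: w for name, w in weights.items() if w > threshold}
    minor = {name: w for name, w in weights.items() if w <= threshold}
    return major, minor

# Hypothetical mixture weights, for illustration only.
weights = {"A": 0.30, "B": 0.05, "C": 0.03}
major, minor = split_by_weight(weights)
```

Here clusters `A` and `B` fall above the 4% line and `C` falls below it.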

Subtype Gene Counts

Number of genes per subtype (12 clusters)

What you're seeing: The number of genes assigned to each of the 12 ASD subtypes.
What it means: The Pure ID cluster has the most genes (60), while each minor cluster contains 7 to 8 genes. Assignment confidence is 1.000 (perfect), with zero entropy.

Genes by Cluster

Gene	Cluster	Papers	Traits