Statistical Evidence

Strong Evidence

Perfect assignment confidence with zero entropy

Confidence

1.0000

Entropy

Clusters

12 (6+6)

Genes

241

About This Evidence

This page presents the statistical evidence supporting the 12-cluster (6 major + 6 minor) ASD subtype model. The clusters were identified with a Bayesian Gaussian mixture model (GMM) using a Dirichlet process prior, which infers the number of clusters from the data instead of requiring it to be fixed in advance. Publication bias was corrected with square-root (sqrt) weighting so that well-studied genes are not over-represented.
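The page does not give the exact weighting formula, so the following is only a minimal sketch of one common reading of sqrt weighting: each gene's raw publication count is replaced by its square root and the results are normalised to sum to 1. The function name `sqrt_weights`, the gene names, and the counts are all hypothetical.

```python
import math


def sqrt_weights(publication_counts):
    """Map per-gene publication counts to normalised square-root weights.

    Assumption: the "sqrt weighting" on this page means weight ~ sqrt(count).
    The page itself does not state the formula.
    """
    raw = {gene: math.sqrt(count) for gene, count in publication_counts.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("at least one gene needs a non-zero count")
    return {gene: value / total for gene, value in raw.items()}


# Hypothetical counts: GENE_A has 100 times the literature of GENE_C.
counts = {"GENE_A": 400, "GENE_B": 100, "GENE_C": 4}
weights = sqrt_weights(counts)

# After the square root, GENE_A outweighs GENE_C by 10x instead of 100x.
ratio = weights["GENE_A"] / weights["GENE_C"]
```

The square root compresses large counts more than small ones, so a heavily studied gene still counts for more, but not in proportion to its publication volume.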

Key Statistical Findings

Assignment Confidence

1.0000

Perfect certainty

Assignment Entropy

0.000

No uncertainty

Major Clusters

Weight > 4%

Minor Clusters

Weight < 4%

Cluster Weight Distribution

Major clusters (>4%) vs Minor clusters (<4%) - sorted by weight

What you're seeing: Bar chart showing the weight (proportion) of each cluster in the Bayesian mixture model. The red dashed line marks the 4% threshold separating major from minor clusters.
What it means: Pure ID dominates at 25%, while six minor clusters each contribute 2-4%. This natural separation suggests two distinct tiers of ASD subtypes.
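The tiering described above can be sketched as a simple threshold split on the mixture weights. In the example below only the 4% threshold and the 25% weight for Pure ID come from this page; every other cluster name and weight is a placeholder chosen to sum to 1, and the handling of a weight of exactly 4% (assigned to the minor tier here) is an assumption, since the page only states ">4%" and "<4%".

```python
def split_tiers(weights, threshold=0.04):
    """Sort clusters by weight and split them into major and minor tiers.

    Assumption: a weight exactly equal to the threshold counts as minor.
    """
    ordered = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    major = [(name, w) for name, w in ordered if w > threshold]
    minor = [(name, w) for name, w in ordered if w <= threshold]
    return major, minor


# Placeholder weights: only "Pure ID" at 0.25 is taken from the page.
example_weights = {
    "Pure ID": 0.25,
    "major_2": 0.18,
    "major_3": 0.15,
    "major_4": 0.11,
    "major_5": 0.085,
    "major_6": 0.06,
    "minor_1": 0.035,
    "minor_2": 0.03,
    "minor_3": 0.03,
    "minor_4": 0.025,
    "minor_5": 0.025,
    "minor_6": 0.02,
}

major, minor = split_tiers(example_weights)
```

Sorting before splitting mirrors the chart, which lists clusters by descending weight with the threshold line between the two tiers.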

Method Comparison

Bayesian GMM outperforms traditional methods

Why Bayesian GMM?

Automatic cluster detection: Dirichlet Process Prior determines optimal k
Handles publication bias: Sqrt weighting prevents over-studied gene dominance
Probability estimates: Each gene has assignment probabilities, not just labels
Uncertainty quantification: Entropy measures confidence in assignments
Robust to outliers: Soft clustering handles ambiguous cases gracefully
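To illustrate what "assignment probabilities, not just labels" means, here is a toy sketch of soft assignment in a one-dimensional, two-component Gaussian mixture. It is not the model behind this page: the function name `responsibilities`, the component parameters, and the data points are all made up for illustration.

```python
import math


def responsibilities(x, components):
    """Posterior probability of each component for a single point x.

    components is a list of (weight, mean, std) tuples with positive
    weight and std. The computation runs in log space so that points far
    from every component do not underflow to a zero denominator.
    """
    log_joint = [
        math.log(w) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * ((x - m) / s) ** 2
        for w, m, s in components
    ]
    peak = max(log_joint)
    unnormalised = [math.exp(v - peak) for v in log_joint]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Two equally weighted components centred at 0 and 4 (hypothetical values).
mixture = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]

clear_case = responsibilities(0.0, mixture)      # close to the first centre
ambiguous_case = responsibilities(2.0, mixture)  # halfway between centres
```

A hard clustering method would force the halfway point into one cluster; the soft assignment instead reports that it belongs equally to both, which is the information the confidence and entropy figures on this page summarise.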

Validation Metrics

Metric	Value	Interpretation	Status
Mean Assignment Confidence	1.0000	Posterior probability of the assigned cluster, averaged over genes (higher is better)	Excellent
Mean Assignment Entropy	~0 (10^-8)	Uncertainty in assignments (lower is better)	Excellent
Effective Clusters	12	Number of clusters with non-negligible weight	Optimal
Major/Minor Separation	4% threshold	Clear separation between cluster tiers	Clear
Publication Bias Correction	sqrt weighting	Reduces influence of over-studied genes	Applied
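The page does not define the two headline metrics, so the sketch below assumes the usual definitions: a gene's assignment confidence is its largest cluster-membership probability, and its assignment entropy is the Shannon entropy (natural log) of its membership probabilities. The function names and the example rows are hypothetical.

```python
import math


def assignment_confidence(probs):
    """Largest membership probability for one gene (assumed definition)."""
    return max(probs)


def assignment_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def summarise(rows):
    """Mean confidence and mean entropy over a list of probability rows."""
    n = len(rows)
    mean_conf = sum(assignment_confidence(r) for r in rows) / n
    mean_ent = sum(assignment_entropy(r) for r in rows) / n
    return mean_conf, mean_ent


# A fully certain assignment: all probability mass on one cluster.
certain = [1.0, 0.0, 0.0, 0.0]
# A maximally uncertain assignment over four clusters.
uniform = [0.25, 0.25, 0.25, 0.25]

mean_conf, mean_ent = summarise([certain, certain])
```

Under these definitions a mean confidence of 1.0000 and a mean entropy near zero say the same thing: almost every gene has nearly all of its probability mass on a single cluster. For comparison, the uniform row above has confidence 0.25 and entropy ln(4).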

12 Cluster Summary

ID	Subtype	Type	Weight	Genes	Confidence