Statistical Evidence
Strong Evidence
Perfect assignment confidence with zero entropy
Confidence: 1.0000
Entropy: ~0
Clusters: 12 (6+6)
Genes: 241
About This Evidence
This page presents the statistical evidence supporting the 12-cluster (6 major + 6 minor) ASD subtype model. The clusters were identified using a Bayesian Gaussian Mixture Model with a Dirichlet Process prior, which infers the number of clusters from the data instead of requiring it to be fixed in advance. Publication bias was corrected using sqrt weighting, which keeps well-studied genes from being over-represented.
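As a rough illustration of the method described above, the sketch below fits a truncated Dirichlet-process Gaussian mixture with scikit-learn's `BayesianGaussianMixture`. The page does not say which software was used, so the library, the function names `fit_dp_gmm` and `effective_clusters`, the upper bound of 20 components, and the 1% weight cutoff are all assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def fit_dp_gmm(features, max_components=20, seed=0):
    """Fit a truncated Dirichlet-process Gaussian mixture to a feature matrix.

    max_components is only an upper bound (assumed value): components the
    data do not support end up with mixture weights close to zero.
    """
    model = BayesianGaussianMixture(
        n_components=max_components,
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",
        max_iter=500,
        random_state=seed,
    )
    return model.fit(features)


def effective_clusters(model, min_weight=0.01):
    """Count components whose weight exceeds min_weight (hypothetical cutoff)."""
    return int(np.sum(model.weights_ > min_weight))
```

Calling `fit_dp_gmm(X)` on a genes-by-features matrix `X` returns a fitted model whose `weights_` attribute holds the cluster weights and whose `predict_proba(X)` gives each gene's membership probabilities.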
Key Statistical Findings
Assignment Confidence: 1.0000 (Perfect certainty)
Assignment Entropy: 0.000 (No uncertainty)
Major Clusters: 6 (Weight > 4%)
Minor Clusters: 6 (Weight < 4%)
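For readers unfamiliar with these two figures, the sketch below shows one common way to compute them from a matrix of per-gene membership probabilities. It assumes confidence means the largest membership probability for a gene and entropy means Shannon entropy in nats; the page does not state its exact definitions, and the function names are hypothetical.

```python
import math


def assignment_confidence(probs):
    """Largest cluster-membership probability for one gene."""
    return max(probs)


def assignment_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def summarize(responsibilities):
    """Mean confidence and mean entropy over all genes (rows)."""
    n = len(responsibilities)
    mean_conf = sum(assignment_confidence(r) for r in responsibilities) / n
    mean_ent = sum(assignment_entropy(r) for r in responsibilities) / n
    return mean_conf, mean_ent
```

A gene assigned to a single cluster with probability 1 has entropy 0, while a gene split evenly between two clusters has entropy ln 2 (about 0.693), so a mean entropy near zero indicates that almost every gene sits firmly in one cluster.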
Cluster Weight Distribution
Major clusters (>4%) vs Minor clusters (<4%) - sorted by weight
What you're seeing: Bar chart showing the weight (proportion) of each cluster in the Bayesian mixture model.
The red dashed line marks the 4% threshold separating major from minor clusters.
What it means: Pure ID dominates at 25%, while six minor clusters each contribute 2-4%. This natural separation suggests two distinct tiers of ASD subtypes.
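The major/minor split described here amounts to a simple threshold on the mixture weights. A minimal sketch follows, assuming weights are stored as fractions keyed by cluster name. The function name is hypothetical, and because the page does not say how a weight of exactly 4% is treated, this sketch assigns it to the minor tier.

```python
MAJOR_THRESHOLD = 0.04  # the 4% line shown on the chart


def split_by_weight(weights, threshold=MAJOR_THRESHOLD):
    """Return (major, minor) cluster names, each sorted by descending weight.

    A weight exactly equal to the threshold goes to the minor tier
    (an assumption; the source does not cover this case).
    """
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    major = [name for name, w in ranked if w > threshold]
    minor = [name for name, w in ranked if w <= threshold]
    return major, minor
```

For example, with placeholder weights `{"a": 0.25, "b": 0.03, "c": 0.10}` the call returns `(["a", "c"], ["b"])`.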
Method Comparison
Bayesian GMM outperforms traditional methods
Why Bayesian GMM?
- Automatic cluster detection: Dirichlet Process Prior determines optimal k
- Handles publication bias: Sqrt weighting prevents over-studied gene dominance
- Probability estimates: Each gene has assignment probabilities, not just labels
- Uncertainty quantification: Entropy measures confidence in assignments
- Robust to outliers: Soft clustering handles ambiguous cases gracefully
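The sqrt weighting mentioned in the list can be illustrated with a short sketch. The page does not say which quantity is square-rooted, so this example assumes the weighting is applied to per-gene publication counts and then normalized to sum to 1. The function name and the gene labels in the usage note are placeholders.

```python
import math


def sqrt_weights(publication_counts):
    """Damp raw per-gene publication counts with a square root, then normalize.

    Assumes publication count is the quantity being weighted (not confirmed
    by the source).
    """
    damped = {gene: math.sqrt(count) for gene, count in publication_counts.items()}
    total = sum(damped.values())
    if total == 0:
        raise ValueError("at least one gene needs a non-zero count")
    return {gene: value / total for gene, value in damped.items()}
```

With counts of 100 and 4 for two placeholder genes, the raw ratio of 25:1 shrinks to 5:1 after the square root, so the heavily studied gene still counts for more but no longer dominates.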
Validation Metrics
| Metric | Value | Interpretation | Status |
|---|---|---|---|
| Mean Assignment Confidence | 1.0000 | Mean of each gene's highest cluster-membership probability (higher is better) | Excellent |
| Mean Assignment Entropy | ~0 (10⁻⁸) | Uncertainty in assignments (lower is better) | Excellent |
| Effective Clusters | 12 | Number of clusters with non-negligible weight | Optimal |
| Major/Minor Separation | 4% threshold | Clear separation between cluster tiers | Clear |
| Publication Bias Correction | sqrt weighting | Reduces influence of over-studied genes | Applied |
12 Cluster Summary
| ID | Subtype | Type | Weight | Genes | Confidence |
|---|---|---|---|---|---|