E-ISSN : 2146-3131

Complementary ROC-Derived Indices for Screening Improper Expression Profiles in RNA-Seq Differential Expression Analysis
Merve Başol Göksülük1, Ebru Öztürk2, Ünal Erkorkmaz1, Asuman Deveci Özkan3, Hamdi Furkan Kepenek2, Dinçer Göksülük1
1Department of Biostatistics, Sakarya University Faculty of Medicine, Sakarya, Türkiye
2Department of Biostatistics, Hacettepe University Faculty of Medicine, Ankara, Türkiye
3Department of Medical Biology, Sakarya University Faculty of Medicine, Sakarya, Türkiye
DOI : 10.4274/balkanmedj.galenos.2026.2026-1-309
Pages : 330-342

Abstract

Background: Differential expression (DE) analysis of RNA sequencing (RNA-Seq) data is a cornerstone of transcriptomic research. Widely used statistical frameworks are primarily optimized to detect monotonic mean shifts between conditions and may therefore overlook genes or microRNAs whose disease association arises at both low and high expression levels. Such non-monotonic patterns, referred to here as improper expression profiles, may reflect biologically relevant heterogeneity but remain difficult to identify using standard tools.

Aims: To evaluate whether receiver operating characteristic (ROC)-based indices, specifically the generalized area under the curve (gAUC) and the length of the ROC curve (LROC), can support exploratory screening and prioritization of improper expression profiles in RNA-Seq data, as a complement to conventional DE methods.

Study Design: Methodological study.

Methods: Using simulated negative binomial count data, we compared DESeq2, classical AUC (cAUC), gAUC, and LROC across varying sample sizes and dispersion levels, focusing on improper expression profiles. Performance was summarized using true positive rate and positive predictive value under ranking-based feature selection, including a one-shot benchmark operating point (available only in simulations) and sensitivity analyses across selection sizes. The methods were also applied to a publicly available CC miRNA dataset using heuristic post-hoc screening rules informed by simulation diagnostics.
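To make the contrast between cAUC and gAUC concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that gAUC can be estimated as the area under the upper envelope of two-sided rules of the form "x <= a or x >= b", and that an improper profile can be mimicked by drawing half of the cases from a low-mean and half from a high-mean negative binomial distribution. The helper names (nb_counts, simulate_improper, classical_auc, generalized_auc) and all parameter values are hypothetical. LROC is left out because a naive empirical staircase length is uninformative without smoothing.

```python
import numpy as np


def nb_counts(rng, mu, phi, size):
    """Draw negative binomial counts with mean mu and variance mu + phi * mu**2."""
    n = 1.0 / phi
    return rng.negative_binomial(n, n / (n + mu), size=size)


def simulate_improper(rng, n_per_group=40, mu=50.0, fold=8.0, phi=0.1):
    """Assumed improper profile: cases are a low/high mixture around the control mean."""
    controls = nb_counts(rng, mu, phi, n_per_group)
    half = n_per_group // 2
    cases = np.concatenate([
        nb_counts(rng, mu / fold, phi, half),
        nb_counts(rng, mu * fold, phi, n_per_group - half),
    ])
    return controls, cases


def classical_auc(controls, cases):
    """Mann-Whitney estimate: P(case > control) + 0.5 * P(case == control)."""
    diff = cases[:, None] - controls[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def generalized_auc(controls, cases):
    """Area under the upper envelope of rules 'x <= a or x >= b' (assumed gAUC estimator)."""
    values = np.unique(np.concatenate([controls, cases]))
    lows = np.concatenate([[-np.inf], values])   # a = -inf switches off the lower tail
    highs = np.concatenate([values, [np.inf]])   # b = +inf switches off the upper tail
    points = []
    for a in lows:
        for b in highs:
            if b <= a:
                continue
            fpr = np.mean((controls <= a) | (controls >= b))
            tpr = np.mean((cases <= a) | (cases >= b))
            points.append((fpr, tpr))
    pts = np.array(points)
    order = np.argsort(pts[:, 0], kind="stable")
    fpr = pts[order, 0]
    # Best sensitivity achievable at or below each false positive rate.
    best_tpr = np.maximum.accumulate(pts[order, 1])
    # Integrate the resulting step function over FPR in [0, 1].
    widths = np.diff(np.append(fpr, 1.0))
    return float(np.sum(best_tpr * widths))


if __name__ == "__main__":
    rng = np.random.default_rng(2026)
    controls, cases = simulate_improper(rng)
    print("cAUC:", classical_auc(controls, cases))
    print("gAUC:", generalized_auc(controls, cases))
```

On a toy feature where cases sit at both extremes of the control range, for example controls [4, 5, 6, 5, 4, 6] and cases [0, 1, 10, 11, 0, 12], classical_auc returns 0.5 while generalized_auc returns 1.0, which is the kind of pattern a one-sided index cannot detect. Because the envelope takes a maximum over many threshold pairs, this empirical estimate is optimistic in small samples and is only suitable for illustration.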

Results: cAUC was largely insensitive to improper expression patterns. DESeq2 performed robustly for conventionally differentially expressed features but recovered a smaller fraction of simulated improper profiles under ranking-based selection. Across simulation scenarios, gAUC showed the highest and most stable recovery of improper profiles, whereas LROC provided complementary signal under low-to-moderate dispersion but degraded under extreme overdispersion. In the CC dataset, ROC-derived indices identified candidate improper miRNAs that were not prioritized by DESeq2, and several top candidates had literature support consistent with biological plausibility.

Conclusion: gAUC, supported by LROC as an auxiliary index, provides a practical ROC-based screening extension to standard RNA-Seq workflows. Because these indices are applied using heuristic thresholds without controlled error rates, the resulting candidates should be interpreted as exploratory prioritization and require independent validation.
