bioLeak - Leakage-Safe Modeling and Auditing for Genomic and Clinical Data
Prevents and detects information leakage in biomedical machine learning. Provides leakage-resistant split policies (subject-grouped, batch-blocked, study leave-out, time-ordered), guarded preprocessing (train-only imputation, normalization, filtering, feature selection), cross-validated fitting with common learners, permutation-gap auditing, batch and fold association tests, and duplicate detection.
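Two of the ideas named above, subject-grouped splitting and train-only normalization, can be illustrated with a minimal sketch. This is plain Python and not bioLeak's API; the function names `subject_grouped_split`, `fit_scaler`, and `apply_scaler` and the toy data are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean, pstdev

def subject_grouped_split(subject_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no subject appears on both sides."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out whole subjects, never individual samples.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx

def fit_scaler(values):
    """Estimate centering and scaling parameters from training values only."""
    mu = mean(values)
    sd = pstdev(values) or 1.0  # guard against a constant feature
    return mu, sd

def apply_scaler(values, mu, sd):
    """Apply previously fitted parameters; nothing is re-estimated here."""
    return [(v - mu) / sd for v in values]

# Hypothetical toy data: two repeated measurements per subject.
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"]
expression = [2.0, 2.2, 5.1, 4.9, 3.0, 3.3, 7.8, 8.1]

train_idx, test_idx = subject_grouped_split(subject_ids)
mu, sd = fit_scaler([expression[i] for i in train_idx])
train_scaled = apply_scaler([expression[i] for i in train_idx], mu, sd)
test_scaled = apply_scaler([expression[i] for i in test_idx], mu, sd)
```

Because the mean and standard deviation are estimated on the training rows alone, the held-out subject contributes nothing to the preprocessing parameters, which is the property "guarded preprocessing" refers to.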
5.88 score 5 stars 16 scripts 589 downloads

splitGraph - Dataset Dependency Graphs for Leakage-Aware Evaluation
Represents biomedical dataset structure as typed dependency graphs so that sample provenance, repeated-measure structure, study design, batch effects, and temporal relationships are explicit and inspectable. Validates dataset structure, detects sample-level overlap, derives deterministic split constraints, and produces a tool-agnostic split specification for leakage-aware evaluation workflows.
4.48 score 1 star 5 scripts 456 downloads
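As a rough illustration of deriving split constraints from a typed dependency graph, the sketch below groups samples that share a node of a chosen type (for example, the same subject) so that each group can be assigned to a single fold. It is plain Python and not splitGraph's API; the edge format, the function name `split_groups`, and the example data are assumptions made for this sketch.

```python
from collections import defaultdict

def split_groups(edges, constrain_on=("subject",)):
    """Group samples that must stay in the same split.

    edges: iterable of (sample_id, node_type, node_id) triples (hypothetical format).
    constrain_on: node types whose shared membership ties samples together.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Always keep the smaller root so the result is deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    for sample, node_type, node_id in edges:
        find(("sample", sample))  # register the sample even if unconstrained
        if node_type in constrain_on:
            union(("sample", sample), (node_type, node_id))

    groups = defaultdict(list)
    for node in list(parent):
        if node[0] == "sample":
            groups[find(node)].append(node[1])
    return sorted(sorted(g) for g in groups.values())

# Hypothetical graph: x1 and x2 come from one subject; x2 and x3 share a batch.
edges = [
    ("x1", "subject", "p1"),
    ("x2", "subject", "p1"),
    ("x2", "batch", "b1"),
    ("x3", "subject", "p2"),
    ("x3", "batch", "b1"),
    ("x4", "subject", "p3"),
]

by_subject = split_groups(edges)
by_subject_and_batch = split_groups(edges, constrain_on=("subject", "batch"))
```

Constraining on subjects alone keeps x1 and x2 together; adding batches also pulls in x3 through the shared batch, which shows how stricter constraints produce fewer, larger groups.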