IsoFMiR: An unsupervised anomaly detection framework for biomarker discovery in rare cancers

Version 1.0

Dey, Mritunjoy, 2025, "IsoFMiR: An unsupervised anomaly detection framework for biomarker discovery in rare cancers", https://doi.org/10.18150/UAKCCS, RepOD, V1

Description

Identifying microRNA (miRNA) biomarkers in rare cancers remains a major challenge due to limited patient cohorts and high tumor heterogeneity. To overcome these limitations, we developed IsoFMiR, an unsupervised anomaly detection framework designed to discover cancer-specific miRNA signatures without relying on large labeled datasets.

Built upon the Isolation Forest algorithm, IsoFMiR leverages the concept that samples from rare cancers such as sarcomas exhibit distinct molecular patterns when compared to more common malignancies. By training the model on miRNA expression profiles from abundant cancer datasets and subsequently applying it to sarcoma samples, IsoFMiR isolates anomalous expression signatures that may represent potential biomarkers.

This approach addresses a key limitation of conventional supervised learning methods, which often fail in data-scarce settings. Because the design is unsupervised, IsoFMiR supports biomarker discovery even when few annotated samples are available. The framework was implemented using Python 3.11.9 and R 4.4.2, with the scikit-learn library used to build the Isolation Forest model. Sequencing data were sourced from publicly available repositories, namely The Cancer Genome Atlas (TCGA) and Xena Browser, which supports transparency and reproducibility.
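The train-on-common, score-on-rare idea described above can be illustrated with a minimal sketch. It is not the code shipped in the archive: the expression matrices are synthetic, and the cohort sizes, the number of shifted miRNAs, and the `n_estimators` value are assumptions chosen only for illustration. Only `IsolationForest` and its `fit` and `score_samples` methods come from scikit-learn.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for miRNA expression (rows = samples, columns = miRNAs).
# All sizes and distributions here are illustrative assumptions.
n_mirnas = 20
common = rng.normal(loc=5.0, scale=1.0, size=(300, n_mirnas))          # abundant cancers (training)
holdout_common = rng.normal(loc=5.0, scale=1.0, size=(50, n_mirnas))   # abundant cancers (unseen)
rare = rng.normal(loc=5.0, scale=1.0, size=(50, n_mirnas))             # rare cohort
rare[:, :5] += 5.0  # hypothetical: five miRNAs are shifted in the rare cohort

# Fit only on the abundant-cancer profiles; no labels are used.
model = IsolationForest(n_estimators=200, random_state=0)
model.fit(common)

# score_samples returns lower values for samples that are easier to isolate,
# i.e. more anomalous relative to the training distribution.
rare_scores = model.score_samples(rare)
common_scores = model.score_samples(holdout_common)

print("mean score, held-out common:", common_scores.mean())
print("mean score, rare cohort:", rare_scores.mean())
```

In this toy setting the rare cohort receives lower scores than the held-out common samples because its shifted features make it easy to isolate. Real expression data would need normalization and filtering steps that this sketch omits.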

Overall, IsoFMiR represents a scalable and data-efficient framework that enables the identification of clinically relevant miRNA biomarkers in rare cancers. By combining unsupervised learning with accessible genomic resources, it paves the way for improved understanding of sarcoma biology and the development of novel diagnostic and prognostic strategies.

GDC TCGA: https://portal.gdc.cancer.gov/

Xena Browser: https://xenabrowser.net/

This set of scripts allows for running the IsoFMiR algorithm. Each script is described below; run them in the order listed.

Creating_Training_Data_Set → Reads multiple CSV files containing microRNA data, extracts patient information, and generates a training dataset with up to n patients.

Patients_Not_Used_For_Training → Creates a dataset of patients excluded from training. Provide the paths to the training data and patient information files generated by the Creating_Training_Data_Set script.

Training → Trains an Isolation Forest model using microRNA data from non-sarcoma cancer samples.

Testing → Evaluates the model’s ability to distinguish sarcoma samples from other cancer types.

Consensus_and_SHAP_Analysis → Performs consensus and SHAP analyses. The script randomly selects 50 test samples by default; this number can be adjusted as needed.

Overall_Survival_Code → Calculates overall survival for the selected microRNAs in sarcoma patients.
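The first two steps (building a training set of up to n patients, then collecting the patients that were not used) can be sketched with pandas as follows. This is a hedged illustration rather than the archived scripts: the `patient_id` and `cohort` column names, the function `split_patients`, the toy values, and the reading of "up to n patients" as a per-cohort limit are all assumptions. In practice the frames would come from `pd.read_csv` on the downloaded files.

```python
import pandas as pd


def split_patients(cohorts, n, seed=0):
    """Return (training, held_out) frames, drawing at most n patients per cohort.

    `cohorts` maps a cohort label to a DataFrame with one row per patient.
    The per-cohort limit is an assumption made for this sketch.
    """
    all_patients = pd.concat(
        [df.assign(cohort=name) for name, df in cohorts.items()],
        ignore_index=True,
    )
    picked_idx = []
    for _, group in all_patients.groupby("cohort"):
        # Small cohorts contribute every patient they have.
        sample = group.sample(n=min(n, len(group)), random_state=seed)
        picked_idx.extend(sample.index)
    training = all_patients.loc[picked_idx]
    held_out = all_patients.drop(index=picked_idx)
    return training, held_out


# Toy data: column names and expression values are made up for illustration.
cohorts = {
    "BRCA": pd.DataFrame({
        "patient_id": [f"BRCA-{i}" for i in range(5)],
        "hsa-mir-21": [1.0, 2.0, 3.0, 4.0, 5.0],
    }),
    "LUAD": pd.DataFrame({
        "patient_id": [f"LUAD-{i}" for i in range(2)],
        "hsa-mir-21": [6.0, 7.0],
    }),
}

train, held_out = split_patients(cohorts, n=3)
print(len(train), "training patients;", len(held_out), "held out")
```

Keeping the held-out patients in a separate frame makes it possible to test the trained model on non-sarcoma samples it has never seen, alongside the sarcoma samples.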

(2025)

Subject

Medicine, Health and Life Sciences

Keyword

miRNA, bioinformatics, cancer, sarcoma

License

CC0 Creative Commons Zero 1.0

1 File

miRNA_analysis_toolbox.zip
ZIP Archive - 11.3 KB - Oct 14, 2025
MD5: 49ddcdb43dcbf170961bda09d65a9ec4
License: CC0 Creative Commons Zero 1.0
Scripts for building the IsoFMiR framework
