TCGAbiolinks is an R package, distributed through Bioconductor, for the analysis of large-scale cancer genomic data. Using the application programming interface (API) of the National Cancer Institute (NCI) Genomic Data Commons (GDC),[1] TCGAbiolinks queries, downloads, and processes data, in particular from The Cancer Genome Atlas (TCGA), for bioinformatic and statistical analysis. Automating these steps helps researchers integrate clinical and molecular data, conduct reproducible multi-omics analyses, and accelerate discoveries in cancer genomics.
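
Within R, TCGAbiolinks exposes this workflow through functions such as GDCquery, GDCdownload, and GDCprepare. The Python sketch below is not part of TCGAbiolinks. It only illustrates the kind of request such a query sends to the GDC. It assumes the public /files endpoint at api.gdc.cancer.gov and the GDC's JSON filter syntax. The project, data category, data type, and returned fields are illustrative choices.

```python
import json
import urllib.parse
import urllib.request

# Public GDC API endpoint for file metadata (assumed base URL).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_query(project_id, data_category, data_type, size=10):
    """Build query parameters for the GDC /files endpoint.

    Filters on project, data category and data type, and keeps only
    open-access files.
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id", "value": [project_id]}},
            {"op": "in", "content": {"field": "data_category", "value": [data_category]}},
            {"op": "in", "content": {"field": "data_type", "value": [data_type]}},
            {"op": "in", "content": {"field": "access", "value": ["open"]}},
        ],
    }
    return {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,cases.submitter_id",
        "format": "JSON",
        "size": str(size),
    }


def query_url(params):
    """Return the full GET URL for the given query parameters."""
    return GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)


def fetch_files(params, timeout=30):
    """Send the query to the GDC API and return the file hits (requires network access)."""
    with urllib.request.urlopen(query_url(params), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["data"]["hits"]


if __name__ == "__main__":
    params = build_files_query(
        "TCGA-BRCA", "Transcriptome Profiling", "Gene Expression Quantification"
    )
    for hit in fetch_files(params):
        print(hit["file_id"], hit["file_name"])
```

Only fetch_files contacts the server. build_files_query and query_url just assemble the request, so the filter can be inspected before anything is downloaded.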

Overview

TCGAbiolinks was created to meet the growing need for an integrated workflow for large-scale TCGA[2] cancer genomics data. Its functionality was later extended to support other GDC projects. Official releases are distributed through Bioconductor, while the development source code is hosted on GitHub.

Main Objectives

The major objectives of this package are:

  • Facilitating retrieval of open-access GDC data
  • Preparing the data with appropriate pre-processing methods
  • Providing standard analysis workflows (e.g., differential expression and DNA methylation analysis)
  • Enabling reproduction of published research findings

In practice, TCGAbiolinks supports a variety of analyses, such as identifying differentially expressed genes or differentially methylated regions. It also supports visualizations such as survival curves, volcano plots, and starburst plots.

Key Features

Data Integration: TCGAbiolinks supports several GDC projects, which allows for comparisons between tumor types or molecular profiles.

Preprocessing and Normalization: Users can filter or normalize data to account for outliers, missing values, and batch effects, and thereby enable consistent cross-study comparisons.
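
The Python sketch below gives a rough idea of what such preprocessing involves. It first scales raw counts to counts per million (CPM). It then removes genes whose mean expression falls below a chosen quantile of all gene means. TCGAbiolinks has its own preprocessing functions (for example, TCGAanalyze_Normalization and TCGAanalyze_Filtering). This sketch does not reproduce their methods, and its 0.25 cut-off is an assumed default.

```python
from statistics import mean


def counts_per_million(counts):
    """Scale raw counts so that every sample (column) sums to one million.

    counts: dict mapping gene name -> list of raw counts, one value per sample.
    """
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("every sample needs at least one read")
    return {
        g: [counts[g][j] * 1e6 / lib_sizes[j] for j in range(n_samples)]
        for g in genes
    }


def quantile(values, q):
    """Quantile with linear interpolation (the same convention as R's default, type 7)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def filter_low_expression(expr, qnt_cut=0.25):
    """Keep genes whose mean expression is at least the qnt_cut quantile of all gene means."""
    gene_means = {g: mean(values) for g, values in expr.items()}
    threshold = quantile(list(gene_means.values()), qnt_cut)
    return {g: expr[g] for g, m in gene_means.items() if m >= threshold}


# Toy matrix: four genes measured in two samples (made-up counts).
raw = {"A": [0, 0], "B": [10, 20], "C": [90, 80], "D": [900, 900]}
kept = filter_low_expression(counts_per_million(raw))
print(sorted(kept))  # ['B', 'C', 'D']
```

Normalizing before filtering makes gene means comparable across samples with different sequencing depths. Batch-effect correction is a separate step and is not shown.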

Clinical Data Support: Clinical metadata (e.g., staging, survival measures) can be retrieved from GDC and merged with molecular datasets, providing a foundation for translational and prognostic investigations.
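
TCGA clinical records are keyed by patient, whereas molecular data are keyed by sample barcodes. The first three dash-separated fields of a barcode identify the patient. The fourth field begins with a two-digit sample-type code: 01-09 for tumor, 10-19 for normal, and 20-29 for control samples. The Python sketch below uses that convention to attach clinical fields to samples. Its function names are not part of TCGAbiolinks, and the barcodes and clinical values in the example are hypothetical.

```python
def patient_id(barcode):
    """Return the patient (participant) part of a TCGA barcode: its first three fields."""
    return "-".join(barcode.split("-")[:3])


def sample_type_code(barcode):
    """Return the two-digit sample-type code that starts the fourth barcode field."""
    return int(barcode.split("-")[3][:2])


def is_tumor(barcode):
    """Sample-type codes 01-09 denote tumor samples (10-19 normal, 20-29 control)."""
    return 1 <= sample_type_code(barcode) <= 9


def merge_clinical(barcodes, clinical):
    """Attach each sample's clinical record, looked up by patient ID.

    clinical: dict mapping patient ID -> dict of clinical fields.
    Samples whose patient has no clinical record are skipped.
    """
    merged = []
    for barcode in barcodes:
        record = clinical.get(patient_id(barcode))
        if record is not None:
            merged.append({"barcode": barcode, "tumor": is_tumor(barcode), **record})
    return merged


# Hypothetical barcodes and clinical values, for illustration only.
samples = [
    "TCGA-XX-0001-01A-11R-0000-07",  # primary tumor of patient TCGA-XX-0001
    "TCGA-XX-0001-11A-11R-0000-07",  # matched normal tissue of the same patient
    "TCGA-XX-0002-01A-11R-0000-07",  # patient without a clinical record
]
clinical = {"TCGA-XX-0001": {"vital_status": "Alive", "ajcc_pathologic_stage": "Stage II"}}
for row in merge_clinical(samples, clinical):
    print(row["barcode"], row["tumor"], row["vital_status"])
```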

Differential Analysis: The package provides functions that prepare RNA-seq or methylation data for analysis with packages such as DESeq2 or edgeR. This makes it straightforward to identify differentially expressed genes or differentially methylated regions.
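
DESeq2 and edgeR fit negative binomial models and report per-gene log2 fold changes and p-values; that modeling is not reimplemented here. This Python sketch assumes a common follow-up step. P-values are adjusted for multiple testing with the Benjamini-Hochberg procedure. A gene is then called up- or down-regulated when both its adjusted p-value and its absolute log2 fold change pass chosen cut-offs. The same two quantities form the axes of a volcano plot. The thresholds and input values below are arbitrary.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_genes(log2fc, pvalues, fdr_cut=0.05, lfc_cut=1.0):
    """Label each gene 'up', 'down' or 'ns' (not significant).

    log2fc, pvalues: dicts keyed by gene name, e.g. taken from DESeq2 or edgeR output.
    """
    genes = list(log2fc)
    fdr = dict(zip(genes, benjamini_hochberg([pvalues[g] for g in genes])))
    labels = {}
    for g in genes:
        if fdr[g] <= fdr_cut and log2fc[g] >= lfc_cut:
            labels[g] = "up"
        elif fdr[g] <= fdr_cut and log2fc[g] <= -lfc_cut:
            labels[g] = "down"
        else:
            labels[g] = "ns"
    return labels


# Made-up fold changes and p-values for four genes.
lfc = {"g1": 2.0, "g2": -1.5, "g3": 0.2, "g4": 3.0}
pvals = {"g1": 0.001, "g2": 0.002, "g3": 0.0001, "g4": 0.6}
print(classify_genes(lfc, pvals))  # {'g1': 'up', 'g2': 'down', 'g3': 'ns', 'g4': 'ns'}
```

Gene g3 has the smallest p-value but is labeled "ns" because its fold change is small. Requiring both criteria keeps statistically significant but biologically negligible changes out of the gene list.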

Visualization Tools: TCGAbiolinks encompasses routines to generate heatmaps, Kaplan–Meier survival plots, volcano plots, and other standard plots relevant to biological interpretation.
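
TCGAbiolinks draws survival curves with its own R functions (for example, TCGAanalyze_survival). The Kaplan-Meier estimator behind such a plot is simple enough to sketch directly. The Python function below computes the product-limit estimate from per-patient follow-up times and a death indicator, such as values derived from GDC clinical fields. It is an illustration, not code from TCGAbiolinks, and the example data are invented.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times:  follow-up time for each patient (e.g. days to death or last follow-up).
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival probability) pairs, one per time with a death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; those censored at t still count as at risk.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= removed
    return steps


# Made-up follow-up data for five patients.
for t, s in kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]):
    print(f"day {t}: S = {s:.2f}")
```

Censored patients contribute to the at-risk count up to their last follow-up but never cause a drop in the curve. This is what lets Kaplan-Meier plots use incomplete clinical follow-up.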

Practical Applications

TCGAbiolinks has been used in several cancer genomics studies. For example, it was used to examine microRNA expression and survival in head and neck squamous cell carcinoma (HNSCC) and to characterize molecular subtypes with different prognoses.[3] In another study, TCGAbiolinks was used together with Gene Expression Omnibus (GEO) and TCGA data to identify significant functional networks in lung squamous cell carcinoma.[4] It has also been extended into a pipeline for multiple myeloma data from MMRF-CoMMpass, used to identify prognostic markers for targeted therapies.[5]

References

  1. "TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data." Bioconductor. Retrieved 2025-02-12.
  2. Silva, Tiago C., et al. "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages." F1000Research 5 (2016).
  3. "Identification of Prognosis Associated microRNAs in HNSCC Subtypes Based on TCGA Dataset." Medicina 56(10):535 (2020). https://doi.org/10.3390/medicina56100535
  4. "Transcriptomic and functional network features of lung squamous cell carcinoma through integrative analysis of GEO and TCGA data." Scientific Reports 8:15834 (2018). https://doi.org/10.1038/s41598-018-34160-w
  5. "Identifying prognostic markers for multiple myeloma through integration and analysis of MMRF-CoMMpass data." Journal of Computational Science 51:101346 (2021). https://doi.org/10.1016/j.jocs.2021.101346

Kommenteeri