Predicting Regulatory Mutations and Their Target Genes by New Computational Integrative Analysis: A Study of Follicular Lymphoma

Authors:


Junbai Wang 1,2*, Mingyi Yang 3,4, Omer Ali 2,5, Jenny Sofie Dragland 5, Magnar Bjørås 3,6, Lorant Farkas 2,5

1. Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway

2. Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Campus AHUS/Oslo, Norway

3. Department of Microbiology, Oslo University Hospital, Oslo, Norway

4. Department of Medical Biochemistry, Oslo University Hospital, Oslo, Norway

5. Department of Pathology, Akershus University Hospital and University of Oslo, Lørenskog, Norway

6. Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway







* To whom correspondence should be addressed. Email: junbai.wang@medisin.uio.no

Abstract


Mutations in DNA regulatory regions are increasingly recognized as important drivers of cancer and other complex diseases. These mutations can modify gene expression by affecting DNA-protein binding and epigenetic profiles, such as DNA methylation in genomic regulatory elements. Identifying mutation hotspots in non-coding DNA that are associated with expression regulation and disease progression remains a challenge. Unlike most existing approaches, which assign a mutation score to each individual single nucleotide polymorphism (SNP), this study introduces a mutation block (MB)-based approach to assess the impact of clusters of SNPs on transcription factor-DNA binding affinity, differentially expressed genes (DEGs), and nearby DNA methylation. Two new Python packages were developed to identify functional MBs from multi-omics datasets. The first tool, the Differential Methylation Region analysis tool (DMR-analysis), detects differentially methylated regions (DMRs) and maps them to regulatory elements. The second tool, an integrated DMR, DEG, and SNP analysis tool (DDS-analysis), combines omics data to identify functional MBs and their long-distance target genes by considering correlations between DNA methylation at regulatory regions and the expression of target genes. Both tools are validated in follicular lymphoma (FL) cohorts, where not only are known functional MBs and their target genes (BCL2 and BCL6) recovered, but novel ones (CDCA4 and JAG2) are also found that associate with the development of FL, link to the expression of target genes, and correlate significantly with the methylation of nearby DNA sequences in FL. These new tools will greatly aid the identification of regulatory mutations in cancer and other diseases through integrative analysis of multi-omics data.

This work is supported by the South-Eastern Norway Regional Health Authority and Sigma2 Norwegian research infrastructure services.
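
To make the mutation block (MB) idea from the abstract more concrete, the sketch below groups nearby SNPs on the same chromosome into blocks. It is only an illustration under stated assumptions, not the procedure implemented in the packages: the record types, the function names, the 30 bp maximum gap, and the minimum of two SNPs per block are all hypothetical choices made for this example.

```python
from collections import namedtuple

# Hypothetical record types for illustration; not part of dmr_analysis or dds_analysis.
Snp = namedtuple("Snp", ["chrom", "pos"])
MutationBlock = namedtuple("MutationBlock", ["chrom", "start", "end", "n_snps"])


def _to_block(members):
    """Summarise a run of position-sorted SNPs as one block."""
    return MutationBlock(members[0].chrom, members[0].pos, members[-1].pos, len(members))


def group_snps_into_blocks(snps, max_gap=30, min_snps=2):
    """Group SNPs whose consecutive positions lie within max_gap bp of each other.

    max_gap and min_snps are assumed values chosen for this sketch only.
    """
    blocks = []
    current = []
    for snp in sorted(snps, key=lambda s: (s.chrom, s.pos)):
        same_block = (
            current
            and snp.chrom == current[-1].chrom
            and snp.pos - current[-1].pos <= max_gap
        )
        if current and not same_block:
            # The running group ends here; keep it only if it is a real cluster.
            if len(current) >= min_snps:
                blocks.append(_to_block(current))
            current = []
        current.append(snp)
    if len(current) >= min_snps:
        blocks.append(_to_block(current))
    return blocks


# Made-up coordinates, for illustration only.
snps = [
    Snp("chr18", 100), Snp("chr18", 120), Snp("chr18", 145),
    Snp("chr18", 900), Snp("chr3", 50), Snp("chr3", 60),
]
blocks = group_snps_into_blocks(snps)
for block in blocks:
    print(block)
```

Isolated SNPs (here the one at position 900) are dropped because a block is meant to capture a cluster of mutations rather than a single variant.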

DMR_analysis

dmr_analysis is a software tool for differentially methylated region (DMR) analysis that ranks significant DMRs.
For online documentation and the GitHub page with demos, please see: Online Documentation and GitHub Page
The Python package provides a set of pipeline tasks. To run a task, type: dmr_analysis task [args]

DDS_analysis

dds_analysis is an integrated data analysis pipeline that considers both Differential Methylation Regions (DMRs) and Differentially Expressed Genes (DEGs) in SNP analysis. The package depends on some functions from bpb3.
For online documentation and the GitHub page with demos, please see: Online Documentation and GitHub Page
The Python package provides a set of pipeline tasks. To run a task, type: dds_analysis task [args]
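
The integration of DMRs and DEGs described here rests on correlating DNA methylation at a regulatory region with the expression of a candidate target gene across samples. The sketch below shows that idea with a plain Pearson correlation in standard-library Python. It is a minimal illustration, not the statistic or code used by dds_analysis: the function name, the choice of Pearson correlation, and the sample values are assumptions made for this example.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up per-sample values, for illustration only:
# mean methylation level of one regulatory region and expression of one gene.
methylation = [0.9, 0.8, 0.6, 0.4, 0.2]
expression = [1.2, 1.9, 4.3, 5.8, 8.1]

r = pearson_correlation(methylation, expression)
print(f"methylation vs. expression correlation: {r:.3f}")
```

A strongly negative value, as in this toy data, would be consistent with higher methylation at the region going along with lower expression of the gene. A real analysis would also need a significance test and correction for multiple comparisons.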

BayesPI-BAR3


BayesPI-BAR3 (or bpb3) is a Bayesian method for Protein-DNA Interaction with Binding Affinity Ranking, implemented in Python 3.
For package download and online documentation, please see: BayesPI-BAR3 GitHub Page
The Python package provides a set of pipeline tasks. To run a task, type: bpb3 task [args]
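
All three packages share the same invocation pattern, `<tool> task [args]`. When scripting a pipeline from Python, a thin wrapper like the sketch below can assemble and run such commands. The helper names are hypothetical, and `some_task` and `--some_option` are placeholders rather than real task names or flags; consult each package's documentation for the tasks and arguments it actually accepts.

```python
import shutil
import subprocess

# Command names as given in this README.
TOOLS = ("dmr_analysis", "dds_analysis", "bpb3")


def build_command(tool, task, args=()):
    """Assemble '<tool> task [args]' as an argument list for subprocess."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
    return [tool, task, *args]


def run_task(tool, task, args=()):
    """Run one pipeline task and raise if the tool is missing or the task fails."""
    command = build_command(tool, task, args)
    if shutil.which(tool) is None:
        raise FileNotFoundError(f"{tool} is not on PATH; install the package first")
    return subprocess.run(command, check=True)


# Placeholder task and option, for illustration only.
command = build_command("dds_analysis", "some_task", ["--some_option", "value"])
print(" ".join(command))
```

Passing the command as a list rather than a single string avoids shell quoting problems with file paths that contain spaces.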

References:


Wang, J.B. and K. Batmanov, BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations. Nucleic Acids Res, 2015. 43(21).

Batmanov, K., J. Delabie, and J. Wang, BayesPI-BAR2: A New Python Package for Predicting Functional Non-coding Mutations in Cancer Patient Cohorts. Front Genet, 2019. 10: p. 282.

Farooq, A., et al., HMST-Seq-Analyzer: A new python tool for differential methylation and hydroxymethylation analysis in various DNA methylation sequencing data. Comput Struct Biotechnol J, 2020. 18: p. 2877-2889.

Farooq, A., et al., Integrating whole genome sequencing, methylation, gene expression, topological associated domain information in regulatory mutation prediction: A study of follicular lymphoma. Comput Struct Biotechnol J, 2022. 20: p. 1726-1742.

Yang, M., et al., Identifying functional regulatory mutation blocks by integrating genome sequencing and transcriptome data. iScience, 2023. 26(8): p. 107266.