Predicting Regulatory Mutations and Their Target Genes by New Computational Integrative Analysis: A Study of Follicular Lymphoma
Authors:
Junbai Wang 1,2*, Mingyi Yang 3,4, Omer Ali 2,5, Jenny Sofie Dragland 5, Magnar Bjørås 3,6, Lorant Farkas 2,5
1. Department of Clinical Molecular Biology (EpiGen), Akershus University Hospital and University of Oslo, Lørenskog, Norway
2. Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Campus AHUS/Oslo, Norway
3. Department of Microbiology, Oslo University Hospital, Oslo, Norway
4. Department of Medical Biochemistry, Oslo University Hospital, Oslo, Norway
5. Department of Pathology, Akershus University Hospital and University of Oslo, Lørenskog, Norway
6. Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
* To whom correspondence should be addressed. Email: junbai.wang@medisin.uio.no
Abstract
Mutations in DNA regulatory regions are increasingly recognized as an important driver in
cancer and other complex diseases. These mutations can modify gene expression by
impacting DNA-protein binding and epigenetic profiles, such as DNA methylation at genomic
regulatory elements. Identifying mutation hotspots associated with expression regulation and
disease progression in non-coding DNA remains a challenge. Unlike most existing
approaches, which assign a mutation score to each individual single nucleotide polymorphism
(SNP), this study introduces a mutation block (MB)-based approach to assess the impact
of clusters of SNPs on transcription factor-DNA binding affinity, differentially expressed genes
(DEGs), and nearby DNA methylation. Two new Python packages were developed for the
identification of functional MBs from multiple omics datasets. The first tool, the Differential
Methylation Region analysis tool (DMR-analysis), detects differentially methylated
regions (DMRs) and maps them to regulatory elements. The second tool, an integrated DMR,
DEG, and SNP analysis tool (DDS-analysis), combines omics data to identify
functional MBs and their long-distance target genes by considering correlations between
DNA methylation at regulatory regions and target-gene expression activity. Both tools
were validated in follicular lymphoma (FL) cohorts. They recovered known functional MBs
and their target genes (BCL2 and BCL6). They also found novel ones (CDCA4 and
JAG2) that are associated with FL development, are linked to target-gene
expression activity, and significantly correlate with the methylation of nearby DNA sequences in FL.
These new tools will greatly facilitate the identification of regulatory mutations in cancer
and other diseases through integrative analysis of multi-omics data.
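The mutation-block idea can be illustrated with a minimal sketch that groups nearby SNP positions on one chromosome into blocks. The merging rule and the `max_gap` threshold are hypothetical assumptions chosen for illustration. The blocks that bpb3 (`mussd`, `highly_mutated_blocks`) actually produces may be defined differently, for example by also considering how mutations are distributed across samples.

```python
from typing import List, Tuple

# Hypothetical representation, not taken from bpb3/dds_analysis: (start, end, n_snps)
Block = Tuple[int, int, int]


def group_into_blocks(positions: List[int], max_gap: int = 50) -> List[Block]:
    """Merge SNP positions (one chromosome) into mutation blocks.

    A SNP joins the current block if it lies at most `max_gap` bp after the
    block's last SNP. Otherwise it starts a new block. `max_gap` is an
    illustrative assumption, not a published parameter.
    """
    blocks: List[Block] = []
    for pos in sorted(positions):
        if blocks and pos - blocks[-1][1] <= max_gap:
            start, _, n_snps = blocks[-1]
            blocks[-1] = (start, pos, n_snps + 1)
        else:
            blocks.append((pos, pos, 1))
    return blocks


if __name__ == "__main__":
    snps = [100, 120, 160, 900, 905, 5000]
    print(group_into_blocks(snps, max_gap=50))
    # [(100, 160, 3), (900, 905, 2), (5000, 5000, 1)]
```

Scoring a block as a unit lets several weak, closely spaced variants register together, whereas per-SNP scoring would evaluate each of them in isolation.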
DMR_analysis
dmr_analysis is a software tool for differentially methylated region (DMR) analysis that ranks significant DMRs.
For online documentation and the GitHub page with demos, please see: Online Documentation and GitHub Page.
The Python package contains the following pipeline tasks. To run a task, type: dmr_analysis task [args]
dmr_analysis_block: Predict differentially methylated regions (DMRs) genome-wide.
dmr_combine_multChrs4rank: Combine predicted DMRs/MRs from multiple chromosomes, then rank the DMRs using a logistic regression model.
dmr_selected4plot: Plot figures and export raw/smoothed methylation data for a selected DMR or MR.
dmr_map2genome: Map all DMRs/MRs to the reference genome.
dmr_map2chromSegment: Map all DMR/MRs to chromatin segments generated from 6 human cell lines.
dmr_cal2genome_percent: Calculate the percentage of DMRs intersected with predefined genomic regions such as TSS, TES, 5dist, etc.
dmr_cal2chromSegment_percent: Calculate the percentage of DMRs intersected with chromatin segments generated from 6 human cell lines.
dmr_percent2plot: Plot the percentage of DMRs in predefined genomic or chromatin segment regions.
dmr_combine2geneAnnot: Combine annotations from both predefined genomic regions and chromatin segments (this function is slow and requires both the genome and chromatin segment results to be available).
dmr_exportData: Plot and export data for DMRs/MRs located in specific regions (e.g., DMRs/MRs intersected with mutation block or enhancer regions).
dmr_gene_annotation: Clean the reference file and create genomic region files (TSS, geneBody, TES, 5dist, and intergenic) from the reference. This module is reimplemented from the HMST-Seq-Analyzer tool (Farooq et al., 2020).
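The percentage reported by a task such as `dmr_cal2genome_percent` can be thought of as the fraction of DMRs that overlap at least one interval of each region type. The sketch below computes that fraction under stated assumptions. Intervals are BED-style half-open `(chrom, start, end)` tuples, and a DMR that overlaps several region types is counted in each of them. The tool's actual counting rule may differ, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# BED-style half-open interval: (chromosome, start, end)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_dmrs_in_regions(
    dmrs: List[Interval], regions: Dict[str, List[Interval]]
) -> Dict[str, float]:
    """Percentage of DMRs overlapping each region type (hypothetical helper).

    A DMR may be counted under several region types, so the values need not
    sum to 100.
    """
    if not dmrs:
        return {name: 0.0 for name in regions}
    percent = {}
    for name, intervals in regions.items():
        hits = sum(1 for d in dmrs if any(overlaps(d, r) for r in intervals))
        percent[name] = 100.0 * hits / len(dmrs)
    return percent


if __name__ == "__main__":
    dmrs = [("chr18", 100, 200), ("chr18", 500, 600), ("chr3", 50, 80), ("chr3", 1000, 1100)]
    regions = {
        "TSS": [("chr18", 150, 250), ("chr3", 60, 70)],
        "TES": [("chr18", 590, 700)],
        "5dist": [("chr1", 0, 10)],
    }
    print(percent_dmrs_in_regions(dmrs, regions))
    # {'TSS': 50.0, 'TES': 25.0, '5dist': 0.0}
```

The sketch uses a linear scan for clarity. For genome-wide DMR sets, an interval tree or a sorted sweep, as used by `bedtools intersect`, would be the usual choice.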
DDS_analysis
dds_analysis is an integrated data analysis pipeline that considers both differentially methylated regions (DMRs) and differentially expressed genes (DEGs) in SNP analysis. The package depends on some functions from bpb3.
For online documentation and the GitHub page with demos, please see: Online Documentation and GitHub Page.
The Python package contains the following pipeline tasks. To run a task, type: dds_analysis task [args]
bpb3summary2bed_format: Convert bpb3 block summary file to a bed format file.
map_block2genome: Map mutation block to genomic regions.
map_block2chromSegment: Map mutation blocks to chromatin segment regions.
map_block2dmr: Map mutation blocks to differentially methylated regions.
find_geneExp4block: Find differentially expressed genes for mutation blocks.
find_block_patieintID: Find patient ID for mutation blocks.
combine_dmr_deg2block: Combine DMR, DEG, and mutation block information together.
filter_blocks: Filter mutation blocks using DMR and/or DEG conditions.
collect_gene_names4blocks: Collect unique gene names for mutation blocks with DMR and/or DEG.
check_block_gene_inTAD: Check whether block and gene are in the same TAD or TAD boundary.
dds_geneRanking: Select top-ranked genes from final prediction.
go_pathway_analysis4out_blocks_gene: GO pathway analysis of genes.
find_enhancer_target_genes: Find enhancers and their target genes that overlap mutation blocks associated with a selected gene.
chromSegment_test4blocks: Enrichment test of mutation blocks or methylation regions associated with genes in 7 chromatin segmentations of the human genome.
dTarget_methy_vs_express: Predict long-distance target genes for a specific region (e.g., a mutation block) based on the coupling of methylation and gene expression across samples.
plot_mr_vs_exp: Plot the DMR/MR methylation level and gene expression for a pair consisting of a DMR and its target gene.
plot_tss_enhancer_mrs: Plot the average methylation level of predicted DMRs at TSS and enhancer regions for the target genes predicted by dTarget_methy_vs_express.
filterDEG4bpb3: Filter differentially expressed genes (DEGs) by rratio, based on the file exported by bpb3 differential_expression, then export them with group means and rratio.
preprocess: This module first finds DEGs in TSS and 5dist regions, then preprocesses data for dds_analysis.
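`dTarget_methy_vs_express` is described as predicting long-distance target genes from the coupling of methylation and expression across samples. The sketch below illustrates one simple form of that idea. It computes the Pearson correlation between a region's methylation and each candidate gene's expression across the same samples, then keeps genes whose correlation falls below a cutoff. Several choices here are assumptions for illustration: Pearson correlation, the negative-only direction, and the `max_r = -0.5` cutoff. The tool itself may use a different statistic, significance testing, or both correlation directions. Gene names and values are made up.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sample vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def candidate_targets(
    region_methylation: List[float],
    gene_expression: Dict[str, List[float]],
    max_r: float = -0.5,
) -> List[Tuple[str, float]]:
    """Genes whose expression anticorrelates with the region's methylation.

    Hypothetical helper: returns (gene, r) pairs with r <= max_r, most negative first.
    """
    scored = [(gene, pearson(region_methylation, expr)) for gene, expr in gene_expression.items()]
    return sorted((item for item in scored if item[1] <= max_r), key=lambda item: item[1])


if __name__ == "__main__":
    methylation = [0.9, 0.8, 0.3, 0.2]  # one region, four samples
    expression = {
        "GENE_A": [1.0, 2.0, 7.0, 8.0],  # falls as methylation rises
        "GENE_B": [5.0, 5.1, 4.9, 5.0],  # roughly flat
        "GENE_C": [8.0, 7.0, 2.0, 1.0],  # rises with methylation
    }
    for gene, r in candidate_targets(methylation, expression):
        print(gene, round(r, 3))  # GENE_A -1.0
```

With only four samples, any correlation is fragile. A real analysis would use many more samples and control for the number of gene-region pairs tested.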
BayesPI-BAR3
BayesPI-BAR3 (or bpb3) is a Bayesian method for protein–DNA interaction with binding affinity ranking in Python 3.
For package download and online documentation please see: BayesPI-BAR3 GitHub Page
The Python package contains the following pipeline tasks. To run a task, type: bpb3 task [args]
differential_expression: Predict differentially expressed genes (DEGs) based on two groups of samples.
gene_regions: Extract regions near transcription start sites of selected genes based on the GENCODE GTF.
mussd: Mutation filtering based on the Space and Sample Distribution - MuSSD.
highly_mutated_blocks: Find blocks with significantly more mutations than would be expected.
bayespi_bar: BayesPI-BAR delta-dbA ranking computation for TF binding affinity affected by DNA mutation.
choose_background_parameters: Select parameters for mutation background computation.
affinity_change_significance_test: Significance test of TF binding affinity changes between foreground and background affinity changes.
parallel: Run commands from the given file in parallel.
make_cluster4pwm: Make input PWM files for bpb3 based on clustered PWMs.
bpb3selectedPWM: The second-level analysis of bpb3, which uses the top PWMs in the TF ranking from the first-level analysis of bpb3 based on the clustered PWMs.
run_pipeline: Run the full bpb3 pipeline (e.g., the first-level analysis of bpb3 if clustered PWMs are used in the calculation).
clean_tmp: Clean temporary files from output folders.
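`highly_mutated_blocks` is described as finding blocks with significantly more mutations than expected. The following is a minimal sketch of such a test. It assumes a Poisson background in which the expected count is the block length times the number of samples times a per-base, per-sample background rate. This background model is a swapped-in assumption for illustration. bpb3's own test and its background parameters (see `choose_background_parameters`) may differ, and `block_pvalue` is a hypothetical helper.

```python
import math


def _log_pmf(i: int, lam: float) -> float:
    """log P(X = i) for X ~ Poisson(lam), lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k <= lam:
        # Near or below the mean: 1 minus the lower tail is accurate enough.
        lower = sum(math.exp(_log_pmf(i, lam)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # Far tail: terms shrink once i > lam, so sum until they stop mattering.
    total, i = 0.0, k
    while True:
        term = math.exp(_log_pmf(i, lam))
        total += term
        if term <= total * 1e-16:
            break
        i += 1
    return min(total, 1.0)


def block_pvalue(n_mutations: int, block_len_bp: int, n_samples: int, rate_per_bp: float) -> float:
    """Hypothetical helper: p-value for observing >= n_mutations in one block."""
    expected = block_len_bp * n_samples * rate_per_bp
    return poisson_upper_tail(n_mutations, expected)


if __name__ == "__main__":
    # 6 mutations in a 50 bp block across 100 samples, background 1e-5 per bp per sample
    print(block_pvalue(6, 50, 100, 1e-5))
```

The sketch sums the far tail directly rather than computing 1 minus the lower tail. This keeps very small p-values from collapsing to zero through floating-point cancellation.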
References:
Wang, J.B. and K. Batmanov, BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations. Nucleic Acids Res, 2015. 43(21).
Batmanov, K., J. Delabie, and J. Wang, BayesPI-BAR2: A New Python Package for Predicting Functional Non-coding Mutations in Cancer Patient Cohorts. Front Genet, 2019. 10: p. 282.
Farooq, A., et al., HMST-Seq-Analyzer: A new python tool for differential methylation and hydroxymethylation analysis in various DNA methylation sequencing data. Comput Struct Biotechnol J, 2020. 18: p. 2877-2889.
Farooq, A., et al., Integrating whole genome sequencing, methylation, gene expression, topological associated domain information in regulatory mutation prediction: A study of follicular lymphoma. Comput Struct Biotechnol J, 2022. 20: p. 1726-1742.
Yang, M., et al., Identifying functional regulatory mutation blocks by integrating genome sequencing and transcriptome data. iScience, 2023. 26(8): p. 107266.