RNA Splicing Model: Instructions for Use
Introduction
This tool combines two deep learning models, DanQ and SPTransformer (a Transformer-based model), with the E-value statistic from bioinformatics to predict how genetic variants may affect RNA splicing products. It does this by comparing splice-site predictions for the sequence before and after the variant.
Special Notes:
This tool exclusively evaluates the impact of variants on splice site recognition (e.g., changes in donor/acceptor site strength) and does not perform functional analysis of splicing regulatory elements (SREs) such as enhancers or silencers.
In vivo, pre-mRNA splicing is regulated by multiple factors (e.g., binding of regulatory proteins, epigenetic modifications). The splicing probabilities output by the models should therefore not be equated directly with actual biological outcomes; they require experimental validation.
Parameter Description
1. DanQ Score
Model Principle:
The DanQ model employs a Convolutional Neural Network (CNN) to extract local DNA sequence features (e.g., motifs), combined with a Bidirectional Long Short-Term Memory network (BiLSTM) to capture long-range sequence dependencies. It is used to predict the probability of a variant site functioning as a splice donor or acceptor.
Score Range:
0–1. A score closer to 1 indicates a higher probability that the site is a functional splice site, implying a greater likelihood of splicing occurring.
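CNN/BiLSTM models such as DanQ typically take DNA as a one-hot matrix (one row per base, one column each for A, C, G, T), and a variant is scored by feeding the reference and the altered window separately. The minimal sketch below shows only that input preparation. The window size and the helper names (`one_hot`, `ref_alt_windows`) are hypothetical and are not AI Splicer's actual settings.

```python
# Minimal sketch, assuming a symmetric window around a single-base substitution.
# Helper names and the flank size are hypothetical, for illustration only.
BASES = "ACGT"

def one_hot(seq):
    """Encode DNA as rows of [A, C, G, T] indicators; N or other symbols become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]

def ref_alt_windows(seq, pos, alt, flank):
    """Return the window centred on 0-based `pos` before and after substituting `alt`."""
    start, end = pos - flank, pos + flank + 1
    if start < 0 or end > len(seq):
        raise ValueError("window extends past the sequence")
    ref_win = seq[start:end]
    alt_win = ref_win[:flank] + alt + ref_win[flank + 1:]
    return ref_win, alt_win

# Toy donor region: exon ...CAG | GTAAGT... intron; the G of the GT dinucleotide is mutated.
ref_win, alt_win = ref_alt_windows("CAGGTAAGTCT", 3, "A", 3)
print(ref_win, alt_win)      # CAGGTAA CAGATAA
print(one_hot(alt_win)[3])   # [1, 0, 0, 0]
```

Both matrices would then be passed to the model, and the two output probabilities compared.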
2. SPTransformer Score
Model Principle:
SPTransformer, based on a self-attention mechanism, models global contextual information to analyze the functional relevance of splice sites within a sequence. It excels at handling complex long-range dependencies (e.g., intron-exon boundary features).
Score Range:
0–1. A score closer to 1 indicates that the site is more likely to be used specifically during splicing, implying a greater likelihood of splicing at that site.
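The self-attention mechanism mentioned above lets every position in the sequence weigh every other position, which is how distant features such as intron-exon boundaries can influence a site's score. The sketch below implements plain scaled dot-product attention, softmax(Q·Kᵀ/√d)·V, in pure Python. It is a simplification: it assumes Q = K = V = the input embeddings and omits the learned projections, multiple heads, and positional encodings a real Transformer such as SPTransformer would use.

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention; returns (outputs, attention weights), one row per position."""
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this position's query to every position's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output = attention-weighted sum of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

# Toy 3-position sequence with 2-dimensional embeddings (hypothetical values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
out, w = self_attention(X, X, X)
print([round(x, 3) for x in w[0]])  # position 0 attends equally to positions 0 and 2
```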
3. E-value
The E-value (expectation value) assesses the conservation of the sequence near the variant site and its association with functional sites. The motifs of the predicted splice sites are compared with known, genome-annotated splice-site motifs, and the E-value estimates how many matches at least this good would be expected by chance. A lower E-value therefore indicates a more significant match: stronger conservation of the sequence as a splice site and higher reliability of the functional prediction.
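The exact E-value procedure used by AI Splicer is not documented here. The minimal sketch below only illustrates the generic definition, E = N × p: the number of comparisons N times the probability p of scoring at least as well by chance. It assumes a deliberately simple score (the number of positions matching a consensus), a uniform background in which each position matches with probability 1/4, and an illustrative 9-nt donor consensus, CAGGTAAGT; all three are assumptions, not the tool's model.

```python
from math import comb

def match_score(site, consensus):
    """Toy score: number of positions where the site matches the consensus."""
    return sum(1 for a, b in zip(site.upper(), consensus.upper()) if a == b)

def p_value(score, length, p_match=0.25):
    """P(at least `score` matches) for a random sequence (binomial tail, uniform bases)."""
    return sum(comb(length, k) * p_match**k * (1 - p_match)**(length - k)
               for k in range(score, length + 1))

def e_value(site, consensus, n_tests):
    """Expected number of chance hits at least this good among `n_tests` positions."""
    return n_tests * p_value(match_score(site, consensus), len(consensus))

CONSENSUS = "CAGGTAAGT"  # illustrative donor consensus (assumption)
print(e_value("CAGGTAAGT", CONSENSUS, 1000))  # wild-type-like site: small E-value
print(e_value("CAGATAAGT", CONSENSUS, 1000))  # GT disrupted: larger E-value
```

The mutated site loses one match, so its E-value rises, which matches the reading above: a higher E-value means a weaker, less reliable splice-site signal.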
Special Notes:
The DanQ model focuses on local patterns and sequence dependencies, while SPTransformer excels at global contextual modeling. Their combined use enhances the recall and accuracy of splice site prediction.
The prediction results page dynamically displays the changes in DanQ Score, SPTransformer Score, and E-value before and after the variant. The AI Splicer tool combines these metrics with a multi-model fusion algorithm and outputs a final prediction of the splicing impact.
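The fusion algorithm itself is not described in this guide. The sketch below only illustrates the general idea under explicit assumptions: it computes the before/after change for each metric, then combines the two model probabilities with a hypothetical equal-weight average and a hypothetical 0.5 threshold to label a site as lost, gained, or unchanged. The weights, the threshold, and all names are illustrative, and the input numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class SiteScores:
    danq: float           # 0-1, DanQ splice-site probability
    sptransformer: float  # 0-1, SPTransformer score
    e_value: float        # lower = stronger motif match

def score_changes(ref, alt):
    """Change in each metric caused by the variant (alt minus ref)."""
    return {
        "danq": alt.danq - ref.danq,
        "sptransformer": alt.sptransformer - ref.sptransformer,
        "e_value": alt.e_value - ref.e_value,
    }

def fused_probability(s, w_danq=0.5, w_spt=0.5):
    """Hypothetical fusion: weighted mean of the two model probabilities."""
    return (w_danq * s.danq + w_spt * s.sptransformer) / (w_danq + w_spt)

def site_status(ref, alt, threshold=0.5):
    """Label a site by whether the fused probability crosses a (hypothetical) threshold."""
    before = fused_probability(ref) >= threshold
    after = fused_probability(alt) >= threshold
    if before and not after:
        return "lost"
    if after and not before:
        return "gained"
    return "unchanged"

# Made-up scores for one donor site before and after a variant.
ref = SiteScores(danq=0.92, sptransformer=0.88, e_value=0.004)
alt = SiteScores(danq=0.10, sptransformer=0.15, e_value=0.11)
print(score_changes(ref, alt))
print(site_status(ref, alt))  # lost
```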
Prediction Results
The results page dynamically presents the splicing outcomes predicted by the AI Splicer tool. The conclusion regarding the splicing pattern is annotated in the top-left corner of the splice map. Score changes corresponding to splice sites are displayed via hover-over information cards.
1. Icon Legend
2. Result Display
Changes in the base sequence can shift the choice between alternative splicing pathways. When analyzing splicing outcomes, note that if the number of bases gained or lost in the spliced product (relative to the wild-type transcript) is not a multiple of three, the reading frame shifts downstream of the change (a frameshift). If a premature termination codon (PTC) appears in the spliced product, translation may terminate prematurely.
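The two checks in the paragraph above are simple enough to sketch directly. The code below flags a frameshift when the length change is not divisible by three, and looks for the first in-frame stop codon (TAA, TAG, TGA) starting from the first base. As a simplifying assumption, any in-frame stop before the final codon is treated as a PTC; the sequences are toy examples.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_frameshift(wt_len, mut_len):
    """A length change that is not a multiple of three shifts the reading frame."""
    return (mut_len - wt_len) % 3 != 0

def first_stop_codon(cds):
    """0-based codon index of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in STOP_CODONS:
            return i // 3
    return None

def has_ptc(cds):
    """Simplification: any in-frame stop before the last codon counts as premature."""
    idx = first_stop_codon(cds)
    return idx is not None and idx < len(cds) // 3 - 1

wt = "ATGGCTGACCCTTAA"   # ATG GCT GAC CCT TAA: stop only at the end
mut = "ATGTGACCCTTAA"    # 2 nt lost: ATG TGA ... stop appears early
print(is_frameshift(len(wt), len(mut)), has_ptc(wt), has_ptc(mut))  # True False True
```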
The following are several types of splicing outcomes predicted by the AI Splicer tool:
(1) Maintenance of the original splicing pattern
The variant does not alter the core sequence of the splice donor/acceptor sites; splicing occurs as in the wild type.
(2) Exon truncation
The variant activates a new splice site within an exonic region; this new donor/acceptor site leads to the truncation of a portion of the exon.
(3) Exon skipping
Disruption of a donor/acceptor site without compensation by a cryptic splice site results in the entire exon being skipped.
(4) Pseudoexon inclusion
A variant in an intronic region activates a new splicing signal, causing a normally non-coding intronic sequence to be incorrectly recognized as an exon and included in the mRNA, thereby creating a pseudoexon.
(5) Intron retention
Retention due to the generation of a new splice site:
A variant creates a new splice signal near an original splice site, leading to the retention of a portion of the intron.
The original splice site at one end of the intron is disrupted while the other end remains joined to its exon, so the entire intron sequence is retained.
Retention due to activation of a cryptic splice site:
Disruption of the original site leads to the activation of a cryptic splice site within the intron, causing partial intron retention.
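Each outcome above can be viewed as a different set of exon intervals joined from the same pre-mRNA. The sketch below uses a toy three-exon pre-mRNA (exons in upper case, introns in lower case) with hypothetical coordinates and builds the product for each outcome type, reporting the length change and whether it breaks the reading frame. It illustrates the outcome categories only; it is not how AI Splicer derives them.

```python
# Toy pre-mRNA: exon1 | intron1 | exon2 | intron2 | exon3 (hypothetical sequences).
EXON1, INTRON1, EXON2, INTRON2, EXON3 = "ATGCCC", "gtaagtttcag", "GATGAA", "gtgagtcctag", "TTCTAA"
PRE_MRNA = EXON1 + INTRON1 + EXON2 + INTRON2 + EXON3  # 40 nt

# Exon intervals as 0-based, half-open (start, end) pairs.
WT = [(0, 6), (17, 23), (34, 40)]
OUTCOMES = {
    "original pattern": WT,
    "exon truncation": [(0, 6), (17, 21), (34, 40)],             # new donor inside exon 2
    "exon skipping": [(0, 6), (34, 40)],                         # exon 2 skipped
    "pseudoexon": [(0, 6), (10, 13), (17, 23), (34, 40)],        # intron-1 segment included
    "intron retention": [(0, 6), (17, 40)],                      # intron 2 kept whole
}

def splice(pre_mrna, exons):
    """Join the given exon intervals in order."""
    return "".join(pre_mrna[s:e] for s, e in exons)

wt_len = len(splice(PRE_MRNA, WT))
for name, exons in OUTCOMES.items():
    product = splice(PRE_MRNA, exons)
    delta = len(product) - wt_len
    print(f"{name:18} {product:30} delta={delta:+d} frameshift={delta % 3 != 0}")
```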
Knowledge Review
1. Genomic Coordinate System
Describing variants within intronic regions requires clearly defined coordinate rules for the gene sequence, as illustrated in the diagram:
IVS (intervening sequence) denotes an intron, and the number following it is the intron number; for example, IVS2 refers to intron 2.
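Intronic c. positions are written relative to the nearest coding base: "+n" counts into the intron after an exon's last base, and "-n" counts back from the next exon's first base. The sketch below maps such positions to genomic coordinates for a hypothetical plus-strand transcript whose first coding exon ends at c.68 and whose second begins at c.69. In this layout c.68+1 is IVS1+1 (the first base of intron 1) and c.69-2 is IVS1-2. The exon coordinates are invented, and UTR positions and minus-strand genes are not handled.

```python
import re

# Hypothetical plus-strand transcript: coding exons as 1-based inclusive genomic intervals.
CDS_EXONS = [(1001, 1068), (1501, 1600)]  # exon 1 holds c.1-c.68, exon 2 holds c.69-c.168

def c_to_g(c_pos):
    """Map a c. position such as '68', '68+1' or '69-2' to a genomic coordinate."""
    m = re.fullmatch(r"(\d+)([+-]\d+)?", c_pos)
    if m is None:
        raise ValueError(f"unsupported c. position: {c_pos}")
    base, offset = int(m.group(1)), int(m.group(2) or 0)
    cumulative = 0  # coding bases in the exons already passed
    for start, end in CDS_EXONS:
        length = end - start + 1
        if base <= cumulative + length:
            # Locate the coding base in this exon, then step into the intron by `offset`.
            return start + (base - cumulative - 1) + offset
        cumulative += length
    raise ValueError("position beyond the CDS")

for pos in ["1", "68", "68+1", "69-2", "69"]:
    print(f"c.{pos} -> g.{c_to_g(pos)}")
```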
2. Variant Nomenclature
(1) 'g.' refers to a genomic reference sequence
g.95T>G: Thymine (T) at genomic position 95 is substituted by Guanine (G).
g.123_124delCT (or g.123_124del): Deletion of Cytosine (C) and Thymine (T) at genomic positions 123 to 124.
(2) 'c.' refers to a coding DNA (cDNA) reference sequence
c.65T>G: Thymine (T) at coding sequence position 65 is substituted by Guanine (G).
c.68+1G>A: The intronic Guanine (G), one base after CDS position 68, changes to Adenine (A).
c.68-2A>T: The intronic Adenine (A), two bases before CDS position 68, changes to Thymine (T).
c.52_53delTG: Deletion of Thymine (T) and Guanine (G) at coding sequence positions 52 to 53.
c.52_53insAGG: Insertion of Adenine-Guanine-Guanine (AGG) between coding sequence positions 52 and 53.
c.52_53delTGinsAGG: Deletion of Thymine-Guanine (TG) at coding sequence positions 52-53 and insertion of Adenine-Guanine-Guanine (AGG).
c.-12C>T: Cytosine (C) at position -12 in the 5' UTR (12 bases upstream of the translation initiation codon) is substituted by Thymine (T).
c.*17G>T: Guanine (G) at position *17 in the 3' UTR (17 bases downstream of the translation termination codon) is substituted by Thymine (T).
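The substitution examples above follow a regular pattern (prefix, position, reference base, ">", alternate base), so they can be parsed with a regular expression. The sketch below handles only single-nucleotide substitutions in g. and c. notation and classifies the c. position using the rules listed above ("-" prefix for the 5' UTR, "*" for the 3' UTR, "+n"/"-n" offsets for introns). It is a simplified illustration, not a full HGVS parser; for example, intronic positions inside UTRs are reported as UTR.

```python
import re

SUBSTITUTION = re.compile(
    r"(?P<prefix>[gc])\.(?P<pos>[-*]?\d+(?:[+-]\d+)?)(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)

def parse_substitution(hgvs):
    """Split a simple g./c. substitution into its parts."""
    m = SUBSTITUTION.fullmatch(hgvs)
    if m is None:
        raise ValueError(f"not a simple substitution: {hgvs}")
    return m.groupdict()

def region(parsed):
    """Rough location of the variant implied by the position syntax."""
    pos = parsed["pos"]
    if parsed["prefix"] == "g":
        return "genomic"
    if pos.startswith("-"):
        return "5' UTR"
    if pos.startswith("*"):
        return "3' UTR"
    if "+" in pos or "-" in pos:
        return "intronic"
    return "coding"

for v in ["g.95T>G", "c.65T>G", "c.68+1G>A", "c.68-2A>T", "c.-12C>T", "c.*17G>T"]:
    print(v, region(parse_substitution(v)))
```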
(3) 'p.' refers to a protein reference sequence
p.Ala3Phe (or p.A3F): Alanine at protein position 3 is substituted by Phenylalanine.
p.Cys76Ter (or p.C76*): Cysteine at protein position 76 is substituted by a termination codon (nonsense variant).
p.Val12fs: Frameshift variant starting at Valine at protein position 12.
p.Met1dup: Duplication of Methionine at protein position 1 (in-frame duplication).
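The p. examples above show each change in both three-letter (p.Ala3Phe) and one-letter (p.A3F) form. The sketch below converts the former to the latter using the standard amino acid codes, with Ter mapped to "*". Lower-case suffixes such as "fs" and "dup" are left unchanged because the pattern only matches a capital letter followed by two lower-case letters.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",  # termination codon
}

def to_one_letter(hgvs_p):
    """Replace every three-letter residue code in a p. description with its one-letter code."""
    return re.sub(r"[A-Z][a-z]{2}", lambda m: THREE_TO_ONE.get(m.group(0), m.group(0)), hgvs_p)

for v in ["p.Ala3Phe", "p.Cys76Ter", "p.Val12fs", "p.Met1dup"]:
    print(v, "->", to_one_letter(v))
```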
(4) Additional special variant types
Duplication: c.112_114dupTAC: Duplication of the sequence TAC from coding positions 112 to 114.
Inversion: c.89_91invTGG: Inversion of the sequence TGG from coding positions 89 to 91 (the segment is replaced by its reverse complement, CCA).
Small deletion: g.12300_12345del: Deletion from genomic position 12300 to 12345.
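The deletion, insertion, duplication, and inversion types above can be reproduced on a short toy sequence. In the sketch below, positions are 1-based and inclusive, as in the nomenclature, but they index the toy string rather than real c. or g. coordinates. A duplicated copy is placed directly after the original segment, and an inverted segment is replaced by its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def apply_del(seq, start, end):
    """Delete positions start..end (1-based, inclusive)."""
    return seq[:start - 1] + seq[end:]

def apply_ins(seq, left, inserted):
    """Insert bases between positions `left` and `left + 1`."""
    return seq[:left] + inserted + seq[left:]

def apply_dup(seq, start, end):
    """Duplicate positions start..end, with the copy directly after the original."""
    return seq[:end] + seq[start - 1:end] + seq[end:]

def apply_inv(seq, start, end):
    """Replace positions start..end with their reverse complement."""
    segment = seq[start - 1:end]
    return seq[:start - 1] + segment.translate(COMPLEMENT)[::-1] + seq[end:]

toy = "AATGGCC"  # positions 3-5 are TGG
print(apply_del(toy, 3, 4))       # AAGCC
print(apply_ins(toy, 2, "AGG"))   # AAAGGTGGCC
print(apply_dup(toy, 3, 5))       # AATGGTGGCC
print(apply_inv(toy, 3, 5))       # AACCACC (TGG -> CCA)
```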