ABOUT RDDC

Introduction to Rare Disease Data Center

The Rare Disease Data Center (RDDC), guided by the Artificial Intelligence Innovation Center of the Research Institute of Tsinghua, Pearl River Delta, and supported by Cyagen, was initiated in February 2021 and launched version 1.0 in February 2022. Following that launch, the RDDC received considerable attention and feedback from researchers and medical professionals, and their input led to the upgraded version 2.0, which was released on July 1, 2023.
The RDDC 2.0 introduces enhanced user interfaces and improved data interaction capabilities, designed to better serve researchers by facilitating efficient data queries and mining. Furthermore, RDDC 2.0 emphasizes the presentation of genetic and hereditary data, with a focus on utilizing large genetic datasets to develop advanced biological AI tools.
Currently, there is no publicly accessible database for rare disease research in China, and international databases often lack visualizations suited to researchers' needs. The RDDC aims to support medical doctors, researchers, rare disease patients, and their families by providing comprehensive and easily understandable information about rare diseases. It presents data graphically while preserving the original form of the underlying records, and an upcoming tagging and classification system will simplify information filtering. Additionally, it integrates domestic resources on rare diseases, offering a solid data foundation for research in China.
The purpose of the RDDC is to create a centralized platform for gene, disease, and animal model information. This platform allows users to efficiently transition from target gene discovery to querying the phenotypes and functions of the target gene, as well as identifying relevant animal models available on the market. This resource supports the development of research plans and facilitates scientific research and drug discovery focusing on disease-causing genes. The main information provided by the database includes:
On the Genes page, the RDDC has collated gene information for humans, mice, and rats, with other species planned. Users can access the following information:
Basic gene information (ID, alias) and function description
Comparison information of the orthologous gene in humans, mice and rats
Locus information of genes
Display of gene-related mutations
Functional domain of corresponding proteins
Gene transcript information
Information on gene-related diseases (in humans)
Information on gene-related phenotypes (in humans)
Gene expression information
Subcellular localization of genes (in humans)
Protein interaction maps
On the Diseases page, the RDDC has gathered information from Malacards, OMIM, Orphanet, ClinVar, and other open-source databases, along with local disease data provided by the Rare Diseases Alliance. Users can access the following information:
Description of basic disease information in different databases
Disease ID and alias
Disease epidemiology information (being updated)
Description of the disease's Human Phenotype Ontology (HPO)
List of disease-related genes and distribution of gene mutations
Progress in the development of disease-related drugs
Information on disease-related clinical trials
On the Mouse-Model page, the RDDC has collected various types of gene-edited mouse models used in numerous studies. Users can access the following information:
Basic information about the mouse model
The methodology used to create the mouse model
Background information of hybrid mice involved in gene editing
Phenotype information of hybrid mice involved in gene editing
Publications related to the mouse model
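The three pages described above (genes, diseases, and mouse models) are connected through the gene. The following is a minimal Python sketch of how such linked records might be traversed, going from a disease to its associated genes and then to the mouse models available for those genes. All class, field, and function names are hypothetical, the record values are placeholders, and none of this reflects the RDDC's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MouseModel:
    # Hypothetical record: name of the model and how it was created.
    name: str
    method: str


@dataclass
class Gene:
    # Hypothetical record linking a gene to diseases and mouse models.
    symbol: str
    diseases: List[str] = field(default_factory=list)
    mouse_models: List[MouseModel] = field(default_factory=list)


def genes_for_disease(disease: str, genes: List[Gene]) -> List[Gene]:
    """Return every gene annotated with the given disease name."""
    return [g for g in genes if disease in g.diseases]


def models_for_disease(disease: str, genes: List[Gene]) -> List[MouseModel]:
    """Walk disease -> genes -> mouse models and collect the models."""
    models: List[MouseModel] = []
    for gene in genes_for_disease(disease, genes):
        models.extend(gene.mouse_models)
    return models


# Placeholder data for illustration only.
catalog = [
    Gene("GENE_A", ["Disease X"], [MouseModel("GENE_A-KO", "knockout")]),
    Gene("GENE_B", ["Disease X", "Disease Y"], []),
    Gene("GENE_C", ["Disease Y"], [MouseModel("GENE_C-KI", "knock-in")]),
]

for model in models_for_disease("Disease X", catalog):
    print(model.name, model.method)
```

Keeping the gene as the shared key means a query can start from any of the three entry points (gene, disease, or model) and reach the other two with simple lookups.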
In addition to data cleaning and visual display, the RDDC utilizes structured data for AI model training. In recent years, it has become evident that AI holds extensive applications across various aspects of biomedical research (Figure 1). Currently, the RDDC's focus remains on target discovery, and it has successfully developed AI tools such as “RNA Splicer” 1.0 and “Pathogenicity Predictor” 1.0. In the future, the RDDC is committed to developing AI application models that encompass the entire process from the discovery of rare disease targets to the commercialization of rare disease drugs.
Furthermore, the RDDC aims to develop more AI tools to continuously address complex issues throughout the drug development process. This includes biological mechanism research, identification of potential drug targets, understanding intricate drug reactions, optimization of gene therapy drugs, transitioning drugs from animal models to human use, and assessing drug effects on populations.
[Figure 1 panels: Biomedical Data Modalities; Machine Learning Models; Challenges & Opportunities]
Figure 1. RDDC aims to apply AI technology throughout the entire process of drug development for rare diseases.
Tools that have been launched include:
RNA Splicer: This tool predicts whether a base mutation changes mRNA splicing sites, and it analyzes and displays the prediction results in detail.
Pathogenicity Predictor: Using the XGBoost machine learning method, this tool predicts the degree of disease effect caused by a base mutation. The prediction results are divided into four pathogenicity categories.
ASO Designer: This tool predicts the best ASO candidate sequence by calculating the binding energy between the ASO and the base sequence of the target region, as well as other base-pairing indicators.
SNP Viewer: Users can view the mutation distribution and mutation status of the input gene, making it easier to find mutation hot spots and sites.
Pathway Enrichment Analysis: This online pathway enrichment tool visually displays the changes in gene expression within a pathway after enrichment.
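To illustrate the general idea behind ranking ASO candidates for a target region, here is a minimal Python sketch. It is not the ASO Designer's method: instead of a real binding-energy calculation, it uses the number of Watson-Crick hydrogen bonds in the duplex (3 for G-C pairs, 2 for A-T/A-U pairs) as a crude stand-in for binding strength. The function names, the window length, and the scoring rule are all assumptions made for this example.

```python
# Illustrative sketch only; hydrogen-bond counting is a crude proxy,
# not a thermodynamic binding-energy model.
COMPLEMENT = {"A": "T", "U": "A", "T": "A", "C": "G", "G": "C"}
HBONDS = {"A": 2, "U": 2, "T": 2, "C": 3, "G": 3}


def antisense(target: str) -> str:
    """Return the DNA antisense strand (5'->3') that pairs with the target."""
    return "".join(COMPLEMENT[base] for base in reversed(target.upper()))


def pairing_score(target: str) -> int:
    """Count hydrogen bonds formed when the target is fully base-paired."""
    return sum(HBONDS[base] for base in target.upper())


def rank_candidates(region: str, aso_len: int = 18, top: int = 3):
    """Slide a window over the region and rank candidate ASOs by score.

    Returns a list of (score, start_index, aso_sequence) tuples,
    highest score first; ties are broken by the earliest start index.
    """
    candidates = []
    for start in range(len(region) - aso_len + 1):
        window = region[start:start + aso_len]
        candidates.append((pairing_score(window), start, antisense(window)))
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[:top]


if __name__ == "__main__":
    # Made-up target region, used only to exercise the functions.
    for score, start, aso in rank_candidates("AUGGCCGAUUACGGCAUCGAUCGGA", aso_len=10):
        print(start, aso, score)
```

A production tool would replace `pairing_score` with a proper thermodynamic model and would add further filters, such as the other base-pairing indicators mentioned above.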

Rare Diseases - An Overview

The definition of rare diseases varies across different countries and regions but generally refers to diseases affecting fewer than 1 in 2,000 individuals in the general population. Due to the rarity of individual cases, the progress in rare disease research significantly lags behind that of common diseases, and the classification within and between various rare diseases remains largely unclear. According to the U.S. Food and Drug Administration (FDA), there are over 7,000 known rare diseases worldwide, whereas the more detailed Malacards database lists over 14,000. Globally, there are more than 350 million rare disease patients, with nearly 50% being children. However, among these rare diseases, fewer than 10% have approved treatments or therapies.
It is also noteworthy that over 80% of rare diseases are genetic or gene-related, with many being monogenic diseases. This factor explains why many gene therapies focus on rare diseases.
[Figure 2 panels: Rare diseases make up over two-thirds of all diseases worldwide; The proportion of rare diseases in Malacards disease classification]
Figure 2. Rare Disease Ratio
Given the large population in China, it is estimated that there are approximately 20 million rare disease patients. In 2018, the National Health Commission, Ministry of Science and Technology, Ministry of Industry and Information Technology, National Medical Products Administration, and National Administration of Traditional Chinese Medicine jointly released the "First List of Rare Diseases." This initiative aims to promote the diagnosis and treatment of rare diseases and includes 121 conditions such as Alport syndrome, amyotrophic lateral sclerosis (ALS), and hemophilia. However, due to the lack of suitable animal models and the small target population, research and development investments for rare disease drugs remain relatively low.
In terms of integrating rare disease information, the rare disease registration system, led by Peking Union Medical College Hospital and involving several renowned hospitals nationwide, was launched in 2020. This system aims to integrate the research and treatment resources of multiple hospitals, centralize domestic pathological case information, and address the issue of isolated rare disease information. Moreover, significant efforts have been made by the NIH and NORD in the USA and by organizations in Europe, as well as by numerous rare disease-related non-profit organizations. However, current rare disease information primarily focuses on clinical trials and basic disease information, without adequately structuring important pre-clinical disease model information. Databases such as MGI, which centers on mouse models, focus on phenotype organization rather than clinical information integration. The Orphanet database, which mainly contains structured information on rare disease epidemiology and standardized nomenclature, distributes its data as downloadable packages without effective visualization.
Given the current "isolated" research status of rare diseases and the increased interest in rare disease research driven by advancements in gene editing technology, there are higher demands on both data quality and content presentation in rare disease databases. Additionally, as AI plays an increasingly significant role in life sciences, such as small molecule drug discovery, macromolecular structure prediction, and pathological image recognition, predicting rare diseases caused by genetic mutations and developing therapeutic viral vectors with AI will become potential major breakthroughs in the healthcare field.
The RDDC seeks to provide a comprehensive platform for patients, medical professionals, and researchers. This platform will offer visualized and interactive information on diseases, genes, and animal models, together with AI tools. In doing so, it aims to make this information as convenient as possible to use for everyone committed to the treatment or research of rare diseases.