Welcome to this page, where you can discover models developed by the RIKEN AGIS program.
Terms of Use
- What is the scientific foundation model (alpha version)?
The scientific foundation models published on this page are outcomes of RIKEN’s research and development. They are released for trial use and verification, specifically to support scientific advancement and enhance societal application of scientific technologies.
- Copyright and Intellectual Property Rights
Copyright and other intellectual property rights for these scientific foundation models, their associated source code, and documentation belong to RIKEN and the respective collaborating researchers.
- Terms of Use
To ensure the best experience with these models, please review the Terms of Use featured on each model’s distribution site, including guidance on licensing, commercial use, and prohibited activities. Different models may have varied terms.
- Contact Information
We are here to support your interests. If you have questions about intellectual property, licensing, or collaborative opportunities regarding these models, please reach out to us.
Email Address: agis_pr@ml.riken.jp
- MolCrawl
MolCrawl is a multimodal foundation model framework designed to handle chemical and life science data in a unified manner, and is currently under development in its alpha stage. It aims to represent and learn relationships across diverse modalities, including genomic sequences, protein sequences, single-cell transcriptomics, molecular structures, and natural language descriptions of chemical reactions. This unified framework is designed to support the entire pipeline, from data acquisition to modality-specific model construction, in a consistent and integrated way.
Yoshinobu IGARASHI, Kazunobu MATSUBARA, Tatsuya SAGAWA, Ryosuke KOJIMA: Toward Multimodal Foundation Models: Development of Decoders and Encoders for Compounds, Sequences, Expression, and Language. Chem-Bio Informatics Society Annual Meeting 2025 (CBI2025), Oct. 27-30
- ReactionT5
ReactionT5 is a foundation model for chemical reactions, pretrained on the large-scale and diverse datasets of the Open Reaction Database (ORD). Unlike conventional models trained on relatively limited and biased sources such as patent data, it achieves strong generalization by leveraging a broad range of reaction data. It demonstrates high accuracy in zero-shot settings across multiple reaction prediction tasks—including product prediction, retrosynthesis, and yield prediction—and is also expected to exhibit strong domain adaptability through few-shot fine-tuning.
Sagawa T, Kojima R. ReactionT5: a pre-trained transformer model for accurate chemical reaction prediction with limited data. J Cheminform. 2025 Aug 19;17(1):126. doi: 10.1186/s13321-025-01075-4. PMID: 40830907; PMCID: PMC12366004.
- BB-EIT
BB-EIT is a generalized foundation model that integrates augmented chemical embeddings with physicochemical features to predict protein adsorption behavior on polymer brush surfaces with state-of-the-art accuracy. By leveraging data augmentation techniques rooted in chemical language models, it maintains high robustness even in data-sparse regimes and achieves exceptional generalizability across diverse polymer systems and previously unseen protein structures.
Shiwei SU, Nobuyuki TANAKA, Yoshitaka USHIKU, Koichi TAKAHASHI, "BB-EIT: A Generalized Prediction Model for Protein Adsorption on Polymer Brushes Using Augmented Chemical Embeddings", ACS Applied Materials & Interfaces, doi: 10.1021/acsami.5c25223