Doctoral Researchers

Amparo Gimenez Rios she/he

Bridging artificial intelligence and biology to make biomedical data interpretation clearer, contextual, and reproducible.

Doctoral researcher in Bioinformatics and Computational Biomedicine

Research interests:

Bioinformatics, Gene Function Interpretation, Large Language Models, Functional Genomics, Systems Biology, Machine Learning, Data Integration, AI-Driven Annotation, Biomedical Text Mining, Omics Integration

Research fields:

Bioinformatics, Computational Biology, Artificial Intelligence, Genomics, Transcriptomics

Strategic Research Areas

Personal profile:

My current research explores how artificial intelligence, particularly large language models (LLMs), can transform the way we interpret gene and protein functions in biomedical research. Current annotation systems, such as Gene Ontology (GO), provide hierarchical categories but often lack intuitive, contextualized interpretation. Recent studies, such as Harnessing Large Language Models for Candidate Gene Prioritization and Selection (Toufiq et al., 2023) and GeneGPT (Jin et al., 2024), demonstrate the growing potential of LLMs to bridge this gap by reasoning over biomedical data and generating structured, verifiable hypotheses. While these studies focus on specific tasks (such as selecting important genes, searching databases, or summarizing research papers), I aim to explore how LLMs can understand and organize biological knowledge in a more intuitive way, grouping genes and biological processes into meaningful categories that capture how they work together in real biological systems.

This work builds upon my previous research experience, conducted as part of my undergraduate and Master’s studies, which focused on examining the relationship between neurodivergent conditions and migraine disorder. Working with gene functions and biological processes in that context exposed the limitations of current annotation systems and statistical approaches, which now motivate my exploration of AI-driven, context-aware methods for interpreting biomedical data.

Ultimately, my goal is to contribute to a new paradigm of explainable functional profiling that links gene and protein functions to their biological context, improving interpretability, scalability, and accessibility for the wider research community.

I am really looking forward to seeing where this journey takes me!

Achievements/Highlights

Completed an MSc in Bioinformatics at the University of Glasgow with Distinction, where I explored the association between autism/ADHD Polygenic Risk Scores and migraine disorder status (using UK Biobank data).
Awarded a First-Class Honours BSc in Biomedical Science, where I investigated the genetic overlap and protein-protein interaction network between autism and migraine disorder.
Presented a poster on my Honours project research at the BNA2025 Festival of Neuroscience, with a travel grant awarded by the Migraine Science Collaborative (a non-profit news website and information resource hub for migraine researchers and clinicians).
Participated as a student panellist in the session on ethics and artificial intelligence in education at the ALT Winter Summit 2023, in collaboration with the Department of Learning and Teaching Enhancement at Edinburgh Napier University.
Developed a strong interdisciplinary background combining neuroscience, genomics, proteomics, and bioinformatics.

As a neurodivergent researcher, I am passionate about creating more inclusive, accessible, and open forms of scientific inquiry. I believe diverse perspectives are key to innovation, especially in computational biology, where human experience shapes how we interpret complex data. My motivation aligns with the FAIR and open-science principles: bioinformatics tools and insights should be transparent and usable by everyone, not just those with technical expertise.

I love world-building that reimagines the fabric of reality itself! In my creative projects, I reinterpret natural laws through a metaphysical lens, constructing parallel worlds where physics, consciousness, and divinity intertwine. These universes often coexist with ours, governed by alternate principles that mirror but distort real scientific phenomena. This process allows me to blend scientific reasoning with narrative imagination, exploring how energy, time, and matter might behave under entirely new rules yet still feel internally consistent, much like speculative cosmology grounded in magic and symbolism.

Research Project:

Using Large Language Models to Enhance Functional Interpretation of Omics Data

Supervisors & collaborators:

Modern biology increasingly relies on high-throughput experiments that measure molecular differences between healthy and diseased cells, producing vast datasets of genes, transcripts, or proteins with differential patterns. Interpreting these long lists of molecular changes in context remains a key challenge. Functional enrichment and annotation tools are central to this task, yet they remain limited by standard statistical approaches like Gene Set Enrichment Analysis (GSEA) and annotation databases like GO. While these frameworks provide structured terminology and highlight over-represented terms, they often fail to capture context-specific meaning or higher-order relationships among biological processes. With the rapid evolution of LLMs capable of reasoning over complex biomedical text, there is an opportunity to develop a new approach capable of inferring biological meaning beyond standard statistical methods and database boundaries.

Aims

To identify key gaps where traditional enrichment fails to provide interpretable, contextual insight.
To explore how AI can be used to classify or summarize gene functions into broader, more intuitive categories relevant to disease or system-level biology, and to support in-context interpretation.

Methodology

This project will explore the use of LLMs in combination with structured knowledge graphs derived from public biomedical databases to enhance functional interpretation.
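As a rough illustration of what combining a knowledge graph with an LLM could look like, the sketch below groups a gene list by shared graph terms and assembles a prompt grounded in those facts. It is not the project's actual pipeline: the triples are a hand-written toy graph rather than data pulled from any database, the function names `shared_terms` and `build_prompt` are hypothetical, and no LLM is called.

```python
from collections import defaultdict

# Toy knowledge graph as (subject, relation, object) triples.
# Hand-written for illustration only; a real graph would be derived
# from public biomedical databases.
TRIPLES = [
    ("TP53", "involved_in", "apoptotic process"),
    ("TP53", "involved_in", "DNA damage response"),
    ("BRCA1", "involved_in", "DNA repair"),
    ("BRCA1", "involved_in", "DNA damage response"),
    ("ATM", "involved_in", "DNA damage response"),
]


def shared_terms(genes, triples, min_genes=2):
    """Return graph terms linked to at least `min_genes` of the query genes."""
    query = set(genes)
    genes_by_term = defaultdict(set)
    for subject, relation, obj in triples:
        if subject in query and relation == "involved_in":
            genes_by_term[obj].add(subject)
    return {
        term: sorted(members)
        for term, members in genes_by_term.items()
        if len(members) >= min_genes
    }


def build_prompt(genes, triples, context):
    """Assemble a prompt that restricts the model to facts found in the graph."""
    query = set(genes)
    facts = [
        f"- {s} {r.replace('_', ' ')} {o}"
        for s, r, o in triples
        if s in query
    ]
    return (
        f"Experimental context: {context}\n"
        f"Genes: {', '.join(genes)}\n"
        "Known facts from the knowledge graph:\n"
        + "\n".join(facts)
        + "\nUsing only the facts above, summarize what these genes have in "
        "common and cite the facts you relied on."
    )


if __name__ == "__main__":
    genes = ["TP53", "BRCA1", "ATM"]
    print(shared_terms(genes, TRIPLES))
    print(build_prompt(genes, TRIPLES, "tumour versus healthy tissue"))
```

Listing the graph facts in the prompt is one possible design choice: it lets a reader trace each statement in the model's summary back to a specific triple.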

Impact

This project will inform the development of future tools that make gene and pathway data more comprehensible, empowering researchers to focus on the wider biological picture and its meaning.

References

Hu, M., Alkhairy, S., Lee, I., Pillich, R. T., Fong, D., Smith, K., Bachelder, R., Ideker, T., & Pratt, D. (2025). Evaluation of large language models for discovery of gene set function. Nature Methods, 22(1), 82–91. https://doi.org/10.1038/s41592-024-02525-x

Chen, J. Y., Wang, J. F., Hu, Y., Li, X. H., Qian, Y. R., & Song, C. L. (2025). Evaluating the advancements in protein language models for encoding strategies in protein function prediction: a comprehensive review. In Frontiers in Bioengineering and Biotechnology (Vol. 13). Frontiers Media SA. https://doi.org/10.3389/fbioe.2025.1506508

Toufiq, M., Rinchai, D., Bettacchioli, E., Kabeer, B. S. A., Khan, T., Subba, B., White, O., Yurieva, M., George, J., Jourde-Chiche, N., Chiche, L., Palucka, K., & Chaussabel, D. (2023). Harnessing large language models (LLMs) for candidate gene prioritization and selection. Journal of Translational Medicine, 21(1). https://doi.org/10.1186/s12967-023-04576-8

Feng, R., Zhang, C., & Zhang, Y. (2024). Large language models for biomolecular analysis: From methods to applications. In TrAC – Trends in Analytical Chemistry (Vol. 171). Elsevier B.V. https://doi.org/10.1016/j.trac.2024.117540

Jin, Q., Yang, Y., Chen, Q., & Lu, Z. (2024). GeneGPT: augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics, 40(2). https://doi.org/10.1093/bioinformatics/btae075

Khan, T., Yurieva, M., Kabeer, B. S. A., Toufiq, M., Rinchai, D., Palucka, K., & Chaussabel, D. (2024). Deep functional profiling of gene sets using Large Language Models: A blueprint for tailored, context-aware functional annotation. https://doi.org/10.1101/2024.12.12.628275

Hung, J. H., Yang, T. H., Hu, Z., Weng, Z., & DeLisi, C. (2012). Gene set enrichment analysis: Performance evaluation and usage guidelines. Briefings in Bioinformatics, 13(3), 281–291. https://doi.org/10.1093/bib/bbr049

Rehana, H., Çam, N. B., Basmaci, M., Zheng, J., Jemiyo, C., He, Y., Ozgur, A., & Hur, J. (2024). Evaluating GPT and BERT models for protein–protein interaction identification in biomedical text. Bioinformatics Advances, 4(1). https://doi.org/10.1093/bioadv/vbae133

Sarumi, O. A., & Heider, D. (2024). Large language models and their applications in bioinformatics. In Computational and Structural Biotechnology Journal (Vol. 23, pp. 3498–3505). Elsevier B.V. https://doi.org/10.1016/j.csbj.2024.09.031

Collaboration goals

As my research explores the use of artificial intelligence, particularly large language models (LLMs), to improve how gene and protein functions are interpreted and communicated, I am keen to collaborate with researchers and organizations interested in making biological data analysis more intuitive, explainable, and accessible. My project focuses on bridging classical bioinformatics and reasoning-based AI, aiming to create tools that are not only scientifically rigorous but also tailored to the real needs of researchers: easy to use, interpret, and integrate into existing workflows.

I welcome collaborations with:

Bioinformaticians and computational biologists interested in integrating AI-driven reasoning with statistical genomics and proteomics.

Software developers and data scientists focused on designing interpretable and user-friendly AI tools for biological data.

Biomedical researchers and clinicians seeking AI-assisted frameworks to contextualize functional genomics and proteomics data.

Human-computer interaction specialists and science communicators who can help improve usability and accessibility.

Industry partners and open-science organizations aiming to make complex biological analyses more transparent, reproducible, and researcher-oriented.

https://www.linkedin.com/in/amparo-gimenez-rios-amrsb-631973163/