Publications

Selected

A multimodal conversational agent for DNA, RNA and protein tasks

B.P. de Almeida*, G. Richard*, H. Dalla-Torre, C. Blum, L. Hexemer, P. Pandey, S. Laurent, M. Lopez, A. Laterre, M. Lang, U. Şahin, K. Beguir, T. Pierrot. Nature Machine Intelligence 2025

SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models

B.P. de Almeida*, H. Dalla-Torre*, G. Richard, C. Blum, L. Hexemer, M. Gélard, P. Pandey, S. Laurent, A. Laterre, M. Lang, U. Şahin, K. Beguir, T. Pierrot. Nature Methods 2025

Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo

B.P. de Almeida, C. Schaub, M. Pagani, S. Secchia, E.E.M. Furlong, A. Stark. Nature 2024

Pan-cancer association of a centrosome amplification gene expression signature with genomic alterations and clinical outcome

B.P. de Almeida, A.F. Vieira, J. Paredes, M. Bettencourt-Dias, N.L. Barbosa-Morais. PLoS Computational Biology 2019

Preprints

A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction

S. Boshar, B. Evans, Z. Tang, A. Picard, Y. Adel, F.K. Lorbeer, C. Rajesh, T. Karch, S. Sidbon, D. Emms, J. Mendoza-Revilla, F. Al-Ani, E. Seitz, Y. Schiff, Y. Bornachot, A. Hernandez, M. Lopez, A. Laterre, K. Beguir, P. Koo, V. Kuleshov, A. Stark, B.P. de Almeida§, T. Pierrot§. bioRxiv 2025

Others

Simple Guidance Mechanisms for Discrete Diffusion Models

Y. Schiff, S.S. Sahoo, H. Phung, G. Wang, S. Boshar, H. Dalla-Torre, B.P. de Almeida, A. Rush, T. Pierrot, V. Kuleshov. ICLR 2025

Multi-modal Transfer Learning between Biological Foundation Models

J.J. Garau-Luis*, P. Bordes*, L. Gonzalez*, M. Roller, B.P. de Almeida, L. Hexemer, C. Blum, S. Laurent, J. Grzegorzewski, M. Lang, T. Pierrot§, G. Richard§. NeurIPS 2024

Are genomic language models all you need? Exploring genomic language models on protein downstream tasks

S. Boshar, E. Trop, B.P. de Almeida, L. Copoiu, T. Pierrot. Bioinformatics 2024

The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics

H. Dalla-Torre, L. Gonzalez, J. Mendoza-Revilla, N.L. Carranza, A.H. Grzywaczewski, F. Oteri, C. Dallago, E. Trop, B.P. de Almeida, H. Sirelkhatim, G. Richard, M. Skwark, K. Beguir, M. Lopez, T. Pierrot. Nature Methods 2024

Two example notebooks, available among the HuggingFace example notebooks, show how to fine-tune any of the models on any of the Nucleotide Transformer tasks, using either regular fine-tuning or LoRA.

A Foundational Large Language Model for Edible Plant Genomes

J. Mendoza-Revilla, E. Trop, L. Gonzalez, M. Roller, H. Dalla-Torre, B.P. de Almeida, G. Richard, J. Caton, N.L. Carranza, M. Skwark, A. Laterre, K. Beguir, T. Pierrot, M. Lopez. Communications Biology 2024

Identification of candidate causal variants and target genes at 41 breast cancer risk loci through differential allelic expression analysis

J.M. Xavier, R. Magno, R. Russell, B.P. de Almeida, A. Jacinta-Fernandes, A. Besouro-Duarte, M. Dunning, S. Samarajiwa, M. O'Reilly, A.M. Maia, C.L. Rocha, N. Rosli, B.A.J. Ponder, A.T. Maia. Scientific Reports 2024

Software

cTRAP: Identification of candidate causal perturbations from differential gene expression data

B.P. de Almeida*, N. Saraiva-Agostinho*, N.L. Barbosa-Morais. R package

Other resources

Reference compendium of non-redundant TF motifs

Similarity-based clustering of 6,502 TF motif models from multiple species (mostly Drosophila and human) to remove redundancy