Publications
Selected
A multimodal conversational agent for DNA, RNA and protein tasks
B.P. de Almeida*, G. Richard*, H. Dalla-Torre, C. Blum, L. Hexemer, P. Pandey, S. Laurent, M. Lopez, A. Laterre, M. Lang, U. Şahin, K. Beguir, T. Pierrot. Nature Machine Intelligence 2025
SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models
B.P. de Almeida*, H. Dalla-Torre*, G. Richard, C. Blum, L. Hexemer, M. Gélard, P. Pandey, S. Laurent, A. Laterre, M. Lang, U. Şahin, K. Beguir, T. Pierrot. Nature Methods 2025
Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo
B.P. de Almeida, C. Schaub, M. Pagani, S. Secchia, E.E.M. Furlong, A. Stark. Nature 2024
Enhancers display constrained sequence flexibility and context-specific modulation of motif function
F. Reiter*, B.P. de Almeida*, A. Stark. Genome Research 2023
DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers
B.P. de Almeida, F. Reiter, M. Pagani, A. Stark. Nature Genetics 2022
Pan-cancer association of a centrosome amplification gene expression signature with genomic alterations and clinical outcome
B.P. de Almeida, A.F. Vieira, J. Paredes, M. Bettencourt-Dias, N.L. Barbosa-Morais. PLoS Computational Biology 2019
Preprints
A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction
S. Boshar, B. Evans, Z. Tang, A. Picard, Y. Adel, F.K. Lorbeer, C. Rajesh, T. Karch, S. Sidbon, D. Emms, J. Mendoza-Revilla, F. Al-Ani, E. Seitz, Y. Schiff, Y. Bornachot, A. Hernandez, M. Lopez, A. Laterre, K. Beguir, P. Koo, V. Kuleshov, A. Stark, B.P. de Almeida§, T. Pierrot§. bioRxiv 2025
Others
Simple Guidance Mechanisms for Discrete Diffusion Models
Y. Schiff, S.S. Sahoo, H. Phung, G. Wang, S. Boshar, H. Dalla-Torre, B.P. de Almeida, A. Rush, T. Pierrot, V. Kuleshov. ICLR 2025
Multi-modal Transfer Learning between Biological Foundation Models
J.J. Garau-Luis*, P. Bordes*, L. Gonzalez*, M. Roller, B.P. de Almeida, L. Hexemer, C. Blum, S. Laurent, J. Grzegorzewski, M. Lang, T. Pierrot§, G. Richard§. NeurIPS 2024
Are genomic language models all you need? Exploring genomic language models on protein downstream tasks
S. Boshar, E. Trop, B.P. de Almeida, L. Copoiu, T. Pierrot. Bioinformatics 2024
The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics
H. Dalla-Torre, L. Gonzalez, J. Mendoza-Revilla, N.L. Carranza, A.H. Grzywaczewski, F. Oteri, C. Dallago, E. Trop, B.P. de Almeida, H. Sirelkhatim, G. Richard, M. Skwark, K. Beguir, M. Lopez, T. Pierrot. Nature Methods 2024
Two example notebooks showing how to finetune any of the models on any of the Nucleotide Transformer tasks, with either regular finetuning or LoRA, are also available among the Hugging Face example notebooks.
A Foundational Large Language Model for Edible Plant Genomes
J. Mendoza-Revilla, E. Trop, L. Gonzalez, M. Roller, H. Dalla-Torre, B.P. de Almeida, G. Richard, J. Caton, N.L. Carranza, M. Skwark, A. Laterre, K. Beguir, T. Pierrot, M. Lopez. Communications Biology 2024
Developmental and housekeeping transcriptional programs display distinct modes of enhancer-enhancer cooperativity in Drosophila
V. Loubiere, B.P. de Almeida, M. Pagani, A. Stark. Nature Communications 2024
Identification of candidate causal variants and target genes at 41 breast cancer risk loci through differential allelic expression analysis
J.M. Xavier, R. Magno, R. Russell, B.P. de Almeida, A. Jacinta-Fernandes, A. Besouro-Duarte, M. Dunning, S. Samarajiwa, M. O'Reilly, A.M. Maia, C.L. Rocha, N. Rosli, B.A.J. Ponder, A.T. Maia. Scientific Reports 2024
Software
cTRAP: Identification of candidate causal perturbations from differential gene expression data
B.P. de Almeida*, N. Saraiva-Agostinho*, N.L. Barbosa-Morais. R package
Other resources
Reference compendium of non-redundant TF motifs
Clustering of 6,502 TF motif models from multiple species (mostly focused on Drosophila and human) by similarity to remove redundancy