B.P. de Almeida, C. Schaub, M. Pagani, S. Secchia, E.E.M. Furlong, A. Stark. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 2023
F. Reiter*, B.P. de Almeida*, A. Stark. Enhancers display constrained sequence flexibility and context-specific modulation of motif function. Genome Research 2023
B.P. de Almeida, F. Reiter, M. Pagani, A. Stark. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nature Genetics 2022
B.P. de Almeida, A.F. Vieira, J. Paredes, M. Bettencourt-Dias, N.L. Barbosa-Morais. Pan-cancer association of a centrosome amplification gene expression signature with genomic alterations and clinical outcome. PLoS Computational Biology 2019
B.P. de Almeida*, H. Dalla-Torre*, G. Richard, C. Blum, L. Hexemer, M. Gélard, P. Pandey, S. Laurent, A. Laterre, M. Lang, U. Şahin, K. Beguir, T. Pierrot. SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models. bioRxiv 2024
Example notebooks are available on Google Colab.
E. Trop*, C.-H. Kao*, M. Polen*, Y. Schiff*, B.P. de Almeida, A. Gokaslan, T. Pierrot, V. Kuleshov. Advancing DNA Language Models: The Genomics Long-Range Benchmark. MLGenX ICLR Workshop 2024
J.J. Garau-Luis*, P. Bordes*, L. Gonzalez*, M. Roller, B.P. de Almeida, L. Hexemer, C. Blum, S. Laurent, J. Grzegorzewski, M. Lang, T. Pierrot§, G. Richard§. Multi-modal Transfer Learning between Biological Foundation Models. arXiv 2024
V. Loubiere, B.P. de Almeida, M. Pagani, A. Stark. Developmental and housekeeping transcriptional programs display distinct modes of enhancer-enhancer cooperativity in Drosophila. bioRxiv 2023
J.M. Xavier, R. Magno, R. Russell, B.P. de Almeida, A. Jacinta-Fernandes, A. Duarte, M. Dunning, S. Samarajiwa, M. O’Reilly, C.L. Rocha, N. Rosli, B.A.J. Ponder, A.T. Maia. Mapping of cis-regulatory variants by differential allelic expression analysis identifies candidate risk variants and target genes of 27 breast cancer risk loci. medRxiv 2022
S. Boshar, E. Trop, B.P. de Almeida, L. Copoiu, T. Pierrot. Are genomic language models all you need? Exploring genomic language models on protein downstream tasks. Bioinformatics 2024
H. Dalla-Torre, L. Gonzalez, J. Mendoza-Revilla, N.L. Carranza, A.H. Grzywaczewski, F. Oteri, C. Dallago, E. Trop, B.P. de Almeida, H. Sirelkhatim, G. Richard, M. Skwark, K. Beguir, M. Lopez, T. Pierrot. The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. Nat Methods 2024
Two notebooks showing how to fine-tune any of the models, either with regular fine-tuning or with LoRA, on any of the Nucleotide Transformer tasks are also available among the HuggingFace example notebooks.
J. Mendoza-Revilla, E. Trop, L. Gonzalez, M. Roller, H. Dalla-Torre, B.P. de Almeida, G. Richard, J. Caton, N.L. Carranza, M. Skwark, A. Laterre, K. Beguir, T. Pierrot, M. Lopez. A Foundational Large Language Model for Edible Plant Genomes. Commun Biol 2024
B.P. de Almeida*, N. Saraiva-Agostinho*, N.L. Barbosa-Morais. cTRAP: Identification of candidate causal perturbations from differential gene expression data. R package
* equal contribution, § shared corresponding author
Reference compendium of non-redundant TF motifs: 6,502 TF motif models from multiple species (mostly Drosophila and human), clustered by similarity to remove redundancy.
Check my GitHub account for more details.