Bernardo P. de Almeida

Publications

Selected

B.P. de Almeida, C. Schaub, M. Pagani, S. Secchia, E.E.M. Furlong, A. Stark. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 2023

CODE

F. Reiter*, B.P. de Almeida*, A. Stark. Enhancers display constrained sequence flexibility and context-specific modulation of motif function. Genome Research 2023

CODE

B.P. de Almeida, F. Reiter, M. Pagani, A. Stark. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nature Genetics 2022

CODE MODEL DATA GEO

B.P. de Almeida, A.F. Vieira, J. Paredes, M. Bettencourt-Dias, N.L. Barbosa-Morais. Pan-cancer association of a centrosome amplification gene expression signature with genomic alterations and clinical outcome. PLoS Computational Biology 2019

CODE


Preprints

B.P. de Almeida*, H. Dalla-Torre*, G. Richard, C. Blum, L. Hexemer, M. Gélard, P. Pandey, S. Laurent, A. Laterre, M. Lang, U. Şahin, K. Beguir, T. Pierrot. SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models. bioRxiv 2024

CODE MODELS

Example notebooks are available on Google Colab.

E. Trop*, C.-H. Kao*, M. Polen*, Y. Schiff*, B.P. de Almeida, A. Gokaslan, T. Pierrot, V. Kuleshov. Advancing DNA Language Models: The Genomics Long-Range Benchmark. MLGenX ICLR Workshop 2024

DATA

J.J. Garau-Luis*, P. Bordes*, L. Gonzalez*, M. Roller, B.P. de Almeida, L. Hexemer, C. Blum, S. Laurent, J. Grzegorzewski, M. Lang, T. Pierrot§, G. Richard§. Multi-modal Transfer Learning between Biological Foundation Models. arXiv 2024

MODEL

V. Loubiere, B.P. de Almeida, M. Pagani, A. Stark. Developmental and housekeeping transcriptional programs display distinct modes of enhancer-enhancer cooperativity in Drosophila. bioRxiv 2023

CODE

J.M. Xavier, R. Magno, R. Russell, B.P. de Almeida, A. Jacinta-Fernandes, A. Duarte, M. Dunning, S. Samarajiwa, M. O’Reilly, C.L. Rocha, N. Rosli, B.A.J. Ponder, A.T. Maia. Mapping of cis-regulatory variants by differential allelic expression analysis identifies candidate risk variants and target genes of 27 breast cancer risk loci. medRxiv 2022

CODE


Others

S. Boshar, E. Trop, B.P. de Almeida, L. Copoiu, T. Pierrot. Are genomic language models all you need? Exploring genomic language models on protein downstream tasks. Bioinformatics 2024

CODE MODEL BENCHMARK

H. Dalla-Torre, L. Gonzalez, J. Mendoza-Revilla, N.L. Carranza, A.H. Grzywaczewski, F. Oteri, C. Dallago, E. Trop, B.P. de Almeida, H. Sirelkhatim, G. Richard, M. Skwark, K. Beguir, M. Lopez, T. Pierrot. The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. Nature Methods 2024

CODE MODELS DATA BENCHMARK

Two example notebooks, showing how to fine-tune any of the models on any of the Nucleotide Transformer tasks with either standard fine-tuning or LoRA, are also available among the Hugging Face example notebooks.

J. Mendoza-Revilla, E. Trop, L. Gonzalez, M. Roller, H. Dalla-Torre, B.P. de Almeida, G. Richard, J. Caton, N.L. Carranza, M. Skwark, A. Laterre, K. Beguir, T. Pierrot, M. Lopez. A Foundational Large Language Model for Edible Plant Genomes. Communications Biology 2024

CODE MODELS DATA BENCHMARK


Software

B.P. de Almeida*, N. Saraiva-Agostinho*, N.L. Barbosa-Morais. cTRAP: Identification of candidate causal perturbations from differential gene expression data. R package

CODE


* equal contribution, § shared corresponding author


Other resources

Reference compendium of non-redundant TF motifs: 6,502 TF motif models from multiple species (mostly Drosophila and human), clustered by similarity to remove redundancy.

CODE



Check my GitHub account for more details.