Manuel Faysse
PhD Candidate
Paris, France
Hey! I am Manu, a final-year PhD student working on LLMs and information retrieval, but curious about (way too) many other things!
I am nearing the end of my academic post-training phase as a PhD student at CentraleSupélec (with Pierre Colombo) and most recently worked under the distilled supervision of Hervé Jégou at Meta FAIR Paris. My research focuses on practical applications of large language models, with a focus on Visual Document Retrieval (ColPali, ViDoRe), LLM pretraining (CroissantLLM, Long Context Modeling at Meta), as well as multimodality, automatic evaluation, model memorization, and confidence estimation and contextualization techniques for neural information retrieval.
My work has been published in top international venues (ICLR, ICML, EMNLP, TMLR, COLM), has been featured in the press (MIT Tech Review, Nature Magazine, Usine Digitale, etc.), has led to many invited talks (Meta, Amazon, IBM, Naver, LlamaIndex, etc.), and has been listed as a top AI innovation of 2024 (State of AI, Tech Radar). Importantly to me, my work is widely used across the industry, in early-stage startups, large tech companies, and government agencies alike.
My PhD is funded through the French CIFRE program in collaboration with Illuin Technology, where, before joining Meta, I held a Staff Research Scientist position and spent a share of my time advising on various R&D efforts in the LLM and Vision LLM space. Don't hesitate to reach out on X.
news
| Feb 18, 2026 | Jina releases their embeddings v5 models, which claim the top spot on MMTEBv2, the default multilingual IR benchmark. The nano model is based on our EuroBERT model, which the main author states is the best small multilingual encoder backbone amongst all those they experimented with. |
|---|---|
| Jan 23, 2026 | Our work “Should We Still Pretrain Encoders with Masked Language Modeling?” is accepted at ICLR 2026! |
| Jan 2, 2026 | We release the "ViDoRe V3" dataset and paper. |
| Jul 7, 2025 | Our EuroBERT paper "EuroBERT: Scaling Multilingual Encoders for European Languages" is accepted at COLM! |
| Jul 3, 2025 | We release "Should We Still Pretrain Encoders with Masked Language Modeling?" |