Manuel Faysse

On a latent space odyssey.

PhD Candidate

Lead Research Scientist

Paris, France

Hey! I am Manu, a second-year PhD student working on applied NLP and ML privacy research, but curious about (way too) many other things!

After pretraining at EPFL with a master’s in Robotics and Data Science, and an awesome research stint with the Computational Privacy Group at Imperial College London, I worked as a Research Scientist at Illuin Technology on various NLP use cases, notably deep multimodal models for Document ML and neural information retrieval.

I am now in my academic finetuning phase as a PhD student at CentraleSupélec (Université Paris-Saclay), supervised by the distilled knowledge of Pierre Colombo. My research focuses on industrial applications of large language models, with papers on automatic evaluation of instruction-tuned models, bilingual large language model pretraining (CroissantLLM), multimodal information retrieval (ColPali), model memorization, and confidence estimation techniques for neural information retrieval.

My work has been published in top international conferences (ICML, EMNLP), has been featured in the press (MIT Tech Review, Nature Magazine, Usine Digitale, Usine Nouvelle, etc.), has led to invited talks (Meta, IBM, Naver, LlamaIndex, etc.), and is currently used in production for various awesome applications.

My PhD is funded through the French CIFRE program in collaboration with Illuin Technology, where I currently hold a Lead Research Scientist position and spend a minor share of my time advising and supporting various R&D efforts in the LLM and Vision LLM space.

Don’t hesitate to contact me to discuss, or to inquire about potential collaborations or invited talks!

news

Aug 19, 2024 Gave an invited talk at Unbabel on ColPali and Retrieval in Vision Space.
Jul 26, 2024 Invited to the LlamaIndex webinar to talk about ColPali and Document Retrieval in Vision Space.
Jul 25, 2024 The MIT Technology Review has published a featured article on our work on Copyright Traps in LLMs.
Jun 21, 2024 We release ColPali - Efficient Document Retrieval with Vision Language Models 👀!
Jun 14, 2024 Gave an invited talk at IBM Research Paris on the topic of CroissantLLM and Large Language Models.

selected publications

2024

  1. ColPali: Efficient Document Retrieval with Vision Language Models
    Manuel Faysse, Hugues Sibille, Tony Wu, and 4 more authors
    2024
  2. CroissantLLM: A Truly Bilingual French-English Language Model
    Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, and 13 more authors
    2024

2023

  1. Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications
    Manuel Faysse, Gautier Viaud, Céline Hudelot, and 1 more author
    In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023