Michele Catasta pirroh

👋 Hi there, I'm Michele Catasta

👨‍💻 VP of AI at Replit (building the future of software development with AI)

🔬 Former Head of Applied Research @ Google Labs (working on AI applied to Source Code, Large Language Models)

👨‍🏫 Former Research Scientist and Instructor in AI @ Stanford University

🧐 Expertise: Large Language Models, AI for Code, Machine Learning, Information Retrieval, Data Science

| When | What | Links |
| --- | --- | --- |
| May 2023 | PaLM 2 announced at Google I/O -- I worked on code pre-training and evaluations | [paper] - [blog post] - [website] |
| Apr 2023 | `replit-code-v1-3b` announced at Replit Developer Day and released as open source | [X thread] - [video] - [HuggingFace model] - [GitHub repo] |
| H2 2022 | Invited talks on "AI meets Source Code: status quo and outlook" | [video] and events: [EPFL], [Synapse AI Symposium], [Berkeley AI Summit] & more |
| Apr 2022 | "PaLM: Scaling Language Modeling with Pathways" submitted to arXiv -- I worked on PaLM-Coder | [paper] - [blog post] |
| Mar 2021 | "Language-Agnostic Representation Learning of Source Code from Structure and Context" (AKA Code Transformer) accepted at ICLR 2021 | [paper] - [demo] - [code] |

🎓 Education

Postdoc in Machine Learning at Stanford University
- Advised by Prof. Jure Leskovec
- Affiliated with SNAP and Statistical Machine Learning Group
PhD in Computer Science at EPFL

👨‍💻 Experience

Head of Applied Research at Google X & Google Labs
- Worked on Large Language Models and AI for Code (including PaLM and PaLM 2)
Research Scientist at Stanford University and at EPFL
- Contributed to several projects (funded by IARPA, DARPA, Samsung, Google, Amazon, ...) with research on Deep Learning (GNNs, Transformers, Open Graph Benchmark, etc.), Recommender Systems, Crowdsourcing, and Data Science.
Intern at MIT Media Lab (w/ Prof. Alex 'Sandy' Pentland), Yahoo Research (w/ Prof. Ricardo Baeza-Yates), and Google.
Co-founder of Sindice.com, the largest Semantic Web search engine (back in the day). The core technologies developed for Sindice evolved into:
- a top-level Apache project, Any23
- several contributions to Hadoop, Lucene and Solr
- Siren, an investigative intelligence platform that secured $15M+ in funding -- kudos to my amazing ex-colleagues 👍

👨‍🏫 Teaching

At Stanford University:
- CS224W: Machine Learning with Graphs -- Co-instructor together with Prof. Jure Leskovec
- CS246: Mining Massive Data Sets -- Co-instructor together with Prof. Jure Leskovec
- CS329S: Machine Learning Systems Design -- Advisor
- CS341: Project in Mining Massive Data Sets -- Instructor
At EPFL:
- Applied Data Analysis -- Created and taught the first edition of the course
  - ADA is now taught by my friend and research collaborator Prof. Robert West (head of the Data Science Lab), who masterfully improved the course in several areas
  - It is now the largest course offered by the CS department at EPFL, having recently grown to 600+ students -- kudos to Bob 👍