I’m an applied scientist at Microsoft and a hobbyist artificial intelligence researcher.

You can find my publications on Semantic Scholar and my CV. My models and datasets are on HuggingFace. I'm best reached at kyobrien@microsoft.com.

Direction

I am fascinated by how little we understand the internal mechanisms of advanced machine-learning models. Opaque optimization has proven effective at achieving impressive capabilities, but at the expense of understanding which internal algorithms models learn. This opacity could become an increasing concern as capabilities improve and models are deployed in high-stakes settings.
The north star of my research career is to better understand how concepts such as goals and truth are represented within frontier models. While interpretability insights are valuable on their own, my focus is on translating them into practical model-editing techniques. The ability to intervene on a model's internals is essential for rapidly addressing undesirable behavior without costly retraining; model editing offers language models the same safety and utility benefits that patching provides in traditional software. I am especially interested in studying concept editing across model scales and distribution shifts in order to predict how such insights will generalize to future systems and environments.

Publications

Composable Interventions for Language Models

Arinbjörn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag J. Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

arXiv, 2024 — Paper Link

Improving Black-box Robustness with In-Context Rewriting

Kyle O'Brien, Isha Puri, Nathan Ng, Jorge Mendez, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi, Thomas Hartvigsen

TMLR, 2024 — Paper Link

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra

arXiv, 2024 — Paper Link

Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal

ICML, 2023 — Paper Link

I am always happy to meet with people to discuss research, collaborations, and career advice. I have benefited tremendously from others sharing their time with me and am excited to pay it forward.