I’m an applied scientist at Microsoft and a hobbyist artificial intelligence researcher.
You can find my publications on Semantic Scholar and my CV. My models and datasets are on HuggingFace. I'm best reached at kyobrien@microsoft.com.
Direction
I am fascinated by how little we understand the internal mechanisms of advanced machine-learning models. Opaque optimization has proven effective at achieving impressive capabilities, but at the expense of understanding the internal algorithms that models learn. This opacity could become an increasing concern as capabilities improve and models are deployed in high-stakes settings.
The north star of my research career is to better understand how concepts such as goals and truth are represented within frontier models. While interpretability insights are valuable on their own, my focus is on translating them into practical model editing techniques. The ability to intervene on a model's internals is essential for rapidly addressing undesirable behavior without costly retraining; model editing offers safety and utility benefits analogous to those that patching provides in traditional software. I am especially interested in studying concept editing across model scales and distribution shifts in order to predict how such insights will generalize to future systems and environments.
Publications
Composable Interventions for Language Models
Arinbjörn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag J. Vaidya, Faisal Mahmood, M. Zitnik, Tianlong Chen, Thomas Hartvigsen
arXiv, 2024 — Paper Link
Improving Black-box Robustness with In-Context Rewriting
Kyle O'Brien, Isha Puri, Nathan Ng, Jorge Mendez, Hamid Palangi, Yoon Kim, Marzyeh Ghassemi, Thomas Hartvigsen
TMLR, 2024 — Paper Link
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, V. JyothirS, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke, Katherine Lee, Naomi Saphra
arXiv, 2024 — Paper Link
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Rose Biderman, Hailey Schoelkopf, Quentin G. Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, Oskar van der Wal
ICML, 2023 — Paper Link