Jacob Steinhardt

Stanford University
Verified email at cs.stanford.edu
Cited by 15724

The malicious use of artificial intelligence: Forecasting, prevention, and mitigation

…, H Anderson, H Roff, GC Allen, J Steinhardt… - arXiv preprint arXiv …, 2018 - arxiv.org
This report surveys the landscape of potential security threats from malicious uses of AI, and
proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the …

Potently neutralizing and protective human antibodies against SARS-CoV-2

…, A Chandrashekar, NB Mercado, JJ Steinhardt… - Nature, 2020 - nature.com
The ongoing pandemic of coronavirus disease 2019 (COVID-19), which is caused by severe
acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a major threat to global health …

Concrete problems in AI safety

…, C Olah, J Steinhardt, P Christiano, J Schulman… - arXiv preprint arXiv …, 2016 - arxiv.org
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing
attention to the potential impacts of AI technologies on society. In this paper we discuss one …

The many faces of robustness: A critical analysis of out-of-distribution generalization

…, D Song, J Steinhardt, J Gilmer - Proceedings of the …, 2021 - openaccess.thecvf.com
We introduce four new real-world distribution shift datasets consisting of changes in image
style, image blurriness, geographic location, camera operation, and more. With our new …

Measuring massive multitask language understanding

…, A Zou, M Mazeika, D Song, J Steinhardt - arXiv preprint arXiv …, 2020 - arxiv.org
We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks
including elementary mathematics, US history, computer science, law, and more. To attain …

Jailbroken: How does LLM safety training fail?

…, N Haghtalab, J Steinhardt - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases of …

Natural adversarial examples

…, K Zhao, S Basart, J Steinhardt… - Proceedings of the …, 2021 - openaccess.thecvf.com
We introduce two challenging datasets that reliably cause machine learning model performance
to substantially degrade. The datasets are collected with a simple adversarial filtration …

Certified defenses against adversarial examples

A Raghunathan, J Steinhardt, P Liang - arXiv preprint arXiv:1801.09344, 2018 - arxiv.org
… (i, j). In order to speed up computation, for each update we randomly pick i_t and only compute gradients for the pairs (i_t, j), j ≠ i_t, requiring only 9 top eigenvector computations in each step. …

Measuring mathematical problem solving with the math dataset

…, S Basart, E Tang, D Song, J Steinhardt - arXiv preprint arXiv …, 2021 - arxiv.org
… “This research study is being conducted by the Steinhardt Group at UC Berkeley. For questions
about this study, please contact Dan Hendrycks at hendrycks@berkeley.edu. In this study…

Certified defenses for data poisoning attacks

J Steinhardt, PWW Koh… - Advances in neural …, 2017 - proceedings.neurips.cc
Abstract Machine learning systems trained on user-provided data are susceptible to data
poisoning attacks, whereby malicious users inject false training data with the aim of corrupting …