User profiles for J. Steinhardt
Jacob Steinhardt, Stanford University. Verified email at cs.stanford.edu. Cited by 15724
The malicious use of artificial intelligence: Forecasting, prevention, and mitigation
This report surveys the landscape of potential security threats from malicious uses of AI, and
proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the …
Potently neutralizing and protective human antibodies against SARS-CoV-2
…, A Chandrashekar, NB Mercado, JJ Steinhardt… - Nature, 2020 - nature.com
The ongoing pandemic of coronavirus disease 2019 (COVID-19), which is caused by severe
acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a major threat to global health …
Concrete problems in AI safety
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing
attention to the potential impacts of AI technologies on society. In this paper we discuss one …
The many faces of robustness: A critical analysis of out-of-distribution generalization
We introduce four new real-world distribution shift datasets consisting of changes in image
style, image blurriness, geographic location, camera operation, and more. With our new …
Measuring massive multitask language understanding
We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks
including elementary mathematics, US history, computer science, law, and more. To attain …
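A minimal sketch of how multitask accuracy on a benchmark like this might be computed, assuming per-task lists of multiple-choice items; the data layout, task names, and predict() stub are illustrative assumptions, not the benchmark's actual format or API:

```python
# Sketch: macro-averaged multiple-choice accuracy across tasks.
# Data layout and predict() are illustrative assumptions.
from typing import Callable

def multitask_accuracy(
    tasks: dict[str, list[tuple[str, list[str], int]]],
    predict: Callable[[str, list[str]], int],
) -> float:
    """tasks maps a task name to (question, choices, answer_index) items;
    predict returns the model's chosen answer index."""
    per_task = []
    for name, items in tasks.items():
        correct = sum(predict(q, choices) == ans for q, choices, ans in items)
        per_task.append(correct / len(items))
    # Macro average: every task counts equally, regardless of its size.
    return sum(per_task) / len(per_task)

# Toy usage with a trivial "always pick the first choice" model.
toy = {
    "elementary_math": [("2+2=?", ["4", "5"], 0)],
    "us_history": [("First US president?", ["Lincoln", "Washington"], 1)],
}
print(multitask_accuracy(toy, lambda q, c: 0))  # 0.5
```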
Jailbroken: How does LLM safety training fail?
…, N Haghtalab, J Steinhardt - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases of …
Natural adversarial examples
…, K Zhao, S Basart, J Steinhardt… - Proceedings of the …, 2021 - openaccess.thecvf.com
We introduce two challenging datasets that reliably cause machine learning model performance
to substantially degrade. The datasets are collected with a simple adversarial filtration …
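The "simple adversarial filtration" the snippet mentions can be illustrated as: keep only the candidate examples that a fixed reference classifier gets wrong. A hedged sketch; the Example type and classifier interface are stand-ins, not the authors' actual pipeline:

```python
# Sketch of adversarial filtration: retain only candidates that a fixed
# reference classifier misclassifies. Types and interface are assumptions.
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Example:
    image_path: str
    true_label: int

def adversarial_filter(
    candidates: Sequence[Example],
    classify: Callable[[str], int],
) -> list[Example]:
    """Keep examples the reference model gets wrong; these form the
    'hard' subset that reliably degrades downstream model performance."""
    return [ex for ex in candidates if classify(ex.image_path) != ex.true_label]
```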
Certified defenses against adversarial examples
… (i, j). In order to speed up computation, for each update we randomly pick i_t and only compute gradients for pairs (i_t, j), j ≠ i_t, requiring only 9 top-eigenvector computations in each step. …
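The snippet refers to repeated top-eigenvector computations inside an optimization loop; a standard cheap routine for that kind of step is power iteration. A generic sketch, not the paper's actual certification solver:

```python
# Power iteration: a standard routine for the repeated top-eigenvector
# computations the snippet describes. Generic sketch, not the paper's solver.
import numpy as np

def top_eigenpair(A: np.ndarray, iters: int = 100, tol: float = 1e-8):
    """Return (eigenvalue, eigenvector) for the largest-magnitude
    eigenpair of a symmetric matrix A."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A @ v
        norm = np.linalg.norm(w)
        if norm < tol:  # v lies (almost) in the null space of A
            break
        w /= norm
        if np.linalg.norm(w - v) < tol:  # converged up to sign
            v = w
            break
        v = w
    return v @ A @ v, v  # Rayleigh quotient approximates the eigenvalue
```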
Measuring mathematical problem solving with the math dataset
… “This research study is being conducted by the Steinhardt Group at UC Berkeley. For questions
about this study, please contact Dan Hendrycks at hendrycks@berkeley.edu. In this study…
Certified defenses for data poisoning attacks
J Steinhardt, PWW Koh… - Advances in neural …, 2017 - proceedings.neurips.cc
Machine learning systems trained on user-provided data are susceptible to data
poisoning attacks, whereby malicious users inject false training data with the aim of corrupting …
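One canonical sanitization defense of the kind studied in this line of work removes training points that fall too far from their class centroid. A simplified sketch under that assumption, not the paper's exact certified procedure:

```python
# Sketch of a centroid-based (sphere-style) sanitization defense: drop
# training points outside a fixed radius of their class centroid.
# Simplified illustration, not the paper's certified procedure.
import numpy as np

def sphere_sanitize(X: np.ndarray, y: np.ndarray, radius: float) -> np.ndarray:
    """Return a boolean mask over rows of X marking points kept as clean."""
    keep = np.zeros(len(X), dtype=bool)
    for label in np.unique(y):
        idx = y == label
        centroid = X[idx].mean(axis=0)
        dists = np.linalg.norm(X[idx] - centroid, axis=1)
        keep[idx] = dists <= radius
    return keep

# Usage: train only on X[mask], y[mask], discarding suspected poison points.
```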