Jacob Steinhardt

Stanford University
Verified email at cs.stanford.edu
Cited by 15724

The malicious use of artificial intelligence: Forecasting, prevention, and mitigation

…, H Anderson, H Roff, GC Allen, J Steinhardt… - arXiv preprint arXiv …, 2018 - arxiv.org
This report surveys the landscape of potential security threats from malicious uses of AI, and
proposes ways to better forecast, prevent, and mitigate these threats. After analyzing the …

Potently neutralizing and protective human antibodies against SARS-CoV-2

…, A Chandrashekar, NB Mercado, JJ Steinhardt… - Nature, 2020 - nature.com
The ongoing pandemic of coronavirus disease 2019 (COVID-19), which is caused by severe
acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a major threat to global health …

Concrete problems in AI safety

…, C Olah, J Steinhardt, P Christiano, J Schulman… - arXiv preprint arXiv …, 2016 - arxiv.org
Rapid progress in machine learning and artificial intelligence (AI) has brought increasing
attention to the potential impacts of AI technologies on society. In this paper we discuss one …

The many faces of robustness: A critical analysis of out-of-distribution generalization

…, D Song, J Steinhardt, J Gilmer - Proceedings of the …, 2021 - openaccess.thecvf.com
We introduce four new real-world distribution shift datasets consisting of changes in image
style, image blurriness, geographic location, camera operation, and more. With our new …

Measuring massive multitask language understanding

…, A Zou, M Mazeika, D Song, J Steinhardt - arXiv preprint arXiv …, 2020 - arxiv.org
We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks
including elementary mathematics, US history, computer science, law, and more. To attain …

Jailbroken: How does LLM safety training fail?

…, N Haghtalab, J Steinhardt - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases of …

Natural adversarial examples

…, K Zhao, S Basart, J Steinhardt… - Proceedings of the …, 2021 - openaccess.thecvf.com
We introduce two challenging datasets that reliably cause machine learning model performance
to substantially degrade. The datasets are collected with a simple adversarial filtration …

Certified defenses against adversarial examples

A Raghunathan, J Steinhardt, P Liang - arXiv preprint arXiv:1801.09344, 2018 - arxiv.org
… (i, j). In order to speed up computation, for each update we randomly pick i_t and only compute gradients for the pairs (i_t, j), j ≠ i_t, requiring only 9 top eigenvector computations in each step. …

Measuring mathematical problem solving with the math dataset

…, S Basart, E Tang, D Song, J Steinhardt - arXiv preprint arXiv …, 2021 - arxiv.org
… “This research study is being conducted by the Steinhardt Group at UC Berkeley. For questions
about this study, please contact Dan Hendrycks at hendrycks@berkeley.edu. In this study…

Certified defenses for data poisoning attacks

J Steinhardt, PWW Koh… - Advances in neural …, 2017 - proceedings.neurips.cc
Abstract Machine learning systems trained on user-provided data are susceptible to data
poisoning attacks, whereby malicious users inject false training data with the aim of corrupting …