AI Safety Fundamentals: Alignment

A podcast by BlueDot Impact

83 Episodes

Is Power-Seeking AI an Existential Risk?
Published: 13.5.2023
Where I Agree and Disagree with Eliezer
Published: 13.5.2023
Supervising Strong Learners by Amplifying Weak Experts
Published: 13.5.2023
Measuring Progress on Scalable Oversight for Large Language Models
Published: 13.5.2023
Least-To-Most Prompting Enables Complex Reasoning in Large Language Models
Published: 13.5.2023
Summarizing Books With Human Feedback
Published: 13.5.2023
Takeaways From Our Robust Injury Classifier Project [Redwood Research]
Published: 13.5.2023
Red Teaming Language Models With Language Models
Published: 13.5.2023
High-Stakes Alignment via Adversarial Training [Redwood Research Report]
Published: 13.5.2023
AI Safety via Debate
Published: 13.5.2023
Robust Feature-Level Adversaries Are Interpretability Tools
Published: 13.5.2023
Introduction to Logical Decision Theory for Computer Scientists
Published: 13.5.2023
Debate Update: Obfuscated Arguments Problem
Published: 13.5.2023
Discovering Latent Knowledge in Language Models Without Supervision
Published: 13.5.2023
Feature Visualization
Published: 13.5.2023
Toy Models of Superposition
Published: 13.5.2023
Understanding Intermediate Layers Using Linear Classifier Probes
Published: 13.5.2023
Acquisition of Chess Knowledge in AlphaZero
Published: 13.5.2023
Careers in Alignment
Published: 13.5.2023
Embedded Agents
Published: 13.5.2023

Listen to resources from the AI Safety Fundamentals: Alignment course! https://aisafetyfundamentals.com/alignment