Natasha Jaques 2
TalkRL: The Reinforcement Learning Podcast - A podcast by Robin Ranjit Singh Chauhan
Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!

Dr Natasha Jaques is a Senior Research Scientist at Google Brain.

Featured References

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine

Additional References

Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019
Learning to summarize from human feedback, Nisan Stiennon et al 2020
Training language models to follow instructions with human feedback, Long Ouyang et al 2022