A. Rupam Mahmood

research

∙ 06/23/2023

Maintaining Plasticity in Deep Continual Learning

Modern deep-learning systems are specialized to problem settings in whic...

Shibhansh Dohare, et al.

research

∙ 06/23/2023

Correcting discount-factor mismatch in on-policy policy gradient methods

The policy gradient theorem gives a convenient form of the policy gradie...

Fengdi Che, et al.

research

∙ 05/29/2023

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

We present a scalable and effective exploration strategy based on Thomps...

Haque Ishfaq, et al.

research

∙ 05/09/2023

Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization

Continuous-time reinforcement learning tasks commonly use discrete steps...

Homayoon Farrahi, et al.

research

∙ 02/07/2023

Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning

Modern representation learning methods may fail to adapt quickly under n...

Mohamed Elsayed, et al.

research

∙ 02/03/2023

Learning to Optimize for Reinforcement Learning

In recent years, by leveraging more data, computation, and diverse tasks...

Qingfeng Lan, et al.

research

∙ 12/06/2022

Variable-Decision Frequency Option Critic

In classic reinforcement learning algorithms, agents make decisions at d...

Amirmohammad Karimi, et al.

research

∙ 10/20/2022

HesScale: Scalable Computation of Hessian Diagonals

Second-order optimization uses curvature information about the objective...

Mohamed Elsayed, et al.

research

∙ 10/05/2022

Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers

Real-time learning is crucial for robotic agents adapting to ever-changi...

Yan Wang, et al.

research

∙ 05/22/2022

Memory-efficient Reinforcement Learning with Knowledge Consolidation

Artificial neural networks are promising as general function approximato...

Qingfeng Lan, et al.

research

∙ 03/23/2022

Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots

An oft-ignored challenge of real-world reinforcement learning is that th...

Yufeng Yuan, et al.

research

∙ 02/04/2022

A Temporal-Difference Approach to Policy Gradient Estimation

The policy gradient theorem (Sutton et al., 2000) prescribes the usage o...

Samuele Tosatto, et al.

research

∙ 12/22/2021

An Alternate Policy Gradient Estimator for Softmax Policies

Policy gradient (PG) estimators for softmax policies are ineffective wit...

Shivam Garg, et al.

research

∙ 08/13/2021

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness

The Backprop algorithm for learning in neural networks utilizes two mech...

Shibhansh Dohare, et al.

research

∙ 07/17/2021

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences

Approximate Policy Iteration (API) algorithms alternate between (approxi...

Alan Chan, et al.

research

∙ 06/10/2021

Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control

Designing adaptable control laws that can transfer between different rob...

Michael Przystupa, et al.

research

∙ 03/09/2021

Model-free Policy Learning with Reward Gradients

Policy gradient methods estimate the gradient of a policy objective sole...

Qingfeng Lan, et al.

research

∙ 03/27/2019

Autoregressive Policies for Continuous Control Deep Reinforcement Learning

Reinforcement learning algorithms rely on exploration to discover new be...

Dmytro Korenkevych, et al.

research

∙ 09/20/2018

Benchmarking Reinforcement Learning Algorithms on Real-World Robots

Through many recent successes in simulation, model-free reinforcement le...

A. Rupam Mahmood, et al.

research

∙ 03/19/2018

Setting up a Reinforcement Learning Task with a Real-World Robot

Reinforcement learning is a promising approach to developing hard-to-eng...

A. Rupam Mahmood, et al.

research

∙ 12/13/2015

True Online Temporal-Difference Learning

The temporal-difference methods TD(λ) and Sarsa(λ) form a core part of m...

Harm van Seijen, et al.

research

∙ 07/01/2015

An Empirical Evaluation of True Online TD(λ)

The true online TD(λ) algorithm has recently been proposed (van Seijen a...

Harm van Seijen, et al.
