/sci/ - Science & Math » Thread #13196435

120KiB, 1156x808, Screenshot 2021-05-27 at 13.11.40.jpg


Anonymous Thu 27 May 2021 12:12:29 No.13196435

Quoted By: >>13196439 >>13196481

Has anyone got any tips on improving my reinforcement learning algorithm?

pic related. I have to wait over 6 hours just to see if my hyperparameter tuning is working. If it fails, then I tweak and wait another 6 hours... very slow.

I'm using PPO btw.

Anonymous Thu 27 May 2021 12:13:45 No.13196439

>>13196435
>ep_rew_mean
this means "episode reward mean" btw. so the pic in the OP is the mean reward per episode, plotted against the number of steps the algorithm has been running.
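A minimal sketch of how a metric like this could be computed, assuming the logger averages the total reward of the most recently finished episodes. The class name and the window size of 100 are assumptions for illustration, not something stated in the thread.

```python
from collections import deque


class EpisodeRewardMean:
    """Running mean of episode returns over a sliding window (hypothetical helper)."""

    def __init__(self, window=100):
        # Once the window is full, the oldest episode drops out automatically.
        self.returns = deque(maxlen=window)

    def add(self, episode_return):
        """Record the summed reward of one finished episode."""
        self.returns.append(float(episode_return))

    def value(self):
        """Mean return over the window, or NaN before any episode has finished."""
        if not self.returns:
            return float("nan")
        return sum(self.returns) / len(self.returns)


tracker = EpisodeRewardMean(window=2)
for ret in (0.1, 0.2, 0.3):
    tracker.add(ret)
# Only the last two episodes (0.2 and 0.3) are still in the window.
current_mean = tracker.value()
```

Because the window slides, a curve like this lags behind the policy's actual performance by up to one window of episodes.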

Anonymous Thu 27 May 2021 12:28:41 No.13196481

Quoted By: >>13196489

>>13196435
>pic related. I have to wait over 6 hours just to see if my hyperparameter tuning is working. If it fails, then I tweak and wait another 6 hours... very slow.
That's deep learning for you.

Have you tried using a higher learning rate?

44KiB, 990x848, Screenshot 2021-05-08 at 09.24.43.jpg

Anonymous Thu 27 May 2021 12:32:17 No.13196489

Quoted By: >>13196505

>>13196481
A higher learning rate increases the variance of the blue line, but it still averages around 0. The optimal score is 0.15, so ideally it should get to around 0.12. Any other ideas?

desu I'm not sure whether it might start to work if I let it run for 12 hours or longer. A common thing I've noticed on traditional ML classification problems is a loss graph like pic related, where no progress is made for hours and then it suddenly breaks out of a local optimum and learns.

Anonymous Thu 27 May 2021 12:36:40 No.13196505

Quoted By: >>13196518

>>13196489
I work in supervised learning so I have no experience at all with reinforcement learning, but do you have something comparable to a batch size? Basically a parameter to control how long you accumulate gradients before taking a step?
I've noticed that these types of graphs are more common when you have very noisy gradients, which can obviously also increase the time it takes to converge.
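A rough illustration of the noisy-gradient point, assuming per-sample gradients are independent draws around a true value: averaging over a batch of size B should shrink the spread of the estimate by about 1/sqrt(B). The function name, the true gradient of 0.5, and the Gaussian noise are all made up for this sketch.

```python
import random
import statistics


def grad_estimate_std(batch_size, n_trials=2000, true_grad=0.5, noise_std=1.0, seed=0):
    """Spread of a batch-averaged gradient estimate across many simulated batches."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # One batch of noisy per-sample gradients around the true gradient.
        batch = [rng.gauss(true_grad, noise_std) for _ in range(batch_size)]
        estimates.append(sum(batch) / batch_size)
    return statistics.pstdev(estimates)


small_batch_std = grad_estimate_std(batch_size=4)
large_batch_std = grad_estimate_std(batch_size=64)
# 16x more samples per step -> roughly 4x less noise in each step's gradient.
noise_ratio = small_batch_std / large_batch_std
```

The flip side is that each step now costs 16 times as many samples, so less noise per step does not automatically mean faster training in wall-clock time.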

Anonymous Thu 27 May 2021 12:45:59 No.13196518

Quoted By: >>13196532

>>13196505
yeh in RL it's called the rollout buffer. Essentially, your agent acts on the environment and, at each timestep, collects the current state and reward. Then after x timesteps you compute the loss over the rollout buffer, backpropagate the error and do an update step.

The problem is that the larger the rollout buffer (batch size), the longer it takes to get to each update, because the agent has to collect more experiences to fill up the buffer before each update step can be done. I haven't tried tweaking it too much though. Will give it a shot.
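The collect-then-update loop described above can be sketched roughly like this. `ToyEnv`, `train`, and `n_steps` are hypothetical names, the random action stands in for sampling from the policy, and the update itself is only counted: a real PPO update would compute advantages and run several epochs of minibatch gradient steps on the clipped loss.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment: a noisy 1-D walk rewarded for staying near 0."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0.0

    def step(self, action):
        self.state += action + self.rng.uniform(-0.1, 0.1)
        return self.state, -abs(self.state)  # (next state, reward)


def train(total_timesteps, n_steps, seed=0):
    """Fill a rollout buffer of n_steps transitions, 'update', clear, repeat."""
    env = ToyEnv(seed)
    rng = random.Random(seed + 1)
    buffer, n_updates = [], 0
    state = env.state
    for _ in range(total_timesteps):
        action = rng.choice([-1.0, 1.0])  # placeholder for the policy
        next_state, reward = env.step(action)
        buffer.append((state, action, reward))
        state = next_state
        if len(buffer) == n_steps:
            # Loss computation and backpropagation would happen here.
            n_updates += 1
            buffer.clear()
    return n_updates


updates_small_buffer = train(total_timesteps=1000, n_steps=100)
updates_large_buffer = train(total_timesteps=1000, n_steps=250)
```

For the same budget of 1000 timesteps, the 100-step buffer gets 10 updates and the 250-step buffer gets 4, which is the trade-off in question: fewer but less noisy updates as the buffer grows.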

Anonymous Thu 27 May 2021 12:56:16 No.13196532

>>13196518
I see, thanks for explaining! Yeah, I think it might be worth a shot. Bumping so maybe an actual RL expert finds this thread.

In general don't get discouraged by long training times though, happens to me too. Sometimes you just need more GPUs.
