/sci/ - Science & Math » Thread #12759849

89KiB, 426x374, Screenshot 2020-05-04 at 17.44.43.png

Anonymous Fri 26 Feb 2021 19:52:12 No.12759849

Quoted By: >>12759862 >>12759944 >>12759954

Why is reinforcement learning so difficult

Anonymous Fri 26 Feb 2021 19:55:38 No.12759862

Quoted By: >>12759922 >>12759924

>>12759849
easy, you are just below 200 iq

Anonymous Fri 26 Feb 2021 20:02:50 No.12759884

Quoted By: >>12759922 >>12759924

>why is the hardest of the top-notch fields in one of the hardest sciences difficult

Anonymous Fri 26 Feb 2021 20:10:27 No.12759913

Because the material is bullshit or subjective.

Anonymous Fri 26 Feb 2021 20:14:48 No.12759922

Quoted By: >>12759949 >>12759954 >>12759987

>>12759884
>>12759862
actually no. It's because math fags are never explicit in what they mean.

For example, basically every video on PPO says to take the probability of action a given state s under policy pi when calculating the policy gradient.

But nobody ever fucking tells you that, at least for continuous actions, the output of your policy network has to be the parameters of a Gaussian distribution (mu and sigma), and that you have to sample an action from that distribution and then evaluate the probability (density) of that action under it. Instead you have to read through the code to find that out. All the math cunts tell you is "bro just take the probability of action a under state s" but nobody fucking tells you how.
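
A minimal sketch of the step described above, assuming a one-dimensional continuous action and a policy network that has already produced mu and sigma for the current state. The function names (`gaussian_log_prob`, `sample_action`, `ppo_clipped_objective`) and every numeric value are made up for illustration. Real implementations usually get these quantities from a tensor library, and a discrete action space would use a categorical distribution rather than a Gaussian.

```python
import math
import random


def gaussian_log_prob(action, mu, sigma):
    """Log density of `action` under Normal(mu, sigma)."""
    return (
        -((action - mu) ** 2) / (2.0 * sigma ** 2)
        - math.log(sigma)
        - 0.5 * math.log(2.0 * math.pi)
    )


def sample_action(mu, sigma, rng):
    """Draw one action from Normal(mu, sigma)."""
    return rng.gauss(mu, sigma)


def ppo_clipped_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """PPO clipped surrogate for a single (state, action) sample."""
    # pi_new(a|s) / pi_old(a|s), computed from log densities.
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


rng = random.Random(0)

# Hypothetical network outputs for some state s, before the update.
mu_old, sigma_old = 0.5, 1.0
action = sample_action(mu_old, sigma_old, rng)
old_log_prob = gaussian_log_prob(action, mu_old, sigma_old)

# Hypothetical outputs for the same state after a gradient step.
mu_new, sigma_new = 0.6, 0.9
new_log_prob = gaussian_log_prob(action, mu_new, sigma_new)

# Advantage of 1.0 is a placeholder; in practice it comes from a critic.
objective = ppo_clipped_objective(new_log_prob, old_log_prob, advantage=1.0)
print(action, old_log_prob, new_log_prob, objective)
```

The action is sampled once under the old policy and then scored under both the old and the new parameters, which is where the "probability of a under s" in the formulas comes from.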

Fucking math fags with their shit notation and non-explicitness.

Anonymous Fri 26 Feb 2021 20:17:13 No.12759924

Quoted By: >>12759975

>>12759862
>>12759884
Machine learning is not that difficult and doesn't require an IQ above ~110 to understand.
It is nowhere near "one of the hardest sciences".

Anonymous Fri 26 Feb 2021 20:22:24 No.12759944

>>12759849
In Deep Q learning you have a network that predicts the expected reward for different actions, and then you choose the action with the highest expected reward, and then update the network to more accurately predict the reward for that action once you get feedback.
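
A rough sketch of that predict, pick, update loop, with a plain lookup table standing in for the network (so this is tabular Q-learning, not a deep Q-network). The names `greedy_action`, `epsilon_greedy_action` and `q_update`, and the learning rate and discount values, are assumptions for illustration. Note that the usual update target is the reward plus the discounted best estimate for the next state, not the immediate reward alone.

```python
import random


def greedy_action(q_values):
    """Index of the action with the highest predicted value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values, epsilon, rng):
    """Mostly pick the best action, sometimes explore at random."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Move Q(state, action) toward the one-step bootstrapped target."""
    if done:
        target = reward
    else:
        target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])
    return target


# Two states with two actions each; the values are arbitrary.
q_table = [[0.0, 0.0], [1.0, 2.0]]
rng = random.Random(0)

action = epsilon_greedy_action(q_table[0], epsilon=0.1, rng=rng)
# Pretend the environment returned reward 1.0 and moved us to state 1.
q_update(q_table, state=0, action=action, reward=1.0, next_state=1, done=False)
print(q_table)
```

A deep Q-network replaces `q_table[state]` with a forward pass and the in-place table update with a gradient step on the squared difference to the same target.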

In policy gradient networks like Actor Critic, you have a network that recommends actions, attached to another network that predicts the reward of those actions. You train the action recommender network to maximize the value of the reward predictor network, and you train the reward predictor network to predict more accurate rewards.
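
A small sketch of the two-part setup, assuming a two-armed bandit (a single state) so that both "networks" shrink to a few numbers: the actor is a list of softmax preferences and the critic is one value estimate. This uses the advantage-weighted policy-gradient update, which is one common actor-critic variant; training the actor to directly maximize the critic's output, as described above, is closer to deterministic-policy methods. The arm means, learning rates and function names are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(prefs, value, action, reward,
                      actor_lr=0.1, critic_lr=0.1):
    """One update of the actor (prefs) and the critic (value)."""
    probs = softmax(prefs)
    # How much better the reward was than the critic predicted.
    advantage = reward - value
    # Critic: move the prediction toward the observed reward.
    new_value = value + critic_lr * advantage
    # Actor: gradient of log pi(action) w.r.t. each preference,
    # scaled by the advantage.
    new_prefs = [
        p + actor_lr * advantage * ((1.0 if a == action else 0.0) - probs[a])
        for a, p in enumerate(prefs)
    ]
    return new_prefs, new_value


def run_bandit(steps=2000, seed=0):
    """Train on a toy bandit with made-up arm means."""
    rng = random.Random(seed)
    arm_means = [0.2, 0.8]  # hypothetical environment
    prefs, value = [0.0, 0.0], 0.0
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 0.1)
        prefs, value = actor_critic_step(prefs, value, action, reward)
    return softmax(prefs), value


final_probs, final_value = run_bandit()
print(final_probs, final_value)
```

With a full environment the critic would predict the value of each state, and the advantage would use the reward plus the discounted value of the next state.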

it's not difficult ur just dumb

Anonymous Fri 26 Feb 2021 20:24:09 No.12759949

>>12759922
You're right though that too many authors hide behind notation and excessive formalism. Also most ML authors are ESL, which makes it worse.

Anonymous Fri 26 Feb 2021 20:25:21 No.12759954

>>12759922
Because you're supposed to learn things in textbooks not with youtube videos you low attention span, smooth brained midwit.

>>12759849
Literally one of the simplest concepts in computer science. Give up on life, go play video games.

Anonymous Fri 26 Feb 2021 20:33:25 No.12759975

>>12759924
You will never be a professor.

Anonymous Fri 26 Feb 2021 20:38:42 No.12759987

>>12759922
Bruh you just said that you're reading someone else's code. There are different implementations of RL algorithms and the most popular one doesn't have this problem.
>All the math cunts tell you is "bro just take the probability of action a under state s" but nobody fucking tells you how.
The exploration-exploitation dilemma was mentioned in both the course and the book that I used. Maybe you're just following some mediocre course.
