Jegyzetek kincstára

plots

Apr 20, 2026 · 1 min read

Keep the naive reward for all these plots. Afterwards, maybe try out different rewards, replicate the plots, and see what changes.

Plot the effect of changing the time horizon T, for different trained models (mean reward, certainty equivalent).
One training cycle for each T.
(Could even do a single training cycle and take a snapshot at each target T.)
(The time horizon could even be a training variable.)
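
A rough sketch of how the two metrics could be computed per horizon. The note does not say which utility defines the certainty equivalent, so this assumes exponential (CARA) utility, where CE = -(1/a) · log E[exp(-a · R)]. The `risk_aversion` parameter, the function names, and the `rewards_by_horizon` dictionary (T mapped to the episode rewards of the model trained or snapshotted for that T) are all hypothetical.

```python
import numpy as np

def mean_and_certainty_equivalent(rewards, risk_aversion=1.0):
    """Mean reward and certainty equivalent under an ASSUMED exponential utility."""
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean()
    # log-mean-exp of -a*R, shifted by its max for numerical stability
    z = -risk_aversion * rewards
    shift = z.max()
    log_mean_exp = shift + np.log(np.mean(np.exp(z - shift)))
    return mean, -log_mean_exp / risk_aversion

def summarize_by_horizon(rewards_by_horizon, risk_aversion=1.0):
    """Return horizons (sorted) with the matching mean rewards and certainty equivalents."""
    horizons = sorted(rewards_by_horizon)
    stats = [
        mean_and_certainty_equivalent(rewards_by_horizon[T], risk_aversion)
        for T in horizons
    ]
    means = np.array([s[0] for s in stats])
    ces = np.array([s[1] for s in stats])
    return np.array(horizons), means, ces
```

The three returned arrays can go straight into a line plot with T on the x-axis. With this utility the certainty equivalent never exceeds the mean, so the gap between the two curves shows how much risk each model takes on.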

It would be nice to see the agent's trading strategy while it's evolving. Put everything in info and recreate the rollouts to plot them.
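
One possible way to do this, as a sketch only: have the environment return everything worth plotting in the `info` dict at each step, then collect those dicts over a rollout. The environment below is a hypothetical toy stand-in that mimics a Gymnasium-style `reset`/`step` interface; its dynamics and the `info` keys (`t`, `price`, `position`, `wealth`) are illustrative assumptions, not the actual setup.

```python
import numpy as np

class ToyTradingEnv:
    """Hypothetical stand-in environment with a Gymnasium-style API."""

    def __init__(self, horizon=5, seed=0):
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.price = 100.0
        self.position = 0.0
        self.wealth = 0.0
        return np.array([self.price, self.position]), {}

    def step(self, action):
        new_price = self.price + self.rng.normal()
        # Reward is the P&L of holding `action` units over this step
        reward = float(action) * (new_price - self.price)
        self.price = new_price
        self.position = float(action)
        self.wealth += reward
        self.t += 1
        # Everything needed for plotting goes into info
        info = {"t": self.t, "price": self.price,
                "position": self.position, "wealth": self.wealth}
        terminated = self.t >= self.horizon
        return np.array([self.price, self.position]), reward, terminated, False, info

def record_rollout(env, policy):
    """Run one episode and stack the per-step info (plus action and reward) into arrays."""
    obs, _ = env.reset()
    log = {}
    done = False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        for key, value in {**info, "action": action, "reward": reward}.items():
            log.setdefault(key, []).append(value)
        done = terminated or truncated
    return {key: np.asarray(values) for key, values in log.items()}
```

Calling `record_rollout` with the policy from each training snapshot, on an environment built with the same seed, gives trajectories over identical price paths, so plotting `position` against `t` per snapshot shows how the strategy changes during training.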
