Recap

  • plot: action histogram as bars
  • plot: action histogram as polygons
  • plot: market bound together with the optimal strategy
  • plot: split the train/eval plot along the y axis
  • experiment: train with the same T / num_steps ratio for the same number of iterations
  • experiment: train with the full price history in the observation space
  • experiment: train with a sliding window of the price history (see the wrapper sketch after this list)
  • experiment: change the reward type from delta wealth to delta bankroll (see the reward sketch after this list)
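
For the sliding-window experiment, a minimal sketch of an observation wrapper, assuming a gymnasium-style env whose observation carries the current price in its first entry; the wrapper name, the default window of 64, and zero-padding the warm-up steps are all assumptions, not the code that was actually run.

```python
import gymnasium as gym
import numpy as np


class SlidingPriceWindow(gym.ObservationWrapper):
    """Replace the per-step observation with the last `window` prices."""

    def __init__(self, env, window=64):
        super().__init__(env)
        self.window = window
        self.buffer = np.zeros(window, dtype=np.float32)
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(window,), dtype=np.float32
        )

    def reset(self, **kwargs):
        self.buffer[:] = 0.0  # zero-pad until the window fills up
        return super().reset(**kwargs)

    def observation(self, obs):
        price = np.asarray(obs).ravel()[0]  # assumed: first entry is the price
        self.buffer = np.roll(self.buffer, -1)  # shift window left by one step
        self.buffer[-1] = price
        return self.buffer.copy()
```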
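
For the reward change, a sketch of the two reward variants, assuming the env tracks `cash` (the bankroll) and a signed `position` marked at the current `price`; all three attribute names are assumptions about FBMEnv's internals.

```python
def delta_wealth_reward(env, prev_wealth):
    # old reward: change in mark-to-market wealth = cash + position * price
    wealth = env.cash + env.position * env.price
    return wealth - prev_wealth


def delta_bankroll_reward(env, prev_cash):
    # new reward: change in realized cash only; unrealized PnL does not
    # count until the position is actually closed out
    return env.cash - prev_cash
```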

TODO

  • insert action histogram plotting at the end of the training pipeline, after the train/eval plot (see the plotting sketch after this list)
  • make an asymptotic plot without the market bound and put the trained policy's eval on it
  • make an asymptotic plot for a single Hurst exponent
  • make an FBM env with a stochastic Hurst exponent (see the env sketch after this list)
  • fix train_lstm.py
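
For the action-histogram hook, a minimal sketch, assuming the pipeline collects the eval actions into an array; `eval_actions`, the output path, and the bin count are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_action_hist(eval_actions, out_path="action_hist.png", bins=50):
    actions = np.asarray(eval_actions).ravel()
    fig, ax = plt.subplots()
    ax.hist(actions, bins=bins)
    ax.set_xlabel("action")
    ax.set_ylabel("count")
    ax.set_title("Action histogram (eval)")
    fig.savefig(out_path)
    plt.close(fig)


# in the pipeline, right after the train/eval plot is saved:
# plot_action_hist(collected_eval_actions)
```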
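
For the stochastic-Hurst env, a sketch assuming the existing FBMEnv reads `self.hurst` when it generates a fresh price path on reset; the import path and the uniform sampling range are assumptions.

```python
import numpy as np
from fbm_env import FBMEnv  # hypothetical import path for the existing base env


class StochasticHurstFBMEnv(FBMEnv):
    """FBMEnv variant that resamples the Hurst exponent on every reset."""

    def __init__(self, hurst_range=(0.1, 0.9), **kwargs):
        self.hurst_range = hurst_range
        super().__init__(**kwargs)

    def reset(self, **kwargs):
        # assumed: the parent reads self.hurst when it builds a new path
        self.hurst = float(np.random.uniform(*self.hurst_range))
        return super().reset(**kwargs)
```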

Remaining from last week

  • show a side-by-side of H=0.1, T=1024 with and without a clipped action space, i.e. [-10, 10] vs [-1_000, 1_000]
  • if training works for small T, then we can try to extend the time horizon incrementally
  • when running parallel envs, try to shuffle the starting points to cover more of the episode length (see the reset sketch after this list)
  • train the model with a big num_steps_eval (5_000)
  • fix training for the Liquidation test env (continued)
  • make a grid search with three axes: clip_coef, learning_rate, and H = 0.1, 0.7 (see the grid-search sketch below)
  • make an env that is a child of FBMEnv and has a linear price process with a random slope (see the env sketch below)
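
For the shuffled starting points, a sketch assuming FBMEnv.reset accepts an option selecting the starting offset along the T-step path; the "start" option key is an assumption about the reset signature.

```python
import numpy as np


def reset_with_shuffled_starts(envs, T, rng=None):
    """Reset each parallel env at a random offset along the T-step path."""
    rng = rng or np.random.default_rng()
    # one random offset per worker, so together they cover more of the
    # episode length than if every env began at t = 0
    starts = rng.integers(0, T, size=len(envs))
    # assumed: FBMEnv.reset accepts a "start" option picking the offset
    return [env.reset(options={"start": int(s)})[0] for env, s in zip(envs, starts)]
```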
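
For the grid search, a minimal sketch; the concrete clip_coef and learning_rate grids are assumptions, and run_training is a stub for whatever entry point the PPO script / train_lstm.py actually exposes.

```python
import itertools


def run_training(**cfg):
    # stub: stands in for the real training entry point
    print("would train with", cfg)


clip_coefs = [0.1, 0.2, 0.3]         # assumed grid
learning_rates = [1e-4, 3e-4, 1e-3]  # assumed grid
hursts = [0.1, 0.7]                  # the two H values from the task

for clip_coef, lr, hurst in itertools.product(clip_coefs, learning_rates, hursts):
    run_training(clip_coef=clip_coef, learning_rate=lr, hurst=hurst,
                 run_name=f"clip{clip_coef}_lr{lr}_H{hurst}")
```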
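
For the linear-price env, a sketch assuming FBMEnv exposes a hook that builds the price path consumed by step(); the method name _generate_price_path and the slope range are assumptions.

```python
import numpy as np
from fbm_env import FBMEnv  # hypothetical import path for the existing base env


class LinearPriceEnv(FBMEnv):
    """FBMEnv child whose 'price process' is a line with a random slope."""

    def __init__(self, slope_range=(-1.0, 1.0), **kwargs):
        self.slope_range = slope_range
        super().__init__(**kwargs)

    def _generate_price_path(self, T):
        # assumed hook: the parent calls this to build the path it steps over;
        # a deterministic linear price is easy to learn, useful for debugging
        slope = np.random.uniform(*self.slope_range)
        return 1.0 + slope * np.arange(T, dtype=np.float32) / T
```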