Recap

  • plot: action histogram as bars
  • plot: action histogram as polygons
  • plot: market bound together with the optimal strategy
  • plot: split the train/eval plot along the y axis
  • experiment: train with the same T / num_steps ratio for the same number of iterations
  • experiment: train with the full price history in the observation space
  • experiment: train with a sliding window of the price history (see the wrapper sketch after this list)
  • experiment: change the reward type from delta wealth to delta bankroll (see the reward sketch after this list)
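
For the sliding-window experiment, a minimal sketch of an observation wrapper, assuming a gymnasium-style env whose observation carries the current price in its first entry; the wrapper name, the default window of 64, and zero-padding the warm-up steps are all assumptions, not the code that was actually run.

```python
import gymnasium as gym
import numpy as np


class SlidingPriceWindow(gym.ObservationWrapper):
    """Replace the per-step observation with the last `window` prices."""

    def __init__(self, env, window=64):
        super().__init__(env)
        self.window = window
        self.buffer = np.zeros(window, dtype=np.float32)
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(window,), dtype=np.float32
        )

    def reset(self, **kwargs):
        self.buffer[:] = 0.0  # zero-pad until the window fills up
        return super().reset(**kwargs)

    def observation(self, obs):
        price = np.asarray(obs).ravel()[0]  # assumed: first entry is the price
        self.buffer = np.roll(self.buffer, -1)  # shift window left by one step
        self.buffer[-1] = price
        return self.buffer.copy()
```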
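
For the reward change, a sketch of the two reward variants, assuming the env tracks `cash` (the bankroll) and a signed `position` marked at the current `price`; all three attribute names are assumptions about FBMEnv's internals.

```python
def delta_wealth_reward(env, prev_wealth):
    # old reward: change in mark-to-market wealth = cash + position * price
    wealth = env.cash + env.position * env.price
    return wealth - prev_wealth


def delta_bankroll_reward(env, prev_cash):
    # new reward: change in realized cash only; unrealized PnL does not
    # count until the position is actually closed out
    return env.cash - prev_cash
```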

TODO

  • insert action histogram plotting at the end of the training pipeline, after the train/eval plot (see the plotting sketch after this list)
  • make an asymptotic plot without the market bound and put the trained policy's eval on it
  • make an asymptotic plot for a single Hurst exponent
  • make an FBM env with a stochastic Hurst exponent (see the env sketch after this list)
  • fix train_lstm.py
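
For the action-histogram hook, a minimal sketch, assuming the pipeline collects the eval actions into an array; `eval_actions`, the output path, and the bin count are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_action_hist(eval_actions, out_path="action_hist.png", bins=50):
    actions = np.asarray(eval_actions).ravel()
    fig, ax = plt.subplots()
    ax.hist(actions, bins=bins)
    ax.set_xlabel("action")
    ax.set_ylabel("count")
    ax.set_title("Action histogram (eval)")
    fig.savefig(out_path)
    plt.close(fig)


# in the pipeline, right after the train/eval plot is saved:
# plot_action_hist(collected_eval_actions)
```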
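
For the stochastic-Hurst env, a sketch assuming the existing FBMEnv reads `self.hurst` when it generates a fresh price path on reset; the import path and the uniform sampling range are assumptions.

```python
import numpy as np
from fbm_env import FBMEnv  # hypothetical import path for the existing base env


class StochasticHurstFBMEnv(FBMEnv):
    """FBMEnv variant that resamples the Hurst exponent on every reset."""

    def __init__(self, hurst_range=(0.1, 0.9), **kwargs):
        self.hurst_range = hurst_range
        super().__init__(**kwargs)

    def reset(self, **kwargs):
        # assumed: the parent reads self.hurst when it builds a new path
        self.hurst = float(np.random.uniform(*self.hurst_range))
        return super().reset(**kwargs)
```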

Remaining from last week

  • show a side-by-side of H=0.1, T=1024 with and without a clipped action space, i.e. [-10, 10] vs [-1_000, 1_000]
  • if training works for small T, then we can try to extend the time horizon incrementally
  • when running parallel envs, try to shuffle the starting points to cover more of the episode length (see the reset sketch after this list)
  • train the model with a big num_steps_eval (5_000)
  • fix training for the Liquidation test env (continued)
  • make a grid search with three axes: clip_coef, learning_rate, and H = 0.1, 0.7 (see the grid-search sketch below)
  • make an env that is a child of FBMEnv and has a linear price process with a random slope (see the env sketch below)
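
For the shuffled starting points, a sketch assuming FBMEnv.reset accepts an option selecting the starting offset along the T-step path; the "start" option key is an assumption about the reset signature.

```python
import numpy as np


def reset_with_shuffled_starts(envs, T, rng=None):
    """Reset each parallel env at a random offset along the T-step path."""
    rng = rng or np.random.default_rng()
    # one random offset per worker, so together they cover more of the
    # episode length than if every env began at t = 0
    starts = rng.integers(0, T, size=len(envs))
    # assumed: FBMEnv.reset accepts a "start" option picking the offset
    return [env.reset(options={"start": int(s)})[0] for env, s in zip(envs, starts)]
```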
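
For the grid search, a minimal sketch; the concrete clip_coef and learning_rate grids are assumptions, and run_training is a stub for whatever entry point the PPO script / train_lstm.py actually exposes.

```python
import itertools


def run_training(**cfg):
    # stub: stands in for the real training entry point
    print("would train with", cfg)


clip_coefs = [0.1, 0.2, 0.3]         # assumed grid
learning_rates = [1e-4, 3e-4, 1e-3]  # assumed grid
hursts = [0.1, 0.7]                  # the two H values from the task

for clip_coef, lr, hurst in itertools.product(clip_coefs, learning_rates, hursts):
    run_training(clip_coef=clip_coef, learning_rate=lr, hurst=hurst,
                 run_name=f"clip{clip_coef}_lr{lr}_H{hurst}")
```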
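
For the linear-price env, a sketch assuming FBMEnv exposes a hook that builds the price path consumed by step(); the method name _generate_price_path and the slope range are assumptions.

```python
import numpy as np
from fbm_env import FBMEnv  # hypothetical import path for the existing base env


class LinearPriceEnv(FBMEnv):
    """FBMEnv child whose 'price process' is a line with a random slope."""

    def __init__(self, slope_range=(-1.0, 1.0), **kwargs):
        self.slope_range = slope_range
        super().__init__(**kwargs)

    def _generate_price_path(self, T):
        # assumed hook: the parent calls this to build the path it steps over;
        # a deterministic linear price is easy to learn, useful for debugging
        slope = np.random.uniform(*self.slope_range)
        return 1.0 + slope * np.arange(T, dtype=np.float32) / T
```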