Adaptive Deep Learning in Quant Finance with Qlib’s PyTorch AdaRNN

December 22, 2024 · 6 min read

Senior Software Engineer

Introduction

AdaRNN is a specialized PyTorch model designed to adaptively learn from non-stationary financial time series—where market distributions evolve over time. Originally proposed in the paper AdaRNN: Adaptive Learning and Forecasting for Time Series, it leverages both GRU layers and transfer-loss techniques to mitigate the effects of distributional shift. This article demonstrates how AdaRNN can be applied within Microsoft’s Qlib—an open-source, AI-oriented platform for quantitative finance.

Background: AdaRNN Methodology

Temporal Covariate Shift (TCS)
Market factors may differ drastically from historical data. AdaRNN addresses TCS by adaptively aligning representations over time.
Two-Phase Design
1. Temporal Distribution Characterization:
  Better captures distribution information in time-series data.
2. Temporal Distribution Matching:
  Bridges the gap between older and newer data via advanced distribution alignment (e.g., MMD, CORAL, COSINE).
Paper & Code References:
- CIKM 2021 Paper
- Official Repo

AdaRNN’s adaptability makes it relevant for financial forecasting, air-quality prediction, and activity recognition—any scenario where non-stationary data complicates model training.

Qlib Integration

Below is an excerpt from a Qlib-friendly YAML configuration. By running one command, Qlib will:

Initialize a US market environment (^GSPC as benchmark).
Load & transform data (e.g. Alpha360) with specialized normalization and label dropping.
Train a custom or placeholder PyTorch “AdaRNN” model.
Evaluate predictions via correlation metrics and backtesting.
Generate logs for advanced debugging and iteration.

Command

qrun workflow_config_adarnn_Alpha360.yaml

YAML Snippet

qlib_init:
  provider_uri: "/Users/vadimnicolai/Public/work/qlib-cookbook/.data/us_data"
  region: us
  kernels: 1

market: &market sp500
benchmark: &benchmark ^GSPC

data_handler_config: &data_handler_config
  start_time: 2008-01-01
  end_time: 2020-08-01
  fit_start_time: 2008-01-01
  fit_end_time: 2014-12-31
  instruments: *market
  infer_processors:
    - class: RobustZScoreNorm
      kwargs:
        fields_group: feature
        clip_outlier: true
    - class: Fillna
      kwargs:
        fields_group: feature
  learn_processors:
    - class: DropnaLabel
    - class: CSRankNorm
      kwargs:
        fields_group: label
  label: ["Ref($close, -2) / Ref($close, -1) - 1"]

port_analysis_config: &port_analysis_config
  strategy:
    class: TopkDropoutStrategy
    module_path: qlib.contrib.strategy
    kwargs:
      signal: <PRED>
      topk: 50
      n_drop: 5
  backtest:
    start_time: 2017-01-01
    end_time: 2020-08-01
    account: 100000000
    benchmark: *benchmark
    exchange_kwargs:
      limit_threshold: 0.095
      deal_price: close
      open_cost: 0.0005
      close_cost: 0.0015
      min_cost: 5

task:
  model:
    # Demonstration of AdaRNN or a placeholder PyTorch model
    class: DNNModelPytorch
    module_path: qlib.contrib.model.pytorch_nn
    kwargs:
      batch_size: 1024
      max_steps: 4000
      loss: mse
      lr: 0.002
      optimizer: adam
      GPU: 0
      pt_model_kwargs:
        input_dim: 360

  dataset:
    class: DatasetH
    module_path: qlib.data.dataset
    kwargs:
      handler:
        class: Alpha360
        module_path: qlib.contrib.data.handler
        kwargs: *data_handler_config
      segments:
        train: [2008-01-01, 2014-12-31]
        valid: [2015-01-01, 2016-12-31]
        test: [2017-01-01, 2020-08-01]

  record:
    - class: SignalRecord
      module_path: qlib.workflow.record_temp
      kwargs:
        model: <MODEL>
        dataset: <DATASET>
    - class: SigAnaRecord
      module_path: qlib.workflow.record_temp
      kwargs:
        ana_long_short: False
        ann_scaler: 252
    - class: PortAnaRecord
      module_path: qlib.workflow.record_temp
      kwargs:
        config: *port_analysis_config

Example Logs & Observations

When you run:

qrun workflow_config_adarnn_Alpha360.yaml

You may see logs such as:

[50515:MainThread](2024-12-23 19:23:44,889) INFO - qlib.ADARNN - ADARNN pytorch version...
d_feat : 6
hidden_size : 64
num_layers : 2
dropout : 0.0
n_epochs : 200
lr : 0.001
metric : loss
batch_size : 800
early_stop : 20
optimizer : adam
loss_type : mse
...
Epoch0: training...
ic/train: 0.016603, mse/train: 0.010481, ...
ic/valid: 0.007023, mse/valid: 0.013398, ...
Epoch1: training...
ic/train: 0.017488, ...
ic/valid: 0.007711, ...

Key Points:

d_feat=6 indicates the model uses 6 features per time step (Alpha360 can have ~360; some examples show 6 for a simpler demonstration).
AdaRNN attempts to adapt to distribution shifts with a specialized gating mechanism, plus distance-based alignment (e.g., MMD, cosine).
Low or moderate ic (information coefficient) values in early epochs typically mean you might tune your features or hyperparams further.

Practical Notes

Setting GPU: 0 uses CPU-only mode—suitable for debugging or if CUDA is unavailable.
If distribution shift is severe, consider 'adv', 'mmd_rbf', or 'cosine' in the AdaRNN code base to better handle changing market regimes.
Check memory usage and concurrency settings (kernels: 1) if you encounter long training times or segmentation faults.

Data & Requirements

In AdaRNN’s original repository, they discuss:

Air-quality dataset but the technique extends well to finance.
Python >= 3.7, PyTorch ~1.6 for best results.
Ensure your Qlib environment has matching dependencies (e.g. requirements.txt pinned versions) to avoid conflicts.

Conclusion

AdaRNN’s adaptive architecture is particularly suited to non-stationary financial data, bridging older regimes to modern ones via distribution matching. Within Qlib:

You unify data ingestion, factor engineering, neural training, and evaluation into one YAML.
You easily measure correlation (IC, Rank IC) and simulate real trading costs via PortAnaRecord.
You can adapt AdaRNN’s gating, hidden-layers, or distribution distances for maximum alpha discovery in shifting markets.

Next Steps:

Try different distribution-losses (MMD, CORAL, 'adv') in AdaRNN to see which best handles your market regime changes.
Combine fundamental, sentiment, or alt data to strengthen “temporal distribution characterization.”
Evaluate the final portfolio PnL and IR thoroughly in Qlib’s logs, adjusting hyperparameters (dw, pre_epoch, hidden_size) for improved adaptation.

References

Du, Yuntao, et al. “AdaRNN: Adaptive Learning and Forecasting for Time Series.” Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM), 2021.
Paper Video | Zhihu (Chinese)
AdaRNN GitHub Repo
Qlib on GitHub

Pro Tip: If you see low IC or negative alpha, consider rolling or incremental re-training. Fine-tuning AdaRNN’s distribution alignment parameters can be crucial for dealing with abrupt financial market changes.

Introduction​

Background: AdaRNN Methodology​

Qlib Integration​

Command​

YAML Snippet​

Example Logs & Observations​

Practical Notes​

Data & Requirements​

Conclusion​

References​