The Research on LLM Self-Correction
If you’re building with LLMs today, you’ve likely been sold a bill of goods about “reflection.” The narrative is seductive: just have the model check its own work, and watch quality magically improve. It’s the software equivalent of telling a student to “review your exam before turning it in.” The reality, backed by a mounting pile of peer-reviewed evidence, is far uglier. In most production scenarios, adding a self-reflection loop is the most expensive way to achieve precisely nothing, or worse, to degrade your output.

The seminal paper that shattered the illusion is Huang et al.’s 2023 work, “Large Language Models Cannot Self-Correct Reasoning Yet.” Their finding was blunt: without external feedback, asking GPT-4 to review and correct its own answers on math and reasoning tasks consistently decreased accuracy. The model changed correct answers to wrong ones more often than it fixed errors. This isn’t an edge case; it’s a fundamental limitation of an autoregressive model critiquing its own autoregressive output with the same data, same biases, and zero new information.
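To make concrete what the paper evaluates, here is a minimal sketch of that intrinsic self-correction loop: the model answers, then critiques and revises its own answer with no external feedback or new information. The `call_llm` helper and `answer_with_self_reflection` function are hypothetical names, assuming a generic chat-completion client underneath; this shows the pattern the paper found harmful, not a recommended design.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around your chat-completion client; replace with a real call."""
    raise NotImplementedError("Plug in your LLM client here.")


def answer_with_self_reflection(question: str, rounds: int = 1) -> str:
    # Step 1: get an initial answer.
    answer = call_llm(f"Answer the following question.\n\nQuestion: {question}")

    # Step 2: ask the same model to review and revise its own answer.
    # The critique sees only the original question and the model's own output:
    # same model, same biases, zero new information.
    for _ in range(rounds):
        critique = call_llm(
            "Review the answer below for mistakes. "
            "If you find any, explain them; otherwise say it is correct.\n\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        answer = call_llm(
            "Rewrite the answer, fixing any issues raised in the review. "
            "If the review found no issues, return the answer unchanged.\n\n"
            f"Question: {question}\nAnswer: {answer}\nReview: {critique}"
        )
    return answer
```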
