cuEquivariance Kernels¶

OF3 supports cuEquivariance triangle_multiplicative_update and triangle_attention kernels which can speed up inference/training of the model. Note: cuEquivariance acceleration can be used while DeepSpeed acceleration is enabled. cuEquivariance would take precedence, and then would fall back to either DeepSpeed (if enabled) or PyTorch for the shapes it does not handle efficiently. Notably, it would fall back for shorter sequences (threshold controlled by CUEQ_TRIATTN_FALLBACK_THRESHOLD environment variable), and for shapes with hidden dimension > 128 (diffusion transformer shapes).

To enable cuequivariance with pixi, use the openfold3-cuda12-pypi or openfold3-cuda13-pypi environment. Below is a example inference command

pixi run -e openfold3-cuda12-pypi \
  run_openfold predict --query-json=query_ubiquitin.json  --runner-yaml=cuequivariance.yml

For other workflows, cuequivariance must first be installed with the cuequivariance optional dependency, e.g.

pip install openfold3[cuequivariance]

Then, to enable these kernels via the runner.yaml, add the following:

# cuequivariance.yml
model_update:
  presets: 
    - "predict"
    - "low_mem"  # for lower memory systems
  custom:
    settings:
      memory:
        eval:
          use_cueq_triangle_kernels: true
          use_deepspeed_evo_attention: true  # set this to False to use cueq only

This runner.yml is specifically for inference, but similar settings can be used for training.