OpenFold3 Training¶
Welcome to the training documentation for OpenFold3. This guide covers how to train OpenFold3 on the PDB dataset from scratch or fine-tune from an existing checkpoint.
1. Prerequisites¶
OpenFold3 Conda environment. See OpenFold3 Installation for instructions on how to build the environment.
2. Download the Dataset¶
The pre-processed PDB training dataset is hosted on AWS S3. Download it using the AWS CLI:
aws s3 sync s3://openfold3-data/pdb_training_set/ /shared/openfold3/pdb_training_set/ --no-sign-request
3. Prepare the Training Config¶
The training configuration is stored in a YAML file that controls all aspects of training: model settings, dataset configuration, distributed training parameters, logging, and checkpointing.
Note: Make sure you update the paths to match your file locations. The examples below assume a /shared/openfold3 directory that’s accessible from all your training nodes.
Complete example YAML configurations for all stages of training are available in examples/training_yamls/.
3.1 Basic Training Config¶
Here’s a minimal configuration for single-GPU training:
experiment_settings:
mode: train
output_dir: ./test_train_output
restart_checkpoint_path: last
data_module_args:
batch_size: 1
num_workers: 1
epoch_len: 500 # Ckpt every 500 steps (effective batch_size * # of steps)
logging_config:
log_lr: false
wandb_config: null
pl_trainer_args:
devices: 1
num_nodes: 1
precision: bf16-mixed
max_epochs: -1
log_every_n_steps: 50
checkpoint_config:
every_n_epochs: 1
auto_insert_metric_name: false
save_last: true
save_top_k: -1
model_update:
presets:
- train
custom:
architecture:
shared:
use_confidence_emb_prob: 0.8
diffusion:
use_conditioning_prob: 0.8
dataset_configs:
train:
weighted-pdb:
dataset_class: WeightedPDBDataset
weight: 1.0
config:
debug_mode: true
template:
n_templates: 4
take_top_k: false
crop:
token_crop:
enabled: true
token_budget: 384
crop_weights:
contiguous: 0.2
spatial: 0.4
spatial_interface: 0.4
chain_crop:
enabled: true
validation:
val-weighted-pdb:
dataset_class: ValidationPDBDataset
config:
debug_mode: true
msa:
subsample_main: false
template:
n_templates: 4
take_top_k: true
crop:
token_crop:
enabled: false
dataset_paths:
weighted-pdb:
alignments_directory: none
alignment_db_directory: none
alignment_array_directory: /shared/openfold3/pdb_training_set/alignment_arrays
dataset_cache_file: /shared/openfold3/pdb_training_set/dataset_caches/training_cache_with_templates.json
target_structures_directory: /shared/openfold3/pdb_training_set/preprocessed_pdb_data/standard/structure_files
target_structure_file_format: npz
reference_molecule_directory: /shared/openfold3/pdb_training_set/preprocessed_pdb_data/standard/reference_mols
template_cache_directory: /shared/openfold3/pdb_training_set/templates/train_template_cache
template_structure_array_directory: /shared/openfold3/pdb_training_set/templates/template_structure_arrays
template_structures_directory: none
template_file_format: npz
ccd_file: null
val-weighted-pdb:
alignments_directory: none
alignment_db_directory: none
alignment_array_directory: /shared/openfold3/pdb_training_set/alignment_arrays
dataset_cache_file: /shared/openfold3/pdb_training_set/dataset_caches/validation_cache_with_templates.json
target_structures_directory: /shared/openfold3/pdb_training_set/preprocessed_pdb_data/standard/structure_files
target_structure_file_format: npz
reference_molecule_directory: /shared/openfold3/pdb_training_set/preprocessed_pdb_data/standard/reference_mols
template_cache_directory: /shared/openfold3/pdb_training_set/templates/val_template_cache
template_structure_array_directory: /shared/openfold3/pdb_training_set/templates/template_structure_arrays
template_structures_directory: none
template_file_format: npz
ccd_file: null
For example configurations for all stages of training, please see examples/training_yamls/:
initial_training.yml: Standard initial training configurationfinetune_1.yml: Fine-tuning stage 1 configurationfinetune_2.yml: Fine-tuning stage 2 configurationfinetune_3.yml: Fine-tuning stage 3 configuration (used in OF3p)
4. Launch Training¶
4.1 Single-GPU Training¶
For testing or debugging:
run_openfold train --runner-yaml training.yaml --seed 42
4.2 Multi-GPU Training¶
To train on multiple GPUs within a single node, configure your YAML:
pl_trainer_args:
devices: 8 # GPUs per node
num_nodes: 1
For multi-node distributed training, update your config as follows:
pl_trainer_args:
devices: 8 # GPUs per node
num_nodes: 32 # Total number of nodes
Then launch training with:
run_openfold train --runner_yaml training.yaml --seed 42
5. Monitoring Training¶
Enable Weights & Biases logging by configuring wandb_config in your YAML:
logging_config:
log_lr: true
wandb_config:
project: openfold3-training
entity: your-wandb-entity
group: null
id: null
experiment_name: my-training-run
To use W&B, ensure you’re logged in:
wandb login
6. Checkpointing and Resuming¶
6.1 Checkpoint Configuration¶
checkpoint_config:
every_n_epochs: 1 # Save checkpoint every N epochs
auto_insert_metric_name: false # Don't add metric to filename
save_last: true # Always save 'last.ckpt'
save_top_k: -1 # Keep all checkpoints (-1) or top K
Checkpoints are saved to {output_dir}/checkpoints/.
6.2 Resuming Training¶
To resume from the last checkpoint:
experiment_settings:
restart_checkpoint_path: last
To resume from a specific checkpoint:
experiment_settings:
restart_checkpoint_path: /path/to/checkpoint.ckpt
7. Fine-tuning¶
To fine-tune from a pre-trained checkpoint, specify the checkpoint path and adjust training parameters as needed:
experiment_settings:
mode: train
output_dir: ./finetune_output
seed: 42
restart_checkpoint_path: /path/to/pretrained.ckpt