PyTorch
-------

**Using PyTorch environment modules**

There are many PyTorch modules already installed on ARC. You can find these using the command::

    module spider PyTorch

Modules with a CUDA suffix must be run on the HTC cluster to benefit from GPU acceleration.

Please be aware that you should load the module you intend to use in an interactive session. It will tell you which GPU compute capability it has been built for, so that you can request the correct GPU type in your submission script. For example (from htc-login)::

    srun -p interactive --pty /bin/bash
    module load PyTorch/1.12.0-foss-2022a-CUDA-11.7.0

    Note: This PyTorch module supports GPUs with compute capability features up to 8.6 (e.g. V100, A100, RTX8000)
    it will not work with newer GPU generations.
    Please ensure you have requested the correct GPU generation.
    See https://arc-user-guide.readthedocs.io/en/latest/job-scheduling.html#gpu-resources

The above message indicates that this module was built for NVIDIA GPUs with compute capability up to 8.6. If it is run on a newer GPU, such as the H100 (compute capability 9.0), it will raise an error and fall back to CPU operation.

**Building your own PyTorch conda environment**

If you need to add other packages to the PyTorch environment, you may find it easier to build your own Anaconda environment. A base PyTorch installation can be created with the following script::

    #! /bin/bash
    #
    # Run this script from an interactive session:
    #
    #    srun -p interactive --pty /bin/bash
    #
    #
    # It will create a GPU-enabled PyTorch 2.0.1 environment using CUDA 11.7
    #

    module load Anaconda3/2022.10

    # Change the following to specify the location for the environment:
    #
    export CONPREFIX=$DATA/arc_pytorch
    #

    conda create --prefix $CONPREFIX
    conda activate $CONPREFIX

    #
    # Base PyTorch install
    #
    conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

    # You can add more packages to this script.
You can then execute the script from an interactive session, e.g. assuming you have saved the file as ``arc_env_build.sh``::

    [user@htc-login01 ~]$ srun -p interactive --pty /bin/bash
    srun: CPU resource required, checking settings/requirements...
    [user@htc-g040 ~]$ sh ./arc_env_build.sh

To use the environment from a batch submission script, add the following after the resource definition ``#SBATCH`` lines::

    module load Anaconda3/2022.10
    export CONPREFIX=$DATA/arc_pytorch
    conda activate $CONPREFIX
    ...your python command here...

.. note::

   You MUST deactivate any active conda environment in your shell BEFORE running the ``sbatch`` command to submit your job; otherwise your job may fail.
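
As an illustration, a complete job script using this environment might look like the sketch below. It is not an official template: the partition name (``short``), the GPU request syntax (``--gres=gpu:1``), the time limit and the ``train.py`` script name are assumptions, so check them against the GPU resources section of the job scheduling guide before use. The short Python check prints the PyTorch and CUDA versions and, if a GPU is visible, its name and compute capability, which helps confirm that the job landed on a GPU generation your PyTorch build supports::

    #!/bin/bash
    # Hypothetical example job script: the #SBATCH values below are placeholders
    # (assumed partition name, GPU request and time limit), not ARC defaults.
    #SBATCH --job-name=pytorch-test
    #SBATCH --partition=short
    #SBATCH --gres=gpu:1
    #SBATCH --time=01:00:00

    # Activate the conda environment built by arc_env_build.sh
    module load Anaconda3/2022.10
    export CONPREFIX=$DATA/arc_pytorch
    conda activate $CONPREFIX

    # Sanity check: report versions, whether a GPU is visible and,
    # if so, its name and (major, minor) compute capability
    python - <<'EOF'
    import torch
    print("PyTorch", torch.__version__, "built with CUDA", torch.version.cuda)
    print("GPU visible:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0),
              "compute capability", torch.cuda.get_device_capability(0))
    EOF

    # Replace with your own Python command (train.py is a hypothetical name)
    python train.py

If this were saved as ``pytorch_job.sh`` (again a hypothetical name), you would run ``conda deactivate`` in your login shell until no environment is active and then submit it with ``sbatch pytorch_job.sh``. If the job output reports ``GPU visible: False``, the job most likely did not receive a GPU, or received one that the installed PyTorch build does not support.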