Creating your own virtual environment
You must create your conda environments from within a SLURM interactive session, so ensure you have an active interactive session by running:
srun -p interactive --pty /bin/bash
Running conda installations directly on the login nodes will result in memory errors and other compatibility issues.
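If you keep your environment setup in a script, the script can refuse to run outside a SLURM allocation. This is a minimal sketch that assumes the `SLURM_JOB_ID` environment variable, which SLURM exports inside jobs started with `srun` or `sbatch`, is enough of a signal. The function name `require_slurm_session` is hypothetical.

```shell
# Hypothetical helper: succeed only inside a SLURM job, such as one started with
#   srun -p interactive --pty /bin/bash
# SLURM exports SLURM_JOB_ID to the shells it starts for a job.
require_slurm_session() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "Not inside a SLURM session: start one with 'srun -p interactive --pty /bin/bash'" >&2
        return 1
    fi
    return 0
}
```

For example, `require_slurm_session || exit 1` near the top of a build script stops it before any conda command runs on a login node.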
Decide which version of Python you wish to use, 2 or 3. Anaconda modules are available for both versions, and you can list the current Anaconda versions by typing:
module spider anaconda
To load the version of Anaconda you want (this example uses the latest version), run one of the following commands:
module load Anaconda2
module load Anaconda3
or load one of the specific Anaconda versions listed by module spider anaconda.
Once the module is loaded, you can use conda commands to create a virtual environment in your $DATA area. For example, to create an environment named myenv in $DATA, set CONPREFIX and then use one of the following conda create commands (the first pins Python 2.7):
export CONPREFIX=$DATA/myenv
conda create --prefix $CONPREFIX --copy python=2.7
conda create --prefix $CONPREFIX
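The commands above assume CONPREFIX holds a path under $DATA. This sketch sets it defensively and assumes $DATA is defined in your session. The helper name `make_conprefix` is hypothetical.

```shell
# Hypothetical helper: print $DATA/<name>, failing if $DATA or the name is empty.
make_conprefix() {
    local name="${1:-}"
    if [ -z "${DATA:-}" ]; then
        echo "DATA is not set; cannot place the environment in \$DATA" >&2
        return 1
    fi
    if [ -z "$name" ]; then
        echo "usage: make_conprefix <env-name>" >&2
        return 1
    fi
    printf '%s/%s\n' "$DATA" "$name"
}
```

Usage: `CONPREFIX=$(make_conprefix myenv) && export CONPREFIX`, followed by `conda create --prefix "$CONPREFIX"`. The assignment and `export` are kept separate because `export VAR=$(...)` would hide a failing exit status.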
Please be aware of messages from conda instructing you to run conda init. This command adds lines to your ~/.bashrc file which, in certain circumstances, can cause undesirable behaviour in SLURM batch files. If issues occur in batch files, we recommend activating with source activate instead.
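You can check whether conda init has already modified your ~/.bashrc by looking for the marker comment it writes. This sketch assumes the block starts with the line `# >>> conda initialize >>>`, as in current conda releases. The function name `has_conda_init` is hypothetical.

```shell
# Hypothetical helper: succeed if the given file (default ~/.bashrc) contains
# the block that `conda init` writes, detected by its opening marker comment.
has_conda_init() {
    local rc_file="${1:-$HOME/.bashrc}"
    [ -f "$rc_file" ] && grep -qF '>>> conda initialize >>>' "$rc_file"
}
```

For example, `has_conda_init && echo "conda init block found in ~/.bashrc"`.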
You can now activate the environment by running either of the following commands:
source activate $CONPREFIX
conda activate $CONPREFIX
You can then use the conda install or pip commands to install packages. We recommend using conda install where possible to maintain package version consistency in the virtual environment. For example:
conda install numpy
pip install numpy
In the above examples we pass the --prefix option to conda create to ensure that the conda virtual environment is placed in $DATA. If you omit it, there is a risk that your environment will be placed in the default location, $HOME/.conda/envs. Over time this will very likely take you over quota in your $HOME area, which will cause problems running jobs.
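Before running conda create, you could check that the target prefix really lies inside $DATA and not under $HOME/.conda/envs. This minimal sketch assumes $DATA is set and uses a plain string-prefix comparison, with no symlink resolution. `check_prefix` is a hypothetical name.

```shell
# Hypothetical helper: succeed if the given path is inside $DATA.
check_prefix() {
    local prefix="${1:-}"
    if [ -z "${DATA:-}" ]; then
        echo "DATA is not set" >&2
        return 1
    fi
    case "$prefix" in
        "$DATA"/*) return 0 ;;     # inside $DATA: acceptable
        *) echo "warning: $prefix is outside \$DATA; it may fill your \$HOME quota" >&2
           return 1 ;;
    esac
}
```

For example, `check_prefix "$CONPREFIX" && conda create --prefix "$CONPREFIX"`.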
Conda Build Scripts
For ease of use, rather than running the conda commands from the command line, we recommend writing a small script that creates the environment, so that you can rebuild it in future if required. For example, you could create a file named build_env.sh with the following contents:
#!/bin/bash
# Load the version of Anaconda you need
module load Anaconda3
# Create an environment in $DATA and give it an appropriate name
export CONPREFIX=$DATA/envname
conda create --prefix $CONPREFIX
# Activate your environment
source activate $CONPREFIX
# Install packages...
conda install <packagename>
# ..
You could then run this script, from an interactive session, with bash build_env.sh.
Conda Package Cache
By default, Anaconda caches all packages installed using conda install in a directory in your $HOME area named ~/.conda/pkgs before installing them into your virtual environment. Over time this has the potential to put you over quota in $HOME.
If you find yourself over quota in $HOME, check how much space is being used in ~/.conda/pkgs:
cd ~/.conda
du -sh pkgs
The du command above may take some time to run. When complete, it shows how much space is in use in pkgs. For example, it might report that 12GB of space is being used by downloaded packages. To tidy up, run the following commands:
module load Anaconda3
conda clean --packages --tarballs
You can repeat the du command above to check that the space has been freed.
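To keep an eye on the cache before it causes quota problems, you can wrap the du check in a small function. This sketch assumes a POSIX du, where `du -sk` reports kilobytes. The function names and the idea of comparing against a limit you choose are hypothetical and do not reflect any site quota policy.

```shell
# Hypothetical helper: print the size of a conda package cache in kilobytes.
pkgs_cache_kb() {
    local dir="${1:-$HOME/.conda/pkgs}"
    [ -d "$dir" ] || { echo 0; return 0; }   # no cache yet: report zero
    du -sk "$dir" | cut -f1                  # du prints "<size><TAB><dir>"
}

# Hypothetical helper: print a hint and succeed if the cache exceeds limit_kb.
pkgs_cache_over_limit() {
    local dir="$1" limit_kb="$2" size_kb
    size_kb=$(pkgs_cache_kb "$dir")
    if [ "$size_kb" -gt "$limit_kb" ]; then
        echo "conda package cache is ${size_kb} KB; consider: conda clean --packages --tarballs"
        return 0
    fi
    return 1
}
```

For example, `pkgs_cache_over_limit ~/.conda/pkgs 5000000` prints a reminder once the cache grows past roughly 5GB.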
Using Anaconda from within a submission script
In order to use your installed virtual environment from a batch script, you will need to load the appropriate Anaconda module and activate your environment. Using values from the above example (and assuming Python 3 and Anaconda 2020.11):
# After the SBATCH section of the script
module load Anaconda3/2020.11
source activate $DATA/myenv
# Your Python commands here...
Important Anaconda Information
When using Anaconda on the ARC systems, please take note of the following:
Do not load Anaconda virtual environments automatically on login from your .bashrc or .bash_profile scripts. Doing so will cause issues for jobs submitted through SLURM.
Ensure you have deactivated the virtual environment BEFORE submitting a SLURM job using sbatch; otherwise packages from your virtual environment may not be found.
You should load everything you require from within the submission script, as in the submission script example above.
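One way to avoid submitting a job while an environment is still active is a small wrapper around sbatch. This sketch assumes that an activated conda environment exports `CONDA_PREFIX`, as both conda activate and source activate do in current conda releases. `safe_sbatch` is a hypothetical name.

```shell
# Hypothetical wrapper: refuse to call sbatch while a conda environment is active.
safe_sbatch() {
    if [ -n "${CONDA_PREFIX:-}" ]; then
        echo "A conda environment is active ($CONDA_PREFIX); deactivate it before submitting" >&2
        return 1
    fi
    sbatch "$@"   # pass all arguments through unchanged
}
```

Usage: `safe_sbatch myjob.sh` in place of `sbatch myjob.sh`, where myjob.sh is a placeholder for your submission script.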
To use Bioconda packages, first use the instructions above to create a basic Python Anaconda 2 or 3 virtual environment, then run the following commands to ensure the Bioconda repositories (channels) are enabled:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
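The order of these commands matters. conda config --add puts a channel at the top of the list, so the last channel added gets the highest priority. The following sketch only mimics that ordering with a bash array and does not call conda. `add_channel` is hypothetical. On the real system, conda config --show channels displays the actual list.

```shell
# Mimic `conda config --add channels X`, which places X at the top of the list
# (moving it there if it is already present).
channels=()
add_channel() {
    local new="$1" c
    local updated=("$new")
    # ${arr[@]+...} guards against "unbound variable" on empty arrays under set -u
    for c in ${channels[@]+"${channels[@]}"}; do
        [ "$c" = "$new" ] || updated+=("$c")   # drop any earlier copy of X
    done
    channels=("${updated[@]}")
}

add_channel defaults
add_channel bioconda
add_channel conda-forge
printf '%s\n' "${channels[@]}"
# conda-forge
# bioconda
# defaults
```

The resulting priority (conda-forge, then bioconda, then defaults) is why the three commands are run in that order.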
Bioconda packages may then be installed using the conda install command; for example, to install bwa:
conda install bwa