Creating your own virtual environment
Note
You must create your conda environments from a SLURM interactive session. So, ensure you have an active interactive session by running:
srun -p interactive --pty /bin/bash
Running conda installations interactively on the login nodes will result in memory errors and other compatibility issues.
You should decide which version of Python you wish to use, 2 or 3. There are Anaconda modules available for both versions, the current Anaconda versions can be found by typing:
module spider anaconda
To load the version of Anaconda you want, in this example we are using the latest version, use one of the following commands:
Python 2:
module load Anaconda2
Python 3:
module load Anaconda3
or one of the specific Anaconda versions shown by module spider
.
Once the module is loaded you can use the conda
commands to create a virtual environment in your $DATA area. For example to create an environment named
myenv
in $DATA we can use the following commands:
export CONPREFIX=$DATA/myenv
Python 2:
conda create --prefix $CONPREFIX --copy python=2.7
Python 3:
conda create --prefix $CONPREFIX
Note
Please be aware of messages from conda
which instruct you to run conda init
- this command will add lines to your ~/.bashrc
file which can in certain
circumstances cause undesirable behaviour in SLURM batch files. We recommend activating with source activate
if issues occur in batch files.
You can now use (activate) the environment by running one of the following commands:
source activate $CONPREFIX
or:
conda activate $CONPREFIX
You can then use the conda install
or pip
commands to install packages. We recommend the use of conda install
where possible to maintain package
version consistency in the virtual environment. For example:
conda install numpy
or:
pip install numpy
Warning
In the above examples we use the --prefix
option to conda create
This is to ensure that the conda virtual environment is placed in $DATA
. If you ommit
this there is a risk that your environment will be placed in the default location which is $HOME/.conda/envs
this will very likely over time cause you to go over
quota in your $HOME
area which will cause problems running jobs.
Conda Build Scripts
For ease of use, rather than running the conda commands from the command line, we recommend creating a small script to create the environment, such that you can re-
build the environment in future, if required. For example, you could create a file named build_env.sh
with the following contents
!# /bin/bash
# Load the version of Anaconda you need
module load Anaconda3
# Create an environment in $DATA and give it an appropriate name
export CONPREFIX=$DATA/envname
conda create --prefix $CONPREFIX
# Activate your environment
source activate $CONPREFIX
# Install packages...
conda install <packagename>
..
..
You could then run this script with
sh ./build_env.sh
Conda Package Cache
By default Anaconda will cache all packages installed using conda install
into a directory in your $HOME
area named ~/.conda/pkgs
before installing them
into your virtual environment. Over time this has the potential to put you over quota in $HOME
If you find yourself over quota in $HOME
check how much space is being used in ~/.conda/pkgs
cd ~/.conda
du -sh pkgs
The du
command above may take some time to run. When complete, the command will show how much space is in use in pkgs
- for example
12G pkgs
In this case 12GB of space is being used by downloaded packages. To tidy up, run the following commands
module load Anaconda3
conda clean --packages --tarballs
You can repeat the du
command above to check that the space has been freed.
Using Anaconda from within a submission script
In order to use your installed virtual environment from a batch script, you will need to load the appropriate Anconda module and activate your environment. Using values from the above example (and assuming Python version 3, Anaconda 2020/11):
# After SBATCH section of script
module load Anaconda3/2020.11
source activate $DATA/myenv
# Your Python commands here...
Important Anaconda Information
When using Anaconda on the ARC systems, please take note of the following:
Do not load Anaconda virtual environments automatically on log in from your .bashrc or .bash_profile scripts. These will cause issues to SLURM submitted jobs.
Ensure you have deactivated the virtual environment BEFORE submitting a SLURM job using sbatch, otherwise you will have issues with packages from your virtual environment not being found.
You should load all you require from the submission script - as in the submission script example above.
Using Bioconda
Use the instructions above to create a basic Python Anaconda 2 or 3 virtual environment, then use the following commands to ensure the bioconda repostories are enabled:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
Bioconda packages may then be installed by using the conda install
command, for example to install bwa
:
conda install bwa