**Author:** [[https://unit.link/dmytro-tymoshenko|Dmytro Tymoshenko]] | **Last update:** November 10, 2025

{{:surprised_pikachu_perun.jpg?300|}}

If you've been wondering why your GPU jobs aren't running, or why they end up on random nodes instead of ''perun24'', this guide will help you switch to the proper workflow.

==== Quick Start (TLDR) ====

Currently, we have only one GPU node: ''perun24''. [[ram_and_rom|Information about all nodes can be found here]]. Key information about it:

> CPU: 24 threads
> GPU: H100 NVL (94 GB VRAM)
> RAM: 512 GB

To work with the GPU node, either:

  * In your script header, define the submission queue as ''#$ -q GPU''.
  * SSH directly to ''perun24'' for //interactive sessions//. ⚠️ **High risk**, not recommended unless you know what you are doing: such sessions are not visible from the queue, and there is a high probability of CPU/GPU/RAM contention if several people use the node at once.

**Pro tips:**

  * Add ''echo "Hostname: $(hostname)"'' to your script so the node name appears in the output.
  * Not sure about the GPU load from other users? Run ''nvidia-smi'' in an interactive session to check GPU utilization.

==== Understanding the Current GPU Setup ====

=== Good Ol' Days ===

When the GPU node was first introduced, we had to SSH to it and submit jobs directly from there, because it was isolated in its own system: the GPU node's queue was not visible from other cluster nodes, and vice versa. Queue management was fragmented between the main cluster and the GPU node, which limited monitoring, management, and submission options.

=== Current Setup ===

In the current implementation, ''perun24'' is fully integrated into the main SGE scheduler and has its own queue, visible in ''qconf -sql''. No more SSH juggling between nodes, and proper monitoring is available, which makes managing scripts far more flexible.
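The pro tips above can be combined into a minimal job-script skeleton. This is a sketch, not a mandated template; the ''nvidia-smi'' check is guarded so the script also runs harmlessly on non-GPU nodes:

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q GPU

# Print the node name so it appears in the job's output file
echo "Hostname: $(hostname)"

# Show current GPU utilization, if nvidia-smi exists on this node
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi not found; this is probably not a GPU node"
fi

# ... your actual GPU workload goes here ...
```

The ''Hostname:'' line is a quick sanity check that the job really landed on ''perun24'' and not on some other node.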
Now, to request the GPU node you can:

  * Modify the header of your script (recommended) so you can submit it with a plain ''qsub'':

<code bash>
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q GPU    # This line defines the needed queue
...
</code>

  * Submit the script with the following command if its header is not modified:

> qsub -q GPU

=== How Can I Benefit From the New Setup? ===

For example, let's imagine we have a pipeline of two scripts, ''script_1.sh'' and ''script_2.sh'', where ''script_2.sh'' works on the output of ''script_1.sh''. ''script_1.sh'' can run on CPU, requires extensive resources not available on ''perun24'', or will run for an extremely long time, while ''script_2.sh'' needs the GPU. You don't want to hog the GPU node with ''script_1.sh'', and the appropriate queues are already defined in the headers of the scripts. Now you can maximize efficiency and submit both at once with ''-hold_jid'':

<code>
username@perun: qsub script_1.sh
username@perun: qstat | grep $USER   # We are looking for the value in the job-ID column
job-ID  prior    name         user      state  submit/start at      queue             slots  ja-task-ID
-------------------------------------------------------------------------------------------------------
500     0.54090  script_1.sh  username  r      10/10/2025 12:12:12  2T-batch@perun23  15
username@perun: qsub -hold_jid 500 script_2.sh
</code>

Or using ''-terse'':

<code>
username@perun: job1_id=$(qsub -terse script_1.sh)   # Submits script_1.sh and stores its job-ID in job1_id
username@perun: qsub -hold_jid $job1_id script_2.sh
</code>

This way, ''script_2.sh'' will start only after ''script_1.sh'' finishes.

==== Existing Limitations ====

  * The current scheduler uses queue-based GPU access rather than resource flags. Commands like ''#$ -l gpu=1'' will not work; you can verify this by running ''qhost -F gpu'', which shows that GPU resources aren't configured for direct resource requests. Instead, use the ''#$ -q GPU'' queue as described above.

==== Contacts ====

Questions? Suggestions? Assistance?
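If you chain jobs like this regularly, the two-step ''-terse'' pattern can be wrapped in a small helper. The script below is a hypothetical sketch (the name ''submit_pipeline.sh'' and the ''DRY_RUN'' guard are illustrative, not an existing cluster tool); by default it only prints the ''qsub'' commands, so you can try it off the cluster, and setting ''DRY_RUN=0'' makes it submit for real:

```shell
#!/bin/bash
# submit_pipeline.sh -- hypothetical helper: submits script_1.sh, then
# script_2.sh held with -hold_jid until the first job finishes.
set -euo pipefail

# Defaults to dry-run; set DRY_RUN=0 on the cluster to really submit.
DRY_RUN="${DRY_RUN:-1}"

submit() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "qsub $*" >&2    # show the command that would be run
        echo "500"            # fake job-ID for dry runs
    else
        qsub -terse "$@"      # -terse prints just the job-ID
    fi
}

job1_id=$(submit script_1.sh)
submit -hold_jid "$job1_id" script_2.sh >/dev/null
echo "script_2.sh will start after job $job1_id finishes"
```

The only piece SGE needs is the ''-hold_jid'' dependency; everything else is plumbing around capturing the job-ID that ''qsub -terse'' prints.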
> [[https://unit.link/dmytro-tymoshenko|Dmytro Tymoshenko]]