GPU Memory Not Released

1. Check GPU Memory Usage

First, use the following command to check GPU memory usage:

nvidia-smi

If the program has already exited but GPU memory is still occupied, some residual processes were not terminated cleanly.
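Besides the summary view, `nvidia-smi` can list each compute process and its memory usage directly, which makes leftover PIDs easy to spot. A minimal sketch (the query flags are standard `nvidia-smi` options; the guard keeps the snippet harmless on machines without NVIDIA drivers):

```shell
# List every compute process and its GPU memory usage as CSV.
# Guarded so the command degrades gracefully where no NVIDIA driver exists.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-compute-apps=pid,used_memory --format=csv
else
    echo "nvidia-smi not found on this machine"
fi
```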

2. Find and Kill Processes Occupying GPU Memory

Step 1: Find Processes

Use the ps -ef command to view all processes:

ps -ef

Focus on the PID (process ID), PPID (parent process ID), and CMD (command) columns. Usually, only your own training processes (e.g., python train.py) occupy GPU memory.

Step 2: Kill Processes

Suppose the process IDs to kill are 594 and 797, run:

kill -9 594 797
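`kill -9` sends SIGKILL, which cannot be caught, so the process gets no chance to clean up. A gentler pattern, sketched below with the same (hypothetical) PIDs, is to send the default SIGTERM first and only escalate to SIGKILL for anything that survives:

```shell
# Hypothetical PIDs taken from the ps -ef output above.
PIDS="594 797"

kill $PIDS 2>/dev/null      # polite SIGTERM first
sleep 3                     # give the processes a moment to exit
kill -9 $PIDS 2>/dev/null   # force-kill any that remain
```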

For multi-GPU parallel jobs, there may be many processes. Use the following method to batch kill:

  1. Filter your own processes by keyword (e.g., train):
ps -ef | grep train
  2. Extract all matching process IDs and batch kill them:
ps -ef | grep train | awk '{print $2}' | xargs kill -9

Note: If you see kill: No such process, you can safely ignore it. The grep process itself also matches the keyword, but it has already exited by the time kill runs.
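The `awk '{print $2}'` stage is what extracts the PID column. You can convince yourself with a fabricated two-line sample laid out in `ps -ef` column order (no real processes involved):

```shell
# Fabricated sample rows in ps -ef column order: UID PID PPID ... CMD
sample='alice 594 1 0 09:00 pts/0 00:01:02 python train.py
alice 797 594 0 09:00 pts/0 00:01:01 python train.py'

# $2 is the second whitespace-separated field, i.e. the PID column.
echo "$sample" | awk '{print $2}'
# → 594
#   797
```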

3. Linux Pipe Symbol Explanation

The | (pipe) in Linux commands is used to pass the output of the previous command as input to the next command, which is very useful for batch processing and filtering. For example:

  • Find all .txt files in a directory (the dot is escaped because grep treats a bare . as "any character"):
ls | grep '\.txt$'

Pipes are widely used and can greatly improve command-line efficiency.
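Pipes can also be chained. The snippet below builds a hypothetical throwaway directory and counts its .txt files by chaining three commands, each consuming the previous one's output:

```shell
# Create a throwaway directory with a few sample files.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt" "$dir/notes.md"

# ls lists the names, grep keeps only those ending in .txt,
# and wc -l counts the surviving lines.
ls "$dir" | grep '\.txt$' | wc -l
# → 2

rm -rf "$dir"
```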