GPU Memory Not Released
1. Check GPU Memory Usage
First, use the following command to check GPU memory usage:
nvidia-smi
If your program has exited but memory is still occupied, residual processes that were never terminated are still holding it.
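To see exactly which processes are holding GPU memory, nvidia-smi can list compute processes directly. A minimal sketch (the awk step extracts just the PID column from the CSV output, skipping the header):

```shell
# List PID and memory usage of every compute process on the GPU, as CSV
nvidia-smi --query-compute-apps=pid,used_memory --format=csv

# Extract only the PIDs (NR > 1 skips the CSV header line)
nvidia-smi --query-compute-apps=pid,used_memory --format=csv \
  | awk -F', ' 'NR > 1 {print $1}'
```

The PIDs printed here are the ones you will pass to kill in the next step.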
2. Find and Kill Processes Occupying GPU Memory
Step 1: Find Processes
Use the ps -ef command to view all processes:
ps -ef
Focus on the PID (process ID), PPID (parent process ID), and CMD (command) columns. Usually, only your own training processes (e.g., python train.py) occupy GPU memory.
Step 2: Kill Processes
Suppose the process IDs to kill are 594 and 797, run:
kill -9 594 797
For multi-GPU parallel jobs there may be many such processes; kill them in batch as follows:
- Filter your own processes by keyword (e.g., train):
ps -ef | grep train
- Get all related process IDs and batch kill:
ps -ef | grep train | awk '{print $2}' | xargs kill -9
Note: if kill reports "No such process", you can ignore it. The grep command itself also matches the keyword, so its own PID is passed to kill, but that process has usually already exited by the time kill runs.
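The one-liner above also feeds grep's own PID to kill, which is harmless but noisy. A slightly safer sketch (assuming the keyword train matches only your own jobs; xargs -r is a GNU extension that skips kill entirely when nothing matches):

```shell
# Exclude the grep process itself; -r avoids running kill on an empty list
ps -ef | grep train | grep -v grep | awk '{print $2}' | xargs -r kill -9

# Equivalent one-step alternative: -f matches against the full command line
pkill -9 -f train
```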
3. Linux Pipe Symbol Explanation
The | (pipe) in Linux passes the output of one command as the input of the next, which is very useful for filtering and batch processing. For example:
- Find all txt files in a directory:
ls | grep '\.txt$'
(the dot is escaped because grep treats an unescaped . as "any character")
Pipes are widely used and can greatly improve command-line efficiency.
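As a worked example of chaining several stages, the sketch below counts matching processes and summarizes ps output (train is an assumed keyword, as above):

```shell
# Three-stage pipeline: list processes, keep matches, count them
ps -ef | grep train | grep -v grep | wc -l

# Longer chain: count processes per user (column 1 of ps output),
# most active user first
ps -ef | awk 'NR > 1 {print $1}' | sort | uniq -c | sort -rn
```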