Linux Basics
aifare platform instances use Linux (Ubuntu distribution) as the default operating system. Mastering basic Linux commands is essential for efficient AI development and model training. Below are commonly used commands and typical scenarios on the platform.
File and Directory Operations
List Files/Directories
ls: List files and directories in the current directory
ls -l: Show detailed information (permissions, owner, size, time, etc.)
ls
ls -l
Create/Switch Directory
mkdir: Create a new directory
cd: Change directory
mkdir data_dir
cd data_dir
cd ../data_dir # Enter data_dir under the parent directory
View Current Path
pwd: Display the current working directory
pwd
Rename/Move File or Directory
mv: Move or rename
mv old_name new_name
mv file.txt /data/
Copy Files/Folders
cp: Copy files
cp -r: Recursively copy folders
cp file.txt /data/
cp -r myfolder /user-data/
Delete Files/Folders
rm -rf: Recursively and forcibly delete without confirmation (deleted files cannot be recovered, so check the path first)
rm -rf temp_dir
rm -rf /data/* # Delete everything under /data except hidden (dot) files
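Because rm -rf cannot be undone, it can help to preview a path before deleting it and to guard variables against expanding to nothing. The following is a minimal sketch that runs entirely in a scratch directory; the names scratch and target are illustrative and not part of the platform.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
scratch="$(mktemp -d)"
mkdir -p "$scratch/temp_dir/sub"
touch "$scratch/temp_dir/a.txt" "$scratch/temp_dir/.hidden"

target="$scratch/temp_dir"

# 1. Preview what is inside (-A also lists hidden files).
ls -A "$target"

# 2. ${target:?} aborts if target is unset or empty,
#    which prevents an accidental "rm -rf /*".
rm -rf "${target:?}"/*

# The * glob skips dotfiles by default, so .hidden is still present.
remaining="$(ls -A "$target")"
echo "left after rm: $remaining"

# 3. Removing the directory itself deletes hidden files as well.
rm -rf "${target:?}"
```

The same guard works for any variable-based path: rm -rf "${dir:?}"/* fails early instead of deleting from the filesystem root when dir is empty.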
Environment Variable Settings
export: Set environment variables
export PATH=/opt/miniconda3/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
- View environment variables:
env | grep PATH
- Persistent effect: Add the export command to ~/.bashrc, then run source ~/.bashrc to apply it in the current shell
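A plain export lasts only for the current shell and its child processes, which is why the line has to be stored in ~/.bashrc to survive a new session. The sketch below appends the line only if it is not already present, so running it twice does not create duplicates. It writes to a temporary file standing in for ~/.bashrc; rc_file and add_once are illustrative names.

```shell
#!/usr/bin/env bash
# rc_file stands in for ~/.bashrc so the example is safe to run;
# set rc_file="$HOME/.bashrc" to apply it for real.
rc_file="$(mktemp)"
line='export PATH=/opt/miniconda3/bin:$PATH'

add_once() {
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

add_once "$line" "$rc_file"
add_once "$line" "$rc_file"   # second call is a no-op

# Load the file into the current shell, as "source ~/.bashrc" would.
. "$rc_file"

count="$(grep -cxF -- "$line" "$rc_file")"
echo "occurrences: $count"
echo "$PATH" | tr ':' '\n' | head -n 1
```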
Text Editing
- The vim editor is recommended. For more advanced usage, refer to related tutorials.
Compression and Decompression
zip/unzip: Compress and decompress in zip format
tar: General compression/decompression tool
zip -r data.zip /data/
unzip data.zip
tar czf data.tar.gz /data/
tar xzf data.tar.gz
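When an absolute path such as /data/ is archived, tar stores the entries relative to the root, and extraction recreates data/ under whatever directory you run it from. The sketch below shows one way to control this with -C and to inspect an archive before unpacking it. It uses a scratch directory, and the file names are made up for illustration.

```shell
#!/usr/bin/env bash
work="$(mktemp -d)"
mkdir -p "$work/src/data" "$work/out"
echo "sample" > "$work/src/data/train.csv"

# -C changes directory before archiving, so entries are stored as data/...
tar czf "$work/data.tar.gz" -C "$work/src" data

# t lists the archive contents without extracting anything.
listing="$(tar tzf "$work/data.tar.gz")"
echo "$listing"

# -C on extraction chooses the destination directory.
tar xzf "$work/data.tar.gz" -C "$work/out"
cat "$work/out/data/train.csv"
```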
View GPU Information
nvidia-smi: View GPU status, memory usage, driver version, etc.
nvidia-smi
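For scripts or log files, nvidia-smi can also print selected fields as CSV instead of the full table. The sketch below uses its --query-gpu and --format options and falls back to a message when the tool is not installed; the variable name gpu_report is illustrative.

```shell
#!/usr/bin/env bash
if command -v nvidia-smi >/dev/null 2>&1; then
    # One CSV row per GPU with the selected fields.
    gpu_report="$(nvidia-smi \
        --query-gpu=index,name,memory.used,memory.total,utilization.gpu \
        --format=csv 2>&1)"
else
    gpu_report="nvidia-smi not found: no NVIDIA driver on this machine"
fi
echo "$gpu_report"
```

To watch usage change during training, watch -n 1 nvidia-smi refreshes the standard view every second.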
Process Management
ps -ef: View all processes
kill -9 PID: Force kill a process
ps -ef | grep python
kill -9 12345
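kill -9 sends SIGKILL, which a program cannot catch, so it gets no chance to write a checkpoint or clean up. A gentler sequence is to send the default signal (SIGTERM) first and escalate only if the process is still alive. The sketch below uses a background sleep as a stand-in for a training job.

```shell
#!/usr/bin/env bash
# Stand-in for a long-running job such as "python train.py".
sleep 300 &
pid=$!

# Ask politely first: plain kill sends SIGTERM, which a program can handle.
kill "$pid"

# Give it up to about 5 seconds to exit; kill -0 only tests whether the PID exists.
for i in 1 2 3 4 5; do
    kill -0 "$pid" 2>/dev/null || break
    sleep 1
done

# Escalate to SIGKILL only if the process is still there.
if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"
fi

# Collect the child's exit status so it does not linger as a zombie.
wait "$pid" 2>/dev/null || true
echo "process $pid stopped"
```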
View CPU/Memory Usage
top: Real-time view of CPU, memory, and process resource usage
top
Log Redirection and Background Running
>: Redirect logs to a file
2>&1: Merge standard output and error output
&: Run in the background
python train.py > train.log 2>&1 &
cat train.log
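The order of the two redirections matters: > file must come before 2>&1, otherwise error output does not reach the log. The sketch below demonstrates both orders with a small shell function standing in for python train.py; the function name job and the log file names are illustrative.

```shell
#!/usr/bin/env bash
work="$(mktemp -d)"

# Stand-in for "python train.py": writes one line to stdout, one to stderr.
job() { echo "epoch 1 done"; echo "warning: lr is high" >&2; }

# Correct order: point stdout at the file first, then send stderr to the same place.
job > "$work/both.log" 2>&1 &
wait $!    # $! is the PID of the most recent background job

# Reversed order: 2>&1 copies stderr to wherever stdout points at that moment
# (still the terminal), so only stdout ends up in the file.
job 2>&1 > "$work/stdout_only.log"

cat "$work/both.log"
```

For a job that is still running, tail -f train.log follows the log as new lines are written.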
Common Scenario Examples
1. GPU Memory Not Released
- Phenomenon: The program has stopped but GPU memory is still occupied
- Solution: Use ps -ef to find residual processes, kill -9 to kill them, then check memory with nvidia-smi
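The lookup step can be sketched as follows. It assumes nvidia-smi is available and uses its --query-compute-apps option to list the processes holding GPU memory; the ps search is a fallback for cases where nvidia-smi reports none. Variable names are illustrative.

```shell
#!/usr/bin/env bash
if command -v nvidia-smi >/dev/null 2>&1; then
    # Processes currently holding GPU memory, one CSV row each.
    gpu_procs="$(nvidia-smi --query-compute-apps=pid,process_name,used_memory \
        --format=csv,noheader 2>&1)"
else
    gpu_procs=""
    echo "nvidia-smi not found; skipping GPU process query"
fi
echo "${gpu_procs:-no GPU compute processes reported}"

# Leftover Python processes; writing the pattern as [p]ython keeps grep
# from matching its own command line.
leftover="$(ps -ef | grep "[p]ython" || true)"
echo "${leftover:-no python processes found}"
status="done"
```

Once the PID is known, kill it, then run nvidia-smi again to confirm the memory has been released.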
2. Data/Model Sharing Across Instances
- Requirement: Save models or data to the /user-data directory so they can be shared across multiple instances
cp -r model.pth /user-data/
3. Process Killed Due to Memory Overuse
- Phenomenon: The process is terminated by the system with a "Killed" message
- Solution: Use top to check memory usage, optimize code, or upgrade the instance configuration
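As a non-interactive alternative to top, the check can be sketched as below. It reads /proc/meminfo, which exists on Linux, and lists the largest processes by resident memory if a procps-style ps is installed. Inside a container these figures may describe the host rather than the instance's own limit, so treat them as a rough guide.

```shell
#!/usr/bin/env bash
# Total and available memory as the kernel reports it (values in kB).
mem_summary="$(grep -E '^(MemTotal|MemAvailable):' /proc/meminfo)"
echo "$mem_summary"

# Five largest processes by resident memory (RSS, in KiB), plus a header line.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,rss,comm --sort=-rss | head -n 6
fi
```

Where you have permission to read the kernel log, dmesg | grep -i "killed process" may confirm that the out-of-memory killer ended the job.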
4. Run a Daemon Process in the JupyterLab Terminal
- Requirement: Logs can still be viewed after closing the web page
- Solution: Redirect logs to a file and run in the background
python train.py > train.log 2>&1 &
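Depending on shell settings, a job started with & may still receive a hangup signal (SIGHUP) when its terminal session ends. Prefixing the command with nohup makes the job ignore that signal. This is a hedged variant of the command above, not a platform requirement; the sketch uses a short stand-in command and illustrative file names.

```shell
#!/usr/bin/env bash
work="$(mktemp -d)"
log="$work/train.log"

# Stand-in for "python train.py"; nohup makes the job ignore SIGHUP,
# which is sent when the terminal session goes away.
nohup sh -c 'echo "step 1"; sleep 1; echo "step 2"' > "$log" 2>&1 &
job_pid=$!
echo "$job_pid" > "$work/train.pid"   # keep the PID for a later kill

# The demo job is short, so simply wait for it and show the log.
# For a real job, run: tail -f train.log
wait "$job_pid"
tail -n 5 "$log"
```

With the PID saved to a file, the job can be stopped later from any terminal with kill "$(cat train.pid)".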
For more Linux tips, please refer to the aifare platform documentation or community resources.