Other Frequently Asked Questions
Q: Why does my program hang with no output?
A: First, use the top and nvidia-smi commands to check CPU and GPU usage, respectively. If the CPU stays at 100% while the GPU is idle, the program is most likely stuck calling the GPU; see the "Unable to Use GPU" FAQ. Otherwise, add print statements at key points in your code to find where it hangs, then analyze the errors or logs for details.
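The print-log approach can be sketched in Python as follows, assuming a Python training script. Each checkpoint is flushed immediately, because Python buffers stdout when it is redirected to a file or pipe, which can make a program look silent even after it has moved on. The standard-library faulthandler module additionally dumps every thread's stack if the process is still running after a timeout. The function name checkpoint(), the step labels, and the 300 s timeout are illustrative assumptions.

```python
import faulthandler
import sys
import time


def checkpoint(label: str) -> str:
    """Print a timestamped marker; flush=True so it shows up even if the next line hangs."""
    line = f"[{time.strftime('%H:%M:%S')}] reached: {label}"
    print(line, flush=True)
    return line


def main() -> None:
    # If the process is still alive after 300 s (an arbitrary choice), dump the stack of
    # every thread to stderr, and repeat every 300 s, to show which line it is stuck on.
    faulthandler.dump_traceback_later(300, repeat=True, file=sys.stderr)
    try:
        checkpoint("start")
        # ... load data ...
        checkpoint("data loaded")
        # ... move the model to the GPU ...
        checkpoint("model on GPU")
        # ... training loop ...
        checkpoint("training finished")
    finally:
        # Stop the watchdog timer once the work is done.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    main()
```

The last marker printed before the output stops tells you which step is hanging.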
Q: What causes CUDA OOM (out of memory)?
A:
- If the program reports OOM, set the batch size to 1 and increase it gradually to find the largest value that fits, then decide whether to upgrade the instance or switch to a GPU with more memory.
- If the first run works but the second run reports OOM, run nvidia-smi while nothing is running to check GPU memory usage. If residual processes are still holding memory, use ps -ef to find their PIDs and kill -9 PID to terminate them.
- If there are no residual processes, the OOM may be caused by the framework's dynamic memory allocation; analyze it in the context of your code.
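Searching for the largest batch size can be automated. This is a minimal sketch that swaps step-by-step increments for doubling followed by a binary search, assuming memory use grows monotonically with batch size. fits() is a hypothetical callback you supply: in PyTorch it might run one training step inside a try/except on torch.cuda.OutOfMemoryError (available in recent PyTorch versions) and return False on failure. The default limit of 4096 is arbitrary.

```python
from typing import Callable


def find_max_batch_size(fits: Callable[[int], bool], limit: int = 4096) -> int:
    """Return the largest batch size in [1, limit] for which fits(size) is True.

    Returns 0 if even a batch size of 1 does not fit.
    """
    if not fits(1):
        return 0
    good = 1          # largest size known to fit
    bad = limit + 1   # smallest size known (or assumed) not to fit
    # Phase 1: double the batch size until it fails or exceeds the limit.
    size = 2
    while size <= limit:
        if fits(size):
            good = size
            size *= 2
        else:
            bad = size
            break
    # Phase 2: binary search between the last size that fit and the first that failed.
    while bad - good > 1:
        mid = (good + bad) // 2
        if fits(mid):
            good = mid
        else:
            bad = mid
    return good


if __name__ == "__main__":
    # Toy stand-in for a real training step: pretend anything above 100 runs out of memory.
    print(find_max_batch_size(lambda b: b <= 100))  # 100
```

Doubling reaches the right order of magnitude in a few trials, and the binary search then pins down the exact value without trying every size.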
Q: What if there are not enough free GPUs on the host?
A:
- You can start in no-GPU mode for data download and other operations.
- You can migrate the instance to another host.
- Or wait for GPU resources to be released on the current host.
Q: After changing the image, why can't I connect via VSCode or SSH?
A:
- Linux/Mac users: delete the local ~/.ssh/known_hosts file.
- Windows users: delete C:/Users/your-username/.ssh/known_hosts.
- Then try connecting again.
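The same step can be done from Python on any of these systems, since Path.home() resolves to ~ on Linux/Mac and to C:/Users/your-username on Windows. As a cautious variation on deleting the file, this sketch renames it to known_hosts.bak (our own choice of name) so the old entries can still be recovered. reset_known_hosts() is a hypothetical helper, not a platform tool.

```python
from pathlib import Path
from typing import Optional


def reset_known_hosts(ssh_dir: Optional[Path] = None) -> Optional[Path]:
    """Move known_hosts aside so SSH no longer checks against the old stored host keys.

    Returns the backup path, or None if there was no known_hosts file.
    """
    if ssh_dir is None:
        ssh_dir = Path.home() / ".ssh"
    known_hosts = ssh_dir / "known_hosts"
    if not known_hosts.exists():
        return None
    backup = known_hosts.with_name("known_hosts.bak")
    # replace() overwrites an existing backup on every platform,
    # unlike rename(), which fails on Windows if the target exists.
    known_hosts.replace(backup)
    return backup
```

Call reset_known_hosts() with no arguments to act on your own ~/.ssh directory, then reconnect.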
Q: Can coupons be used for yearly/monthly packages?
A: Some coupons can. Check each coupon's usage scope. Coupons can be stacked and are applied before your account balance is charged.
Q: Will GPUs be reserved for yearly/monthly instances after shutdown?
A: Yes. During the subscription period, the GPUs remain reserved for your instance, so you can restart it at any time without worrying about them being taken by others.
Q: Does a multi-GPU instance support parallelism?
A: Multiple GPUs in the same instance are on the same physical host and support multi-GPU parallelism. For multi-node multi-GPU parallelism, please contact customer service.
Q: How is billing handled if the GPU price changes for pay-as-you-go instances?
A: Pay-as-you-go instances are billed at the price in effect when the instance was started. Price changes while it is running do not affect the current run; the latest price applies after a restart.
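The rule "the price at startup applies to the whole run" can be expressed as a small calculation. This is a sketch only: times in hours, a flat hourly price, and the example prices are illustrative assumptions, and the platform's actual billing granularity is not described here. price_at() and run_cost() are hypothetical helpers.

```python
from bisect import bisect_right
from typing import List, Tuple


def price_at(price_changes: List[Tuple[float, float]], t: float) -> float:
    """Return the hourly price in effect at time t.

    price_changes holds (effective_from_hour, price_per_hour) pairs sorted by time.
    """
    times = [when for when, _ in price_changes]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError(f"no price defined at t={t}")
    return price_changes[i][1]


def run_cost(price_changes: List[Tuple[float, float]], start: float, stop: float) -> float:
    """Cost of one run: the price locked in at startup applies until the instance stops."""
    return (stop - start) * price_at(price_changes, start)


if __name__ == "__main__":
    changes = [(0.0, 2.0), (5.0, 3.0)]  # hypothetical: price rises from 2.0 to 3.0 at hour 5
    print(run_cost(changes, 4.0, 8.0))   # 8.0: started before the change, so 4 h at 2.0
    print(run_cost(changes, 8.0, 10.0))  # 6.0: restarted after the change, so 2 h at 3.0
```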
Q: Can data be recovered from released instances?
A: No, it cannot be recovered.
Q: What if the host has disk or GPU failures?
A: You can migrate the instance to another host or wait for repairs. The platform will compensate as appropriate.
Q: Will data on the instance be accidentally lost or corrupted?
A: The local data disk is a physical disk with no redundant backup, so data on it can be lost. Back up important data promptly. Shared cloud disks use multi-replica redundancy and are highly reliable.
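A timestamped copy to the shared cloud disk is one simple way to back up. This is a minimal sketch using only the standard library; the folder names in the usage note are placeholders, since the actual mount points of the data disk and cloud disk depend on your instance. backup_dir() is a hypothetical helper.

```python
import shutil
import time
from pathlib import Path


def backup_dir(src: Path, backup_root: Path) -> Path:
    """Copy src into a new timestamped folder under backup_root and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"{src.name}-{stamp}"
    # copytree creates missing parent folders, and it refuses to write into an
    # existing destination, so an earlier backup is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

For example, backup_dir(Path("/path/to/local-data-disk/checkpoints"), Path("/path/to/shared-cloud-disk/backups")), with both paths replaced by your own, could be run after each training stage.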
Q: Will running programs in JupyterLab be affected if I close the browser or log out?
A: No. However, it is recommended to redirect logs to a file so you can review them later. See the related documentation for details.
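From a shell, appending > train.log 2>&1 to the command redirects both stdout and stderr to a file. Inside a Python script, the standard logging module can write to a file and the console at the same time, as in this sketch. The logger name "train" and the message format are illustrative assumptions.

```python
import logging
import sys


def make_logger(log_path: str, name: str = "train") -> logging.Logger:
    """Return a logger that writes each message to log_path and to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers if called more than once
        logger.setLevel(logging.INFO)
        logger.propagate = False  # don't also pass messages to any root handlers
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.FileHandler(log_path, encoding="utf-8"),
                        logging.StreamHandler(sys.stdout)):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```

Both handlers flush after every record, so the log file stays current even while nobody is watching the browser tab; for example, make_logger("train.log").info("epoch 1 done").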
Q: How can I ensure programs started via SSH are not terminated if the connection drops?
A: It is recommended to use the JupyterLab terminal or tools like screen/tmux. See the daemon process documentation for details.
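screen/tmux remain the recommended tools. For comparison, a Python launcher can achieve a similar effect on Linux by starting the job in a new session with its output sent to a file, so it no longer depends on the SSH terminal. This is a sketch of that alternative technique, not the platform's daemon-process method; launch_detached() is a hypothetical helper.

```python
import subprocess
from typing import List


def launch_detached(cmd: List[str], log_path: str) -> int:
    """Start cmd detached from the current terminal and return its PID.

    start_new_session=True (POSIX only) runs setsid() in the child, so it is not part of
    the terminal's session and does not receive the hang-up sent when SSH disconnects.
    """
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=log,                # output goes to the file, not the terminal
            stderr=subprocess.STDOUT,  # merge stderr into the same file
            stdin=subprocess.DEVNULL,  # never wait for terminal input
            start_new_session=True,
        )
    # The child keeps its own copy of the file descriptor after we close ours.
    return proc.pid
```

For example, launch_detached(["python", "train.py"], "train.log") starts the job and returns immediately; progress can then be followed in train.log after reconnecting.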
Q: Why does my program show 'Killed' and stop?
A: The system terminated the program because it used more memory (RAM) than the instance provides. Check memory usage in the instance monitoring panel. To fix this, upgrade the instance or switch to a host with more memory.
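To see how close a Python program gets to the limit before it is killed, it can log its own peak memory periodically. This sketch uses the standard-library resource module (POSIX only). peak_rss_mib() is a hypothetical helper, and logging it every few hundred steps is only a suggestion.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident memory of the current process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024
```

This covers only the current process; memory used by worker subprocesses (for example data-loading workers) is not included, so the monitoring panel remains the more complete view.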