Public Data
The aifare platform provides a variety of public datasets that users can access directly for AI development and training. This document introduces the main public data directories, how to use them, and best practices.
Main Public Data Directories
| Directory | Description |
|---|---|
| /gm-datasets | Platform public datasets, training data |
| /gm-models | Platform prebuilt models |
| /user-data | User personal data, supports sharing |
How to Use Public Datasets
- Enter the instance and open the terminal or JupyterLab file manager.
- Navigate to the /gm-datasets directory to view available datasets.
- Use the datasets directly in your code or copy them to your working directory as needed.
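The browsing step can also be done from code. The following is a minimal sketch, assuming the public datasets sit directly under /gm-datasets as described in this document; the helper name `list_datasets` is hypothetical and is not part of any platform API.

```python
from pathlib import Path


def list_datasets(root="/gm-datasets"):
    """Return the sorted names of the entries directly under `root`.

    Returns an empty list if `root` does not exist, so the function is
    safe to call outside a platform instance.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(entry.name for entry in root_path.iterdir())


if __name__ == "__main__":
    # Print one dataset name per line (nothing is printed if the
    # directory is missing).
    for name in list_datasets():
        print(name)
```

Returning an empty list for a missing directory is a design choice for this sketch; raising an error may be preferable if a missing mount should stop the job.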
Example: Load a Dataset in Python
import pandas as pd
# Example: Load a CSV file from the public dataset directory
df = pd.read_csv('/gm-datasets/sample.csv')
print(df.head())
Best Practices
- Use public datasets for model training and testing to save time on data preparation.
- Do not modify or delete files in the public dataset directory; copy them to your own directory if you need to edit.
- For large-scale data processing, copy datasets to the data disk (/data) for better performance.
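Copying a dataset to the data disk can be sketched as follows. This is an illustration only: the helper `copy_dataset` is hypothetical, and it assumes /data is writable in your instance and has enough free space for the copy.

```python
import shutil
from pathlib import Path


def copy_dataset(name, src_root="/gm-datasets", dst_root="/data"):
    """Copy one dataset (a file or a directory) from src_root to dst_root.

    An existing copy is left untouched, so repeated calls do not
    re-copy the data. Returns the destination path.
    """
    src = Path(src_root) / name
    dst = Path(dst_root) / name
    if not src.exists():
        raise FileNotFoundError(f"dataset not found: {src}")
    if dst.exists():
        return dst  # already copied earlier
    if src.is_dir():
        shutil.copytree(src, dst)
    else:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    return dst
```

For example, `copy_dataset("mnist")` would copy /gm-datasets/mnist to /data/mnist, after which training code can read from the copy instead of the public directory.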
Frequently Used Public Datasets
| Dataset Name | Path | Description |
|---|---|---|
| ImageNet | /gm-datasets/imagenet | Image classification |
| COCO | /gm-datasets/coco | Object detection |
| MNIST | /gm-datasets/mnist | Handwritten digit dataset |
| CIFAR-10 | /gm-datasets/cifar10 | Image classification |
| ... | ... | ... |
For more available datasets, please check the /gm-datasets directory in your instance.
Notes
- Public datasets are read-only and cannot be modified or deleted.
- If you need to save processed data, use your own data directory (/user-data or /data).
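Because the public directories are read-only, any output has to be written elsewhere. The sketch below writes processed rows as CSV into a personal directory; the helper `save_rows` and the subdirectory /user-data/processed are hypothetical names chosen for illustration.

```python
import csv
from pathlib import Path


def save_rows(rows, filename, out_dir="/user-data/processed"):
    """Write an iterable of rows to a CSV file under out_dir.

    The output directory is created if it does not exist yet.
    Returns the path of the written file.
    """
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return out_path
```

If you already work with pandas as in the loading example, calling `df.to_csv` with a path under /user-data or /data achieves the same result.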
For more information, please refer to the aifare platform documentation or contact customer support.