Public Data

The aifare platform provides public datasets that users can access directly for AI development and training. This document introduces the main public data directories, how to use them, and best practices.

Main Public Data Directories

Directory	Description
`/gm-datasets`	Platform public datasets, training data
`/gm-models`	Platform prebuilt models
`/user-data`	User personal data, supports sharing

How to Use Public Datasets

1. Enter the instance and open the terminal or the JupyterLab file manager.
2. Navigate to the `/gm-datasets` directory to view the available datasets.
3. Use the datasets directly in your code, or copy them to your working directory as needed.
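The browsing step above can also be done from Python. The following is a minimal sketch, not a platform API: `list_datasets` is a hypothetical helper name, and it assumes each dataset appears as an entry directly under `/gm-datasets`.

```python
from pathlib import Path


def list_datasets(root="/gm-datasets"):
    """Return the sorted names of the entries directly under `root`.

    `list_datasets` is a hypothetical helper, not part of the platform.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        # Outside an instance the directory may not exist; return nothing.
        return []
    return sorted(entry.name for entry in root_path.iterdir())


# Print whatever is available in the current environment.
for name in list_datasets():
    print(name)
```

Returning an empty list when the directory is missing keeps the same script usable on a local machine where the public directories are not mounted.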

Example: Load a Dataset in Python

import pandas as pd

# Example: Load a CSV file from the public dataset directory
df = pd.read_csv('/gm-datasets/sample.csv')
print(df.head())

Best Practices

Use public datasets for model training and testing to save time on data preparation.
Treat the public dataset directory as read-only; if you need to edit a file, copy it to your own directory first.
For large-scale data processing, copy datasets to the data disk (/data) for better performance.
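Copying a dataset to the data disk could look like the sketch below. `copy_dataset` is a hypothetical helper, and it assumes the dataset is a directory and that `/data` is writable in your instance.

```python
import shutil
from pathlib import Path


def copy_dataset(src, dst_root="/data"):
    """Copy the dataset directory `src` into `dst_root` and return the new path.

    Hypothetical helper: if the destination already exists it is reused
    rather than copied again, so repeated runs do not duplicate the work.
    """
    src_path = Path(src)
    dst_path = Path(dst_root) / src_path.name
    if not dst_path.exists():
        # copytree creates any missing parent directories of dst_path.
        shutil.copytree(src_path, dst_path)
    return dst_path


# Example call inside an instance (left commented out so the sketch
# does not fail where the public directory is not mounted):
# local_mnist = copy_dataset("/gm-datasets/mnist")
```

The existence check only tests for the destination directory; it does not verify that an earlier copy completed, so remove a partial copy before rerunning.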

Frequently Used Public Datasets

Dataset Name	Path	Description
ImageNet	`/gm-datasets/imagenet`	Image classification
COCO	`/gm-datasets/coco`	Object detection
MNIST	`/gm-datasets/mnist`	Handwritten digit dataset
CIFAR-10	`/gm-datasets/cifar10`	Image classification
...	...	...

For more available datasets, please check the /gm-datasets directory in your instance.

Notes

Public datasets are read-only and cannot be modified or deleted.
If you need to save processed data, please use your own data directory (/user-data or /data).
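One way to follow this note in code is to pick a writable directory before saving. The sketch below is illustrative only: `pick_output_dir` and `save_processed` are hypothetical names, and the candidate order (`/user-data` first, then `/data`) is an assumption rather than a platform rule.

```python
import os
from pathlib import Path


def pick_output_dir(candidates=("/user-data", "/data")):
    """Return the first candidate that is an existing, writable directory."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir() and os.access(path, os.W_OK):
            return path
    raise RuntimeError("no writable output directory found")


def save_processed(df, filename, candidates=("/user-data", "/data")):
    """Write a DataFrame-like object to CSV in a writable directory.

    `df` only needs a pandas-style `to_csv(path, index=...)` method.
    """
    out_path = pick_output_dir(candidates) / filename
    df.to_csv(out_path, index=False)
    return out_path
```

With a pandas DataFrame, `save_processed(df, "sample_clean.csv")` would write the file outside the read-only public directory and return the path it used.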

For more information, please refer to the aifare platform documentation or contact customer support.
