
Public Data

The aifare platform provides a variety of public datasets that users can access directly for AI development and training. This document covers the main public data directories, how to use them, and best practices.

Main Public Data Directories

Directory       Description
/gm-datasets    Platform public datasets, training data
/gm-models      Platform prebuilt models
/user-data      User personal data, supports sharing

How to Use Public Datasets

  1. Open your instance and launch a terminal or the JupyterLab file manager.
  2. Navigate to the /gm-datasets directory to view available datasets.
  3. Use the datasets directly in your code or copy them to your working directory as needed.
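Step 2 can also be done from Python rather than the terminal. The following is a minimal sketch, assuming the /gm-datasets directory is mounted in the instance as described above. The helper name list_datasets is illustrative and not part of any platform API.

```python
from pathlib import Path


def list_datasets(root="/gm-datasets"):
    """Return (name, kind) pairs for the entries directly under `root`."""
    root = Path(root)
    if not root.is_dir():
        # Directory is not mounted, e.g. when running outside an instance.
        return []
    return sorted(
        (entry.name, "dir" if entry.is_dir() else "file")
        for entry in root.iterdir()
    )


if __name__ == "__main__":
    for name, kind in list_datasets():
        print(f"{kind:4}  {name}")
```

The function returns an empty list instead of raising when the directory is missing, so the same script can be run outside an instance without failing.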

Example: Load a Dataset in Python

import pandas as pd

# Example: Load a CSV file from the public dataset directory
df = pd.read_csv('/gm-datasets/sample.csv')
print(df.head())

Best Practices

  • Use public datasets for model training and testing to save time on data preparation.
  • Do not modify or delete files in the public dataset directory; copy them to your own directory if you need to edit.
  • For large-scale data processing, copy datasets to the data disk (/data) for better performance.
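The copy-to-data-disk practice above could be sketched as follows. This is an illustrative helper, not a platform utility: the name stage_dataset and its parameters are assumptions, and only the /gm-datasets and /data paths come from this document. It copies a dataset once and reuses the local copy on later runs.

```python
import shutil
from pathlib import Path


def stage_dataset(name, public_root="/gm-datasets", data_root="/data"):
    """Copy a dataset to the data disk once and return the local path."""
    src = Path(public_root) / name
    dest = Path(data_root) / name
    if dest.exists():
        return dest  # already staged; skip the copy
    if not src.exists():
        raise FileNotFoundError(f"dataset not found: {src}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        shutil.copytree(src, dest)  # dataset stored as a directory tree
    else:
        shutil.copy2(src, dest)  # dataset stored as a single file
    return dest
```

For example, stage_dataset("mnist") would return /data/mnist, copying it first if needed. The existence check is deliberately simple: an interrupted copy would be treated as already staged, so delete the destination and rerun if a copy did not finish.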

Frequently Used Public Datasets

Dataset Name    Path                     Description
ImageNet        /gm-datasets/imagenet    Image classification
COCO            /gm-datasets/coco        Object detection
MNIST           /gm-datasets/mnist       Handwritten digit dataset
CIFAR-10        /gm-datasets/cifar10     Image classification
...             ...                      ...

For more available datasets, please check the /gm-datasets directory in your instance.

Notes

  • Public datasets are read-only and cannot be modified or deleted.
  • If you need to save processed data, please use your own data directory (/user-data or /data).
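As a hedged illustration of the second note, the sketch below picks the first writable directory among /user-data and /data and writes processed rows there as CSV. The function names pick_output_dir and save_rows are hypothetical, and the preference order between the two directories is an assumption rather than a platform rule.

```python
import csv
import os
from pathlib import Path


def pick_output_dir(candidates=("/user-data", "/data")):
    """Return the first candidate that exists and is writable, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir() and os.access(path, os.W_OK):
            return path
    return None


def save_rows(rows, filename, candidates=("/user-data", "/data")):
    """Write rows (lists of values) as CSV into a writable user directory."""
    out_dir = pick_output_dir(candidates)
    if out_dir is None:
        raise RuntimeError("no writable output directory found")
    out_path = out_dir / filename
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return out_path
```

Because neither candidate is the public dataset directory, processed output never lands on the read-only path.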

For more information, please refer to the aifare platform documentation or contact customer support.