AIBENCH TRAINING: BALANCED INDUSTRY-STANDARD AI TRAINING BENCHMARKING
Contract ID: KD2xzpbXSLxdU0D7fVHYZ146u-t7Dq1HDuD7IcGcq-s
File Type: PDF
Entry Count: 89
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Early-stage evaluations of a new AI architecture or system need affordable AI benchmarks. Using only a few AI component benchmarks like MLPerf alone in the other stages may lead to misleading conclusions. Moreover, the learning dynamics are not well understood, and the benchmarks' shelf life is short. This paper proposes a balanced benchmarking methodology. We use real-world benchmarks to cover the factor space that impacts the learning dynamics to the most considerable extent. After performing an exhaustive survey of Internet-service AI domains, we identify and implement nineteen representative AI tasks with state-of-the-art models. For repeatable performance ranking (the RPR subset) and workload characterization (the WC subset), we keep two subsets to a minimum for affordability. We contribute by far the most comprehensive AI training benchmark suite. The evaluations show: (1) AIBench Training (v1.1) outperforms MLPerf Training (v0.7) in terms of the diversity and representativeness of model complexity, computational cost, convergence rate, computation and memory-access patterns, and hotspot functions; (2) against the full AIBench benchmarks, its RPR subset shortens the benchmarking cost by 64% while maintaining the primary workload characteristics; (3) the performance ranking shows that a single-purpose AI accelerator like the TPU with the optimized TensorFlow framework performs better than GPUs, while losing the latter's general support for various AI models. The specification, source code, and performance numbers are available from the AIBench homepage: https://www.benchcouncil.org/aibench-training/index.html.

(recurrent), 25.25 (BLEU) for Translation (non-recurrent). Note that AIBench and MLPerf use the same models and datasets for Image Classification, NLP, and Advertising (the Recommendation task in MLPerf), so their numbers for these tasks are consistent in the rest of this paper.
Fig. 2 shows the model characteristics. From the computation-cost perspective, AIBench ranges from 0.09 to 282830 M-FLOPs, while MLPerf varies from 0.213248 to 24500 M-FLOPs, a much narrower range. From the perspective of model complexity, the number of learnable parameters in AIBench ranges from 0.03 million to 68.4 million, while MLPerf only covers a range of 5.2 to 49.53 million. From the convergence-rate perspective, the required epochs of AIBench range from 6 to 96, while MLPerf only covers a range of 3 to 49. Thus, using MLPerf alone cannot cover the diversity of different AI models. Object Detection and 3D Object Reconstruction have the largest FLOPs, while Learning-to-Rank has the smallest. Image-to-Text is the most complex model, while the Spatial Transformer is the least. Text-to-Text Translation requires the most epochs to converge, while the remaining models converge within 60 epochs.
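The coverage gap described above reduces to simple arithmetic over the quoted ranges. A minimal sketch, using only the (min, max) figures stated in the text:

```python
# Ranges quoted in the text as (min, max) for each diversity metric.
ranges = {
    "computation (M-FLOPs)": {"AIBench": (0.09, 282830.0), "MLPerf": (0.213248, 24500.0)},
    "parameters (millions)": {"AIBench": (0.03, 68.4), "MLPerf": (5.2, 49.53)},
    "epochs to converge":    {"AIBench": (6.0, 96.0), "MLPerf": (3.0, 49.0)},
}

for metric, suites in ranges.items():
    for suite, (lo, hi) in suites.items():
        # Dynamic range (max/min) shows how wide a spectrum each suite spans.
        print(f"{metric:22s} {suite:8s} [{lo}, {hi}]  dynamic range ~{hi / lo:,.0f}x")
```

For computation cost and model complexity, AIBench spans several orders of magnitude more than MLPerf; for epochs, the absolute interval (6 to 96 versus 3 to 49) is the wider one.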
[Figure 2: scatter plot of Number of Epochs to Convergent Quality (x-axis, 0 to 120) versus Operations in M-FLOPs (y-axis, log scale, 0.01 to 1000000), with marker size indicating parameter count (1M to 75M), for the AIBench workloads: Learning to Rank, 3D Face Recognition, 3D Object Reconstruction, Text Summarization, Spatial Transformer, Object Detection (light and heavy), Face Embedding, Speech Recognition, Image Classification, Text-to-Text Translation, Translation (recurrent and non-recurrent), Image-to-Text, Recommendation, and Image Compression.]
Figure 2: The Comparisons of AIBench against MLPerf from the Perspectives of Model Complexity, Computational Cost, and Convergence Rate.

We then further investigate the optimizer and loss-function categories. From the perspective of optimizers, both AIBench and MLPerf cover five optimizers: adam, adamw, RMSprop, SGD, and lamb for AIBench, and adam, lamb, lars, lazy, and SGD for MLPerf. From the perspective of loss-function categories, AIBench covers fourteen loss functions, including BCELoss, BCEWithLogitsLoss, ChamferLoss, CrossEntropyLoss, CTCLoss, GANLoss, L1Loss, NLLLoss, SigmoidLogLoss, SmoothL1Loss, SoftmaxLoss, TripletLoss, Earth-Mover distance, and first-order Euclidean distance, while MLPerf covers only six: BCELoss, CrossEntropyLoss, LogLoss, SmoothL1Loss, SoftmaxLoss, and VirtualLoss.
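The optimizer and loss-function coverage above amounts to a set comparison. A small sketch using exactly the lists given in the text:

```python
# Optimizer and loss-function sets as listed in the text above.
aibench_optimizers = {"adam", "adamw", "RMSprop", "SGD", "lamb"}
mlperf_optimizers = {"adam", "lamb", "lars", "lazy", "SGD"}

aibench_losses = {
    "BCELoss", "BCEWithLogitsLoss", "ChamferLoss", "CrossEntropyLoss",
    "CTCLoss", "GANLoss", "L1Loss", "NLLLoss", "SigmoidLogLoss",
    "SmoothL1Loss", "SoftmaxLoss", "TripletLoss",
    "Earth-Mover distance", "first-order Euclidean distance",
}
mlperf_losses = {"BCELoss", "CrossEntropyLoss", "LogLoss",
                 "SmoothL1Loss", "SoftmaxLoss", "VirtualLoss"}

# Coverage comparison via plain set algebra.
print("optimizers shared:", sorted(aibench_optimizers & mlperf_optimizers))
print("losses only in AIBench:", sorted(aibench_losses - mlperf_losses))
print("losses only in MLPerf:", sorted(mlperf_losses - aibench_losses))
```

Three optimizers (adam, lamb, SGD) are common to both suites; ten of AIBench's fourteen loss functions are absent from MLPerf.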
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "KD2xzpbXSLxdU0D7fVHYZ146u-t7Dq1HDuD7IcGcq-s", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "KD2xzpbXSLxdU0D7fVHYZ146u-t7Dq1HDuD7IcGcq-s", "level": 2}'