TonY: A framework to natively run deep learning frameworks on Apache Hadoop.

TonY is an LF AI & Data Foundation incubation project licensed under the Apache 2.0 license. It is a native framework for running machine learning jobs reliably and flexibly on Hadoop YARN.

Features

Multiple Frameworks

TonY supports several frameworks, including TensorFlow, PyTorch, MXNet, and Horovod, and can run either single-node or distributed training as a Hadoop application on YARN. Support for more frameworks, such as Ray, is planned.

Multiple Training Modes

TonY supports multiple training modes, including Worker + ParameterServer and Ring-All-Reduce. It also supports parallel tasks that do not communicate with each other, as in offline inference scenarios.
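As a rough illustration of how a training script might tell these modes apart, here is a minimal sketch that parses a cluster description in TensorFlow's `TF_CONFIG` JSON layout. It assumes the launcher exports such a variable to each task; the sample value, the host names, and the `describe_task` helper are hypothetical and are not part of TonY's API.

```python
import json
import os

# Hypothetical sample: two workers plus one parameter server.
SAMPLE_TF_CONFIG = json.dumps({
    "cluster": {
        "worker": ["host1:2222", "host2:2222"],
        "ps": ["host3:2222"],
    },
    "task": {"type": "worker", "index": 1},
})


def describe_task(tf_config_json):
    """Summarize this task's place in the cluster from a TF_CONFIG-style string."""
    config = json.loads(tf_config_json)
    cluster = config.get("cluster", {})
    task = config.get("task", {})
    return {
        "role": task.get("type", "worker"),
        "index": int(task.get("index", 0)),
        "num_workers": len(cluster.get("worker", [])),
        # A non-empty "ps" list suggests Worker + ParameterServer;
        # without one, every task is a peer (for example, all-reduce).
        "has_parameter_servers": bool(cluster.get("ps")),
    }


if __name__ == "__main__":
    # Fall back to the sample when the variable is not set (assumption).
    raw = os.environ.get("TF_CONFIG", SAMPLE_TF_CONFIG)
    print(describe_task(raw))
```

Checking for a `ps` entry is only a heuristic for distinguishing the two modes; a real job would normally know its mode from its own configuration.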

Easy to use


1. Supports sidecar and inline TensorBoard instances during training jobs and lets users open the TensorBoard site directly
2. Provides the TonY portal to orchestrate many running distributed training jobs
3. Provides the TonY CLI for submitting training jobs
4. Supports the Docker runtime on newer versions of Hadoop and remains compatible with older Hadoop versions through TonY's Python virtual environment mechanism

Heterogeneous resources

TonY supports heterogeneous resources on Hadoop YARN, including CPUs and GPUs. It also provides resource utilization monitoring so that users can adjust the amount of training resources they request.
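To make per-task resource requests concrete, the sketch below writes a Hadoop-style `<configuration>` XML document from a dictionary. The property names (`tony.worker.gpus`, `tony.ps.memory`, and so on) follow a `tony.<task>.<resource>` pattern and are assumptions here, as are the values and the `build_conf` helper; check them against the TonY configuration reference before relying on them.

```python
import xml.etree.ElementTree as ET


def build_conf(settings):
    """Render a dict as a Hadoop-style <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # ET.indent is only available on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed property names: GPU workers alongside CPU-only parameter servers.
    print(build_conf({
        "tony.worker.instances": 4,
        "tony.worker.memory": "4g",
        "tony.worker.gpus": 1,
        "tony.ps.instances": 1,
        "tony.ps.memory": "3g",
    }))
```

Giving each task type its own properties is what would let GPU-backed workers and CPU-only parameter servers coexist in one job.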

Easy to extend

Our generic runtime interface for training frameworks allows users to add support for other deep learning frameworks. Advanced users can also run any job on YARN, not only deep learning frameworks.

Existing YARN features to leverage

1. Stable Hadoop scheduler
2. Team-based and hierarchical queues
3. Elasticity between queues
4. User-based limits

Join the Conversation

TonY maintains three mailing lists. You are invited to join the one that best meets your interest.