Distributed directed acyclic graph framework for machine learning with UI
The goal of MLComp is to provide tools for training, inference, and building complex pipelines
(especially for computer vision) in a rapid, manageable way.
MLComp is compatible with Python 3.6+ on Unix operating systems.
A single file configures your computer's environment. It is located at ~/mlcomp/configs/.env
ROOT_FOLDER. Folder where MLComp files are saved: configs, db, tasks, etc.
TOKEN. Site security token. Change it to a string of your own
DB_TYPE. Either SQLITE or POSTGRESQL
POSTGRES_DB. PostgreSQL database name
POSTGRES_USER. PostgreSQL user
POSTGRES_PASSWORD. PostgreSQL password
POSTGRES_HOST. PostgreSQL host
PGDATA. Location of the PostgreSQL database files
REDIS_HOST. Redis host
REDIS_PORT. Redis port
REDIS_PASSWORD. Redis password
WEB_HOST. MLComp site host. 0.0.0.0 means it is available from everywhere
WEB_PORT. MLComp site port
CONSOLE_LOG_LEVEL. Log level for output to the console
DB_LOG_LEVEL. Log level for output to the database
IP. IP address of this work computer. The work computer must be reachable from other work computers at this IP/PORT
PORT. Port of this work computer. The work computer must be reachable from other work computers at this IP/PORT (SSH protocol)
MASTER_PORT_RANGE. Port range for distributed training on this work computer. 29500-29510 means that if
this work computer is the master in distributed training, it will use the first free port
from this range. Ranges of different work computers must not overlap.
NCCL_SOCKET_IFNAME. NCCL network interface.
FILE_SYNC_INTERVAL. File sync interval in seconds. 0 means file sync is off
WORKER_USAGE_INTERVAL. Interval in seconds of writing worker usage to DB
INSTALL_DEPENDENCIES. True/False. Whether to install dependent libraries
SYNC_WITH_THIS_COMPUTER. True/False. If False, the other computers will not sync with this one
CAN_PROCESS_TASKS. True/False. If False, this computer does not process tasks
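The settings above are plain KEY=VALUE pairs. As a minimal sketch, assuming the usual dotenv layout (one KEY=VALUE per line, # for comments), the following Python shows how such a file could be read and how a MASTER_PORT_RANGE value such as 29500-29510 might be resolved to the first free port. The helper names (parse_env, parse_port_range, ranges_overlap, first_free_port) and the sample values are hypothetical and are not part of MLComp's API. Treating both ends of the range as inclusive is also an assumption.

```python
import socket

# Hypothetical sample values, for illustration only.
SAMPLE_ENV = """\
# MLComp environment (sample)
DB_TYPE=SQLITE
WEB_HOST=0.0.0.0
MASTER_PORT_RANGE=29500-29510
CAN_PROCESS_TASKS=True
"""


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def parse_port_range(value):
    """Turn '29500-29510' into (29500, 29510); bounds assumed inclusive."""
    low, high = (int(part) for part in value.split("-"))
    if low > high:
        raise ValueError(f"invalid port range: {value}")
    return low, high


def ranges_overlap(a, b):
    """True if two inclusive (low, high) ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


def first_free_port(low, high, host="127.0.0.1"):
    """Return the first port in [low, high] that can be bound, or None."""
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    return None


settings = parse_env(SAMPLE_ENV)
port_range = parse_port_range(settings["MASTER_PORT_RANGE"])
```

Calling first_free_port(*port_range) then yields a port a master process could bind to, and ranges_overlap can be used to confirm that two work computers were not given overlapping ranges.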
To choose a value for NCCL_SOCKET_IFNAME, you can list your network interfaces with the ifconfig command.
See the NVIDIA documentation for details.
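If ifconfig is not installed, the interface names can also be listed from Python. This is a minimal sketch that uses only the standard library on Unix. The function name list_interfaces is hypothetical, and which interface suits NCCL depends on your network.

```python
import socket


def list_interfaces():
    """Return the network interface names known to the OS (Unix only)."""
    return [name for _index, name in socket.if_nameindex()]


interfaces = list_interfaces()
print(interfaces)
```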