

[pytorch] The difference between DistributedDataParallel and DataParallel


The difference between DistributedDataParallel and DataParallel is: DistributedDataParallel uses multiprocessing, where a process is created for each GPU, while DataParallel uses multithreading. By using multiprocessing, each GPU has its own dedicated process; this avoids the performance overhead caused by the GIL of the Python interpreter.

If you use DistributedDataParallel, you can use the torch.distributed.launch utility to launch your program; see Third-party backends.

DistributedDataParallel: uses multiprocessing

DataParallel: uses multithreading
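
To make the difference concrete, here is a minimal sketch (not a full training script) of how the two wrappers are typically set up. The model, tensor sizes, and script name are placeholders chosen for illustration, and the --local_rank handling assumes the script is started with the torch.distributed.launch utility, which spawns one process per GPU and exports the env:// rendezvous variables.

```python
import argparse
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torch.distributed.launch passes --local_rank to each spawned process.
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args = parser.parse_args()

    # One process per GPU; NCCL is the usual backend for multi-GPU training.
    torch.cuda.set_device(args.local_rank)
    dist.init_process_group(backend="nccl", init_method="env://")

    # Placeholder model; wrapping it in DDP makes gradients all-reduce across processes.
    model = nn.Linear(10, 10).cuda(args.local_rank)
    ddp_model = DDP(model, device_ids=[args.local_rank])

    # DataParallel, for comparison, is a single-process, multithreaded wrapper:
    #   dp_model = nn.DataParallel(model)

    x = torch.randn(20, 10).cuda(args.local_rank)
    out = ddp_model(x)
    print(f"rank {dist.get_rank()}: output shape {tuple(out.shape)}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```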

As far as I know, Python rarely gets real speedups from multithreading because of the GIL. So let's use DistributedDataParallel.
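
To actually launch it, the torch.distributed.launch utility mentioned in the quote spawns one process per GPU on the node. Assuming the sketch above is saved as ddp_example.py (a file name chosen here for illustration), a single-node run on 4 GPUs would look roughly like:

python -m torch.distributed.launch --nproc_per_node=4 ddp_example.py

In more recent PyTorch releases, torchrun replaces this utility.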

pytorch.org/docs/stable/notes/cuda.html#cuda-nn-ddp-instead

 

CUDA semantics — PyTorch 1.7.0 documentation (pytorch.org)