[pytorch] Memory leak and out-of-memory problems during inference


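This one cost me several hours of flailing....

I wasn't training the model, only running inference, and yet I kept running out of memory.

I couldn't make sense of it at all.

0. Call model.eval()

1. del tensors as soon as they are no longer needed, then empty the GPU cache

2. del, then call the garbage collector, then empty the GPU cache

3. Move everything from cuda to cpu, then try steps 1 and 2 again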

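import gc

import torch


def memorydel(listobj):
    # `del obj` inside a for loop only unbinds the loop variable, so clear
    # the list itself to drop the references it is holding.
    try:
        listobj.clear()
    except Exception as e:
        print(e)
    gc.collect()
    torch.cuda.empty_cache()


def memorydel_all(listobj):
    # listobj is a list of lists; release each inner list in turn.
    for x in listobj:
        try:
            memorydel(x)
        except Exception as e:
            print(e)
    gc.collect()
    torch.cuda.empty_cache()


def switch_gpu_to_cpu(listobj):
    # Tensor.to() returns a new tensor rather than moving the old one in
    # place, so collect the results and hand them back to the caller.
    moved = []
    for x in listobj:
        try:
            moved.append(x.to('cpu'))
        except Exception:
            # x has no .to(); treat it as a container of tensors
            moved.append([y.to('cpu') for y in x])
    return moved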
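A side note on del-based cleanup in general: `del` removes a name, not the object behind it. As long as a list (or any other variable) still refers to the object, neither gc.collect() nor torch.cuda.empty_cache() can give the memory back. Below is a minimal sketch in plain Python, where `Blob` is a hypothetical stand-in for a large tensor and a weak reference is used to observe whether the object is still alive.

```python
import gc
import weakref


class Blob:
    """Hypothetical stand-in for a large tensor."""


items = [Blob()]
ref = weakref.ref(items[0])  # lets us observe the object without keeping it alive

# Deleting the loop variable only removes the local name `obj`;
# the list still holds a reference to the object.
for obj in items:
    del obj
gc.collect()
alive_after_loop_del = ref() is not None

# Dropping the reference held by the list is what actually releases it.
items.clear()
gc.collect()
freed_after_clear = ref() is None

print(alive_after_loop_del, freed_after_clear)
```

The same reasoning applies to tensors: whatever still holds a reference (a list of outputs, a variable in an outer scope) has to let go before the cache can be emptied.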

I tried all of these approaches.

None of them worked, so I searched around, and this is what I found.

with torch.no_grad():
    for idx, (_key, batch_sampler) in enumerate(sampler_dict.items()):
        num_img = 0
        out_codes = []
        output_images = []
        for imgs in batch_sampler:
            # inference code
            classifier(imgs.cuda())

Adding the single line with torch.no_grad() solved everything. Wow... what a relief..

Why is that?

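Inference and validation don't need gradients. Without torch.no_grad(), though, autograd still records the computation graph for every forward pass through parameters that require grad, and the intermediate results saved for backward stay in memory for as long as the output is referenced. torch.no_grad() is a context manager that turns this tracking off, which is why it is useful at inference time and reduces memory consumption.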
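To see the difference, here is a minimal sketch that runs on CPU. The small nn.Linear is just a hypothetical stand-in for the classifier, and the shapes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)  # hypothetical stand-in for the classifier
x = torch.randn(3, 4)

# Default mode: the output is attached to an autograd graph.
out_tracked = model(x)

# Under no_grad: same values, but no graph is recorded.
with torch.no_grad():
    out_untracked = model(x)

print(out_tracked.requires_grad, out_tracked.grad_fn is not None)   # True True
print(out_untracked.requires_grad, out_untracked.grad_fn is None)   # False True
```

If outputs like the first one get appended to a list across batches, every graph stays alive with them, which would look exactly like a leak.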

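Why didn't eval help?

eval switches the mode of certain modules (dropout, batch normalization, and so on). These modules behave differently during training and during inference, and eval is what adjusts that.

In other words, it does not turn off gradient tracking, so it has nothing to do with memory.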
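A quick check of this, again as a minimal CPU sketch with a hypothetical toy model: after model.eval() the output is still attached to an autograd graph, and only torch.no_grad() removes it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy model; Dropout is one of the modules eval() affects.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.randn(2, 4)

model.eval()
print(model.training)  # False: dropout now passes its input through unchanged

# eval() alone: gradients are still tracked.
out_eval = model(x)
print(out_eval.grad_fn is not None)  # True

# eval() plus no_grad(): the usual combination for inference.
with torch.no_grad():
    out_eval_no_grad = model(x)
print(out_eval_no_grad.grad_fn is None)  # True
```

So the two are not alternatives. eval() makes the outputs correct for inference, and no_grad() keeps the memory in check.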

#torch.no_grad() #model.eval()

github.com/pytorch/pytorch/issues/29893

Memory leak when evaluating model on CPU with dynamic size tensor input. · Issue #29893 · pytorch/pytorch

🐛 Bug To Reproduce Steps to reproduce the behavior: Make a simple network. Change a model to eval mode (with torch.no_grad()) Evaluate model with dynamic size input. CPU memory increases a lot. I a...


discuss.pytorch.org/t/memory-leaks-at-inference/85108/11

Memory leaks at inference

I found out the reason of the memory growing… It happens when inputs have different sizes. The following code is with detectron2 but previous model works in the same way. import detectron2 from detectron2.evaluation import COCOEvaluator, inference_on_dat

