Dataset preparation
AFHQ is an animal-face dataset (roughly 5,000 images each of dogs, cats, and wild animals).
github.com/clovaai/stargan-v2/blob/master/README.md#animal-faces-hq-dataset-afhq
Download the dataset from the page above.
To train stylegan2-ada, the dataset first has to be converted into tfrecords format.
github.com/NVlabs/stylegan2-ada
Clone the code from the repository above, then run the commands below.
# python dataset_tool.py create_from_images <output_path> <per-category_image_dir>
python dataset_tool.py create_from_images ~/datasets/afhqcat ~/downloads/afhq/train/cat
python dataset_tool.py create_from_images ~/datasets/afhqdog ~/downloads/afhq/train/dog
python dataset_tool.py create_from_images ~/datasets/afhqwild ~/downloads/afhq/train/wild
python dataset_tool.py display ~/datasets/afhqcat
Running these starts the conversion to tfrecords format.
Once it finishes, the tfrecords files have been created.
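As a quick sanity check (my own sketch, not something the repo documents), you can list what was written; the assumption here is that dataset_tool.py writes one .tfrecords shard per resolution level under the target directory.
import glob
import os

# Hedged sketch: list the tfrecords shards produced for the cat split.
# Assumption: dataset_tool.py writes one shard per resolution level (e.g. *-r02 ... *-r09).
for path in sorted(glob.glob(os.path.expanduser("~/datasets/afhqcat/*.tfrecords"))):
    size_mb = os.path.getsize(path) / 2**20
    print(f"{path}  {size_mb:.1f} MB")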
The training code is also argparse-based, so to run it from JupyterLab the steps below are added.
The settings matching the AFHQ experiments in the paper are as follows,
and the args inside the main function need to be modified slightly.
import easydict

# Hyperparameters matching the paper512 AFHQ setup; these are the values that
# the args inside main() are replaced with.
args = easydict.EasyDict({
    "outdir": './output',
    "data": '../data/afhqdog',
    "mirror": True,
    "cfg": "paper512",
    "aug": 'ada',
})
#----------------------------------------------------------------------------
import sys
sys.argv = ['']   # clear Jupyter's own command-line arguments so argparse does not choke on them
del sys
main()
#----------------------------------------------------------------------------
github.com/spyder-ide/spyder/issues/3883#issuecomment-269131039
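The issue linked above also describes a slightly different workaround: instead of EasyDict, overwrite sys.argv with CLI-style strings and let argparse do its normal job. A minimal sketch, assuming the flag names match the keys used above (--outdir, --data, --mirror, --cfg, --aug):
import sys

# Hedged alternative from the linked issue: pass the options as if they came from the CLI.
# Assumption: train.py's argparse flags match the EasyDict keys above.
sys.argv = [
    "train.py",
    "--outdir=./output",
    "--data=../data/afhqdog",
    "--mirror=1",
    "--cfg=paper512",
    "--aug=ada",
]
main()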
Handling the memory error
stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type
The expression np.zeros([1<<30, 0]) is what triggers the memory error.
If you run the command below in a terminal, it will most likely print 0:
$ cat /proc/sys/vm/overcommit_memory
0
Entering the following as root changes that value to 1:
$ echo 1 > /proc/sys/vm/overcommit_memory
After that, the np.zeros([1<<30, 0]) line runs without error.
This works because the command enables memory overcommit on the system:
This will enable "always overcommit" mode, and you'll find that indeed the system will allow you to make the allocation no matter how large it is (within 64-bit memory addressing at least).
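If you want to check this setting from Python before kicking off a long run, a small helper like the one below (my own addition) reads the same proc file:
# Hedged helper: read the current overcommit policy before starting training.
# 0 = heuristic (default), 1 = always overcommit, 2 = never overcommit.
with open("/proc/sys/vm/overcommit_memory") as f:
    mode = int(f.read().strip())

print("vm.overcommit_memory =", mode)
if mode != 1:
    print("Large allocations such as np.zeros([1<<30, 0]) may fail; "
          "set the value to 1 as root if you hit 'Unable to allocate array'.")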
When training is started, the following text is printed:
Training options:
{
"G_args": {
"func_name": "training.networks.G_main",
"fmap_base": 16384,
"fmap_max": 512,
"mapping_layers": 8,
"num_fp16_res": 4,
"conv_clamp": 256
},
"D_args": {
"func_name": "training.networks.D_main",
"mbstd_group_size": 8,
"fmap_base": 16384,
"fmap_max": 512,
"num_fp16_res": 4,
"conv_clamp": 256
},
"G_opt_args": {
"beta1": 0.0,
"beta2": 0.99,
"learning_rate": 0.0025
},
"D_opt_args": {
"beta1": 0.0,
"beta2": 0.99,
"learning_rate": 0.0025
},
"loss_args": {
"func_name": "training.loss.stylegan2",
"r1_gamma": 0.5
},
"augment_args": {
"class_name": "training.augment.AdaptiveAugment",
"tune_heuristic": "rt",
"tune_target": 0.6,
"apply_func": "training.augment.augment_pipeline",
"apply_args": {
"xflip": 1,
"rotate90": 1,
"xint": 1,
"scale": 1,
"rotate": 1,
"aniso": 1,
"xfrac": 1,
"brightness": 1,
"contrast": 1,
"lumaflip": 1,
"hue": 1,
"saturation": 1
}
},
"num_gpus": 1,
"image_snapshot_ticks": 50,
"network_snapshot_ticks": 50,
"train_dataset_args": {
"path": "../data/afhqdog",
"max_label_size": 0,
"resolution": 512,
"mirror_augment": true
},
"metric_arg_list": [
{
"name": "fid50k_full",
"class_name": "metrics.frechet_inception_distance.FID",
"max_reals": null,
"num_fakes": 50000,
"minibatch_per_gpu": 8,
"force_dataset_args": {
"shuffle": false,
"max_images": null,
"repeat": false,
"mirror_augment": false
}
}
],
"metric_dataset_args": {
"path": "../data/afhqdog",
"max_label_size": 0,
"resolution": 512,
"mirror_augment": true
},
"total_kimg": 25000,
"minibatch_size": 64,
"minibatch_gpu": 8,
"G_smoothing_kimg": 20,
"G_smoothing_rampup": null,
"run_dir": "./output/00002-afhqdog-mirror-paper512"
}
Output directory: ./output/00002-afhqdog-mirror-paper512
Training data: ../data/afhqdog
Training length: 25000 kimg
Resolution: 512
Number of GPUs: 1
Creating output directory...
Loading training set...
Image shape: [3, 512, 512]
Label shape: [0]
Constructing networks...
G Params OutputShape WeightShape
--- --- --- ---
latents_in - (?, 512) -
labels_in - (?, 0) -
G_mapping/Normalize - (?, 512) -
G_mapping/Dense0 262656 (?, 512) (512, 512)
G_mapping/Dense1 262656 (?, 512) (512, 512)
G_mapping/Dense2 262656 (?, 512) (512, 512)
G_mapping/Dense3 262656 (?, 512) (512, 512)
G_mapping/Dense4 262656 (?, 512) (512, 512)
G_mapping/Dense5 262656 (?, 512) (512, 512)
G_mapping/Dense6 262656 (?, 512) (512, 512)
G_mapping/Dense7 262656 (?, 512) (512, 512)
G_mapping/Broadcast - (?, 16, 512) -
dlatent_avg - (512,) -
Truncation/Lerp - (?, 16, 512) -
G_synthesis/4x4/Const 8192 (?, 512, 4, 4) (1, 512, 4, 4)
G_synthesis/4x4/Conv 2622465 (?, 512, 4, 4) (3, 3, 512, 512)
G_synthesis/4x4/ToRGB 264195 (?, 3, 4, 4) (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Conv1 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Upsample - (?, 3, 8, 8) -
G_synthesis/8x8/ToRGB 264195 (?, 3, 8, 8) (1, 1, 512, 3)
G_synthesis/16x16/Conv0_up 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Conv1 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Upsample - (?, 3, 16, 16) -
G_synthesis/16x16/ToRGB 264195 (?, 3, 16, 16) (1, 1, 512, 3)
G_synthesis/32x32/Conv0_up 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Conv1 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Upsample - (?, 3, 32, 32) -
G_synthesis/32x32/ToRGB 264195 (?, 3, 32, 32) (1, 1, 512, 3)
G_synthesis/64x64/Conv0_up 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Conv1 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Upsample - (?, 3, 64, 64) -
G_synthesis/64x64/ToRGB 264195 (?, 3, 64, 64) (1, 1, 512, 3)
G_synthesis/128x128/Conv0_up 1442561 (?, 256, 128, 128) (3, 3, 512, 256)
G_synthesis/128x128/Conv1 721409 (?, 256, 128, 128) (3, 3, 256, 256)
G_synthesis/128x128/Upsample - (?, 3, 128, 128) -
G_synthesis/128x128/ToRGB 132099 (?, 3, 128, 128) (1, 1, 256, 3)
G_synthesis/256x256/Conv0_up 426369 (?, 128, 256, 256) (3, 3, 256, 128)
G_synthesis/256x256/Conv1 213249 (?, 128, 256, 256) (3, 3, 128, 128)
G_synthesis/256x256/Upsample - (?, 3, 256, 256) -
G_synthesis/256x256/ToRGB 66051 (?, 3, 256, 256) (1, 1, 128, 3)
G_synthesis/512x512/Conv0_up 139457 (?, 64, 512, 512) (3, 3, 128, 64)
G_synthesis/512x512/Conv1 69761 (?, 64, 512, 512) (3, 3, 64, 64)
G_synthesis/512x512/Upsample - (?, 3, 512, 512) -
G_synthesis/512x512/ToRGB 33027 (?, 3, 512, 512) (1, 1, 64, 3)
--- --- --- ---
Total 30276583
D Params OutputShape WeightShape
--- --- --- ---
images_in - (?, 3, 512, 512) -
labels_in - (?, 0) -
512x512/FromRGB 256 (?, 64, 512, 512) (1, 1, 3, 64)
512x512/Conv0 36928 (?, 64, 512, 512) (3, 3, 64, 64)
512x512/Conv1_down 73856 (?, 128, 256, 256) (3, 3, 64, 128)
512x512/Skip 8192 (?, 128, 256, 256) (1, 1, 64, 128)
256x256/Conv0 147584 (?, 128, 256, 256) (3, 3, 128, 128)
256x256/Conv1_down 295168 (?, 256, 128, 128) (3, 3, 128, 256)
256x256/Skip 32768 (?, 256, 128, 128) (1, 1, 128, 256)
128x128/Conv0 590080 (?, 256, 128, 128) (3, 3, 256, 256)
128x128/Conv1_down 1180160 (?, 512, 64, 64) (3, 3, 256, 512)
128x128/Skip 131072 (?, 512, 64, 64) (1, 1, 256, 512)
64x64/Conv0 2359808 (?, 512, 64, 64) (3, 3, 512, 512)
64x64/Conv1_down 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
64x64/Skip 262144 (?, 512, 32, 32) (1, 1, 512, 512)
32x32/Conv0 2359808 (?, 512, 32, 32) (3, 3, 512, 512)
32x32/Conv1_down 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
32x32/Skip 262144 (?, 512, 16, 16) (1, 1, 512, 512)
16x16/Conv0 2359808 (?, 512, 16, 16) (3, 3, 512, 512)
16x16/Conv1_down 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
16x16/Skip 262144 (?, 512, 8, 8) (1, 1, 512, 512)
8x8/Conv0 2359808 (?, 512, 8, 8) (3, 3, 512, 512)
8x8/Conv1_down 2359808 (?, 512, 4, 4) (3, 3, 512, 512)
8x8/Skip 262144 (?, 512, 4, 4) (1, 1, 512, 512)
4x4/MinibatchStddev - (?, 513, 4, 4) -
4x4/Conv 2364416 (?, 512, 4, 4) (3, 3, 513, 512)
4x4/Dense0 4194816 (?, 512) (8192, 512)
Output 513 (?, 1) (512, 1)
--- --- --- ---
Total 28982849
Exporting sample images...
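The exported sample grids land in the run directory; to peek at them from the notebook, something like the snippet below works. The file names are an assumption based on typical StyleGAN2-ADA runs, so adjust them to whatever actually appears in the folder.
from PIL import Image

# Hedged sketch: StyleGAN2-ADA runs typically write a 'reals.png' grid (and periodic
# 'fakes*.png' grids) into the run directory; verify the exact names on disk first.
run_dir = "./output/00002-afhqdog-mirror-paper512"
Image.open(f"{run_dir}/reals.png").show()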
In the directory specified as outdir there is a training_options.json file.
Opening it shows the training hyperparameters, nicely formatted, just like the console output above.
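To inspect those options programmatically instead of scrolling through the console, the file can simply be loaded with the json module (path taken from the run_dir printed above):
import json

# Load the options that train.py saved for this run (run_dir from the log above).
with open("./output/00002-afhqdog-mirror-paper512/training_options.json") as f:
    opts = json.load(f)

# Pretty-print the same hyperparameters that were dumped to the console.
print(json.dumps(opts, indent=2))
print("Total training length:", opts["total_kimg"], "kimg")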
Pretrained model files
At the link below, pretrained models are provided as .pkl files; starting from one of these should make training faster.
nvlabs-fi-cdn.nvidia.com/stylegan2-ada/pretrained/
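To start from one of these instead of training from scratch, the downloaded pickle can be wired into the same EasyDict setup used above. This is a hedged sketch: it assumes train.py exposes a resume option that accepts a local .pkl path, so check python train.py --help for the exact name before relying on it.
import easydict

# Hedged sketch: same paper512 AFHQ settings as before, but resuming from a downloaded
# pretrained pickle. "resume" is assumed to be train.py's transfer-learning option;
# verify the flag name against `python train.py --help`.
args = easydict.EasyDict({
    "outdir": './output',
    "data": '../data/afhqdog',
    "mirror": True,
    "cfg": "paper512",
    "aug": 'ada',
    "resume": './pretrained/afhqdog.pkl',  # e.g. afhqdog.pkl downloaded from the NVIDIA CDN above
})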