[MMSeg] Getting Used to It...
I came across the GitHub repo for a paper that uses mmseg, and it looked like the authors barely write any code themselves, driving almost everything through config files. That seemed neat enough that I wanted to try it, so here we are...
Installation
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
conda install pytorch torchvision -c pytorch
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"
git clone -b main https://github.com/open-mmlab/mmsegmentation.git  # clone with the main branch as HEAD
cd mmsegmentation
pip install -v -e .
# '-v': verbose; print progress information to stdout during the install
# '-e': editable install, so code changes take effect without reinstalling
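Once that finishes, a quick sanity check from Python (inside the openmmlab env) confirms everything is importable:

# Minimal sanity check that the install worked.
import torch
import mmseg

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # whether the CUDA build can see a GPU
print(mmseg.__version__)          # installed MMSegmentation version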
Digging into the Config Files...
MMSeg runs entirely off config files. The dataset, model, scheduler, and the rest are all declared in a config file, which is then used to launch everything.
configs > _base_ holds three folders, datasets, models, and schedules, plus default_runtime.py.
- datasets: pipeline and dataloader settings for each dataset
- models: model definitions
- schedules: optimizer and schedule settings
- default_runtime.py: defaults for the runtime environment
A config built only from _base_ components is called a primitive, and within any one folder it is recommended to have exactly one primitive config file.
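Because of this inheritance, it is often easiest to inspect a config after all its _base_ files have been merged. A minimal sketch using mmengine's Config API (the path assumes one of the stock PSPNet configs shipped with the repo):

from mmengine.config import Config

# The files listed in _base_ are merged recursively, so the resulting
# object exposes the model/dataset/schedule fields directly.
cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py')
print(cfg.model.decode_head.type)       # e.g. 'PSPHead'
print(cfg.train_dataloader.batch_size)  # inherited from _base_/datasets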
Say, for example, you want to modify the Deeplabv3 architecture and train it:
1. Inherit one of the models under _base_ > models > deeplabv3:
_base_ = '../deeplabv3/deeplabv3_r50-d8_4xb2-40k_cityscapes-512x1024.py'
2. Write a new config that overlays just the parts you want to change (see the sketch after this list).
3. If the model you need is not under _base_ > models at all, make a new folder and put your config files in there.
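As a sketch of step 2, this is the pattern the repo's own ResNet-101 variants use: the child config re-declares only the keys that change, and everything else comes from _base_ (the file name here is hypothetical):

# configs/deeplabv3/my_deeplabv3_r101.py (hypothetical name)
_base_ = './deeplabv3_r50-d8_4xb2-40k_cityscapes-512x1024.py'

model = dict(
    pretrained='open-mmlab://resnet101_v1c',  # ResNet-101 pretrained weights
    backbone=dict(depth=101))                 # only the changed key is written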
Config Name Style
{algorithm name}_{model component names [component1]_[component2]_[...]}_{training settings}_{training dataset information}_{testing dataset information}
- {algorithm name}: the algorithm, e.g. deeplabv3, pspnet, ...
- {model component names}: names of the components used, such as the backbone and head. For example, r50-d8 means a ResNet-50 backbone with the output downsampled by 8.
- {training settings}: training information such as batch size, augmentations, loss, learning rate scheduler, and epochs/iterations. For example, 4xb4-ce-linearlr-40k means 4 GPUs, CrossEntropy loss, a linear learning-rate scheduler, and 40k training iterations.
- {gpu x batch_per_gpu}: GPUs and samples per GPU; bN means a batch size of N per GPU, so 8xb2 is 8 GPUs with 2 images per GPU.
- {schedule}: the number of training iterations.
- {training dataset information}: the name of the training dataset.
For example, a config name is built up like
configs > pspnet > pspnet_r50-d8_4xb2-80k_cityscapes-512x1024-dark_zurich-1920x1080.py
and its contents look like this:
_base_ = [
    '../_base_/models/pspnet_r50-d8.py', '../_base_/datasets/cityscapes.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py'
]
crop_size = (512, 1024)
data_preprocessor = dict(size=crop_size)
model = dict(data_preprocessor=data_preprocessor)
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(1920, 1080), keep_ratio=True),
    # add loading annotation after ``Resize`` because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]
test_dataloader = dict(
    dataset=dict(
        type='DarkZurichDataset',
        data_root='data/dark_zurich/',
        data_prefix=dict(
            img_path='rgb_anon/val/night/GOPR0356',
            seg_map_path='gt/val/night/GOPR0356'),
        pipeline=test_pipeline))
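With a config like this in hand, training can be launched with tools/train.py, or directly from Python via mmengine's Runner. A minimal sketch of the Python route (work_dir is an arbitrary output directory; depending on the mmseg version the explicit register_all_modules() call may not be needed):

from mmengine.config import Config
from mmengine.runner import Runner
from mmseg.utils import register_all_modules

# Register MMSeg's models/datasets/transforms into the mmengine registries.
register_all_modules()

cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb2-80k_cityscapes-512x1024-dark_zurich-1920x1080.py')
cfg.work_dir = './work_dirs/pspnet_dark_zurich'  # where logs and checkpoints are written
runner = Runner.from_cfg(cfg)
runner.train()  # runner.test() runs test_dataloader + test_evaluator instead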
_base_/models/pspnet_r50-d8.py is the basic model config: PSPNet with a ResNet50V1c backbone.
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)  # Segmentation usually uses SyncBN
data_preprocessor = dict(  # The config of data preprocessor, usually includes image normalization and augmentation.
    type='SegDataPreProcessor',  # The type of data preprocessor.
    mean=[123.675, 116.28, 103.53],  # Mean values used for normalizing the input images.
    std=[58.395, 57.12, 57.375],  # Standard deviation used for normalizing the input images.
    bgr_to_rgb=True,  # Whether to convert image from BGR to RGB.
    pad_val=0,  # Padding value of image.
    seg_pad_val=255)  # Padding value of segmentation map.
model = dict(
    type='EncoderDecoder',  # Name of the segmentor.
    data_preprocessor=data_preprocessor,
    pretrained='open-mmlab://resnet50_v1c',  # Load the pretrained backbone weights.
    backbone=dict(
        type='ResNetV1c',  # Type of backbone.
        depth=50,  # Depth of backbone. Normally 50, 101 are used.
        num_stages=4,  # Number of stages of backbone.
        out_indices=(0, 1, 2, 3),  # The index of output feature maps produced in each stage.
        dilations=(1, 1, 2, 4),  # The dilation rate of each layer.
        strides=(1, 2, 1, 1),  # The stride of each layer.
        norm_cfg=norm_cfg,  # The configuration of norm layer.
        norm_eval=False,  # Whether to freeze the statistics in BN.
        style='pytorch',  # The style of backbone, 'pytorch' means that stride-2 layers are in 3x3 convs, 'caffe' means stride-2 layers are in 1x1 convs.
        contract_dilation=True),  # When dilation > 1, whether to contract the first layer of dilation.
    decode_head=dict(
        type='PSPHead',  # Type of decode head.
        in_channels=2048,  # Input channels of decode head.
        in_index=3,  # The index of the feature map to select.
        channels=512,  # The intermediate channels of decode head.
        pool_scales=(1, 2, 3, 6),  # The avg pooling scales of PSPHead. Please refer to the paper for details.
        dropout_ratio=0.1,  # The dropout ratio before the final classification layer.
        num_classes=19,  # Number of segmentation classes. Usually 19 for Cityscapes, 21 for VOC, 150 for ADE20K.
        norm_cfg=norm_cfg,  # The configuration of norm layer.
        align_corners=False,  # The align_corners argument for resize in decoding.
        loss_decode=dict(  # Config of loss function for the decode_head.
            type='CrossEntropyLoss',  # Type of loss used for segmentation.
            use_sigmoid=False,  # Whether to use sigmoid activation for segmentation.
            loss_weight=1.0)),  # Loss weight of decode_head.
    auxiliary_head=dict(
        type='FCNHead',  # Type of auxiliary head. Please refer to mmseg/models/decode_heads for available options.
        in_channels=1024,  # Input channels of auxiliary head.
        in_index=2,  # The index of the feature map to select.
        channels=256,  # The intermediate channels of auxiliary head.
        num_convs=1,  # Number of convs in FCNHead. It is usually 1 in auxiliary head.
        concat_input=False,  # Whether to concat output of convs with input before classification layer.
        dropout_ratio=0.1,  # The dropout ratio before the final classification layer.
        num_classes=19,  # Number of segmentation classes. Usually 19 for Cityscapes, 21 for VOC, 150 for ADE20K.
        norm_cfg=norm_cfg,  # The configuration of norm layer.
        align_corners=False,  # The align_corners argument for resize in decoding.
        loss_decode=dict(  # Config of loss function for the auxiliary_head.
            type='CrossEntropyLoss',  # Type of loss used for segmentation.
            use_sigmoid=False,  # Whether to use sigmoid activation for segmentation.
            loss_weight=0.4)),  # Loss weight of auxiliary_head.
    # model training and testing settings
    train_cfg=dict(),  # train_cfg is just a placeholder for now.
    test_cfg=dict(mode='whole'))  # The test mode, options are 'whole' and 'slide'. 'whole': whole-image fully-convolutional test. 'slide': sliding crop window over the image.
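For reference, the 'slide' mode mentioned in that last comment also needs a window size and stride; enabling it in a child config would look roughly like this (the values are illustrative, not from the repo):

# Override only test_cfg: 'slide' runs crop_size windows every `stride`
# pixels across the image and stitches the logits back together.
model = dict(
    test_cfg=dict(mode='slide', crop_size=(512, 1024), stride=(341, 341)))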
_base_/datasets/cityscapes.py looks like this:
# dataset settings
dataset_type = 'CityscapesDataset'  # Dataset type, this will be used to define the dataset.
data_root = 'data/cityscapes/'  # Root path of data.
crop_size = (512, 1024)  # The crop size during training.
train_pipeline = [  # Training pipeline.
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path.
    dict(type='LoadAnnotations'),  # Second pipeline to load annotations for current image.
    dict(type='RandomResize',  # Augmentation pipeline that resizes the images and their annotations.
        scale=(2048, 1024),  # The scale of image.
        ratio_range=(0.5, 2.0),  # The augmented scale range as ratio.
        keep_ratio=True),  # Whether to keep the aspect ratio when resizing the image.
    dict(type='RandomCrop',  # Augmentation pipeline that randomly crops a patch from the current image.
        crop_size=crop_size,  # The crop size of the patch.
        cat_max_ratio=0.75),  # The max area ratio that can be occupied by a single category.
    dict(type='RandomFlip',  # Augmentation pipeline that flips the images and their annotations.
        prob=0.5),  # The probability of flipping.
    dict(type='PhotoMetricDistortion'),  # Augmentation pipeline that distorts the current image with several photometric methods.
    dict(type='PackSegInputs')  # Pack the inputs data for semantic segmentation.
]
test_pipeline = [
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path.
    dict(type='Resize',  # Use resize augmentation.
        scale=(2048, 1024),  # Image scale for resizing.
        keep_ratio=True),  # Whether to keep the aspect ratio when resizing the image.
    # add loading annotation after ``Resize`` because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations'),  # Load annotations for semantic segmentation provided by dataset.
    dict(type='PackSegInputs')  # Pack the inputs data for semantic segmentation.
]
train_dataloader = dict(  # Train dataloader config
    batch_size=2,  # Batch size of a single GPU.
    num_workers=2,  # Workers to pre-fetch data for each single GPU.
    persistent_workers=True,  # Keep the worker processes alive between epochs, which can accelerate training speed.
    sampler=dict(type='InfiniteSampler', shuffle=True),  # Randomly shuffle during training.
    dataset=dict(  # Train dataset config
        type=dataset_type,  # Type of dataset, refer to mmseg/datasets/ for details.
        data_root=data_root,  # The root of the dataset.
        data_prefix=dict(
            img_path='leftImg8bit/train', seg_map_path='gtFine/train'),  # Prefix for training data.
        pipeline=train_pipeline))  # Processing pipeline. This is passed by the train_pipeline created before.
val_dataloader = dict(
    batch_size=1,  # Batch size of a single GPU.
    num_workers=4,  # Workers to pre-fetch data for each single GPU.
    persistent_workers=True,  # Keep the worker processes alive between epochs, which can accelerate evaluation speed.
    sampler=dict(type='DefaultSampler', shuffle=False),  # No shuffling during validation and testing.
    dataset=dict(  # Validation dataset config
        type=dataset_type,  # Type of dataset, refer to mmseg/datasets/ for details.
        data_root=data_root,  # The root of the dataset.
        data_prefix=dict(
            img_path='leftImg8bit/val', seg_map_path='gtFine/val'),  # Prefix for validation data.
        pipeline=test_pipeline))  # Processing pipeline. This is passed by the test_pipeline created before.
test_dataloader = val_dataloader
# The metric to measure the accuracy. Here, we use IoUMetric.
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator
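These dataloader and evaluator blocks are routinely overridden in child configs as well. A small sketch (keys are merged over the _base_ values; the extra metrics are ones IoUMetric supports):

# Larger training batch and extra evaluation metrics in a child config.
train_dataloader = dict(batch_size=4, num_workers=4)
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mDice', 'mFscore'])
test_evaluator = val_evaluator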
And finally, _base_/schedules/schedule_80k.py:
# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optim_wrapper = dict(type='OptimWrapper', optimizer=optimizer, clip_grad=None)
# learning policy
param_scheduler = [
    dict(
        type='PolyLR',
        eta_min=1e-4,
        power=0.9,
        begin=0,
        end=80000,
        by_epoch=False)
]
# training schedule for 80k
train_cfg = dict(type='IterBasedTrainLoop', max_iters=80000, val_interval=8000)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=8000),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='SegVisualizationHook'))
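One last trick worth knowing: when a child config should replace an inherited block rather than merge into it, mmengine's _delete_=True key drops the _base_ keys first. A hedged sketch swapping SGD for AdamW (hyperparameters are illustrative):

# Replace the whole optim_wrapper instead of merging with the _base_ SGD settings.
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=6e-5, betas=(0.9, 0.999), weight_decay=0.01),
    clip_grad=None)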