
[MMSeg] Getting Used to It...

윤_제로 2023. 8. 23. 18:12

I came across the GitHub repo for a paper that uses mmseg, and the authors barely seem to write any code themselves; almost everything is done with config files. That looked fascinating, so I decided to give it a try myself...

Installation

conda create --name openmmlab python=3.8 -y
conda activate openmmlab

conda install pytorch torchvision -c pytorch

pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"

git clone -b main https://github.com/open-mmlab/mmsegmentation.git  # clone the repo with the main branch checked out
cd mmsegmentation
pip install -v -e .
# '-v': verbose; print progress information to standard output
# '-e': install in editable mode, so local code changes take effect without reinstalling
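
To confirm the installation worked, a quick sanity check of my own (not part of the official steps) is to import the packages and print their versions:

# quick sanity check: these imports should succeed after installation
import mmcv
import mmengine
import mmseg

print(mmcv.__version__)      # should be >= 2.0.0
print(mmengine.__version__)
print(mmseg.__version__)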

Getting to Know the Config Files...

MMSeg is driven by config files.

You describe the dataset, model, scheduler, and other settings in a config file and run with it.

configs > _base_ contains three folders, datasets, models, and schedules, plus default_runtime.py.

  • datasets: pipeline and dataloader settings for each dataset
  • models: model architecture settings
  • schedules: optimizer and schedule settings
  • default_runtime.py: defaults for the runtime environment

A config composed only of components from _base_ is called a primitive, and among the config files in any one folder it is recommended to have only a single primitive config.

For example, suppose you want to modify the DeepLabV3 architecture and train it.

1. Inherit one of the existing configs under configs > deeplabv3:

_base_ = '../deeplabv3/deeplabv3_r50-d8_4xb2-40k_cityscapes-512x1024.py'

2. Write the new config so that it overrides just the parts you want to change (see the sketch after this list).

3. If you are building a new model that does not exist under _base_ > models, create a new folder and put your config files in it.
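
For step 2, a minimal override config might look like the sketch below; the file name and the num_classes values are hypothetical, and any field you do not mention is inherited from the base config unchanged:

# hypothetical file: configs/deeplabv3/deeplabv3_r50-d8_4xb2-40k_mydata-512x1024.py
_base_ = '../deeplabv3/deeplabv3_r50-d8_4xb2-40k_cityscapes-512x1024.py'

# override only what changes; everything else comes from _base_
model = dict(
    decode_head=dict(num_classes=2),     # hypothetical: 2-class segmentation
    auxiliary_head=dict(num_classes=2))  # keep the aux head consistent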

 

Config Name Style

{algorithm name}_{model component names [component1]_[component2]_[...]}_{training settings}_{training dataset information}_{testing dataset information}
  • {algorithm name}: the algorithm name, e.g. deeplabv3, pspnet, ...
  • {model component names}: names of the components used, such as the backbone and head. For example, r50-d8 means a ResNet-50 backbone whose output is downsampled by a factor of 8.
  • {training settings}: training settings such as batch size, augmentations, loss, learning rate scheduler, and epochs/iterations. For example, 4xb4-ce-linearlr-40k means 4 GPUs x 4 images per GPU, CrossEntropy loss, a linear learning rate scheduler, and training for 40k iterations.
    • {gpu x batch_per_gpu}: GPUs and samples per GPU. bN means a batch size of N per GPU; 8xb2 means 8 GPUs with 2 images per GPU.
    • {schedule}: the number of training iterations, e.g. 40k or 80k.
  • {training dataset information}: the training dataset name with the input resolution, e.g. cityscapes-512x1024.
  • {testing dataset information}: the testing dataset name, used when a model is trained on one dataset and evaluated on another, e.g. dark_zurich-1920x1080.

For example, the config

config > pspnet > pspnet_r50-d8_4xb2-80k_cityscapes-512x1024-dark_zurich-1920x1080.py

is a config name generated in this format.

_base_ = [
    '../_base_/models/pspnet_r50-d8.py', '../_base_/datasets/cityscapes.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py'
]
crop_size = (512, 1024)
data_preprocessor = dict(size=crop_size)
model = dict(data_preprocessor=data_preprocessor)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(1920, 1080), keep_ratio=True),
    # load the annotations after ``Resize`` because the ground truth
    # does not need the resize transform
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]
test_dataloader = dict(
    dataset=dict(
        type='DarkZurichDataset',
        data_root='data/dark_zurich/',
        data_prefix=dict(
            img_path='rgb_anon/val/night/GOPR0356',
            seg_map_path='gt/val/night/GOPR0356'),
        pipeline=test_pipeline))
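
As a side note, you can see how the _base_ files get merged by loading the config with mmengine; a minimal sketch, assuming it is run from the mmsegmentation root:

from mmengine.config import Config

# _base_ inheritance is resolved automatically when the file is loaded
cfg = Config.fromfile(
    'configs/pspnet/pspnet_r50-d8_4xb2-80k_cityscapes-512x1024-dark_zurich-1920x1080.py')
print(cfg.model.decode_head.type)        # 'PSPHead', from _base_/models/pspnet_r50-d8.py
print(cfg.test_dataloader.dataset.type)  # 'DarkZurichDataset', overridden in this file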

 

_base_/models/pspnet_r50-d8.py is the basic model config using PSPNet with a ResNet50V1c backbone.

# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)  # Segmentation usually uses SyncBN


data_preprocessor = dict(  # The config of data preprocessor, usually includes image normalization and augmentation.
    type='SegDataPreProcessor',  # The type of data preprocessor.
    mean=[123.675, 116.28, 103.53],  # Mean values used for normalizing the input images.
    std=[58.395, 57.12, 57.375],  # Standard deviation used for normalizing the input images.
    bgr_to_rgb=True,  # Whether to convert image from BGR to RGB.
    pad_val=0,  # Padding value of image.
    seg_pad_val=255)  # Padding value of segmentation map.


model = dict(
    type='EncoderDecoder',  # Name of the segmentor.
    data_preprocessor=data_preprocessor,
    pretrained='open-mmlab://resnet50_v1c',  # The pretrained backbone weights to load.
    
    backbone=dict(
        type='ResNetV1c',  # The type of backbone.
        depth=50,  # Depth of backbone. Normally 50, 101 are used.
        num_stages=4,  # Number of stages of backbone.
        out_indices=(0, 1, 2, 3),  # The indices of the output feature maps produced at each stage.
        dilations=(1, 1, 2, 4),  # The dilation rate of each layer.
        strides=(1, 2, 1, 1),  # The stride of each layer.
        norm_cfg=norm_cfg,  # The configuration of norm layer.
        norm_eval=False,  # Whether to freeze the statistics in BN
        style='pytorch',  # The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 conv, 'caffe' means stride 2 layers are in 1x1 convs.
        contract_dilation=True),  # When dilation > 1, whether contract first layer of dilation.
    
    decode_head=dict(
        type='PSPHead',  # Type of decode head.
        in_channels=2048,  # Input channel of decode head.
        in_index=3,  # The index of feature map to select.
        channels=512,  # The intermediate channels of decode head.
        pool_scales=(1, 2, 3, 6),  # The avg pooling scales of PSPHead. Please refer to paper for details.
        dropout_ratio=0.1,  # The dropout ratio before final classification layer.
        num_classes=19,  # Number of segmentation class. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k.
        norm_cfg=norm_cfg,  # The configuration of norm layer.
        align_corners=False,  # The align_corners argument for resize in decoding.
        loss_decode=dict(  # Config of loss function for the decode_head.
            type='CrossEntropyLoss',  # Type of loss used for segmentation.
            use_sigmoid=False,  # Whether use sigmoid activation for segmentation.
            loss_weight=1.0)),  # Loss weight of decode_head.
    
    auxiliary_head=dict(
        type='FCNHead',  # Type of auxiliary head. Please refer to mmseg/models/decode_heads for available options.
        in_channels=1024,  # Input channel of auxiliary head.
        in_index=2,  # The index of feature map to select.
        channels=256,  # The intermediate channels of decode head.
        num_convs=1,  # Number of convs in FCNHead. It is usually 1 in auxiliary head.
        concat_input=False,  # Whether concat output of convs with input before classification layer.
        dropout_ratio=0.1,  # The dropout ratio before final classification layer.
        num_classes=19,  # Number of segmentation class. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k.
        norm_cfg=norm_cfg,  # The configuration of norm layer.
        align_corners=False,  # The align_corners argument for resize in decoding.
        loss_decode=dict(  # Config of loss function for the auxiliary_head.
            type='CrossEntropyLoss',  # Type of loss used for segmentation.
            use_sigmoid=False,  # Whether use sigmoid activation for segmentation.
            loss_weight=0.4)),  # Loss weight of auxiliary_head.
    # model training and testing settings
    train_cfg=dict(),  # train_cfg is just a placeholder for now.
    test_cfg=dict(mode='whole'))  # The test mode, options are 'whole' and 'slide'. 'whole': whole image fully-convolutional test. 'slide': sliding crop window on the image.
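
Since model is just a nested dict, the actual module is built from it through the MODELS registry. A rough sketch of what happens under the hood:

from mmengine.config import Config
from mmseg.registry import MODELS

cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb2-80k_cityscapes-512x1024.py')
model = MODELS.build(cfg.model)  # looks up type='EncoderDecoder' in the registry
print(type(model).__name__)      # EncoderDecoder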

_base_/datasets/cityscapes.py is as follows.

# dataset settings
dataset_type = 'CityscapesDataset'  # Dataset type, this will be used to define the dataset.
data_root = 'data/cityscapes/'  # Root path of data.
crop_size = (512, 1024)  # The crop size during training.


train_pipeline = [  # Training pipeline.
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path.
    dict(type='LoadAnnotations'),  # Second pipeline to load annotations for current image.
    dict(type='RandomResize',  # Augmentation pipeline that resize the images and their annotations.
        scale=(2048, 1024),  # The scale of image.
        ratio_range=(0.5, 2.0),  # The augmented scale range as ratio.
        keep_ratio=True),  # Whether to keep the aspect ratio when resizing the image.
    dict(type='RandomCrop',  # Augmentation pipeline that randomly crop a patch from current image.
        crop_size=crop_size,  # The crop size of patch.
        cat_max_ratio=0.75),  # The max area ratio that could be occupied by single category.
    dict(type='RandomFlip',  # Augmentation pipeline that flip the images and their annotations
        prob=0.5),  # The ratio or probability to flip
    dict(type='PhotoMetricDistortion'),  # Augmentation pipeline that distort current image with several photo metric methods.
    dict(type='PackSegInputs')  # Pack the inputs data for the semantic segmentation.
]


test_pipeline = [
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
    dict(type='Resize',  # Use resize augmentation
        scale=(2048, 1024),  # Images scales for resizing.
        keep_ratio=True),  # Whether to keep the aspect ratio when resizing the image.
    # load the annotations after ``Resize`` because the ground truth
    # does not need the resize transform
    dict(type='LoadAnnotations'),  # Load annotations for semantic segmentation provided by dataset.
    dict(type='PackSegInputs')  # Pack the inputs data for the semantic segmentation.
]


train_dataloader = dict(  # Train dataloader config
    batch_size=2,  # Batch size of a single GPU
    num_workers=2,  # Worker to pre-fetch data for each single GPU
    persistent_workers=True,  # Keep the worker processes alive between epochs, which can speed up training.
    sampler=dict(type='InfiniteSampler', shuffle=True),  # Randomly shuffle during training.
    dataset=dict(  # Train dataset config
        type=dataset_type,  # Type of dataset, refer to mmseg/datasets/ for details.
        data_root=data_root,  # The root of dataset.
        data_prefix=dict(
            img_path='leftImg8bit/train', seg_map_path='gtFine/train'),  # Prefix for training data.
        pipeline=train_pipeline)) # Processing pipeline. This is passed by the train_pipeline created before.


val_dataloader = dict(
    batch_size=1,  # Batch size of a single GPU
    num_workers=4,  # Worker to pre-fetch data for each single GPU
    persistent_workers=True,  # Keep the worker processes alive between epochs, which can speed up evaluation.
    sampler=dict(type='DefaultSampler', shuffle=False),  # Not shuffle during validation and testing.
    dataset=dict(  # Test dataset config
        type=dataset_type,  # Type of dataset, refer to mmseg/datasets/ for details.
        data_root=data_root,  # The root of dataset.
        data_prefix=dict(
            img_path='leftImg8bit/val', seg_map_path='gtFine/val'),  # Prefix for testing data.
        pipeline=test_pipeline))  # Processing pipeline. This is passed by the test_pipeline created before.


test_dataloader = val_dataloader
# The metric to measure the accuracy. Here, we use IoUMetric.
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator
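
The dataset dict inside each dataloader works the same way: the type string is looked up in the DATASETS registry. A sketch, assuming Cityscapes has already been prepared under data/cityscapes/:

from mmengine.config import Config
from mmseg.registry import DATASETS

cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb2-80k_cityscapes-512x1024.py')
dataset = DATASETS.build(cfg.train_dataloader.dataset)  # type='CityscapesDataset'
print(len(dataset))  # number of training images (2975 for Cityscapes)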

_base_/schedules/schedule_80k.py

# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)
optim_wrapper = dict(type='OptimWrapper', optimizer=optimizer, clip_grad=None)
# learning policy
param_scheduler = [
    dict(
        type='PolyLR',
        eta_min=1e-4,
        power=0.9,
        begin=0,
        end=80000,
        by_epoch=False)
]
# training schedule for 80k
train_cfg = dict(type='IterBasedTrainLoop', max_iters=80000, val_interval=8000)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=8000),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='SegVisualizationHook'))
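
All of these pieces are tied together by MMEngine's Runner; tools/train.py boils down to roughly the following sketch (Runner needs a work_dir, which tools/train.py normally sets for you):

from mmengine.config import Config
from mmengine.runner import Runner

cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_4xb2-80k_cityscapes-512x1024.py')
cfg.work_dir = './work_dirs/pspnet_demo'  # hypothetical output directory

runner = Runner.from_cfg(cfg)  # builds the model, dataloaders, hooks, and loops from the config
runner.train()                 # runs IterBasedTrainLoop for 80k iterations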