StyleGAN2-ADA pytorch版本

资料仓库  收藏
0 / 845

StyleGAN2-ADA - Official PyTorch implementation

NVIDIA Research Projects

** Last update: Aug 3, 2021

stylegan2-ada-pytorch

Overview

StyleGAN2-ADA — Official PyTorch implementation

Teaser image

Training Generative Adversarial Networks with Limited Data
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, Timo Aila
https://arxiv.org/abs/2006.06676

Abstract: Training generative adversarial networks (GAN) using too little data typically leads to discriminator overfitting, causing training to diverge. We propose an adaptive discriminator augmentation mechanism that significantly stabilizes training in limited data regimes. The approach does not require changes to loss functions or network architectures, and is applicable both when training from scratch and when fine-tuning an existing GAN on another dataset. We demonstrate, on several datasets, that good results are now possible using only a few thousand training images, often matching StyleGAN2 results with an order of magnitude fewer images. We expect this to open up new application domains for GANs. We also find that the widely used CIFAR-10 is, in fact, a limited data benchmark, and improve the record FID from 5.59 to 2.42.

Release notes

This repository is a faithful reimplementation of StyleGAN2-ADA in PyTorch, focusing on correctness, performance, and compatibility.

Correctness

  • Full support for all primary training configurations.
  • Extensive verification of image quality, training curves, and quality metrics against the TensorFlow version.
  • Results are expected to match in all cases, excluding the effects of pseudo-random numbers and floating-point arithmetic.

Performance

  • Training is typically 5%–30% faster compared to the TensorFlow version on NVIDIA Tesla V100 GPUs.
  • Inference is up to 35% faster in high resolutions, but it may be slightly slower in low resolutions.
  • GPU memory usage is comparable to the TensorFlow version.
  • Faster startup time when training new networks (<50s), and also when using pre-trained networks (<4s).
  • New command line options for tweaking the training performance.

Compatibility

  • Compatible with old network pickles created using the TensorFlow version.

  • New ZIP/PNG based dataset format for maximal interoperability with existing 3rd party tools.

  • TFRecords datasets are no longer supported — they need to be converted to the new format.

  • New JSON-based format for logs, metrics, and training curves.

  • Training curves are also exported in the old TFEvents format if TensorBoard is installed.

  • Command line syntax is mostly unchanged, with a few exceptions (e.g., dataset_tool.py).

  • Comparison methods are not supported (--cmethod, --dcap, --cfg=cifarbaseline, --aug=adarv)

  • Truncation is now disabled by default.

Data repository

Path Description
stylegan2-ada-pytorch Main directory hosted on Amazon S3
  ├ ada-paper.pdf Paper PDF
  ├ images Curated example images produced using the pre-trained models
  ├ videos Curated example interpolation videos
  └ pretrained Pre-trained models
    ├ ffhq.pkl FFHQ at 1024x1024, trained using original StyleGAN2
    ├ metfaces.pkl MetFaces at 1024x1024, transfer learning from FFHQ using ADA
    ├ afhqcat.pkl AFHQ Cat at 512x512, trained from scratch using ADA
    ├ afhqdog.pkl AFHQ Dog at 512x512, trained from scratch using ADA
    ├ afhqwild.pkl AFHQ Wild at 512x512, trained from scratch using ADA
    ├ cifar10.pkl Class-conditional CIFAR-10 at 32x32
    ├ brecahad.pkl BreCaHAD at 512x512, trained from scratch using ADA
    ├ paper-fig7c-training-set-sweeps Models used in Fig.7c (sweep over training set size)
    ├ paper-fig11a-small-datasets Models used in Fig.11a (small datasets & transfer learning)
    ├ paper-fig11b-cifar10 Models used in Fig.11b (CIFAR-10)
    ├ transfer-learning-source-nets Models used as starting point for transfer learning
    └ metrics Feature detectors used by the quality metrics

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1–8 high-end NVIDIA GPUs with at least 12 GB of memory. We have done all testing and development using NVIDIA DGX-1 with 8 Tesla V100 GPUs.
  • 64-bit Python 3.7, PyTorch 1.7.1, and CUDA toolkit 11.0 or newer. Use CUDA toolkit 11.1 or later with RTX 3090. See https://pytorch.org/ for PyTorch install instructions.
  • Python libraries: pip install click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3. We use the Anaconda3 2020.11 distribution which installs most of these by default.
  • Docker users: use the provided Dockerfile to build an image with the required library dependencies.

The code relies heavily on custom PyTorch extensions that are compiled on the fly using NVCC. On Windows, the compilation requires Microsoft Visual Studio. We recommend installing Visual Studio Community Edition and adding it into PATH using "C:\Program Files (x86)\Microsoft Visual Studio\<version>\Community\VC\Auxiliary\Build\vcvars64.bat"</version>.

Getting started

Pre-trained networks are stored as *.pkl files that can be referenced using local filenames or URLs:

# Generate curated MetFaces images without truncation (Fig.10 left)
python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

# Generate uncurated MetFaces images with truncation (Fig.12 upper left)
python generate.py --outdir=out --trunc=0.7 --seeds=600-605 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

# Generate class conditional CIFAR-10 images (Fig.17 left, Car)
python generate.py --outdir=out --seeds=0-35 --class=1 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/cifar10.pkl

# Style mixing example
python style_mixing.py --outdir=out --rows=85,100,75,458,1500 --cols=55,821,1789,293 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

Outputs from the above commands are placed under out/*.png, controlled by --outdir. Downloaded network pickles are cached under $HOME/.cache/dnnlib, which can be overridden by setting the DNNLIB_CACHE_DIR environment variable. The default PyTorch extension build directory is $HOME/.cache/torch_extensions, which can be overridden by setting TORCH_EXTENSIONS_DIR.

Docker: You can run the above curated image example using Docker as follows:

docker build --tag sg2ada:latest .
./docker_run.sh python3 generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

Note: The Docker image requires NVIDIA driver release r455.23 or later.

Legacy networks: The above commands can load most of the network pickles created using the previous TensorFlow versions of StyleGAN2 and StyleGAN2-ADA. However, for future compatibility, we recommend converting such legacy pickles into the new format used by the PyTorch version:

python legacy.py \
    --source=https://nvlabs-fi-cdn.nvidia.com/stylegan2/networks/stylegan2-cat-config-f.pkl \
    --dest=stylegan2-cat-config-f.pkl

Projecting images to latent space

To find the matching latent vector for a given image file, run:

python projector.py --outdir=out --target=~/mytargetimg.png \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

For optimal results, the target image should be cropped and aligned similar to the FFHQ dataset. The above command saves the projection target out/target.png, result out/proj.png, latent vector out/projected_w.npz, and progression video out/proj.mp4. You can render the resulting latent vector by specifying --projected_w for generate.py:

python generate.py --outdir=out --projected_w=out/projected_w.npz \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

Using networks from Python

You can use pre-trained networks in your own Python code as follows:

with open('ffhq.pkl', 'rb') as f:
    G = pickle.load(f)['G_ema'].cuda()  # torch.nn.Module
z = torch.randn([1, G.z_dim]).cuda()    # latent codes
c = None                                # class labels (not used in this example)
img = G(z, c)                           # NCHW, float32, dynamic range [-1, +1]

The above code requires torch_utils and dnnlib to be accessible via PYTHONPATH. It does not need source code for the networks themselves — their class definitions are loaded from the pickle via torch_utils.persistence.

The pickle contains three networks. 'G' and 'D' are instantaneous snapshots taken during training, and 'G_ema' represents a moving average of the generator weights over several training steps. The networks are regular instances of torch.nn.Module, with all of their parameters and buffers placed on the CPU at import and gradient computation disabled by default.

The generator consists of two submodules, G.mapping and G.synthesis, that can be executed separately. They also support various additional options:

w = G.mapping(z, c, truncation_psi=0.5, truncation_cutoff=8)
img = G.synthesis(w, noise_mode='const', force_fp32=True)

Please refer to generate.py, style_mixing.py, and projector.py for further examples.

Preparing datasets

Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.

Custom datasets can be created from a folder containing images; see python dataset_tool.py --help for more information. Alternatively, the folder can also be used directly as a dataset, without running it through dataset_tool.py first, but doing so may lead to suboptimal performance.

Legacy TFRecords datasets are not supported — see below for instructions on how to convert them.

FFHQ:

Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.

Step 2: Extract images from TFRecords using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked

Step 3: Create ZIP archive using dataset_tool.py from this repository:

# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip

# Scaled down 256x256 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
    --width=256 --height=256

MetFaces: Download the MetFaces dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/metfaces/images --dest=~/datasets/metfaces.zip

AFHQ: Download the AFHQ dataset and create ZIP archive:

python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
python dataset_tool.py --source=~/downloads/afhq/train/dog --dest=~/datasets/afhqdog.zip
python dataset_tool.py --source=~/downloads/afhq/train/wild --dest=~/datasets/afhqwild.zip

CIFAR-10: Download the CIFAR-10 python version and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/cifar-10-python.tar.gz --dest=~/datasets/cifar10.zip

LSUN: Download the desired categories from the LSUN project page and convert to ZIP archive:

python dataset_tool.py --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/lsuncat200k.zip \
    --transform=center-crop --width=256 --height=256 --max_images=200000

python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000

BreCaHAD:

Step 1: Download the BreCaHAD dataset.

Step 2: Extract 512x512 resolution crops using dataset_tool.py from the TensorFlow version of StyleGAN2-ADA:

# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python dataset_tool.py extract_brecahad_crops --cropsize=512 \
    --output_dir=/tmp/brecahad-crops --brecahad_dir=~/downloads/brecahad/images

Step 3: Create ZIP archive using dataset_tool.py from this repository:

python dataset_tool.py --source=/tmp/brecahad-crops --dest=~/datasets/brecahad.zip

Training new networks

In its most basic form, training new networks boils down to:

python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 --dry-run
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1

The first command is optional; it validates the arguments, prints out the training configuration, and exits. The second command kicks off the actual training.

In this example, the results are saved to a newly created directory ~/training-runs/<id>-mydataset-auto1</id>, controlled by --outdir. The training exports network pickles (network-snapshot-<int>.pkl</int>) and example images (fakes<int>.png</int>) at regular intervals (controlled by --snap). For each pickle, it also evaluates FID (controlled by --metrics) and logs the resulting scores in metric-fid50k_full.jsonl (as well as TFEvents if TensorBoard is installed).