
Scaling Painting Style Transfer

Neural style transfer is a deep learning technique that produces an unprecedentedly rich style transfer from a style image to a content image and is particularly impressive when it comes to transferring style from a painting to an image. It was originally achieved by solving an optimization problem to match the global style statistics of the style image while preserving the local geometric features of the content image. The two main drawbacks of this original approach are that it is computationally expensive and that the resolution of the output images is limited by high GPU memory requirements. Many solutions have been proposed to both accelerate neural style transfer and increase its resolution, but they all compromise the quality of the produced images. Indeed, transferring the style of a painting is a complex task involving features at different scales, from the color palette and compositional style to the fine brushstrokes and texture of the canvas. This paper provides a solution to solve the original global optimization for ultra-high resolution (UHR) images, enabling multiscale style transfer at unprecedented image sizes. This is achieved by spatially localizing the computation of each forward and backward pass through the VGG network. Extensive qualitative and quantitative comparisons show that our method produces style transfer of unmatched quality for such high-resolution painting styles.



1 Introduction

Figure 2: UHR style transfer. Top row: content image (top left, 6048×8064), style image (bottom left, 6048×7914), result (right, 6048×8064); the three UHR images are downscaled by a factor 4 for visualization. Bottom row: three zoomed-in details of the result image at true resolution. Observe how very fine details such as the chairs look as if painted.

Style transfer is an image editing strategy transferring an image style to a content image. Given a style image and a content image, the goal is to extract the style characteristics of the former and merge them with the geometric features of the latter. While this problem has a long history in computer vision and computer graphics (e.g. [Hertzmann_etal_image_analogies_SIGGRAPH2001, Aubry_etal_fast_local_laplacian_filters_TOG2014]), it has seen a remarkable development since the seminal works of Gatys et al. [Gatys_et_al_texture_synthesis_using_CNN_2015, Gatys_et_al_image_style_transfer_cnn_cvpr2016]. These works demonstrate that the Gram matrices of the activations of a pre-trained VGG19 network [Simonyan_Zisserman_VGG_ICLR15] faithfully encode the perceptual style and textures of an input image. Style transfer is performed by optimizing a functional aiming at a compromise between fidelity to the VGG19 features of the content image and reproduction of the Gram matrix statistics of the style image. Other global statistics have been proven effective for style transfer and texture synthesis [Lu_Zhu_Wu_Deepframe_AAAI2016, Sendik_deep_correlations_texture_synthesis_SIGGRAPH2017, Luan_etal_deep_photo_style_transfer_cvpr2017, Vacher_etal_texture_interpolation_probing_visual_perception_NEURIPS2020, Risser_etal_stable_and_controllable_neural_texture_synthesis_and_style_transfer_Arxiv2017, Heitz_slices_Wassestein_loss_neural_texture_synthesis_CVPR2021, DeBortoli_et_al_maximum_entropy_methods_texture_synthesis_SIMODS2021, gonthier2022high], and it has been shown that a coarse-to-fine multiscale approach allows one to reproduce different levels of style detail for images of moderate to high resolution (HR) [Gatys_etal_Controlling_perceptual_factors_in_neural_style_transfer_CVPR2017, snelgrove2017high, gonthier2022high]. The two major drawbacks of such optimization-based style transfer are the computation time and the limited image resolution due to large GPU memory requirements.

Regarding computation time, several methods have been proposed to generate new stylized images by training feed-forward networks [ulyanov2016texturenets, johnson2016Perceptual, li2016precomputed] or by training VGG encoder-decoder networks [chen2016fast, Huang_arbitrary_style_transfer_real_time_ICCV2017, li2017universal, li2019learning, chiu2020iterative]. These models tend to provide images with relatively low style transfer loss and can therefore be considered as approximate solutions to [Gatys_et_al_image_style_transfer_cnn_cvpr2016]. Despite remarkable progress regarding computation time, these methods suffer from GPU memory limitations due to the large size of the models used for content and style characterization and are therefore limited in terms of resolution.

This resolution limitation was recently tackled [an2020real, Wang_2020_CVPR, Chen_Wang_Xie_Lu_Luo_towards_ultra_resolution_neural_style_transfer_thumbnail_instance_normalization_AAAI2022]. Nevertheless, although these methods generate ultra-high resolution (UHR) images (larger than 4K), their approximate results fail to reproduce the style at its proper resolution. Indeed, for some methods to satisfy the GPU's memory limitations, the transfer is performed locally on small patches of the content image with a zoomed-out style image [Chen_Wang_Xie_Lu_Luo_towards_ultra_resolution_neural_style_transfer_thumbnail_instance_normalization_AAAI2022]. In other methods, the multiscale nature of the networks is not fully exploited [Wang_2020_CVPR].

As illustrated in Figure 1, our high-resolution multiscale method manages to transfer the different levels of detail contained in the style image, from the colour palette and compositional style to the fine brushstrokes and canvas texture. The resulting UHR images look like authentic paintings, as can be seen in the UHR example of Figure 2.

Comparative experiments show that the results of competing methods suffer from brushstroke styles that do not match those of the UHR style image, and that very fine textures are not well transferred and are subject to local artifacts. To strengthen this visual comparison, we also introduce a qualitative and quantitative identity test that highlights how well a given texture is emulated.

The main contributions of this work are summarized as follows:

  • We introduce a two-step algorithm to compute the style transfer loss gradient for UHR images that do not fit in GPU memory using localized neural feature calculation.

  • We show that this algorithm allows a multi-resolution UHR transfer for images of unprecedented size.

  • We experimentally show that the visual quality of this UHR style transfer is richer and more faithful than recent fast but approximate solutions.

This work provides a new reference method for high-quality style transfer with unequaled multi-resolution depth. It might serve as a reference to evaluate fast but approximate models.

2 Related work

Style transfer by optimization.

As recalled in the introduction, the seminal work of Gatys et al. formulated style transfer as an optimization minimizing the distances between Gram matrices of VGG features. Other global statistics have been proven effective for style transfer and texture synthesis such as deep correlations [Sendik_deep_correlations_texture_synthesis_SIGGRAPH2017, gonthier2022high], Bures metric [Vacher_etal_texture_interpolation_probing_visual_perception_NEURIPS2020], spatial mean of features [Lu_Zhu_Wu_Deepframe_AAAI2016, DeBortoli_et_al_maximum_entropy_methods_texture_synthesis_SIMODS2021], feature histograms [Risser_etal_stable_and_controllable_neural_texture_synthesis_and_style_transfer_Arxiv2017], or even the full feature distributions [Heitz_slices_Wassestein_loss_neural_texture_synthesis_CVPR2021]. Specific cost function corrections have also been proposed for photorealistic style transfer [Luan_etal_deep_photo_style_transfer_cvpr2017]. When dealing with HR images, a coarse-to-fine multiscale strategy has been proven efficient to capture the different levels of details present in style images [Gatys_etal_Controlling_perceptual_factors_in_neural_style_transfer_CVPR2017, snelgrove2017high, gonthier2022high].

Style transfer by training feed-forward networks.

Ulyanov et al. [ulyanov2016texturenets, Ulyanov_etal_improved_texture_networks_CVPR2017] and Johnson et al. [johnson2016Perceptual] showed that one can train a feed-forward network to approximately solve style transfer. Although these models produce a very fast style transfer, they require learning a new model for each type of style.

Universal style transfer (UST).

This per-style limitation has been addressed by training a VGG autoencoder that attempts to reverse VGG feature computations after normalizing them at the autoencoder bottleneck. Chen et al. [chen2016fast] introduce the encoder-decoder framework with a style-swap layer replacing content features with the closest style features on overlapping patches. Huang et al. [Huang_arbitrary_style_transfer_real_time_ICCV2017] propose Adaptive Instance Normalization (AdaIN), which adjusts the mean and variance of the content image features to match those of the style image. Li et al. [li2017universal] match the covariance matrices of the content image features to those of the style image by applying whitening and coloring transforms. These operations are performed layer by layer and involve specific reconstruction decoders at each step. Sheng et al. [sheng2018avatar] use one encoder-decoder block combining the transformations of [li2017universal] and [chen2016fast]. Park et al. [park2019arbitrary] introduce an attention-based transformation module to integrate the local style patterns according to the spatial distribution of the content image. Li et al. [li2019learning] train a symmetric encoder-decoder image reconstruction module and a transformation learning module. Chiu et al. [chiu2020iterative] extend [li2017universal] by embedding a new transformation that iteratively updates features in a cascade of four autoencoder modules. Despite the numerous improvements of fast UST strategies, let us remark that: (a) they rely on matching VGG statistics as introduced by Gatys et al. [Gatys_et_al_image_style_transfer_cnn_cvpr2016]; (b) they are limited in resolution due to the GPU memory required by the large models.
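For intuition, the AdaIN operation of [Huang_arbitrary_style_transfer_real_time_ICCV2017] amounts to a per-channel renormalization of the content features. The following NumPy sketch illustrates the statistic matching on toy arrays (the function name and the channels-last layout are our own choices for illustration; the actual method operates on VGG encoder features inside an encoder-decoder):

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    # Adaptive Instance Normalization: renormalize the content features so
    # that their per-channel mean and std match those of the style features.
    mu_c = content_feat.mean(axis=(0, 1), keepdims=True)
    sd_c = content_feat.std(axis=(0, 1), keepdims=True)
    mu_s = style_feat.mean(axis=(0, 1), keepdims=True)
    sd_s = style_feat.std(axis=(0, 1), keepdims=True)
    return (content_feat - mu_c) / (sd_c + eps) * sd_s + mu_s

# Toy check: the output channel means match the style channel means.
rng = np.random.default_rng(0)
c = rng.standard_normal((6, 6, 3)) * 2.0 + 1.0
s = rng.standard_normal((6, 6, 3)) * 0.5 - 3.0
out = adain(c, s)
assert np.allclose(out.mean(axis=(0, 1)), s.mean(axis=(0, 1)))
```

The spatial arrangement of the content features is untouched; only first- and second-order channel statistics are transferred, which is why AdaIN alone cannot capture fine multiscale texture.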

UST for high-resolution images.

Some methods attempt to reduce the size of the network in order to perform high-resolution style transfer. An et al. [an2020real] propose ArtNet, a channel-wise pruned version of GoogLeNet [Szegedy_2015_CVPR]. Wang et al. [Wang_2020_CVPR] propose a collaborative distillation approach to compress the model by transferring the knowledge of a large network (VGG19) to a smaller one, hence reducing the number of convolutional filters involved in [li2017universal] and [Huang_arbitrary_style_transfer_real_time_ICCV2017]. Chen et al. [Chen_Wang_Xie_Lu_Luo_towards_ultra_resolution_neural_style_transfer_thumbnail_instance_normalization_AAAI2022] recently proposed a UHR style transfer framework where the content image is divided into patches and a patch-wise style transfer is performed from a zoomed-out version of the style image at a fixed small size.

3 Global optimization for neural style transfer

Single scale style transfer.

Let us recall the algorithm of Gatys et al. [Gatys_et_al_image_style_transfer_cnn_cvpr2016]. It solely relies on optimizing some VGG19 second-order statistics to change the image style while maintaining some VGG19 features to preserve the content image's geometric features. Style is encoded through Gram matrices of several VGG19 layers, namely a set of style layers $\mathcal{S}$, while the content is encoded with a single feature layer $l_c$.

Given a content image $u_c$ and a style image $u_s$, one optimizes the loss function

$$\mathcal{L}(u) = \lambda_c\, \mathcal{L}_{\mathrm{content}}(u, u_c) + \lambda_s\, \mathcal{L}_{\mathrm{style}}(u, u_s) \qquad (1)$$

where $\mathcal{L}_{\mathrm{content}}(u, u_c) = \| V^{l_c}(u) - V^{l_c}(u_c) \|_F^2$, with $V^{l}(\cdot)$ the VGG19 feature response at layer $l$, and

$$\mathcal{L}_{\mathrm{style}}(u, u_s) = \sum_{l \in \mathcal{S}} w_l\, \mathcal{L}^{l}_{\mathrm{Gram}}(u, u_s) \qquad (2)$$

with

$$\mathcal{L}^{l}_{\mathrm{Gram}}(u, u_s) = \big\| G^{l}(u) - G^{l}(u_s) \big\|_F^2 \qquad (3)$$

where $\|\cdot\|_F$ is the Frobenius norm and, for an image $v$ and a layer index $l$, $G^{l}(v)$ denotes the Gram matrix of the VGG19 features at layer $l$: if $V^{l}(v)$ is the feature response of $v$ at layer $l$, with spatial size $h_l \times w_l$ and $m_l$ channels, one first reshapes it as a matrix $F$ of size $n_l \times m_l$ with $n_l = h_l w_l$ the number of feature pixels; the associated Gram matrix is

$$G^{l}(v) = \frac{1}{n_l} \sum_{k=1}^{n_l} F_k F_k^{T} \qquad (4)$$

where $F_k \in \mathbb{R}^{m_l}$ is the column vector corresponding to the $k$-th line of $F$. The loss $\mathcal{L}(u)$ is a fourth-degree polynomial, non-convex with respect to (wrt) the VGG features. Gatys et al. [Gatys_et_al_texture_synthesis_using_CNN_2015] propose to use the L-BFGS algorithm [Nocedal_updating_Quasi-Newton_matrices_with_limited_storage_1980] to minimize this loss, after initializing $u$ with the content image $u_c$. L-BFGS is an iterative quasi-Newton procedure that approximates the inverse of the Hessian using a fixed-size history of the gradient vectors computed during the last iterations. The history size is typically 100 but is decreased to 10 for HR images (for all scales except the first one) to limit memory requirements.

Gram loss correction.

It is known that optimizing for the Gram matrix alone may introduce some loss-of-contrast artefacts, since Gram matrices encompass information regarding both the mean values and the correlations of features [Sendik_deep_correlations_texture_synthesis_SIGGRAPH2017, Risser_etal_stable_and_controllable_neural_texture_synthesis_and_style_transfer_Arxiv2017, Heitz_slices_Wassestein_loss_neural_texture_synthesis_CVPR2021]. Instead of considering the full histogram of the features [Risser_etal_stable_and_controllable_neural_texture_synthesis_and_style_transfer_Arxiv2017, Heitz_slices_Wassestein_loss_neural_texture_synthesis_CVPR2021], we found that correcting for the mean and standard deviation (std) of each feature gives visually satisfying results. Given some (reshaped) features $F$ of size $n_l \times m_l$, define the mean $\mu^{l}(v) \in \mathbb{R}^{m_l}$ and std $\sigma^{l}(v) \in \mathbb{R}^{m_l}$ by

$$\mu^{l}(v) = \frac{1}{n_l} \sum_{k=1}^{n_l} F_k \qquad (5)$$

and

$$\sigma^{l}(v)^2 = \frac{1}{n_l} \sum_{k=1}^{n_l} \big(F_k - \mu^{l}(v)\big)^2 \quad \text{(componentwise)}. \qquad (6)$$

In the whole paper, we replace the Gram loss of Eq. (3) by the following augmented style loss

$$\mathcal{L}^{l}_{\mathrm{style}}(u, u_s) = w_G \big\|G^{l}(u) - G^{l}(u_s)\big\|_F^2 + w_\mu \big\|\mu^{l}(u) - \mu^{l}(u_s)\big\|^2 + w_\sigma \big\|\sigma^{l}(u) - \sigma^{l}(u_s)\big\|^2 \qquad (7)$$

for a better reproduction of the feature distribution. The values of all the weights ($\lambda_c$, $\lambda_s$, $w_l$, $w_G$, $w_\mu$, $w_\sigma$) have been fixed once for all images. Note that limiting our style loss to second-order statistics is crucial for a straightforward implementation of our localized algorithm described in Section 4.
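The augmented style loss of Eq. (7) only involves the Gram matrix and the per-channel mean and std of reshaped features, as the following illustrative NumPy sketch shows (weights and shapes are placeholders, not the fixed values used in the paper):

```python
import numpy as np

def style_stats(F):
    # Global statistics of reshaped (n, m) features: Gram matrix (Eq. 4),
    # per-channel mean (Eq. 5) and per-channel std (Eq. 6).
    n = F.shape[0]
    return F.T @ F / n, F.mean(axis=0), F.std(axis=0)

def augmented_style_loss(Fu, Fs, w_gram=1.0, w_mean=1.0, w_std=1.0):
    # Augmented style loss of Eq. (7); the weights here are placeholders.
    Gu, mu_u, sd_u = style_stats(Fu)
    Gs, mu_s, sd_s = style_stats(Fs)
    return (w_gram * np.sum((Gu - Gs) ** 2)
            + w_mean * np.sum((mu_u - mu_s) ** 2)
            + w_std * np.sum((sd_u - sd_s) ** 2))

# The loss vanishes when both images share the same features.
F = np.random.default_rng(1).standard_normal((64, 3))
assert np.isclose(augmented_style_loss(F, F), 0.0)
```

All three terms are functions of spatial averages only, which is what makes the block-by-block evaluation of Section 4 possible.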

Multiscale style transfer.

Since the style transfer solely relies on VGG19, the transfer is spatially limited by the receptive field of the network [Gatys_etal_Controlling_perceptual_factors_in_neural_style_transfer_CVPR2017]. For images having a side larger than 500 px, visually richer results are obtained by adopting a multiscale approach [Gatys_etal_Controlling_perceptual_factors_in_neural_style_transfer_CVPR2017] corresponding to the standard coarse-to-fine approach for texture synthesis [Wei_Levoy_fast_texture_synthesis_2000]. For the sake of simplicity, suppose that the content image $u_c$ and the style image $u_s$ have the same resolution (otherwise one can rescale one image to match the resolution of the other as a preprocessing [Gatys_et_al_image_style_transfer_cnn_cvpr2016]). When using $K$ scales, both $u_c$ and $u_s$ are first downscaled by a factor $2^{K-1}$ to obtain the low-resolution couple $(u_c^{K}, u_s^{K})$, and style transfer is first applied at this coarse resolution starting with $u_c^{K}$. Then, for each subsequent scale $k = K-1$ down to $1$, the result image is upscaled by a factor 2 to define the initialization image, and style transfer is applied with the content and style images downscaled by a factor $2^{k-1}$. At the last scale the output image has the same resolution as the HR content image. Thanks to this coarse-to-fine approach, the style is transferred in a coarse-to-fine way. This is especially important when using an HR digital photograph of a painting for the style: ideally, the first scale encompasses color and large strokes while subsequent scales refine the stroke details up to the painting texture, bristle brushes and canvas texture.
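The coarse-to-fine scheme can be sketched as follows, with a placeholder `transfer` function standing in for one single-scale style transfer optimization (the pooling-based rescaling is a simplification of the actual image resampling):

```python
import numpy as np

def downscale2(img):
    # Downscale by a factor 2 via 2x2 average pooling (crop to even size).
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def upscale2(img):
    # Upscale by a factor 2 (nearest neighbour).
    return img.repeat(2, axis=0).repeat(2, axis=1)

def multiscale_transfer(content, style, n_scales, transfer):
    # Build the coarse-to-fine pyramid: scale k uses images downscaled 2**k.
    pyramid = [(content, style)]
    for _ in range(n_scales - 1):
        c, s = pyramid[-1]
        pyramid.append((downscale2(c), downscale2(s)))
    # Coarsest scale is initialized with the downscaled content image.
    c, s = pyramid[-1]
    out = transfer(c, c, s)
    # Each finer scale is initialized with the upscaled previous result.
    for c, s in reversed(pyramid[:-1]):
        init = upscale2(out)[:c.shape[0], :c.shape[1]]
        out = transfer(init, c, s)
    return out

# With an identity "transfer" (returns the content), the pyramid is lossless.
content = np.arange(64.0).reshape(8, 8)
style = np.ones((8, 8))
res = multiscale_transfer(content, style, 3, lambda init, c, s: c)
assert res.shape == content.shape and np.allclose(res, content)
```

Each scale only refines the initialization produced by the previous one, which is what keeps successive results consistent across scales.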

Unfortunately, applying this multiscale algorithm off-the-shelf with UHR images is not possible in practice for images with sides larger than 4000 px, even with a high-end GPU. The main limitation comes from the fact that differentiating the loss wrt $u$ requires fitting into memory $u$ and all its intermediate VGG19 features. While this requires a moderate 2.61 GB for a moderately sized image, the memory footprint grows with the number of pixels and quickly exceeds the capacity of a 40 GB GPU at UHR sizes. In the next section we describe a practical solution to overcome this limitation.

4 Localized neural features and style transfer loss gradient

Figure 3: Algorithm overview: Our localized algorithm (right part) makes it possible to compute the global style transfer loss and its gradient wrt the image $u$ for images that are too large for the original algorithm of Gatys et al. [Gatys_et_al_image_style_transfer_cnn_cvpr2016] (left part).

Our main contribution is to emulate the computation of the loss gradient $\nabla_u \mathcal{L}(u)$ even for UHR images for which evaluation and automatic differentiation of the loss is not feasible due to large memory requirements.

First suppose one wants to compute the feature maps $V^{l}(u)$, $l \in \mathcal{S} \cup \{l_c\}$, of a UHR image $u$. The natural idea developed here is to compute the feature maps piece by piece, by partitioning the input image into small images that we will call blocks. This approach works up to boundary issues: to compute exactly the feature maps of $u$ at a given pixel, one needs the complete receptive field centered at that pixel. Hence, each block of the partition must be extracted with a margin area, except on the sides that are actual borders of the image $u$. In all our experiments we use a fixed margin width in the image domain.

This localized way to compute features allows one to compute global feature statistics such as Gram matrices and mean and std vectors. Indeed, these statistics are all spatial averages that can be aggregated block by block by sequentially adding the contribution of each block. Hence, this easy-to-implement procedure allows one to compute the value of the loss (1). Note that it is not possible to automatically differentiate this loss, because the computation graph linking the features back to the input image $u$ is lost.
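As a sanity check, the block-by-block aggregation of a Gram matrix can be verified numerically: summing the unnormalized contributions of spatial blocks recovers the global statistic exactly. (Margins are omitted in this sketch since we aggregate given features; in the full method each block is passed through VGG19 with a margin first.)

```python
import numpy as np

def gram(F):
    return F.T @ F / F.shape[0]

def blockwise_gram(feat, block=4):
    # Aggregate the Gram matrix block by block over the spatial domain by
    # summing the unnormalized contribution of each block.
    h, w, m = feat.shape
    G = np.zeros((m, m))
    for i in range(0, h, block):
        for j in range(0, w, block):
            Fb = feat[i:i + block, j:j + block].reshape(-1, m)
            G += Fb.T @ Fb
    return G / (h * w)

# The blockwise aggregation recovers the global Gram matrix exactly.
rng = np.random.default_rng(2)
feat = rng.standard_normal((8, 12, 5))
G_full = gram(feat.reshape(-1, 5))
assert np.allclose(G_full, blockwise_gram(feat))
```

The same aggregation applies verbatim to the mean and std vectors, since they are spatial averages as well.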

Global statistic | Feature loss | Gradient wrt the feature at pixel $k$
Raw features $F$ | MSE: $\|F - F_t\|_F^2$ | $2\,(F_k - F_{t,k})$
Gram matrix $G = \frac{1}{n}\sum_{k} F_k F_k^T$ | Gram loss: $\|G - G_t\|_F^2$ | $\frac{4}{n}\,(G - G_t)\,F_k$
Feature mean $\mu = \frac{1}{n}\sum_{k} F_k$ | Mean loss: $\|\mu - \mu_t\|^2$ | $\frac{2}{n}\,(\mu - \mu_t)$
Feature std $\sigma$ (Eq. 6) | Std loss: $\|\sigma - \sigma_t\|^2$ | $\frac{2}{n}\,(\sigma - \sigma_t) \odot (F_k - \mu) \oslash \sigma$
Table 1: Expression of the feature loss gradient wrt a generic reshaped feature $F$ having $n$ pixels and $m$ channels (hence of size $n \times m$); the subscript $t$ denotes the target statistic (from the content or style image), and $\odot$, $\oslash$ denote componentwise product and division.

However, a close inspection of the different style losses wrt the neural features shows that they all have the same form: for each style layer $l$, the gradient of the layer style loss wrt the layer feature at some pixel location $k$ only depends on the local value $F_k$ and on some difference between the global statistics (Gram matrix, spatial mean, std) of $u$ and the corresponding ones from the style layer of $u_s$. This fact is summarized in the formulas of Table 1. Exploiting this locality of the gradient, it is also possible to exactly compute the gradient vector $\nabla_u \mathcal{L}(u)$ block by block using a two-pass procedure: the first pass computes the global VGG19 statistics of each style layer and the second pass locally backpropagates the gradient wrt the local neural features. The whole procedure is described by Algorithm 1 and illustrated by Figure 3.

Input: current image $u$, content features $V^{l_c}(u_c)$, and the list of feature statistics of $u_s$ (computed block by block)
Output: the loss $\mathcal{L}(u)$ and its gradient $\nabla_u \mathcal{L}(u)$
Step 1: Compute the global style statistics of $u$ block by block:
for each block $B$ in the partition of $u$ do
     Extract the block with margin and compute its VGG19 features without computation graph
     For each style layer $l$: Extract the features of the block by properly removing the margin and add their contribution to $G^l(u)$ and $\mu^l(u)$.
end for
For each style layer $l$: compute $\sigma^l(u)$ as a function of $G^l(u)$ and $\mu^l(u)$.
Step 2: Compute the transfer loss and its gradient wrt $u$ block by block:
Initialize the loss and its gradient: $\mathcal{L} \leftarrow 0$; $\nabla_u \mathcal{L} \leftarrow 0$
for each block $B$ in the partition of $u$ do
     Extract the block with margin and compute its VGG19 features with computation graph
     For each style layer $l$: Compute the gradient of the style loss wrt the local features, using the global statistics of $u$ from Step 1 and the style statistics of $u_s$ as reference (Table 1)
     For the content layer $l_c$: Add the contribution of the block to the loss and compute the gradient of the content loss wrt the local features (first row of Table 1)
     Use automatic differentiation to backpropagate all the feature gradients to the level of the input block image $B$.
     Populate the corresponding block of $\nabla_u \mathcal{L}$ with the inner part of the gradient obtained by backpropagation.
end for
Algorithm 1: Localized computation of the style transfer loss and its gradient wrt $u$

Note that the memory requirement of Algorithm 1 does not depend on the image size. Indeed, by spatially splitting all the computations involving VGG19 features, dealing with larger images only requires more computation time (since there are more blocks). However, the L-BFGS optimization requires more memory for larger images, since it stores a history of gradients, each having the size of $u$. Using a single 40 GB GPU, our algorithm allows for style transfer for images of unprecedented size.
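The two-pass principle of Algorithm 1 can be illustrated on the Gram loss alone: pass 1 computes the global Gram matrix (aggregable block by block, as shown in Section 4), and pass 2 evaluates the local gradient of Table 1 on each block. This NumPy sketch checks the blockwise analytic gradient against a finite difference of the full loss (shapes and block size are arbitrary toy values):

```python
import numpy as np

def gram(F):
    return F.T @ F / F.shape[0]

def gram_loss_grad_blockwise(F, G_style, block=16):
    # Two-pass localized gradient of the Gram loss ||G(F) - G_style||_F^2.
    n = F.shape[0]
    G = gram(F)                            # pass 1: global statistic
    grad = np.empty_like(F)
    for i in range(0, n, block):           # pass 2: local gradient per block
        Fb = F[i:i + block]
        grad[i:i + block] = (4.0 / n) * Fb @ (G - G_style)
    return grad

# Check the blockwise analytic gradient against a finite difference.
rng = np.random.default_rng(3)
F = rng.standard_normal((32, 4))
G_style = gram(rng.standard_normal((32, 4)))
g = gram_loss_grad_blockwise(F, G_style)
eps = 1e-6
F2 = F.copy()
F2[0, 0] += eps
num = (np.sum((gram(F2) - G_style) ** 2)
       - np.sum((gram(F) - G_style) ** 2)) / eps
assert abs(num - g[0, 0]) < 1e-4
```

In the full method the local gradient is expressed at the feature level and then backpropagated through VGG19 within each block; here the feature-level formula is tested directly.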

5 Experiments

Ultra-high resolution style transfer.

An example of UHR style transfer is displayed in Figure 2 with several highlighted details. Figure 1 illustrates intermediary steps of our high-resolution multiscale algorithm. The result at the first scale (third column) corresponds to the one of the original paper [Gatys_et_al_image_style_transfer_cnn_cvpr2016] (except for our slightly modified style loss) and suffers from poor image resolution and grid artefacts. Note how, while progressing to the last scale, the texture of the painting gets refined and stroke details gain a natural aspect. This process is remarkably stable: the successive global style transfer results remain consistent with the one of the first scale.

We compare our method with two fast alternatives for UHR style transfer, namely collaborative distillation (CD) [Wang_2020_CVPR] and URST [Chen_Wang_Xie_Lu_Luo_towards_ultra_resolution_neural_style_transfer_thumbnail_instance_normalization_AAAI2022] (based on [li2017universal]), using their official implementations (CD: https://github.com/MingSun-Tse/Collaborative-Distillation; URST: https://git.io/URST). As already discussed in Section 2, URST decreases the resolution of the style image to a fixed small size, so the style transfer is not performed at the proper scale and fine details cannot be transferred. As other UST methods, CD does not take details at different scales into account but simply reduces the number of filters in the autoencoder network through collaborative distillation to process larger images. Unsurprisingly, one observes in Figure 4 that our method is the only one capable of conveying the aspect of the painting strokes to the content image. CD suffers from halo and high-frequency artefacts, while URST presents visible patch boundaries and a detail frequency mismatch due to improper scaling. Observe also that, regardless of the method, style transfer results are in general better when the geometric content of the style image and the content image are close.

Style image

Content image

SPST (ours)

CD

URST

Figure 4: Comparison of UHR style transfers. For each example, top row, left to right: style, content, our result (SPST), CD [Wang_2020_CVPR], URST [Chen_Wang_Xie_Lu_Luo_towards_ultra_resolution_neural_style_transfer_thumbnail_instance_normalization_AAAI2022]. Bottom row: zoomed-in view of the corresponding top row. Third row: content (3024×4032), style (3024×3787). We used three scales for both of our results. Observe the loss of details and the unrealistic look of the outputs produced by both fast methods.

Identity test for style transfer quality assessment.

Method PSNR SSIM LPIPS Gram
SPST (ours)
CD
URST
Table 2: Quantitative evaluation of the identity test for UHR style transfer. The PSNR, SSIM [ssim], LPIPS [Zhang_2018_CVPR] and Gram (style distance) metrics are shown for our results (SPST), CD [Wang_2020_CVPR] and URST [Chen_Wang_Xie_Lu_Luo_towards_ultra_resolution_neural_style_transfer_thumbnail_instance_normalization_AAAI2022]. All metrics are averages over the HR painting images used as both content and style. Best results shown in bold.

Style transfer is an ill-posed problem by nature. We introduce here an identity test to evaluate whether a method is able to reproduce a painting when using the same image for both content and style. Two examples of this sanity check are shown in Figure 5. We observe that our multiscale algorithm is slightly less sharp than the original style image, yet high-resolution details from the paint texture are faithfully conveyed. In comparison, the results of [Wang_2020_CVPR] suffer from color deviation and frequency artefacts, while the results of [Chen_Wang_Xie_Lu_Luo_towards_ultra_resolution_neural_style_transfer_thumbnail_instance_normalization_AAAI2022] apply a style transfer that is too homogeneous and present the color and scale issues already discussed. Some previous works introduce a style distance [Wang_2020_CVPR] that corresponds to the Gram loss for some VGG19 layers, showing again that fast approximate methods try to reproduce the algorithm of Gatys et al., which we extend to UHR images. Since we explicitly minimize this quantity, it is not fair to only consider this criterion for a quantitative evaluation. For this reason, we also calculate the PSNR, SSIM [ssim] and LPIPS [Zhang_2018_CVPR] metrics on a set of painting styles (see supplementary material) to quantitatively evaluate our results, in addition to the "Gram" metric, that is, the style loss of Equation (2) using the original Gram loss of Equation (3), computed on UHR results using our localized approach. The average scores reported in Table 2 confirm the good qualitative behaviour discussed earlier: our method is by far the best for all the scores.
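For reference, the PSNR used in the identity test is a simple function of the mean squared error; below is a minimal sketch (the peak value of 1 is our assumption for images normalized to [0, 1]):

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    # Peak signal-to-noise ratio in dB between a reference painting and the
    # identity-test output (peak=1.0 assumes images normalized to [0, 1]).
    mse = np.mean((ref - img) ** 2)
    return np.inf if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives exactly 20 dB.
ref = np.zeros((4, 4))
assert abs(psnr(ref, ref + 0.1) - 20.0) < 1e-9
```

SSIM and LPIPS are computed with their reference implementations [ssim, Zhang_2018_CVPR]; only the Gram metric requires our localized approach, since it involves VGG19 statistics of UHR images.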

Style/content image

SPST (ours)

CD

URST

Figure 5: Identity test: a style image is transferred to itself. We compare three style transfer strategies. From left to right: ground-truth style, our result (SPST), CD [Wang_2020_CVPR], URST [Chen_Wang_Xie_Lu_Luo_towards_ultra_resolution_neural_style_transfer_thumbnail_instance_normalization_AAAI2022]. First row: the style image has resolution 3375×4201; third row: the style image has resolution 3095×4000 (UHR images have been downscaled by a factor 4 to save memory). Second and fourth rows: close-up views at true resolution. Observe that our results are the most faithful to the input painting and do not suffer from color blending.

Computation time.

Our UHR style transfer algorithm takes several minutes, from 16 minutes for the third row result in Figure 4 to 74 minutes for the result in Figure 2 using an A100 GPU. In comparison, fast UST methods only take a few seconds.

6 Discussion

Our work presented an extension of the Gatys et al. style transfer algorithm to UHR images. Regarding visual quality, our algorithm clearly outperforms competing UHR methods by conveying a true painting feel thanks to faithful HR details such as strokes, paint cracks, and canvas texture.

Admittedly, our iterative method is slow, even though its complexity scales linearly with image size. Yet, as we have demonstrated, fast methods do not reach a satisfying quality, and fast high-quality style transfer remains an open problem to date.

Several extensions and applications of our work can be considered. For instance, we can perform HR texture synthesis by removing the content term [Gatys_et_al_texture_synthesis_using_CNN_2015] (see supplementary material). Our two-pass procedure can be extended to any function of the Gram matrix and feature spatial means, such as the Bures metric used in [Vacher_etal_texture_interpolation_probing_visual_perception_NEURIPS2020] for texture mixing. One could also consider extending the method to the sliced Wasserstein style loss [Heitz_slices_Wassestein_loss_neural_texture_synthesis_CVPR2021] using the Run-Sort-ReRun strategy [Lezama_etal_run_sort_rerun_ICML2021]. However, the memory requirements to store five VGG feature maps (or their projections) increase linearly with the size of the input image, in contrast to the size-agnostic global statistics used in this paper.

This work opens the way for several future research directions, from allowing local control for UHR style transfer [Gatys_etal_Controlling_perceptual_factors_in_neural_style_transfer_CVPR2017] to training fast CNN-based models to reproduce our results.


Acknowledgements: B. Galerne and L. Raad acknowledge the support of the project MISTIC (ANR-19-CE40-005).


Supplementary material: A full version with supplementary material that presents additional style transfer results, gives an overview of the dataset of pictures used for the identity test comparison, and discusses the adaptation of the algorithm to UHR texture synthesis is available here:
https://hal.archives-ouvertes.fr/hal-03897715/en

References