CSC490 Capstone Project

Conditional Sketch Generation and Completion

Abstract

Existing sketch generation models generalize poorly across datasets and drawing styles, and they require training multiple models to cover different object classes. Furthermore, conventional raster-based methods frequently lose the temporal and structural information present in hand-drawn sketches. We present a system for conditional sketch generation and completion that takes a novel approach to tokenizing sketch sequences. To improve data quality, we simplify input paths; to maintain output quality, we fit strokes to Bézier curves. We train a lightweight transformer model with an added class embedding layer to learn from multiple sketch object classes at once. This allows a single model to generalize across classes and reuse learned shape priors.

Representation

We represent a sketch as a sequence of strokes. Each stroke is quantized and encoded as a discrete token using a fixed-size codebook. Because the token order follows the drawing order, this representation captures both the temporal and structural information present in human-drawn sketches, allowing our model to generate realistic and coherent sketches.
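As a rough illustration of codebook quantization, the sketch below maps each stroke to the index of its nearest codebook entry. The stroke encoding (a single 2D offset per stroke), the uniform grid codebook, and the names `build_codebook`, `quantize`, `tokenize`, and `detokenize` are all assumptions made for this example; the project's actual tokenizer may differ.

```python
import math

def build_codebook(bins_per_axis=4, max_offset=1.0):
    """Hypothetical codebook: a uniform grid of 2D offsets in [-max_offset, max_offset]."""
    step = 2 * max_offset / (bins_per_axis - 1)
    levels = [-max_offset + i * step for i in range(bins_per_axis)]
    return [(x, y) for x in levels for y in levels]

def quantize(stroke, codebook):
    """Return the index (token) of the codebook entry nearest to the stroke vector."""
    return min(range(len(codebook)), key=lambda i: math.dist(stroke, codebook[i]))

def tokenize(sketch, codebook):
    """Encode a sketch (strokes in drawing order) as a list of discrete tokens."""
    return [quantize(stroke, codebook) for stroke in sketch]

def detokenize(tokens, codebook):
    """Map tokens back to their (lossy) codebook vectors."""
    return [codebook[t] for t in tokens]

codebook = build_codebook(bins_per_axis=3)            # 9 entries
sketch = [(0.1, -0.2), (0.9, 0.8), (-1.2, 0.1)]       # assumed per-stroke offsets
tokens = tokenize(sketch, codebook)
```

The codebook size is fixed up front, so the vocabulary the model has to learn stays constant regardless of how many distinct strokes appear in the data. The trade-off is quantization error: `detokenize` returns the nearest grid point, not the original stroke.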

System

We designed the system and model ourselves. The model is based on a transformer architecture but does not follow any existing implementation. A class embedding lets the model condition its generation on a specific object class, so a single model can generate sketches of different types. To keep the model lightweight, we use fewer layers and attention heads than standard transformer architectures.
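One common way to condition a transformer on a class is to add a learned class vector to every input position. The toy input layer below shows that idea in plain Python. It is an assumption about how the class embedding could be combined with the token embeddings, not a description of the project's model; `ConditionedEmbedding` and its parameters are hypothetical names.

```python
import random

class ConditionedEmbedding:
    """Toy input layer: token, position, and class embeddings summed per step."""

    def __init__(self, vocab_size, num_classes, max_len, dim, seed=0):
        rng = random.Random(seed)

        def table(rows):
            # Small random initial values; a real model would learn these.
            return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]

        self.token = table(vocab_size)
        self.klass = table(num_classes)
        self.position = table(max_len)

    def __call__(self, tokens, class_id):
        class_vec = self.klass[class_id]
        # Every position receives the same class vector, so the layers that
        # follow can condition on the object class at every step.
        return [
            [t + p + c for t, p, c in zip(self.token[tok], self.position[i], class_vec)]
            for i, tok in enumerate(tokens)
        ]

embed = ConditionedEmbedding(vocab_size=16, num_classes=3, max_len=8, dim=4)
as_class_0 = embed([1, 2, 3], class_id=0)
as_class_1 = embed([1, 2, 3], class_id=1)   # same tokens, different conditioning
```

The same token sequence produces different inputs under different class labels, which is what lets one set of transformer weights serve several object classes.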

Training

As training input, we use the tokenized sketch along with an integer class label. We train the model with cross-entropy loss and the AdamW optimizer. This lets the model learn the distribution of sketch tokens and generate sketch sequences that follow the learned patterns.
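The training objective can be sketched in plain Python as follows. The example assumes next-token prediction (each position predicts the token that follows it), which the text implies but does not state. The helper names, the scalar `adamw_step`, and its default hyperparameters are illustrative; a real training loop would use a deep learning framework's own loss and optimizer.

```python
import math

def next_token_pairs(tokens):
    """Assumed objective: the input at step i predicts the token at step i + 1."""
    return tokens[:-1], tokens[1:]

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def sequence_loss(logits_per_step, targets):
    """Mean cross-entropy over all positions of one sketch."""
    losses = [cross_entropy(l, t) for l, t in zip(logits_per_step, targets)]
    return sum(losses) / len(losses)

def adamw_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update for a scalar parameter p with gradient g at step t >= 1."""
    m = b1 * m + (1 - b1) * g            # first-moment estimate
    v = b2 * v + (1 - b2) * g * g        # second-moment estimate
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    # Weight decay is applied directly to p, decoupled from the gradient step.
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v

inputs, targets = next_token_pairs([5, 1, 7, 2])
uniform_logits = [[0.0] * 8 for _ in inputs]   # stand-in for model outputs
loss = sequence_loss(uniform_logits, targets)  # equals log(8) for uniform logits
```

With uniform logits the loss equals the log of the vocabulary size, which is a useful sanity check: a model that has learned anything about the token distribution should score below that baseline.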