Deep neural network (DNN) accelerators provide high-accuracy data recognition and are commonly used in edge devices. However, resource-constrained edge devices typically have small form factors and limited resources, whereas DNN accelerators require large amounts of computation and memory. In particular, the memory requirements of DNNs are a major bottleneck to realizing accelerators on small-footprint devices. To reduce the memory footprint, fused-layer convolutional neural network (CNN) accelerators have been proposed for image recognition networks. Fused-layer CNNs execute a set of CNN layers on-chip, replacing the off-chip memory accesses between those layers with on-chip memory accesses. Although fused-layer CNNs reduce off-chip memory accesses, they still require large on-chip memories that occupy significant chip area, making their realization on small-footprint devices impractical. We address this problem by proposing a design technique that generates fused-layer CNN accelerators with small memory footprints, and we demonstrate its potential via a case study.