In this paper, we present a variation of the Modulo Scheduling algorithm that exploits software pipelining in high-level synthesis for FPGA architectures. We demonstrate the difficulties of implementing software pipelining for FPGA architectures and propose a modified version of Modulo Scheduling that addresses memory resource conflicts and circular dependencies. The experimental results demonstrate significant speedups compared to both non-pipelined high-level synthesis results and software-pipelined results obtained with the traditional Modulo Scheduling algorithm.