This project aims at a systematic approach that integrates multi-level IC design automation flow by leveraging the parallel computational power in current and upcoming multi-core/many-core processors. The main challenge in this approach is to provide responsive high-level design decisions while incurring the massive computational cost of lower-level design algorithms. Fortunately, recent trends in processor architectures provide a solution. Compared with traditional uniprocessor systems, emerging multi-core/many-core microprocessors have far more computational power but limited global memory access capabilities. Parallel computational power may make it possible to break the boundaries of the existing IC design hierarchy and to vertically integrate the IC optimization fow. The potential benefits of an integrated solution are significant - accurate high-level design decisions become possible as low-level design details are derived concurrently, and low-level designs can also greatly benefit from high-quality high-level decisions. Together, an efficient and high-quality design flow becomes feasible, making design closure faster, thus saving time and reducing costs.
The proposed work has the potential to overcome the key limitations of existing hierarchical IC CAD technologies and enable new IC design automation solutions, which in turn can benefit the IC and semiconductor industry. The PIs will work together with their industrial collaborators to develop and commercialize the proposed work. The project will have beneficial impact on education. The PIs intend to educate the next generation of software developers and practicing engineers in the design and innovation of IC design automation and parallel computing.
This project aims to develop parallel program technologies to tackle the increasing computational complexity across the IC design and optimization flow. It leverages the fast-growing computing power of multi-core and many-core microprocessors, as well as emerging big data computational software platform. Throughout the course of the project, our research has focused on parallel CAD algorithm design and optimization, parallel programming using concurrent programming language, and distributed data management for scalable parallel data processing. The project can benefit the IC and semiconductor industry, as well as educate the next generation of software developers and practicing engineers in the design and innovation of IC design automation and parallel computing. Our research has focused on parallel programming model research. We propose a methodology to explore concurrency via nondeterministic transactional algorithm design, and to program them on multicore processors for CAD applications. We apply the proposed methodology to the min-cost flow problem which has been identified as the key problem in many design optimizations. A concurrent algorithm and its implementation on multicore processors for min-cost flow have been developed based on the methodology. Our study shows that, compared to sequential algorithm design, an average of 2.52X and a maximum of 3.01X of speedup can be achieved. We have investigated the problem of mapping and scheduling a synchronous data flow graph onto a multi-core platform. A two-level heuristic algorithm has been developed to solve this problem on multi-core processor. The experimental results show that our two-level heuristic algorithm achieves near-optimal solution compared with the enumeration algorithm, and achieves 38.8% less buffer usage on average, compared with the greedy algorithm. We have developed parallel-friendly algorithms for the cell-type selection problem in modern high-performance low-power designs. We provide techniques to address the diffculties that the traditional continuous approaches encountered when applied to real life designs. The proposed algorithm is parallel-friendly in nature and has the potential to significantly accelerate cell-type selection on many-core systems. We have developed a new linear max K-min classifier for 2-class classification problem. To address the high computational complexity problem for large data set, we transform the optimization objective function into a linear programming problem. We next extend the proposed work to the general maximum K-min based classification problem. Our study shows that, the classification performance of the proposed classifiers is competitive with Hinge Loss classifiers. We have studied parallel CAD programming using concurrent programming languages. We adopt Go, a concurrent programing language, which has been widely used in cloud backend parallel programming and data processing. We have designed and implemented the proposed linear max K-means classifier using Go. Our study shows that, for concurrency optimization algorithms, Go offers comparable single-thread computational efficiency and superb scalability. Go also offers superior ease-of-programming advantages for massively concurrent algorithm design and implementation. Data access has been a key performance limitation for parallel algorithms with massive-scale dataset. Specifically, fetching data on-the-fly from disks introduces significant performance penalty. To tackle this problem, we have investigated how to use Memcached to build a unified in-memory data management layer to support parallel CAD algorithm design. Our study shows that, the Memcached approach offers comparable per-data item access efficiency and superior parallel scalability over conventional in-memory pointer based approach. We have explored Spark based in-memory parallel processing infrastructure. Leveraging Sparkâ€™s resilient distributed dataset, we have implemented parallel algorithms using distributed in-memory data processing. Specifically, data being processed by each node is distributed and kept in the local memory of the specific computing node. Using Spark infrastructure, we have developed a set of parallel algorithms, including stochastic gradient descent, connected components, polygonal overlay, NB multinomial, passive-aggressive, and min-cost max-flow. Each algorithm is implemented using mapreduce programming model, and executed on Google compute engine cluster. Dataset ranging from 250MB to 1.2GB for each benchmark is partitioned and distributed among the computing nodes. Our study shows that, the parallel design offers 2.5X to 7.2X performance improvement over the corresponding sequential algorithm implementation.