This research effort is developing new electronic devices and mapping software to improve the speed, power efficiency, and cost of digital electronics. The researchers start with the concept of the Field-Programmable Gate Array (FPGA), a logic chip that can be programmed and reprogrammed to implement complex digital circuits. FPGAs are an important driver for the semiconductor industry, reaching almost $3B in annual worldwide sales. Current FPGAs are essentially seas of 1-bit compute units, each configured to perform one function over and over. To support more complex operations, modern devices also include a sprinkling of more complex, multi-bit units, such as multipliers and memories. All the components of these devices are interconnected via a static, single-bit routing network, and are primarily programmed in hardware description languages such as Verilog or VHDL. An FPGA's single-bit programmability provides a great deal of flexibility for creating arbitrary logic, but has significant inefficiencies as well. Word-based architectures, which compute and route multi-bit values simultaneously, can be much more efficient than standard FPGAs. Word-based alternatives to FPGAs exist, such as coarse-grained reconfigurable arrays (CGRAs) and massively parallel processor arrays (MPPAs), but limitations in their control systems significantly reduce their quality and usefulness for many applications. One of the major thrusts of this work is to merge the customizable logic of FPGAs with the time-multiplexing ability of MPPAs and CGRAs, as well as the complex control flow supported by modern multi-core CPUs. Unlike a standard FPGA, which statically configures all of its resources to do a single task, this system allows each compute element in the device to run a small program. This provides significantly greater compute density in these devices. To boost this even further, they are exploring mechanisms to make use of branching and conditional operation.
Specifically, where a microprocessor might take a branch based upon a loop condition or as part of an if-then-else construct, their hardware system can either change the instructions loaded during that cycle or branch to a different portion of the overall operation. Unlike MPPAs and CGRAs, their system can perform data-dependent instruction selection within a large, automatically mapped computation region operating in lock-step. Alternatively, for control-heavy portions of a computation, they can embed complete, simple very long instruction word (VLIW) processors into the fabric of their system. To support these efforts, they are developing new compilation strategies to convert computations into efficient implementations on these architectures. They are also examining the hardware resources required to support these operations. These include methods for stalling portions of the array when their communication demands temporarily cannot be met, as well as mechanisms to synchronize the program counters of regions of the array operating in lock-step. Taken together, they estimate that these systems will provide an order-of-magnitude improvement in area-power product, and at least a factor-of-2 performance improvement, over FPGAs. The resulting hardware and software systems should be able to significantly reduce the power consumption, lower the cost, and increase the speed of a large swath of electronic systems. In addition, their improved programming models will make these systems easier to develop and maintain. This effort also includes a focus on improving the diversity of the engineering workforce at both the graduate and undergraduate levels, with mentoring and research opportunities at each level. All of these activities are carried out within an overall effort towards outreach to underrepresented groups.
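
The data-dependent instruction selection described above, in which a region of compute elements runs in lock-step while each element chooses its instruction from a locally computed condition, can be illustrated with a small software model. This is only a sketch under stated assumptions and not the project's actual architecture: the names `Slot`, `PE`, and `run_lockstep`, the register names, and the simplification that every element shares one program are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Regs = Dict[str, int]
Op = Callable[[Regs], None]


@dataclass
class Slot:
    """One time step of the program: two candidate ops plus a predicate register.

    Hypothetical model: an empty `pred` means the slot is unconditional and
    `if_true` always runs.
    """
    if_true: Op
    if_false: Op
    pred: str = ""


@dataclass
class PE:
    """A compute element with its own local registers (illustrative only)."""
    regs: Regs

    def step(self, slot: Slot) -> None:
        # Instruction selection: pick one of the two ops from a local
        # condition bit instead of changing the program counter.
        take_true = True if not slot.pred else bool(self.regs[slot.pred])
        (slot.if_true if take_true else slot.if_false)(self.regs)


def run_lockstep(pes: List[PE], program: List[Slot]) -> None:
    """Advance every PE through the program under one shared program counter."""
    for pc in range(len(program)):
        for pe in pes:
            pe.step(program[pc])


def set_lt(r: Regs) -> None:
    r["c"] = int(r["a"] < r["b"])   # condition bit: a < b


def sub_ab(r: Regs) -> None:
    r["d"] = r["a"] - r["b"]


def sub_ba(r: Regs) -> None:
    r["d"] = r["b"] - r["a"]


def nop(r: Regs) -> None:
    pass


# Two-cycle program computing |a - b| without any PE leaving lock-step.
program = [
    Slot(set_lt, nop),               # cycle 0: compute the condition
    Slot(sub_ba, sub_ab, pred="c"),  # cycle 1: select the op from the condition
]

pes = [PE({"a": 7, "b": 3}), PE({"a": 2, "b": 9}), PE({"a": 5, "b": 5})]
run_lockstep(pes, program)
results = [pe.regs["d"] for pe in pes]
print(results)  # [4, 7, 0]
```

In this model every element finishes in the same two cycles regardless of its data, which is the property that lets a conditional operation stay inside a lock-step region; a true branch to a different portion of the operation would instead require the shared program counter itself to change.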