Scaling scientific problems to 10,000,000 processors on next-generation HEC systems is severely challenged by conventional programming models, languages, and their supporting compilation systems. To achieve this goal, one must expose a greater degree of parallelism and attain higher parallel efficiency than is feasible with conventional methods such as MPI. The goal of this collaborative research project is to dramatically enhance the scalability of challenging physics problems through the application of an innovative programming model. The strategy is to replace static, coarse-grained message-passing processes that rely on global barrier synchronization in a distributed memory space with dynamic, message-driven multithreading that uses lightweight synchronization objects in a partitioned global address space. Parallelism is to be extracted directly from large, irregular, sparse, and time-varying data structures. Ephemeral user threads will permit many simultaneous tasks over the data structures, exposing their intrinsic near-fine-grain parallelism. System-wide latency will be hidden by overlapping computation with communication through asynchronous message-driven processing. Consequently, this approach will enable a class of physics problems that cannot currently be solved using conventional methods.
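
As a rough illustration of the synchronization style described above, and not the project's actual runtime or language, the following Python sketch uses asyncio tasks as stand-ins for ephemeral user threads and awaitable task results as stand-ins for lightweight synchronization objects. The function names, the ring-neighbor dependency, and the simulated latency are assumptions made purely for illustration. Python's asyncio runs in a single process, so the sketch shows only the control structure: each second-stage task starts as soon as the two results it needs arrive, with no global barrier between stages.

```python
import asyncio


async def stage1(block):
    # Hypothetical first stage: reduce one irregularly sized block.
    # The sleep stands in for communication latency that grows with block size.
    await asyncio.sleep(0.001 * len(block))
    return sum(block)


async def stage2(left_result, own_result):
    # Hypothetical second stage: waits only on the two results it depends on,
    # not on every block finishing stage 1.
    return (await left_result) + (await own_result)


async def run_dataflow(blocks):
    # One lightweight task per block; each task's result acts as a
    # fine-grained synchronization object for its consumers.
    first = [asyncio.create_task(stage1(b)) for b in blocks]
    # Block i depends on itself and its left neighbor (index -1 wraps around,
    # giving a periodic ring; this dependency pattern is an assumption).
    second = [
        asyncio.create_task(stage2(first[i - 1], first[i]))
        for i in range(len(blocks))
    ]
    return await asyncio.gather(*second)


if __name__ == "__main__":
    irregular_blocks = [[1, 2], [3], [4, 5, 6]]
    print(asyncio.run(run_dataflow(irregular_blocks)))
```

In a barrier-based version, every block would have to finish the first stage before any block began the second; here a short block and its neighbor can proceed while a long block is still waiting on its simulated communication.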