
Different communication libraries, including MPI, have been studied and implemented for the SCC [11], [12], [13]. Porting CHARM++ on top of them is left for future work and may yield further performance improvements. Many other SCC-related studies are ongoing in Intel's Many-core Applications Research Community (http://communities.intel.com/community/marc).

Esmaeilzadeh et al. [14] provide an extensive report and analysis of chip power and performance across five generations of Intel processors, using a large and diverse set of benchmarks. That work, however, does not consider many-cores or GPUs, which are promising architectures for the future of parallel computing.

VII. CONCLUSION

Large increases in the number of transistors, accompanied

by power and energy limitations, introduce new challenges

for architectural design of new processors. There are several

alternatives to consider, such as heavy-weight multi-cores,

light-weight many-cores, low-power designs and SIMD-like

(GPGPU) architectures. In choosing among them, several

possibly conflicting goals must be kept in mind, such as

speed, power, energy, programmability and portability. In

this work, we evaluated platforms representing the above-mentioned design alternatives using five scalable CHARM++ and MPI applications: Jacobi, NAMD, NQueens, CG and Sort.
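
For concreteness, the sketch below illustrates the near-neighbor communication pattern that a kernel such as Jacobi exhibits, written as a minimal 1-D MPI stencil; the array size, iteration count and update rule are illustrative assumptions, and this is not the benchmark code used in our experiments.

/* Minimal illustrative sketch (not the paper's benchmark code): a 1-D Jacobi
 * relaxation in MPI with ghost-cell exchange between neighboring ranks. */
#include <mpi.h>
#include <stdlib.h>

#define N     1024   /* local points per rank (assumed for illustration) */
#define ITERS 100    /* fixed iteration count (assumed) */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* local array with one ghost cell on each side */
    double *cur  = calloc(N + 2, sizeof(double));
    double *next = calloc(N + 2, sizeof(double));
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int it = 0; it < ITERS; it++) {
        /* exchange ghost cells with left and right neighbors */
        MPI_Sendrecv(&cur[1], 1, MPI_DOUBLE, left, 0,
                     &cur[N + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&cur[N], 1, MPI_DOUBLE, right, 1,
                     &cur[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* 3-point Jacobi update over the local interior */
        for (int i = 1; i <= N; i++)
            next[i] = 0.5 * (cur[i - 1] + cur[i + 1]);
        double *tmp = cur; cur = next; next = tmp;
    }

    free(cur);
    free(next);
    MPI_Finalize();
    return 0;
}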

The Intel SCC is a research chip with a many-core architecture. Many-cores like the SCC offer an opportunity to build future machines that consume low power and can run CHARM++ and MPI code fast. They represent an interesting and balanced design point: they consume less power than heavy-weight multi-cores, are faster than low-power processors, and do not have the generality or portability issues of GPGPU architectures. In our analysis of the SCC, we suggested improvements in sequential performance, especially floating-point speed, and the addition of a global collectives network.

We showed that heavy-weight multi-cores are still an effective solution for dynamic and complicated applications, as well as for those with irregular accesses and communications. In addition, GPGPUs are exceptionally powerful for many applications in terms of speed, power and energy. However, they lack the architectural sophistication to execute complex and irregular applications efficiently, they require a high programming effort to write new code, and they are unable to run legacy codes. Finally, as seen from the Intel Atom experiments, low-power designs do not necessarily result in low energy consumption, since they may increase the execution time significantly. Therefore, there is no single best solution that fits all applications and goals.

ACKNOWLEDGMENTS

We thank Intel for giving us an SCC system and the anonymous reviewers for their comments. We also thank Laxmikant Kale, Marc Snir and David Padua for their comments and support. This work was supported in part by Intel under the Illinois-Intel Parallelism Center (I2PC).

REFERENCES

[1] T. G. Mattson, M. Riepen, T. Lehnig, P. Brett, W. Haas, P. Kennedy, J. Howard, S. Vangal, N. Borkar, G. Ruhl, and S. Dighe, “The 48-core SCC processor: The programmer’s view,” in International Conference for High Performance Computing, Networking, Storage and Analysis, 2010, pp. 1–11.

[2] L. V. Kale and G. Zheng, “Charm++ and AMPI: Adaptive runtime strategies via migratable objects,” in Advanced Computational Infrastructures for Parallel and Distributed Applications, M. Parashar, Ed. Wiley-Interscience, 2009, pp. 265–282.

[3] J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, F. Pailet, S. Jain, T. Jacob, S. Yada, S. Marella, P. Salihundam, V. Erraguntla, M. Konow, M. Riepen, G. Droege, J. Lindemann, M. Gries, T. Apel, K. Henriss, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, R. Van Der Wijngaart, and T. Mattson, “A 48-core IA-32 message-passing processor with DVFS in 45nm CMOS,” in Solid-State Circuits Conference Digest of Technical Papers, 2010, pp. 108–109.

[4] R. F. van der Wijngaart, T. G. Mattson, and W. Haas, “Light-weight communications on Intel’s single-chip cloud computer processor,” SIGOPS Oper. Syst. Rev., vol. 45, pp. 73–83, February 2011.

[5] J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kalé, and K. Schulten, “Scalable molecular dynamics with NAMD,” Journal of Computational Chemistry, vol. 26, no. 16, pp. 1781–1802, 2005.

[6] D. Bailey, E. Barszcz, L. Dagum, and H. Simon, “NAS parallel benchmark results,” in Proc. Supercomputing, Nov. 1992.

[7] L. V. Kale, G. Zheng, C. W. Lee, and S. Kumar, “Scaling applications to massively parallel machines using Projections performance analysis tool,” Future Generation Computer Systems Special Issue on: Large-Scale System Performance Modeling and Analysis, vol. 22, no. 3, pp. 347–358, February 2006.

[8] B. Marker, E. Chan, J. Poulson, R. van de Geijn, R. F. Van der Wijngaart, T. G. Mattson, and T. E. Kubaska, “Programming many-core architectures - a case study: Dense matrix computations on the Intel Single-chip Cloud Computer processor,” Concurrency and Computation: Practice and Experience, 2011.

[9] R. David, P. Bogdan, R. Marculescu, and U. Ogras, “Dynamic power management of voltage-frequency island partitioned networks-on-chip using Intel Single-chip Cloud Computer,” in International Symposium on Networks-on-Chip, 2011, pp. 257–258.

[10] P. Salihundam, S. Jain, T. Jacob, S. Kumar, V. Erraguntla, Y. Hoskote, S. Vangal, G. Ruhl, and N. Borkar, “A 2 Tb/s 6 × 4 mesh network for a single-chip cloud computer with DVFS in 45 nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 46, no. 4, pp. 757–766, April 2011.

[11] C. Clauss, S. Lankes, P. Reble, and T. Bemmerl, “Evaluation and improvements of programming models for the Intel SCC many-core processor,” in International Conference on High Performance Computing and Simulation (HPCS), 2011, pp. 525–532.

[12] I. Ureña, M. Riepen, and M. Konow, “RCKMPI–Lightweight MPI implementation for Intel’s Single-chip Cloud Computer (SCC),” in Recent Advances in the Message Passing Interface: 18th European MPI Users’ Group Meeting. Springer-Verlag New York Inc, 2011, p. 208.

[13] C. Clauss, S. Lankes, and T. Bemmerl, “Performance tuning of SCC-MPICH by means of the proposed MPI-3.0 tool interface,” Recent Advances in the Message Passing Interface, pp. 318–320, 2011.

[14] H. Esmaeilzadeh, T. Cao, Y. Xi, S. M. Blackburn, and K. S. McKinley, “Looking back on the language and hardware revolutions: Measured power, performance, and scaling,” in International Conference on Architectural Support for Programming Languages and Operating Systems, 2011, pp. 319–332.