Max Domeika, co-chair of the Multicore Programming Practices (MPP) Group and Intel developer, discusses the intersection of multicore and embedded markets.
By John Blyler, Editorial Director
Max Domeika, senior software engineer in the Developer Products Division at Intel, sat down with LPE Consulting Editor John Blyler to talk about the growing importance – and intersection – of both the multicore and embedded markets. What follows are excerpts of that conversation. – JB
Blyler: Intel’s software focus seems to be following its hardware processor drive into both multicore and embedded markets. What challenges does that bring for traditional software developers?
Domeika: Coming from a background in the desktop software application space at Intel, I’m now spending more time working in the embedded multicore arena. This year I’ve been particularly focused on power issues, primarily on the Atom processor. My task is to determine what sorts of tools are needed to help developers move to both embedded and multicore applications. Already I see a long-term need for both power optimization and power measurement tools. The key is to monitor your application’s power-related impact on both individual subsystems and overall system performance.
In the past, desktop clients and server users haven’t had to pay much attention to power. Desktop systems plug into the wall and their software applications use as much power as the processor wants to give them. Over the past several years these processors have incorporated features that help control the amount of power usage in both C (idle) states and P (operational) states.
Blyler: How do these processor states affect the development of software applications?
Domeika: In the past, processors either ran at full speed or sat idle. Several years ago hardware designers added features to the chips to control how deep a sleep the processor enters. As you know, these features allow different portions of the chip and caches to be turned off. One of the challenges is that while deeper sleep saves more power, waking up from it costs more time and energy.
For the software developer, this means that you don’t want the application to enter the deepest C-state if it will have to wake up immediately. There needs to be some smarts about how deep a sleep to enter, which is really an operating system issue. P-states, or operational states, vary frequency and voltage to match the amount of execution that the OS determines is needed. These states directly affect the performance of the system.
These states can have a big impact on the application, restricting how developers write the code. If your application causes the processor to perform poorly, that will have a negative effect on power utilization. Developers need more mature tools to help figure out which application processes result in effective use of the C-states.
I see a need for continued maturity of the power optimization and power measurement tools and methodologies. These power tools must also be tied to traditional performance analysis and optimization tools, because many of the techniques for mitigating power are the same techniques that you use for traditional performance optimization. The entire system must run as quickly as possible while using as little power as possible.
At the other end of the spectrum are the same issues of power optimization and performance analysis, but applied to a multicore environment. Tools and methodologies need to mature to include multicore development, too.
Blyler: Are the two worlds of embedded and multicore coming together? After all, Intel’s Atom isn’t yet a multicore architecture, is it?
Domeika: Well, some instances of the processor already support hyper-threading, a technology that dates back to the Pentium 4 processor. Hyper-threading makes the environment look like two (or more) processors from the point of view of the operating system. That’s why the software techniques that developers use on multicore are starting to have an impact on embedded applications targeting the Atom processor.
Blyler: Isn’t the low power push also affecting the high-end embedded and multicore processors like the Xeon?
Domeika: Power is important across the board. We’ve seen power optimization become important in servers – especially with the “greening” of data centers.
Blyler: Let’s talk about the actual tool environment for both power issues and multicore design. What’s happening there?
Domeika: Today, I’d summarize the tool environment as a collection of separate tools and techniques. For example, if you want to do power optimization, you might use PowerTop – an open source tool for power measurement. Conversely, if you want to do performance analysis – counting cache misses, branch mispredictions, memory fetches and the like – using performance monitoring counters, you might use another open source tool called OProfile. Intel also has a tool called the VTune Performance Analyzer.
These tools show what performance issues are occurring on the chip, which in turn helps developers optimize their code. For example, if you see high cache miss rates, you can investigate which portions of the code are causing the problem. It might mean that the application’s data structures need to be changed to get better cache performance. Performance and power tools give the developer a means of getting valuable feedback from the hardware.
Blyler: Most desktop application developers are well versed in Microsoft’s Visual Studio IDE. What tools are available for these developers as they move toward multicore applications?
Domeika: Intel has Parallel Studio, which integrates well with Microsoft’s Visual Studio for multicore (parallel) code development. Parallel Studio is not targeted at embedded folks, but rather at the desktop client environment. Intel also has tools that help with the programming interface, compiler, libraries and more. With regard to debugging, we have a set of enhancements that integrate into Microsoft Visual Studio to help with parallel debug.
Debugging is a key issue. While developers could someday spawn 64 threads on a multicore chip – because they have 64 cores – that is not the best way to begin. In multithreaded implementations it’s best to start with one thread and make sure the program works, then scale up to two, four, and so on. Good debugging tools provide easy mechanisms to start debugging one thread and then scale to more cores, i.e., they are serially consistent.
Another challenge in multicore development is in the area of configuration control. You may have multiple threads running on multiple cores, but you don’t want multiple versions of the same code. Instead, you want one version of the code with a parameter that changes as your processor core count changes. Again, good debugging tools have those configuration control features.
Blyler: Tools work best when they are following a set methodology. You’re co-chair with David Stewart from CriticalBlue on the Multicore Programming Practices (MPP) Working Group – part of the Multicore Association for which Markus Levy is CEO. Please give the readers a quick update of the MPP.
Domeika: As you know, our focus is on documenting best-known methods for multicore software development techniques. This year we are in the middle of documenting the best practices. Internal review of this document should begin shortly. We hope to have it ready for external review by the first half of next year.
This document will fulfill a need that is oftentimes overlooked, namely, what are the best practices using the technology that is available today. We have customers that are becoming more and more aware of the challenges of multicore software development. But we are still building awareness and educating the larger group of mainstream programmers. Even when mainstream developers identify the need for a multicore program, they are often stuck with their existing code. Not everyone has the resources, time or need to completely rewrite their legacy code. The MPP document will provide mainstream programmers with a workable set of best practices for multicore development throughout the typical development life cycle: analysis, design and implementation, debug, and performance tuning.
That’s the big vision. We just have to keep executing. This is all volunteer work, so it’s not something where I can say, ‘Hey, let’s meet this schedule in two weeks. You have to do it.’ Instead, we just have to keep the momentum going. I am pretty pleased with the progress. The feedback from our internal surveys is positive.
Blyler: The goal of your best practices working group is to use existing languages like C/C++ to develop multicore applications. Do you see the need to create new programming languages?
Domeika: The Multicore Association has other working groups that are developing standards for new software approaches, such as those focused on multicore communications and runtime APIs. The overall plan is to incorporate best-known techniques using those APIs as we move forward. But it’s hard to predetermine the best-known methods before the APIs are available. We won’t know until we get there, but we can’t wait for one to proceed before starting the other.
Intel is working on several technologies – both language extensions and new APIs – so yes, there is a need for new technology; I’m not so sure about new programming languages. In embedded, C and C++ are going to be with us for some time, so I’d say there’s probably less need there.
Originally posted on Chip Design