Sunday 26 February 2012

The Qualcomm Snapdragon S4 (Krait) Preview Part II

by Anand Lal Shimpi on 2/22/2012 11:40:00 PM
Posted in Smartphones, Tablets, NVIDIA, Qualcomm, Krait, Snapdragon, Tegra 3

Yesterday we presented the first results of Qualcomm's Krait-based MSM8960 SoC. While we still await the first Krait-based phones (widely expected to begin shipping sometime in Q2), Qualcomm's MSM8960 Mobile Development Platform (MDP) gave us a good idea of the upper bound for Krait and MSM8960 performance. I call it the upper bound because, at least in the past, MDP performance hasn't corresponded directly to shipping device performance. There was a pretty big delta between MSM8660 MDP performance and phones that used the MSM8660.

Qualcomm tells us that this time around things are going to be different. Qualcomm is expecting a much narrower (nonexistent?) gap between the MSM8960 development platform and phones that use MSM8960 silicon. One major difference between the MSM8960 MDP and our earlier MSM8660 MDP was the state of the CPU governor. In the earlier MDP the governor was set to max performance, always delivering the CPU's maximum clock frequency. With the MSM8960 platform the governor was set to ondemand, allowing CPU speed to vary with the load the OS places on the device. The ondemand setting is in line with what we can expect device manufacturers to use when they ship phones. All of this goes to say that while we have a good handle on what Krait and the MSM8960 are capable of, there are still a lot of unknowns.
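For readers who want to check which governor their own device is running, here is a minimal sketch. It assumes a Linux or Android kernel that exposes the standard cpufreq sysfs files under /sys/devices/system/cpu; the function name read_governors is our own, and this is not the tooling Qualcomm or we used on the MDPs.

```python
from pathlib import Path

# Standard Linux cpufreq location; on Android this is usually read via `adb shell`.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governors(root=CPUFREQ_ROOT):
    """Return {cpu_name: governor} for every CPU that exposes a cpufreq directory.

    Cores that are currently offline may not expose the file and are skipped.
    """
    governors = {}
    for gov_file in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors
```

Calling read_governors() with no arguments returns a mapping such as cpu0 to "ondemand" on a device configured like the MSM8960 MDP, or "performance" on one configured like the earlier MSM8660 MDP. The root parameter exists only so the function can be pointed at a different directory for testing.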

While it's true that shipping performance remains to be seen, some of the deltas we saw between MSM8960 and the current competition were so great that even a much slower implementation in a shipping phone would still be significantly faster than anything else out today.

We left our MSM8960 investigation with two major unknowns. The first was power consumption. We still haven't been able to get Qualcomm's Trepn tool running on the MSM8660 MDP, which has always been a bit finicky. To get a true feel for MSM8960 battery life we will have to wait for shipping devices. The other major unknown was really how MSM8960 stacks up against NVIDIA's Tegra 3.

Tegra 3 was everything Tegra 2 should have been. We got higher clocks, NEON support and a much faster GPU. The only thing missing from Tegra 3 was a dual channel memory interface. We were happy with Tegra 3 on ASUS' Eee Pad Transformer Prime, but in less than a week we'll get to meet some of the first smartphones based on T3 silicon.

Armed with the Eee Pad Transformer Prime (updated to Ice Cream Sandwich) we're able to get a rough idea of how these two heavyweights will compare. The same caveats that applied to the MDP apply to our Tegra 3 platform as well. Since we are using a tablet we're obviously dealing with a higher TDP than what you'll find in a phone. The comparison today is largely academic, and naturally shipping devices may be better or worse than these two representatives. With the disclaimers out of the way, let's get to the comparison.

CPU Performance: Preferring Single vs. Multithreaded Performance

The MSM8960 features two Krait cores compared to the four ARM Cortex A9 cores in NVIDIA's Tegra 3. While the A9 is a very power efficient core, Krait offers a much wider front end, wider execution back end, faster FPU and an improved cache/memory interface. All of these factors, combined with clock speeds similar to what Tegra 3 is able to hit, should result in better absolute performance in single or lightly threaded applications. As video decode and transcode are both fully offloaded in all modern SoCs, finding workloads that scale well across more than two cores is difficult. We noted this in our Eee Pad Transformer Prime review: it's just not easy coming up with current apps that scale well to four ARM cores. That's not to say that there are no advantages to more than two cores, but you're more likely to get a benefit from two faster cores vs. four slower ones.
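The two-fast-cores versus four-slow-cores tradeoff can be illustrated with Amdahl's law. The sketch below is a simplified model, not a measurement: the 1.5x per-core advantage for the faster core is a hypothetical figure chosen for illustration, and the function names are ours.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over a single core when only `parallel_fraction` of the work scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_throughput(per_core_speed, cores, parallel_fraction):
    """Throughput relative to one baseline core running the whole workload."""
    return per_core_speed * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    FAST_CORE = 1.5  # hypothetical per-core advantage, not a measured value
    SLOW_CORE = 1.0
    for p in (0.0, 0.25, 0.5, 0.75, 0.9, 1.0):
        two_fast = relative_throughput(FAST_CORE, 2, p)
        four_slow = relative_throughput(SLOW_CORE, 4, p)
        print(f"parallel fraction {p:.2f}: 2 fast = {two_fast:.2f}, 4 slow = {four_slow:.2f}")
```

Under these assumed numbers the two configurations break even when 80% of the work is parallel. Below that point the two faster cores come out ahead, and only workloads that are more parallel than that favor the four slower cores.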

 

 

NVIDIA's saving grace is the fact that it did ramp up A9 clock speed very high in Tegra 3, and it has that handy companion core 4-PLUS-1 architecture to keep power consumption low during very light workloads. There's also the fact that while very few smartphone apps will peg four cores constantly, there are periods of time when you'll see more than two cores in use. Multitasking, although more likely to happen in significant amounts on a tablet, can also increase usage of the third and fourth cores on Tegra 3.

We'll start with Linpack, our heaviest floating point/cache/memory bandwidth test:

Linpack - Single-threaded

Single threaded floating point performance is obviously a strength of the MSM8960 and Krait. Qualcomm tells us that Krait is able to multi-issue floating point instructions, something that the Cortex A9 cannot do. The MSM8960 memory controller also appears to be more efficient than previous designs, contributing to the magnitude of the win here.

Move to more threads and the situation doesn't change dramatically, although Tegra 3 is obviously far more competitive thanks to its sheer core count:

Linpack - Multi-threaded

JavaScript performance can be multithreaded at times but most of the benchmarks we run don't scale incredibly well beyond two cores. Making matters worse is the fact that SunSpider performance regressed on the Eee Pad Transformer with the latest update to ICS. I've included the old Honeycomb results as a reference for where things should be. Keep in mind that the Honeycomb browser on the Eee Pad Transformer was very heavily optimized for Tegra 3. It's possible that the same degree of optimization just isn't present in the ICS version yet.

SunSpider JavaScript Benchmark 0.9.1 - Stock Browser

BrowserMark tells a different story. Here the Tegra 3 based Transformer Prime is actually slightly faster than the MSM8960. The margin of victory is small enough to be a wash, but the fact that NVIDIA is able to remain competitive is important.

BrowserMark

Basemark OS echoes more of what we'd expect. In the overall score the MSM8960 is around 50% faster than the Tegra 3 based tablet. Even if the MSM8960 MDP is unrealistically fast for a Krait platform, it's likely that we'll still see a Krait advantage.

Basemark OS - System

Test                     HTC Rezound    Galaxy Nexus   ASUS Transformer Prime   MDP MSM8960
System Overall Score     658            538            602                      907
Simple Java 1            298 loops/s    210 loops/s    240 loops/s              375 loops/s
Simple Java 2            7.28 loops/s   8.61 loops/s   7.27 loops/s             10.8 loops/s
SMP Test                 35.3 loops/s   49.2 loops/s   81.2 loops/s             64.4 loops/s
100K File (eMMC->SD)     6.49 MB/s      9.52 MB/s      11.0 MB/s                8.64 MB/s
100K File (SD->eMMC)     33.0 MB/s      17.8 MB/s      14.5 MB/s                39.8 MB/s
100K File (eMMC->eMMC)   37.8 MB/s      34.5 MB/s      29.7 MB/s                48.9 MB/s
100K File (SD->SD)       8.47 MB/s      8.30 MB/s      8.06 MB/s                12.7 MB/s
Database Operation       10.0 ops/s     5.73 ops/s     4.56 ops/s               19.4 ops/s
Zip Compression          0.509 s        0.848 s        0.637 s                  0.561 s
Zip Decompression        0.097 s        0.206 s        0.089 s                  0.073 s

Most of the Basemark tests are lightly threaded, but looking at the SMP test gives you another example of Tegra 3's strengths given the right workload. With the right application, Tegra 3 can be faster than the MSM8960. However, it's still our opinion that you're more likely to find a lightly threaded workload on a smartphone than to encounter something that scales well to four cores.
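The relative margins quoted above can be reproduced from the Basemark OS table with a few lines of arithmetic. This sketch copies only the higher-is-better rows for the two platforms being compared; the dictionary layout and the helper name percent_faster are our own.

```python
# Higher-is-better Basemark OS results copied from the table above:
# (ASUS Transformer Prime, MDP MSM8960)
RESULTS = {
    "System Overall Score": (602, 907),
    "Simple Java 1 (loops/s)": (240, 375),
    "Simple Java 2 (loops/s)": (7.27, 10.8),
    "SMP Test (loops/s)": (81.2, 64.4),
    "Database Operation (ops/s)": (4.56, 19.4),
}


def percent_faster(a, b):
    """How much faster a is than b, in percent (negative if a is slower)."""
    return (a / b - 1.0) * 100.0


if __name__ == "__main__":
    for test, (tegra3, msm8960) in RESULTS.items():
        delta = percent_faster(msm8960, tegra3)
        leader = "MSM8960" if delta > 0 else "Tegra 3"
        print(f"{test}: MSM8960 vs. Tegra 3 = {delta:+.1f}% ({leader} leads)")
```

The overall score works out to roughly a 50% lead for the MSM8960, while the SMP test is the one row here where the sign flips in Tegra 3's favor.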

45 Comments

Great review! by wimbet on Thursday, February 23, 2012
Thanks for posting comparisons with Tegra 3. It will be real interesting to see how OMAP4470 and Exynos4412 match up. I have a feeling we will see a lot more of OMAP5 and Exynos5250 at MWC as well.

When? by AmdInside on Thursday, February 23, 2012
When are Krait 4 phones due? Still a while before my plan expires but just curious.

RE: When? by infra_red_dude on Thursday, February 23, 2012
Apparently MWC will see the launch of MSM8960 consumer devices.

Justice by douglaswilliams on Thursday, February 23, 2012
"I do hope the device vendors do these SoCs justice."

"Will Moore's Law, and the 28nm LP process in particular, be enough to offset the power consumption of a higher performance Krait core under full load? Depending on how conservative device makers choose to build their power profiles we may get varying answers to this question."

Anand, perhaps some justice could lie in allowing user selectable power profiles, as on laptops. Let the user jump to a performance profile while playing a game or plugged in. Is that a possibility? Or will they just attempt to do that automatically in their stock power profile?

RE: Justice by Wishmaster89 on Thursday, February 23, 2012
Asus already did something like that with transformer prime so there's a possibility that with Krait powered padfone they could do the same thing. And don't forget that up till now Asus was quite good when it came to optimized software. So I have high hopes for Krait based Asus padfone with LTE radio.
Perfect when connected to the tablet docking station.

RE: Justice by Pipperox on Thursday, February 23, 2012
It already happens, sort of.
Any CPU governor will lower the CPU clock for light workloads, and max it out for games.
It's for the intermediate situations where you can see a big difference.

Anyway, this will be used on Android phones.
Hence, it'll be rooted in the blink of an eye, and custom kernels will offer multiple choices to the users concerning governors, so battery or performance optimized profiles.

who cares by ratn9ne on Thursday, February 23, 2012
At&T doesn't even sell the galaxy nexus yet... so I expect this to be out sometime 2015.

Sunspider perf by Loki726 on Thursday, February 23, 2012
Anand,

Can you comment on the Tegra 3 perf difference on sunspider in this review compared to your previous transformer prime review? This figure shows a score of 2300, and the previous figure in the transformer prime review shows a score of 1600. That's a pretty significant difference. Is there some change in the configuration that can explain this?

I saw that you mentioned a regression going from honeycomb to icecream sandwich, but then you say that you included the faster honeycomb results.

Thanks

RE: Sunspider perf by rahvin on Friday, February 24, 2012
Without looking at the previous review, this review was clear that the Transformer had recently been upgraded to ICS. For those of you that haven't used ICS yet, it's significantly faster than Gingerbread on the same hardware.

UE 3 benchmarks by Gideon on Thursday, February 23, 2012
"Oh the things I would do for an Unreal Engine 3 benchmark on Android..."

Second that !

I don't think I have seen a single review/preview of a phone on Anandtech for the last year that doesn't include the same message. Hopefully the devs will finally notice.

Copyright © 1997-2012 AnandTech, Inc. All rights reserved.
