Qualcomm held a one-day analyst event in San Diego updating us all on its AI research. Pretty amazing stuff, but the big news is yet to come, with a new Oryon-based Snapdragon expected this fall and perhaps a new Cloud AI 100 next year.
Qualcomm was pretty confident last week that its AI edge at, well, the edge is strong and getting stronger. Most of what the company covered was already known, and we have covered much of it here and elsewhere. But there were also a few hints of big updates to come.
Qualcomm AI: Bigger is better if you can make it smaller
While the rush is on to make AI ever larger, with mixture-of-experts models exceeding one trillion parameters, Qualcomm has been busy squeezing these massive models down so they can fit on a mobile device, in a robot, or in a car. Qualcomm argues that you can always fall back on the cloud when you need a larger model, but the pot of gold is the device in your hand: your phone.
Qualcomm has invested in five areas that enable these massive models to slim down. Most AI developers know about quantization and compression; distillation is newer and really cool: a small "student model" is trained to mimic a much larger teacher model while remaining compact enough to run on a phone. Speculative decoding is getting a lot of traction as well. Add it all up, and smaller models can be far more affordable than the massive ones while still delivering the quality needed.
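For readers who want to see what distillation looks like in practice, here is a minimal sketch of the classic knowledge-distillation loss in PyTorch. To be clear, the temperature, loss weighting, and toy tensors below are our own illustrative choices, not Qualcomm's actual training recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between the softened student and teacher
    # distributions; the T^2 factor keeps gradient magnitudes comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Smoke test with random logits: batch of 4, vocabulary of 10.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student, teacher, labels)
loss.backward()  # only the student receives gradients; the teacher is frozen
```

The key design point is that the big teacher runs only in inference mode, while the small student, the model that actually ships on the phone, is the only network being updated.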
Qualcomm showed off data claiming that an optimized 8B-parameter Llama 3 model can yield the same quality as a 175B-parameter GPT-3.5 Turbo model.
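Much of that optimization comes from quantization. Here is a back-of-envelope sketch of symmetric INT8 weight quantization; it is purely illustrative, and production flows such as Qualcomm's AIMET toolkit are far more sophisticated:

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Symmetric per-tensor quantization: one scale maps floats to [-127, 127].
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)  # stand-in for one transformer weight matrix
q, scale = quantize_int8(w)
print("mean abs error:", (w - dequantize(q, scale)).abs().mean().item())
# INT8 halves memory versus FP16: an 8B-parameter model shrinks from
# roughly 16 GB of weights to about 8 GB, and to about 4 GB at 4-bit.
```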
So, all of this AI is available to developers on the Qualcomm AI Hub, which we covered here, and it runs remarkably fast on Snapdragon 8 Gen 3-powered phones. Spokespeople said this tiny chip, which draws less power than an LED lightbulb, can generate AI imagery 30 times more efficiently than data center infrastructure in the cloud.
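For the curious, getting a model onto a device through AI Hub is a short Python workflow. The sketch below follows the qai_hub client's documented compile-and-profile pattern as we understand it; the device name is an illustrative assumption, and you need an AI Hub account and API token for it to run:

```python
# Sketch of the qai_hub compile-and-profile flow; device names and the
# exact API surface evolve, so check the AI Hub documentation.
import qai_hub as hub
import torch
import torchvision

# Trace a small off-the-shelf model (an illustrative choice, not Qualcomm's demo).
model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Compile for a Snapdragon 8 Gen 3 class handset (device name is an assumption).
device = hub.Device("Samsung Galaxy S24 (Family)")
compile_job = hub.submit_compile_job(
    model=traced,
    device=device,
    input_specs=dict(image=(1, 3, 224, 224)),
)

# Profile the compiled model on real hosted hardware to get on-device latency.
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=device,
)
```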
More interestingly, Qualcomm confirmed that it will announce the next step in Snapdragon SoCs this fall, and that it will be based on the same Oryon cores that power its laptop offering, the Snapdragon X Elite. Stay tuned!
The Data Center: Qualcomm is just getting started
The Cloud AI 100 Ultra has been racking up wins of late, with support from nearly every server vendor as well as public clouds like AWS. Cerebras, the company that brought us the Wafer Scale Engine, has chosen the Cloud AI 100 Ultra as the preferred inference platform for models trained on its systems. And NeuReality selected the Cloud AI 100 Ultra as the deep learning accelerator in its CPU-less inference appliance.
The reason for all this attention is simple: the Cloud AI 100 runs all the AI applications you might need at a small fraction of the power its competitors consume. And the PCIe card can run models of up to 100B parameters, thanks in part to its larger on-card DRAM. The net-net: the Qualcomm Cloud AI 100 Ultra delivers two to five times the performance per dollar of its competitors across generative AI, LLM, NLP, and computer vision workloads.
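A quick back-of-envelope calculation shows why that on-card DRAM matters. The roughly 128 GB figure below is our reading of the Ultra's spec, and the precisions are illustrative:

```python
# Weight footprint of a 100B-parameter model at common precisions.
# KV cache and activations consume additional memory on top of weights.
PARAMS = 100e9
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision}: {PARAMS * nbytes / 1e9:.0f} GB of weights")
# fp16: 200 GB -> would need multiple cards
# int8: 100 GB -> fits within ~128 GB of on-card DRAM
# int4:  50 GB -> comfortable fit, with headroom for the KV cache
```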
And for the first time that we are aware of, a Qualcomm engineer confirmed that the company is working on a new version of the AI 100, probably built on the same Oryon cores as the X Elite, technology Qualcomm acquired when it bought Nuvia. We expect this third generation of Qualcomm's data center inference engine to focus on generative AI. The Ultra has established a strong foundation, and the next-generation platform could add material business for the company.
Automotive
Qualcomm recently said its automotive business "pipeline" had increased to US$30 billion, thanks to its Snapdragon Digital Chassis. Up more than US$10 billion since the company announced its third-quarter results last July, that pipeline is more than twice the size of Nvidia's automotive pipeline, which Nvidia disclosed to be some US$14 billion in 2023.
Conclusions
We recently said that Qualcomm is becoming the juggernaut of AI at the edge, and the session with Qualcomm executives last week reinforced our position. The company has been researching AI for over a decade, having had the foresight to recognize that it could use AI to its advantage over Apple. Now it is productizing that research in silicon and software, and it has made it all available to developers on the new AI Hub. The addition of automotive and data center inference processing grows revenue in new markets and diversifies the company beyond its roots in modems and the Snapdragon mobile business. Finally, the company has doubled down on its Nuvia bet, in spite of the two-year-old litigation with Arm over licensing.
Disclosures: This article expresses the author's opinions and should not be taken as advice to purchase from or invest in the companies mentioned. Cambrian-AI Research is fortunate to have many, if not most, semiconductor firms as our clients, including Blaize, BrainChip, Cadence Design, Cerebras, D-Matrix, Eliyan, Esperanto, GML, Groq, IBM, Intel, NVIDIA, Qualcomm Technologies, SiFive, SiMa.ai, Synopsys, Ventana Micro Systems, Tenstorrent, and scores of investment clients. We have no investment positions in any of the companies mentioned in this article and do not plan to initiate any in the near future. For more information, please visit our website.