Equinox IT Blog

Is generative AI about to have its PC Moment?


 We may be seeing the start of another shift in generative AI: away from predominantly cloud-focused inference and towards a more hybrid ecosystem, with viable on-premises and portable AI compute options starting to land.


AI-generated image (OpenAI DALL-E)

This year has seen a steady stream of NVIDIA GB10-based “AI supercomputer” announcements, with examples from MSI, GIGABYTE and ASUS now available for pre-order in New Zealand. Each delivers ~1,000 TOPS (FP4)* with 128 GB of unified memory, though on comparatively slow LPDDR5X at ~273 GB/s.
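A back-of-envelope calculation shows why that ~273 GB/s figure matters more than the headline TOPS for local LLM use: single-stream text generation is typically memory-bandwidth-bound, since the model weights must be read once per generated token. The sketch below uses an illustrative 70B-parameter model quantised to 4 bits; the figures are assumptions for demonstration, not vendor benchmarks.

```python
# Back-of-envelope estimate of LLM decode throughput on bandwidth-bound
# hardware. Single-stream decoding reads (roughly) all model weights per
# token, so tokens/sec is capped at bandwidth / model size.

def est_tokens_per_sec(params_billion: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """Rough upper bound on tokens/sec for single-stream decoding."""
    model_gb = params_billion * bytes_per_param  # weights read per token
    return bandwidth_gb_s / model_gb

# Illustrative: a 70B model at 4-bit (0.5 bytes/param) is ~35 GB, so at
# ~273 GB/s the ceiling is roughly 7-8 tokens/sec.
print(round(est_tokens_per_sec(70, 0.5, 273), 1))
```

On this simple model, the 128 GB of unified memory determines which models fit at all, while the memory bandwidth, not the TOPS rating, largely determines how fast they generate.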

Dell has also flagged a GB10-based Pro Max, while reports suggest some variants of its Pro Max 18 Plus laptop may ship with a discrete Qualcomm AI 100 Inferencing Card, estimated at ~450 TOPS (INT8)* with 64 GB of dedicated memory. That is a major step up from today’s laptop NPUs, which mostly sit in the 38–50 TOPS (INT8)* range.

Meanwhile, a rumoured NVIDIA-MediaTek ARM-based N1 laptop and N1-X desktop SoC promises 180–200 TOPS (INT8)* and up to 128 GB of unified memory, targeting the thin-and-light segment, though timelines appear to have slipped more than once.

This is not a new thing

We have seen this kind of shift before: from mainframes to PCs, and later to a hybrid of edge devices connected to the cloud.

Generative AI first made its mark in the cloud, but with open-weight models enabling local inference and new classes of on-premises and portable AI compute becoming available, it could be about to have its own “PC moment.”

If that is the case, the implications could be significant. Organisations may begin to rely less on hyperscaler-only AI compute and more on a hybrid of local and cloud AI compute.

What happens now?

Such a shift could reshape the economics, security and governance of AI investments, with procurement conversations moving beyond cloud consumption costs to also include fleet refresh decisions.

At the same time, hyperscalers are not about to walk away from their specialised infrastructure, custom silicon and cost optimisation at scales most organisations cannot match. For many workloads, the economics of cloud-based AI, with its elasticity and fully managed services, will likely remain compelling.

Organisations that correctly anticipate whether, and when, a shift to more hybridised LLM inference happens will be better positioned to optimise their AI investments.

 

*: FP4 and INT8 TOPS are not directly comparable. They reflect vendor-quoted peak performance in different precision modes. Memory bandwidth also varies significantly and can strongly influence real-world throughput.

 

Download From Overspend to Advantage whitepaper

Cloud spending continues to surge globally, but most organisations haven’t made the changes necessary to maximise the value and cost-efficiency benefits of their cloud investments. Download the whitepaper From Overspend to Advantage to learn about our proven approach to optimising cloud value.

 
