Fabric's Native Execution Engine: 6x Faster Spark With Zero Code Changes
The Challenge
If you run Spark workloads on Microsoft Fabric, you've probably noticed the same pattern I see with customers: data volumes keep growing, refresh windows keep shrinking, and the Spark jobs that ran fine six months ago now need constant tuning. Cluster scaling becomes the default answer, which means costs climb alongside the data.
The root cause isn't Spark itself — it's the JVM execution layer underneath it. Row-based processing, garbage collection pauses, and limited use of modern CPU instructions (SIMD) mean you're leaving performance on the table. That's not a controversial take; it's physics. Modern analytical data sits in columnar formats like Parquet and Delta. Processing it row-by-row is inherently wasteful.
Microsoft's answer is the Native Execution Engine — a C++ vectorised execution layer that sits beneath Spark and accelerates compute-heavy operations. No code changes. No new APIs. No additional cost. Just faster jobs.
What's Changed
The Native Execution Engine replaces Spark's traditional JVM execution path for supported operators with native C++ code built on two open-source projects: Velox (Meta's vectorised execution engine) and Apache Gluten (the bridge between Spark's query plans and Velox).
Here's how it works in practice:
- Spark builds the logical and physical plan as normal — all existing optimisations (adaptive query execution, predicate pushdown, column pruning) still apply.
- Gluten intercepts the physical plan and identifies which operators can run natively.
- Those operators execute in Velox using vectorised C++ kernels over columnar data.
- Anything unsupported falls back to standard Spark execution, with automatic columnar-to-row conversions at the boundary.
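The routing-and-fallback behaviour above can be sketched in plain Python. This is a deliberately simplified model, not the real Gluten code: the actual engine works on Spark physical plan trees, and the operator names and the NATIVE_OPERATORS set here are hypothetical. What it shows is the essential idea — supported operators go to the native path, anything else falls back to the JVM, and every crossing between the two worlds costs a columnar-to-row (or row-to-columnar) conversion.

```python
# Simplified, illustrative model of how a Gluten-style layer splits a
# physical plan into native and JVM segments. Operator names and the
# NATIVE_OPERATORS set are hypothetical; the real engine operates on
# Spark physical plan trees, not flat lists.

NATIVE_OPERATORS = {"Scan", "Filter", "Project", "HashAggregate", "HashJoin"}

def route_plan(operators):
    """Walk the plan, tag each operator with its execution path, and
    insert a conversion at every native/JVM boundary."""
    routed = []
    prev_native = None
    for op in operators:
        is_native = op in NATIVE_OPERATORS
        if prev_native is not None and is_native != prev_native:
            # Crossing the boundary costs a columnar<->row conversion.
            routed.append("ColumnarToRow" if prev_native else "RowToColumnar")
        routed.append(f"{'Velox' if is_native else 'JVM'}:{op}")
        prev_native = is_native
    return routed

plan = ["Scan", "Filter", "CustomUDF", "HashAggregate"]
print(route_plan(plan))
# The unsupported CustomUDF forces two conversions around the JVM segment.
```

This is also why the fallback diagnostics matter: a plan that bounces between native and JVM execution pays repeated conversion costs, so a single unsupported expression in the middle of a hot path can erode much of the gain.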
The key insight: this isn't a fork of Spark or a separate runtime. It's a drop-in acceleration layer. Your existing notebooks, pipelines, and applications work unchanged. You toggle a single config setting — spark.native.enabled — and the engine activates.
Where the Gains Come From
The performance story rests on three pillars:
Columnar execution — Instead of processing data row by row, the engine works on contiguous columns. This dramatically improves CPU cache efficiency because adjacent values in memory are the ones you're actually computing over.
SIMD vectorisation — Modern CPUs can apply a single instruction to 8, 16, or even 32 values simultaneously. The native engine exploits this at every opportunity — filters, projections, hash joins, aggregations. Traditional JVM Spark barely touches these instructions.
No GC overhead — C++ manages memory directly. No garbage collector means no unpredictable pauses, no stop-the-world events, and no memory pressure from object headers and boxing overhead.
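The columnar pillar can be illustrated with nothing but the standard library. Pure Python can't demonstrate SIMD itself — that happens inside the C++ kernels — but the data-layout difference it exploits is easy to show: a "row store" scatters each record across a separate dict object, while a "column store" keeps each field in one contiguous buffer of machine doubles, which is exactly the shape CPU caches and vector instructions like. The table and field names here are made up for the sketch.

```python
# Back-of-the-envelope illustration of row vs columnar layout.
# (Field names and data are invented for the example.)
from array import array

records = [{"price": float(i), "qty": float(i % 7)} for i in range(100_000)]

def total_row(rows):
    # Row layout: every access hops between scattered dict objects.
    return sum(r["price"] * r["qty"] for r in rows)

# Columnar layout: each column is one contiguous buffer of doubles.
prices = array("d", (r["price"] for r in records))
qtys = array("d", (r["qty"] for r in records))

def total_columnar(price_col, qty_col):
    # Same arithmetic, but over cache-friendly contiguous memory --
    # the access pattern a vectorised engine runs SIMD kernels over.
    return sum(p * q for p, q in zip(price_col, qty_col))

assert total_row(records) == total_columnar(prices, qtys)
```

In Velox the columnar buffers additionally avoid per-object headers and boxing entirely, which is where the third pillar (no GC overhead) compounds the first two.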
Microsoft's TPC-DS benchmarks at scale factor 1000 show queries running up to 6x faster than open-source Spark, which translates to roughly 83% compute cost savings on the same cluster. Real-world results vary, but even a 2-3x improvement changes the economics of your data platform.
The Fallback Safety Net
Not every Spark operator runs natively yet. When the engine hits an unsupported expression, it falls back to JVM execution — automatically, without failing the job. The new Spark Advisor integration surfaces these fallbacks in real time within your notebook, so you can see exactly which operations triggered them.
This is a smart design choice. Rather than blocking workloads that use unsupported features, the engine degrades gracefully. You get native speed where possible and standard Spark everywhere else.
Getting Started
Enabling the engine takes about 30 seconds:
At the environment level — Open your Fabric environment settings, find the acceleration option, and toggle native execution on. This applies to all Spark sessions in that environment.
At the session level — Add this to any notebook cell:
spark.conf.set("spark.native.enabled", "true")
To validate it's working, use the Spark History Server or DataFrame explain to check which operators ran natively. The Spark Advisor will flag any fallbacks with specific recommendations.
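A validation check might look like the following notebook fragment. Note this is a sketch, not something you can run outside Fabric: it assumes a live Fabric Spark session (the `spark` object) and a hypothetical Delta table path, and the exact operator names in the plan output vary by runtime version, so treat the pattern rather than the strings as the takeaway.

```python
# Notebook fragment -- requires a running Fabric Spark session ('spark').
# The table path is hypothetical; substitute one of your own.
spark.conf.set("spark.native.enabled", "true")

df = spark.read.format("delta").load("Tables/sales")
agg = df.groupBy("region").count()

# Inspect the physical plan. Operators that execute natively typically
# appear with Velox/Gluten-specific node names in place of the usual
# WholeStageCodegen stages; anything still on the JVM path is a
# candidate to investigate via the Spark Advisor's fallback warnings.
agg.explain(mode="formatted")
```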
For best results, start with your heaviest Spark jobs — the ones that already take 20+ minutes or cost the most CU-hours. Run them with and without native execution using Apache Spark Run Series to get a clean before-and-after comparison.
A few things to keep in mind:
- Parquet and Delta workloads see the biggest gains. If you're reading CSV or JSON, the engine still helps but the bottleneck is often parsing rather than compute.
- Complex aggregations and joins benefit most. Simple scans or I/O-bound work won't show dramatic improvement.
- Check the Gluten compatibility matrix for your specific operators. Coverage is broad but not yet complete.
What This Means
The Native Execution Engine is part of a broader shift in how platforms handle analytical compute. Rather than asking developers to rewrite workloads or learn new frameworks, the performance improvement happens at the infrastructure layer. That's the right abstraction — let engineers focus on the logic while the platform handles how that logic executes.
For organisations running Fabric at scale, this is the kind of improvement that changes budget conversations. A 2-6x speedup on existing workloads — with no migration effort — means either running the same jobs cheaper or running more ambitious jobs within the same budget. With FabCon 2026 approaching in Atlanta this month, expect to hear much more about native execution in production scenarios.
The engine is GA as part of Fabric Runtime 1.3. If you're not using it yet, there's no reason not to test it on your next pipeline run.
Leon Godwin, Principal Cloud Evangelist at Cloud Direct