Google Vertex AI SDK Flaw Could Let Attackers Hijack Machine Learning Model Uploads


Google has fixed a security flaw in the Vertex AI SDK for Python that could let an attacker hijack a machine learning model upload and run code in Google’s serving infrastructure. The issue, disclosed by Palo Alto Networks Unit 42, affected how the SDK handled temporary Cloud Storage staging buckets during model uploads.

The bug affected the google-cloud-aiplatform package when developers relied on the default staging bucket behavior. Google completed the fix in version 1.148.0, released on April 15, 2026, and developers should update to that version or later.

The attack did not require access to the victim’s Google Cloud project. According to the researchers, an attacker needed a Google Cloud project of their own and the victim’s project ID, and had to win a race during the model upload process.

Why the Default Staging Bucket Created Risk

The flaw centered on predictable bucket naming. Google Cloud Storage bucket names live in a shared global namespace, and the Cloud Storage bucket documentation states that every bucket name must be globally unique.

When a developer did not set a custom staging bucket, vulnerable SDK versions generated a default bucket name from the project ID and region. The Model.upload documentation includes a staging_bucket parameter, but the risky path appeared when that parameter was left unset.

The SDK checked whether the generated bucket name existed, but it did not verify that the bucket belonged to the victim’s project. That allowed an attacker to create the expected bucket first and wait for the victim’s SDK to upload model files into the attacker-controlled bucket.

| Issue | What happened | Security impact |
| --- | --- | --- |
| Predictable bucket name | The SDK generated a staging bucket name from project ID and region. | An attacker could guess and pre-create the bucket. |
| Missing ownership check | The SDK checked whether the bucket existed, not whether the user owned it. | The victim could upload model artifacts to an attacker-owned bucket. |
| Model replacement window | The attacker had a short period to swap the uploaded model. | A poisoned model could run code during deployment. |
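
The gap between "the bucket exists" and "the bucket is mine" can be shown with a toy simulation. The sketch below does not call Cloud Storage or the SDK. The dictionary standing in for the global namespace, the name format, and every function name are assumptions made for illustration.

```python
# Toy model of Cloud Storage's shared namespace: bucket name -> owning project.
# The name format and helper names are hypothetical, not the SDK's real ones.
GLOBAL_BUCKETS: dict[str, str] = {}


def default_staging_bucket(project_id: str, region: str) -> str:
    """Build a predictable name from values an outsider can learn or guess."""
    return f"{project_id}-staging-{region}"


def resolve_existence_only(project_id: str, region: str) -> str:
    """Vulnerable pattern: reuse the name if any project already holds it."""
    name = default_staging_bucket(project_id, region)
    if name not in GLOBAL_BUCKETS:
        GLOBAL_BUCKETS[name] = project_id  # create it in the caller's project
    return name


def resolve_with_ownership_check(project_id: str, region: str) -> str:
    """Safer pattern: refuse a bucket that belongs to a different project."""
    name = default_staging_bucket(project_id, region)
    owner = GLOBAL_BUCKETS.setdefault(name, project_id)
    if owner != project_id:
        raise PermissionError(f"{name} is owned by another project")
    return name


# The attacker squats on the name before the victim's first upload.
squatted = default_staging_bucket("victim-project", "us-central1")
GLOBAL_BUCKETS[squatted] = "attacker-project"

chosen = resolve_existence_only("victim-project", "us-central1")
upload_destination_owner = GLOBAL_BUCKETS[chosen]  # the attacker's project
```

In this model the existence-only resolver hands back the squatted bucket without complaint, while `resolve_with_ownership_check` raises `PermissionError` for the same inputs.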

How the Pickle in the Middle Attack Worked

Unit 42 named the technique Pickle in the Middle because the proof of concept used Python model files that relied on pickle or joblib serialization. The Python pickle documentation warns that malicious pickle data can execute arbitrary code when it is loaded.
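
The pickle warning is easy to reproduce with a harmless payload. The following is a minimal sketch, not the Unit 42 proof of concept: the class name and the arithmetic expression are invented for illustration, and the "payload" only evaluates `6 * 7`.

```python
import pickle


class HarmlessPayload:
    """Stand-in for a tampered model object (hypothetical name)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls it on load.
        # A real attack would put something far worse than arithmetic here.
        return (eval, ("6 * 7",))


serialized = pickle.dumps(HarmlessPayload())

# Loading does not rebuild a HarmlessPayload; it runs eval("6 * 7").
loaded = pickle.loads(serialized)
```

The point is that `pickle.loads` runs whatever callable the file names, so whoever controls the bytes controls what executes in the loading process.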

In the Unit 42 research, the attacker created the predicted bucket in advance and configured it so the victim could upload to it. When the victim uploaded a legitimate model, a Cloud Function triggered by the upload replaced the file with a malicious model before Vertex AI read it.

The timing window was tight. Unit 42 measured about 2.5 seconds between the model upload and the Vertex AI service agent reading the file. In the proof of concept, the replacement happened after about 1,433 milliseconds, leaving roughly one second of margin before the read.

  • The victim uploaded a model through the SDK without setting staging_bucket.
  • The SDK used the attacker-owned bucket because the expected bucket name already existed.
  • A Cloud Function detected the upload and replaced the model artifact.
  • Vertex AI later loaded the poisoned model during deployment.
  • The malicious payload executed inside the serving container.
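
The sequence above is a time-of-check to time-of-use race, and the reported timings can be replayed in a few lines. This is a simulation only: the object name `model.pkl`, the event format, and the `replay` helper are assumptions, and nothing here touches Cloud Storage or Vertex AI.

```python
# Timeline in seconds after the victim's upload, using the article's figures.
UPLOAD_AT = 0.0
SWAP_AT = 1.433  # replacement time reported for the proof of concept
READ_AT = 2.5    # approximate time the service agent read the file


def replay(events):
    """Apply (time, action, value) events in time order; return what reads saw."""
    bucket = {}
    reads = []
    for _, action, value in sorted(events, key=lambda event: event[0]):
        if action == "write":
            bucket["model.pkl"] = value  # hypothetical object name
        elif action == "read":
            reads.append(bucket.get("model.pkl"))
    return reads


events = [
    (READ_AT, "read", None),
    (UPLOAD_AT, "write", "legitimate model"),
    (SWAP_AT, "write", "poisoned model"),
]

seen_by_service = replay(events)
margin_seconds = READ_AT - SWAP_AT  # time to spare before the read
```

Because the swap lands before the read, the reader only ever sees the replaced object; the victim's original upload leaves no trace in what gets deployed.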

What the Attacker Could Access

The proof of concept showed that the payload could query the Google Compute Engine metadata server from inside the serving container. It then extracted service account details and sent the OAuth token to an attacker-controlled endpoint.

In Unit 42’s test environment, the token had access beyond the single compromised deployment. The researchers said it could reach other model artifacts in the same Google-managed tenant project, including a full TensorFlow model with trained weights.

The same token also exposed information that could support follow-on attacks, including BigQuery metadata, dataset access lists, tenant logs, Google Kubernetes Engine cluster names, internal container image paths, and Kubernetes identities.

| Potential exposure | Why it matters |
| --- | --- |
| Model artifacts | Attackers could steal trained model files from other deployments in the same tenant project. |
| BigQuery metadata | Dataset names and access lists could reveal sensitive data structure and service identities. |
| Tenant logs | Logs could reveal internal deployment and infrastructure details. |
| GKE and container paths | Infrastructure names could help attackers map the environment for later movement. |

Who Needs to Update

Developers and security teams should check every environment where the Vertex AI SDK for Python runs. That includes notebooks, CI jobs, training pipelines, automation scripts, and production services.

The PyPI release history shows version 1.148.0 was released on April 15, 2026, with newer versions available afterward. Any environment running older versions should be treated as needing review.
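
One way to audit an environment is to compare the installed package version against 1.148.0. The sketch below uses only the standard library. The helper names are hypothetical, and the simplified parser treats a pre-release such as `1.148.0rc1` as 1.148.0, so a stricter audit may prefer a dedicated version-parsing library.

```python
import re
from importlib import metadata

FIXED_VERSION = (1, 148, 0)  # first release with the full fix, per the article


def parse_version(text: str) -> tuple:
    """Return the leading numeric release segments, padded to three parts."""
    parts = []
    for segment in text.split(".")[:3]:
        match = re.match(r"\d+", segment)
        if not match:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_text: str) -> bool:
    """True when the version is 1.148.0 or later."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_sdk_status(package: str = "google-cloud-aiplatform") -> str:
    """Describe the SDK installed in the current Python environment."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"
    if is_patched(installed):
        return f"patched ({installed})"
    return f"needs upgrade ({installed})"
```

Running `installed_sdk_status()` inside each notebook kernel, CI image, and service environment gives a quick first pass, since each may pin a different version.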

Teams should also set an explicit staging bucket that they control. Google Cloud’s Vertex AI SDK reference documents staging_bucket as an optional parameter for model uploads, and setting it removes reliance on the risky default behavior in older SDK versions.
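
A minimal sketch of an upload that always names its staging bucket might look like the following. It assumes google-cloud-aiplatform is installed and credentials are configured; the project, bucket, and image values a caller passes in are placeholders, and both helper functions are hypothetical wrappers rather than part of the SDK.

```python
def explicit_upload_settings(project: str, location: str, staging_bucket: str) -> dict:
    """Validate and bundle settings so the staging bucket is never left unset."""
    prefix = "gs://"
    if not staging_bucket.startswith(prefix) or len(staging_bucket) == len(prefix):
        raise ValueError("staging_bucket must look like gs://your-bucket-name")
    return {
        "project": project,
        "location": location,
        "staging_bucket": staging_bucket,
    }


def upload_with_explicit_bucket(settings: dict, display_name: str,
                                artifact_uri: str, serving_image_uri: str):
    """Upload a model while passing the staging bucket explicitly."""
    # Imported here so the validation helper works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(**settings)
    return aiplatform.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,
        serving_container_image_uri=serving_image_uri,
        staging_bucket=settings["staging_bucket"],
    )
```

Failing fast on a missing or malformed bucket keeps a misconfigured pipeline from silently falling back to whatever default the installed SDK version would choose.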

  • Upgrade google-cloud-aiplatform to version 1.148.0 or later.
  • Use a controlled Cloud Storage location for staging model artifacts.
  • Audit notebooks, local developer machines, CI runners, and training jobs.
  • Review any model upload workflows that use default SDK settings.
  • Avoid loading pickle or joblib files from untrusted or tampered sources, as the Python documentation recommends.
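
For the last point, the Python documentation describes restricting which globals an unpickler may resolve. The sketch below follows that pattern with a small allowlist chosen for illustration. Real model files typically reference many library classes, so this approach limits exposure for simple data but is no substitute for trusted storage.

```python
import builtins
import io
import pickle

# Illustrative allowlist; a real one depends on what the files must contain.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global outside the allowlist."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Counterpart to pickle.loads that applies the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data and allowlisted builtins still round-trip.
safe_value = restricted_loads(pickle.dumps([1, 2, range(3)]))
```

A payload whose `__reduce__` names `eval`, `os.system`, or any other callable outside the allowlist raises `pickle.UnpicklingError` before it is called.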

Google’s Fix and the Earlier Vertex AI Bug

Google first changed the bucket naming routine in version 1.144.0 by adding a random UUID to the bucket name. The full fix arrived in version 1.148.0, where the python-aiplatform changelog says Google added bucket ownership verification to prevent bucket squatting in Model.upload().

The same SDK changelog lists version 1.148.0 as an April 15, 2026 release. That matches Unit 42’s disclosure timeline, which says the second fix reached production that day.

This issue is separate from CVE-2026-2473, which affected Vertex AI Experiments. Google’s security bulletin describes that earlier flaw as a predictable bucket naming issue affecting versions 1.21.0 up to, but not including, 1.133.0.

The earlier Google Cloud bulletin says CVE-2026-2473 could allow cross-tenant remote code execution, model theft, and poisoning through bucket squatting. Google says no customer action was needed for that separate issue because mitigations had already been applied.

Why This Matters for AI Security

The flaw shows how AI security risks can start in normal developer workflows, not only in deployed models or public APIs. A model upload process that looked routine could become a supply chain path into managed serving infrastructure.

It also highlights why bucket naming, artifact staging, and model serialization deserve security review. In AI pipelines, a model file can act like code if the runtime deserializes it through formats such as pickle or joblib.

For organizations using Vertex AI, the priority is clear: update the SDK, stop relying on default staging behavior in older environments, and make sure model artifacts move only through trusted storage locations.

FAQ

What was the Google Vertex AI SDK flaw?

The flaw affected how the Vertex AI SDK for Python selected a default Cloud Storage staging bucket during model uploads. Older versions used predictable bucket names and did not verify bucket ownership, which could let an attacker hijack the upload path.

Which Vertex AI SDK version fixes the issue?

Google completed the fix in google-cloud-aiplatform version 1.148.0, released on April 15, 2026. Developers should update to version 1.148.0 or later.

What did an attacker need to exploit the flaw?

An attacker needed their own Google Cloud project, the victim’s project ID, and a victim workflow that uploaded a model without setting an explicit staging_bucket. The victim’s default staging bucket also had to be absent in that region.

Why did pickle make the attack more dangerous?

Pickle and joblib files can execute code when loaded if they contain malicious serialized data. In the proof of concept, the attacker replaced the victim’s uploaded model with a malicious serialized model that executed during deployment.

How can developers reduce the risk?

Developers should upgrade google-cloud-aiplatform to version 1.148.0 or later, set an explicit staging_bucket that they control, audit all SDK usage across notebooks and pipelines, and avoid loading untrusted pickle or joblib artifacts.
