Google provider

Standalone — usable in any Node project, no @warlock.js/core required.

@warlock.js/ai-google is the Google provider adapter for @warlock.js/ai. It wraps the @google/genai SDK (models.generateContent) and exposes Gemini as a vendor-neutral ModelContract — the same shape every agent, workflow, and supervisor already speaks. Works against both the Gemini Developer API and Vertex AI.

Install

npm install @warlock.js/ai @warlock.js/ai-google

yarn add @warlock.js/ai @warlock.js/ai-google

pnpm add @warlock.js/ai @warlock.js/ai-google

Construct

One GoogleSDK holds one live GoogleGenAI client. Build it once at boot and reuse it everywhere — every model and embedder it produces shares the same client and auth.

import { GoogleSDK } from "@warlock.js/ai-google";

const google = new GoogleSDK({
  apiKey: process.env.GEMINI_API_KEY!,
});

The config wraps GoogleGenAIOptions, so the whole object is forwarded to new GoogleGenAI(...). The common path is the Gemini API with an apiKey; the same options also drive Vertex AI:

const google = new GoogleSDK({
  vertexai: true,
  project: "my-proj",
  location: "us-central1",
});
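If one codebase has to run against either backend, the options object can be derived from the environment at boot. The sketch below is only an illustration: resolveGoogleOptions is a hypothetical helper, not part of this package, and the variable names GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT, and GOOGLE_CLOUD_LOCATION are assumed conventions you may need to adapt to your deployment.

```typescript
// Hypothetical helper: not part of @warlock.js/ai-google.
type GeminiApiOptions = { apiKey: string };
type VertexOptions = { vertexai: true; project: string; location: string };
type GoogleOptions = GeminiApiOptions | VertexOptions;

type Env = Record<string, string | undefined>;

function resolveGoogleOptions(env: Env): GoogleOptions {
  // Assumed convention: an explicit flag switches the client to Vertex AI.
  if (env.GOOGLE_GENAI_USE_VERTEXAI === "true") {
    const project = env.GOOGLE_CLOUD_PROJECT;
    const location = env.GOOGLE_CLOUD_LOCATION;
    if (!project || !location) {
      throw new Error("Vertex AI needs GOOGLE_CLOUD_PROJECT and GOOGLE_CLOUD_LOCATION");
    }
    return { vertexai: true, project, location };
  }

  // Default path: the Gemini Developer API with an API key.
  const apiKey = env.GEMINI_API_KEY;
  if (!apiKey) throw new Error("GEMINI_API_KEY is not set");
  return { apiKey };
}
```

The returned object has the same shape as the two configurations shown above, so it could be passed as new GoogleSDK(resolveGoogleOptions(process.env)).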

The provider option relabels the upstream (it defaults to "google"). The label flows through to AgentReport.model, logs, and provider-aware middleware.

First call

Build a model, hand it to an agent, run it. execute() never throws — failures land in error as a typed AIError.

import { ai } from "@warlock.js/ai";
import { GoogleSDK } from "@warlock.js/ai-google";

const google = new GoogleSDK({ apiKey: process.env.GEMINI_API_KEY! });

const assistant = ai.agent({
  model: google.model({ name: "gemini-2.5-flash" }),
  systemPrompt: "You are a concise senior TypeScript engineer.",
});

const { text, usage, error } = await assistant.execute("Why use generics?");

if (error) {
  console.warn(error.code, error.category);
} else {
  console.log(text, usage.total);
}

Use as a model

google.model({ name: "gemini-2.5-flash", temperature: 0.7 });
google.model({ name: "gemini-2.5-pro" });

Use as an embedder

const embedder = google.embedder({ name: "gemini-embedding-001" });
const { vector } = await embedder.embed("Hello world");
const { vectors } = await embedder.embedMany(["doc 1", "doc 2"]);

embedContent accepts an array natively, so embedMany is a single request — embeddings come back in input order. Pass dimensions to forward Gemini’s outputDimensionality truncation hint (supported by 2024+ models):

google.embedder({ name: "gemini-embedding-001", dimensions: 768 });

Embeddings report no token usage. Gemini’s embed endpoint returns no token counts, so usage on an embedding result is always { promptTokens: 0, totalTokens: 0 } — an honest absence, not a fabricated estimate.
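Because embedMany returns vectors in input order, index i of the result belongs to index i of the input. A minimal sketch of what that ordering enables, using plain number arrays in place of real embedder output; cosineSimilarity and rankBySimilarity are hypothetical helpers, not part of this package:

```typescript
// Hypothetical helpers; the vectors passed in stand in for real embedder output.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have equal length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function rankBySimilarity(query: number[], docs: string[], vectors: number[][]) {
  if (docs.length !== vectors.length) throw new Error("Expected one vector per document");
  // Pair each document with the vector at the same index, then sort best-first.
  return docs
    .map((doc, i) => ({ doc, score: cosineSimilarity(query, vectors[i]) }))
    .sort((x, y) => y.score - x.score);
}
```

In practice the query vector would come from embedder.embed and the document vectors from a single embedder.embedMany call over the same docs array.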

Capabilities

Tool calling: vendor-neutral ToolConfigs map to Gemini function declarations; calls round-trip as functionCall / functionResponse parts. (Thinking models’ opaque thoughtSignature is carried through automatically, so follow-up turns don’t fail with HTTP 400.)
Streaming: stream() runs generateContentStream and yields text deltas, a tool-call event per function call (Gemini emits a fully formed call, not partial JSON), and a terminal done event with the final finish reason and usage.
Structured output: on by default. A JSON Schema whose root is an object is forwarded to Gemini’s native structured output (responseMimeType: "application/json" + responseJsonSchema). When the schema cannot be forwarded natively, it degrades to the agent’s soft system-prompt hint plus client-side validation.
Vision: auto-detected from the model id substring. Every Gemini 1.5 and 2.x model (including 2.5) is treated as multimodal, as is the legacy gemini-pro-vision; only the original text-only gemini-1.0-pro is excluded. Override either way with the vision flag.
Embeddings: gemini-embedding-001, text-embedding-004, native batch (see above).
PDF + audio input: the multimodal Gemini families accept document and audio parts, both mapped to Gemini’s media-agnostic inlineData block (see below). pdf / audio mirror the vision inference; override either with google.model({ name, pdf: true, audio: false }).
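The inference-plus-override rule for vision, pdf, and audio can be pictured roughly as follows. This is a hypothetical restatement of the behavior described in the list, not the adapter's source: the substring markers and the resolveCapabilities name are assumptions made for illustration.

```typescript
// Hypothetical sketch of the inference rule; not the adapter's actual code.
type Capabilities = { vision: boolean; pdf: boolean; audio: boolean };

// Assumed markers: Gemini 1.5, any 2.x id, and the legacy vision model.
const MULTIMODAL_MARKERS = ["gemini-1.5", "gemini-2", "gemini-pro-vision"];

function inferMultimodal(modelName: string): boolean {
  const id = modelName.toLowerCase();
  return MULTIMODAL_MARKERS.some((marker) => id.includes(marker));
}

function resolveCapabilities(
  modelName: string,
  overrides: Partial<Capabilities> = {},
): Capabilities {
  const inferred = inferMultimodal(modelName);
  // pdf and audio mirror the vision inference; an explicit flag always wins.
  return {
    vision: overrides.vision ?? inferred,
    pdf: overrides.pdf ?? inferred,
    audio: overrides.audio ?? inferred,
  };
}
```

Under these assumptions, resolveCapabilities("gemini-2.5-flash", { audio: false }) keeps vision and pdf inferred as true while forcing audio off, and an unrecognized id resolves to all false unless a flag is passed.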

Image attachments accept inlined base64 bytes (with mediaType). Note: generateContent does not fetch arbitrary remote URLs, so a { source: { url } } PDF or audio part throws InvalidRequestError — resolve to base64 first.
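Resolving a remote file to base64 before building the part might look like the sketch below. toBase64Source is a hypothetical helper, not part of this package; it assumes Node 18+ (global fetch and Buffer), and the FetchLike type exists only so the download can be stubbed in tests.

```typescript
// Hypothetical helper: downloads a remote file and returns a base64 source.
type Base64Source = { base64: string; mediaType: string };

// Minimal slice of the fetch API this helper relies on.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

async function toBase64Source(
  url: string,
  fallbackMediaType: string,
  fetchImpl: FetchLike = fetch,
): Promise<Base64Source> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Download failed with HTTP ${response.status}`);
  }
  const bytes = Buffer.from(await response.arrayBuffer());
  // Drop parameters such as "; charset=binary" from the Content-Type header.
  const headerType = response.headers.get("content-type")?.split(";")[0].trim();
  return {
    base64: bytes.toString("base64"),
    mediaType: headerType || fallbackMediaType,
  };
}
```

The result slots into the source field of a part, for example { type: "pdf", source: await toBase64Source(url, "application/pdf") }.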

Multimodal input maps to `inlineData`

Gemini’s multimodal input is media-agnostic — every binary modality maps to one inlineData block keyed by IANA mime type, so image, PDF, and audio all take the same shape:

{ type: "image", source: { base64, mediaType } } → { inlineData: { mimeType, data } }
{ type: "pdf", source: { base64, mediaType: "application/pdf" } } → { inlineData: { mimeType: "application/pdf", data } } (gated on capabilities.pdf)
{ type: "audio", source: { base64, mediaType: "audio/mpeg" } } → { inlineData: { mimeType: "audio/mpeg", data } } (gated on capabilities.audio)

PDF and audio reach the wire only when the model declares the matching capability (inferred for the multimodal Gemini families), so a declared capability always matches what is actually sent.
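The mapping above can be sketched as a single function. This is a hypothetical illustration, not the adapter's source: the type names, gating images on vision, and throwing when a capability is missing are all assumptions.

```typescript
// Hypothetical sketch of the part-to-inlineData mapping; not the adapter's code.
type MediaPart = {
  type: "image" | "pdf" | "audio";
  source: { base64: string; mediaType: string };
};

type InlineDataPart = { inlineData: { mimeType: string; data: string } };

type ModelCapabilities = { vision: boolean; pdf: boolean; audio: boolean };

function toInlineData(part: MediaPart, capabilities: ModelCapabilities): InlineDataPart {
  // Each modality is gated on its own capability flag (image on vision is assumed).
  const allowed =
    part.type === "image"
      ? capabilities.vision
      : part.type === "pdf"
        ? capabilities.pdf
        : capabilities.audio;

  if (!allowed) {
    // Assumption: the sketch rejects the part; the real adapter may behave differently.
    throw new Error(`Model does not declare the ${part.type} capability`);
  }

  // Every modality takes the same wire shape, keyed by IANA mime type.
  return { inlineData: { mimeType: part.source.mediaType, data: part.source.base64 } };
}
```

Only the mime type differs between modalities, which is why one function covers image, PDF, and audio.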

Image generation (Imagen)

google.image({ name }) returns an ImageModelContract (Imagen, via ai.models.generateImages) for the ai.image() output verb — prompt in, images out, in the same never-throws { data, error, usage, report } envelope every executable returns.

import { ai } from "@warlock.js/ai";
import { GoogleSDK } from "@warlock.js/ai-google";

const google = new GoogleSDK({ apiKey: process.env.GEMINI_API_KEY! });
const imagen = google.image({ name: "imagen-4.0-generate-001", pricing: { perImage: 0.04 } });

const { data, error } = await ai.image({
  model: imagen,
  prompt: "a watercolor lighthouse at dawn",
  aspectRatio: "3:4",              // Imagen ratio (vs OpenAI's WxH `size`)
  negativePrompt: "text, watermark",
  options: { imageSize: "2K", personGeneration: "allow_adult" }, // Imagen passthroughs
});

if (error) {
  console.warn(error.code); // typed AIError — content-filter when every candidate is safety-filtered
} else {
  for (const img of data.images) {
    // save() is a placeholder for your own persistence helper; it is not part of the library.
    save(Buffer.from(img.base64, "base64"), img.mediaType); // Imagen returns base64 bytes
  }
}

Imagen is metered per image, so price it with { perImage }. It returns base64 bytes (no hosted URL, no token usage), and the spend folds into the same Usage.cost rollup as text.
When every candidate is safety-filtered, the run surfaces a typed ContentFilterError on result.error.
A non-Imagen model id (google.image({ name: "gemini-2.5-flash" })) throws InvalidRequestError at construction — Gemini’s native image output (gemini-*-image via generateContent) is a separate surface, not routed here.

See ai.image for the full verb surface (options, cost-truth, GeneratedImage).

Pricing and usage

Every response reports token usage (input, output, total); cached-content tokens surface as cachedTokens when present.

Attach a pricing registry, keyed by model name and expressed in USD per million tokens, to turn tokens into money. Set it at the SDK level or per model; when both are present, the per-model entry wins. With pricing set, cost rolls up through every node of the AgentReport.

const google = new GoogleSDK({
  apiKey: process.env.GEMINI_API_KEY!,
  pricing: {
    "gemini-2.5-flash": { input: 0.3, output: 2.5 },
  },
});
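As a worked example of the arithmetic, a per-million-token price turns usage into dollars as shown below. estimateCostUsd is a hypothetical helper, and the sketch assumes input and output tokens are the only priced quantities (cached tokens are ignored); the library's own rollup may differ.

```typescript
// Hypothetical helper illustrating USD-per-million-token pricing.
type TokenPrice = { input: number; output: number }; // USD per million tokens
type TokenUsage = { input: number; output: number }; // token counts

function estimateCostUsd(usage: TokenUsage, price: TokenPrice): number {
  return (usage.input * price.input + usage.output * price.output) / 1_000_000;
}

// 12,000 input tokens at $0.30/M plus 800 output tokens at $2.50/M.
const cost = estimateCostUsd({ input: 12_000, output: 800 }, { input: 0.3, output: 2.5 });
console.log(cost.toFixed(4)); // "0.0056"
```

That is $0.0036 for input plus $0.0020 for output, using the gemini-2.5-flash prices from the registry above.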

Need an offline token estimate? google.count(text) returns a fast character-heuristic approximation — Gemini’s countTokens is a network round-trip, so this stays offline for budgeting, not billing.
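A character heuristic of this kind is easy to picture. The sketch below assumes roughly four characters per token, which is a common rule of thumb and not necessarily the ratio google.count uses; approximateTokens and fitsBudget are hypothetical names.

```typescript
// Hypothetical heuristic; the 4-characters-per-token ratio is an assumption.
const CHARS_PER_TOKEN = 4;

function approximateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Budget check before sending a prompt: a rough guard, not a billing figure.
function fitsBudget(text: string, maxTokens: number): boolean {
  return approximateTokens(text) <= maxTokens;
}
```

An estimate like this is cheap enough to run on every prompt, which is the trade-off described above: speed and offline use in exchange for accuracy.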

For the agent/workflow surface these models plug into, see @warlock.js/ai. For provider-specific notes and the latest model ids, see the setup-google skill.