Extractor

Integration Guide

Developer Documentation

Embed Extractor in Your App

1. Admin Dashboard Setup

Before embedding Extractor, follow the steps below to sign in to the admin console, register an application, and obtain your appId and appSecret.

First admin only (skip if an admin already exists)

Create the first admin by calling the setup API (e.g. from Postman or curl).

Request
POST /api/setup
Authorization
Authorization: Bearer <ADMIN_SETUP_SECRET>

Use ADMIN_SETUP_SECRET from your environment.

Body (JSON)
{
  "email": "admin@example.com",
  "name": "Admin Name"
}
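The same request can be issued from a short Node.js script. The sketch below only builds the fetch arguments, so nothing is sent until you call fetch yourself. The helper name buildSetupRequest and the baseUrl argument are illustrative and not part of the Extractor API; the Content-Type header is assumed because the body is JSON.

```javascript
// Hypothetical helper: builds the fetch arguments for POST /api/setup.
function buildSetupRequest(baseUrl, setupSecret, admin) {
  if (!setupSecret) throw new Error("ADMIN_SETUP_SECRET is not set");
  return {
    url: new URL("/api/setup", baseUrl).toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json", // assumed, since the body is JSON
        Authorization: `Bearer ${setupSecret}`,
      },
      body: JSON.stringify({ email: admin.email, name: admin.name }),
    },
  };
}

// Usage (sends a real request, so run it only against your own deployment):
// const { url, options } = buildSetupRequest(
//   "https://extractor.decoded.digital",
//   process.env.ADMIN_SETUP_SECRET,
//   { email: "admin@example.com", name: "Admin Name" }
// );
// const res = await fetch(url, options);
```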
  1. Sign in to the admin console

    Open /admin/login and complete sign-in with the one-time code sent to your email.

  2. Open the Applications page

    From the dashboard, go to /admin/dashboard/applications.

  3. Create an application

    Click Create Application, or navigate directly to /admin/dashboard/applications/new.

    Example: https://extractor.decoded.digital/admin/dashboard/applications/new

  4. Save your appId and appSecret

    After you create the application, copy the appId and appSecret (shown only once). Store the app secret securely on your backend — you'll need it for onboarding and session creation.

Use the appId and appSecret in your backend for onboarding, session creation, and server-to-server API calls. The appSecret must never be exposed to the browser.

2. Quick Start

Follow these three steps in order to embed the extractor in your application.

  1. Register Application

    Create an application in the Admin Dashboard. You'll receive an appId and appSecret. Store the app secret securely on your backend; it's shown only once.

  2. Onboard Organization

    Call POST /api/embed/onboard with appId, appSecret, and user details. This creates a tenant, a user, and an embed token. Store the returned embed token on your backend (see Onboarding API).

  3. Create Session & Embed

    From your backend, check your local DB for a cached session. If none exists or it has expired, call POST /api/embed/sessions with appId, appSecret, embedToken, and userEmail to get a sessionId. Cache it in your DB (per user), then pass only the sessionId to the iframe: /embed?sessionId=ess_.... On subsequent page loads, reuse the cached session until it expires. No secrets are exposed to the browser (see Session-Based Embed Auth).

3. Onboarding API (For Host Applications)

🚀 Getting Started as a Host Application

If you're integrating Extractor into your application, register an application in the Admin Dashboard first, then use these endpoints to onboard your organization and manage users programmatically. Onboarding creates a tenant, an owner user, and an embed token in one call.

POST /api/embed/onboard

Create a new tenant, owner user, and embed token in one call. This is used for initial integration setup. Store the returned embed token securely; it won't be shown again.

Request Body
{
  "organizationName": "Acme Corp",      // Required: Your organization name
  "firstName": "John",                   // Required: Admin user first name
  "lastName": "Doe",                     // Required: Admin user last name
  "email": "john@acme.com",             // Required: Admin user email
  "appId": "673abc123def456789012345",  // Required: App ID from Admin Dashboard
  "appSecret": "ask_xxx...",            // Required if app has a secret (from Admin Dashboard)
  "apiKeyName": "Production Embed Token",   // Optional: Custom name for embed token
  "allowedDomains": ["acme.com", "*.acme.com"],  // Optional: Domain restrictions
  "hasExpiry": false,                    // Optional: Set true for expiring token
  "expiryDays": 90                       // Optional: Days until expiry (1-365)
}
Example Request
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "organizationName": "Acme Corp",
    "firstName": "John",
    "lastName": "Doe",
    "email": "john@acme.com",
    "appId": "673abc123def456789012345",
    "appSecret": "ask_your_app_secret_here"
  }' \
  https://extractor.decoded.digital/api/embed/onboard
Response
{
  "data": {
    "message": "Successfully onboarded. Store the embed token securely - it won't be shown again.",
    "tenant": {
      "id": "507f1f77bcf86cd799439011",
      "name": "Acme Corp"
    },
    "user": {
      "id": "507f1f77bcf86cd799439012",
      "firstName": "John",
      "lastName": "Doe",
      "email": "john@acme.com",
      "role": "owner",
      "isExistingUser": false
    },
    "embedToken": {
      "id": "507f1f77bcf86cd799439013",
      "name": "Acme Corp - Embed Token",
      "key": "ext_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "expiresAt": null,
      "allowedDomains": []
    }
  },
  "error": null
}
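Before calling the onboarding endpoint from your backend, it can help to validate the request body locally. The following is a minimal sketch, assuming the required fields and the 1-365 range for expiryDays listed in the request body above; buildOnboardBody is a hypothetical helper name, and the server remains the authority on what is accepted.

```javascript
// Fields marked "Required" in the request body above.
const REQUIRED_ONBOARD_FIELDS = ["organizationName", "firstName", "lastName", "email", "appId"];

// Hypothetical helper: validates input and returns the JSON body for POST /api/embed/onboard.
function buildOnboardBody(input) {
  for (const field of REQUIRED_ONBOARD_FIELDS) {
    if (!input[field]) throw new Error(`Missing required field: ${field}`);
  }
  if (input.hasExpiry) {
    const days = input.expiryDays;
    // The documented range for expiryDays is 1-365.
    if (!Number.isInteger(days) || days < 1 || days > 365) {
      throw new Error("expiryDays must be an integer between 1 and 365");
    }
  }
  return { ...input };
}

// Usage (from your backend only, since the body contains appSecret):
// const body = buildOnboardBody({ organizationName: "Acme Corp", firstName: "John",
//   lastName: "Doe", email: "john@acme.com", appId: process.env.EXTRACTOR_APP_ID,
//   appSecret: process.env.EXTRACTOR_APP_SECRET });
// await fetch("https://extractor.decoded.digital/api/embed/onboard", {
//   method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify(body) });
```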
POST /api/embed/users (Auth Required)

Create a new user in your tenant. Use this when a new user signs up in your application and needs access to Extractor features. Pass the X-Session-Id header — the session already carries tenant context from the onboarding flow.

Request Body
{
  "firstName": "Jane",          // Required
  "lastName": "Smith",          // Required
  "email": "jane@acme.com",     // Required
  "role": "member"              // Optional: "owner", "admin", or "member" (default)
}
Example Request
curl -X POST \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: ess_YOUR_SESSION_ID" \
  -d '{
    "firstName": "Jane",
    "lastName": "Smith",
    "email": "jane@acme.com",
    "role": "member"
  }' \
  https://extractor.decoded.digital/api/embed/users
Response
{
  "data": {
    "message": "User created successfully",
    "user": {
      "id": "507f1f77bcf86cd799439014",
      "firstName": "Jane",
      "lastName": "Smith",
      "email": "jane@acme.com",
      "role": "member",
      "createdAt": "2026-03-11T10:30:00.000Z"
    }
  },
  "error": null
}

Note: If the user already exists globally but is not yet a member of your tenant, they will be added to your tenant and the response will return HTTP 200 with message "User added to tenant successfully" instead of 201. If the user already exists in your tenant, you will receive a 409 Conflict error.
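When you sync users from your own sign-up flow, the three status codes in the note above can be mapped to outcomes in one place. This is a small sketch; the function name and the outcome labels are illustrative, and treating 409 as a harmless "already synced" result is a design choice, not documented behavior.

```javascript
// Maps the HTTP status of POST /api/embed/users to an outcome, per the note above.
function interpretCreateUserStatus(status) {
  switch (status) {
    case 201:
      return "created"; // new user created in your tenant
    case 200:
      return "added-to-tenant"; // user existed globally, now added to your tenant
    case 409:
      return "already-in-tenant"; // nothing to do if you only need the user to exist
    default:
      return "error";
  }
}
```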

GET /api/embed/users (Auth Required)

List all users in your tenant with pagination and search. Pass the X-Session-Id header — the session already carries tenant and user context.

Query Parameters:

  • page - Page number (default: 1)
  • limit - Items per page (default: 20)
  • search - Search by name or email
Example Request
curl -H "X-Session-Id: ess_YOUR_SESSION_ID" \
  "https://extractor.decoded.digital/api/embed/users?page=1&limit=20"

Response (data):

{
  "items": [
    {
      "id": "...",
      "firstName": "Jane",
      "lastName": "Smith",
      "email": "jane@acme.com",
      "role": "member",
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "totalCount": 5,
    "totalPages": 1,
    "hasNextPage": false,
    "hasPrevPage": false
  }
}
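The pagination object makes it straightforward to walk through all pages. A minimal sketch follows, assuming you supply fetchPage yourself (a hypothetical function that calls GET /api/embed/users with the given query and resolves to the data object shown above).

```javascript
// Returns the query for the next page, or null once the last page has been read.
function nextPageQuery(pagination) {
  if (!pagination.hasNextPage) return null;
  return { page: pagination.page + 1, limit: pagination.limit };
}

// Collects the items of every page. fetchPage is supplied by the caller.
async function collectAllUsers(fetchPage, limit = 20) {
  const items = [];
  let query = { page: 1, limit };
  while (query) {
    const data = await fetchPage(query);
    items.push(...data.items);
    query = nextPageQuery(data.pagination);
  }
  return items;
}
```

Keeping nextPageQuery separate from the network call makes the stopping condition easy to test without a running Extractor instance.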

4. Session-Based Embed Auth (Recommended)

For maximum security, use session-based authentication. The raw embed token and app secret never reach the browser — only a temporary session ID is exposed. Sessions last 1 day and can be refreshed or revoked.

How It Works

  1. Register an application in Admin Dashboard → you get appId + appSecret.
  2. Onboard your organization: POST /api/embed/onboard with appId, appSecret, user details → returns embedToken. Store it on your backend.
  3. When a user needs to embed: your backend checks your local DB for a cached session. If none exists or it's expired, call POST /api/embed/sessions with appId + appSecret + embedToken + userEmail → returns sessionId. Cache it in your DB.
  4. Pass only sessionId to the iframe: /embed?sessionId=ess_...
  5. Session expires after 1 day. On next page load, your backend checks the cached session — if still valid, reuse it. If expired, create a new one and update the cache.

Step 1: Create Session (from your backend)

curl -X POST https://your-extractor.com/api/embed/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "appId": "YOUR_APP_ID",
    "appSecret": "ask_YOUR_APP_SECRET",
    "embedToken": "ext_YOUR_EMBED_TOKEN",
    "userEmail": "user@example.com"
  }'

# Response:
# {
#   "data": {
#     "sessionId": "ess_abc123...",
#     "expiresAt": "2026-04-07T12:00:00Z",
#     "userEmail": "user@example.com",
#     "tenantId": "...",
#     "appId": "..."
#   }
# }

Step 2: Embed in iframe

<!-- Only sessionId in the URL — no secrets exposed -->
<iframe
  src="https://your-extractor.com/embed?sessionId=ess_abc123..."
  width="100%"
  height="600"
  frameborder="0"
  allow="clipboard-write"
></iframe>

Step 3: Refresh Session (optional)

# Extend session by another day (call from your backend)
curl -X PUT https://your-extractor.com/api/embed/sessions/SESSION_ID/refresh \
  -H "Content-Type: application/json" \
  -d '{ "appId": "YOUR_APP_ID", "appSecret": "ask_YOUR_APP_SECRET" }'

Step 4: Revoke Session (optional)

# Immediately end a session
curl -X DELETE https://your-extractor.com/api/embed/sessions/SESSION_ID \
  -H "Content-Type: application/json" \
  -d '{ "appId": "YOUR_APP_ID", "appSecret": "ask_YOUR_APP_SECRET" }'
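The refresh and revoke calls differ only in method and path, so a backend can build both from one helper. This is a sketch under that assumption; buildSessionRequest is a hypothetical name, and the base URL is whatever host your Extractor instance runs on.

```javascript
// Hypothetical helper: builds fetch arguments for refreshing or revoking a session.
function buildSessionRequest(action, baseUrl, sessionId, credentials) {
  const id = encodeURIComponent(sessionId);
  let method;
  let path;
  if (action === "refresh") {
    method = "PUT";
    path = `/api/embed/sessions/${id}/refresh`;
  } else if (action === "revoke") {
    method = "DELETE";
    path = `/api/embed/sessions/${id}`;
  } else {
    throw new Error(`Unknown action: ${action}`);
  }
  return {
    url: new URL(path, baseUrl).toString(),
    options: {
      method,
      headers: { "Content-Type": "application/json" },
      // Both calls carry appId and appSecret, so they must run on your backend.
      body: JSON.stringify({ appId: credentials.appId, appSecret: credentials.appSecret }),
    },
  };
}

// Usage:
// const { url, options } = buildSessionRequest("refresh", "https://your-extractor.com",
//   sessionId, { appId: process.env.EXTRACTOR_APP_ID, appSecret: process.env.EXTRACTOR_APP_SECRET });
// const res = await fetch(url, options);
```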

Full React Example

import { useEffect, useState } from "react";

function ExtractorEmbed() {
  const [sessionId, setSessionId] = useState(null);
  const [error, setError] = useState(null);

  useEffect(() => {
    // Call YOUR backend — it returns a cached or fresh session
    fetch("/api/extractor/session", { method: "POST" })
      .then(res => res.json())
      .then(data => {
        if (data.sessionId) setSessionId(data.sessionId);
        else setError("Failed to create session");
      })
      .catch(() => setError("Connection failed"));
  }, []);

  if (error) return <div>{error}</div>;
  if (!sessionId) return <div>Loading...</div>;

  return (
    <iframe
      src={`https://your-extractor.com/embed?sessionId=${sessionId}`}
      width="100%" height="700" frameBorder="0" allow="clipboard-write"
    />
  );
}

Best Practice: Cache Sessions in Your Database

Do not create a new session on every page refresh. Sessions last 24 hours. Store the sessionId and expiresAt in your database (per user + tenant), and reuse it until it expires. This avoids orphan sessions piling up and reduces unnecessary API calls to the extractor.

// YOUR backend endpoint (e.g. Next.js API route)
// Collection: extractorSessions { tenantId, userId, sessionId, expiresAt }

export async function POST(req) {
  const user = await getAuthenticatedUser(req);

  // 1. Check your local DB for a cached session
  const BUFFER_MS = 5 * 60 * 1000; // 5-min safety buffer
  const cached = await db.collection("extractorSessions").findOne({
    tenantId: user.tenantId,
    userId: user.id,
    expiresAt: { $gt: new Date(Date.now() + BUFFER_MS) },
  });

  // 2. If valid session exists → return it (no extractor API call)
  if (cached) {
    return Response.json({
      sessionId: cached.sessionId,
      expiresAt: cached.expiresAt,
    });
  }

  // 3. No valid session → create a new one via extractor API
  const embedToken = await getStoredEmbedToken(user.tenantId);
  const res = await fetch("https://your-extractor.com/api/embed/sessions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      appId: process.env.EXTRACTOR_APP_ID,
      appSecret: process.env.EXTRACTOR_APP_SECRET,
      embedToken,
      userEmail: user.email,
    }),
  });
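  const body = await res.json();
  if (!res.ok || !body.data) {
    // Do not cache or forward a failed response
    return Response.json({ error: "Failed to create session" }, { status: 502 });
  }
  const { data } = body;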

  // 4. Cache it in your DB (upsert — 1 row per user, replaces expired)
  await db.collection("extractorSessions").updateOne(
    { tenantId: user.tenantId, userId: user.id },
    {
      $set: {
        sessionId: data.sessionId,
        expiresAt: new Date(data.expiresAt),
        updatedAt: new Date(),
      },
      $setOnInsert: { createdAt: new Date() },
    },
    { upsert: true }
  );

  // 5. Return to frontend
  return Response.json({
    sessionId: data.sessionId,
    expiresAt: data.expiresAt,
  });
}

Session API Reference

Endpoint | Method | Body | Description
/api/embed/sessions | POST | appId, appSecret, userEmail, embedToken | Create session (1 day TTL)
/api/embed/sessions/:id/refresh | PUT | appId, appSecret | Extend session by 1 day
/api/embed/sessions/:id | GET | ?appId, ?appSecret | Check session status
/api/embed/sessions/:id | DELETE | appId, appSecret | Revoke session immediately

Security Notes

  • New applications require session-based auth; direct embed token URLs are rejected.
  • The appSecret must never be sent to the browser. All session API calls come from your backend.
  • Sessions are per-user. Multiple users can have independent sessions.
  • If the underlying embed token is deactivated, all its sessions become invalid immediately.
  • MongoDB TTL indexes automatically clean up expired sessions on the extractor side.
  • Cache sessions in your database — reuse the same sessionId until it expires instead of creating a new one on every page load. Use a 5-minute buffer before expiry to avoid mid-use failures.
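If your cache does not live in MongoDB, the reuse rule with the 5-minute buffer can be expressed as a plain function. A minimal sketch, assuming the cached record has the sessionId and expiresAt fields returned by the session API; isSessionReusable is a hypothetical name.

```javascript
const DEFAULT_BUFFER_MS = 5 * 60 * 1000; // 5-minute safety buffer

// Returns true when a cached session can still be handed to the browser.
function isSessionReusable(cached, now = Date.now(), bufferMs = DEFAULT_BUFFER_MS) {
  if (!cached || !cached.sessionId || !cached.expiresAt) return false;
  const expiresAtMs = new Date(cached.expiresAt).getTime();
  if (Number.isNaN(expiresAtMs)) return false; // unparseable date: treat as expired
  return expiresAtMs > now + bufferMs;
}
```

Passing now as a parameter keeps the check deterministic in tests.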

Supported File Types

Accepted file types vary by endpoint. The maximum file size is 10MB for all uploads.

Endpoint | Accepted types
POST /api/embed/extract | PDF (application/pdf), JPEG, PNG, WebP (image/jpeg, image/png, image/webp)
POST /api/embed/extract/bulk | PDF (application/pdf), JPEG, PNG, WebP (image/jpeg, image/png, image/webp); max 20 files, 10MB each
POST /api/embed/files | PDF, JPEG, PNG, WebP, plain text (text/plain), Word (.doc, .docx)
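Checking files before upload avoids round trips for requests that would be rejected anyway. The sketch below mirrors the table above. It assumes that "10MB" means 10 × 1024 × 1024 bytes and that Word files arrive with the standard application/msword and .docx MIME types; checkUpload is a hypothetical helper, and the server's own validation remains authoritative.

```javascript
const MAX_BYTES = 10 * 1024 * 1024; // assumption: "10MB" is read as 10 MiB
const MAX_BULK_FILES = 20;
const PDF_AND_IMAGES = ["application/pdf", "image/jpeg", "image/png", "image/webp"];

const ACCEPTED_TYPES = new Map([
  ["/api/embed/extract", PDF_AND_IMAGES],
  ["/api/embed/extract/bulk", PDF_AND_IMAGES],
  ["/api/embed/files", [
    ...PDF_AND_IMAGES,
    "text/plain",
    "application/msword", // .doc
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document", // .docx
  ]],
]);

// Returns a list of problems; an empty list means the files look acceptable.
// Each file needs name, type (MIME type), and size (bytes), like a browser File.
function checkUpload(endpoint, files) {
  const accepted = ACCEPTED_TYPES.get(endpoint);
  if (!accepted) return [`Unknown endpoint: ${endpoint}`];
  const problems = [];
  if (endpoint === "/api/embed/extract/bulk" && files.length > MAX_BULK_FILES) {
    problems.push(`Too many files: ${files.length} (max ${MAX_BULK_FILES})`);
  }
  for (const file of files) {
    if (!accepted.includes(file.type)) problems.push(`${file.name}: unsupported type ${file.type}`);
    if (file.size > MAX_BYTES) problems.push(`${file.name}: larger than 10MB`);
  }
  return problems;
}
```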

5. Embed (iframe / JS / Python)

iFrame Embed

The simplest way to integrate. Create a session from your backend, then pass the sessionId to the iframe URL.

Recommended: Use ?sessionId=ess_... instead of embedding tokens directly. See Session-Based Auth for how to create sessions from your backend.

File Upload & Extraction

Embed the file upload interface for users to upload and extract documents.

<!-- Session-based (recommended) — sessionId from your backend -->
<iframe
  src="https://extractor.decoded.digital/embed?sessionId=ess_YOUR_SESSION_ID"
  width="100%"
  height="600"
  frameborder="0"
  allow="clipboard-write"
></iframe>

<!-- Legacy (for apps without app secret) -->
<iframe
  src="https://extractor.decoded.digital/embed?embedToken=YOUR_EMBED_TOKEN&userEmail=user@example.com&appId=YOUR_APP_ID"
  width="100%"
  height="600"
  frameborder="0"
  allow="clipboard-write"
></iframe>

Manage Documents (Templates)

Embed the documents management page to create, edit, and delete document templates with custom extraction fields.

<!-- Session-based -->
<iframe
  src="https://extractor.decoded.digital/embed/documents?sessionId=ess_YOUR_SESSION_ID"
  width="100%" height="700" frameborder="0"
></iframe>

View Extractions

Embed the extractions page with tabbed navigation by document type, search, and pagination.

<!-- Session-based -->
<iframe
  src="https://extractor.decoded.digital/embed/extractions?sessionId=ess_YOUR_SESSION_ID"
  width="100%" height="700" frameborder="0"
></iframe>

Settings (Integrations, Webhooks & Embed Tokens)

Embed the settings page with tabbed UI for managing Microsoft integrations, outbound webhooks, and embed tokens.

<!-- Session-based -->
<iframe
  src="https://extractor.decoded.digital/embed/settings?sessionId=ess_YOUR_SESSION_ID"
  width="100%"
  height="700"
  frameborder="0"
  style="border: 1px solid #e5e7eb; border-radius: 8px;"
></iframe>

Available Embed Pages

  • /embed - File upload and extraction
  • /embed/documents - Document template management (create, edit, delete)
  • /embed/extractions - View all extractions with search, filters, and file preview
  • /embed/settings - Settings with tabs for Microsoft integrations, webhooks, and embed tokens
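Because every embed page takes the same sessionId query parameter, the iframe src can be built by one small function. This is a sketch only: buildEmbedUrl is a hypothetical name, and the check for the ess_ prefix is an assumption based on the session IDs shown in this guide.

```javascript
const EMBED_PAGES = ["/embed", "/embed/documents", "/embed/extractions", "/embed/settings"];

// Builds the iframe src for one of the embed pages listed above.
function buildEmbedUrl(baseUrl, page, sessionId) {
  if (!EMBED_PAGES.includes(page)) throw new Error(`Unknown embed page: ${page}`);
  if (typeof sessionId !== "string" || !sessionId.startsWith("ess_")) {
    // Guards against accidentally passing an embed token or app secret.
    throw new Error("Expected a session ID starting with ess_");
  }
  const url = new URL(page, baseUrl);
  url.searchParams.set("sessionId", sessionId);
  return url.toString();
}
```

The prefix check is a cheap way to keep secrets out of URLs: an embed token (ext_...) or app secret (ask_...) passed by mistake fails fast instead of reaching the browser.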

Listen for Messages

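// Listen for messages posted by the embedded Extractor iframe
// Use the origin of your iframe src here
const EXTRACTOR_ORIGIN = 'https://extractor.decoded.digital';

window.addEventListener('message', (event) => {
  // Ignore messages that do not come from the Extractor origin
  if (event.origin !== EXTRACTOR_ORIGIN || !event.data) return;

  if (event.data.type === 'extraction-complete') {
    console.log('Extraction completed:', event.data.data);
    // { extractionId, fileName, status, ... }
  }

  if (event.data.type === 'extraction-selected') {
    console.log('Extraction selected:', event.data.data);
    // Full extraction details including extractedData
  }
});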

6. API Reference

Full API access for custom integrations. Embed endpoints support session-based auth (X-Session-Id), app secret auth (X-App-Secret + X-App-Id), or legacy embed token auth.

Session-Based Auth

After onboarding, create a session from your backend and use X-Session-Id for all subsequent API calls — user creation, user listing, extractions, documents, and iframe embeds. Only the initial onboarding call uses X-App-Id + X-App-Secret. See Session-Based Auth for details.

Authentication

All embed API calls use one of these two methods. The embed token, app secret, and user email never reach the browser.

# Session-based auth (for all calls after onboarding):
X-Session-Id: ess_xxx...
# That's it — one header. The session contains all context
# (app, tenant, user) so no other headers are needed.
# Works for: user creation, user listing, extractions, documents, etc.

# App-Secret auth (only for initial onboarding):
X-App-Id: YOUR_APP_ID
X-App-Secret: ask_xxx...
# Used only for the POST /api/embed/onboard call.

Endpoints

GET /api/embed/verify

Verify session and get tenant information

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/verify

Response (data):

{
  "valid": true,
  "tenantId": "507f1f77bcf86cd799439011",
  "embedTokenName": "My Production Token",
  "userEmail": "user@example.com",
  "message": "Embed token is valid"
}
POST /api/embed/extract

Upload a file and start extraction

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -F "file=@document.pdf" \
  https://extractor.decoded.digital/api/embed/extract

Response:

{
  "data": {
    "success": true,
    "extractionId": "507f1f77bcf86cd799439011",
    "fileName": "document.pdf",
    "fileType": "application/pdf",
    "fileUrl": "https://...",
    "status": "processing",
    "message": "File uploaded and extraction started"
  },
  "error": null
}
POST /api/embed/extract/bulk

Upload and extract data from multiple files in a single request. Maximum 20 files per request. Each file must be PDF or image (JPEG, PNG, WebP) and under 10MB.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -F "files=@invoice1.pdf" \
  -F "files=@invoice2.pdf" \
  -F "files=@receipt.png" \
  https://extractor.decoded.digital/api/embed/extract/bulk

Response:

{
  "data": {
    "results": [
      {
        "fileName": "invoice1.pdf",
        "success": true,
        "extractionId": "507f1f77bcf86cd799439011",
        "fileType": "application/pdf",
        "fileUrl": "https://...",
        "status": "processing"
      },
      {
        "fileName": "invoice2.pdf",
        "success": true,
        "extractionId": "507f1f77bcf86cd799439012",
        "fileType": "application/pdf",
        "fileUrl": "https://...",
        "status": "processing"
      },
      {
        "fileName": "receipt.png",
        "success": true,
        "extractionId": "507f1f77bcf86cd799439013",
        "fileType": "image/png",
        "fileUrl": "https://...",
        "status": "processing"
      }
    ],
    "summary": {
      "total": 3,
      "successful": 3,
      "failed": 0
    },
    "message": "3 of 3 files uploaded successfully"
  },
  "error": null
}

Individual files may fail while others succeed. Check each item in the results array for per-file status.
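A caller will usually want to separate the files that were accepted from those that were not. The following sketch assumes that a failed entry carries success: false and a fileName, like the successful entries shown above; the exact shape of failed entries is not shown in this guide, and splitBulkResults is a hypothetical helper.

```javascript
// Splits the `data` object of a bulk upload response into accepted and failed files.
function splitBulkResults(data) {
  const succeeded = data.results.filter((result) => result.success);
  const failed = data.results.filter((result) => !result.success);
  return {
    extractionIds: succeeded.map((result) => result.extractionId),
    failedFiles: failed.map((result) => result.fileName),
  };
}
```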

GET /api/embed/extractions

List all extractions with pagination and optional filters

Query Parameters:

  • page - Page number (default: 1)
  • limit - Items per page (default: 20)
  • status - Filter by status (uploaded, analyzing, analyzed, extracting, completed, analysis failed, extraction failed, failed)
  • documentId - Filter by document template ID
  • search - Search by file name
curl -H "X-Session-Id: ess_xxx" \
  "https://extractor.decoded.digital/api/embed/extractions?page=1&limit=20&status=completed"

Response (data):

{
  "items": [
    {
      "id": "...",
      "fileName": "doc.pdf",
      "fileType": "application/pdf",
      "fileSize": 12345,
      "fileUrl": "https://...",
      "status": "completed",
      "documentId": "...",
      "documentName": "Invoice",
      "source": "embed",
      "appId": "673abc123def456789012345",
      "applicationName": "my-app",
      "email": {
        "from": { "emailAddress": { "name": "John", "address": "john@example.com" } },
        "toRecipients": [
          { "emailAddress": { "name": "Jane", "address": "jane@example.com" } }
        ],
        "ccRecipients": [],
        "subject": "Invoice #1234",
        "bodyPreview": "Please find attached..."
      },
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "totalCount": 100,
    "totalPages": 5,
    "hasNextPage": true,
    "hasPrevPage": false
  }
}
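Some status values contain spaces (for example "analysis failed"), so the query string should be built with URL-encoding rather than string concatenation. A minimal sketch using the standard URL and URLSearchParams classes; buildExtractionsUrl is a hypothetical helper name.

```javascript
const EXTRACTION_FILTERS = ["page", "limit", "status", "documentId", "search"];

// Builds the request URL for GET /api/embed/extractions, skipping empty filters.
function buildExtractionsUrl(baseUrl, filters = {}) {
  const url = new URL("/api/embed/extractions", baseUrl);
  for (const key of EXTRACTION_FILTERS) {
    const value = filters[key];
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(key, String(value)); // values are encoded automatically
    }
  }
  return url.toString();
}

// Usage:
// const url = buildExtractionsUrl("https://extractor.decoded.digital",
//   { page: 1, limit: 20, status: "analysis failed" });
// const res = await fetch(url, { headers: { "X-Session-Id": sessionId } });
```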
GET /api/embed/extractions/:id

Get extraction details including extracted data

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "fileName": "doc.pdf",
  "fileType": "application/pdf",
  "fileSize": 12345,
  "fileUrl": "https://...",
  "status": "completed",
  "extractedData": { "vendor_name": "...", "total": "..." },
  "error": null,
  "documentId": "...",
  "document": {
    "id": "...",
    "name": "Invoice",
    "description": "...",
    "fields": []
  },
  "source": "embed",
  "appId": "673abc123def456789012345",
  "applicationName": "my-app",
  "email": {
    "from": { "emailAddress": { "name": "John", "address": "john@example.com" } },
    "toRecipients": [
      { "emailAddress": { "name": "Jane", "address": "jane@example.com" } }
    ],
    "ccRecipients": [],
    "subject": "Invoice #1234",
    "bodyPreview": "Please find attached...",
    "body": { "contentType": "html", "content": "<html>...</html>" }
  },
  "createdAt": "...",
  "updatedAt": "..."
}
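Since extraction runs asynchronously after upload, a backend job may need to poll this endpoint until the extraction settles. The sketch below treats completed and the three failure statuses from the status filter list as terminal, which is an assumption; any other value, including the processing status returned at upload time, is treated as still in progress. getExtraction is a hypothetical function you supply that calls GET /api/embed/extractions/:id and resolves to the data object.

```javascript
const TERMINAL_STATUSES = new Set(["completed", "analysis failed", "extraction failed", "failed"]);

// True when no further status change is expected (assumption, see lead-in).
function isTerminalStatus(status) {
  return TERMINAL_STATUSES.has(status);
}

// Polls until the extraction reaches a terminal status or attempts run out.
async function waitForExtraction(getExtraction, id, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const extraction = await getExtraction(id);
    if (isTerminalStatus(extraction.status)) return extraction;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Extraction ${id} did not finish after ${maxAttempts} attempts`);
}
```

The interval and attempt limits are placeholders; tune them to your document sizes.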
PUT /api/embed/extractions/:id

Update an extraction's file name and extracted data

Request Body:

  • fileName * — Updated file name (string, required)
  • extractedData * — Updated extracted data (object, required)
curl -X PUT \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "fileName": "updated-invoice.pdf",
    "extractedData": {
      "vendor_name": "Acme Corp",
      "total": "1500.00",
      "invoice_number": "INV-2026-001"
    }
  }' \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "extractionId": null,
  "fileName": "updated-invoice.pdf",
  "fileType": "application/pdf",
  "fileSize": 12345,
  "fileUrl": "https://...",
  "status": "completed",
  "extractedData": {
    "vendor_name": "Acme Corp",
    "total": "1500.00",
    "invoice_number": "INV-2026-001"
  },
  "error": null,
  "documentId": "...",
  "document": {
    "id": "...",
    "name": "Invoice",
    "description": "...",
    "fields": []
  },
  "source": "embed",
  "appId": "673abc123def456789012345",
  "applicationName": "my-app",
  "email": null,
  "createdAt": "...",
  "updatedAt": "..."
}
GET /api/embed/documents

List available document templates with pagination

Query Parameters:

  • page - Page number (default: 1)
  • limit - Items per page (default: 20)
  • search - Search by name or description
curl -H "X-Session-Id: ess_xxx" \
  "https://extractor.decoded.digital/api/embed/documents?page=1&limit=20"

Response (data):

{
  "items": [
    {
      "_id": "...",
      "name": "Invoice",
      "description": "...",
      "fields": [{ "key": "...", "type": "String", "description": "..." }],
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "totalCount": 10,
    "totalPages": 1,
    "hasNextPage": false,
    "hasPrevPage": false
  }
}
GET /api/embed/documents/:id

Get a specific document template with field definitions

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "name": "Invoice",
  "description": "Invoice template",
  "fields": [{ "key": "vendor_name", "type": "String", "description": "Vendor name" }],
  "createdAt": "...",
  "updatedAt": "..."
}
POST /api/embed/documents

Create a new document template with extraction fields. Each field requires a key, type, and description.

Supported Field Types:

String, Number, Boolean, Date, Object, List<String>, List<Object>

Use a children array on Object and List<Object> fields to define nested field structures.
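As an illustration of nesting, the sketch below defines a template with a List<Object> field and checks it against the rules stated above. It assumes that entries in children use the same key, type, and description shape as top-level fields, which this guide does not spell out; checkFields and invoiceFields are hypothetical names.

```javascript
const FIELD_TYPES = new Set(["String", "Number", "Boolean", "Date", "Object", "List<String>", "List<Object>"]);
const NESTABLE_TYPES = new Set(["Object", "List<Object>"]);

// Returns a list of problems found in a fields array (empty when it looks valid).
function checkFields(fields, path = "fields") {
  const problems = [];
  fields.forEach((field, index) => {
    const where = `${path}[${index}]`;
    for (const prop of ["key", "type", "description"]) {
      if (!field[prop]) problems.push(`${where}: missing ${prop}`);
    }
    if (field.type && !FIELD_TYPES.has(field.type)) {
      problems.push(`${where}: unsupported type ${field.type}`);
    }
    if (field.children) {
      if (!NESTABLE_TYPES.has(field.type)) {
        problems.push(`${where}: children is only valid on Object and List<Object>`);
      } else {
        problems.push(...checkFields(field.children, `${where}.children`));
      }
    }
  });
  return problems;
}

// Example: invoice template with nested line items (shape of children is assumed).
const invoiceFields = [
  { key: "vendor_name", type: "String", description: "Vendor name" },
  {
    key: "line_items",
    type: "List<Object>",
    description: "Invoice line items",
    children: [
      { key: "description", type: "String", description: "Item description" },
      { key: "amount", type: "Number", description: "Line amount" },
    ],
  },
];
```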

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Invoice",
    "description": "Invoice document template",
    "fields": [
      { "id": "1", "key": "vendor_name", "type": "String", "description": "Vendor name" },
      { "id": "2", "key": "total", "type": "String", "description": "Total amount" }
    ]
  }' \
  https://extractor.decoded.digital/api/embed/documents

Response (data):

{
  "_id": "507f1f77bcf86cd799439011",
  "name": "Invoice",
  "description": "Invoice document template",
  "fields": [...],
  "createdAt": "...",
  "updatedAt": "..."
}
PUT /api/embed/documents/:id

Update an existing document template

curl -X PUT \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Invoice (Updated)",
    "description": "Updated invoice template",
    "fields": [...]
  }' \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011

Response (data): same shape as Create Document

DELETE /api/embed/documents/:id

Delete a document template

curl -X DELETE \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011

Response (data):

{ "deleted": true, "id": "507f1f77bcf86cd799439011" }
PATCH /api/embed/documents/:id/archive

Archive or unarchive a document template. Archived documents are hidden from the default listing but can be restored by setting isArchived to false.

curl -X PATCH \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "isArchived": true }' \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011/archive

Response (data):

{
  "_id": "507f1f77bcf86cd799439011",
  "name": "Invoice",
  "description": "Invoice template",
  "fields": [...],
  "isArchived": true,
  "archivedAt": "2026-04-08T12:00:00.000Z",
  "createdAt": "...",
  "updatedAt": "..."
}

To unarchive, send { "isArchived": false }.

POST /api/embed/documents/bulk-delete

Bulk delete multiple document templates in a single request. Performs a soft delete. Maximum 100 IDs per request.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ]
  }' \
  https://extractor.decoded.digital/api/embed/documents/bulk-delete

Response (data):

{
  "deleted": true,
  "deletedCount": 3,
  "requestedCount": 3
}

deletedCount may be less than requestedCount if some IDs were already deleted or don't belong to your app scope.

GET /api/embed/documents/:id/extraction-count

Get the count of active extractions linked to a specific document template. Useful for displaying usage stats or confirming a document is safe to delete.

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011/extraction-count

Response (data):

{ "count": 42 }
DELETE /api/embed/extractions/:id

Delete an extraction

curl -X DELETE \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011

Response (data):

{ "deleted": true, "id": "507f1f77bcf86cd799439011" }
POST /api/embed/extractions/bulk-delete

Bulk delete multiple extractions in a single request. Performs a soft delete. Maximum 100 IDs per request.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ]
  }' \
  https://extractor.decoded.digital/api/embed/extractions/bulk-delete

Response (data):

{
  "deleted": true,
  "deletedCount": 3,
  "requestedCount": 3
}

deletedCount may be less than requestedCount if some IDs were already deleted or don't belong to your app scope.

POST /api/embed/extractions/:id/retry

Retry a failed or completed extraction. Only extractions with status analysis failed, extraction failed, failed, or completed can be retried.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011/retry

Response (data):

{
  "success": true,
  "extractionId": "507f1f77bcf86cd799439011",
  "message": "Extraction retry initiated"
}
POST /api/embed/extractions/bulk-retry

Retry multiple failed or completed extractions in a single request. Only extractions with status analysis failed, extraction failed, failed, or completed can be retried. Maximum 100 IDs per request.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ]
  }' \
  https://extractor.decoded.digital/api/embed/extractions/bulk-retry

Response (data):

{
  "message": "2 extraction(s) retry initiated successfully",
  "retriedCount": 2,
  "failedCount": 0,
  "skippedCount": 1,
  "requestedCount": 3
}

  • retriedCount: extractions successfully queued for retry.
  • skippedCount: extractions that were not in a retryable status, had no file, or were not found.
  • failedCount: extractions where the retry job could not be created.
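To keep skippedCount low and stay under the 100-ID limit, a client can filter for retryable statuses and split the IDs into batches before calling the endpoint. This is a sketch under the rules stated for this endpoint; planBulkRetry is a hypothetical helper, and it expects items with the id and status fields returned by GET /api/embed/extractions.

```javascript
const RETRYABLE_STATUSES = new Set(["analysis failed", "extraction failed", "failed", "completed"]);
const MAX_IDS_PER_REQUEST = 100;

// Returns one request body ({ ids: [...] }) per batch of at most 100 retryable extractions.
function planBulkRetry(extractions) {
  const ids = extractions
    .filter((extraction) => RETRYABLE_STATUSES.has(extraction.status))
    .map((extraction) => extraction.id);
  const batches = [];
  for (let start = 0; start < ids.length; start += MAX_IDS_PER_REQUEST) {
    batches.push({ ids: ids.slice(start, start + MAX_IDS_PER_REQUEST) });
  }
  return batches;
}
```

Each returned object can be sent as the JSON body of one POST /api/embed/extractions/bulk-retry request.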

PATCH /api/embed/extractions/:id/archive

Archive or unarchive a single extraction. Archived extractions are hidden from the default listing but can be restored by setting isArchived to false.

curl -X PATCH \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "isArchived": true }' \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011/archive

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "fileName": "doc.pdf",
  "fileType": "application/pdf",
  "fileSize": 12345,
  "fileUrl": "https://...",
  "status": "completed",
  "documentId": "...",
  "documentName": "Invoice",
  "source": "embed",
  "isArchived": true,
  "archivedAt": "2026-04-08T12:00:00.000Z",
  "createdAt": "...",
  "updatedAt": "..."
}

To unarchive, send { "isArchived": false }.

PATCH /api/embed/extractions/bulk-archive

Archive or unarchive multiple extractions in a single request. Maximum 100 IDs per request.

curl -X PATCH \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ],
    "isArchived": true
  }' \
  https://extractor.decoded.digital/api/embed/extractions/bulk-archive

Response (data):

{
  "archived": true,
  "archivedCount": 3,
  "requestedCount": 3
}

To unarchive, send isArchived: false. archivedCount may be less than requestedCount if some IDs were already in the desired state or don't belong to your app scope.

POST /api/embed/extractions/:id/reassign

Reassign an extraction to a different document type. This resets the extraction status and triggers re-analysis with the new document template. The extraction must have a valid file URL.

Request Body:

  • documentId * — The ID of the target document type to reassign to (required)
curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "documentId": "673abc123def456789012345" }' \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011/reassign

Response (data):

{
  "success": true,
  "extractionId": "507f1f77bcf86cd799439011",
  "newDocumentId": "673abc123def456789012345",
  "message": "Extraction reassigned and re-analysis initiated"
}

The extraction status resets to uploaded and a new analysis job is triggered automatically. Reassignment history is tracked internally. An extraction cannot be reassigned to the document type it is already assigned to.
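The two documented preconditions (a valid file URL and a different target document type) can be checked before the request is sent. This is a minimal sketch that assumes you already hold an extraction object with the fileUrl and documentId fields shown in the extraction responses; can_reassign is a hypothetical helper, and the server remains the source of truth.

```python
def can_reassign(extraction: dict, target_document_id: str) -> tuple[bool, str]:
    """Client-side pre-check for POST /api/embed/extractions/:id/reassign.

    Hypothetical helper; it only mirrors the two rules stated in the docs.
    """
    if not extraction.get("fileUrl"):
        return False, "extraction has no file URL"
    if extraction.get("documentId") == target_document_id:
        return False, "extraction is already assigned to this document type"
    return True, "ok"
```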

POST /api/embed/extractions/bulk-change-document-type

Reassign multiple extractions to a different document type in a single request. Because the user has explicitly chosen the target type, AI re-classification is skipped and an extraction job is enqueued directly against the chosen template. Maximum 100 IDs per request.

Request Body:

  • ids (required) - Array of extraction IDs to reassign (1–100)
  • documentId (required) - The ID of the target document type to reassign to

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ],
    "documentId": "673abc123def456789012345"
  }' \
  https://extractor.decoded.digital/api/embed/extractions/bulk-change-document-type

Response (data):

{
  "message": "3 extraction(s) reassigned successfully",
  "reassignedCount": 3,
  "failedCount": 0,
  "skippedCount": 0,
  "requestedCount": 3,
  "newDocumentId": "673abc123def456789012345"
}

Extractions already assigned to the target document type or missing a file URL are counted in skippedCount. IDs that don't exist or fall outside your app scope are also counted as skipped. Each reassignment is recorded in the extraction's reassignmentHistory.
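The counters in the response can be reduced to a simple outcome for your own logging or UI. The sketch below is an assumption about how you might interpret them; summarize_bulk_reassign is a hypothetical helper, and it reads only the fields shown in the example response.

```python
def summarize_bulk_reassign(data: dict) -> dict:
    """Condense the bulk-change-document-type response counters.

    Hypothetical helper operating on the `data` object of the response.
    """
    requested = data["requestedCount"]
    reassigned = data["reassignedCount"]
    return {
        # True only when every requested extraction was reassigned.
        "complete": reassigned == requested,
        # Failures are worth surfacing; skips are expected for no-op IDs.
        "needs_attention": data["failedCount"] > 0,
        "skipped": data["skippedCount"],
    }
```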

GET /api/embed/files

List all uploaded files with pagination

Query Parameters:

  • page - Page number (default: 1)
  • limit - Items per page (default: 20)
  • search - Search by file name
  • fileType - Filter by MIME type

curl -H "X-Session-Id: ess_xxx" \
  "https://extractor.decoded.digital/api/embed/files?page=1&limit=20"

Response (data):

{
  "items": [
    {
      "id": "...",
      "fileName": "doc.pdf",
      "fileType": "application/pdf",
      "fileSize": 12345,
      "fileUrl": "https://...",
      "status": "uploaded",
      "source": "embed",
      "appId": "673abc123def456789012345",
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "totalCount": 50,
    "totalPages": 3,
    "hasNextPage": true,
    "hasPrevPage": false
  }
}
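To read every file rather than a single page, a client can follow pagination.hasNextPage until it is false. The sketch below assumes a caller-supplied fetch_page(page, limit) function (hypothetical) that performs the GET request and returns the data object of the response; keeping the HTTP call outside the loop makes the logic independent of any particular HTTP library.

```python
from typing import Callable, Iterator


def iter_files(
    fetch_page: Callable[[int, int], dict], limit: int = 20
) -> Iterator[dict]:
    """Yield every file item across all pages.

    `fetch_page` is a hypothetical callable wrapping
    GET /api/embed/files?page=<page>&limit=<limit>.
    """
    page = 1
    while True:
        data = fetch_page(page, limit)
        yield from data["items"]
        if not data["pagination"]["hasNextPage"]:
            break
        page += 1
```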

POST /api/embed/files

Upload a file without immediate extraction (supports PDF, images, text, Word docs)

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -F "file=@document.pdf" \
  -F "description=Invoice document" \
  https://extractor.decoded.digital/api/embed/files

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "fileName": "document.pdf",
  "fileType": "application/pdf",
  "fileSize": 12345,
  "fileUrl": "https://...",
  "status": "uploaded",
  "message": "File uploaded successfully"
}

Webhooks

GET /api/embed/webhooks

List all webhooks for the authenticated tenant

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/webhooks

Response (data):

[
  {
    "_id": "507f1f77bcf86cd799439011",
    "name": "My Webhook",
    "url": "https://example.com/webhook",
    "scope": "all",
    "documentIds": [],
    "isActive": true,
    "tenantId": "...",
    "createdAt": "...",
    "updatedAt": "..."
  }
]

POST /api/embed/webhooks

Create a new outbound webhook to receive notifications when extractions complete

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Webhook",
    "url": "https://example.com/webhook",
    "scope": "all",
    "documentIds": []
  }' \
  https://extractor.decoded.digital/api/embed/webhooks

Request Body:

  • name (required) - Webhook display name
  • url (required) - HTTPS endpoint URL
  • scope - "all" (default) or "selected"
  • documentIds - Array of document template IDs (required when scope is "selected")

Response (data): created webhook object (HTTP 201)
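The request body rules above can be checked locally before calling the API. This is a minimal sketch of such a check; validate_webhook_payload is a hypothetical helper, and the server may apply additional validation that is not documented here.

```python
from urllib.parse import urlparse


def validate_webhook_payload(payload: dict) -> list[str]:
    """Return a list of problems found in a create-webhook request body.

    Hypothetical helper mirroring the documented field rules only.
    """
    errors = []
    if not payload.get("name"):
        errors.append("name is required")
    url = urlparse(payload.get("url") or "")
    if url.scheme != "https" or not url.netloc:
        errors.append("url must be an HTTPS endpoint")
    scope = payload.get("scope", "all")  # "all" is the documented default
    if scope not in ("all", "selected"):
        errors.append('scope must be "all" or "selected"')
    elif scope == "selected" and not payload.get("documentIds"):
        errors.append('documentIds is required when scope is "selected"')
    return errors
```

An empty list means the payload satisfies the documented rules and can be sent.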

PUT /api/embed/webhooks/:id

Update an existing webhook. All fields are optional.

curl -X PUT \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Updated Webhook",
    "url": "https://example.com/new-webhook",
    "isActive": false,
    "scope": "selected",
    "documentIds": ["507f1f77bcf86cd799439011"]
  }' \
  https://extractor.decoded.digital/api/embed/webhooks/507f1f77bcf86cd799439011

Response (data): updated webhook object

DELETE /api/embed/webhooks/:id

Delete a webhook

curl -X DELETE \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/webhooks/507f1f77bcf86cd799439011

Response (data):

{ "message": "Webhook deleted successfully", "deletedId": "507f1f77bcf86cd799439011" }

Embed Tokens

GET /api/embed/embed-tokens

List all embed tokens for the authenticated tenant

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/embed-tokens

Response (data):

[
  {
    "_id": "507f1f77bcf86cd799439011",
    "name": "Production Key",
    "key": "ext_abc123...",
    "isActive": true,
    "expiresAt": null,
    "allowedDomains": ["example.com"],
    "usageCount": 42,
    "lastUsedAt": "...",
    "createdAt": "...",
    "updatedAt": "..."
  }
]

POST /api/embed/embed-tokens

Create a new embed token. The full key is only returned once in this response — store it securely.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Staging Key",
    "expiresAt": "2026-12-31T00:00:00Z",
    "allowedDomains": ["staging.example.com"]
  }' \
  https://extractor.decoded.digital/api/embed/embed-tokens

Request Body:

  • name (required) - Display name for the key
  • expiresAt (optional) - ISO 8601 expiration date (must be in the future)
  • allowedDomains (optional) - Array of allowed domains for iframe embeds

Response (data): created embed token object with full key value (HTTP 201)
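Since expiresAt must be an ISO 8601 timestamp in the future, it can be worth validating the value before sending it. The sketch below is one way to do that with the standard library; parse_expires_at is a hypothetical helper, and treating a timestamp without an offset as UTC is an assumption, not documented behavior.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expires_at(value: str, now: Optional[datetime] = None) -> datetime:
    """Parse an ISO 8601 expiresAt value and require it to be in the future.

    Hypothetical helper; `now` can be injected for testing.
    """
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    if parsed <= now:
        raise ValueError("expiresAt must be in the future")
    return parsed
```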

PUT /api/embed/embed-tokens/:id

Update an existing embed token. You cannot deactivate the key you are currently using for authentication.

curl -X PUT \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Renamed Key",
    "isActive": true,
    "expiresAt": "2027-06-30T00:00:00Z",
    "allowedDomains": ["example.com", "*.example.com"]
  }' \
  https://extractor.decoded.digital/api/embed/embed-tokens/507f1f77bcf86cd799439011

Response (data): updated embed token object

DELETE /api/embed/embed-tokens/:id

Delete an embed token. You cannot delete the key you are currently using for authentication.

curl -X DELETE \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/embed-tokens/507f1f77bcf86cd799439011

Response (data):

{ "message": "embed token deleted successfully", "deletedId": "507f1f77bcf86cd799439011" }

Tenants

GET /api/embed/tenants/:id

Get your tenant details including integration status. You can only access your own tenant.

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/tenants/507f1f77bcf86cd799439011

Response (data):

{
  "_id": "507f1f77bcf86cd799439011",
  "name": "Acme Corp",
  "status": "active",
  "integrations": {
    "microsoft": {
      "accounts": [
        {
          "accountId": "...",
          "email": "user@acme.com",
          "connectedAt": "..."
        }
      ]
    }
  },
  "createdAt": "...",
  "updatedAt": "..."
}

Security Best Practices

Do

  • ✓ Store embed tokens securely (environment variables)
  • ✓ Use HTTPS for all requests
  • ✓ Set domain restrictions for iframe embeds
  • ✓ Use expiring embed tokens for temporary access
  • ✓ Monitor API usage in your dashboard
  • ✓ Rotate embed tokens periodically

Don't

  • ✗ Expose embed tokens in client-side code
  • ✗ Share embed tokens across different applications
  • ✗ Store embed tokens in version control
  • ✗ Use production keys in development
  • ✗ Ignore embed token expiration warnings

Domain Restrictions

When creating an embed token, you can restrict which domains can use it for iframe embedding. This prevents unauthorized sites from using your embed.

# Examples of domain restrictions:
example.com          # Exact match
*.example.com        # All subdomains
app.example.com      # Specific subdomain
localhost            # Local development
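The patterns above can be read as a simple matching rule. The sketch below illustrates one plausible interpretation and is not the server's actual implementation: domain_allowed is a hypothetical function, and it assumes that a wildcard such as *.example.com covers subdomains but not example.com itself (which is why an earlier example lists both entries).

```python
def domain_allowed(host: str, allowed_domains: list[str]) -> bool:
    """Return True if `host` matches any allowed-domain pattern.

    Hypothetical illustration of exact and wildcard matching.
    """
    host = host.lower().rstrip(".")
    for pattern in allowed_domains:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            suffix = pattern[1:]  # ".example.com"
            # Assumption: the wildcard needs at least one subdomain label,
            # so the apex domain itself does not match.
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False
```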

Extraction Status Flow

uploaded → analyzing → analyzed → extracting → completed or failed

  • uploaded - File uploaded, waiting to be analyzed
  • analyzing - AI is analyzing the document to identify its type
  • analyzed - Analysis complete, document type identified
  • extracting - AI is extracting structured data from the document
  • completed - Extraction complete, data available in extractedData
  • analysis failed - Document analysis failed (retryable)
  • extraction failed - Data extraction failed (retryable)
  • failed - General failure (retryable)

Retrying Failed Extractions

Extractions with status analysis failed, extraction failed, or failed can be retried via POST /api/embed/extractions/:id/retry. The extraction will be reset to uploaded and re-processed from the beginning.
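A client that polls extractions can map each status to the next step. The sketch below uses the status names exactly as they are written in this guide; the literal values returned by the API may be spelled differently (for example with underscores), so treat the string constants and the next_action helper as assumptions to verify against real responses.

```python
# Status names as written in this guide; the wire format is an assumption.
IN_PROGRESS_STATUSES = {"uploaded", "analyzing", "analyzed", "extracting"}
RETRYABLE_STATUSES = {"analysis failed", "extraction failed", "failed"}


def next_action(status: str) -> str:
    """Decide what a polling client should do for a given extraction status.

    Hypothetical helper: returns "read", "retry", or "wait".
    """
    if status == "completed":
        return "read"   # extractedData is available
    if status in RETRYABLE_STATUSES:
        return "retry"  # POST /api/embed/extractions/:id/retry
    if status in IN_PROGRESS_STATUSES:
        return "wait"   # poll again later
    raise ValueError(f"unknown extraction status: {status!r}")
```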

Response Format

Success Response

{ "data": { ... }, "error": null }

Error Response

{ "data": null, "error": "Error message" }
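Because every response uses the same data/error envelope, a client can unwrap it in one place. This is a minimal sketch; unwrap and ExtractorApiError are hypothetical names, not part of any official SDK.

```python
import json
from typing import Any


class ExtractorApiError(Exception):
    """Raised when the response envelope carries a non-null error."""


def unwrap(body: str) -> Any:
    """Return the `data` field of a response body, or raise on `error`."""
    envelope = json.loads(body)
    if envelope.get("error") is not None:
        raise ExtractorApiError(envelope["error"])
    return envelope.get("data")
```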

Error Codes

400 Bad Request
  • Missing or invalid fields
  • Invalid file type or size > 10MB
401 Unauthorized
  • Missing or invalid session (X-Session-Id)
  • Session expired — create a new session
  • Invalid app secret for server-to-server calls
403 Forbidden
  • User not a member of this organization
  • Domain not allowed for iframe embed
404 Not Found
  • No user found with the given email
  • Resource not found or not in your tenant
409 Conflict
  • User already exists in this tenant
500 Server Error
  • Internal error — retry or contact support
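The status codes above suggest different recovery paths. The sketch below shows one possible mapping; classify_error and its return labels are hypothetical, and the policy is an assumption rather than documented client behavior. Note that a 401 caused by an invalid app secret will not be fixed by creating a new session.

```python
def classify_error(status_code: int) -> str:
    """Map an HTTP status code to a coarse recovery action.

    Hypothetical policy based on the error list in this guide.
    """
    if status_code == 401:
        return "renew_session"  # session missing, invalid, or expired
    if status_code >= 500:
        return "retry_later"    # internal error: retry or contact support
    if status_code in (400, 403, 404, 409):
        return "fix_request"    # retrying the same request will not help
    return "unknown"
```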

Need Help?

Our team is here to help you integrate Extractor into your application. Reach out for technical support or custom integration requirements.