Extractor

Integration Guide

Developer Documentation

Embed Extractor in Your App

1. Admin Dashboard Setup

Before embedding Extractor, follow the steps below to sign in to the admin console, register an application, and obtain your appId and appSecret.

First admin only (skip if an admin already exists)

Create the first admin by calling the setup API (e.g. from Postman or curl).

Request: POST /api/setup
Header: Authorization: Bearer <ADMIN_SETUP_SECRET>
Use the ADMIN_SETUP_SECRET value from your environment.
Body (JSON): { "email": "admin@example.com", "name": "Admin Name" }
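
As an alternative to Postman or curl, the same call can be made from a short Node.js script. This is a minimal sketch, assuming Node 18+ (global fetch). The helper names buildSetupRequest and createFirstAdmin, the baseUrl argument, and the Content-Type header are illustrative assumptions, not part of Extractor.

```javascript
// Hypothetical helper: builds the POST /api/setup request described above.
function buildSetupRequest(baseUrl, setupSecret, admin) {
  return {
    url: new URL("/api/setup", baseUrl).toString(),
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json", // assumed, since the body is JSON
        Authorization: `Bearer ${setupSecret}`,
      },
      body: JSON.stringify({ email: admin.email, name: admin.name }),
    },
  };
}

// Sends the request. Run once, and only if no admin exists yet.
async function createFirstAdmin(baseUrl, setupSecret, admin) {
  const { url, options } = buildSetupRequest(baseUrl, setupSecret, admin);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Setup failed with HTTP ${res.status}`);
  return res; // the response shape is not documented here, so it is returned as-is
}
```

Read the secret from the environment (for example process.env.ADMIN_SETUP_SECRET) rather than hard-coding it in the script.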

1. Sign in to the admin console
Open /admin/login and complete sign-in with the one-time code sent to your email.
2. Open the Applications page
From the dashboard, go to /admin/dashboard/applications.
3. Create an application
Click Create Application, or navigate directly to /admin/dashboard/applications/new.
Example: https://extractor.decoded.digital/admin/dashboard/applications/new
4. Save your appId and appSecret
After you create the application, copy the appId and appSecret (shown only once). Store the app secret securely on your backend; you'll need it for onboarding and session creation.

Use the appId and appSecret in your backend for onboarding, session creation, and server-to-server API calls. The appSecret must never be exposed to the browser.

2. Quick Start

Follow these three steps in order to embed the extractor in your application.

1. Register Application
Create an application in Admin Dashboard. You'll receive an appId and appSecret. Store the app secret securely on your backend; it's shown only once.
2. Onboard Organization
Call POST /api/embed/onboard with appId, appSecret, and user details. This creates a tenant, user, and embed token. Store the returned embed token on your backend. See Onboarding API.
3. Create Session & Embed
From your backend, check your local DB for a cached session. If none exists or it has expired, call POST /api/embed/sessions with appId, appSecret, embedToken, and userEmail to get a sessionId. Cache it in your DB (per user), then pass only the sessionId to the iframe: /embed?sessionId=ess_.... On subsequent page loads, reuse the cached session until it expires. No secrets are exposed to the browser. See Session-Based Embed Auth for details.

3. Onboarding API (For Host Applications)

🚀 Getting Started as a Host Application

If you're integrating Extractor into your application, register an application in Admin first, then use these endpoints to onboard your organization and manage users programmatically. Onboarding creates a tenant, owner user, and embed token in one call.

POST /api/embed/onboard

Create a new tenant, owner user, and embed token in one call. This is used for initial integration setup. Store the returned embed token securely - it won't be shown again.

Request Body

{
  "organizationName": "Acme Corp",      // Required: Your organization name
  "firstName": "John",                   // Required: Admin user first name
  "lastName": "Doe",                     // Required: Admin user last name
  "email": "john@acme.com",             // Required: Admin user email
  "appId": "673abc123def456789012345",  // Required: App ID from Admin Dashboard
  "appSecret": "ask_xxx...",            // Required if app has a secret (from Admin Dashboard)
  "apiKeyName": "Production Embed Token",   // Optional: Custom name for embed token
  "allowedDomains": ["acme.com", "*.acme.com"],  // Optional: Domain restrictions
  "hasExpiry": false,                    // Optional: Set true for expiring token
  "expiryDays": 90                       // Optional: Days until expiry (1-365)
}

Example Request

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "organizationName": "Acme Corp",
    "firstName": "John",
    "lastName": "Doe",
    "email": "john@acme.com",
    "appId": "673abc123def456789012345",
    "appSecret": "ask_your_app_secret_here"
  }' \
  https://extractor.decoded.digital/api/embed/onboard

Response

{
  "data": {
    "message": "Successfully onboarded. Store the embed token securely - it won't be shown again.",
    "tenant": {
      "id": "507f1f77bcf86cd799439011",
      "name": "Acme Corp"
    },
    "user": {
      "id": "507f1f77bcf86cd799439012",
      "firstName": "John",
      "lastName": "Doe",
      "email": "john@acme.com",
      "role": "owner",
      "isExistingUser": false
    },
    "embedToken": {
      "id": "507f1f77bcf86cd799439013",
      "name": "Acme Corp - Embed Token",
      "key": "ext_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
      "expiresAt": null,
      "allowedDomains": []
    }
  },
  "error": null
}
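
Because the embed token key is returned only once, it helps to extract and persist it in the same backend call that performs onboarding. The sketch below assumes Node 18+ (global fetch) and that error responses use the same { data, error } envelope as the success response above. The function name onboardOrganization, the injectable fetchImpl parameter, and the returned field names are hypothetical.

```javascript
// Hypothetical backend helper for POST /api/embed/onboard.
// fetchImpl is injectable so the function can be exercised without a network call.
async function onboardOrganization(baseUrl, payload, fetchImpl = fetch) {
  const res = await fetchImpl(new URL("/api/embed/onboard", baseUrl), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
  const { data, error } = await res.json();
  if (!res.ok || !data) {
    throw new Error(`Onboarding failed (HTTP ${res.status}): ${JSON.stringify(error)}`);
  }
  // The key is shown only once, so hand these values straight to secure storage.
  return {
    tenantId: data.tenant.id,
    userId: data.user.id,
    embedToken: data.embedToken.key,
    embedTokenExpiresAt: data.embedToken.expiresAt, // null in the sample response above
  };
}
```

The caller would pass the request body shown earlier (organizationName, firstName, lastName, email, appId, appSecret) and store the returned embedToken server-side, keyed by tenantId.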

POST /api/embed/users (Auth Required)

Create a new user in your tenant. Use this when a new user signs up in your application and needs access to Extractor features. Pass the X-Session-Id header — the session already carries tenant context from the onboarding flow.

Request Body

{
  "firstName": "Jane",          // Required
  "lastName": "Smith",          // Required
  "email": "jane@acme.com",     // Required
  "role": "member"              // Optional: "owner", "admin", or "member" (default)
}

Example Request

curl -X POST \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: ess_YOUR_SESSION_ID" \
  -d '{
    "firstName": "Jane",
    "lastName": "Smith",
    "email": "jane@acme.com",
    "role": "member"
  }' \
  https://extractor.decoded.digital/api/embed/users

Response

{
  "data": {
    "message": "User created successfully",
    "user": {
      "id": "507f1f77bcf86cd799439014",
      "firstName": "Jane",
      "lastName": "Smith",
      "email": "jane@acme.com",
      "role": "member",
      "createdAt": "2026-03-11T10:30:00.000Z"
    }
  },
  "error": null
}

Note: If the user already exists globally but is not yet a member of your tenant, they will be added to your tenant and the response will return HTTP 200 with message "User added to tenant successfully" instead of 201. If the user already exists in your tenant, you will receive a 409 Conflict error.
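
The three status codes in the note above can be mapped to outcomes in one place, so that a user-sync job treats "already a member" as a non-fatal result. This is a small sketch; the function name and the outcome labels are hypothetical and not part of the API.

```javascript
// Maps the HTTP status of POST /api/embed/users to an outcome, per the note above.
function interpretCreateUserStatus(status) {
  switch (status) {
    case 201:
      return "created"; // brand-new user
    case 200:
      return "added-to-tenant"; // existing global user was added to your tenant
    case 409:
      return "already-member"; // user already exists in your tenant
    default:
      return "error"; // anything else should be surfaced to the caller
  }
}
```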

GET /api/embed/users (Auth Required)

List all users in your tenant with pagination and search. Pass the X-Session-Id header — the session already carries tenant and user context.

Query Parameters:

• page - Page number (default: 1)
• limit - Items per page (default: 20)
• search - Search by name or email

Example Request

curl -H "X-Session-Id: ess_YOUR_SESSION_ID" \
  "https://extractor.decoded.digital/api/embed/users?page=1&limit=20"

Response (data):

{
  "items": [
    {
      "id": "...",
      "firstName": "Jane",
      "lastName": "Smith",
      "email": "jane@acme.com",
      "role": "member",
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "totalCount": 5,
    "totalPages": 1,
    "hasNextPage": false,
    "hasPrevPage": false
  }
}
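
To read every user rather than a single page, a backend can follow pagination.hasNextPage until it is false. The sketch below assumes Node 18+ (global fetch) and that the object shown above is returned under the data key of the response envelope. The names listAllUsers, makeUserPageFetcher, and fetchPage are hypothetical.

```javascript
// Collects every user by following pagination.hasNextPage.
// fetchPage(page, limit) is a hypothetical callback returning the "data" object shown above.
async function listAllUsers(fetchPage, limit = 20) {
  const users = [];
  let page = 1;
  while (true) {
    const { items, pagination } = await fetchPage(page, limit);
    users.push(...items);
    if (!pagination.hasNextPage) break;
    page += 1;
  }
  return users;
}

// Example fetchPage backed by the endpoint (baseUrl and sessionId are placeholders).
function makeUserPageFetcher(baseUrl, sessionId) {
  return async (page, limit) => {
    const url = new URL("/api/embed/users", baseUrl);
    url.searchParams.set("page", String(page));
    url.searchParams.set("limit", String(limit));
    const res = await fetch(url, { headers: { "X-Session-Id": sessionId } });
    if (!res.ok) throw new Error(`List users failed with HTTP ${res.status}`);
    const body = await res.json();
    return body.data;
  };
}
```

The same loop should work for the other paginated list endpoints in this guide, since their pagination objects have the same fields.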

4. Session-Based Embed Auth (Recommended)

For maximum security, use session-based authentication. The raw embed token and app secret never reach the browser — only a temporary session ID is exposed. Sessions last 1 day and can be refreshed or revoked.

How It Works

1. Register an application in Admin Dashboard → you get appId + appSecret.
2. Onboard your organization: POST /api/embed/onboard with appId, appSecret, and user details → returns embedToken. Store it on your backend.
3. When a user needs to embed: your backend checks your local DB for a cached session. If none exists or it's expired, call POST /api/embed/sessions with appId + appSecret + embedToken + userEmail → returns sessionId. Cache it in your DB.
4. Pass only sessionId to the iframe: /embed?sessionId=ess_...
5. The session expires after 1 day. On the next page load, your backend checks the cached session: if it is still valid, reuse it; if it has expired, create a new one and update the cache.

Step 1: Create Session (from your backend)

curl -X POST https://your-extractor.com/api/embed/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "appId": "YOUR_APP_ID",
    "appSecret": "ask_YOUR_APP_SECRET",
    "embedToken": "ext_YOUR_EMBED_TOKEN",
    "userEmail": "user@example.com"
  }'

# Response:
# {
#   "data": {
#     "sessionId": "ess_abc123...",
#     "expiresAt": "2026-04-07T12:00:00Z",
#     "userEmail": "user@example.com",
#     "tenantId": "...",
#     "appId": "..."
#   }
# }

Step 2: Embed in iframe

<!-- Only sessionId in the URL — no secrets exposed -->
<iframe
  src="https://your-extractor.com/embed?sessionId=ess_abc123..."
  width="100%"
  height="600"
  frameborder="0"
  allow="clipboard-write"
></iframe>

Step 3: Refresh Session (optional)

# Extend session by another day (call from your backend)
curl -X PUT https://your-extractor.com/api/embed/sessions/SESSION_ID/refresh \
  -H "Content-Type: application/json" \
  -d '{ "appId": "YOUR_APP_ID", "appSecret": "ask_YOUR_APP_SECRET" }'

Step 4: Revoke Session (optional)

# Immediately end a session
curl -X DELETE https://your-extractor.com/api/embed/sessions/SESSION_ID \
  -H "Content-Type: application/json" \
  -d '{ "appId": "YOUR_APP_ID", "appSecret": "ask_YOUR_APP_SECRET" }'

Full React Example

import { useEffect, useState } from "react";

function ExtractorEmbed() {
  const [sessionId, setSessionId] = useState(null);
  const [error, setError] = useState(null);

  useEffect(() => {
    // Call YOUR backend — it returns a cached or fresh session
    fetch("/api/extractor/session", { method: "POST" })
      .then(res => res.json())
      .then(data => {
        if (data.sessionId) setSessionId(data.sessionId);
        else setError("Failed to create session");
      })
      .catch(() => setError("Connection failed"));
  }, []);

  if (error) return <div>{error}</div>;
  if (!sessionId) return <div>Loading...</div>;

  return (
    <iframe
      src={`https://your-extractor.com/embed?sessionId=${sessionId}`}
      width="100%" height="700" frameBorder="0" allow="clipboard-write"
    />
  );
}

Best Practice: Cache Sessions in Your Database

Do not create a new session on every page refresh. Sessions last 24 hours. Store the sessionId and expiresAt in your database (per user + tenant), and reuse it until it expires. This avoids orphan sessions piling up and reduces unnecessary API calls to the extractor.

// YOUR backend endpoint (e.g. Next.js API route)
// Collection: extractorSessions { tenantId, userId, sessionId, expiresAt }

export async function POST(req) {
  const user = await getAuthenticatedUser(req);

  // 1. Check your local DB for a cached session
  const BUFFER_MS = 5 * 60 * 1000; // 5-min safety buffer
  const cached = await db.collection("extractorSessions").findOne({
    tenantId: user.tenantId,
    userId: user.id,
    expiresAt: { $gt: new Date(Date.now() + BUFFER_MS) },
  });

  // 2. If valid session exists → return it (no extractor API call)
  if (cached) {
    return Response.json({
      sessionId: cached.sessionId,
      expiresAt: cached.expiresAt,
    });
  }

  // 3. No valid session → create a new one via extractor API
  const embedToken = await getStoredEmbedToken(user.tenantId);
  const res = await fetch("https://your-extractor.com/api/embed/sessions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      appId: process.env.EXTRACTOR_APP_ID,
      appSecret: process.env.EXTRACTOR_APP_SECRET,
      embedToken,
      userEmail: user.email,
    }),
  });
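  const { data, error } = await res.json();

  // Stop here if the extractor rejected the request, so a missing session is never cached
  if (!res.ok || !data) {
    return Response.json(
      { error: error ?? "Failed to create session" },
      { status: 502 }
    );
  }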

  // 4. Cache it in your DB (upsert — 1 row per user, replaces expired)
  await db.collection("extractorSessions").updateOne(
    { tenantId: user.tenantId, userId: user.id },
    {
      $set: {
        sessionId: data.sessionId,
        expiresAt: new Date(data.expiresAt),
        updatedAt: new Date(),
      },
      $setOnInsert: { createdAt: new Date() },
    },
    { upsert: true }
  );

  // 5. Return to frontend
  return Response.json({
    sessionId: data.sessionId,
    expiresAt: data.expiresAt,
  });
}

Session API Reference

Endpoint	Method	Body	Description
/api/embed/sessions	POST	appId, appSecret, userEmail, embedToken	Create session (1 day TTL)
/api/embed/sessions/:id/refresh	PUT	appId, appSecret	Extend session by 1 day
/api/embed/sessions/:id	GET	?appId, ?appSecret	Check session status
/api/embed/sessions/:id	DELETE	appId, appSecret	Revoke session immediately
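
The refresh and revoke rows of the table can be wrapped in one small request builder on the backend. This is a sketch only: buildSessionRequest, its action names, and the creds object are hypothetical, and the JSON body mirrors the curl examples for Step 3 and Step 4 above.

```javascript
// Hypothetical builder for the refresh and revoke endpoints in the table above.
// Returns { url, options } ready to pass to fetch(url, options) from your backend.
function buildSessionRequest(baseUrl, action, creds, sessionId) {
  const { appId, appSecret } = creds;
  const id = encodeURIComponent(sessionId ?? "");
  const json = (method, path) => ({
    url: new URL(path, baseUrl).toString(),
    options: {
      method,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ appId, appSecret }),
    },
  });
  switch (action) {
    case "refresh":
      return json("PUT", `/api/embed/sessions/${id}/refresh`);
    case "revoke":
      return json("DELETE", `/api/embed/sessions/${id}`);
    default:
      throw new Error(`Unknown action: ${action}`);
  }
}
```

A typical use is to revoke the cached session when a user signs out of the host application, then delete the cached row.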

Security Notes

• New applications require session-based auth — direct embed token URLs are rejected.
• The appSecret must never be sent to the browser — all session API calls come from your backend.
• Sessions are per-user. Multiple users can have independent sessions.
• If the underlying embed token is deactivated, all its sessions become invalid immediately.
• MongoDB TTL indexes automatically clean up expired sessions on the extractor side.
• Cache sessions in your database — reuse the same sessionId until it expires instead of creating a new one on every page load. Use a 5-minute buffer before expiry to avoid mid-use failures.

Supported File Types

Accepted file types vary by endpoint. The maximum file size is 10MB for all uploads.

Endpoint	Accepted types
`POST /api/embed/extract`	PDF (`application/pdf`), JPEG, PNG, WebP (`image/jpeg`, `image/png`, `image/webp`)
`POST /api/embed/extract/bulk`	PDF (`application/pdf`), JPEG, PNG, WebP (`image/jpeg`, `image/png`, `image/webp`) — max 20 files, 10MB each
`POST /api/embed/files`	PDF, JPEG, PNG, WebP, plain text (`text/plain`), Word (.doc, .docx)
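
A client-side or backend pre-check against this table can reject unsupported files before any upload is attempted. The sketch below makes several assumptions: 10MB is taken as 10 * 1024 * 1024 bytes, the two Word MIME types are the standard ones for .doc and .docx, and the file argument has type and size properties like a browser File object. Whether the server checks the MIME type or the file extension is not stated, so this is a convenience check and not a substitute for handling server errors. The name validateUpload is hypothetical.

```javascript
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumed interpretation of the 10MB limit

const PDF_AND_IMAGES = ["application/pdf", "image/jpeg", "image/png", "image/webp"];

// Accepted MIME types per endpoint, taken from the table above.
const ACCEPTED_TYPES = {
  "/api/embed/extract": PDF_AND_IMAGES,
  "/api/embed/extract/bulk": PDF_AND_IMAGES,
  "/api/embed/files": [
    ...PDF_AND_IMAGES,
    "text/plain",
    "application/msword", // .doc
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document", // .docx
  ],
};

// Returns a list of problems; an empty list means the file passes the pre-check.
function validateUpload(endpoint, file) {
  const accepted = ACCEPTED_TYPES[endpoint];
  if (!accepted) return [`Unknown endpoint: ${endpoint}`];
  const problems = [];
  if (!accepted.includes(file.type)) problems.push(`Unsupported type: ${file.type}`);
  if (file.size > MAX_FILE_BYTES) problems.push("File exceeds 10MB");
  return problems;
}
```

For the bulk endpoint, also check that no more than 20 files are selected before sending the request.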

5. Embed (iframe / JS / Python)

iFrame Embed

The simplest way to integrate. Create a session from your backend, then pass the sessionId to the iframe URL.

Recommended: Use ?sessionId=ess_... instead of embedding tokens directly. See Session-Based Auth for how to create sessions from your backend.

File Upload & Extraction

Embed the file upload interface for users to upload and extract documents.

<!-- Session-based (recommended) — sessionId from your backend -->
<iframe
  src="https://extractor.decoded.digital/embed?sessionId=ess_YOUR_SESSION_ID"
  width="100%"
  height="600"
  frameborder="0"
  allow="clipboard-write"
></iframe>

<!-- Legacy (for apps without app secret) -->
<iframe
  src="https://extractor.decoded.digital/embed?embedToken=YOUR_EMBED_TOKEN&userEmail=user@example.com&appId=YOUR_APP_ID"
  width="100%"
  height="600"
  frameborder="0"
  allow="clipboard-write"
></iframe>

Manage Documents (Templates)

Embed the documents management page to create, edit, and delete document templates with custom extraction fields.

<!-- Session-based -->
<iframe
  src="https://extractor.decoded.digital/embed/documents?sessionId=ess_YOUR_SESSION_ID"
  width="100%" height="700" frameborder="0"
></iframe>

View Extractions

Embed the extractions page with tabbed navigation by document type, search, and pagination.

<!-- Session-based -->
<iframe
  src="https://extractor.decoded.digital/embed/extractions?sessionId=ess_YOUR_SESSION_ID"
  width="100%" height="700" frameborder="0"
></iframe>

Settings (Integrations, Webhooks & Embed Tokens)

Embed the settings page with tabbed UI for managing Microsoft integrations, outbound webhooks, and embed tokens.

<!-- Session-based -->
<iframe
  src="https://extractor.decoded.digital/embed/settings?sessionId=ess_YOUR_SESSION_ID"
  width="100%"
  height="700"
  frameborder="0"
  style="border: 1px solid #e5e7eb; border-radius: 8px;"
></iframe>

Available Embed Pages

• /embed - File upload and extraction
• /embed/documents - Document template management (create, edit, delete)
• /embed/extractions - View all extractions with search, filters, and file preview
• /embed/settings - Settings with tabs for Microsoft integrations, webhooks, and embed tokens

Listen for Messages

// Listen for extraction completion
window.addEventListener('message', (event) => {
  if (event.data.type === 'extraction-complete') {
    console.log('Extraction completed:', event.data.data);
    // { extractionId, fileName, status, ... }
  }
  
  if (event.data.type === 'extraction-selected') {
    console.log('Extraction selected:', event.data.data);
    // Full extraction details including extractedData
  }
});
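
Any window can post messages to your page, so it is generally safer to check event.origin before trusting the payload. The sketch below wraps the listener above with that check. The helper name createExtractorMessageHandler and the handlers map are hypothetical, and the expected origin should be whichever host serves your Extractor iframe.

```javascript
// Wraps message handling with an origin check so only the Extractor iframe is trusted.
// Returns true when a message was dispatched to a handler, false when it was ignored.
function createExtractorMessageHandler(extractorOrigin, handlers) {
  return function onMessage(event) {
    if (event.origin !== extractorOrigin) return false; // ignore other windows
    const message = event.data;
    if (!message || typeof message.type !== "string") return false;
    if (!Object.prototype.hasOwnProperty.call(handlers, message.type)) return false;
    handlers[message.type](message.data);
    return true;
  };
}

// Usage in the browser:
// window.addEventListener("message", createExtractorMessageHandler(
//   "https://extractor.decoded.digital",
//   { "extraction-complete": (data) => console.log(data.extractionId) }
// ));
```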

6. API Reference

Full API access for custom integrations. Embed endpoints support session-based auth (X-Session-Id), app secret auth (X-App-Secret + X-App-Id), or legacy embed token auth.

Session-Based Auth

After onboarding, create a session from your backend and use X-Session-Id for all subsequent API calls — user creation, user listing, extractions, documents, and iframe embeds. Only the initial onboarding call uses X-App-Id + X-App-Secret. See Session-Based Auth for details.

Authentication

All embed API calls use one of these two methods. The embed token, app secret, and user email never reach the browser.

# Session-based auth (for all calls after onboarding):
X-Session-Id: ess_xxx...
# That's it — one header. The session contains all context
# (app, tenant, user) so no other headers are needed.
# Works for: user creation, user listing, extractions, documents, etc.

# App-Secret auth (only for initial onboarding):
X-App-Id: YOUR_APP_ID
X-App-Secret: ask_xxx...
# Used only for the POST /api/embed/onboard call.

Endpoints

GET /api/embed/verify

Verify session and get tenant information

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/verify

Response (data):

{
  "valid": true,
  "tenantId": "507f1f77bcf86cd799439011",
  "embedTokenName": "My Production Token",
  "userEmail": "user@example.com",
  "message": "Embed token is valid"
}

POST /api/embed/extract

Upload a file and start extraction

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -F "file=@document.pdf" \
  https://extractor.decoded.digital/api/embed/extract

Response:

{
  "data": {
    "success": true,
    "extractionId": "507f1f77bcf86cd799439011",
    "fileName": "document.pdf",
    "fileType": "application/pdf",
    "fileUrl": "https://...",
    "status": "processing",
    "message": "File uploaded and extraction started"
  },
  "error": null
}

POST /api/embed/extract/bulk

Upload and extract data from multiple files in a single request. Maximum 20 files per request. Each file must be PDF or image (JPEG, PNG, WebP) and under 10MB.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -F "files=@invoice1.pdf" \
  -F "files=@invoice2.pdf" \
  -F "files=@receipt.png" \
  https://extractor.decoded.digital/api/embed/extract/bulk

Response:

{
  "data": {
    "results": [
      {
        "fileName": "invoice1.pdf",
        "success": true,
        "extractionId": "507f1f77bcf86cd799439011",
        "fileType": "application/pdf",
        "fileUrl": "https://...",
        "status": "processing"
      },
      {
        "fileName": "invoice2.pdf",
        "success": true,
        "extractionId": "507f1f77bcf86cd799439012",
        "fileType": "application/pdf",
        "fileUrl": "https://...",
        "status": "processing"
      },
      {
        "fileName": "receipt.png",
        "success": true,
        "extractionId": "507f1f77bcf86cd799439013",
        "fileType": "image/png",
        "fileUrl": "https://...",
        "status": "processing"
      }
    ],
    "summary": {
      "total": 3,
      "successful": 3,
      "failed": 0
    },
    "message": "3 of 3 files uploaded successfully"
  },
  "error": null
}

Individual files may fail while others succeed. Check each item in the results array for per-file status.
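
Since a bulk upload can partially succeed, the results array is usually split into extraction IDs to track and file names to report or re-upload. The sketch below assumes that a failed item still carries fileName and has success set to false; the failed-item shape is not shown in this guide. The function name summarizeBulkResults is hypothetical.

```javascript
// Splits the bulk response's results array into successes and failures.
function summarizeBulkResults(results) {
  const succeeded = results.filter((r) => r.success);
  const failed = results.filter((r) => !r.success);
  return {
    extractionIds: succeeded.map((r) => r.extractionId),
    failedFileNames: failed.map((r) => r.fileName),
  };
}
```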

GET /api/embed/extractions

List all extractions with pagination and optional filters

Query Parameters:

• page - Page number (default: 1)
• limit - Items per page (default: 20)
• status - Filter by status (uploaded, analyzing, analyzed, extracting, completed, analysis failed, extraction failed, failed)
• documentId - Filter by document template ID
• search - Search by file name

curl -H "X-Session-Id: ess_xxx" \
  "https://extractor.decoded.digital/api/embed/extractions?page=1&limit=20&status=completed"

Response (data):

{
  "items": [
    {
      "id": "...",
      "fileName": "doc.pdf",
      "fileType": "application/pdf",
      "fileSize": 12345,
      "fileUrl": "https://...",
      "status": "completed",
      "documentId": "...",
      "documentName": "Invoice",
      "source": "embed",
      "appId": "673abc123def456789012345",
      "applicationName": "my-app",
      "email": {
        "from": { "emailAddress": { "name": "John", "address": "john@example.com" } },
        "toRecipients": [
          { "emailAddress": { "name": "Jane", "address": "jane@example.com" } }
        ],
        "ccRecipients": [],
        "subject": "Invoice #1234",
        "bodyPreview": "Please find attached..."
      },
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "totalCount": 100,
    "totalPages": 5,
    "hasNextPage": true,
    "hasPrevPage": false
  }
}
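
Some status values contain spaces (for example "analysis failed"), so it is safer to build the query string with URL and URLSearchParams than by string concatenation. A minimal sketch follows; buildExtractionsUrl is a hypothetical helper, and the filter keys are the query parameters listed above.

```javascript
// Builds the list URL, skipping filters that are empty or not set.
function buildExtractionsUrl(baseUrl, filters = {}) {
  const url = new URL("/api/embed/extractions", baseUrl);
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(key, String(value)); // handles URL encoding
    }
  }
  return url.toString();
}

// buildExtractionsUrl("https://extractor.decoded.digital", { page: 1, status: "analysis failed" })
// returns "https://extractor.decoded.digital/api/embed/extractions?page=1&status=analysis+failed"
```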

GET /api/embed/extractions/:id

Get extraction details including extracted data

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "fileName": "doc.pdf",
  "fileType": "application/pdf",
  "fileSize": 12345,
  "fileUrl": "https://...",
  "status": "completed",
  "extractedData": { "vendor_name": "...", "total": "..." },
  "error": null,
  "documentId": "...",
  "document": {
    "id": "...",
    "name": "Invoice",
    "description": "...",
    "fields": []
  },
  "source": "embed",
  "appId": "673abc123def456789012345",
  "applicationName": "my-app",
  "email": {
    "from": { "emailAddress": { "name": "John", "address": "john@example.com" } },
    "toRecipients": [
      { "emailAddress": { "name": "Jane", "address": "jane@example.com" } }
    ],
    "ccRecipients": [],
    "subject": "Invoice #1234",
    "bodyPreview": "Please find attached...",
    "body": { "contentType": "html", "content": "<html>...</html>" }
  },
  "createdAt": "...",
  "updatedAt": "..."
}
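
Extraction runs asynchronously after upload, so a backend that needs the extractedData may poll this endpoint until the status stops changing. The sketch below treats completed, analysis failed, extraction failed, and failed as final statuses; that is an assumption inferred from the status list in this guide, not a documented guarantee. The names waitForExtraction and getExtraction, and the default interval and attempt values, are hypothetical.

```javascript
// Statuses assumed to be final, taken from the status filter values listed earlier.
const TERMINAL_STATUSES = new Set(["completed", "analysis failed", "extraction failed", "failed"]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// getExtraction(id) is a hypothetical callback returning the "data" object shown above.
async function waitForExtraction(getExtraction, id, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const extraction = await getExtraction(id);
    if (TERMINAL_STATUSES.has(extraction.status)) return extraction;
    if (attempt < maxAttempts) await sleep(intervalMs); // wait before the next poll
  }
  throw new Error(`Extraction ${id} did not finish after ${maxAttempts} attempts`);
}
```

Check the returned status before reading extractedData, since a failed extraction also ends the loop.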

PUT /api/embed/extractions/:id

Update an extraction's file name and extracted data

Request Body:

• fileName * — Updated file name (string, required)
• extractedData * — Updated extracted data (object, required)

curl -X PUT \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "fileName": "updated-invoice.pdf",
    "extractedData": {
      "vendor_name": "Acme Corp",
      "total": "1500.00",
      "invoice_number": "INV-2026-001"
    }
  }' \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "extractionId": null,
  "fileName": "updated-invoice.pdf",
  "fileType": "application/pdf",
  "fileSize": 12345,
  "fileUrl": "https://...",
  "status": "completed",
  "extractedData": {
    "vendor_name": "Acme Corp",
    "total": "1500.00",
    "invoice_number": "INV-2026-001"
  },
  "error": null,
  "documentId": "...",
  "document": {
    "id": "...",
    "name": "Invoice",
    "description": "...",
    "fields": []
  },
  "source": "embed",
  "appId": "673abc123def456789012345",
  "applicationName": "my-app",
  "email": null,
  "createdAt": "...",
  "updatedAt": "..."
}

GET /api/embed/documents

List available document templates with pagination

Query Parameters:

• page - Page number (default: 1)
• limit - Items per page (default: 20)
• search - Search by name or description

curl -H "X-Session-Id: ess_xxx" \
  "https://extractor.decoded.digital/api/embed/documents?page=1&limit=20"

Response (data):

{
  "items": [
    {
      "_id": "...",
      "name": "Invoice",
      "description": "...",
      "fields": [{ "key": "...", "type": "String", "description": "..." }],
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "totalCount": 10,
    "totalPages": 1,
    "hasNextPage": false,
    "hasPrevPage": false
  }
}

GET /api/embed/documents/:id

Get a specific document template with field definitions

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "name": "Invoice",
  "description": "Invoice template",
  "fields": [{ "key": "vendor_name", "type": "String", "description": "Vendor name" }],
  "createdAt": "...",
  "updatedAt": "..."
}

POST /api/embed/documents

Create a new document template with extraction fields. Each field requires a key, type, and description.

Supported Field Types:

String, Number, Boolean, Date, Object, List<String>, List<Object>

Use a children array on Object and List<Object> fields to define nested field structures.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Invoice",
    "description": "Invoice document template",
    "fields": [
      { "id": "1", "key": "vendor_name", "type": "String", "description": "Vendor name" },
      { "id": "2", "key": "total", "type": "String", "description": "Total amount" }
    ]
  }' \
  https://extractor.decoded.digital/api/embed/documents

Response (data):

{
  "_id": "507f1f77bcf86cd799439011",
  "name": "Invoice",
  "description": "Invoice document template",
  "fields": [...],
  "createdAt": "...",
  "updatedAt": "..."
}
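
The curl example above uses only flat fields. The sketch below shows what a nested template body might look like, together with a local check of the rules stated for this endpoint (key, type, and description on every field, supported types only, children only on Object and List<Object>). It assumes child fields take the same shape as top-level fields, which this guide does not confirm. The field names, the child id values, and the function name validateFields are illustrative.

```javascript
const FIELD_TYPES = ["String", "Number", "Boolean", "Date", "Object", "List<String>", "List<Object>"];
const NESTABLE_TYPES = ["Object", "List<Object>"];

// Returns a list of problems found in a fields array; empty means it looks well-formed.
function validateFields(fields, path = "fields") {
  const problems = [];
  fields.forEach((field, i) => {
    const where = `${path}[${i}]`;
    for (const prop of ["key", "type", "description"]) {
      if (!field[prop]) problems.push(`${where} is missing ${prop}`);
    }
    if (field.type && !FIELD_TYPES.includes(field.type)) {
      problems.push(`${where} has unsupported type ${field.type}`);
    }
    if (field.children) {
      if (!NESTABLE_TYPES.includes(field.type)) {
        problems.push(`${where} has children but is not Object or List<Object>`);
      }
      problems.push(...validateFields(field.children, `${where}.children`));
    }
  });
  return problems;
}

// Example request body with a nested line_items list (assumed child shape).
const invoiceTemplate = {
  name: "Invoice",
  description: "Invoice document template",
  fields: [
    { id: "1", key: "vendor_name", type: "String", description: "Vendor name" },
    {
      id: "2",
      key: "line_items",
      type: "List<Object>",
      description: "Invoice line items",
      children: [
        { id: "2.1", key: "description", type: "String", description: "Item description" },
        { id: "2.2", key: "amount", type: "Number", description: "Line amount" },
      ],
    },
  ],
};
```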

PUT /api/embed/documents/:id

Update an existing document template

curl -X PUT \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Invoice (Updated)",
    "description": "Updated invoice template",
    "fields": [...]
  }' \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011

Response (data): same shape as Create Document

DELETE /api/embed/documents/:id

Delete a document template

curl -X DELETE \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011

Response (data):

{ "deleted": true, "id": "507f1f77bcf86cd799439011" }

PATCH /api/embed/documents/:id/archive

Archive or unarchive a document template. Archived documents are hidden from default listing but can be restored by setting isArchived to false.

curl -X PATCH \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "isArchived": true }' \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011/archive

Response (data):

{
  "_id": "507f1f77bcf86cd799439011",
  "name": "Invoice",
  "description": "Invoice template",
  "fields": [...],
  "isArchived": true,
  "archivedAt": "2026-04-08T12:00:00.000Z",
  "createdAt": "...",
  "updatedAt": "..."
}

To unarchive, send { "isArchived": false }.

POST /api/embed/documents/bulk-delete

Bulk delete multiple document templates in a single request. Performs a soft delete. Maximum 100 IDs per request.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ]
  }' \
  https://extractor.decoded.digital/api/embed/documents/bulk-delete

Response (data):

{
  "deleted": true,
  "deletedCount": 3,
  "requestedCount": 3
}

deletedCount may be less than requestedCount if some IDs were already deleted or don't belong to your app scope.
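
When more than 100 IDs need to be processed, the list has to be split into batches before calling a bulk endpoint. A minimal sketch of that batching follows; the function name chunkIds is hypothetical, and the default size of 100 comes from the per-request limit stated above.

```javascript
// Splits an array of IDs into batches no larger than the per-request limit.
function chunkIds(ids, size = 100) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}
```

Send one request per batch and add up the deletedCount values to compare against the total number of IDs requested.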

GET /api/embed/documents/:id/extraction-count

Get the count of active extractions linked to a specific document template. Useful for displaying usage stats or confirming a document is safe to delete.

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/documents/507f1f77bcf86cd799439011/extraction-count

Response (data):

{ "count": 42 }

DELETE /api/embed/extractions/:id

Delete an extraction

curl -X DELETE \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011

Response (data):

{ "deleted": true, "id": "507f1f77bcf86cd799439011" }

POST /api/embed/extractions/bulk-delete

Bulk delete multiple extractions in a single request. Performs a soft delete. Maximum 100 IDs per request.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ]
  }' \
  https://extractor.decoded.digital/api/embed/extractions/bulk-delete

Response (data):

{
  "deleted": true,
  "deletedCount": 3,
  "requestedCount": 3
}

deletedCount may be less than requestedCount if some IDs were already deleted or don't belong to your app scope.

POST /api/embed/extractions/:id/retry

Retry a failed or completed extraction. Only extractions with status analysis failed, extraction failed, failed, or completed can be retried.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011/retry

Response (data):

{
  "success": true,
  "extractionId": "507f1f77bcf86cd799439011",
  "message": "Extraction retry initiated"
}

POST /api/embed/extractions/bulk-retry

Retry multiple failed or completed extractions in a single request. Only extractions with status analysis failed, extraction failed, failed, or completed can be retried. Maximum 100 IDs per request.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ]
  }' \
  https://extractor.decoded.digital/api/embed/extractions/bulk-retry

Response (data):

{
  "message": "2 extraction(s) retry initiated successfully",
  "retriedCount": 2,
  "failedCount": 0,
  "skippedCount": 1,
  "requestedCount": 3
}

retriedCount — extractions successfully queued for retry. skippedCount — extractions that were not in a retryable status, had no file, or were not found. failedCount — extractions where the retry job could not be created.
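
To keep skippedCount low, the caller can filter a list of extractions down to retryable ones before sending their IDs. The sketch below uses the four retryable statuses named for this endpoint and assumes each item has the id and status fields returned by GET /api/embed/extractions. The function name selectRetryableIds is hypothetical.

```javascript
// Statuses that the retry endpoints accept, as stated above.
const RETRYABLE_STATUSES = new Set(["analysis failed", "extraction failed", "failed", "completed"]);

// Returns the IDs of extractions whose status allows a retry.
function selectRetryableIds(extractions) {
  return extractions
    .filter((extraction) => RETRYABLE_STATUSES.has(extraction.status))
    .map((extraction) => extraction.id);
}
```

The server can still skip an ID (for example when the extraction has no file), so the counts in the response remain the source of truth.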

PATCH /api/embed/extractions/:id/archive

Archive or unarchive a single extraction. Archived extractions are hidden from default listing but can be restored by setting isArchived to false.

curl -X PATCH \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "isArchived": true }' \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011/archive

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "fileName": "doc.pdf",
  "fileType": "application/pdf",
  "fileSize": 12345,
  "fileUrl": "https://...",
  "status": "completed",
  "documentId": "...",
  "documentName": "Invoice",
  "source": "embed",
  "isArchived": true,
  "archivedAt": "2026-04-08T12:00:00.000Z",
  "createdAt": "...",
  "updatedAt": "..."
}

To unarchive, send { "isArchived": false }.

PATCH /api/embed/extractions/bulk-archive

Archive or unarchive multiple extractions in a single request. Maximum 100 IDs per request.

curl -X PATCH \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ],
    "isArchived": true
  }' \
  https://extractor.decoded.digital/api/embed/extractions/bulk-archive

Response (data):

{
  "archived": true,
  "archivedCount": 3,
  "requestedCount": 3
}

To unarchive, send isArchived: false. archivedCount may be less than requestedCount if some IDs were already in the desired state or don't belong to your app scope.

POST /api/embed/extractions/:id/reassign

Reassign an extraction to a different document type. This resets the extraction status and triggers re-analysis with the new document template. The extraction must have a valid file URL.

Request Body:

• documentId * — The ID of the target document type to reassign to (required)

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{ "documentId": "673abc123def456789012345" }' \
  https://extractor.decoded.digital/api/embed/extractions/507f1f77bcf86cd799439011/reassign

Response (data):

{
  "success": true,
  "extractionId": "507f1f77bcf86cd799439011",
  "newDocumentId": "673abc123def456789012345",
  "message": "Extraction reassigned and re-analysis initiated"
}

The extraction status resets to uploaded and a new analysis job is triggered automatically. Reassignment history is tracked internally. An extraction cannot be reassigned to the document type it is already assigned to.

POST /api/embed/extractions/bulk-change-document-type

Reassign multiple extractions to a different document type in a single request. Because the user has explicitly chosen the target type, AI re-classification is skipped and an extraction job is enqueued directly against the chosen template. Maximum 100 IDs per request.

Request Body:

• ids * — Array of extraction IDs to reassign (1–100, required)
• documentId * — The ID of the target document type to reassign to (required)

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "ids": [
      "507f1f77bcf86cd799439011",
      "507f1f77bcf86cd799439012",
      "507f1f77bcf86cd799439013"
    ],
    "documentId": "673abc123def456789012345"
  }' \
  https://extractor.decoded.digital/api/embed/extractions/bulk-change-document-type

Response (data):

{
  "message": "3 extraction(s) reassigned successfully",
  "reassignedCount": 3,
  "failedCount": 0,
  "skippedCount": 0,
  "requestedCount": 3,
  "newDocumentId": "673abc123def456789012345"
}

Extractions already assigned to the target document type or missing a file URL are counted in skippedCount. IDs that don't exist or fall outside your app scope are also counted as skipped. Each reassignment is recorded in the extraction's reassignmentHistory.
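Since a bulk request can succeed while leaving some extractions untouched, it can help to condense the counters before reporting back to the user. This is a small sketch that reads only the fields shown in the response above. The function name summarize_bulk_reassign is hypothetical, and it does not assume that the three counters always add up to requestedCount, which this guide does not state.

```python
def summarize_bulk_reassign(data: dict) -> dict:
    """Condense the `data` object of a bulk-change-document-type response."""
    requested = data["requestedCount"]
    reassigned = data["reassignedCount"]
    return {
        # True only when every requested extraction was reassigned
        "fully_applied": reassigned == requested,
        # Extractions that were requested but not reassigned, for any reason
        "not_reassigned": requested - reassigned,
        "skipped": data.get("skippedCount", 0),
        "failed": data.get("failedCount", 0),
    }
```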

GET /api/embed/files

List all uploaded files with pagination

Query Parameters:

• page - Page number (default: 1)
• limit - Items per page (default: 20)
• search - Search by file name
• fileType - Filter by MIME type

curl -H "X-Session-Id: ess_xxx" \
  "https://extractor.decoded.digital/api/embed/files?page=1&limit=20"

Response (data):

{
  "items": [
    {
      "id": "...",
      "fileName": "doc.pdf",
      "fileType": "application/pdf",
      "fileSize": 12345,
      "fileUrl": "https://...",
      "status": "uploaded",
      "source": "embed",
      "appId": "673abc123def456789012345",
      "createdAt": "...",
      "updatedAt": "..."
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "totalCount": 50,
    "totalPages": 3,
    "hasNextPage": true,
    "hasPrevPage": false
  }
}
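To read every file rather than a single page, a client can keep requesting pages until hasNextPage is false. The sketch below shows that loop in Python. The fetch_page argument is a hypothetical callable supplied by your code: it is assumed to perform GET /api/embed/files?page=N with your session header and return the unwrapped data object shown above.

```python
from typing import Callable, Dict, Iterator


def iter_files(fetch_page: Callable[[int], Dict], start_page: int = 1) -> Iterator[Dict]:
    """Yield every file item, following pagination.hasNextPage."""
    page = start_page
    while True:
        data = fetch_page(page)
        yield from data["items"]
        if not data["pagination"]["hasNextPage"]:
            break
        page += 1
```

Keeping the HTTP call outside the loop makes the pagination logic easy to exercise with canned responses.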

POST /api/embed/files

Upload a file without immediate extraction (supports PDF, images, text, Word docs)

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -F "file=@document.pdf" \
  -F "description=Invoice document" \
  https://extractor.decoded.digital/api/embed/files

Response (data):

{
  "id": "507f1f77bcf86cd799439011",
  "fileName": "document.pdf",
  "fileType": "application/pdf",
  "fileSize": 12345,
  "fileUrl": "https://...",
  "status": "uploaded",
  "message": "File uploaded successfully"
}

Webhooks

GET /api/embed/webhooks

List all webhooks for the authenticated tenant

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/webhooks

Response (data):

[
  {
    "_id": "507f1f77bcf86cd799439011",
    "name": "My Webhook",
    "url": "https://example.com/webhook",
    "scope": "all",
    "documentIds": [],
    "isActive": true,
    "tenantId": "...",
    "createdAt": "...",
    "updatedAt": "..."
  }
]

POST /api/embed/webhooks

Create a new outbound webhook to receive notifications when extractions complete

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Webhook",
    "url": "https://example.com/webhook",
    "scope": "all",
    "documentIds": []
  }' \
  https://extractor.decoded.digital/api/embed/webhooks

Request Body:

• name (required) - Webhook display name
• url (required) - HTTPS endpoint URL
• scope - "all" (default) or "selected"
• documentIds - Array of document template IDs (required when scope is "selected")

Response (data): created webhook object (HTTP 201)
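The request-body rules above can be checked before the request is sent, which avoids a round trip that would end in a 400. This is a minimal client-side sketch in Python. The function name build_webhook_body is illustrative, and the server may apply further validation that is not described in this guide.

```python
from typing import List, Optional
from urllib.parse import urlparse


def build_webhook_body(
    name: str,
    url: str,
    scope: str = "all",
    document_ids: Optional[List[str]] = None,
) -> dict:
    """Validate the documented rules and return the JSON body for POST /api/embed/webhooks."""
    if not name.strip():
        raise ValueError("name is required")
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("url must be an HTTPS endpoint")
    if scope not in ("all", "selected"):
        raise ValueError('scope must be "all" or "selected"')
    ids = list(document_ids or [])
    if scope == "selected" and not ids:
        raise ValueError('documentIds is required when scope is "selected"')
    return {"name": name, "url": url, "scope": scope, "documentIds": ids}
```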

PUT /api/embed/webhooks/:id

Update an existing webhook. All fields are optional.

curl -X PUT \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Updated Webhook",
    "url": "https://example.com/new-webhook",
    "isActive": false,
    "scope": "selected",
    "documentIds": ["507f1f77bcf86cd799439011"]
  }' \
  https://extractor.decoded.digital/api/embed/webhooks/507f1f77bcf86cd799439011

Response (data): updated webhook object

DELETE /api/embed/webhooks/:id

Delete a webhook

curl -X DELETE \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/webhooks/507f1f77bcf86cd799439011

Response (data):

{ "message": "Webhook deleted successfully", "deletedId": "507f1f77bcf86cd799439011" }

Embed Tokens

GET /api/embed/embed-tokens

List all embed tokens for the authenticated tenant

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/embed-tokens

Response (data):

[
  {
    "_id": "507f1f77bcf86cd799439011",
    "name": "Production Key",
    "key": "ext_abc123...",
    "isActive": true,
    "expiresAt": null,
    "allowedDomains": ["example.com"],
    "usageCount": 42,
    "lastUsedAt": "...",
    "createdAt": "...",
    "updatedAt": "..."
  }
]

POST /api/embed/embed-tokens

Create a new embed token. The full key is only returned once in this response — store it securely.

curl -X POST \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Staging Key",
    "expiresAt": "2026-12-31T00:00:00Z",
    "allowedDomains": ["staging.example.com"]
  }' \
  https://extractor.decoded.digital/api/embed/embed-tokens

Request Body:

• name (required) - Display name for the key
• expiresAt (optional) - ISO 8601 expiration date (must be in the future)
• allowedDomains (optional) - Array of allowed domains for iframe embeds

Response (data): created embed token object with full key value (HTTP 201)
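The expiresAt rule (ISO 8601, in the future) is easy to get wrong with naive datetimes, so a small builder can enforce it on the client. The sketch below is one possible approach in Python. The function name build_embed_token_body and its now parameter (useful for testing) are assumptions of this example, not part of the API.

```python
from datetime import datetime, timezone
from typing import List, Optional


def build_embed_token_body(
    name: str,
    expires_at: Optional[datetime] = None,
    allowed_domains: Optional[List[str]] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Return the JSON body for POST /api/embed/embed-tokens."""
    if not name.strip():
        raise ValueError("name is required")
    body: dict = {"name": name}
    if expires_at is not None:
        if expires_at.tzinfo is None:
            raise ValueError("expires_at must be timezone-aware")
        current = now or datetime.now(timezone.utc)
        if expires_at <= current:
            raise ValueError("expiresAt must be in the future")
        # Serialize as UTC in ISO 8601 form, matching the curl example above
        body["expiresAt"] = expires_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if allowed_domains:
        body["allowedDomains"] = list(allowed_domains)
    return body
```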

PUT /api/embed/embed-tokens/:id

Update an existing embed token. You cannot deactivate the key you are currently using for authentication.

curl -X PUT \
  -H "X-Session-Id: ess_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Renamed Key",
    "isActive": true,
    "expiresAt": "2027-06-30T00:00:00Z",
    "allowedDomains": ["example.com", "*.example.com"]
  }' \
  https://extractor.decoded.digital/api/embed/embed-tokens/507f1f77bcf86cd799439011

Response (data): updated embed token object

DELETE /api/embed/embed-tokens/:id

Delete an embed token. You cannot delete the key you are currently using for authentication.

curl -X DELETE \
  -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/embed-tokens/507f1f77bcf86cd799439011

Response (data):

{ "message": "embed token deleted successfully", "deletedId": "507f1f77bcf86cd799439011" }

Tenants

GET /api/embed/tenants/:id

Get your tenant details including integration status. You can only access your own tenant.

curl -H "X-Session-Id: ess_xxx" \
  https://extractor.decoded.digital/api/embed/tenants/507f1f77bcf86cd799439011

Response (data):

{
  "_id": "507f1f77bcf86cd799439011",
  "name": "Acme Corp",
  "status": "active",
  "integrations": {
    "microsoft": {
      "accounts": [
        {
          "accountId": "...",
          "email": "user@acme.com",
          "connectedAt": "..."
        }
      ]
    }
  },
  "createdAt": "...",
  "updatedAt": "..."
}

• Security Best Practices

Do

✓ Store embed tokens securely (environment variables)
✓ Use HTTPS for all requests
✓ Set domain restrictions for iframe embeds
✓ Use expiring embed tokens for temporary access
✓ Monitor API usage in your dashboard
✓ Rotate embed tokens periodically

Don't

✗ Expose embed tokens in client-side code
✗ Share embed tokens across different applications
✗ Store embed tokens in version control
✗ Use production keys in development
✗ Ignore embed token expiration warnings

Domain Restrictions

When creating an embed token, you can restrict which domains can use it for iframe embedding. This prevents unauthorized sites from using your embed.

# Examples of domain restrictions:
example.com          # Exact match
*.example.com        # All subdomains
app.example.com      # Specific subdomain
localhost            # Local development
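To preview which hosts a set of patterns would admit, the matching can be approximated locally. The following Python sketch is an assumption about the semantics, not the server's actual implementation: it treats *.example.com as matching subdomains only (not example.com itself), which is consistent with the update example above listing both example.com and *.example.com. The function name domain_allowed is illustrative.

```python
from typing import Iterable


def domain_allowed(host: str, patterns: Iterable[str]) -> bool:
    """Return True if `host` matches an exact or wildcard-subdomain pattern."""
    host = host.lower().rstrip(".")
    for pattern in patterns:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            suffix = pattern[1:]  # ".example.com"
            # Require at least one label in front of the suffix (assumed semantics)
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == pattern:
            return True
    return False
```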

• Extraction Status Flow

uploaded → analyzing → analyzed → extracting → completed or failed

• uploaded - File uploaded, waiting to be analyzed
• analyzing - AI is analyzing the document to identify its type
• analyzed - Analysis complete, document type identified
• extracting - AI is extracting structured data from the document
• completed - Extraction complete, data available in extractedData
• analysis failed - Document analysis failed (retryable)
• extraction failed - Data extraction failed (retryable)
• failed - General failure (retryable)

Retrying Failed Extractions

Extractions with status analysis failed, extraction failed, or failed can be retried via POST /api/embed/extractions/:id/retry. The extraction will be reset to uploaded and re-processed from the beginning.
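A client that lists extractions can pick out the ones eligible for retry with a simple status check. The sketch below is a minimal Python illustration. The status strings are copied as they are rendered in this guide; the exact values returned by the API may be spelled differently, so verify them against a real response. The helper names are hypothetical.

```python
from typing import Dict, List

# Statuses this guide describes as retryable (spelling as shown above; verify against the API)
RETRYABLE_STATUSES = {"analysis failed", "extraction failed", "failed"}


def is_retryable(status: str) -> bool:
    """Return True if an extraction in this status can be retried."""
    return status in RETRYABLE_STATUSES


def retry_path(extraction_id: str) -> str:
    """Path for POST /api/embed/extractions/:id/retry."""
    return f"/api/embed/extractions/{extraction_id}/retry"


def ids_to_retry(extractions: List[Dict]) -> List[str]:
    """Collect the IDs of all extractions that are in a retryable status."""
    return [e["id"] for e in extractions if is_retryable(e.get("status", ""))]
```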

• Response Format

Success Response

{ "data": { ... }, "error": null }

Error Response

{ "data": null, "error": "Error message" }
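Since every response uses the same envelope, a single helper can unwrap it and turn a non-null error into an exception. This is a minimal Python sketch. The names ExtractorApiError and unwrap are illustrative and not provided by Extractor.

```python
import json
from typing import Any


class ExtractorApiError(Exception):
    """Raised when the response envelope carries a non-null `error`."""


def unwrap(body_text: str) -> Any:
    """Parse a response body and return its `data`, or raise on `error`."""
    payload = json.loads(body_text)
    if payload.get("error") is not None:
        raise ExtractorApiError(payload["error"])
    return payload.get("data")
```

Callers then work with the data object directly, as in the "Response (data)" examples throughout this reference.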

• Error Codes

400 Bad Request

Missing or invalid fields
Invalid file type or size > 10MB

401 Unauthorized

Missing or invalid session (X-Session-Id)
Session expired — create a new session
Invalid app secret for server-to-server calls

403 Forbidden

User not a member of this organization
Domain not allowed for iframe embed

404 Not Found

User with email not found
Resource not found or not in your tenant

409 Conflict

User already exists in this tenant

500 Server Error

Internal error — retry or contact support
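The status codes above suggest different reactions: renew the session on 401, retry on 500, and fix the request for the other 4xx codes. The Python sketch below encodes one such policy. The action labels and the exponential backoff schedule are assumptions of this example; this guide only says that 500 errors may be retried and that an expired session requires a new one.

```python
from typing import List


def next_action(status_code: int) -> str:
    """Map an HTTP status code to a coarse client-side action."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 401:
        return "renew_session"  # create a new session, then repeat the call
    if status_code >= 500:
        return "retry"          # transient server error
    return "fix_request"        # 400, 403, 404, 409: retrying unchanged will not help


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff delays in seconds, capped at `cap` (assumed policy)."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```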

Need Help?

Our team is here to help you integrate Extractor into your application. Reach out for technical support or custom integration requirements.
