Job
A Job in Alfred represents a single unit of work that performs a sequence of operations on one or more files for the purpose of document classification, extraction, and indexing. It is an asynchronous entity, orchestrated by a state machine that manages its progress through various stages.
For example, a Job could be responsible for ingesting a batch of scanned invoices, classifying them, extracting relevant fields, and then indexing them in a searchable database.
Responsibilities
Executes pre-defined workflows for document processing.
Manages state transitions and retries.
Orchestrates the processing of multiple files if applicable.
Emits events to signal significant changes in its lifecycle.
List all Jobs associated with the authorized user's company.
GET
https://<env>.tagshelf.com/api/job/all
This endpoint provides a list of Jobs that are specific to the company to which the authorized user belongs. It does not list all Jobs in the entire system but rather filters them based on company affiliation. This ensures that users only access and manage Jobs relevant to their organizational context.
Query Parameters
currentPage
Integer
The currentPage parameter specifies the page number in the paginated list of Jobs. It is used to navigate to a specific page in the list, allowing users to access a particular subset of the Jobs data.
pageSize
Integer
The pageSize parameter determines the number of Jobs displayed on each page of the paginated response. It allows users to customize the volume of data returned in a single API call, facilitating easier data handling and viewing, especially with large datasets. The API has a predefined default value for the number of items per page, typically 20; if the user does not specify a pageSize, this default is used. The minimum allowed value for this parameter is 10 and the maximum allowed value is 40.
Headers
X-TagshelfAPI-Key
String
Application API Key
Authorization
String
Bearer <access_token> or amx <hmac_token>
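As a minimal sketch of calling this endpoint (assuming Python with the requests library; the environment, API key, and bearer token below are placeholders you must replace with your own values):

```python
import requests

BASE_URL = "https://<env>.tagshelf.com"  # replace <env> with your environment

headers = {
    "X-TagshelfAPI-Key": "<your_application_api_key>",  # placeholder
    "Authorization": "Bearer <access_token>",            # or "amx <hmac_token>"
}

# Request the second page of Jobs, 20 per page (pageSize must be between 10 and 40).
params = {"currentPage": 2, "pageSize": 20}

response = requests.get(f"{BASE_URL}/api/job/all", headers=headers, params=params)
response.raise_for_status()
jobs = response.json()  # paginated list of Jobs for the authorized user's company
```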
Retrieve detailed information about a specific Job.
GET
https://<env>.tagshelf.com/api/job/detail/:id
This endpoint provides comprehensive details about a Job identified by its unique id. It is used to fetch the current status, progress, results, and any other relevant information about a specific Job. This is particularly useful for monitoring the ongoing process of a Job, understanding its current state, and for diagnostic purposes. The endpoint is essential for users or systems needing to track the progress and outcome of document processing tasks.
Path Parameters
id*
UUID
The unique identifier of the Job.
Headers
X-TagshelfAPI-Key
String
Application API Key
Authorization
String
Bearer <access_token> or amx <hmac_token>
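For instance, the detail endpoint can be polled to track a Job's progress. The sketch below assumes Python with the requests library, placeholder credentials, and an illustrative response field ("stage") with assumed terminal values; the actual response shape is defined by the API:

```python
import time
import requests

BASE_URL = "https://<env>.tagshelf.com"
headers = {
    "X-TagshelfAPI-Key": "<your_application_api_key>",
    "Authorization": "Bearer <access_token>",
}

job_id = "00000000-0000-0000-0000-000000000000"  # the Job's UUID

# Poll the Job detail endpoint until processing finishes (field names are illustrative).
while True:
    response = requests.get(f"{BASE_URL}/api/job/detail/{job_id}", headers=headers)
    response.raise_for_status()
    detail = response.json()          # current status, progress, results, etc.
    if detail.get("stage") in ("finished", "failed"):  # assumed terminal states
        break
    time.sleep(5)
```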
Create a new Job and close the deferred upload session.
POST
https://<env>.tagshelf.com/api/job/create
This endpoint is used for creating a new Job in Alfred. It serves a dual purpose: it finalizes the deferred upload session, ensuring that all files needed for the Job are uploaded, and initiates the Job itself. This process includes transitioning from the file upload phase to the document processing phase, where the files undergo classification, extraction, and indexing based on predefined workflows. The endpoint is crucial for starting the document processing task and is invoked once all necessary files are in place.
Headers
X-TagshelfAPI-Key
String
Application API Key
Authorization
String
Bearer <access_token> or amx <hmac_token>
Request Body
session_id*
UUID
The unique identifier of the deferred upload session to be closed and associated with this Job.
metadata
Object or array of Objects
This parameter accepts a JSON object that encapsulates various metadata fields for the Job.
The metadata provided here serves as a set of descriptors or attributes that apply to the Job as a whole. Once defined, this metadata is automatically propagated to each file that is part of the Job. This means that every file within this Job will inherit the specified metadata, ensuring consistency and contextual relevance across all files associated with the Job.
This feature is particularly useful for maintaining uniformity in file attributes, aiding in categorization, and enhancing the searchability and traceability of files within a Job.
merge
boolean
The merge parameter is a directive that instructs the file processor on how to handle multiple files within a Job. When set, and provided all files in the Job are either images or PDFs, this parameter signals that these files should be treated as a single unit of work. This means that instead of processing each file independently, the system combines them into a single file for the purpose of processing.
This approach is particularly beneficial when the files are parts of a larger document or dataset that need to be handled cohesively.
For example, if multiple scanned images of a document or several PDFs are part of a Job, setting merge ensures they are processed together, maintaining the continuity and integrity of the document. This results in the Job having a singular output file, despite originating from multiple input files.
decompose
boolean
The decompose parameter plays a critical role in how the file processor handles the input files. When enabled, this parameter triggers the decomposition of file inputs into multiple, distinct units of work. This is applicable in scenarios where a single file contains multiple separable components.
For example, in the case of a multi-page PDF, each page is treated as an individual file. Similarly, for an image or a PDF page containing multiple documents, each document is separated and processed independently.
This parameter is vital for tasks that require individual attention to each component of a file, such as detailed analysis, classification, or extraction of data from each part of a larger document. Decomposition enhances the granularity of processing, enabling more precise and targeted handling of each segment within a file.
propagate_metadata
boolean
This parameter enables the specification of a single metadata object to be applied across multiple files from remote URLs or remote sources. When used, propagate_metadata ensures that the defined metadata is consistently attached to all the specified files during their upload and processing. This feature is particularly useful for maintaining uniform metadata across a batch of files, streamlining data organization and retrieval.
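Putting the body parameters together, a sketch of a create call might look like the following (Python with the requests library; the session ID, API key, token, and metadata values are placeholders):

```python
import requests

BASE_URL = "https://<env>.tagshelf.com"
headers = {
    "X-TagshelfAPI-Key": "<your_application_api_key>",
    "Authorization": "Bearer <access_token>",
}

payload = {
    "session_id": "00000000-0000-0000-0000-000000000000",  # deferred upload session to close
    # Metadata applied to the Job and inherited by every file in it.
    "metadata": {"department": "accounts-payable", "batch": "2024-06"},
    # Treat all uploaded images/PDFs as a single unit of work instead of separate files.
    "merge": True,
    # Leave decomposition off so the merged file is not split back into individual pages.
    "decompose": False,
}

response = requests.post(f"{BASE_URL}/api/job/create", headers=headers, json=payload)
response.raise_for_status()
job = response.json()  # the response describes the newly created Job
```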