File

In Alfred, a file is an individual document or data unit undergoing specialized operations tailored for document analysis and management. It is processed by the platform for tasks such as classification, optical character recognition (OCR), and data extraction. Each file, as a distinct entity, transitions through various statuses, which mark its progress in the automated workflow.

For example, a file could be a digital version of a business contract, initially uploaded for processing, then analyzed for key content, classified according to its nature, and finally integrated into a database for efficient access and utilization. This process underscores the file's role in Alfred's robust document handling ecosystem.

Roles and Responsibilities

  • Data Carrier and Information Source: At its core, a file in Alfred serves as a carrier of data and a source of information. From the moment it is uploaded, through various processing stages, to its final state, it retains and conveys valuable information that is key to document processing tasks.

  • Subject of Automated Processes: Throughout its lifecycle, the file acts as the subject of various automated processes like OCR, classification, extraction, and indexing. Its role is to be efficiently processed by these automated systems, ensuring accurate and effective handling of the document's content.

  • Facilitator of Workflow Progression: As the file moves through different statuses, it facilitates the progression of the overall workflow in Alfred. Each change in status reflects a step forward in the document processing pipeline, contributing to the workflow's momentum.

  • Intermediary for User Interaction: In stages requiring user input or confirmation, the file acts as an intermediary between the automated system and the user. It presents processed information for user review, validation, or further input, bridging the gap between automation and human oversight.

  • Indicator of System Health and Efficiency: The file's progression through various statuses also serves as an indicator of the system's health and efficiency. Stages like error handling or re-queuing provide insights into potential issues or bottlenecks in the workflow, prompting necessary adjustments or improvements.

  • End-Point of Processing and Analysis: Finally, in its concluded states, whether successfully processed, tagged, or in error and deletion stages, the file represents the end-point of document processing and analysis. It signifies the completion of a cycle within the Alfred platform, either culminating in successful integration into the system or identifying areas for system refinement.

File Retrieval EndPoints

Get all files from your company

GET https://tagshelf.host/api/file/all

Get a list of all files within your company context.

Headers

Name | Type | Description
X-TagshelfAPI-Key | String | Application API Key
Authorization | String | Bearer <access_token> or amx <hmac_token>

[
    {
        "id": "eada91d5-ea19-4a80-afbe-166c12649d52",
        "creation_date": "2020-08-12T00:00:00.000",
        "update_date": "2020-08-12T00:00:00.000",
        "file_name": "SAMPLE.pdf",
        "file_name_without_extension": "SAMPLE",
        "blob_name": "Tag/SAMPLE.pdf",
        "blob_url": "https://demo.bucket.com/Tag/SAMPLE.pdf",
        "user_name": null,
        "md5_hash": "KTuG8VB6fviFQF6sshyGvQ==",
        "content_type": "application/pdf",
        "channel": "web",
        "should_be_classified": false,
        "classifier": "bayes",
        "classification_score": 0.98,
        "status": "waiting_to_be_sent_to_third_party",
        "is_duplicate": false,
        "duplicate_origin_id": null,
        "tag_id": "49973d90-03c3-4f2a-b556-fd1a2f5635b9",
        "is_parent": false,
        "parent_id": null,
        "deferred_session_id": null,
        "tag_name": "Sale Invoices",
        "company_id": "c46109f2-a507-47da-ae05-f839be543855",
        "file_size": 222222,
        "proposed_tag_id": "1c40a594-4b0f-4f8c-b534-73f64e61b2ea",
        "proposed_tag_variance": 0.000632407575208874,
        "classification_score_above_deviation": false,
        "confirmed_tag_id": "1c40a594-4b0f-4f8c-b534-73f64e61b2ea",
        "confirmed_by": "[email protected]",
        "manual_classification": false
    },
    ...
]
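
As a rough illustration, the request above could be assembled with Python's standard library as follows. This is a minimal sketch, not an official client: the helper names `build_headers` and `list_files_request` are hypothetical, the host is the one written on this page, and only the Bearer form of the Authorization header is shown.

```python
import urllib.request

BASE_URL = "https://tagshelf.host"  # host exactly as written on this page


def build_headers(api_key: str, access_token: str) -> dict:
    """Return the two documented headers, using the Bearer form of Authorization."""
    return {
        "X-TagshelfAPI-Key": api_key,
        "Authorization": f"Bearer {access_token}",
    }


def list_files_request(api_key: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the GET /api/file/all request."""
    return urllib.request.Request(
        f"{BASE_URL}/api/file/all",
        headers=build_headers(api_key, access_token),
        method="GET",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`; the result should be a list of file objects shaped like the sample above.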

Get file details by ID

GET https://tagshelf.host/api/file/detail/:id

Get all file details by File ID.

Path Parameters

Name | Type | Description
id* | String | File ID

Headers

Name | Type | Description
X-TagshelfAPI-Key | String | Application API Key
Authorization | String | Bearer <access_token> or amx <hmac_token>

{
  "id": "eada91d5-ea19-4a80-afbe-166c12649d52",
  "creation_date": "2020-08-12T00:00:00.000",
  "update_date": "2020-08-12T00:00:00.000",
  "file_name": "SAMPLE.pdf",
  "file_name_without_extension": "SAMPLE",
  "blob_name": "Tag/SAMPLE.pdf",
  "blob_url": "https://demo.bucket.com/Tag/SAMPLE.pdf",
  "user_name": null,
  "md5_hash": "KTuG8VB6fviFQF6sshyGvQ==",
  "content_type": "application/pdf",
  "channel": "web",
  "should_be_classified": false,
  "classifier": "bayes",
  "classification_score": 0.98,
  "status": "waiting_to_be_sent_to_third_party",
  "is_duplicate": false,
  "duplicate_origin_id": null,
  "tag_id": "49973d90-03c3-4f2a-b556-fd1a2f5635b9",
  "is_parent": false,
  "parent_id": null,
  "deferred_session_id": null,
  "tag_name": "Sale Invoices",
  "company_id": "c46109f2-a507-47da-ae05-f839be543855",
  "file_size": 222222,
  "proposed_tag_id": "1c40a594-4b0f-4f8c-b534-73f64e61b2ea",
  "proposed_tag_variance": 0.000632407575208874,
  "classification_score_above_deviation": false,
  "confirmed_tag_id": "1c40a594-4b0f-4f8c-b534-73f64e61b2ea",
  "confirmed_by": "[email protected]",
  "manual_classification": false
}
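
The detail response carries many fields, and a caller usually needs only a few of them. The sketch below builds the detail URL and reduces a response body to a small summary. The names `detail_url`, `FileSummary`, and `summarize` are hypothetical, and the field selection is just one plausible choice based on the sample above.

```python
import json
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://tagshelf.host"  # host exactly as written on this page


def detail_url(file_id: str) -> str:
    """Substitute the File ID for the :id path parameter."""
    return f"{BASE_URL}/api/file/detail/{quote(file_id, safe='')}"


@dataclass
class FileSummary:
    id: str
    file_name: str
    status: str
    tag_name: Optional[str]
    classification_score: Optional[float]


def summarize(detail_json: str) -> FileSummary:
    """Pick a handful of fields out of a /api/file/detail response body."""
    data = json.loads(detail_json)
    return FileSummary(
        id=data["id"],
        file_name=data["file_name"],
        status=data["status"],
        # Nullable fields are read with .get so a missing key becomes None.
        tag_name=data.get("tag_name"),
        classification_score=data.get("classification_score"),
    )
```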

Get file events by ID

GET https://tagshelf.host/api/file/events/:id

Get a list of events triggered on the specified File.

Path Parameters

Name | Type | Description
id* | String | File ID

Headers

Name | Type | Description
X-TagshelfAPI-Key | String | Application API Key
Authorization | String | Bearer <access_token> or amx <hmac_token>

[
  {
    "id": "de0e8920-bc18-405d-89c3-4e741bd2d7a0",
    "creation_date": "2022-11-17T00:00:00.000",
    "update_date": "2022-11-17T00:00:00.000",
    "file_id": "9fb3b05e-36f2-4528-a7fc-67d13d3c4e83",
    "event_name": "alfred_event_file_done",
    "event_envelope": "{}",
    "sent_to_event_broker": true,
    "sent_to_web_hook": true
  },
  ...
]
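
In the sample above, `event_envelope` is a JSON document serialized as a string ("{}"), so it needs a second decoding step before its contents can be inspected. The following sketch shows one way to do that. The helper names `decode_events` and `has_event` are hypothetical, and treating an empty or missing envelope as an empty object is an assumption.

```python
import json
from typing import Any, Dict, List


def decode_events(body: str) -> List[Dict[str, Any]]:
    """Parse an events response and decode each string-encoded envelope."""
    events = json.loads(body)
    for event in events:
        envelope = event.get("event_envelope")
        # Assumption: an empty or missing envelope is treated as {}.
        event["event_envelope"] = json.loads(envelope) if envelope else {}
    return events


def has_event(events: List[Dict[str, Any]], name: str) -> bool:
    """Return True if any event in the list carries the given event_name."""
    return any(event.get("event_name") == name for event in events)
```

For instance, `has_event(events, "alfred_event_file_done")` checks for the event name shown in the sample response.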

Get file versions by ID

GET https://tagshelf.host/api/file/versions/:id

Get a list of versions from the specified File.

Path Parameters

Name | Type | Description
id* | String | File ID

Headers

Name | Type | Description
X-TagshelfAPI-Key | String | Application API Key
Authorization | String | Bearer <access_token> or amx <hmac_token>

[
    {
        "id": "d854a6d3-652e-4d00-9cd9-563b94e0c6ab",
        "update_date": "2020-08-12T00:00:00.000",
        "creation_date": "2020-08-12T00:00:00.000",
        "file_log_id": "9ddb12be-2375-4891-9edc-ec457adc252c",
        "content": ""
    },
    ...
]
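
This page does not state the order in which versions are returned, so a caller that wants the most recent one may prefer to select it explicitly. The sketch below does so by comparing `creation_date` values. The function name `latest_version` is hypothetical, and the code assumes the timestamps keep the ISO 8601 shape shown in the sample.

```python
import json
from datetime import datetime
from typing import Any, Dict


def latest_version(body: str) -> Dict[str, Any]:
    """Return the version entry with the most recent creation_date."""
    versions = json.loads(body)
    if not versions:
        raise ValueError("the response contained no versions")
    # Timestamps such as "2020-08-12T00:00:00.000" parse with fromisoformat.
    return max(versions, key=lambda v: datetime.fromisoformat(v["creation_date"]))
```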

Download file by ID

GET https://tagshelf.host/api/file/download/:id

Returns a byte stream representing the data for a given file.

Headers

Name | Type | Description
X-TagshelfAPI-Key | String | Application API Key
Authorization | String | Bearer <access_token> or amx <hmac_token>

Return the File as a stream.
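
Because the response is a byte stream, it can be copied to disk in chunks rather than read into memory at once. The helper below is a minimal sketch of that idea. The name `save_stream` and the 64 KiB chunk size are arbitrary choices for illustration, not part of the API.

```python
from typing import BinaryIO


def save_stream(source: BinaryIO, destination: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy source to destination in fixed-size chunks and return the byte count."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream is exhausted
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

With the standard library, `source` could be the response object returned by `urllib.request.urlopen` for the download URL and `destination` a file opened in "wb" mode.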

Share file by ID

POST https://tagshelf.host/api/file/share/:id

Generate a shareable link based on your desired configuration on expiration time, URL format (short or long), etc.

Headers

Name | Type | Description
Content-Type | String | application/json
Authorization | String | Bearer <access_token> or amx <hmac_token>
X-TagshelfAPI-Key | String | Application API Key

Request Body

Name | Type | Description
length | Integer | Shareable link duration in seconds. Once this time has elapsed, the link no longer works.
short_url | Boolean | Whether to use a short link instead of the original link.
always_use_last_version | Boolean | Whether the link should always use the latest version of the File, or only the last version available when the shareable link was created.

{
    "file_id": "6e5ef3ed-71c5-442d-97b2-d4992edb14ce",
    "share_url": "https://staging.tagshelf.com//api/file/public/01debd6b-31d8-4a95-9b83-9d0d5e2f8ed6"
}
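
A request for a shareable link could be put together along these lines. This is a sketch under assumptions: the helper name `share_request` is hypothetical, the Bearer form of Authorization is used, and rejecting a non-positive `length` on the client side is a choice made here rather than a documented rule.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://tagshelf.host"  # host exactly as written on this page


def share_request(
    file_id: str,
    api_key: str,
    access_token: str,
    length_seconds: int,
    short_url: bool = False,
    always_use_last_version: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) the POST /api/file/share/:id request."""
    if length_seconds <= 0:
        # Assumption: a link that never becomes valid is not useful.
        raise ValueError("length_seconds must be positive")
    body = {
        "length": length_seconds,
        "short_url": short_url,
        "always_use_last_version": always_use_last_version,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/file/share/{quote(file_id, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
            "X-TagshelfAPI-Key": api_key,
        },
        method="POST",
    )
```

A link valid for one hour would use `length_seconds=3600`; the `share_url` field of the response then holds the link to hand out.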

File Upload EndPoints

Upload Files from Remote Sources

POST https://tagshelf.host/api/file/upload

This endpoint facilitates the uploading of files from remote sources, accommodating various types of external storage like URLs, or Blob storage from cloud providers including AWS, GCP, or DigitalOcean.

This endpoint is specifically tailored for integrating files stored in diverse remote locations into the Alfred platform for processing.

In Alfred, uploading a file via this endpoint contributes to the initiation of a Job, which encompasses a series of processes over several files. Each file uploaded becomes a processing unit within this Job. The individual processing of each file, involving tasks like classification, OCR, and data extraction, starts only after the Job is initiated. The Job is considered complete when all its associated files have finished their respective processing cycles.

This design ensures that while each file undergoes its unique processing workflow, it remains a cohesive part of the larger Job, aligning with Alfred's approach to handling multiple files in a coordinated and efficient manner.

Headers

Name | Type | Description
Content-Type | string | application/json
Authorization | string | Bearer <access_token> or amx <hmac_token>
X-TagshelfAPI-Key | string | Application API Key

Request Body

Name | Type | Description
url or urls | string or array of strings | Use url when you have a URL to a single remote file. Use urls when you have URLs for multiple remote files. The current limit for urls is 100 elements.
source | string | Configured object storage source name. Ideal for referring to files hosted in existing cloud containers. When used, file_name and container are required.
container | string | Virtual container where the referenced remote file is located. When used, source and file_name are required.
file_name or file_names | string or array of strings | Unique identifier or identifiers of the file or files within an object storage source. When used, source and container are required.
merge | boolean | When set to true, merges all of the remote files into a single PDF file. All of the remote files MUST be images. Defaults to false.
metadata | string | JSON object or JSON array of objects containing metadata fields for the remote files. When merge is false: with urls, a JSON array of objects whose length matches the urls array; with url, a single JSON object. When merge is true: a single JSON object.
priority | string | Sets the Job's priority in the processing queue. Valid values are low, normal, or high.
propagate_metadata | boolean | Allows a single metadata object to be applied to all of the files specified by remote URLs or remote sources, so that the same metadata is attached to every file in the batch during upload and processing.
parent_file_prefix | string | Virtual folder destination for the uploaded files, used instead of the default 'Inbox' folder. Lets users organize files into specific virtual directories.
page_rotation | integer | Manual rotation, in degrees, applied to the remote files on upload so that pages are correctly oriented for tasks like OCR and document analysis.

{
  "job_id": "559167cb-4c76-4e28-bf15-34cbf614119c"
}
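
The request body rules above (the 100-element limit on urls, the metadata shape that depends on merge, and the allowed priority values) can be checked on the client before sending. The sketch below covers only the URL-based fields and leaves out source, container, and propagate_metadata. The function name `build_upload_payload` is hypothetical, and serializing metadata with `json.dumps` is an assumption based on the table typing that field as a string.

```python
import json
from typing import Any, Dict, List, Optional, Union

MAX_URLS = 100  # limit stated for the urls parameter
PRIORITIES = {"low", "normal", "high"}

Metadata = Union[Dict[str, Any], List[Dict[str, Any]]]


def build_upload_payload(
    urls: List[str],
    metadata: Optional[Metadata] = None,
    merge: bool = False,
    priority: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented rules and return the JSON body as a dict."""
    if not urls:
        raise ValueError("at least one URL is required")
    if len(urls) > MAX_URLS:
        raise ValueError(f"urls accepts at most {MAX_URLS} elements")
    if priority is not None and priority not in PRIORITIES:
        raise ValueError("priority must be low, normal, or high")

    payload: Dict[str, Any] = {}
    if len(urls) == 1:
        payload["url"] = urls[0]
    else:
        payload["urls"] = list(urls)
    if merge:
        payload["merge"] = True
    if priority is not None:
        payload["priority"] = priority

    if metadata is not None:
        if merge or len(urls) == 1:
            # A merged upload or a single url takes one metadata object.
            if not isinstance(metadata, dict):
                raise ValueError("metadata must be a single object here")
        elif not isinstance(metadata, list) or len(metadata) != len(urls):
            raise ValueError("metadata must be a list matching urls in length")
        # Assumption: metadata travels as a JSON string (typed "string" above).
        payload["metadata"] = json.dumps(metadata)
    return payload
```

Failing early on these checks avoids spending a request on a body the endpoint would presumably reject.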

Upload file by stream

POST https://tagshelf.host/api/file/uploadfile

Uploads a single file to the system's pipeline. A successful response is returned as JSON and contains the file's unique identifier for the submitted workflow. If the request fails, the response contains a message that helps explain what went wrong. File processing is asynchronous, so a polling policy (within the API rate limits) should be implemented that queries the /api/file/detail endpoint and evaluates the file status in its response body.

Anatomy of the Request

Headers

Name | Type | Description
Content-Type | string | multipart/form-data
Authorization | string | Bearer <access_token> or amx <hmac_token>
X-TagshelfAPI-Key | string | Application API Key

Request Body

Name | Type | Description
metadata | string | String representing a JSON object or JSON array of objects containing metadata fields for the given file.
session_id | string | Session ID to link multiple files to a job.
file* | file or string | Either a local file object or a base64-encoded string of the file.

{
  "file_id": "559167cb-4c76-4e28-bf15-34cbf614119c"
}
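
Since processing is asynchronous, the returned file_id is typically followed by a polling loop against /api/file/detail. The sketch below shows one possible shape for such a loop. The name `poll_file` is hypothetical, the caller supplies both the function that fetches the detail and the predicate that decides when a status is final (this page does not list the final statuses), and the interval and attempt limit are placeholder values to be tuned against the API rate limits.

```python
import time
from typing import Any, Callable, Dict


def poll_file(
    file_id: str,
    fetch_detail: Callable[[str], Dict[str, Any]],
    is_done: Callable[[str], bool],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Query the file detail until is_done(status) is true or attempts run out."""
    for attempt in range(max_attempts):
        detail = fetch_detail(file_id)
        if is_done(detail["status"]):
            return detail
        # Wait between attempts, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"file {file_id} did not finish after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays; in production the default `time.sleep` applies.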
