
n8n Node: Image to Text (Captioning)

This is a community node for n8n that generates a textual description (caption) for a given image. It uses Hugging Face's Transformers.js library to run state-of-the-art captioning models directly within your n8n workflow, without calling any external APIs.

The node can process images from a URL or from binary data passed by a previous node.
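
Under the hood, image captioning with Transformers.js amounts to an image-to-text pipeline. The following standalone TypeScript sketch illustrates the library call this kind of node builds on; it is an illustration only, not the node's actual source, and it reuses the default model and token limit described in the Parameters section below.

// Minimal sketch of captioning with Transformers.js (assumes the
// @xenova/transformers package is installed). Not the node's source code.
import { pipeline } from '@xenova/transformers';

async function caption(image: string): Promise<string> {
  // The model is downloaded and cached on first use.
  const captioner = await pipeline('image-to-text', 'Xenova/vit-gpt2-image-captioning');

  // max_new_tokens mirrors the node's documented default of 50.
  const output: any = await captioner(image, { max_new_tokens: 50 });

  // The raw output has the shape [{ generated_text: '...' }].
  return output[0].generated_text as string;
}

caption('https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png')
  .then(console.log);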

Installation

  1. Go to Settings > Community Nodes in your n8n instance.
  2. Select Install a community node.
  3. Enter n8n-nodes-rckflr-imagetotext in the search box.
  4. Click Install.

After installation, the "Image to Text (Captioning)" node will be available in the nodes panel.

Performance Note

Since this node runs the AI model locally on the machine where n8n is installed, performance varies significantly with the available hardware resources (CPU, RAM). The selected model is downloaded once, on first use, and cached for later executions, so the first run is noticeably slower; caption generation itself will also be faster on more powerful machines.

Usage

The node takes an image as input and returns the generated caption in the configured output field of the JSON output (caption by default).

Input

  • Image Input: This field accepts either:
    • A public URL of an image (e.g., https://example.com/image.jpg).
    • The name of the binary property from a previous node. For example, if a Read Binary File node outputs data in a property named data, you would use an expression and enter data in this field (see the JSON fragments below).
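
For reference, both modes are set on the same imageInput parameter in a workflow's JSON; the parameter name comes from the example workflow further down. A URL is stored as a plain string, while the binary-property form holds the property name (shown here simplified; when entered as an expression in the UI, n8n serializes the value with a leading = prefix):

"parameters": {
  "imageInput": "https://example.com/image.jpg"
}

"parameters": {
  "imageInput": "data"
}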

Parameters

  • Model: Choose the image captioning model to use.
    • ViT-GPT2 Image Captioning (Default): A robust general-purpose model.
    • BLIP Image Captioning (Base): A more modern and often more accurate model.
    • BLIP Image Captioning (Large): A larger version of BLIP for potentially better results at the cost of performance.
  • Output Caption Field: The name of the field where the generated caption will be stored. Defaults to caption.
  • Max New Tokens: (Optional) Controls the maximum length of the generated caption. Defaults to 50.
  • Include Full Output: (Optional) If enabled, includes the full, raw output from the model in a field named [Output Field Name]_full (see the combined example below).
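
Taken together, a configured node might carry parameters like the following in the workflow JSON. The model and outputFieldName keys appear in the example workflow below, while maxNewTokens and includeFullOutput are assumed camelCase names for the two optional settings and may differ in the actual node:

"parameters": {
  "imageInput": "https://example.com/image.jpg",
  "model": "Xenova/vit-gpt2-image-captioning",
  "outputFieldName": "caption",
  "maxNewTokens": 50,
  "includeFullOutput": true
}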

Output Example

If the Output Caption Field is set to caption, the output will look like this:

{
  "caption": "a cat is sitting on a couch",
  "other_input_field": "some_value"
}
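
With Include Full Output enabled, the raw model output is added alongside the caption. Transformers.js returns image-to-text results as an array of generated_text objects, so the extra field would plausibly look like this:

{
  "caption": "a cat is sitting on a couch",
  "caption_full": [
    {
      "generated_text": "a cat is sitting on a couch"
    }
  ],
  "other_input_field": "some_value"
}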

Example Workflow

Here is a basic example of a workflow that reads an image from a URL and generates a caption.

{
  "nodes": [
    {
      "parameters": {},
      "name": "Start",
      "type": "n8n-nodes-base.start",
      "typeVersion": 1,
      "position": [
        250,
        300
      ]
    },
    {
      "parameters": {
        "imageInput": "https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png",
        "model": "Xenova/vit-gpt2-image-captioning",
        "outputFieldName": "image_caption"
      },
      "name": "Image to Text",
      "type": "imageToTextCaptioning",
      "typeVersion": 1,
      "position": [
        450,
        300
      ],
      "credentials": {}
    }
  ],
  "connections": {
    "Start": {
      "main": [
        [
          {
            "node": "Image to Text",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  }
}

To use this workflow, copy the JSON and paste it into your n8n canvas.

Compatibility

  • Requires n8n version 1.0 or later.
  • Requires Node.js version 20.15 or later.

License

MIT
