DOCX to Text

Convert DOCX files to formatted text

Overview

This node converts DOCX files supplied as binary input data into plain text. It extracts the raw text content of each document and can optionally add the file's name and extension to the output. This is useful for workflows that need to process or analyze the text of Word documents without manual extraction.

Common scenarios include:

  • Extracting text from uploaded DOCX files for indexing or searching.
  • Converting DOCX attachments into plain text for further processing, such as sentiment analysis or translation.
  • Automating document content extraction in document management systems.

Properties

Name | Meaning
Binary Property | The name of the binary property containing the DOCX file to be converted.
Options | Collection of additional options:
  • Include Filename: Whether to include the original filename (without extension) and the file extension in the output.

Output

The node outputs JSON objects with the following structure:

  • text: The extracted plain text content from the DOCX file.
  • messages: Any messages or warnings generated during text extraction.
  • If the option "Include Filename" is enabled and the binary data has a filename:
    • filename: The original filename without its extension.
    • fileExtension: The file extension in lowercase.
  • The output JSON merges any existing JSON data from the input item.

No binary output is produced; the node focuses on extracting and returning textual content.
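The output structure described in this section can be sketched roughly as follows. This is an illustration only, not the node's actual source: `splitFilename` and `buildOutputJson` are hypothetical helpers, and the `Extraction` shape mirrors the `value` and `messages` fields that mammoth's `extractRawText` resolves to. Two details are assumptions: that `fileExtension` is stored without the leading dot, and that extracted fields overwrite input fields with the same name.

```typescript
// Shape of one extraction result (value = text, messages = warnings).
interface Extraction {
  value: string;
  messages: unknown[];
}

// Hypothetical helper: split "Report.DOCX" into name and lowercase extension.
function splitFilename(fileName: string): { filename: string; fileExtension: string } {
  const dot = fileName.lastIndexOf(".");
  // No dot, or only a leading dot: treat the whole string as the filename.
  if (dot <= 0) {
    return { filename: fileName, fileExtension: "" };
  }
  return {
    filename: fileName.slice(0, dot),
    // Assumption: the extension is stored without the leading dot.
    fileExtension: fileName.slice(dot + 1).toLowerCase(),
  };
}

// Hypothetical helper: build the JSON for one output item.
function buildOutputJson(
  inputJson: Record<string, unknown>,
  extraction: Extraction,
  includeFilename: boolean,
  fileName?: string,
): Record<string, unknown> {
  const output: Record<string, unknown> = {
    ...inputJson, // existing JSON data from the input item is carried over
    text: extraction.value,
    messages: extraction.messages,
  };
  // Filename fields are added only when the option is on and a filename exists.
  if (includeFilename && fileName) {
    Object.assign(output, splitFilename(fileName));
  }
  return output;
}

const example = buildOutputJson(
  { id: 7 },
  { value: "Hello world", messages: [] },
  true,
  "Quarterly.Report.DOCX",
);
console.log(example);
```

With "Include Filename" disabled, or when the binary data carries no filename, the same call returns only the merged input fields plus `text` and `messages`.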

Dependencies

  • Requires the mammoth library for DOCX text extraction.
  • Needs access to binary data input containing DOCX files.
  • No external API keys or services are required.
  • Runs entirely within the n8n environment.

Troubleshooting

  • Error: "No binary data exists on item!"
    Occurs if the input item carries no binary data at all. Ensure the preceding node outputs binary data.

  • Error: "No binary data property "" does not exists on item!"
    Happens when the input item has no binary property with the configured name. Verify that Binary Property matches the name of the property that holds the DOCX file.

  • If the node fails but "Continue On Fail" is enabled, errors will be returned as JSON error messages instead of stopping execution.

  • Make sure the binary data actually contains valid DOCX files; corrupted or unsupported files may cause extraction issues.
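The checks behind the first two errors, and the "Continue On Fail" behavior, might look roughly like the sketch below. It is not taken from the node's source: the `Item` shape is simplified, `getBinaryEntry` and `processItem` are hypothetical names, interpolating the property name into the second message is an assumption, and so is reporting the failure under an `error` key.

```typescript
// Simplified item shape for illustration; real n8n items carry more fields.
interface Item {
  json: Record<string, unknown>;
  binary?: Record<string, { data: string; fileName?: string }>;
}

// Hypothetical helper: look up the configured binary property or throw.
function getBinaryEntry(item: Item, propertyName: string) {
  if (item.binary === undefined) {
    throw new Error("No binary data exists on item!");
  }
  const entry = item.binary[propertyName];
  if (entry === undefined) {
    // Message wording follows the troubleshooting entry above, typo included.
    // Assumption: the property name is placed between the quotes.
    throw new Error(`No binary data property "${propertyName}" does not exists on item!`);
  }
  return entry;
}

// Hypothetical helper: process one item, honoring a continue-on-fail flag.
function processItem(item: Item, propertyName: string, continueOnFail: boolean): Item {
  try {
    const entry = getBinaryEntry(item, propertyName);
    // Text extraction would happen here; this sketch only records the filename.
    return { json: { ...item.json, found: entry.fileName ?? null } };
  } catch (error) {
    // Without continue-on-fail, the error stops execution.
    if (!continueOnFail) {
      throw error;
    }
    const message = error instanceof Error ? error.message : String(error);
    // Assumption: the failure is returned under an "error" key.
    return { json: { error: message } };
  }
}

// An item without binary data yields an error object instead of throwing.
console.log(processItem({ json: {} }, "data", true));
```

Checking for missing binary data before checking the property name keeps the two messages distinct, so each one points to a single cause.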
