OpenText File Content Extraction (IDOL)

Identify, extract, and transform content with file extraction software

How complete is your file content extraction software?

Uniform and consistent access to content and unstructured data is critical for today’s AI and analytics workflows. File content extraction identifies file formats and extracts their contents, making that data available to downstream applications.

OpenText™ File Content Extraction (IDOL), part of the overall OpenText Knowledge Management solution, provides file format detection, text extraction, decryption, subfile processing and decompression, non-native rendering, and structured export. It understands over 2,200 file formats without the need for the originating software.

Why OpenText File Content Extraction?

Unleash the power of your content with an AI-driven solution that can identify, extract, and transform over 2,200 file formats; streamline content access; and ensure compliance—unlocking insights for smarter decisions.

  • 2,200+
    Content types
    Reach your content, whatever it is.
  • Exhaustive
    Extract office documents, compressed archives, and more
    Access the content of nearly any file, including legacy formats: word processing documents, spreadsheets, slides, CAD files, zip archives, and password-protected files.
  • Modular
    Integrate with any existing architecture
    Extend the functionality of current applications and workflows by deploying the modular service as part of existing architecture.

Use cases

Get more out of your data with accurate file format identification, content decryption, text extraction, subfile processing, non-native rendering, and structured export.

  • Add deep content visibility to your service or application quickly, reliably, and without the need for ongoing development. A ready-to-go SDK, complete with sample code, accelerates your product’s time-to-market and frees your engineering team to spend their time on higher-value work.

  • Support a wide range of applications, formats, and languages, enabling your organization to work across geographies, industries, and business types. Continual updates make sure you’re always on top of changes and additions.

  • Get the greatest visibility into your data with file extraction software that captures metadata, textual data, hidden data (such as tracked changes, cached content, and accessibility data), embedded subfiles, and more.

  • Maximize throughput, minimize latency, reduce CPU cost, decrease install size, and optimize memory footprint. OpenText File Content Extraction is designed to deliver ideal performance.

    Key features

    Transform customer experience with accurate file format identification, content decryption, text extraction, subfile processing, non-native rendering, and structured export, plus support for 2,200+ formats across all major client and server-side platforms.

    File format detection

    Reduces the risk of misprocessing crucial information or wasting valuable CPU time on irrelevant files by quickly and accurately identifying file types.
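
As a rough illustration of the general idea behind content-based format detection, the sketch below checks a file's leading bytes against a few well-known signatures. It is not the product's detection logic or API: the `SIGNATURES` table and the `detect_format` function are hypothetical names invented for this example, and a real detector covers far more formats and looks deeper than the first few bytes.

```python
# Illustrative only: a tiny signature table, not the product's detection logic.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also the container for DOCX, XLSX, and PPTX
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy .doc, .xls, .ppt
]


def detect_format(data: bytes) -> str:
    """Return a format label based on the leading bytes, ignoring the file name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"
```

Because the check ignores the file extension, a renamed file is still classified by what it actually contains, which is what lets a pipeline skip irrelevant files early.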

    Rights management

    Identifies rights-management protected files from Microsoft, Seclore, and SmartCipher.

    Metadata access

    Quickly accesses file metadata such as XMP, XrML, IPTC, EXIF, Boldon-James classification, and format-specific fields.

    Character set conversion

    Converts extracted text to the encoding that downstream processes expect, which is usually UTF-8. Automatically determines the character set used within a document, even if it’s not specified in the metadata.
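
To make the idea of normalizing text to UTF-8 concrete, here is a minimal sketch that looks for a byte order mark and otherwise tries a short list of candidate encodings. It is an assumption-laden simplification, not how the product determines character sets: the `to_utf8` function and the fallback list are hypothetical, and real detection relies on statistical analysis rather than trial decoding.

```python
import codecs

# Byte order marks that identify an encoding unambiguously in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def to_utf8(raw: bytes, fallbacks=("utf-8", "cp1252")) -> bytes:
    """Decode raw bytes with a best-guess character set and re-encode as UTF-8."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            # These codecs consume the BOM while decoding.
            return raw.decode(name).encode("utf-8")
    for name in fallbacks:
        try:
            return raw.decode(name).encode("utf-8")
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this last resort never fails.
    return raw.decode("latin-1").encode("utf-8")
```

For example, the Windows-1252 bytes `b"caf\xe9"` are not valid UTF-8, so the sketch falls through to `cp1252` and re-encodes the result as UTF-8.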

    Text extraction

    Quickly extracts plain text content by removing format scaffolding and other noise. Goes deep into a wide variety of document formats, extracting body text and other visible components.
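
As a small example of what removing format scaffolding means in practice, the sketch below strips markup from an HTML document and keeps only the visible text, using Python's standard `html.parser` module. HTML is chosen here only because it is easy to show; the `TextOnly` class and `extract_text` function are hypothetical names for this example and are unrelated to the product's SDK.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect visible text while skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as a single string."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Binary office formats need far more involved parsing than this, but the goal is the same: keep what a reader would see and drop the markup around it.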

    HTML and PDF export

    Previews documents in high-fidelity HTML so documents can be viewed even without the appropriate plug-in or native application. Archives files in PDF format so that document content is preserved in a fixed form.


    Accelerate the value of OpenText File Content Extraction

    Professional Services

    OpenText Professional Services combines end-to-end solution implementation with comprehensive technology services to help improve systems.

    Partners

    OpenText helps customers find the right solution, the right support, and the right outcome.

    Communities

    Explore our OpenText communities. Connect with individuals and companies to get insight and support. Get involved in the discussion.

    Premium Support

    Optimize the value of your OpenText solution with dedicated experts who provide mission-critical support for your complex IT environment.

    OpenText File Content Extraction resources

    Censornet added value to its cybersecurity solution

    Digital Guardian enhanced data security and control

    TELUS enabled fast, search-box access to 6 million service addresses

    OpenText File Content Extraction

    Read the data sheet

    OpenText File Content Extraction

    Read the product overview

    • OpenText File Content Extraction unlocks hidden value from text, metadata, and subfiles in 2,200+ file formats. It reduces manual processing time to free your team for higher-value tasks, and it identifies sensitive data, such as PII, with precision, helping you stay ahead of regulatory requirements.

    • More than just a file reader, it’s an enterprise-grade powerhouse that supports 2,200+ file formats, extracts hidden text and metadata, and offers flexible output options. With its ability to decrypt protected files and handle complex containers, it delivers unmatched versatility and accuracy.

    • OpenText File Content Extraction is ideal for software developers, OEMs, and enterprises across industries. Whether you’re building a security solution, enhancing a search platform, or managing legacy archives, it empowers you to process and leverage data effortlessly.

    • OpenText File Content Extraction detects and processes over 2,200 unique file formats, from everyday files like PDFs and Word docs to niche formats like CAD drawings or legacy archives. With continuous updates, it stays ahead of the ever-evolving file format landscape.

    • It can also handle rights-protected files. It includes tools like Panopticon to decrypt files protected by Microsoft Azure Information Protection (AIP) or Rights Management Services (RMS), ensuring you can access and process the original, unencrypted content securely.

    • It extracts:

      • Visible text: What users see in documents.
      • Hidden text: Comments, tracked changes, or accessibility text in PDFs.
      • Metadata: Author details, creation dates, security classifications, and more.
      • Subfiles: Embedded content in archives, emails, or documents, such as images or attachments.
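
Subfile processing is recursive by nature: a container can hold another container. The sketch below shows the idea for zip archives only, using Python's standard `zipfile` module. The `iter_subfiles` function and its `max_depth` guard are assumptions made for this example, not part of the product; emails, office documents, and other containers each need their own handling.

```python
import io
import zipfile


def iter_subfiles(data: bytes, path: str = "", max_depth: int = 5):
    """Yield (path, bytes) for every leaf file, descending into nested zip archives."""
    if max_depth > 0 and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if info.is_dir():
                    continue
                child = archive.read(info)
                # Recurse so that an archive inside an archive is also expanded.
                yield from iter_subfiles(child, f"{path}/{info.filename}", max_depth - 1)
    else:
        yield path, data
```

The depth limit is a simple guard against deeply nested or maliciously crafted archives; each yielded leaf could then go through format detection and text extraction in turn.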

    • OpenText File Content Extraction transforms extracted content into usable formats:

      • HTML: For web viewing or embedding in apps.
      • XML: Structured data for indexing or parsing.
      • PDF: High-fidelity versions for easy sharing or archiving.

    • You can embed it in your own product. OpenText File Content Extraction and additional SDKs and services are available as OpenText OEM solutions. Add high-performance file processing capabilities directly to your application.

      For more information, please visit our OEM Marketplace.

      July 2, 2025

      What’s new in OpenText™ Knowledge Discovery

      See what is new in OpenText Knowledge Discovery.

      Read the blog
      March 7, 2025

      AI-first government productivity and efficiency

      Build an AI strategy for government use cases with a content-focused knowledge management approach.

      Read the blog

      Take the next step

      Discover how you can reach all your content.

      Reach out for a demo