Word

Configuration

The Microsoft Word source connector integrates with the Microsoft Graph API.
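Microsoft Graph returns collections, such as the children of a folder, in pages. Each response carries its items in a `value` array and, when more results exist, an `@odata.nextLink` URL for the next page. The sketch below shows how a client can walk those pages. The `fetch_json` callable is a hypothetical stand-in for an authenticated HTTP GET, and `/me/drive/root/children` is only an example starting point that lists direct children (folders would still need to be recursed into); this is not necessarily how Graffo enumerates documents.

```python
from typing import Any, Callable, Dict, Iterator, Optional

GRAPH_BASE_URL = "https://graph.microsoft.com/v1.0"

# Example starting point: direct children of the signed-in user's OneDrive root.
ROOT_CHILDREN_URL = f"{GRAPH_BASE_URL}/me/drive/root/children"


def iter_collection(
    fetch_json: Callable[[str], Dict[str, Any]],
    start_url: str,
) -> Iterator[Dict[str, Any]]:
    """Yield every item of a paged Graph collection by following @odata.nextLink."""
    url: Optional[str] = start_url
    while url:
        page = fetch_json(url)  # hypothetical authenticated GET returning parsed JSON
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")  # absent on the last page, which ends the loop
```

Injecting `fetch_json` keeps the paging logic independent of any particular HTTP client, which also makes it easy to exercise with canned pages.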

The connector synchronizes Word documents from Microsoft OneDrive and SharePoint. Documents are processed through Graffo's file-handling pipeline, which:

  • Downloads the .docx/.doc file

  • Converts it to Markdown for text extraction

  • Chunks the content for vector search

  • Indexes the chunks for semantic search
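For .docx files, the text-extraction step can be illustrated with the standard library alone: a .docx file is a ZIP archive whose main body lives in word/document.xml, with paragraphs in `w:p` elements and text runs in `w:t` elements. The function below is a minimal sketch, not Graffo's converter. It assumes heading paragraphs use the built-in English style IDs `Heading1` through `Heading9` (localized templates may use other IDs), does not treat tables, lists, text boxes, or formatting specially, and does not handle legacy binary .doc files at all.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def docx_to_markdown(data: bytes) -> str:
    """Tiny .docx -> Markdown-ish converter: headings and paragraph text only."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    blocks = []
    # iter() is recursive, so paragraphs inside tables come out as plain paragraphs.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        if not text.strip():
            continue  # skip empty paragraphs
        style = para.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W_NS}}}val", "") if style is not None else ""
        # Assumption: built-in style IDs "Heading1" ... "Heading9" mark headings.
        suffix = style_id[len("Heading"):]
        if style_id.startswith("Heading") and suffix.isdigit():
            level = min(int(suffix), 6)  # Markdown has six heading levels
            blocks.append("#" * level + " " + text.strip())
        else:
            blocks.append(text)
    return "\n\n".join(blocks)
```

A production converter would also need to handle run properties, numbering, tables, and embedded media; this sketch only shows where the text lives inside the archive.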

The connector handles access-token refresh and rate limiting while reading Word documents.
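One common way to combine token refresh with rate limiting is a small wrapper around each request. Microsoft Graph signals throttling with HTTP 429 (and sometimes 503) and may include a Retry-After header giving the wait in seconds. The sketch below is an illustration, not Graffo's implementation: it refreshes the token once on a 401 and otherwise honors Retry-After, falling back to exponential backoff. The `send` and `refresh_token` callables and the retry limits are hypothetical.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, body) as returned by a hypothetical `send` callable.
Response = Tuple[int, Mapping[str, str], bytes]

RETRYABLE_STATUSES = (429, 503)


def call_graph(
    send: Callable[[], Response],
    refresh_token: Callable[[], None],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Send a Graph request, refreshing the token once on 401 and backing off on 429/503.

    `send` must read the current access token on every call so that a refresh
    takes effect on the next attempt. Header lookup assumes the canonical
    "Retry-After" spelling (or a case-insensitive mapping).
    """
    refreshed = False
    attempt = 0
    while True:
        status, headers, body = send()
        if status == 401 and not refreshed:
            refresh_token()  # obtain a new access token, then retry once
            refreshed = True
            continue
        if status in RETRYABLE_STATUSES and attempt < max_retries:
            retry_after = headers.get("Retry-After", "")
            # Prefer the server's Retry-After (seconds); otherwise back off exponentially.
            delay = float(retry_after) if retry_after.isdigit() else base_delay * (2 ** attempt)
            sleep(delay)
            attempt += 1
            continue
        return status, headers, body
```

Passing `sleep` as a parameter lets tests record the delays instead of actually waiting.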

Source Code: View on GitHub

Authentication

This connector uses OAuth 2.0 authentication. You can connect through the Graffo UI or API using the OAuth flow.

Supported authentication methods:

  • OAuth Browser Flow (recommended for UI)

  • OAuth Token (for programmatic access)

  • Auth Provider (enterprise SSO)
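In each case, calls to Microsoft Graph are made with an access token that expires and must be renewed. As a rough sketch of what a renewal involves on the Microsoft identity platform (not Graffo's internal code), the request below is a standard OAuth 2.0 refresh_token grant against the v2.0 token endpoint. The scope list (Files.Read.All, offline_access), the five-minute safety margin, and the helper names are assumptions for illustration; the client_secret field applies only to confidential clients.

```python
import time
from typing import Iterable, Optional, Tuple
from urllib.parse import urlencode

# Microsoft identity platform v2.0 token endpoint; tenant may be a tenant ID,
# a verified domain, "common", or "organizations".
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"


def build_refresh_request(
    tenant: str,
    client_id: str,
    client_secret: str,
    refresh_token: str,
    scopes: Iterable[str],
) -> Tuple[str, str]:
    """Return (url, form_body) for a refresh_token grant.

    The caller POSTs the body with Content-Type: application/x-www-form-urlencoded.
    """
    url = TOKEN_URL_TEMPLATE.format(tenant=tenant)
    body = urlencode(
        {
            "client_id": client_id,
            "client_secret": client_secret,  # confidential clients only
            "grant_type": "refresh_token",
            "refresh_token": refresh_token,
            "scope": " ".join(scopes),  # OAuth scopes are space-separated
        }
    )
    return url, body


def needs_refresh(expires_at: float, now: Optional[float] = None, skew_seconds: float = 300.0) -> bool:
    """True when the access token expires within `skew_seconds`.

    `expires_at` is a Unix timestamp, e.g. issue time + the `expires_in` value.
    """
    if now is None:
        now = time.time()
    return now >= expires_at - skew_seconds
```

Refreshing slightly before expiry avoids sending a request with a token that lapses in flight.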

Configuration Options

This connector does not have any additional configuration options.

Data Models

The following data models are available for this connector:

WordDocumentEntity

Schema for a Microsoft Word document as a file entity.

Represents Word documents (.docx, .doc) stored in OneDrive or SharePoint. It extends FileEntity to use Graffo's file-processing pipeline, which will:

  • Download the Word document

  • Convert it to Markdown using document converters

  • Chunk the content for indexing

Reference: https://learn.microsoft.com/en-us/graph/api/resources/driveitem
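A driveItem represents both files and folders: files carry a `file` facet with a `mimeType`, while folders carry a `folder` facet instead. A minimal sketch of picking Word documents out of a listing, assuming selection by the .docx/.doc extension or the corresponding MIME types (Graffo's actual selection rule is not documented here), might look like this; the function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List

WORD_EXTENSIONS = (".docx", ".doc")
WORD_MIME_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",  # .docx
    "application/msword",  # legacy .doc
}


def is_word_document(item: Dict[str, Any]) -> bool:
    """Return True for driveItems that look like Word documents."""
    if "file" not in item:  # folders (and other non-file items) have no file facet
        return False
    mime = item["file"].get("mimeType", "")
    name = item.get("name", "").lower()
    return mime in WORD_MIME_TYPES or name.endswith(WORD_EXTENSIONS)


def word_documents(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the Word documents from a list of driveItem dicts."""
    return [item for item in items if is_word_document(item)]
```

Checking both the extension and the MIME type tolerates items whose name lacks an extension or whose reported MIME type is missing.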

Field                   Type                      Description
id                      str                       Drive item ID for the Word document.
title                   str                       Human-readable title for the document.
created_datetime        Optional[datetime]        When the document was created.
last_modified_datetime  Optional[datetime]        When the document was last modified.
web_url_override        Optional[str]             URL to open the document in Word Online.
content_download_url    Optional[str]             Direct download URL for the document content.
created_by              Optional[Dict[str, Any]]  Identity of the user who created the document.
last_modified_by        Optional[Dict[str, Any]]  Identity of the user who last modified the document.
parent_reference        Optional[Dict[str, Any]]  Information about the parent folder and drive location.
drive_id                Optional[str]             ID of the drive containing this document.
folder_path             Optional[str]             Full path to the parent folder.
description             Optional[str]             Description of the document, if available.
shared                  Optional[Dict[str, Any]]  Information about the sharing status of the document.
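The fields above line up closely with properties of the Graph driveItem resource (createdDateTime, webUrl, parentReference, the @microsoft.graph.downloadUrl annotation, and so on). The sketch below maps a driveItem JSON object onto a plain dataclass with the same field names. The class name WordDocumentEntitySketch and the function from_drive_item are hypothetical, the real entity extends FileEntity and may populate fields differently, and using the driveItem name as the title is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class WordDocumentEntitySketch:
    """Hypothetical stand-in mirroring the WordDocumentEntity fields listed above."""

    id: str
    title: str
    created_datetime: Optional[datetime] = None
    last_modified_datetime: Optional[datetime] = None
    web_url_override: Optional[str] = None
    content_download_url: Optional[str] = None
    created_by: Optional[Dict[str, Any]] = None
    last_modified_by: Optional[Dict[str, Any]] = None
    parent_reference: Optional[Dict[str, Any]] = None
    drive_id: Optional[str] = None
    folder_path: Optional[str] = None
    description: Optional[str] = None
    shared: Optional[Dict[str, Any]] = None


def _parse_graph_datetime(value: Optional[str]) -> Optional[datetime]:
    """Parse Graph UTC timestamps such as '2024-01-02T03:04:05Z'.

    Replacing 'Z' keeps this working before Python 3.11, whose fromisoformat()
    rejects the 'Z' suffix (and expects 3 or 6 fractional-second digits).
    """
    if not value:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def from_drive_item(item: Dict[str, Any]) -> WordDocumentEntitySketch:
    """Map a driveItem JSON dict onto the sketch entity."""
    parent = item.get("parentReference") or {}
    return WordDocumentEntitySketch(
        id=item["id"],
        title=item.get("name", ""),  # assumption: the file name serves as the title
        created_datetime=_parse_graph_datetime(item.get("createdDateTime")),
        last_modified_datetime=_parse_graph_datetime(item.get("lastModifiedDateTime")),
        web_url_override=item.get("webUrl"),
        content_download_url=item.get("@microsoft.graph.downloadUrl"),  # short-lived URL
        created_by=item.get("createdBy"),
        last_modified_by=item.get("lastModifiedBy"),
        parent_reference=item.get("parentReference"),
        drive_id=parent.get("driveId"),
        folder_path=parent.get("path"),
        description=item.get("description"),
        shared=item.get("shared"),
    )
```

Because the download URL is only valid for a short time, a pipeline would typically fetch the content soon after listing rather than storing the URL for later.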
