Word
Configuration
The Microsoft Word source connector integrates with the Microsoft Graph API.
It synchronizes Word documents from Microsoft OneDrive and SharePoint. Documents are processed through Graffo's file handling pipeline, which:
Downloads the .docx/.doc file
Converts to markdown for text extraction
Chunks content for vector search
Indexes for semantic search
The connector handles token refresh and rate limiting when it accesses Word documents.
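The rate limiting mentioned above can be illustrated with a small retry helper. Microsoft Graph signals throttling with HTTP 429 and usually a Retry-After header. This is a minimal sketch, not Graffo's implementation: `call_with_retry`, the `send` callable, and the `(status, headers, body)` tuple shape are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, List, Mapping, Tuple

# Hypothetical response shape for this sketch: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]

RETRYABLE_STATUSES = (429, 503)


def backoff_delay(headers: Mapping[str, str], attempt: int, base_delay: float) -> float:
    """Prefer the server's Retry-After seconds; otherwise back off exponentially."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # Retry-After may also be an HTTP date; fall through to backoff.
    return base_delay * (2 ** attempt)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-throttled response or attempts run out."""
    response = send()
    for attempt in range(max_attempts - 1):
        if response[0] not in RETRYABLE_STATUSES:
            break
        sleep(backoff_delay(response[1], attempt, base_delay))
        response = send()
    return response
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; a caller would normally leave the default.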
Source Code: View on GitHub
Authentication
This connector uses OAuth 2.0 authentication. You can connect through the Graffo UI or API using the OAuth flow.
Supported authentication methods:
OAuth Browser Flow (recommended for UI)
OAuth Token (for programmatic access)
Auth Provider (enterprise SSO)
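For the OAuth Token method, an access token is typically sent to Microsoft Graph as a bearer token. The sketch below only builds the request and does not send it. The helper name `build_graph_request` and the default path are assumptions for illustration; how Graffo itself stores or forwards the token is not described here.

```python
import urllib.request

GRAPH_BASE_URL = "https://graph.microsoft.com/v1.0"


def build_graph_request(
    access_token: str, path: str = "/me/drive/root/children"
) -> urllib.request.Request:
    """Build (but do not send) a Graph request authenticated with a bearer token."""
    return urllib.request.Request(
        GRAPH_BASE_URL + path,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )
```

A caller would pass the result to `urllib.request.urlopen` with a valid token to list the items in the root of the signed-in user's OneDrive.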
Configuration Options
This connector does not have any additional configuration options.
Data Models
The following data models are available for this connector:
WordDocumentEntity
Schema for a Microsoft Word document as a file entity.
Represents Word documents (.docx, .doc) stored in OneDrive/SharePoint. Extends FileEntity to leverage Graffo's file processing pipeline which will:
Download the Word document
Convert it to markdown using document converters
Chunk the content for indexing
Reference: https://learn.microsoft.com/en-us/graph/api/resources/driveitem
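The chunking step in the pipeline above can be pictured as splitting the converted markdown into bounded pieces. The following is a simple paragraph-based sketch under assumed parameters: `chunk_markdown` and `max_chars` are hypothetical, and Graffo's actual chunking strategy may differ.

```python
from typing import List


def chunk_markdown(text: str, max_chars: int = 1000) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks of at most max_chars."""
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        # The paragraph does not fit: close the current chunk first.
        if current:
            chunks.append(current)
        # Hard-split a paragraph that is itself longer than max_chars.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps headings and the text that follows them together when they fit, which tends to give more coherent search results than fixed-width slicing.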
id (str): Drive item ID for the Word document.
title (str): Human-readable title for the document.
created_datetime (Optional[datetime]): When the document was created.
last_modified_datetime (Optional[datetime]): When the document was last modified.
web_url_override (Optional[str]): URL to open the document in Word Online.
content_download_url (Optional[str]): Direct download URL for the document content.
created_by (Optional[Dict[str, Any]]): Identity of the user who created the document.
last_modified_by (Optional[Dict[str, Any]]): Identity of the user who last modified the document.
parent_reference (Optional[Dict[str, Any]]): Information about the parent folder/drive location.
drive_id (Optional[str]): ID of the drive containing this document.
folder_path (Optional[str]): Full path to the parent folder.
description (Optional[str]): Description of the document if available.
shared (Optional[Dict[str, Any]]): Information about sharing status of the document.
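To show how these fields might line up with a Microsoft Graph driveItem payload, here is a standalone sketch. It is an assumption-laden illustration: `WordDocumentSketch` and `from_drive_item` are hypothetical names, the class does not extend Graffo's FileEntity, and the key mapping (for example `name` to `title`, `parentReference.path` to `folder_path`) is a guess based on the driveItem resource rather than the connector's confirmed behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


def parse_graph_datetime(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as '2024-01-15T10:30:00Z'."""
    if not value:
        return None
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class WordDocumentSketch:
    """Hypothetical stand-in for WordDocumentEntity (does not extend FileEntity)."""

    id: str
    title: str
    created_datetime: Optional[datetime] = None
    last_modified_datetime: Optional[datetime] = None
    web_url_override: Optional[str] = None
    content_download_url: Optional[str] = None
    created_by: Optional[Dict[str, Any]] = None
    last_modified_by: Optional[Dict[str, Any]] = None
    parent_reference: Optional[Dict[str, Any]] = None
    drive_id: Optional[str] = None
    folder_path: Optional[str] = None
    description: Optional[str] = None
    shared: Optional[Dict[str, Any]] = None


def from_drive_item(item: Dict[str, Any]) -> WordDocumentSketch:
    """Map a driveItem JSON object onto the sketch entity (assumed key mapping)."""
    parent = item.get("parentReference") or {}
    return WordDocumentSketch(
        id=item["id"],
        title=item.get("name", ""),
        created_datetime=parse_graph_datetime(item.get("createdDateTime")),
        last_modified_datetime=parse_graph_datetime(item.get("lastModifiedDateTime")),
        web_url_override=item.get("webUrl"),
        content_download_url=item.get("@microsoft.graph.downloadUrl"),
        created_by=item.get("createdBy"),
        last_modified_by=item.get("lastModifiedBy"),
        parent_reference=item.get("parentReference"),
        drive_id=parent.get("driveId"),
        folder_path=parent.get("path"),
        description=item.get("description"),
        shared=item.get("shared"),
    )
```

Only `id` is required in this sketch; every other key is read with `.get`, mirroring the Optional types listed above.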
