Parse DOCX

Path: .cursor/skills/parse-docx/SKILL.md

Category: Data Parsing

Triggers: .doc files, .docx files, Word documents, document parsing

What It Does

A Python CLI tool built on python-docx for parsing Word documents. It can inspect document structure, extract paragraphs, tables, and headings, and export to CSV. Legacy .doc files are handled by auto-converting them to .docx with macOS textutil.
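The commands listed on this page do not show how CSV export is invoked, so the following is only a rough sketch of the idea, not the skill's actual code. It assumes a python-docx Table-like object (anything exposing rows, cells, and text); the helper names table_rows and write_table_csv are hypothetical.

```python
import csv


def table_rows(table):
    """Turn a python-docx Table-like object into rows of stripped cell text."""
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def write_table_csv(table, out_file):
    """Write one table to an already-open text file object as CSV."""
    writer = csv.writer(out_file)
    writer.writerows(table_rows(table))
```

When writing to a real file, open it with newline="" as the csv module recommends, for example open("table.csv", "w", newline="").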

Commands

DOC_PY=".cursor/skills/parse-docx/scripts/.venv/bin/python3"
DOC_SCRIPT=".cursor/skills/parse-docx/scripts/docx_parser.py"
Command | Purpose
--- | ---
$DOC_PY $DOC_SCRIPT "file.docx" inspect | Document structure overview (paragraph count, table count, heading count)
$DOC_PY $DOC_SCRIPT "file.docx" text | Full text extraction
$DOC_PY $DOC_SCRIPT "file.docx" tables | Extract all tables
$DOC_PY $DOC_SCRIPT "file.docx" headings | Document outline (heading hierarchy)
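For orientation, here is a minimal sketch of how the inspect and headings commands could be approximated with python-docx. It is an illustration under stated assumptions, not the contents of docx_parser.py: the function names are hypothetical, and headings are detected by the built-in English style names ("Heading 1", "Heading 2", ...), which custom or localized styles may not follow.

```python
def load_document(path):
    """Open a .docx file with python-docx (imported lazily so the helpers
    below can be used and tested without the package installed)."""
    from docx import Document  # provided by the python-docx package
    return Document(path)


def heading_level(style_name):
    """Return the level for style names like 'Heading 2', else None."""
    prefix = "Heading "
    if style_name and style_name.startswith(prefix):
        suffix = style_name[len(prefix):]
        if suffix.isdigit():
            return int(suffix)
    return None


def inspect_summary(document):
    """Count paragraphs, tables, and headings in a Document-like object."""
    headings = [
        p for p in document.paragraphs
        if heading_level(p.style.name) is not None
    ]
    return {
        "paragraphs": len(document.paragraphs),
        "tables": len(document.tables),
        "headings": len(headings),
    }


def outline(document):
    """Return heading text indented two spaces per level below the first."""
    lines = []
    for p in document.paragraphs:
        level = heading_level(p.style.name)
        if level is not None:
            lines.append("  " * (level - 1) + p.text)
    return lines
```

A call such as inspect_summary(load_document("file.docx")) would then return the three counts as a dictionary.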

Handling .doc Files

Files in the older .doc format are converted to .docx with macOS textutil before parsing. The conversion is transparent: pass the .doc path and the skill handles the rest.
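The conversion step might look roughly like the sketch below. It assumes textutil is on the PATH (so it only works on macOS); the helper names and the choice to write the converted file into a caller-supplied working directory are illustrative assumptions, not the skill's documented behavior.

```python
import subprocess
from pathlib import Path


def textutil_command(src, dest):
    """Build the textutil invocation that converts src to a .docx at dest."""
    return ["textutil", "-convert", "docx", "-output", str(dest), str(src)]


def ensure_docx(path, workdir):
    """Return a path to parse: convert legacy .doc files into workdir first,
    and pass any other path through unchanged."""
    src = Path(path)
    if src.suffix.lower() != ".doc":
        return src  # nothing to convert
    dest = Path(workdir) / (src.stem + ".docx")
    # check=True raises CalledProcessError if textutil exits non-zero
    subprocess.run(textutil_command(src, dest), check=True)
    return dest
```

Keeping the command construction in its own function makes it possible to check the arguments without actually running textutil.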

Common Use Case

Parsing Safeco integration specs and Progressive PRD documents stored in docs/.