Parse DOCX
Path: .cursor/skills/parse-docx/SKILL.md
Category: Data Parsing
Triggers: .doc files, .docx files, Word documents, document parsing
What It Does
A Python CLI tool built on python-docx for parsing Word documents. It inspects document structure, extracts paragraphs, tables, and headings, and exports to CSV. Legacy .doc files are converted to .docx automatically with macOS textutil, so .doc input requires macOS.
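As a rough illustration of the table-to-CSV step, the sketch below flattens a python-docx table into rows of text and serializes them with the standard csv module. The function names `table_to_rows` and `rows_to_csv` are hypothetical and are not taken from docx_parser.py; the only python-docx attributes assumed are `table.rows`, `row.cells`, and `cell.text`.

```python
import csv
import io


def table_to_rows(table):
    """Flatten a python-docx Table-like object into a list of rows of cell text."""
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def rows_to_csv(rows):
    """Serialize rows to CSV text; the csv module quotes commas and quotes as needed."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

With python-docx installed, `rows_to_csv(table_to_rows(Document("file.docx").tables[0]))` would produce the CSV text for the first table in a document.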
Commands
DOC_PY=".cursor/skills/parse-docx/scripts/.venv/bin/python3"
DOC_SCRIPT=".cursor/skills/parse-docx/scripts/docx_parser.py"
| Command | Purpose |
|---|---|
| $DOC_PY $DOC_SCRIPT "file.docx" inspect | Document structure overview (paragraph count, table count, heading count) |
| $DOC_PY $DOC_SCRIPT "file.docx" text | Full text extraction |
| $DOC_PY $DOC_SCRIPT "file.docx" tables | Extract all tables |
| $DOC_PY $DOC_SCRIPT "file.docx" headings | Document outline (heading hierarchy) |
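For orientation, here is a minimal sketch of how the inspect and headings commands could be computed with python-docx. It is not the actual docx_parser.py; the helper names (`heading_level`, `inspect_document`, `outline`, `load`) are hypothetical. It assumes headings use Word's built-in style names "Heading 1" through "Heading 9", so custom or localized heading styles would not be counted.

```python
import re

# Word's built-in heading styles are named "Heading 1" ... "Heading 9".
_HEADING_RE = re.compile(r"^Heading (\d)$")


def heading_level(style_name):
    """Return the heading level (1-9) for a style name, or None if not a heading."""
    match = _HEADING_RE.match(style_name or "")
    return int(match.group(1)) if match else None


def inspect_document(doc):
    """Count paragraphs, tables, and headings on a python-docx Document-like object."""
    levels = [heading_level(p.style.name) for p in doc.paragraphs]
    return {
        "paragraphs": len(doc.paragraphs),
        "tables": len(doc.tables),
        "headings": sum(level is not None for level in levels),
    }


def outline(doc):
    """Return (level, text) pairs for every heading, in document order."""
    result = []
    for p in doc.paragraphs:
        level = heading_level(p.style.name)
        if level is not None:
            result.append((level, p.text.strip()))
    return result


def load(path):
    """Open a .docx file; imported lazily so the helpers work without python-docx."""
    from docx import Document
    return Document(path)
```

Calling `inspect_document(load("file.docx"))` returns the three counts, and `outline(...)` returns the heading hierarchy that could be printed with indentation per level.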
Handling .doc Files
Older .doc files are converted to .docx with macOS textutil before parsing. The conversion is transparent: pass the .doc path and the skill handles it. Because textutil ships only with macOS, .doc input is not supported on other platforms.
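The conversion step could look roughly like the sketch below, which shells out to `textutil -convert docx` and writes the result to a temporary directory. The helper names `textutil_command` and `ensure_docx` and the choice of a temp directory are assumptions for illustration; the skill's own script may place the converted file elsewhere.

```python
import subprocess
import tempfile
from pathlib import Path


def textutil_command(src, dest):
    """Build the textutil invocation that converts src to a .docx file at dest."""
    return ["textutil", "-convert", "docx", str(src), "-output", str(dest)]


def ensure_docx(path):
    """Return a .docx path for the input, converting legacy .doc files first."""
    src = Path(path)
    if src.suffix.lower() != ".doc":
        return src  # not a legacy .doc file: leave the path untouched
    # Write the converted copy to a temp directory so the source tree is not modified.
    dest = Path(tempfile.mkdtemp()) / (src.stem + ".docx")
    subprocess.run(textutil_command(src, dest), check=True)  # macOS only
    return dest
```

Using `check=True` makes a failed conversion raise `subprocess.CalledProcessError` instead of silently passing a missing file to the parser.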
Common Use Case
Parsing Safeco integration specs and Progressive PRD documents stored in docs/.