Here are step-by-step instructions for extracting comments from a PDF file. The steps for doing it with Adobe PRO are documented here. This post describes how to do it if you do not have Adobe PRO.
I found two solutions by searching online: one uses the poppler library and the other uses the PyMuPDF (fitz) library. The poppler solution had a lot of upvotes, so I tried that first but couldn't get it to work. The problem I ran into was installing poppler on Mac. I installed it using conda and the installation did not give me any errors (here is the installation log), but when I tried to import it, the import failed. Luckily, I was able to get PyMuPDF to work. Below are step-by-step instructions.
Step 0: Install Anaconda
Step 1: Create a new environment
conda create -n py38 python=3.8
Step 2: Activate the Environment
conda activate py38
Step 3: Install PyMuPDF
The command that worked for me is
python -m pip install --upgrade pymupdf
Here is the output when I ran the command:
Collecting pymupdf
Downloading PyMuPDF-1.19.1-cp38-cp38-macosx_10_9_x86_64.whl (7.6 MB)
|████████████████████████████████| 7.6 MB 1.5 MB/s
Installing collected packages: pymupdf
Successfully installed pymupdf-1.19.1
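Since a clean install log did not guarantee a working import with poppler, it can be worth checking that Python can at least locate the module before moving on. Here is a minimal sketch using only the standard library; the helper name is_importable is my own, not part of PyMuPDF, and it only checks that the module can be found, not that it loads without errors.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the named top-level module."""
    # find_spec looks the module up on the import path without running it.
    return importlib.util.find_spec(module_name) is not None


# PyMuPDF is imported under the name "fitz".
print("fitz importable:", is_importable("fitz"))
```

If this prints False, the package most likely went into a different environment than the one that is currently active.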
Step 4: Create script to extract the comments
I got it from here.
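#!/usr/bin/env python
# Print the text of every annotation (comment) in the PDF given as the first argument.
import sys
import fitz  # PyMuPDF

doc = fitz.open(sys.argv[1])
# Iterating the document yields each page; this avoids doc.pageCount,
# which newer PyMuPDF releases deprecate in favor of doc.page_count.
for page in doc:
    for annot in page.annots():
        print(annot.info["content"])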
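The script above prints only the comment text. If you also want to know where each comment is and who wrote it, something like the following sketch may help. It assumes the author name is stored under the "title" key of annot.info and that empty comments should be skipped; the function names format_comment and extract_comments are hypothetical, chosen for illustration.

```python
def format_comment(page_index, info):
    """Return one printable line for an annotation, or None if it has no text."""
    text = (info.get("content") or "").strip()
    if not text:
        return None
    # Assumption: PyMuPDF keeps the author name under the "title" key.
    author = info.get("title") or "unknown"
    # page_index is 0-based, so add 1 for a human-readable page number.
    return f"page {page_index + 1} [{author}]: {text}"


def extract_comments(pdf_path):
    """Collect formatted comment lines from every page of the PDF."""
    # Imported here so format_comment can be reused without PyMuPDF installed.
    import fitz

    lines = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for annot in page.annots():
                line = format_comment(page.number, annot.info)
                if line is not None:
                    lines.append(line)
    return lines
```

Calling extract_comments with the path to your PDF returns a list of strings that you can print or write to a file.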
Step 5: Run the script
Left as an exercise. Hint: the script reads the path to the PDF from its first command-line argument (sys.argv[1]).