I needed to upload a PDF to a website, and I did not want the file traced back to me.
When I looked at the file, I noticed it carried a lot of information that could tie that specific PDF to me.
Reading the metadata on the file
There are various ways to view the metadata and attributes of a file.
One way is the macOS Finder: right-click the file -> Get Info. There you can see some attributes.
Another way is to use pdfinfo:
$ pdfinfo go-in-action.pdf
Title: Go in Action
Author: William Kennedy with Brian Ketelsen and Erik St. Martin
Creator: FrameMaker 8.0
Producer: Mac OS X 10.12.4 Quartz PDFContext
CreationDate: Wed Jun 21 04:28:44 2017 SAST
ModDate: Wed Jun 21 04:28:44 2017 SAST
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 266
Encrypted: no
Page size: 531 x 666 pts
Page rot: 0
File size: 6075788 bytes
Optimized: no
PDF version: 1.3
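If you want to work with these fields in a script, the "Key: value" lines are easy to turn into a dictionary. The following is a minimal Python sketch and not part of the original workflow. The function name parse_pdfinfo is made up for illustration, and the code assumes every line has the shape shown above (a field name, a colon, then the value).

```python
def parse_pdfinfo(text):
    """Turn pdfinfo-style 'Key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as dates
        # (which contain colons themselves) stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# A few lines copied from the pdfinfo output above.
sample = """\
Title: Go in Action
Creator: FrameMaker 8.0
CreationDate: Wed Jun 21 04:28:44 2017 SAST
Pages: 266
"""

info = parse_pdfinfo(sample)
print(info["Creator"])
print(info["CreationDate"])
```

In practice the text would come from running pdfinfo yourself and capturing its output; the sample string here only stands in for that.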
A generic option that works for all files on macOS is mdls (metadata list):
$ mdls go-in-action.pdf
kMDItemAuthors = (
"William Kennedy with Brian Ketelsen and Erik St. Martin"
)
kMDItemContentCreationDate = 2020-04-05 08:18:55 +0000
kMDItemContentCreationDate_Ranking = 2020-04-05 00:00:00 +0000
kMDItemContentModificationDate = 2020-04-05 08:18:55 +0000
kMDItemContentType = "com.adobe.pdf"
kMDItemContentTypeTree = (
"com.adobe.pdf",
"public.item",
"com.adobe.pdf",
"public.data",
"public.composite-content",
"public.content"
)
kMDItemCreator = "FrameMaker 8.0"
kMDItemDateAdded = 2020-04-06 06:13:25 +0000
kMDItemDateAdded_Ranking = 2020-04-06 00:00:00 +0000
kMDItemDisplayName = "go-in-action.pdf"
kMDItemEncodingApplications = (
"Mac OS X 10.12.4 Quartz PDFContext"
)
kMDItemFSContentChangeDate = 2020-04-05 08:18:55 +0000
kMDItemFSCreationDate = 2020-04-05 08:18:55 +0000
kMDItemFSCreatorCode = ""
kMDItemFSFinderFlags = 0
kMDItemFSHasCustomIcon = (null)
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSIsStationery = (null)
kMDItemFSLabel = 0
kMDItemFSName = "go-in-action.pdf"
kMDItemFSNodeCount = (null)
kMDItemFSOwnerGroupID = 20
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 6075788
kMDItemFSTypeCode = ""
kMDItemInterestingDate_Ranking = 2020-04-05 00:00:00 +0000
kMDItemKind = "PDF document"
kMDItemLogicalSize = 6075788
kMDItemNumberOfPages = 266
kMDItemPageHeight = 666
kMDItemPageWidth = 531
kMDItemPhysicalSize = 6078464
kMDItemSecurityMethod = "None"
kMDItemTitle = "Go in Action"
kMDItemVersion = "1.3"
mdls reports more attributes than pdfinfo.
But that is not all. There are also extended attributes, which you can access with the command-line tool xattr:
$ xattr -l go-in-action.pdf
com.dropbox.attrs:
00000000 0A 12 0A 10 87 7B DA AB 3A 51 09 70 00 00 00 00 |.....{..:Q.p....|
00000010 00 00 14 84 10 BB E2 C9 B0 01 |..........|
0000001a
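The xattrs often contain "where from" information (on macOS, the com.apple.metadata:kMDItemWhereFroms attribute), which records where a file was downloaded from.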
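The hex dump that xattr -l prints can be turned back into raw bytes if you want to inspect a value more closely. Below is a small Python sketch under a few assumptions: parse_xattr_dump is a hypothetical helper name, it expects only the dump lines (not the attribute name line), and it assumes the layout shown above, with an offset first, then two-digit hex bytes, then an ASCII preview between | characters.

```python
def parse_xattr_dump(dump):
    """Rebuild the raw bytes from an 'xattr -l' style hex dump."""
    raw = bytearray()
    for line in dump.splitlines():
        # Drop the ASCII preview column between the | characters.
        hex_part = line.split("|", 1)[0]
        tokens = hex_part.split()
        # tokens[0] is the offset; the rest are two-digit hex bytes.
        # The final line holds only an offset, so it adds nothing.
        for token in tokens[1:]:
            raw.append(int(token, 16))
    return bytes(raw)


# The dump lines of com.dropbox.attrs from the output above.
dump = """\
00000000 0A 12 0A 10 87 7B DA AB 3A 51 09 70 00 00 00 00 |.....{..:Q.p....|
00000010 00 00 14 84 10 BB E2 C9 B0 01 |..........|
0000001a
"""

value = parse_xattr_dump(dump)
print(len(value))       # number of bytes in the attribute value
print(value[:4].hex())  # first four bytes as hex
```

The byte count should match the closing offset of the dump (0x1a), which is a quick sanity check that nothing was lost in parsing.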
A tool called exiftool is also helpful:
$ exiftool go-in-action.pdf
ExifTool Version Number : 12.42
File Name : go-in-action.pdf
Directory : .
File Size : 6.1 MB
File Modification Date/Time : 2020:04:05 10:18:55+02:00
File Access Date/Time : 2022:08:06 20:55:55+02:00
File Inode Change Date/Time : 2020:04:06 08:13:25+02:00
File Permissions : -rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.3
Linearized : No
XMP Toolkit : XMP Core 5.4.0
Modify Date : 2016:01:08 14:44:52+02:00
Creator Tool : FrameMaker 8.0
Create Date : 2015:10:30 10:50:29Z
Metadata Date : 2016:01:08 14:44:52+02:00
Producer : PlotSoft PDFill 12.0
Creator : William Kennedy with Brian Ketelsen and Erik St. Martin
Format : application/pdf
Title : Go in Action
Document ID : uuid:644ba8f9-51d7-446d-aeed-55b20802899a
Instance ID : uuid:a5024ca2-1862-4840-833b-8d63ce4fb80a
Page Count : 266
Author : William Kennedy with Brian Ketelsen and Erik St. Martin
You can also use sips, which is meant for images but accepts a PDF as well:
$ sips -g all go-in-action.pdf
~/go-in-action.pdf
pixelWidth: 1107
pixelHeight: 1388
typeIdentifier: com.adobe.pdf
format: pdf
formatOptions: default
dpiWidth: 150.000
dpiHeight: 150.000
samplesPerPixel: 4
bitsPerSample: 8
hasAlpha: yes
space: RGB
profile: sRGB IEC61966-2.1
Removing the metadata
You can remove extended attributes with xattr's -c (clear) flag:
$ xattr -c go-in-action.pdf
Use exiftool to remove metadata attributes:
$ exiftool -all= go-in-action.pdf
Warning: [minor] ExifTool PDF edits are reversible. Deleted tags may be recovered! - go-in-action.pdf
1 image files updated
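Note the warning: the changes are reversible. ExifTool edits a PDF by appending an incremental update, so the old metadata stays in the file and could be recovered. The ExifTool documentation suggests rewriting the file afterwards, for example with qpdf --linearize, to get rid of it for good.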
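One rough way to see what the warning means is to search the raw bytes of the edited file for values that were supposed to be gone. The Python sketch below is only an illustration: find_leftovers is a hypothetical helper, and it only finds values stored as plain, uncompressed text. Metadata inside compressed streams or in another encoding would not show up, so an empty result is not proof that the data is gone.

```python
import os
import tempfile


def find_leftovers(path, needles):
    """Return the needles that still occur in the file's raw bytes."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [needle for needle in needles if needle.encode("utf-8") in data]


# Demo on a small stand-in file rather than a real PDF: the old Info
# entry is still in the bytes even though a later section replaces it.
fake_pdf = (
    b"%PDF-1.3\n"
    b"1 0 obj << /Author (William Kennedy) /Creator (FrameMaker 8.0) >> endobj\n"
    b"%%EOF\n"
    b"1 0 obj << >> endobj\n"
    b"%%EOF\n"
)

with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(fake_pdf)
    demo_path = tmp.name

leftovers = find_leftovers(demo_path, ["William Kennedy", "FrameMaker", "PDFill"])
print(leftovers)
os.remove(demo_path)
```

To try it on the real file, pass go-in-action.pdf as the path together with strings taken from the earlier tool output.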
Verifying metadata removal
$ exiftool go-in-action.pdf
ExifTool Version Number : 12.42
File Name : go-in-action.pdf
Directory : .
File Size : 6.1 MB
File Modification Date/Time : 2022:08:06 21:07:51+02:00
File Access Date/Time : 2022:08:06 21:11:03+02:00
File Inode Change Date/Time : 2022:08:06 21:07:51+02:00
File Permissions : -rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.3
Linearized : No
Page Count : 266
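Comparing two long listings by eye is error-prone, so the check can be scripted. The following Python sketch looks through exiftool's text output for tag names that tend to identify a document. The set of tag names and the function name are my own choices for illustration, not something exiftool defines. It also only checks what exiftool reports, so it says nothing about data that is still recoverable from the file.

```python
# Tag names treated as identifying; an assumption, adjust as needed.
IDENTIFYING_TAGS = {
    "Author", "Creator", "Creator Tool", "Producer", "Title",
    "Document ID", "Instance ID",
}


def remaining_identifying_tags(exiftool_output):
    """List tags in the output whose names are in IDENTIFYING_TAGS."""
    found = []
    for line in exiftool_output.splitlines():
        # Each line looks like 'Tag Name : value'; the tag name is
        # everything before the first colon.
        name, sep, _value = line.partition(":")
        if sep and name.strip() in IDENTIFYING_TAGS:
            found.append(name.strip())
    return found


# Lines taken from the exiftool output before and after the cleanup.
before = """\
Producer : PlotSoft PDFill 12.0
Title : Go in Action
Document ID : uuid:644ba8f9-51d7-446d-aeed-55b20802899a
Page Count : 266
"""
after = """\
PDF Version : 1.3
Linearized : No
Page Count : 266
"""

print(remaining_identifying_tags(before))
print(remaining_identifying_tags(after))
```

An empty list for the cleaned file matches what the listing above shows: only technical fields such as the PDF version and page count are left.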
The extended attribute is still there. Maybe Dropbox automatically adds it again:
$ xattr -l go-in-action.pdf
com.dropbox.attrs:
00000000 0A 12 0A 10 87 7B DA AB 3A 51 09 70 00 00 00 00 |.....{..:Q.p....|
00000010 00 00 14 84 10 BB E2 C9 B0 01 |..........|
0000001a