
Beware sharing PDFs – there is hidden information in them

I needed to upload a PDF to a website, and I did not want this PDF traced back to me.

When I inspected the file, I found a lot of information that could tie that specific PDF to me.

Read the Metadata on the file

There are various ways to view the metadata and attributes of a file.
One way on macOS is to right-click the file in Finder and choose Get Info. There you can see some of the attributes.

Another way is to use pdfinfo:

$ pdfinfo go-in-action.pdf 
Title:          Go in Action
Author:         William Kennedy with Brian Ketelsen and Erik St. Martin
Creator:        FrameMaker 8.0
Producer:       Mac OS X 10.12.4 Quartz PDFContext
CreationDate:   Wed Jun 21 04:28:44 2017 SAST
ModDate:        Wed Jun 21 04:28:44 2017 SAST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          266
Encrypted:      no
Page size:      531 x 666 pts
Page rot:       0
File size:      6075788 bytes
Optimized:      no
PDF version:    1.3
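
Most of that output is harmless layout information. As a rough sketch, the fields that can point back to a person or a machine can be filtered out of it. The function name identifying_fields and the choice of fields are my own assumptions, not something pdfinfo provides:

```shell
# Keep only the pdfinfo fields that can point back to a person or machine.
# Reads pdfinfo-style "Key:   value" lines on stdin and prints the matches.
identifying_fields() {
    grep -E '^(Title|Subject|Keywords|Author|Creator|Producer|CreationDate|ModDate):'
}

# Usage (assumes pdfinfo is installed):
#   pdfinfo go-in-action.pdf | identifying_fields
```

For the file above this would leave Title, Author, Creator, Producer, CreationDate and ModDate, which are the lines worth worrying about.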

A generic option that works for any file on macOS is mdls, which lists the Spotlight metadata attributes of a file:

$ mdls go-in-action.pdf 
kMDItemAuthors                     = (
    "William Kennedy with Brian Ketelsen and Erik St. Martin"
)
kMDItemContentCreationDate         = 2020-04-05 08:18:55 +0000
kMDItemContentCreationDate_Ranking = 2020-04-05 00:00:00 +0000
kMDItemContentModificationDate     = 2020-04-05 08:18:55 +0000
kMDItemContentType                 = "com.adobe.pdf"
kMDItemContentTypeTree             = (
    "com.adobe.pdf",
    "public.item",
    "com.adobe.pdf",
    "public.data",
    "public.composite-content",
    "public.content"
)
kMDItemCreator                     = "FrameMaker 8.0"
kMDItemDateAdded                   = 2020-04-06 06:13:25 +0000
kMDItemDateAdded_Ranking           = 2020-04-06 00:00:00 +0000
kMDItemDisplayName                 = "go-in-action.pdf"
kMDItemEncodingApplications        = (
    "Mac OS X 10.12.4 Quartz PDFContext"
)
kMDItemFSContentChangeDate         = 2020-04-05 08:18:55 +0000
kMDItemFSCreationDate              = 2020-04-05 08:18:55 +0000
kMDItemFSCreatorCode               = ""
kMDItemFSFinderFlags               = 0
kMDItemFSHasCustomIcon             = (null)
kMDItemFSInvisible                 = 0
kMDItemFSIsExtensionHidden         = 0
kMDItemFSIsStationery              = (null)
kMDItemFSLabel                     = 0
kMDItemFSName                      = "go-in-action.pdf"
kMDItemFSNodeCount                 = (null)
kMDItemFSOwnerGroupID              = 20
kMDItemFSOwnerUserID               = 501
kMDItemFSSize                      = 6075788
kMDItemFSTypeCode                  = ""
kMDItemInterestingDate_Ranking     = 2020-04-05 00:00:00 +0000
kMDItemKind                        = "PDF document"
kMDItemLogicalSize                 = 6075788
kMDItemNumberOfPages               = 266
kMDItemPageHeight                  = 666
kMDItemPageWidth                   = 531
kMDItemPhysicalSize                = 6078464
kMDItemSecurityMethod              = "None"
kMDItemTitle                       = "Go in Action"
kMDItemVersion                     = "1.3"

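mdls reports more attributes than pdfinfo, although many of them (the kMDItemFS* entries) describe the file on disk rather than the contents of the PDF.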

But that is not all. Files can also carry extended attributes, which you can read with the command-line tool xattr:

$ xattr -l go-in-action.pdf 
com.dropbox.attrs:
00000000  0A 12 0A 10 87 7B DA AB 3A 51 09 70 00 00 00 00  |.....{..:Q.p....|
00000010  00 00 14 84 10 BB E2 C9 B0 01                    |..........|
0000001a

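Extended attributes often record where a file came from: on macOS, downloaded files typically carry com.apple.metadata:kMDItemWhereFroms, which stores the download URL. Here the only attribute is one added by Dropbox.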

A tool called exiftool is also helpful:

$ exiftool go-in-action.pdf 
ExifTool Version Number         : 12.42
File Name                       : go-in-action.pdf
Directory                       : .
File Size                       : 6.1 MB
File Modification Date/Time     : 2020:04:05 10:18:55+02:00
File Access Date/Time           : 2022:08:06 20:55:55+02:00
File Inode Change Date/Time     : 2020:04:06 08:13:25+02:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.3
Linearized                      : No
XMP Toolkit                     : XMP Core 5.4.0
Modify Date                     : 2016:01:08 14:44:52+02:00
Creator Tool                    : FrameMaker 8.0
Create Date                     : 2015:10:30 10:50:29Z
Metadata Date                   : 2016:01:08 14:44:52+02:00
Producer                        : PlotSoft PDFill 12.0
Creator                         : William Kennedy with Brian Ketelsen and Erik St. Martin
Format                          : application/pdf
Title                           : Go in Action
Document ID                     : uuid:644ba8f9-51d7-446d-aeed-55b20802899a
Instance ID                     : uuid:a5024ca2-1862-4840-833b-8d63ce4fb80a
Page Count                      : 266
Author                          : William Kennedy with Brian Ketelsen and Erik St. Martin

You can also use sips, the macOS image-processing tool, which reports image properties:

$ sips -g all go-in-action.pdf 
~/go-in-action.pdf
  pixelWidth: 1107
  pixelHeight: 1388
  typeIdentifier: com.adobe.pdf
  format: pdf
  formatOptions: default
  dpiWidth: 150.000
  dpiHeight: 150.000
  samplesPerPixel: 4
  bitsPerSample: 8
  hasAlpha: yes
  space: RGB
  profile: sRGB IEC61966-2.1

Removing the metadata

You can remove extended attributes with the -c clear flag.

$ xattr -c go-in-action.pdf

Use exiftool to remove metadata attributes:

$ exiftool -all= go-in-action.pdf 
Warning: [minor] ExifTool PDF edits are reversible. Deleted tags may be recovered! - go-in-action.pdf
1 image files updated

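Note the warning: the changes are reversible. ExifTool appends its edits to the PDF instead of rewriting the file, so the deleted tags are still inside it and can be recovered. The ExifTool documentation suggests rewriting the file afterwards with qpdf (qpdf --linearize in.pdf out.pdf) to remove the old data permanently.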
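
Reversible edits like this are stored as an incremental update at the end of the file, with the earlier revision left in place. A rough way to spot that, assuming each saved revision ends with its own %%EOF marker and that the old metadata is stored as plain, uncompressed text (both are heuristics, and the function names pdf_revisions and leftover_fields are hypothetical), is to look at the raw bytes:

```shell
# Count end-of-file markers in a PDF. More than one suggests incremental
# updates, meaning earlier revisions (and their metadata) may still be present.
# Linearized PDFs also contain two markers, so treat this as a hint only.
pdf_revisions() {
    grep -a -c '%%EOF' "$1"
}

# Print identifying Info entries that are still readable as plain text.
# Misses entries that are hex-encoded, UTF-16 or inside compressed streams.
leftover_fields() {
    grep -a -o -E '/(Author|Creator|Producer|Title) ?\([^)]*\)' "$1"
}

# Usage:
#   pdf_revisions go-in-action.pdf
#   leftover_fields go-in-action.pdf
```

If leftover_fields still prints an author after exiftool -all= has run, the file is not yet safe to share.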

Verifying metadata removal

$ exiftool go-in-action.pdf
ExifTool Version Number         : 12.42
File Name                       : go-in-action.pdf
Directory                       : .
File Size                       : 6.1 MB
File Modification Date/Time     : 2022:08:06 21:07:51+02:00
File Access Date/Time           : 2022:08:06 21:11:03+02:00
File Inode Change Date/Time     : 2022:08:06 21:07:51+02:00
File Permissions                : -rw-r--r--
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.3
Linearized                      : No
Page Count                      : 266
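
Reading that listing by eye is easy to get wrong, so here is a small sketch that turns it into a pass or fail check. The function name no_identifying_tags and the list of tags are my own assumptions, based on the tags that showed up in the earlier exiftool output:

```shell
# Succeed only if no identifying tag remains in exiftool-style output on stdin.
# Any offending line is printed so you can see what is left.
no_identifying_tags() {
    ! grep -E '^(Author|Creator|Creator Tool|Producer|Title|Document ID|Instance ID|Create Date|Modify Date|Metadata Date) +:'
}

# Usage (assumes exiftool is installed):
#   exiftool go-in-action.pdf | no_identifying_tags && echo "looks clean"
```

This only checks what exiftool currently reports; it says nothing about older data that is still recoverable from the file.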

The extended attribute is still there; perhaps Dropbox adds it again automatically:

$ xattr -l go-in-action.pdf
com.dropbox.attrs:
00000000  0A 12 0A 10 87 7B DA AB 3A 51 09 70 00 00 00 00  |.....{..:Q.p....|
00000010  00 00 14 84 10 BB E2 C9 B0 01                    |..........|
0000001a

Sources