PDF file
PDF file format, structure and editing software
PDF structure
Structure
Header:
%PDF-1.3
Body:
3 0 obj
<< /Filter /FlateDecode /Length 198 >>
stream
...
endstream
endobj
9 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
...
xref:
0 14
0000000000 65535 f
0000000292 00000 n
0000003240 00000 n
0000000022 00000 n
Trailer:
<< /Size 14
/Root 9 0 R
/Info 13 0 R
>>
startxref
12937
%%EOF
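The layout above can be probed with a few lines of Python. This is only a minimal sketch: `read_layout` and `parse_xref_entry` are hypothetical helper names, the `sample` bytes are a shortened stand-in for a real file, and incremental updates and cross-reference streams are ignored.

```python
import re

def parse_xref_entry(line: str):
    """Split one xref entry into (byte offset, generation, in_use)."""
    offset, generation, kind = line.split()
    return int(offset), int(generation), kind == "n"

def read_layout(data: bytes):
    """Return (version, startxref offset) from raw PDF bytes."""
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    if header is None:
        raise ValueError("missing %PDF- header")
    # startxref sits near the end of the file, just before %%EOF;
    # take the last occurrence in case there are several.
    tail = re.findall(rb"startxref\s+(\d+)\s+%%EOF", data)
    if not tail:
        raise ValueError("missing startxref")
    return header.group(1).decode("ascii"), int(tail[-1])

# Shortened stand-in for a real file (assumption: body elided as "...").
sample = b"%PDF-1.3\n...\nstartxref\n12937\n%%EOF\n"

print(read_layout(sample))                     # ('1.3', 12937)
print(parse_xref_entry("0000000292 00000 n"))  # (292, 0, True)
print(parse_xref_entry("0000000000 65535 f"))  # (0, 65535, False)
```

The number after startxref is the byte offset of the xref table, and each "n" entry in that table gives the byte offset of one object in the body.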
Operators reference
CATEGORY | OPERATORS | TABLE |
---|---|---|
General graphics state | w, J, j, M, d, ri, i, gs | 4.7 |
Special graphics state | q, Q, cm | 4.7 |
Path construction | m, l, c, v, y, h, re | 4.9 |
Path painting | S, s, f, F, f*, B, B*, b, b*, n | 4.10 |
Clipping paths | W, W* | 4.11 |
Text objects | BT, ET | 5.4 |
Text state | Tc, Tw, Tz, TL, Tf, Tr, Ts | 5.2 |
Text positioning | Td, TD, Tm, T* | 5.5 |
Text showing | Tj, TJ, ', " | 5.6 |
Type 3 fonts | d0, d1 | 5.10 |
Color | CS, cs, SC, SCN, sc, scn, G, g, RG, rg, K, k | 4.24 |
Shading patterns | sh | 4.27 |
Inline images | BI, ID, EI | 4.42 |
XObjects | Do | 4.37 |
Marked content | MP, DP, BMC, BDC, EMC | 10.7 |
Compatibility | BX, EX | 3.29 |
Text operators
BT
/F0 36 Tf
50 706 Td
(Hello, World!) Tj
ET
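To illustrate how the operators in this text object read, here is a rough Python sketch that pulls out the font (Tf), the position offset (Td) and the shown string (Tj) with regular expressions. The `describe` function and the pattern names are hypothetical, and the patterns are deliberately naive: they assume an uncompressed content stream and do not handle nested parentheses, hex strings or TJ arrays.

```python
import re

content = """BT
/F0 36 Tf
50 706 Td
(Hello, World!) Tj
ET"""

# Literal string followed by Tj; allows backslash escapes only.
TJ = re.compile(r"\(((?:\\.|[^\\()])*)\)\s*Tj")
# Font resource name and size followed by Tf.
TF = re.compile(r"/(\S+)\s+([\d.]+)\s+Tf")
# Two numeric operands followed by Td.
TD = re.compile(r"(-?[\d.]+)\s+(-?[\d.]+)\s+Td")

def describe(stream: str):
    """Summarise the first Tf and Td, and every Tj string, in a stream."""
    font, size = TF.search(stream).groups()
    x, y = TD.search(stream).groups()
    texts = [m.group(1) for m in TJ.finditer(stream)]
    return {"font": font, "size": float(size),
            "offset": (float(x), float(y)), "text": texts}

print(describe(content))
# {'font': 'F0', 'size': 36.0, 'offset': (50.0, 706.0), 'text': ['Hello, World!']}
```

A real extractor needs a proper tokenizer and the font's encoding or CMap, which is where the CID mapping issues come in.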
CID fonts mapping
https://stackoverflow.com/questions/15721846/cidfonts-and-mapping
https://www.toughdev.com/content/2015/02/restoring-text-from-pdf-files-encoded-using-custom-cid-fonts/
QPDF
Decoding
Either of the following commands decompresses stream data; the first also writes the file in QDF form and unpacks object streams:
qpdf --qdf --object-streams=disable orig.pdf expanded.pdf
qpdf --stream-data=uncompress --decode-level=all orig.pdf expanded.pdf
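As a rough illustration of what decompressing a /FlateDecode stream means: the stream bytes are zlib (deflate) data, so Python's zlib module can round-trip them. This sketch uses a made-up content stream as `raw` and assumes no predictor is set in /DecodeParms; it is not what qpdf does internally.

```python
import zlib

# Made-up content stream standing in for the bytes between
# "stream" and "endstream" once decoded.
raw = b"BT\n/F0 36 Tf\n50 706 Td\n(Hello, World!) Tj\nET\n"

compressed = zlib.compress(raw)         # what a /FlateDecode stream stores
restored = zlib.decompress(compressed)  # what the expanded file shows

print(len(raw), len(compressed), restored == raw)
```

For very short streams like this one the compressed form may not be smaller than the original.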
Re-compress
qpdf expanded.pdf orig2.pdf
Decrypt
qpdf --password="mypass" --decrypt input.pdf output.pdf