Microsoft Publisher - Reverse-Engineering

Part of a series of articles on my work on Decoding The Microsoft Publisher File Format


Post Six - 12 June 2007

Post 5 was ultra-simplistic. This time we're going to be a bit bolder.

Raw list of files

From Post 5, we know to expect some (apparently meaningless) differences between the files; we can ignore these.

Experiment 6a

I simply repeated the experiment from Post 5, to confirm that the technique works.

Experiment 6b

  • File/New.
  • Set "Measurement Type" to "Pixels"
  • Create Text Box. Put text in it. "Hello, world!"
  • 520px high, 416px wide.
  • 176px from left, 224px from top
  • File/Save.
  • Move text box.
  • 192px from left, 240px from top
  • File/Save (copy as 6a)
  • Move text box.
  • 208px from left, 256px from top
  • File/Save (copy as 6b)
So the basic difference between 6a and 6b is that the box has moved from:
6a: (192, 240) to
6b: (208, 256), a difference of:
(16, 16)
That tidiness is purely coincidental, and actually a bit unfortunate now I come to think about it, since equal x and y offsets make it harder to tell the two coordinates apart in the diff. Oh well.

It is hard to tell quite what we are seeing in the diff output (created by getdiffs.c, slightly modified to mark previously seen common locations in light grey).
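For readers without getdiffs.c to hand, the general idea of a byte-level diff can be sketched as below. This is not the author's getdiffs.c, only a minimal illustration under assumptions: the name find_diff_runs and the struct diff_run are hypothetical, both files are assumed to have been read into equal-length buffers already, and the grey marking of previously seen locations is left out.

```c
#include <stddef.h>

/* One contiguous run of differing bytes: offsets [start, start + length). */
struct diff_run {
    size_t start;
    size_t length;
};

/*
 * Compare two equal-length buffers byte by byte and record each contiguous
 * run of differing bytes. Returns the total number of runs found; at most
 * max_runs of them are written to out.
 */
size_t find_diff_runs(const unsigned char *a, const unsigned char *b,
                      size_t n, struct diff_run *out, size_t max_runs)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        if (a[i] == b[i]) {
            i++;
            continue;
        }
        /* Start of a differing run: extend it while the bytes still differ. */
        size_t start = i;
        while (i < n && a[i] != b[i])
            i++;
        if (count < max_runs) {
            out[count].start = start;
            out[count].length = i - start;
        }
        count++;
    }
    return count;
}
```

Grouping the differences into runs makes it easier to tell isolated one-byte changes from the 2- and 3-byte changes and the longer streams described in this post.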

The FF/7F (255/127) blocks must tell us something: the MSB (the top bit, worth 128 or 0x80) is being toggled (both ways) for some reason or another, at 37513, 37531 (the 3 bytes in between are FF (255) in both files); see outfile4. It happens again (in the other direction) at 38123 (again, the missing 3 bytes are all 255/FF).
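The FF/7F observation can be checked mechanically: XOR-ing the two bytes leaves only the bits that changed. A minimal sketch follows; the helper names changed_bits and is_msb_toggle are made up for illustration.

```c
/* Bits that differ between two bytes (1 = changed, 0 = same). */
unsigned char changed_bits(unsigned char x, unsigned char y)
{
    return (unsigned char)(x ^ y);
}

/* True when the only difference between x and y is the top bit (0x80). */
int is_msb_toggle(unsigned char x, unsigned char y)
{
    return changed_bits(x, y) == 0x80;
}
```

For FF against 7F, changed_bits gives 0x80 (128), whichever file holds which value, so the test works for toggles in both directions.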

We then have lots of bitty little 2- and 3-byte diffs, starting at 58469. There are some duplicates, but not many. I suspect that the ones at 60964 and 60965 are "0EF8" (3832) and "0F48" (3912), though I don't know why; it's just a hunch!
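To test a hunch like this, it helps to read a pair of bytes as a 16-bit number in both possible byte orders. The sketch below assumes that the two bytes at 60964 and 60965 form a single value and that "0EF8" is written in file order, neither of which is established. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Read two bytes at buf[offset], treating the first byte as the high byte. */
uint16_t read_u16_be(const unsigned char *buf, size_t offset)
{
    return (uint16_t)((buf[offset] << 8) | buf[offset + 1]);
}

/* Read the same two bytes, treating the first byte as the low byte. */
uint16_t read_u16_le(const unsigned char *buf, size_t offset)
{
    return (uint16_t)(buf[offset] | (buf[offset + 1] << 8));
}
```

Under that assumption, the bytes 0E F8 read high-byte-first give 3832, and 0F 48 give 3912, so the two values are 80 apart. Read low-byte-first, 0E F8 would instead be F80E (63502).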

There then follows a stream of data which is present in 6b but purely blank in 6a, with a few more one-byte differences where neither is blank.
