Publisher - Reverse Engineering

Post Eight - 30 June 2007

Okay, I created 1.pub, with 2 text boxes.

box1 says "This ix box one.\n\nThe quick brown fox jumped over the lazy dog\d" This starts at 61440 (170000 Octal) (Yes, it says "ix". The font was tiny, I couldn't see!) And yes, I said "jumped" instead of "jumps". So shoot me.

box2 comes right after it, at 61566 (170176). Between the "g" of "dog" and the "T" of "This is box two", there is just a single character: 0x0D (13 decimal)

box2 says "This is box two.\n\nThe water in Majorca don't taste like what it oughter\d"
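Those two offsets can be cross-checked with a little arithmetic. This is a minimal sketch, assuming the text is stored two bytes per character (which the byte arrays later in this post suggest) and that each line break and the trailing 0x0D count as one character each. The function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Bytes occupied by an ASCII-only string when stored two bytes per
   character (assumption: UTF-16LE-style storage). */
size_t utf16le_bytes(const char *ascii)
{
    return 2 * strlen(ascii);
}

/* Offset of the first byte after a text run that starts at `start`. */
long next_offset(long start, const char *ascii)
{
    return start + (long)utf16le_bytes(ascii);
}
```

With box1's 63 characters (62 of text plus the trailing 0x0D), next_offset(61440, ...) gives 61440 + 126 = 61566, which is where box2 starts.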

A: Fix the "ix" to "is":

        /* i is the current byte offset into the file */
        if (i==61452) {
                dub1(infile,outfile);
        }

        /* overwrite the 'x' with an 's' */
        dub1(){
                char newstr[] = { 's' };
                replace(i,o,newstr,1,1);
        }

That worked. is.pub is fixed! isdiffs confirms it. Just one char changed.
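The replace() helper itself isn't shown here, so the following is only a sketch of the same idea on an in-memory copy of the file: overwrite a few bytes at a known offset, and refuse if the patch would run off the end. patch_bytes and its parameters are hypothetical names, not the real helper.

```c
#include <stddef.h>
#include <string.h>

/* Overwrite `len` bytes of `buf` starting at `offset` with `patch`.
   Returns 0 on success, -1 if the patch would run past the end of buf. */
int patch_bytes(unsigned char *buf, size_t buf_len,
                size_t offset, const unsigned char *patch, size_t len)
{
    if (offset > buf_len || len > buf_len - offset)
        return -1;
    memcpy(buf + offset, patch, len);
    return 0;
}
```

For the "ix" fix that would be a one-byte patch { 's' } at offset 61452. The file length never changes, which is why only one char shows up in the diff.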

Okay, what happens if I overwrite the "dog\nThis" text?

At 61558 (the start of the word "dog"), I overwrite the existing text with "qwertyuiop":

        char newstr[] = { 'q',0,'w',0,'e',0,'r',0,'t',0,'y',0,'u',0,'i',0,'o',0,'p',0 };
        replace(i,o,newstr,sizeof(newstr),1);

Publisher refuses to open the file: the text in box1 no longer ends with "0x0D 0x00". If I change the "r" to a 13 (0x0D), it works:

        char newstr[] = { 'q',0,'w',0,'e',0,13,0,'t',0,'y',0,'u',0,'i',0,'o',0,'p',0 };

Box 1 ends "lazy qwe", and Box 2 starts "tyuiops box two." Here is dog.pub. So - Publisher expects the content of a text box to end with 0x000D. 0x000D doesn't by itself mean the end of the text box, though, because a text box can also contain linebreaks.
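Writing those byte arrays by hand is error-prone, so here is a minimal sketch of a helper that builds them from a plain string, plus a check for the trailing 0x0D 0x00. It assumes ASCII-only text stored two bytes per character, low byte first. Both function names are invented for this example.

```c
#include <stddef.h>

/* Encode an ASCII string as two-bytes-per-character text (low byte first).
   Returns the number of bytes written, or 0 if `out` is too small
   (an empty input also yields 0). */
size_t ascii_to_utf16le(const char *ascii, unsigned char *out, size_t out_len)
{
    size_t n = 0;
    for (; *ascii != '\0'; ascii++) {
        if (n + 2 > out_len)
            return 0;
        out[n++] = (unsigned char)*ascii;
        out[n++] = 0;
    }
    return n;
}

/* Non-zero if a text run of `len` bytes ends with 0x0D 0x00. */
int ends_with_cr(const unsigned char *run, size_t len)
{
    return len >= 2 && run[len - 2] == 0x0D && run[len - 1] == 0x00;
}
```

Encoding "qwe\rtyuiop" gives the same 20 bytes as the second array above, and ends_with_cr() is true for the first 8 of them, which is the part that lands in box1.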

Still, that's something learned. It's really just a sanity check, I think, but it's something. We can work on that.

So, let's use Publisher to put some text into "is.pub", in text box1, after the word "dog". We'll say "Saturday", for no better reason than that it's a Saturday today. So, box 1 now ends "lazy dog Saturday". The result is is_sat.pub, and sat_diffs.html tells the tale. Lots of changes.

Again, time runs short, and I can't investigate it any further. Some of the diffs seem to be filename changes again - I thought that I'd eliminated that sort of stuff by doing File/Save, and dealing with filename changes at the OS level, just telling Publisher that it was always the same filename.

Coming back the following day....

A diff between is.pub and sat.pub (okay, I re-saved, I'm sure this is all kosher, though) shows that:
a) grep -oab T.h.i.s is.pub doesn't seem to produce the right results; it says that "This" is at 61242, not at 61440.
b) At 61564, "dog" is followed by " Saturday", so the existing text was moved down; box2's text now starts at 61584, not 61566 as it did before.
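Since grep's answer in (a) looks suspect, an exact byte search is a useful cross-check. This is a minimal sketch, assuming the whole file has been read into memory. find_bytes is a hypothetical helper, not one of the tools used in this post.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of `needle` in `hay`
   at or after `from`, or -1 if there is none. */
long find_bytes(const unsigned char *hay, size_t hay_len,
                const unsigned char *needle, size_t needle_len, size_t from)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (size_t i = from; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

Searching for the eight bytes 'T',0,'h',0,'i',0,'s',0 and then calling it again from the previous hit plus one would list every "This" in the file, which should show whether there really is one at 61242 as well as at 61440.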

So, if we can find what has changed between the two, we should see that box1 used to go from 61440 to 61565, but now extends to 61583, and that box2 used to go from 61566, but now goes from 61584. o7 shows the first change; o7s shows the change after we edit phtml to skip 18 bytes from 61564:

	if (i==61564) {
		printf("\nSkipping - see %s\n", filename);
		skipB(i,&y,18,b);
	}

The rest of the changes are still looking pretty obscure, though: satdiff2 (created by adding the same skipB() to getdiffs.c)
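The idea behind the skip can be sketched on two in-memory buffers: treat the new file as the old one with some bytes inserted at a known offset, and compare everything else position by position. skipB() itself isn't shown in this post, so diff_with_skip below is a hypothetical stand-in and not its real behaviour.

```c
#include <stddef.h>

/* Count the bytes that differ between old_buf and new_buf, assuming
   new_buf is old_buf with `ins_len` extra bytes inserted at `ins_at`.
   The inserted bytes themselves are not counted. */
size_t diff_with_skip(const unsigned char *old_buf, size_t old_len,
                      const unsigned char *new_buf, size_t new_len,
                      size_t ins_at, size_t ins_len)
{
    size_t diffs = 0;
    for (size_t i = 0; i < old_len; i++) {
        /* bytes at or after the insertion point are shifted in new_buf */
        size_t j = (i < ins_at) ? i : i + ins_len;
        if (j >= new_len)
            break;
        if (old_buf[i] != new_buf[j])
            diffs++;
    }
    return diffs;
}
```

For the Saturday edit, ins_at would be 61564 and ins_len 18. Anything still counted as different after that is a real change elsewhere in the file, not just text that moved down.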

However, we should be able to find our boxes. If they just use a 4-byte int (stored low byte first), then 61440 is 00,f0,00,00; 61566 is 7e,f0,00,00. If they use long long, it's twice as big (but just more zeroes after it).
So, can we find 00,f0, or - probably easier to spot - 7e,f0? No. If the offsets are stored relative to a base of 61440, though, box2 would be at 0x7e (126). Can we find 0x7e?
sgpgrep.c is a very simplistic search tool!
Found 0x7e at 980
Found 0x7e at 52200
Found 0x7e at 53302
Found 0x7e at 53716
Found 0x7e at 54112
Found 0x7e at 54508
Found 0x7e at 54904
Found 0x7e at 55304
Found 0x7e at 55506
Found 0x7e at 62480
Found 0x7e at 62988
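For the record, the 4-byte search tried above can be sketched like this. It assumes offsets would be stored as 32-bit integers, low byte first, which is a guess about the format and not something confirmed here. The function names are made up and this is not sgpgrep.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write `value` into out[0..3], least significant byte first. */
void u32_to_le(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value & 0xFF);
    out[1] = (unsigned char)((value >> 8) & 0xFF);
    out[2] = (unsigned char)((value >> 16) & 0xFF);
    out[3] = (unsigned char)((value >> 24) & 0xFF);
}

/* Count how often the 4-byte little-endian form of `value` occurs in buf. */
size_t count_u32_le(const unsigned char *buf, size_t len, uint32_t value)
{
    unsigned char pat[4];
    size_t hits = 0;
    u32_to_le(value, pat);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, pat, 4) == 0)
            hits++;
    }
    return hits;
}
```

u32_to_le(61566, ...) produces 7e,f0,00,00 and u32_to_le(61440, ...) produces 00,f0,00,00, matching the byte patterns worked out by hand.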
Now we'll delete the second box, and compare. Nope, exactly the same ones :-( If we do a getdiffs, though, there are lots of differences - again, probably mainly to do with the timestamp and filename. This is getting very frustrating.