You cannot reference OOXML content via page numbering at the OOXML data level alone.
- Hard page breaks are not the problem; hard page breaks can be counted.
- Soft page breaks are the problem. These are calculated according to
line break and pagination algorithms which are implementation-dependent;
they are not intrinsic to the OOXML data. There is nothing
to count.
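
To illustrate the first point, here is a minimal sketch of counting hard page breaks directly from the markup, assuming that only explicit `<w:br w:type="page"/>` elements matter. The function names and the sample XML are made up for illustration. Other constructs that force a new page, such as `w:pageBreakBefore` or section breaks, are deliberately ignored here.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace (the usual "w:" prefix).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx package."""
    with zipfile.ZipFile(docx_path) as package:
        return package.read("word/document.xml")


def count_hard_page_breaks(document_xml):
    """Count explicit <w:br w:type="page"/> elements in the document part."""
    root = ET.fromstring(document_xml)
    return sum(
        1
        for br in root.iter(f"{{{W_NS}}}br")
        if br.get(f"{{{W_NS}}}type") == "page"
    )


# Hypothetical sample: two hard page breaks and one plain line break.
SAMPLE_HARD_BREAKS = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>Page one</w:t><w:br w:type="page"/></w:r></w:p>'
    '<w:p><w:r><w:t>Line</w:t><w:br/><w:t>still page two</w:t></w:r></w:p>'
    '<w:p><w:r><w:br w:type="page"/><w:t>Page three</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

if __name__ == "__main__":
    print(count_hard_page_breaks(SAMPLE_HARD_BREAKS))
```

For a real file, `count_hard_page_breaks(read_document_xml("some.docx"))` would give the same kind of count. Soft page breaks have no such element to look for, which is the whole problem.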
What about `w:lastRenderedPageBreak`, which is a record of the position of a soft page break at the time the document was last rendered? No, `w:lastRenderedPageBreak` does not help in general either because:
- By definition, the `w:lastRenderedPageBreak` position is stale when content has
been changed since the document was last opened by a program that paginates its
content.
- In MS Word’s implementation, `w:lastRenderedPageBreak` is known to be unreliable in various circumstances, including
  - when a table spans two pages
  - when the next page starts with an empty paragraph
  - for multi-column layouts with text boxes starting a new column
  - for large images or long sequences of blank lines
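
If you still want to inspect these markers, a rough sketch along the following lines will list which paragraphs carry one. The function name and sample XML are hypothetical, and the result is only as trustworthy as the cached markers themselves: treat it as a hint from the last rendering, not as a page map.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the usual "w:" prefix).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_with_rendered_break(document_xml):
    """Return zero-based indices (document order) of w:p elements that
    contain a w:lastRenderedPageBreak marker anywhere inside them."""
    root = ET.fromstring(document_xml)
    marker = f".//{{{W_NS}}}lastRenderedPageBreak"
    return [
        index
        for index, paragraph in enumerate(root.iter(f"{{{W_NS}}}p"))
        if paragraph.find(marker) is not None
    ]


# Hypothetical sample: only the second paragraph carries a cached marker.
SAMPLE_RENDERED_BREAKS = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>First paragraph</w:t></w:r></w:p>'
    '<w:p><w:r><w:lastRenderedPageBreak/><w:t>Second paragraph</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Third paragraph</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

if __name__ == "__main__":
    print(paragraphs_with_rendered_break(SAMPLE_RENDERED_BREAKS))
```

An empty result does not mean the document fits on one page; it may simply mean the file was written by a tool that never paginated it.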
If you’re willing to accept a dependence on Word Automation, with all of its inherent licensing and server operation limitations, then you have a chance of determining page boundaries, page numberings, page counts, etc.
Otherwise, the only real answer is to move beyond page-based referencing frameworks that are dependent upon proprietary, implementation-specific pagination algorithms.