Information On How Word Watermarks Are Tagged By Outside In (Doc ID 1114864.1)

Last updated on MAY 02, 2017

Applies to:

Oracle Outside In Technology - Version 8.3.2 and later
Information in this document applies to any platform.

Goal

We want to identify and extract watermark data from Microsoft Word documents. We can see how watermarks are reflected in the generated SearchML output, but the SearchML schema does not fully document this. Could you explain how watermarks are represented and the right way to extract them from the SearchML XML?

As a specific example from SearchML output, when we enable the following SearchML flags:

SCCEX_XML_PRODUCEOBJECTINFO
SCCEX_XML_EMBEDDINGS
SCCEX_ANNOTATIONS

...we can see output like the following:

<doc_content type="sub_doc" id="x30000005">
<p/>
...
<doc_content type="frame">
...
<document type="JPEG File Interchange">
<object type="00000000" param1="0003001E" param2="00030007" param3="FFFFFFFF" param4="00000000"/>



Specifically, we want to know:
1a. Is there a watermark in the document?
1b. If so, what type is it (image or text)?
2. If it is a text watermark, what is the text?
3. If it is an image watermark, what is the image name or some other identifier?
4. For any watermark, are there comments or metadata attached to it? Running export.exe on documents with watermarks shows the sub_doc content with some flags, but we need to know how to parse this information in our XML parser, per points 1-3 above.
5. Can the id value in the <doc_content> tag be used to uniquely identify a watermark?
6. What do the paramN values (param1 through param4) in the <object> tag mean?
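
For context on the parsing side of these questions, here is a minimal Python sketch of how the sub_doc entries and their <object> attributes could be collected from SearchML output for inspection. It does not decide whether a sub_doc is a watermark, since that is exactly what is undocumented. The SAMPLE string is a hypothetical, well-formed stand-in for the truncated fragment above: the searchml root element, the closing tags, and the nesting of <object> inside <document> are assumptions, as is the absence of an XML namespace. The function name list_sub_docs is also made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the truncated output shown earlier. The root
# element name, the closing tags, and the nesting are assumptions.
SAMPLE = """<searchml>
<doc_content type="sub_doc" id="x30000005">
<p/>
<doc_content type="frame">
<document type="JPEG File Interchange">
<object type="00000000" param1="0003001E" param2="00030007" param3="FFFFFFFF" param4="00000000"/>
</document>
</doc_content>
</doc_content>
</searchml>"""


def list_sub_docs(xml_text):
    """Return one entry per <doc_content type="sub_doc"> element.

    Each entry records the sub_doc id and, for every nested <document>,
    its type plus the raw paramN attributes of its <object> child.
    The values are kept as strings; no meaning is assigned to them.
    """
    root = ET.fromstring(xml_text)
    results = []
    for sub in root.iter("doc_content"):
        if sub.get("type") != "sub_doc":
            continue
        entry = {"id": sub.get("id"), "embedded": []}
        for doc in sub.iter("document"):
            obj = doc.find("object")
            params = {}
            if obj is not None:
                # Keep only param1, param2, ... and leave them uninterpreted.
                params = {k: v for k, v in obj.attrib.items() if k.startswith("param")}
            entry["embedded"].append({"type": doc.get("type"), "params": params})
        results.append(entry)
    return results
```

Calling list_sub_docs(SAMPLE) gives a list of candidate sub-documents that can then be checked against whatever watermark rules the vendor confirms. If the real output declares a namespace, the tag names passed to iter() and find() would need the namespace prefix.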

Solution
