Not all Expected Content is Crawled due to SES SAXParseException (Doc ID 2054231.1)

Last updated on DECEMBER 12, 2016

Applies to:

Oracle WebCenter Content - Version 11.1.1.8.0 and later
Information in this document applies to any platform.

Symptoms

Oracle Secure Enterprise Search (SES) crawls do not crawl and index all of the expected content in the WebCenter Content (WCC) datafeed.

For example, the WCC datafeed snapshot added 498 items, but the crawl reports only 273 as discovered/processed.

The SES crawl log shows the following error:

[2015-08-13T13:59:11.791-04:00] [NOTIFICATION] [] [tid: crawler_2] [ecid: 0000KwcCYKk8Tsb6TJf9EO1LnD_z00000A,0] submitting doc...idcplg?IdcService=GET_FILE&dDocName=UCM_CLUSTER1000256&allowInterrupt=1&Rendition=primaryFile&RevisionSelectionMethod=latestReleased
[2015-08-13T13:59:11.791-04:00] [TRACE:16] [EQG-30309] [tid: crawler_2] [ecid: 0000KwcCYKk8Tsb6TJf9EO1LnD_z00000A,0] [SRC_CLASS: oracle.search.crawler.WebCrawler] [SRC_METHOD: processingPage]  Processing idcplg?IdcService=GET_FILE&dDocName=UCM_CLUSTER1000256&allowInterrupt=1&Rendition=primaryFile&RevisionSelectionMethod=latestReleased
[2015-08-13T13:59:11.792-04:00] [TRACE:16] [] [tid: crawler_2] [ecid: 0000KwcCYKk8Tsb6TJf9EO1LnD_z00000A,0] [SRC_CLASS: oracle.search.crawler.URLAccess] [SRC_METHOD: processUrlEntry] doc owner (guid) =null
[2015-08-13T13:59:11.841-04:00] [TRACE:16] [EQG-40500] [tid: crawler_2] [ecid: 0000KwcCYKk8Tsb6TJf9EO1LnD_z00000A,0] [SRC_CLASS: oracle.search.crawler.URLAccess] [SRC_METHOD: processDocBody]  Filtering document "idcplg?IdcService=GET_FILE&dDocName=UCM_CLUSTER1000256&allowInterrupt=1&Rendition=primaryFile&RevisionSelectionMethod=latestReleased"(URL ID = 198202)
[2015-08-13T13:59:12.149-04:00] [ERROR] [] [tid: Thread-19] [ecid: 0000KwcCXUV8Tsb6TJf9EO1LnD_z000007,0] EQP-60303: Exiting saxthread due to errors
[2015-08-13T13:59:12.150-04:00] [ERROR] [] [tid: Thread-19] [ecid: 0000KwcCXUV8Tsb6TJf9EO1LnD_z000007,0] [[
org.xml.sax.SAXParseException: <Line 17292, Column 17>: XML-20201: (Fatal Error) Expected name instead of <.
at oracle.xml.parser.v2.XMLError.flushErrorHandler(XMLError.java:422)
at oracle.xml.parser.v2.XMLError.flushErrors1(XMLError.java:287)
at oracle.xml.parser.v2.XMLReader.scanNameChars(XMLReader.java:1240)
at oracle.xml.parser.v2.XMLReader.scanQName(XMLReader.java:2069)
at oracle.xml.parser.v2.NonValidatingParser.parseAttr(NonValidatingParser.java:1733)
at oracle.xml.parser.v2.NonValidatingParser.parseAttributes(NonValidatingParser.java:1682)
at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1523)
at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:409)
at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:355)
at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:226)
at oracle.xml.jaxp.JXSAXParser.parse(JXSAXParser.java:292)
at oracle.search.plugin.rss.SAXThread.run(SAXThread.java:183)
at java.lang.Thread.run(Thread.java:682)
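The exception comes from the SES RSS plugin's SAX thread (oracle.search.plugin.rss.SAXThread) while it parses the datafeed XML. Because the log shows that thread exiting at the error, entries after the reported position (line 17292, column 17) would plausibly never be submitted, although this excerpt does not confirm that. The following is a minimal sketch, not an Oracle-supplied tool. It re-parses a saved copy of the datafeed with the JDK's built-in SAX parser, counts item elements seen before any fatal error, and prints where parsing stopped. It assumes RSS-style `<item>` entries (the element name can be changed on the command line). The class name `FeedCheck`, its arguments, and the file path are hypothetical. The JDK parser words its messages differently from Oracle's XML-20201, and its reported column may differ slightly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: re-parse a saved datafeed copy and locate the first fatal XML error. */
public class FeedCheck {

    /** Items seen before parsing stopped, plus the error location (-1 when unknown or no error). */
    public static final class Result {
        public final int itemsSeen;
        public final int errorLine;
        public final int errorColumn;
        public final String errorMessage; // null when the feed parsed cleanly

        Result(int itemsSeen, int errorLine, int errorColumn, String errorMessage) {
            this.itemsSeen = itemsSeen;
            this.errorLine = errorLine;
            this.errorColumn = errorColumn;
            this.errorMessage = errorMessage;
        }
    }

    /** Streams the XML, counting start tags whose raw name equals itemElement (assumed "item"). */
    public static Result scan(InputStream in, String itemElement) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // The default factory is not namespace-aware, so qName is the tag as written.
                if (qName.equals(itemElement)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(in, handler);
            return new Result(count[0], -1, -1, null);
        } catch (SAXParseException e) {
            // Well-formedness errors (like the XML-20201 in the SES log) arrive here with a position.
            return new Result(count[0], e.getLineNumber(), e.getColumnNumber(),
                    String.valueOf(e.getMessage()));
        } catch (SAXException e) {
            return new Result(count[0], -1, -1, e.toString());
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java FeedCheck <datafeed.xml> [itemElement]");
            System.exit(2);
        }
        String itemElement = args.length > 1 ? args[1] : "item";
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Result r = scan(in, itemElement);
            System.out.println("items seen before parsing stopped: " + r.itemsSeen);
            if (r.errorMessage == null) {
                System.out.println("feed parsed without fatal errors");
            } else {
                System.out.println("fatal error at line " + r.errorLine + ", column "
                        + r.errorColumn + ": " + r.errorMessage);
            }
        }
    }
}
```

Run it as `java FeedCheck /path/to/datafeed.xml` against a copy of the feed. "Expected name instead of <" raised from parseAttr means the parser was reading a tag's attributes and met a `<` where an attribute name should be, which typically points to an unterminated or malformed tag at or just before the reported line.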

Cause
