Oracle SES 11g Crawler does not Detect Correct Title and is not able to Fetch Additional Attribute
(Doc ID 1516967.1)
Last updated on APRIL 24, 2020
Applies to: Oracle Secure Enterprise Search - Version 11.1.2 and later
Information in this document applies to any platform.
When crawling and indexing a web site with Oracle SES 11g, the SES crawler does not appear to find the title or meta tags. It also does not appear to find any links, so it indexes only the starting page.
A dump of the start of the file shows an unexpected three-byte sequence at the very beginning.
These bytes appear to be a UTF-8 byte-order mark (BOM), which is the UTF-8 encoding of U+FEFF ZERO WIDTH NO-BREAK SPACE. This no-break space at the beginning of the document seems to prevent SES from identifying the file as HTML.
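The byte check can be reproduced outside SES with a short script. The following is a minimal Python sketch, not an Oracle tool: the helper names `dump_head`, `has_utf8_bom` and `strip_utf8_bom` are hypothetical, the sample page is made up, and the "starts with `<`" test is an illustrative assumption about how a content sniffer might behave, not SES's actual detection logic. It shows that the BOM decodes to U+FEFF, which Python's `str.lstrip()` does not treat as whitespace.

```python
import codecs

# The UTF-8 encoding of U+FEFF (ZERO WIDTH NO-BREAK SPACE), i.e. the byte-order mark.
UTF8_BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def dump_head(data: bytes, n: int = 16) -> str:
    """Return the first n bytes as space-separated hex, like a quick hex dump."""
    return " ".join(f"{b:02x}" for b in data[:n])


def has_utf8_bom(data: bytes) -> bool:
    """True if the raw bytes begin with the three-byte UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM and leave every other byte untouched."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


if __name__ == "__main__":
    # Hypothetical page content, used only for illustration.
    page = UTF8_BOM + b"<!DOCTYPE html>\n<html><head><title>Home</title></head></html>"
    print(dump_head(page))  # ef bb bf 3c 21 44 4f 43 54 59 50 45 20 68 74 6d
    print(has_utf8_bom(page))  # True

    # Hypothetical "does the text start with '<'?" sniff, NOT SES's real logic.
    # U+FEFF is not whitespace to str.lstrip(), so the check fails with the BOM...
    print(page.decode("utf-8").lstrip().startswith("<"))  # False
    # ...and succeeds once the BOM is removed.
    print(strip_utf8_bom(page).decode("utf-8").startswith("<"))  # True
```

If you control the crawled pages, saving them as UTF-8 without a BOM (or stripping it as above) is one thing to try; this sketch does not confirm that as the documented resolution.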
It seems that all documents receive a fallback title (taken from the DOCTYPE declaration of the HTML page) and unusual-looking snippets (also taken from the start of the document).
It is also not possible to fetch additional metadata elements from the HTML head section (attributes added under Global Settings - Search Attributes and then manually added to the crawler definition).
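To confirm independently which title and meta tags a page actually carries in its head section, a BOM-tolerant parse can be run outside SES. This is a minimal sketch using Python's standard `html.parser`, not the SES crawler's parser; the class `HeadInspector`, the function `inspect_head`, and the sample `Author` meta tag are hypothetical. Decoding with the `utf-8-sig` codec removes a leading BOM before parsing.

```python
from html.parser import HTMLParser


class HeadInspector(HTMLParser):
    """Collect the <title> text and <meta name="..." content="..."> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name") and a.get("content") is not None:
            self.meta[a["name"].lower()] = a["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inspect_head(raw: bytes) -> HeadInspector:
    """Parse raw page bytes; "utf-8-sig" drops a leading UTF-8 BOM if present."""
    parser = HeadInspector()
    parser.feed(raw.decode("utf-8-sig"))
    parser.close()
    return parser


if __name__ == "__main__":
    # Hypothetical page that starts with a BOM, used only for illustration.
    page = (b"\xef\xbb\xbf<!DOCTYPE html><html><head><title>Home</title>"
            b'<meta name="Author" content="Web Team"></head><body></body></html>')
    info = inspect_head(page)
    print(info.title)  # Home
    print(info.meta)   # {'author': 'Web Team'}
```

If the expected meta names show up here but not in SES, that points back at how the crawler handles the file rather than at the page's markup.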
However, the Oracle® Secure Enterprise Search Administrator's Guide, 11g Release 2 (11.2.1), Part Number E17332-04, clearly states: "You can define additional HTML metatags to map to a String attribute on the Home - Sources - Metatag Mapping page."