Oracle SES 11g Crawler does not Detect Correct Title and is not able to Fetch Additional Attribute (Doc ID 1516967.1)

Last updated on FEBRUARY 02, 2024

Applies to:

Oracle Secure Enterprise Search - Version 11.1.2 and later
Information in this document applies to any platform.

Symptoms

A web site is crawled and indexed using Oracle SES 11g. The SES crawler does not find the title or meta tags, and it does not find any links either, so it indexes only the starting page.

Dumping the start of the file shows an unexpected three-byte sequence at the very beginning.

It seems that the no-break space at the beginning of the doc is preventing SES from identifying the file as an HTML file.
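
The check described above can be sketched in Python. The note does not reproduce the dump, so this sketch assumes the three bytes are the UTF-8 byte order mark (EF BB BF), which decodes to U+FEFF, the zero-width no-break space. The function names and the sample page are hypothetical and only illustrate the idea.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    # codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf".
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 byte order mark, if present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical page content standing in for the crawled document.
sample = codecs.BOM_UTF8 + b"<!DOCTYPE html><html><head><title>Example</title></head></html>"

print(sample[:3].hex(" "))           # ef bb bf
print(has_utf8_bom(sample))          # True
print(strip_utf8_bom(sample)[:15])   # b'<!DOCTYPE html>'
```

If the first three bytes of the affected pages differ from `ef bb bf`, the assumption does not hold and the sequence should be identified before anything is stripped.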

All documents appear to receive a fallback title (the DOCTYPE declaration of the HTML page) and unusual-looking snippets (also taken from the start of the document).

It is also not possible to fetch additional metadata elements from the HTML head section (attributes added under Global Settings - Search Attributes and then manually added to the crawler definition).

Oracle® Secure Enterprise Search Administrator's Guide
11g Release 2 (11.2.1)
Part Number E17332-04

http://docs.oracle.com/cd/E25054_01/doc.1111/e17332/bisources001.htm#CIAEBJEH

clearly states: "You can define additional HTML metatags to map to a String attribute on the Home - Sources - Metatag Mapping page"
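
Before changing crawler settings, it can help to confirm that the page itself exposes the title and metatags the crawler is expected to pick up. The following is a minimal sketch using Python's standard `html.parser` module. It is independent of SES and does not reproduce how the SES crawler parses HTML; the class name, the function name, and the `department` metatag are hypothetical.

```python
from html.parser import HTMLParser


class HeadScanner(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr_map = dict(attrs)
            name = attr_map.get("name")
            content = attr_map.get("content")
            if name and content is not None:
                # Metatag names are compared case-insensitively here.
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scan_head(html_text):
    """Return (title, metatag dict) for the given HTML text."""
    scanner = HeadScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.title.strip(), scanner.meta


# Hypothetical page with one custom metatag.
page = (
    '<!DOCTYPE html><html><head><title>Product page</title>'
    '<meta name="department" content="sales"></head><body></body></html>'
)
title, meta = scan_head(page)
print(title)  # Product page
print(meta)   # {'department': 'sales'}
```

If a general-purpose parser finds the title and metatags but the indexed documents still show the fallback title, the page markup is probably not the problem, and attention shifts to how the document is being identified by the crawler.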

Changes

Cause
