OSB 10.3.0.3: File system backup fails with 'Error: NDMP operation failed: data service reported connect error' (Doc ID 1544050.1)

Last updated on APRIL 23, 2013

Applies to:

Oracle Secure Backup - Version 10.3.0.3 and later
Linux x86-64

Symptoms

Backups of different file systems, from different clients, fail with the following errors in the transcript:

Dumping all files in /usr
Error: NDMP operation failed: data service reported connect error
Error: bytes moved reported by data service and mover differ
Error: data Service reported 0x9ADD0000; Mover reported 0x9ABC0000

If the dataset contains multiple file systems, some of them may be backed up successfully before one file system backup fails. The backups of the remaining file systems then do not start:

Dumping all files in /
....
admin/1054.1:     11:21:24 MNPO: data service halted with reason=connection error
admin/1054.1:     11:21:24 QTOS: received osb_stats message for job admin/1054.1, kbytes 172864, nfiles 2131
admin/1054.1:     11:21:24 SNPD: Data Service reported bytes processed 0x9A70000
admin/1054.1:     11:21:24 SNPD: stopping NDMP data service (to transition to idle state)
admin/1054.1: 11:21:38 await_ndmp_event: sending progress update
admin/1054.1:     11:21:38 SPU: sending progress update
admin/1054.1: 11:21:53 await_ndmp_event timed out waiting for a service to halt, flag is 1
admin/1054.1:     11:21:53 MNPO: data service halted, mover didn't (its state=active), aborting it
admin/1054.1:     11:21:53 ANPO: aborting NDMP mover
admin/1054.1:     11:21:53 MGS:  ms.record_size 65536, ms.record_num 0x997, ms.bytes_moved 0x9970000
admin/1054.1:     11:21:53 ANPO: NDMP states after abort:
admin/1054.1:           11:21:53 mover state halted reason abort request received
admin/1054.1:           11:21:53 data service state halted reason connection error
admin/1054.1: Error: NDMP operation failed: data service reported connect error
admin/1054.1:     11:21:53 BNPC: finished OSB NDMP backup with status 21
admin/1054.1: Error: bytes moved reported by data service and mover differ
admin/1054.1: Error: data Service reported 0x9A70000; Mover reported 0x9970000
admin/1054.1:     11:21:53 QREX: exit status upon entry is 95
admin/1054.1:     11:21:53 RBTR: trouble reporting time used: handle not open (OB library mgr)
admin/1054.1:     11:21:55 QREX: released reservation on tape drive Drive1
admin/1054.1:     11:21:55 QREX: released writable volume reservation on 8f1d5764-486f-1030-9642-5cf3fc09cea8
admin/1054.1:     11:21:55 QREX: released writable volume reservation on 6d31cf5c-4d1c-1030-a42f-5cf3fc09cea8
admin/1054.1:     11:21:55 QREX: [11085] connecting to osb to import and/or delete ascii index file for client osb-client
admin/1054.1:
admin/1054.1: Backup statistics:
admin/1054.1: status 95
....
admin/1054.1: path /boot completed, status 0
admin/1054.1: path / incomplete
admin/1054.1: path /home not started
admin/1054.1: path /var not started
admin/1054.1: path /usr not started
admin/1054.1:     11:21:55 RLYX: exit status 95; checking allocs...
admin/1054.1:     11:21:55 RLYX: from mm__check_all: 1

NOTE: The above transcript was created with verbose logging enabled (obtool setp operations/backupoptions -JJvvv).
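
The size of the mismatch in the "bytes moved" error can be worked out directly from the transcript. The following is a minimal Python sketch, not part of OSB; the function name byte_gap is hypothetical, and the record size is taken from the ms.record_size value on the MGS line above. It only does the arithmetic and makes no claim about the underlying cause.

```python
import re

# Record size as reported on the MGS line of the transcript (ms.record_size 65536).
RECORD_SIZE = 65536

# Matches the error line exactly as it appears in the transcript.
MISMATCH_RE = re.compile(
    r"data Service reported (0x[0-9A-Fa-f]+); Mover reported (0x[0-9A-Fa-f]+)"
)


def byte_gap(transcript, record_size=RECORD_SIZE):
    """Return (data_bytes, mover_bytes, gap_bytes, gap_records), or None if
    the transcript contains no mismatch line."""
    match = MISMATCH_RE.search(transcript)
    if match is None:
        return None
    data_bytes = int(match.group(1), 16)
    mover_bytes = int(match.group(2), 16)
    gap = data_bytes - mover_bytes
    return data_bytes, mover_bytes, gap, gap // record_size


line = "admin/1054.1: Error: data Service reported 0x9A70000; Mover reported 0x9970000"
print(byte_gap(line))
```

For the second transcript, the data service reported 0x9A70000 bytes and the mover 0x9970000 bytes, a difference of 0x100000 bytes (1 MiB), which is 16 records of 65536 bytes. The mover figure also agrees with the MGS line: 0x997 records of 65536 bytes each is 0x9970000 bytes.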

The 'obndmpdds.log' on the client shows the following 'Connection timed out' messages:

2013/01/30.11:21:18 [468] dataWrite:    service data write failed - Connection timed out
2013/01/30.11:21:18 [468] data.buf = 0xDBFC7C0, len = 65536
2013/01/30.11:21:18 [468] svc_dataerror:    sent notify_data_halted to DMA
2013/01/30.11:21:18 [468] svc_datacleanup:    SSL cleanup was successful
2013/01/30.11:21:18 [468] svc_datawrite:    ndmpdsvc_run error or op aborted(state 2)
2013/01/30.11:21:18 [468]     QREX: exit status 93; checking allocs...
2013/01/30.11:21:18 [468]     QREX: from mm__check_all: 1
2013/01/30.11:21:18 [468] svc_datastartbackup:    backup for "/" failed, status 93
2013/01/30.11:21:18 [468] svc_datastop:  bytes processed 0x9A70000
2013/01/30.11:21:52 [468] svc_dataerror:    sent notify_data_halted to DMA
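
When several clients are affected, it can help to pull the timeout entries out of each client log so that their times can be compared with the job transcript. The following is a minimal Python sketch, not an OSB tool; the function name find_timeouts is hypothetical, and the line layout (timestamp, process ID in brackets, source name, message) is assumed from the excerpt above.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "YYYY/MM/DD.HH:MM:SS [pid] source:   message"
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2}\.\d{2}:\d{2}:\d{2}) \[(\d+)\]\s+(\w+):\s+(.*)$"
)


def find_timeouts(lines):
    """Return (timestamp, pid, source, message) for each log line whose
    message contains 'Connection timed out'."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # lines that do not follow the assumed layout are skipped
        stamp, pid, source, message = match.groups()
        if "Connection timed out" in message:
            when = datetime.strptime(stamp, "%Y/%m/%d.%H:%M:%S")
            hits.append((when, int(pid), source, message))
    return hits


sample = [
    "2013/01/30.11:21:18 [468] dataWrite:    service data write failed - Connection timed out",
    "2013/01/30.11:21:18 [468] data.buf = 0xDBFC7C0, len = 65536",
    "2013/01/30.11:21:18 [468] svc_dataerror:    sent notify_data_halted to DMA",
]

for when, pid, source, message in find_timeouts(sample):
    print(when.isoformat(), pid, source, message)
```

To scan a real log, pass an open file object instead of the sample list; the location of the log file depends on the installation and is not assumed here.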

Stopping and restarting the OSB daemons on the admin server and the affected clients did not change the behavior.

Changes

No changes were made to either the admin server or any of the clients.

Cause
