
DataHub:OGG Application Adapters

Last Updated: Aug 02, 2021

1. Environment requirements

  • Oracle Database 11g is installed in the source database. The current version of the DataHub plug-in for Oracle GoldenGate (OGG) supports only Oracle Database 11g.
  • OGG V12.2.X is installed in the source database. We recommend that you install OGG V12.2.0.2.
  • OGG Application Adapters V12.2.X is installed in the destination database. We recommend that you install OGG Application Adapters V12.2.0.1.
  • Java Development Kit (JDK) V1.8 is installed.

2. Installation

The following sections describe how to configure Oracle and how to install and configure OGG. The parameter values used in this topic are for reference only. In production, use the configurations provided by your O&M engineers.

You can download relevant installation packages at the end of this topic.

  • OGG
  • OGG Application Adapters
  • DataHub agent for OGG

Configure OGG in the source database

1. Configure Oracle in the source database

Note: If any of the following configurations is missing, the before-images of updated rows may be incomplete when changes are captured. Run the following command to start SQL*Plus as the database administrator: sqlplus / as sysdba. Then, complete the following configurations:

-- Create a tablespace.
create tablespace ATMV datafile '/home/oracle/u01/app/oracle/oradata/uprr/ATMV.dbf' size 100m autoextend on next 50m maxsize unlimited;
-- Create a user named ogg_test. The password is also set to ogg_test.
create user ogg_test identified by ogg_test default tablespace ATMV;
-- Grant required permissions to the user ogg_test.
grant connect,resource,dba to ogg_test;
-- Check whether supplemental logging is enabled for the database.
select SUPPLEMENTAL_LOG_DATA_MIN, SUPPLEMENTAL_LOG_DATA_PK, SUPPLEMENTAL_LOG_DATA_UI, SUPPLEMENTAL_LOG_DATA_FK, SUPPLEMENTAL_LOG_DATA_ALL from v$database;
-- If the result is NO, enable supplemental logging.
alter database add supplemental log data;
alter database add supplemental log data (primary key, unique, foreign key) columns;
-- To roll back, drop the supplemental logging settings.
alter database drop supplemental log data (primary key, unique, foreign key) columns;
alter database drop supplemental log data;
-- Enable logging for all fields at the database level. Even with this mode enabled, only primary key fields are logged for a delete operation. If you require the other fields, set the NOCOMPRESSDELETES parameter for the Extract process in the source database.
alter database add supplemental log data (all) columns;
-- Enable force logging.
alter database force logging;
-- Enable support for sequences.
@sequence.sql
-- Add supplemental logging for the sys.seq$ table.
alter table sys.seq$ add supplemental log data (primary key) columns;
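The check-then-enable sequence above can be sketched as a small helper. This is a hypothetical illustration, not part of the plug-in; the flag names match the v$database columns queried above:

```python
# Hypothetical helper: given the supplemental-logging flags returned by the
# v$database query above, list the ALTER statements that still need to run.
def pending_supplemental_log_ddl(flags):
    """flags: dict mapping v$database column names to 'YES' or 'NO'."""
    statements = []
    if flags.get("SUPPLEMENTAL_LOG_DATA_MIN") == "NO":
        statements.append("alter database add supplemental log data;")
    # PK/UI/FK column-level logging is enabled by a single combined statement.
    if any(flags.get(col) == "NO" for col in (
            "SUPPLEMENTAL_LOG_DATA_PK",
            "SUPPLEMENTAL_LOG_DATA_UI",
            "SUPPLEMENTAL_LOG_DATA_FK")):
        statements.append(
            "alter database add supplemental log data "
            "(primary key, unique, foreign key) columns;")
    if flags.get("SUPPLEMENTAL_LOG_DATA_ALL") == "NO":
        statements.append(
            "alter database add supplemental log data (all) columns;")
    return statements

# A database with nothing enabled needs all three statements.
todo = pending_supplemental_log_ddl({
    "SUPPLEMENTAL_LOG_DATA_MIN": "NO",
    "SUPPLEMENTAL_LOG_DATA_PK": "NO",
    "SUPPLEMENTAL_LOG_DATA_UI": "NO",
    "SUPPLEMENTAL_LOG_DATA_FK": "NO",
    "SUPPLEMENTAL_LOG_DATA_ALL": "NO",
})
print(len(todo))  # 3
```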

2. Install OGG in the source database

Download and decompress the OGG installation package. The following directories are generated:

drwxr-xr-x install
drwxrwxr-x response
-rwxr-xr-x runInstaller
drwxr-xr-x stage

Edit the response file response/oggcore.rsp. The following template describes the OGG response file:

oracle.install.responseFileVersion=/oracle/install/rspfmt_ogginstall_response_schema_v12_1_2
# The installation option, which indicates the installed Oracle version.
INSTALL_OPTION=ORA11g
# The directory in which OGG is to be installed.
SOFTWARE_LOCATION=/home/oracle/u01/ggate
# Specifies whether to start the Manager process after installation.
START_MANAGER=false
# The port number of the Manager process.
MANAGER_PORT=7839
# The directory of the Oracle database.
DATABASE_LOCATION=/home/oracle/u01/app/oracle/product/11.2.0/dbhome_1
# Optional. The directory that stores the inventory files.
INVENTORY_LOCATION=
# The UNIX group of the inventory directory. In this example, OGG is installed by using the ogg_test Oracle account. You can also create a dedicated account for OGG as needed.
UNIX_GROUP_NAME=oinstall

Run the following command to install OGG:

runInstaller -silent -responseFile {YOUR_OGG_INSTALL_FILE_PATH}/response/oggcore.rsp

In this example, OGG is installed in the /home/oracle/u01/ggate directory, and the installation logs are stored in the /home/oracle/u01/ggate/cfgtoollogs/oui directory. The OGG installation is complete when the following message appears in the silentInstall{time}.log file:

The installation of Oracle GoldenGate Core was successful.

Run the /home/oracle/u01/ggate/ggsci command and enter CREATE SUBDIRS at the command prompt to create the OGG directories whose names start with dir. OGG is now installed in the source database.

3. Configure the Manager process in the source database

The following part describes how to configure the Manager process in the source database by using Oracle GoldenGate Software Command Interface (GGSCI). You can run the edit params mgr command to configure the Manager process. The following code shows the configurations:

PORT 7839
DYNAMICPORTLIST  7840-7849
USERID ogg_test, PASSWORD ogg_test
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 7
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
PURGEDDLHISTORY MINKEEPDAYS 3, MAXKEEPDAYS 7
PURGEMARKERHISTORY MINKEEPDAYS 3, MAXKEEPDAYS 7

After the Manager process is configured, you can run the view params mgr command to verify the configurations.

Run the start mgr command to start the Manager process.

Run the info mgr command to view the status of the Manager process.

4. Configure the Extract process in the source database

To configure the Extract process, create a configuration file. You can specify a custom file name. The file is named dhext in this example. In GGSCI, run the edit params dhext command to edit the Extract configuration file.

EXTRACT dhext
SETENV (NLS_LANG="AMERICAN_AMERICA.AL32UTF8")
DBOPTIONS   ALLOWUNUSEDCOLUMN
USERID ogg_test, PASSWORD ogg_test
REPORTCOUNT EVERY 1 MINUTES, RATE
NUMFILES 5000
DISCARDFILE ./dirrpt/ext_test.dsc, APPEND, MEGABYTES 100
DISCARDROLLOVER AT 2:00
WARNLONGTRANS 2h, CHECKINTERVAL 3m
EXTTRAIL ./dirdat/st, MEGABYTES 200
DDL &
INCLUDE MAPPED OBJTYPE 'table' &
INCLUDE MAPPED OBJTYPE 'index' &
INCLUDE MAPPED OBJTYPE 'SEQUENCE' &
EXCLUDE OPTYPE COMMENT
DDLOPTIONS  NOCROSSRENAME  REPORT
TABLE  OGG_TEST.*,tokens (TKN-ROWID=@GETENV('RECORD','rowid'));
SEQUENCE  OGG_TEST.*;
GETUPDATEBEFORES

Note: The TABLE OGG_TEST.*,tokens (TKN-ROWID=@GETENV('RECORD','rowid')); parameter captures the row ID of each changed record in the source table. If you do not need the row IDs, use TABLE OGG_TEST.*; instead.

Add and start the Extract process.

# Add the Extract process.
add extract dhext,tranlog, begin now

# The maximum file size in each trail is 200 MB.
add exttrail ./dirdat/st,extract dhext, megabytes 200

# Start the Extract process.
start dhext

After the Extract process is started, you can view the changes to the database in the files stored in the ggate/dirdat directory.

5. Create a DEFGEN parameter file

If OGG V12.2.X or later is used, you can skip this step.

Start GGSCI in the source database. In GGSCI, run the edit params defgen command to create a DEFGEN parameter file. The generated definitions file must later be copied to the dirdef directory in the destination database.

DEFSFILE ./dirdef/ogg_test.def
USERID ogg_test, PASSWORD ogg_test
table OGG_TEST.*;

Run the following command in a shell to create the DEFGEN parameter file named ogg_test.def: ./defgen paramfile ./dirprm/defgen.prm.

6. Configure a data pump in the source database

Start GGSCI in the source database. In GGSCI, run the following command to configure a data pump: edit params pump.

EXTRACT pump
RMTHOST xx.xx.xx.xx, MGRPORT 7839, COMPRESS
PASSTHRU
NUMFILES 5000
RMTTRAIL ./dirdat/st
DYNAMICRESOLUTION
TABLE      OGG_TEST.*;
SEQUENCE  OGG_TEST.*;

Add and start a data pump.

# Add a data pump.
add extract pump,exttrailsource ./dirdat/st

# Add a trail in the destination database. The maximum file size in each trail is 200 MB.
add rmttrail ./dirdat/st,extract pump,megabytes 200

# Before you start the data pump, make sure that the Manager process in the destination database is started.
start pump

After the data pump is started, you can view the delivered trail in the dirdat directory in the destination database.

Configure OGG in the destination database

1. Install OGG in the destination database

OGG in the destination database is OGG Application Adapters and does not need to be installed by using an installer. You only need to decompress the installation package. After decompression, start GGSCI and enter the create subdirs command to create the required directories; multiple directories whose names start with dir are created in the OGG directory.

If you use a version earlier than OGG V12.2.X, copy the DEFGEN parameter file generated in the source database to the dirdef directory in the destination database.

2. Install and configure the DataHub plug-in for OGG

Environment requirement: JDK 1.7 or later (JDK 1.8 is recommended, consistent with the environment requirements above). Configure the JAVA_HOME and LD_LIBRARY_PATH environment variables and add them to the ~/.bash_profile file. Example:

export JAVA_HOME=/xxx/xxx
export JRE_HOME=/xxx/xxx/jrexx
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$JRE_HOME/lib/amd64:$JRE_HOME/lib/amd64/server
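A quick sanity check for these variables can be scripted. This is a minimal sketch for illustration; the placeholder paths match the example above, and the check itself is an assumption, not something the plug-in provides:

```python
# Hypothetical check: verify that the variables the plug-in relies on are set
# and that LD_LIBRARY_PATH includes the JRE server library directory.
def check_plugin_env(env):
    problems = []
    for var in ("JAVA_HOME", "JRE_HOME", "LD_LIBRARY_PATH"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    jre = env.get("JRE_HOME", "")
    if jre and jre + "/lib/amd64/server" not in env.get("LD_LIBRARY_PATH", ""):
        problems.append("LD_LIBRARY_PATH does not include $JRE_HOME/lib/amd64/server")
    return problems

# Example with the placeholder paths from the snippet above:
env = {
    "JAVA_HOME": "/xxx/xxx",
    "JRE_HOME": "/xxx/xxx/jrexx",
    "LD_LIBRARY_PATH": "/usr/lib:/xxx/xxx/jrexx/lib/amd64:/xxx/xxx/jrexx/lib/amd64/server",
}
print(check_plugin_env(env))  # []
```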

After the environment variables are configured, download and decompress the datahub-ogg-plugin.tar.gz package. Modify the javaue.properties file in the conf subdirectory of the installation directory and replace {YOUR_HOME} with the path after decompression.

gg.handlerlist=ggdatahub
gg.handler.ggdatahub.type=com.aliyun.odps.ogg.handler.datahub.DatahubHandler
gg.handler.ggdatahub.configureFileName={YOUR_HOME}/aliyun-datahub-ogg-plugin/conf/configure.xml
goldengate.userexit.nochkpt=false
goldengate.userexit.timestamp=utc+8
gg.classpath={YOUR_HOME}/aliyun-datahub-ogg-plugin/lib/*
gg.log.level=debug
jvm.bootoptions=-Xmx512m -Dlog4j.configuration=file:{YOUR_HOME}/datahub-ogg-plugin/conf/log4j.properties -Djava.class.path=ggjava/ggjava.jar
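Replacing {YOUR_HOME} in the conf files can be scripted. A minimal sketch, assuming a simple literal substitution is all that is needed (the helper name and approach are illustrative, not part of the plug-in):

```python
from pathlib import Path

# Hypothetical helper: rewrite the {YOUR_HOME} placeholder in a conf file in
# place, as required for javaue.properties and log4j.properties after
# decompression.
def fill_home(conf_path, home):
    path = Path(conf_path)
    path.write_text(path.read_text().replace("{YOUR_HOME}", home))

# Usage sketch:
# fill_home("conf/javaue.properties", "/home/oracle/u01")
```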

Modify the log4j.properties file in the conf subdirectory of the installation directory and replace {YOUR_HOME} with the path after decompression. Modify the configure.xml file in the conf subdirectory of the installation directory, as shown in the following code and parameter descriptions:

<?xml version="1.0" encoding="UTF-8"?>
<configue>
    <defaultOracleConfigure>
        <!--Required. The Oracle database system identifier (SID).-->
        <sid>100</sid>
        <!--The schema of the Oracle table. The configuration can be overwritten by oracleSchema in the mappings. At least one of these two parameters must be specified.-->
        <schema>ogg_test</schema>
    </defaultOracleConfigure>
    <defalutDatahubConfigure>
        <!--Required. The endpoint of DataHub.-->
        <endPoint>YOUR_DATAHUB_ENDPOINT</endPoint>
        <!--The name of the DataHub project. The configuration can be overwritten by datahubProject in the mappings. At least one of these two parameters must be specified.-->
        <project>YOUR_DATAHUB_PROJECT</project>
        <!--The AccessKey ID for accessing DataHub. The configuration can be overwritten by datahubAccessId in the mappings. At least one of these two parameters must be specified.-->
        <accessId>YOUR_DATAHUB_ACCESS_ID</accessId>
        <!--The AccessKey secret for accessing DataHub. The configuration can be overwritten by datahubAccessKey in the mappings. At least one of these two parameters must be specified.-->
        <accessKey>YOUR_DATAHUB_ACCESS_KEY</accessKey>
        <!--The field in DataHub that indicates the data update type. The configuration can be overwritten by ctypeColumn in the mappings.-->
        <ctypeColumn>optype</ctypeColumn>
        <!--The field in DataHub that indicates the data update time. The configuration can be overwritten by ctimeColumn in the mappings.-->
        <ctimeColumn>readtime</ctimeColumn>
        <!--The field in DataHub that indicates the sequence number of the updated record. The configuration can be overwritten by cidColumn in the mappings. The sequence number increases as more records are updated. Multiple sequence numbers may not be consecutive.-->
        <cidColumn>record_id</cidColumn>
    </defalutDatahubConfigure>
    <!--The approach to handling errors. By default, if an error occurs, the system ignores the error and continues the operation or repeatedly retries the operation.-->
    <!--Optional. The maximum number of records that can be managed at a time. Default value: 1000.-->
    <batchSize>1000</batchSize>
    <!--Optional. The format into which the timestamp is converted. Default value: yyyy-MM-dd HH:mm:ss.-->
    <defaultDateFormat>yyyy-MM-dd HH:mm:ss</defaultDateFormat>
    <!--Optional. Specifies whether the system ignores dirty records. Default value: false.-->
    <dirtyDataContinue>true</dirtyDataContinue>
    <!--Optional. The name of the file that stores dirty records. Default value: datahub_ogg_plugin.dirty.-->
    <dirtyDataFile>datahub_ogg_plugin.dirty</dirtyDataFile>
    <!--Optional. The maximum size of the file that stores dirty records. Unit: MB. Default value: 500.-->
    <dirtyDataFileMaxSize>200</dirtyDataFileMaxSize>
    <!--Optional. The maximum number of retries allowed. A value of -1 indicates unlimited retries. A value of 0 indicates no retries. A value greater than 0 indicates the specified number of retries.-->
    <retryTimes>0</retryTimes>
    <!--Optional. The IDs of the shards with higher priority into which data is to be written. Example: 0,1.-->
    <shardId>0,1</shardId>
    <!--Optional. The interval between two consecutive retries. Unit: milliseconds. Default value: 3000.-->
    <retryInterval>4000</retryInterval>
    <!--Optional. The name of the checkpoint file. Default value: datahub_ogg_plugin.chk.-->
    <checkPointFileName>datahub_ogg_plugin.chk</checkPointFileName>
    <mappings>
        <mapping>
            <!--The schema of the Oracle table. This parameter is described in the preceding part.-->
            <oracleSchema></oracleSchema>
            <!--Required. The name of the Oracle table.-->
            <oracleTable>t_person</oracleTable>
            <!--The name of the DataHub project. This parameter is described in the preceding part.-->
            <datahubProject></datahubProject>
            <!--Required. The name of the DataHub topic.-->
            <datahubTopic>t_person</datahubTopic>
            <!--Optional. The field in DataHub that indicates the rows in the source table.-->
            <rowIdColumn></rowIdColumn>
            <ctypeColumn></ctypeColumn>
            <ctimeColumn></ctimeColumn>
            <cidColumn></cidColumn>
            <columnMapping>
                <!--
                src: required. The field names in the Oracle table.
                dest: required. The field names in the DataHub topic.
                destOld: optional. The DataHub topic field that stores the data before update.
                isShardColumn: optional. Specifies whether to generate the shard ID based on the hash key value. The configuration can be overwritten by the shardId parameter. Default value: false.
                isDateFormat: optional. Specifies whether to convert the timestamps to the format specified by the dateFormat parameter. Default value: true. If you set this parameter to false, the data type of the timestamps in the source database must be LONG.
                dateFormat: The format to which the timestamps are converted. If this parameter is left empty, the default format is used.
                -->
                <column src="id" dest="id" isShardColumn="true"  isDateFormat="false" dateFormat="yyyy-MM-dd HH:mm:ss"/>
                <column src="name" dest="name" isShardColumn="true"/>
                <column src="age" dest="age"/>
                <column src="address" dest="address"/>
                <column src="comments" dest="comments"/>
                <column src="sex" dest="sex"/>
                <column src="temp" dest="temp" destOld="temp1"/>
            </columnMapping>
        </mapping>
    </mappings>
</configue>
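The required elements of this file can be sanity-checked before the Extract process is started. The following is an illustrative validator using only the standard library; the required-field list reflects the comments above, and the validator itself is an assumption, not shipped with the plug-in (the root tag and the "defalutDatahubConfigure" spelling are reproduced verbatim from the file):

```python
import xml.etree.ElementTree as ET

# Hypothetical validator for the configure.xml shown above. It checks the
# elements marked "Required" in the comments: sid, endPoint, and, for each
# mapping, oracleTable and datahubTopic.
def validate_configure(xml_text):
    root = ET.fromstring(xml_text)
    problems = []
    if root.findtext("defaultOracleConfigure/sid") in (None, ""):
        problems.append("missing defaultOracleConfigure/sid")
    if root.findtext("defalutDatahubConfigure/endPoint") in (None, ""):
        problems.append("missing defalutDatahubConfigure/endPoint")
    for i, m in enumerate(root.findall("mappings/mapping")):
        for tag in ("oracleTable", "datahubTopic"):
            if m.findtext(tag) in (None, ""):
                problems.append(f"mapping {i}: missing {tag}")
    return problems

sample = """<configue>
  <defaultOracleConfigure><sid>100</sid></defaultOracleConfigure>
  <defalutDatahubConfigure><endPoint>http://example</endPoint></defalutDatahubConfigure>
  <mappings><mapping><oracleTable>t_person</oracleTable><datahubTopic>t_person</datahubTopic></mapping></mappings>
</configue>"""
print(validate_configure(sample))  # []
```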

3. Configure the Manager process in the destination database

Run the edit params mgr command to configure the Manager process.

PORT 7839
DYNAMICPORTLIST  7840-7849
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 7
LAGREPORTHOURS 1
LAGINFOMINUTES 30
LAGCRITICALMINUTES 45
PURGEDDLHISTORY MINKEEPDAYS 3, MAXKEEPDAYS 7
PURGEMARKERHISTORY MINKEEPDAYS 3, MAXKEEPDAYS 7

Run the start mgr command to start the Manager process.

4. Configure the Extract process in the destination database

In GGSCI, run the edit params dhwriter command to configure the Extract process.

extract dhwriter
getEnv (JAVA_HOME)
getEnv (LD_LIBRARY_PATH)
getEnv (PATH)
CUSEREXIT ./libggjava_ue.so CUSEREXIT PASSTHRU INCLUDEUPDATEBEFORES, PARAMS "{YOUR_HOME}/datahub-ogg-plugin/conf/javaue.properties"
sourcedefs ./dirdef/ogg_test.def
table OGG_TEST.*;

Add and start the Extract process.

# Add the Extract process.
add extract dhwriter, exttrailsource ./dirdat/st

# Start the Extract process.
start dhwriter

3. Sample logs and sample data

Sample logs

The DataHub plug-in is not involved on the source side. Logs in the destination database can be categorized into the following two types:

  1. DataHub plug-in logs: By default, each Replicat process writes a log named datahub-ogg-plugin.log in the log subdirectory of the plug-in directory. Use these logs to troubleshoot plug-in issues.
  2. OGG-related logs: By default, each Replicat process writes a log named processName.rpt in the dirrpt directory. For example, the log for the DataHub writer is DHWRITER.rpt. Use these logs to troubleshoot OGG issues.

Sample data

After the sample data is written to DataHub, the following record is generated in the DataHub plug-in log:

2020-12-01 11:29:30.000461 [record_writer-1-thread-1] INFO  ShardWriter - Write DataHub success, table: orders, topic: orders_new, shard: 0, recordNum: 1, rt: 3
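Such log lines can be parsed for monitoring. The following is a minimal regex sketch; the line format is inferred from this single sample, so treat the pattern as an assumption rather than a documented log schema:

```python
import re

# Hypothetical parser for the plug-in's "Write DataHub success" log lines.
# The field layout is inferred from the sample line above.
LOG_RE = re.compile(
    r"Write DataHub success, table: (?P<table>\S+), topic: (?P<topic>\S+), "
    r"shard: (?P<shard>\d+), recordNum: (?P<recordNum>\d+), rt: (?P<rt>\d+)")

def parse_success_line(line):
    m = LOG_RE.search(line)
    return m.groupdict() if m else None

line = ("2020-12-01 11:29:30.000461 [record_writer-1-thread-1] INFO  ShardWriter"
        " - Write DataHub success, table: orders, topic: orders_new, shard: 0,"
        " recordNum: 1, rt: 3")
print(parse_success_line(line)["topic"])  # orders_new
```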

You can view the following sample record in the DataHub console:

Shard ID: 0
System Time: December 1, 2020 11:29:30
oid (STRING): 1
num (STRING): 3
pid (STRING): 2
bak (STRING): zh
ctype (STRING): I
ctime (STRING): 2020-12-01 03:29:24.000074
cid (STRING): 16067933700000
rowid (STRING): AAAWwyAAGAAABufAAC