
DataHub: OGG for MySQL

Last Updated: Aug 02, 2021

1. Environment requirements

  • MySQL Enterprise Edition or Community Edition is installed.
  • Oracle GoldenGate (OGG) for MySQL 12.2.0.2 is installed.
  • OGG Application Adapters 12.2.0.1 is installed.
  • Java 7 or later is installed. We recommend that you install Java Development Kit (JDK) 1.8.

2. Installation

The following sections describe how to configure MySQL and how to install and configure Oracle GoldenGate (OGG). The parameter settings used in this topic are for reference only. For production operations, use the configurations provided by your O&M engineers.

Prepare the installation packages

You can download the required installation packages by referring to the last section of this topic.

  • OGG for MySQL
  • OGG Application Adapters
  • DataHub agent for OGG

Configure OGG in the source database

1. Configure MySQL in the source database

Enable binary logging

Modify the /etc/my.cnf file to add the following configurations.

## for ogg
# Extract expects all table columns to be present in the binary log, so only
# binlog_row_image=full is supported (this is the default).
binlog_row_image=full
# Must be set to ROW, which directs the database to log DML statements in binary format.
binlog_format=row
# Prevents DDL history table entries from being logged in the binlog.
binlog-ignore-db=oggddl
# Binlog path.
log-bin=/usr/local/mysql/logs/log
# Binlog index path.
log-bin-index=/usr/local/mysql/logs/binlog.index
# server-id must be set when binary logging is enabled; use a distinct value
# for each server if you run multiple instances.
server-id=1

After the /etc/my.cnf file is modified, restart MySQL.
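Before restarting, it can be worth confirming that every required key is present. The following is a minimal sketch that checks a my.cnf fragment for the settings OGG needs; it writes a sample to a temporary file for illustration, so point CNF at /etc/my.cnf on a real host:

```shell
# Sketch: verify a my.cnf fragment contains the binlog settings OGG needs.
CNF=$(mktemp)
cat > "$CNF" <<'EOF'
binlog_row_image=full
binlog_format=row
log-bin=/usr/local/mysql/logs/log
log-bin-index=/usr/local/mysql/logs/binlog.index
server-id=1
EOF
result=""
for key in binlog_row_image binlog_format log-bin log-bin-index server-id; do
  if grep -q "^${key}=" "$CNF"; then
    result="$result ok:${key}"      # key is present
  else
    result="$result MISSING:${key}" # key still needs to be added
  fi
done
echo "$result"
rm -f "$CNF"
```

On a running server, SHOW VARIABLES LIKE 'binlog_format'; gives the authoritative answer.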

Create a database and a user

After you connect to MySQL, run the following commands to create a database and a user:

# Create a database named ggtt for the OGG test.
create database ggtt;

# Create a user named ogg_test for the OGG test.
grant all privileges on ggtt.* to 'ogg_test'@'localhost' identified by 'ogg_test';
flush privileges;
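Note that the GRANT ... IDENTIFIED BY form above is MySQL 5.x syntax. On MySQL 8.0, the IDENTIFIED BY clause of GRANT was removed, so the equivalent (a sketch using the same names) is to create the user first and then grant privileges:

```sql
-- MySQL 8.0 equivalent of the statements above (sketch):
CREATE USER 'ogg_test'@'localhost' IDENTIFIED BY 'ogg_test';
GRANT ALL PRIVILEGES ON ggtt.* TO 'ogg_test'@'localhost';
FLUSH PRIVILEGES;
```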

2. Install OGG in the source database

Decompress the OGG for MySQL installation package.

Create the required directories

Run the ./ggsci command to start Oracle GoldenGate Software Command Interface (GGSCI) and enter the following command. Unless otherwise specified, the commands in the following parts need to be run in GGSCI.

create subdirs

After the preceding command is run, multiple directories whose names start with dir are created in the OGG home directory.

3. Configure the Manager process in the source database

Run the edit params mgr command to configure the Manager process. The following code shows the configurations:

port 7839
dynamicportlist  7840-7849
purgeoldextracts /home/goldengate/dirdat/*,usecheckpoints, minkeepdays 2

After the Manager process is configured, you can run the view params mgr command to verify the configurations.

Run the start mgr command to start the Manager process.

Run the info mgr command to view the status of the Manager process.

4. Configure the Extract process in the source database.

Run the edit params dhext command to configure the Extract process. Here, dhext is the process name. You can use a custom name, provided that it matches the name used in the related configurations. This requirement applies throughout this topic and is not repeated below.

extract dhext
sourcedb ggtt@localhost:3306 userid ogg_test password ogg_test
exttrail dirdat/ms
TranLogOptions AltLogDest /usr/local/mysql/logs/binlog.index
table ggtt.*;

After the configurations are edited, save the configurations. Then, add and start the Extract process.

# Add the Extract process.
add extract dhext,tranlog,begin now

# The maximum file size in each trail is 200 MB.
add exttrail ./dirdat/ms,extract dhext, megabytes 200

start dhext
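Once the Extract process is running, its status and throughput can be inspected from GGSCI; dhext is the example process name used above, and view report shows the process report if it abends:

```
info extract dhext, detail
stats extract dhext
view report dhext
```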

5. Configure a data pump in the source database

Run the edit params pump command to configure a data pump.

extract pump
passthru
-- enable compress
rmthost  localhost,mgrport 7839,compress
rmttrail  dirdat/ms
table ggtt.*;

After the configurations are edited, save the configurations. Then, add the data pump.

# Add the data pump.
add extract pump,exttrailsource dirdat/ms

# Add a trail in the destination database. The maximum file size in each trail is 200 MB.
add rmttrail dirdat/ms,extract pump,megabytes 200

# Starting the data pump connects to the destination database. Therefore, start the Manager process in the destination database before you start the data pump; otherwise, an error is reported.
start pump

Configure the destination database

1. Install OGG in the destination database

OGG in the destination database is OGG Application Adapters and does not need to be manually installed. You need only to decompress the installation package. After the decompression, you must start GGSCI and enter the create subdirs command to create the required directories. After the create subdirs command is run, multiple directories whose names start with dir are created in the OGG directory.

2. Install and configure the DataHub plug-in for OGG

Environment requirement: JDK 1.7 or later. Configure the JAVA_HOME and LD_LIBRARY_PATH environment variables and specify the configurations in the ~/.bash_profile file. Example:

export JAVA_HOME=/xxx/xxx
export JRE_HOME=/xxx/xxx/jrexx
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$JRE_HOME/lib/amd64:$JRE_HOME/lib/amd64/server
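After sourcing ~/.bash_profile, you can quickly confirm that the JRE server directory really ended up in LD_LIBRARY_PATH. A minimal sketch, with JRE_HOME as a stand-in value:

```shell
# Sketch: check that the JRE server lib dir is on LD_LIBRARY_PATH.
# JRE_HOME here is a placeholder; use your real path.
JRE_HOME=/usr/lib/jvm/jre
LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:$JRE_HOME/lib/amd64:$JRE_HOME/lib/amd64/server"
case ":$LD_LIBRARY_PATH:" in
  *":$JRE_HOME/lib/amd64/server:"*) found=yes ;;
  *) found=no ;;
esac
echo "$found"
```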

After the environment variables are configured, decompress the datahub-ogg-plugin.tar.gz package. Modify the javaue.properties file in the conf subdirectory of the installation directory and replace {YOUR_HOME} with the absolute path after decompression.

gg.handlerlist=ggdatahub
gg.handler.ggdatahub.type=com.aliyun.odps.ogg.handler.datahub.DatahubHandler
gg.handler.ggdatahub.configureFileName={YOUR_HOME}/aliyun-datahub-ogg-plugin/conf/configure.xml
goldengate.userexit.nochkpt=false
goldengate.userexit.timestamp=utc+8
gg.classpath={YOUR_HOME}/aliyun-datahub-ogg-plugin/lib/*
gg.log.level=debug
jvm.bootoptions=-Xmx512m -Dlog4j.configuration=file:{YOUR_HOME}/datahub-ogg-plugin/conf/log4j.properties -Djava.class.path=ggjava/ggjava.jar
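Forgetting to substitute {YOUR_HOME} is an easy mistake that only surfaces at process start time. A minimal guard, shown on a temporary sample file; point PROPS at your real javaue.properties:

```shell
# Sketch: flag unresolved {YOUR_HOME} placeholders in a properties file.
PROPS=$(mktemp)
printf 'gg.classpath={YOUR_HOME}/aliyun-datahub-ogg-plugin/lib/*\n' > "$PROPS"
if grep -q '{YOUR_HOME}' "$PROPS"; then
  status="unresolved placeholder found"
else
  status="ok"
fi
echo "$status"
rm -f "$PROPS"
```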

Modify the log4j.properties file in the conf subdirectory of the installation directory and replace {YOUR_HOME} with the path after decompression.

log4j.rootLogger=INFO,fileout
log4j.logger.com.goldengate=DEBUG
log4j.logger.com.aliyun=DEBUG

# Console output...
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

# Rolling log file output...
log4j.appender.fileout=org.apache.log4j.RollingFileAppender
log4j.appender.fileout.File={YOUR_HOME}/aliyun-datahub-ogg-plugin/log/datahub-ogg-plugin.log
log4j.appender.fileout.MaxFileSize=10000KB
log4j.appender.fileout.MaxBackupIndex=30
log4j.appender.fileout.layout=org.apache.log4j.PatternLayout
log4j.appender.fileout.layout.conversionPattern=%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1} - %m%n

Modify the configure.xml file in the conf subdirectory of the installation directory, as shown in the following code:

Parameters to take note of

  • sid: This parameter has no practical significance. You can set this parameter to an arbitrary value.
  • schema and oracleSchema: the name of the MySQL database. In this example, set this parameter to ggtt.
  • oracleTable: the name of the MySQL table.
  • rowIdColumn: Do not set this parameter, because MySQL cannot provide row IDs.

<?xml version="1.0" encoding="UTF-8"?>
<configue>
    <defaultOracleConfigure>
        <!--Required. You can set this parameter to an arbitrary value.-->
        <sid>100</sid>
        <!--The name of the MySQL database. The configuration can be overwritten by oracleSchema in the mappings. At least one of these two parameters must be specified.-->
        <schema>ggtt</schema>
    </defaultOracleConfigure>
    <defalutDatahubConfigure>
        <!--Required. The endpoint of DataHub.-->
        <endPoint>YOUR_DATAHUB_ENDPOINT</endPoint>
        <!--The name of the DataHub project. The configuration can be overwritten by datahubProject in the mappings. At least one of these two parameters must be specified.-->
        <project>YOUR_DATAHUB_PROJECT</project>
        <!--The AccessKey ID for accessing DataHub. The configuration can be overwritten by datahubAccessId in the mappings. At least one of these two parameters must be specified.-->
        <accessId>YOUR_DATAHUB_ACCESS_ID</accessId>
        <!--The AccessKey secret for accessing DataHub. The configuration can be overwritten by datahubAccessKey in the mappings. At least one of these two parameters must be specified.-->
        <accessKey>YOUR_DATAHUB_ACCESS_KEY</accessKey>
        <!--The field in DataHub that indicates the data update type. The configuration can be overwritten by ctypeColumn in the mappings.-->
        <ctypeColumn>optype</ctypeColumn>
        <!--The field in DataHub that indicates the data update time. The configuration can be overwritten by ctimeColumn in the mappings.-->
        <ctimeColumn>readtime</ctimeColumn>
        <!--The field in DataHub that indicates the sequence number of the updated record. The configuration can be overwritten by cidColumn in the mappings. The sequence number increases as more records are updated. Multiple sequence numbers may not be consecutive.-->
        <cidColumn>record_id</cidColumn>
    </defalutDatahubConfigure>
    <!--Error handling is controlled by the dirtyDataContinue, retryTimes, and retryInterval parameters below: dirty records can be ignored and failed operations retried.-->
    <!--Optional. The maximum number of records that can be managed at a time. Default value: 1000.-->
    <batchSize>1000</batchSize>
    <!--Optional. The format into which the timestamp is converted. Default value: yyyy-MM-dd HH:mm:ss.-->
    <defaultDateFormat>yyyy-MM-dd HH:mm:ss</defaultDateFormat>
    <!--Optional. Specifies whether the system ignores dirty records. Default value: false.-->
    <dirtyDataContinue>true</dirtyDataContinue>
    <!--Optional. The name of the file that stores dirty records. Default value: datahub_ogg_plugin.dirty.-->
    <dirtyDataFile>datahub_ogg_plugin.dirty</dirtyDataFile>
    <!--Optional. The maximum size of the file that stores dirty records. Unit: MB. Default value: 500.-->
    <dirtyDataFileMaxSize>200</dirtyDataFileMaxSize>
    <!--Optional. The maximum number of retries allowed. A value of -1 indicates unlimited retries. A value of 0 indicates no retries. A value greater than 0 indicates the specified number of retries.-->
    <retryTimes>0</retryTimes>
    <!--Optional. The IDs of the shards with higher priority into which data is to be written. Example: 0,1.-->
    <shardId>0,1</shardId>
    <!--Optional. The interval between two consecutive retries. Unit: milliseconds. Default value: 3000.-->
    <retryInterval>4000</retryInterval>
    <!--Optional. The name of the checkpoint file. Default value: datahub_ogg_plugin.chk.-->
    <checkPointFileName>datahub_ogg_plugin.chk</checkPointFileName>
    <mappings>
        <mapping>
            <!--The name of the MySQL database. This parameter is described in the preceding part.-->
            <oracleSchema></oracleSchema>
            <!--Required. The name of the MySQL table.-->
            <oracleTable>t_person</oracleTable>
            <!--The name of the DataHub project. This parameter is described in the preceding part.-->
            <datahubProject></datahubProject>
            <!--Required. The name of the DataHub topic.-->
            <datahubTopic>t_person</datahubTopic>
            <ctypeColumn></ctypeColumn>
            <ctimeColumn></ctimeColumn>
            <cidColumn></cidColumn>
            <columnMapping>
                <!--
                src: required. The field names in the Oracle table.
                dest: required. The field names in the DataHub topic.
                destOld: optional. The DataHub topic field that stores the data before update.
                isShardColumn: optional. Specifies whether to generate the shard ID based on the hash key value. The configuration can be overwritten by the shardId parameter. Default value: false.
                isDateFormat: specifies whether to convert the timestamp into a string based on the dateFormat parameter. Default value: true. If you set this parameter to false, the data type in the source database must be LONG.
                dateFormat: The format into which the timestamp is converted. If this parameter is left empty, the default format is used.
                -->
                <column src="id" dest="id" isShardColumn="true"  isDateFormat="false" dateFormat="yyyy-MM-dd HH:mm:ss"/>
                <column src="name" dest="name" isShardColumn="true"/>
                <column src="age" dest="age"/>
                <column src="address" dest="address"/>
                <column src="comments" dest="comments"/>
                <column src="sex" dest="sex"/>
                <column src="temp" dest="temp" destOld="temp1"/>
            </columnMapping>
        </mapping>
    </mappings>
</configue>
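A broken edit to configure.xml will only fail when the writer starts, so it can help to confirm the file is still well-formed XML first. A sketch using python3 (assumed available) on a stub file; point XML at the real path:

```shell
# Sketch: verify an edited configure.xml is still well-formed XML.
XML=$(mktemp)
cat > "$XML" <<'EOF'
<configue>
    <defaultOracleConfigure>
        <sid>100</sid>
    </defaultOracleConfigure>
</configue>
EOF
verdict=$(python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1]); print("well-formed")' "$XML")
echo "$verdict"
rm -f "$XML"
```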

3. Configure the Manager process in the destination database

Run the edit params mgr command to configure the Manager process.

port 7839
dynamicportlist 7840-7849
purgeoldextracts dirdat/*,usecheckpoints, minkeepdays 7

Run the start mgr command to start the Manager process.

4. Configure the Extract process in the destination database

Run the edit params dhwriter command to configure the Extract process.

extract dhwriter
getEnv (JAVA_HOME)
getEnv (LD_LIBRARY_PATH)
getEnv (PATH)
CUSEREXIT libggjava_ue.so CUSEREXIT PASSTHRU INCLUDEUPDATEBEFORES, PARAMS "{YOUR_HOME}/datahub-ogg-plugin/conf/javaue.properties"
GetUpdateBefores
table ggtt.*;

After the configurations are edited, save the configurations. Then, add and start the Extract process.

# Add the Extract process.
add extract dhwriter, exttrailsource dirdat/ms

# Start the Extract process.
start dhwriter

3. Mappings between data types of fields in MySQL and DataHub

The unit of field values of the TIMESTAMP type in DataHub is microseconds. When you map field values of the DATETIME or TIMESTAMP type in MySQL, the values are converted into timestamps in microseconds.
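For example, the DATETIME value '2021-08-02 12:00:00' becomes a microsecond timestamp. A quick shell sketch of the conversion (interpreting the value as UTC for determinism; uses GNU date):

```shell
# Convert a MySQL DATETIME string to microseconds since the epoch (UTC).
dt='2021-08-02 12:00:00'
secs=$(date -u -d "$dt" +%s)   # GNU date; seconds since the epoch
us=$((secs * 1000000))
echo "$us"    # 1627905600000000
```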

MySQL                     DataHub             Description
CHAR, VARCHAR             STRING
INT, TINYINT, BIGINT      BIGINT
DECIMAL, FLOAT, DOUBLE    DOUBLE or DECIMAL   The data type is selected based on the required precision.
DATETIME, TIMESTAMP       TIMESTAMP
TEXT, TINYTEXT            STRING

The preceding table describes the mappings between common data types in MySQL and DataHub. For more information about the data types supported by MySQL, see 1.3.5 Supported MySQL Data Types.

4. Installation packages

Only the download links of the DataHub plug-in for OGG are provided in this topic. To download OGG for MySQL and OGG Application Adapters, see Oracle GoldenGate Downloads and Oracle Software Delivery Cloud.

datahub-ogg-plugin    OGG for MySQL    OGG Adapter    Supported MySQL version
2.0.3                 12.2.0.2         12.2.0.1       Enterprise Edition and Community Edition
2.0.4                 12.2.0.2         12.2.0.1       Enterprise Edition and Community Edition
2.0.5                 12.2.0.2         12.2.0.1       Enterprise Edition and Community Edition