
E-MapReduce: Read data from and write data to MySQL

Last Updated: Sep 19, 2025

Spark natively supports accessing MySQL through a Java Database Connectivity (JDBC) connector. Serverless Spark automatically loads the MySQL JDBC driver (version 8.0.33) at startup. You can use SQL sessions, batch processing tasks, or Notebooks to connect to MySQL to read and write data.

Prerequisites

Important

Ensure that network connectivity is established between Serverless Spark and MySQL. For configuration details, see Establish network connectivity between EMR Serverless Spark and other VPCs.

Note

When you add security group rules, open the required ports. In this example, open TCP port 3306, the default MySQL port.

Procedure

Method 1: Use an SQL session

  1. Create an SQL session. In Session Manager, you can create a session and select a pre-configured Network Connection. For more information, see Create an SQL session.

  2. In Data Development, you can create a SparkSQL task and use the following SQL statements to test the connection.

    CREATE TEMPORARY VIEW test
    USING org.apache.spark.sql.jdbc
    OPTIONS (
      url 'jdbc:mysql://<jdbc_url>/',
      dbtable '<db>.<table>',
      user '<username>',
      password '<password>'
    );
    
    SELECT * FROM test;

    The following table describes the parameters.

    Parameter | Description
    --------- | -----------
    url | The JDBC connection string. The format is jdbc:mysql://<jdbc_url>/. Replace <jdbc_url> with the actual value.
    dbtable | The name of the database table to read. The format is <db>.<table>. This example uses test_mysql_db.test.
    user | The username for the MySQL database. Note: the user must have read permissions on the target table, and also write permissions if you run the insert step.
    password | The password for the MySQL database.

    If the table content is returned correctly, the connection is successful.


  3. Insert data. Use the following commands to insert data into the MySQL table.

    INSERT INTO test VALUES(4, 'd'),(5, 'e');
    SELECT * FROM test;

    If the inserted data can be queried, the write operation is successful.


Method 2: Use a Notebook session

  1. Create a Notebook session. In Session Manager, you can create a Notebook session and select a pre-configured Network Connection. For more information, see Create a Notebook session.

  2. In Data Development, create an Interactive Development > Notebook task and use the following Python code to test the connection.

    df = spark.read \
      .format("jdbc") \
      .option("url", "jdbc:mysql://<jdbc_url>") \
      .option("dbtable", "<db>.<table>") \
      .option("user", "<username>") \
      .option("password", "<password>") \
      .load()
    df.show()

    If the table content is returned correctly, the connection is successful.
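The same connection settings appear in every read and write call in this topic. As a sketch, they can be collected once in a plain Python dictionary and passed in a single call with .options(**opts). The helper below is hypothetical (not part of Spark or EMR), and the host, database, and credentials are placeholder values:

```python
# Hypothetical helper that assembles the JDBC options used in this topic,
# so the same settings can be shared by read and write calls.
def mysql_jdbc_options(host, port, db, table, user, password):
    return {
        "url": f"jdbc:mysql://{host}:{port}",
        "dbtable": f"{db}.{table}",
        "user": user,
        "password": password,
    }

# Placeholder endpoint and credentials; substitute your own values.
opts = mysql_jdbc_options("rm-example.mysql.rds.aliyuncs.com", 3306,
                          "test_mysql_db", "test", "app_user", "****")
# Then: df = spark.read.format("jdbc").options(**opts).load()
```

Because DataFrameReader.options and DataFrameWriter.options accept keyword arguments, the same dictionary can be reused for the write in the next step: df.write.format("jdbc").mode("append").options(**opts).save().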


  3. Insert data. Use the following code to insert data into the MySQL table.

    df = spark.createDataFrame([(6, 'f'), (7, 'g')], ["id", "name"])
    df.write \
      .format("jdbc") \
      .mode("append") \
      .option("url", "jdbc:mysql://<jdbc_url>") \
      .option("dbtable", "<db>.<table>") \
      .option("user", "<username>") \
      .option("password", "<password>") \
      .save()
    
    # Read the table back to confirm that the new rows were written.
    spark.read \
      .format("jdbc") \
      .option("url", "jdbc:mysql://<jdbc_url>") \
      .option("dbtable", "<db>.<table>") \
      .option("user", "<username>") \
      .option("password", "<password>") \
      .load() \
      .show()

    The mode("append") setting appends the new rows to the target table instead of overwriting or deleting existing data.
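Besides append, Spark's DataFrameWriter accepts the save modes overwrite, ignore, and error (alias errorifexists), and mode() matches them case-insensitively. As a sketch, a small hypothetical helper (not part of Spark) that normalizes a mode string before it is passed to .mode():

```python
# Save modes accepted by DataFrameWriter.mode(); Spark matches them
# case-insensitively. (Hypothetical validation helper, not part of Spark.)
VALID_SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}

def check_save_mode(mode):
    """Return the normalized save mode, or raise ValueError for an unknown one."""
    normalized = mode.lower()
    if normalized not in VALID_SAVE_MODES:
        raise ValueError(f"unknown save mode: {mode!r}")
    return normalized
```

Note that for JDBC tables, overwrite drops and recreates the target table by default; adding option("truncate", "true") tells Spark to truncate the existing table instead, which preserves its schema and indexes.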

    If the inserted data is retrieved successfully, the write operation is successful.


Method 3: Use a Spark batch job

  1. Write the test code. Compile the following Scala code and package it into a JAR file.

    package spark.test
    
    import org.apache.spark.sql.SparkSession
    
    object Main {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("test")
          .getOrCreate()
    
        val newRows = spark.createDataFrame(Seq((6, "f"), (7, "g"))).toDF("id", "name")
        newRows.write.format("jdbc")
          .mode("append")
          .option("url", "jdbc:mysql://<jdbc_url>")
          .option("dbtable", "<db>.<table>")
          .option("user", "<username>")
          .option("password", "<password>")
          .save()
        
        spark.read.format("jdbc")
          .option("url", "jdbc:mysql://<jdbc_url>")
          .option("dbtable", "<db>.<table>")
          .option("user", "<username>")
          .option("password", "<password>")
          .load()
          .show()
        
        spark.stop()
      }
    }
  2. Create a batch job. In Data Development, create a Batch Job > JAR task. Then, configure the following parameters. For configuration details, see Develop a batch job or streaming job.

    • Main JAR Resource: Select or enter the path to the JAR file.

    • Main Class: Enter spark.test.Main.

    • Network Connection: Select an existing network connection.

  3. To view the verification results after the task is complete, click Log Investigation in the Run Record section. On the Stdout tab of the Driver Log, you can view the content of the corresponding table in the MySQL database.