This topic describes how to use Spark to access MySQL.

Use Spark RDD to access MySQL

Sample code:
import java.sql.{Connection, DriverManager, PreparedStatement}

// Count the occurrences of each word in the input files.
val input = getSparkContext.textFile(inputPath, numPartitions)
input.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
  // Open one JDBC connection per partition and write the results of that partition to MySQL.
  .foreachPartition(rows => {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val sql = s"insert into $tbName(word, count) values (?, ?)"
    try {
      conn = DriverManager.getConnection(s"jdbc:mysql://$dbUrl:$dbPort/$dbName", dbUser, dbPwd)
      ps = conn.prepareStatement(sql)
      rows.foreach(pair => {
        ps.setString(1, pair._1)
        ps.setLong(2, pair._2)
        ps.executeUpdate()
      })
    } catch {
      case ex: Exception => ex.printStackTrace()
    } finally {
      // Release the statement and the connection exactly once, even if a write fails.
      if (ps != null) {
        ps.close()
      }
      if (conn != null) {
        conn.close()
      }
    }
  })
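
The sample above sends one INSERT statement per word by calling executeUpdate(). For large partitions, JDBC batching may reduce the number of round trips to MySQL. The following is a minimal sketch and is not part of the original sample: BatchedWordCountWriter, writePartition, the connect factory, and the default batchSize of 500 are hypothetical names and values chosen for illustration. Only the java.sql calls (setAutoCommit, addBatch, executeBatch, commit, rollback) are standard JDBC.

```scala
import java.sql.Connection

// Hypothetical helper, shown for illustration only.
object BatchedWordCountWriter {

  /** Writes one partition of (word, count) pairs to `table` by using JDBC batching.
    *
    * `connect` is an assumed factory that opens a connection, for example
    * () => DriverManager.getConnection(url, user, password).
    * Returns the number of rows that were submitted.
    */
  def writePartition(rows: Iterator[(String, Int)],
                     connect: () => Connection,
                     table: String,
                     batchSize: Int = 500): Long = {
    require(batchSize > 0, "batchSize must be positive")
    val conn = connect()
    try {
      // Commit once per partition instead of once per row.
      conn.setAutoCommit(false)
      val ps = conn.prepareStatement(s"insert into $table(word, count) values (?, ?)")
      try {
        var submitted = 0L
        rows.foreach { case (word, count) =>
          ps.setString(1, word)
          ps.setLong(2, count.toLong)
          ps.addBatch()
          submitted += 1
          // Send a full batch to the server.
          if (submitted % batchSize == 0) ps.executeBatch()
        }
        // Flush the rows that did not fill a complete batch.
        ps.executeBatch()
        conn.commit()
        submitted
      } catch {
        case ex: Exception =>
          // Undo the uncommitted rows, then rethrow so that the caller sees the failure.
          conn.rollback()
          throw ex
      } finally {
        ps.close()
      }
    } finally {
      conn.close()
    }
  }
}
```

Assuming the same variables as in the sample above, the helper could be called once per partition, for example: counts.foreachPartition(rows => BatchedWordCountWriter.writePartition(rows, () => DriverManager.getConnection(s"jdbc:mysql://$dbUrl:$dbPort/$dbName", dbUser, dbPwd), tbName)), where counts stands for the RDD returned by reduceByKey. Unlike the sample above, this sketch rethrows exceptions instead of only printing them. With MySQL Connector/J, batches are generally rewritten into multi-row inserts only if the rewriteBatchedStatements=true connection property is set, so the actual gain depends on the driver configuration.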

Use Spark SQL statements to access MySQL

Run the following command to start the Spark SQL CLI:
spark-sql --jars /opt/apps/SPARK-EXTENSION/spark-extension-current/spark3-emrsdk/*,mysql-connector-java-8.0.30.jar
Note
  • The mysql-connector-java-8.0.30.jar file contains the MySQL JDBC driver. Replace it with the actual path and version of your MySQL JDBC driver.
  • The data source type that is used to access MySQL is provided by the JAR files in /opt/apps/SPARK-EXTENSION/spark-extension-current/spark3-emrsdk/*. If your E-MapReduce (EMR) cluster uses Spark 2, you must change spark3 in the preceding command to spark2.
The following example shows how to create a table, read data from the table, and write data to the table.
-- Create a table.
create table test1(id int)
using jdbc2
options(
  url="jdbc:mysql://mysql_url/test_db?user=root&password=root",
  dbtable="test1",
  driver="com.mysql.cj.jdbc.Driver");

-- Read data from MySQL.
select * from test1;

-- Write data to MySQL.
insert into test1 values(1);
