This topic describes how to use Spark to access MySQL.
Use Spark RDD to access MySQL
Sample code:
import java.sql.{Connection, DriverManager, PreparedStatement}

// Count words, then write each partition to MySQL over a single JDBC connection.
val input = getSparkContext.textFile(inputPath, numPartitions)
input.flatMap(_.split(" ")).map(x => (x, 1)).reduceByKey(_ + _)
  .foreachPartition(rows => {
    var conn: Connection = null
    var ps: PreparedStatement = null
    val sql = s"insert into $tbName(word, count) values (?, ?)"
    try {
      conn = DriverManager.getConnection(s"jdbc:mysql://$dbUrl:$dbPort/$dbName", dbUser, dbPwd)
      ps = conn.prepareStatement(sql)
      rows.foreach(pair => {
        ps.setString(1, pair._1)
        ps.setLong(2, pair._2)
        ps.executeUpdate()
      })
    } catch {
      case ex: Exception => ex.printStackTrace()
    } finally {
      // Release JDBC resources exactly once, whether or not the write succeeded.
      if (ps != null) {
        ps.close()
      }
      if (conn != null) {
        conn.close()
      }
    }
  })
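The sample above issues one executeUpdate() call per record. For larger partitions, the rows can instead be grouped with the standard JDBC addBatch() and executeBatch() calls. The following is a minimal sketch of that idea, not part of the original sample: the object name MySqlBatchWriter, its helper functions, and the default batch size of 500 are all hypothetical, and the connection is assumed to be opened and closed by the caller.

```scala
import java.sql.{Connection, PreparedStatement}

// Hypothetical helper, shown for illustration only.
object MySqlBatchWriter {
  // Builds a JDBC URL in the same shape as the sample above.
  def jdbcUrl(host: String, port: Int, db: String): String =
    s"jdbc:mysql://$host:$port/$db"

  // Builds the parameterized insert statement for the word-count table.
  def insertSql(table: String): String =
    s"insert into $table(word, count) values (?, ?)"

  // Writes one partition over an already-open connection and sends the
  // rows to the driver in groups of batchSize (an assumed default).
  // Returns the number of rows handed to the driver.
  def writePartition(conn: Connection, table: String,
                     rows: Iterator[(String, Int)], batchSize: Int = 500): Long = {
    val ps: PreparedStatement = conn.prepareStatement(insertSql(table))
    try {
      var pending = 0
      var total = 0L
      rows.foreach { case (word, count) =>
        ps.setString(1, word)
        ps.setLong(2, count.toLong)
        ps.addBatch()
        pending += 1
        total += 1
        if (pending == batchSize) {
          ps.executeBatch()
          pending = 0
        }
      }
      // Flush the final, partially filled batch.
      if (pending > 0) ps.executeBatch()
      total
    } finally {
      ps.close()
    }
  }
}

// Possible use inside the Spark job, assuming dbPort is an Int:
//   counts.foreachPartition { rows =>
//     val conn = java.sql.DriverManager.getConnection(
//       MySqlBatchWriter.jdbcUrl(dbUrl, dbPort, dbName), dbUser, dbPwd)
//     try MySqlBatchWriter.writePartition(conn, tbName, rows) finally conn.close()
//   }
```

Whether batching actually reduces round trips depends on the driver configuration. With MySQL Connector/J, the rewriteBatchedStatements connection property is usually what enables it.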
Use Spark SQL statements to access MySQL
Sample command that starts the Spark SQL CLI:
spark-sql --jars /opt/apps/SPARK-EXTENSION/spark-extension-current/spark3-emrsdk/*,mysql-connector-java-8.0.30.jar
Note
- The mysql-connector-java-8.0.30.jar file contains the MySQL JDBC driver. Replace it with the path and version of your own MySQL JDBC driver.
- The JAR files in /opt/apps/SPARK-EXTENSION/spark-extension-current/spark3-emrsdk/* provide the data source type that is used to access MySQL. If your E-MapReduce (EMR) cluster uses Spark 2, change spark3 in the preceding command to spark2.
The following example shows how to create a table, read data from the table, and write data to the table.
-- Create a table.
create table test1(id int)
using jdbc2
options(
  url="jdbc:mysql://mysql_url/test_db?user=root&password=root",
  dbtable="test1",
  driver="com.mysql.jdbc.Driver");

-- Read data from MySQL.
select * from test1;

-- Write data to MySQL.
insert into test1 values(1);
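The same table can also be reached from Scala code. The sketch below assumes Spark's built-in jdbc data source rather than the jdbc2 source used above, and passes the user name and password as separate options instead of embedding them in the URL. The object name MySqlJdbcOptions and its build function are hypothetical. The driver class shown is the one shipped with MySQL Connector/J 8.x.

```scala
// Hypothetical helper, shown for illustration only.
object MySqlJdbcOptions {
  // Builds the option map for Spark's built-in "jdbc" data source.
  def build(host: String, db: String, table: String,
            user: String, password: String): Map[String, String] =
    Map(
      "url" -> s"jdbc:mysql://$host/$db",
      "dbtable" -> table,
      "user" -> user,
      "password" -> password,
      "driver" -> "com.mysql.cj.jdbc.Driver"
    )
}

// Possible use, assuming a SparkSession named spark and the values from the SQL example:
//   val opts = MySqlJdbcOptions.build("mysql_url", "test_db", "test1", "root", "root")
//   val df = spark.read.format("jdbc").options(opts).load()   // read from MySQL
//   df.write.format("jdbc").options(opts).mode("append").save() // write to MySQL
```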