Topik ini menggunakan dataset OnTime dan Star Schema sebagai contoh untuk menjelaskan cara mengimpor set data uji ke ApsaraDB for ClickHouse dan menjalankan pengujian kinerja.
Prasyarat
Kluster ApsaraDB for ClickHouse telah dibuat. Untuk informasi lebih lanjut, lihat Buat Kluster.
Akun database telah dibuat. Untuk informasi lebih lanjut, lihat Buat Akun Database.
Mesin Linux telah disiapkan dan alamat IP mesin tersebut telah ditambahkan ke daftar putih kluster ApsaraDB for ClickHouse. Untuk informasi lebih lanjut, lihat Konfigurasi Daftar Putih.
clickhouse-client yang sesuai dengan versi kluster ApsaraDB for ClickHouse telah diinstal. Untuk informasi lebih lanjut, lihat clickhouse-client.
Deskripsi pengujian
Semua perintah clickhouse-client dalam pengujian ini dijalankan di direktori tempat clickhouse-client diinstal.
Dataset OnTime dan Star Schema yang digunakan dalam pengujian ini dirujuk dari situs resmi ClickHouse sumber terbuka. Untuk informasi lebih lanjut, lihat Contoh Dataset.
Gunakan dataset OnTime untuk menguji kinerja
1. Hubungkan ke kluster
Jalankan perintah berikut di clickhouse-client untuk terhubung ke kluster ApsaraDB for ClickHouse dalam lingkungan Linux:
./clickhouse-client --host=<Titik akhir kluster> --port=<Nomor port TCP> --user=<Nama pengguna akun database> --password=<Kata sandi akun database>2. Buat tabel
Unduh dan jalankan pernyataan pembuatan tabel berdasarkan edisi kluster ApsaraDB for ClickHouse.
Skrip untuk membuat tabel di kluster ApsaraDB for ClickHouse Edisi Single-replica
Skrip untuk membuat tabel di kluster ApsaraDB for ClickHouse Edisi Double-replica
3. Unduh dataset OnTime
Dataset OnTime mencakup rentang waktu yang sangat besar dan memerlukan waktu lama untuk diunduh. Selain itu, kesalahan kehilangan data dapat muncul selama pengunduhan.
echo https://transtats.bts.gov/PREZIP/On_Time_Reporting_Carrier_On_Time_Performance_1987_present_{1987..2021}_{1..12}.zip | xargs -P10 wget --no-check-certificate --continue4. Impor data ke ApsaraDB for ClickHouse
Jalankan perintah berikut untuk mengonfigurasi variabel lingkungan ck_url, ck_user, ck_pass, dan ck_port serta mengimpor data:
export ck_url=<Titik akhir kluster>
export ck_user=<Nama pengguna akun database>
export ck_pass=<Kata sandi akun database>
export ck_port=<Nomor port TCP>
for i in *.zip; do echo $i; unzip -cq $i '*.csv' | sed 's/\.00//g' | ./clickhouse-client -h $ck_url -u $ck_user --password $ck_pass --port $ck_port --query="INSERT INTO ontime FORMAT CSVWithNames"; done5. Gunakan set data uji
Set data uji dirujuk dari situs resmi ClickHouse sumber terbuka. Untuk informasi lebih lanjut, lihat OnTime.
Q0
SELECT avg(c1)
FROM
(
SELECT Year, Month, count(*) AS c1
FROM ontime
GROUP BY Year, Month
);Q1
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;Q2
SELECT DayOfWeek, count(*) AS c
FROM ontime
WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
GROUP BY DayOfWeek
ORDER BY c DESC;Q3
SELECT Origin, count(*) AS c
FROM ontime
WHERE DepDelay>10 AND Year>=2000 AND Year<=2008
GROUP BY Origin
ORDER BY c DESC
LIMIT 10;Q4
SELECT Carrier, count(*)
FROM ontime
WHERE DepDelay>10 AND Year=2007
GROUP BY Carrier
ORDER BY count(*) DESC;Q5
set joined_subquery_requires_alias=0;
set any_join_distinct_right_table_keys=1;
SELECT Carrier, c, c2, c*100/c2 as c3
FROM
(
SELECT
Carrier,
count(*) AS c
FROM ontime
WHERE DepDelay>10
AND Year=2007
GROUP BY Carrier
)
ANY INNER JOIN
(
SELECT
Carrier,
count(*) AS c2
FROM ontime
WHERE Year=2007
GROUP BY Carrier
) USING Carrier
ORDER BY c3 DESC;Q6
set joined_subquery_requires_alias=0;
SELECT Carrier, c, c2, c*100/c2 as c3
FROM
(
SELECT
Carrier,
count(*) AS c
FROM ontime
WHERE DepDelay>10
AND Year>=2000 AND Year<=2008
GROUP BY Carrier
)
ANY INNER JOIN
(
SELECT
Carrier,
count(*) AS c2
FROM ontime
WHERE Year>=2000 AND Year<=2008
GROUP BY Carrier
) USING Carrier
ORDER BY c3 DESC;Q7
set joined_subquery_requires_alias=0;
SELECT Year, c1/c2
FROM
(
select
Year,
count(*)*100 as c1
from ontime
WHERE DepDelay>10
GROUP BY Year
)
ANY INNER JOIN
(
select
Year,
count(*) as c2
from ontime
GROUP BY Year
) USING (Year)
ORDER BY Year;Q8
SELECT DestCityName, uniqExact(OriginCityName) AS u
FROM ontime
WHERE Year>=2000 and Year<=2010
GROUP BY DestCityName
ORDER BY u DESC
LIMIT 10;Q9
SELECT Year, count(*) AS c1
FROM ontime
GROUP BY Year;Q10
SELECT
min(Year), max(Year), Carrier, count(*) AS cnt,
sum(ArrDelayMinutes>30) AS flights_delayed,
round(sum(ArrDelayMinutes>30)/count(*),2) AS laju
FROM ontime
WHERE
DayOfWeek NOT IN (6,7) AND OriginState NOT IN ('AK', 'HI', 'PR', 'VI')
AND DestState NOT IN ('AK', 'HI', 'PR', 'VI')
AND FlightDate < '2010-01-01'
GROUP by Carrier
HAVING cnt>100000 and max(Year)>1990
ORDER by laju DESC
LIMIT 1000;Gunakan dataset Star Schema untuk menguji kinerja
1. Hubungkan ke kluster
Jalankan perintah berikut di clickhouse-client untuk terhubung ke kluster ApsaraDB for ClickHouse dalam lingkungan Linux:
./clickhouse-client --host=<Titik akhir kluster> --port=<Nomor port TCP> --user=<Nama pengguna akun database> --password=<Kata sandi akun database>2. Buat tabel
Unduh dan jalankan pernyataan pembuatan tabel berdasarkan edisi kluster ApsaraDB for ClickHouse.
Skrip untuk membuat tabel di kluster ApsaraDB for ClickHouse Edisi Single-replica
Skrip untuk membuat tabel di kluster ApsaraDB for ClickHouse Edisi Double-replica
3. Hasilkan file data
Jalankan perintah Shell berikut untuk mengkloning proyek pembuatan data ssb-dbgen, gunakan Make untuk mengompilasi proyek, dan hasilkan file data. Dalam contoh ini, empat file data dihasilkan: customer.tbl, lineorder.tbl, part.tbl, dan supplier.tbl.
Parameter -s 100 dalam contoh menentukan bahwa sekitar 600 juta baris data dihasilkan setelah perintah dijalankan. Data tersebut berukuran 67 GB. Anda dapat menyesuaikan ukuran data sesuai kebutuhan.
git clone https://github.com/vadimtk/ssb-dbgen.git
cd ssb-dbgen
make
./dbgen -s 100 -T c
./dbgen -s 100 -T l
./dbgen -s 100 -T p
./dbgen -s 100 -T s4. Impor data ke ApsaraDB for ClickHouse
Jalankan perintah berikut untuk mengonfigurasi variabel lingkungan ck_url, ck_user, ck_pass, dan ck_port serta mengimpor data:
export ck_url=<Titik akhir kluster>
export ck_user=<Nama pengguna akun database>
export ck_pass=<Kata sandi akun database>
export ck_port=<Nomor port TCP>
./clickhouse-client -h <Titik akhir kluster> -u <Nama pengguna akun database> --password <Kata sandi akun database> --port <Nomor port TCP> --query "INSERT INTO customer FORMAT CSV" < ssb-dbgen/customer.tbl
./clickhouse-client -h <Titik akhir kluster> -u <Nama pengguna akun database> --password <Kata sandi akun database> --port <Nomor port TCP> --query "INSERT INTO part FORMAT CSV" < ssb-dbgen/part.tbl
./clickhouse-client -h <Titik akhir kluster> -u <Nama pengguna akun database> --password <Kata sandi akun database> --port <Nomor port TCP> --query "INSERT INTO supplier FORMAT CSV" < ssb-dbgen/supplier.tbl
./clickhouse-client -h <Titik akhir kluster> -u <Nama pengguna akun database> --password <Kata sandi akun database> --port <Nomor port TCP> --query "INSERT INTO lineorder FORMAT CSV" < ssb-dbgen/lineorder.tbl 5. Gunakan set data uji
Set data uji dirujuk dari situs resmi ClickHouse sumber terbuka. Untuk informasi lebih lanjut, lihat Star Schema Benchmark.
Q0
INSERT INTO lineorder_flat
SELECT
l.LO_ORDERKEY AS LO_ORDERKEY,
l.LO_LINENUMBER AS LO_LINENUMBER,
l.LO_CUSTKEY AS LO_CUSTKEY,
l.LO_PARTKEY AS LO_PARTKEY,
l.LO_SUPPKEY AS LO_SUPPKEY,
l.LO_ORDERDATE AS LO_ORDERDATE,
l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,
l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,
l.LO_QUANTITY AS LO_QUANTITY,
l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,
l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,
l.LO_DISCOUNT AS LO_DISCOUNT,
l.LO_REVENUE AS LO_REVENUE,
l.LO_SUPPLYCOST AS LO_SUPPLYCOST,
l.LO_TAX AS LO_TAX,
l.LO_COMMITDATE AS LO_COMMITDATE,
l.LO_SHIPMODE AS LO_SHIPMODE,
c.C_NAME AS C_NAME,
c.C_ADDRESS AS C_ADDRESS,
c.C_CITY AS C_CITY,
c.C_NATION AS C_NATION,
c.C_REGION AS C_REGION,
c.C_PHONE AS C_PHONE,
c.C_MKTSEGMENT AS C_MKTSEGMENT,
s.S_NAME AS S_NAME,
s.S_ADDRESS AS S_ADDRESS,
s.S_CITY AS S_CITY,
s.S_NATION AS S_NATION,
s.S_REGION AS S_REGION,
s.S_PHONE AS S_PHONE,
p.P_NAME AS P_NAME,
p.P_MFGR AS P_MFGR,
p.P_CATEGORY AS P_CATEGORY,
p.P_BRAND AS P_BRAND,
p.P_COLOR AS P_COLOR,
p.P_TYPE AS P_TYPE,
p.P_SIZE AS P_SIZE,
p.P_CONTAINER AS P_CONTAINER
FROM lineorder AS l
INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY
INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY
INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;Q1
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;Q2
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;Q3
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994
AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;Q4
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY
year,
P_BRAND
ORDER BY
year,
P_BRAND;Q5
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA'
GROUP BY
year,
P_BRAND
ORDER BY
year,
P_BRAND;Q6
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE'
GROUP BY
year,
P_BRAND
ORDER BY
year,
P_BRAND;Q7
SELECT
C_NATION,
S_NATION,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997
GROUP BY
C_NATION,
S_NATION,
year
ORDER BY
year ASC,
revenue DESC;Q8
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC;Q9
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC;Q10
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC;Q11
SELECT
toYear(LO_ORDERDATE) AS year,
C_NATION,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY
year,
C_NATION
ORDER BY
year ASC,
C_NATION ASC;Q12
SELECT
toYear(LO_ORDERDATE) AS year,
S_NATION,
P_CATEGORY,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY
year,
S_NATION,
P_CATEGORY
ORDER BY
year ASC,
S_NATION ASC,
P_CATEGORY ASC;Q13
SELECT
toYear(LO_ORDERDATE) AS year,
S_CITY,
P_BRAND,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14'
GROUP BY
year,
S_CITY,
P_BRAND
ORDER BY
year ASC,
S_CITY ASC,
P_BRAND ASC;