本文通過樣本為您介紹,如何快速將資料隨機寫入ClickHouse叢集各個節點的本地表。
前提條件
已建立ClickHouse叢集,詳情請參見建立ClickHouse叢集。
操作步驟
使用SSH方式登入ClickHouse叢集,詳情請參見登入叢集。
執行以下命令,下載官方範例資料集。
curl https://datasets.clickhouse.com/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv執行如下命令,啟動ClickHouse用戶端。
clickhouse-client -h core-1-1 -m說明本樣本登入core-1-1節點,如果您有多個Core節點,可以登入任意一個節點。
執行如下命令,建立資料庫。
可以使用on CLUSTER參數在叢集的所有節點建立資料庫,預設叢集標識為cluster_emr。
CREATE DATABASE IF NOT EXISTS demo on CLUSTER cluster_emr;返回資訊如下所示。

在叢集上的所有節點建立一張複製表(Replicated表)。
複製表(Replicated表)會根據副本的個數,實現資料的多副本,並實現資料的最終一致性。
CREATE TABLE demo.hits_local ON CLUSTER cluster_emr ( `WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date, `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8, `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String, `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32), `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String, `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8, `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8, `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16, `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16, `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64, `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8, `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16, `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32, `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16, `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32, `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32, `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64, `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String, `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String, `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32, `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String, `ParsedParams` Nested(Key1 String,Key2 String,Key3 String,Key4 String,Key5 String,ValueDouble Float64), `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8 ) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/hits_local', '{replica}') PARTITION BY toYYYYMM(EventDate) ORDER BY (CounterID, EventDate, intHash32(UserID)) SAMPLE BY intHash32(UserID);說明{shard}和{replica}是阿里雲EMR為ClickHouse叢集自動產生的宏定義,可以直接使用。
執行以下命令,建立分布式(Distributed)表。
分布式表不儲存資料,僅僅是底層表的一個View,但可以在多個伺服器上進行分散式查詢。本例中使用隨機函數rand(),表示資料會隨機寫入各個節點的本地表。
CREATE TABLE demo.hits_all on CLUSTER cluster_emr AS demo.hits_local ENGINE = Distributed(cluster_emr, demo, hits_local, rand());退出ClickHouse用戶端,在範例資料的目錄下執行以下命令,匯入資料。
clickhouse-client -h core-1-1 --query "INSERT INTO demo.hits_all FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv;重新啟動ClickHouse用戶端,查看資料。
因為資料是隨機寫入的,各節點資料量可能不同。
查看core-1-1節點demo.hits_all的資料量。
select count(*) from demo.hits_all;查看core-1-1節點demo.hits_local的資料量。
select count(*) from demo.hits_local;查看core-1-2節點demo.hits_local的資料量。
說明其餘節點,您也可以按照以下步驟來查看demo.hits_local的資料量。節點名稱您可以在EMR控制台的節點管理頁面查看。
執行以下命令,登入ClickHouse用戶端。
clickhouse-client -h core-1-2 -m在ClickHouse用戶端,執行以下命令,查看demo.hits_local的資料量。
select count(*) from demo.hits_local;