Clickhouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP). It is usually built for SEARCH queries performance on a very large volume of databases.
For clickhouse installation refer to my previous post about the introduction of clickhouse Intro to ClickHouse with demo
Reference: clickhouse-mysql-data-reader
Installation and Configuration
Before going for database migration, we need to install some library that helps to migrate data smoothly.
Install some python libraries: mysqlclient, mysql-replication, clickhouse-mysql.
Be sure to install all the libraries.
$ sudo apt-get install python3-dev default-libmysqlclient-dev build-essential
$ sudo apt-get install python3-pymysql$ pip3 install mysqlclient
$ pip3 install mysql-replication
$ pip3 install clickhouse-mysql
# If getting error on pymysql, Try to reinstall pymysql
$ pip3 install --user 'pymysql<0.9'
Create MySQL users required by Clickhouse
After successful installation of required libraries. Create the mysql users and give all the permission that is required by the clickhouse to access mysql data. For running the below command, need to be login in mysql terminal.
mysql> CREATE USER 'new_user'@'%' IDENTIFIED BY 'My@123' ;
mysql> CREATE USER 'new_user'@'127.0.0.1' IDENTIFIED BY 'My@123' ;
mysql> CREATE USER 'new_user'@'localhost' IDENTIFIED BY 'My@123' ;
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'@'%' ;
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'@'127.0.0.1' ;
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE, SUPER ON *.* TO 'new_user'@'localhost' ;
Make sure to have MySQL settings
After creating the mysql users, do the below changes in the mysql configuration file: my.cnf
server_id = <your id>
binlog_format = ROW
binlog_row_image = FULL# To Check the details, type following commands in MySQL console
mysql> Select @@server_id;
mysql> Select @@binlog_format
mysql> Select @@binlog_row_image
I - Migrate data from MySQL to Clickhouse
Create schema of MySQL table for Clickhouse
For migration of mysql table, we need the table structure from mysql which is used to create the same table structure in clickhouse as per the clickhouse create table syntax.
Below commands will generate the table schema for clickhouse.
Syntax
clickhouse-mysql \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-passowrd> \
--create-table-sql-template \
--with-create-database \
--src-tables=<dbname>.<table-name> > <table-name>.sql
Example:
clickhouse-mysql \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--create-table-sql-template \
--with-create-database \
--src-tables=mysql_DB.my_tbl > my_tbl.sql
The above command is using source host ‘127.0.0.1’ with given credentials to generate the table schema in the my_tbl.sql file.
- Before proceeding next steps, change some schema values like Engine, DataType, and remove params from MergeTree(<param1>,<param2>) as convenient.
Create Table in Clickhouse
After generating the my_tbl.sql file, create a table in clickhouse using the below commands.
Why do we need the same schema structure??
Because during migration or replication of data from mysql to clickhouse, the values should be in proper order, so it will insert the same data in the clickhouse table.
Syntax
clickhouse-client \
--host=<clichouse_host> \
--password=<clickhouse_password> \
-mn < /file/path/to/<table_name>.sqlclickhouse-client \
--host=127.0.0.1 \
--password=My@123 \
-mn < /file/path/to/my_tbl.sql
The above command is creating the clickhouse table using the my_tbl.sql file.
- If facing the error of merge engine, edit the .sql file and add “ORDER BY (<col name>)” after Engine.
Migrate Table data from MySQL to Clickhouse
After successful creation of mysql table in the clickhouse. Migrate the mysql table data into the Clickhouse table.
Syntax:
clickhouse-mysql \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--migrate-table \
--src-tables=<mysql-dbname>.<mysql-table_name> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse-dbname> \
--dst-table <clickhouse-table_name> \
--log-file=/file/path/to/<table_name_migt>.log
Example:
clickhouse-mysql \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--migrate-table \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-file=/file/path/to/my_tbl_migt.log
The above command is connecting to mysql on host ‘127.0.0.1’ with given credentials, it will copy the data from mysql_DB.my_tbl to the clickhouse table clickhouse_DB.my_tbl.
Logfile print all logs of migrations.
II - Replicate MySQL Table to Clickhouse
As the data is migrated to clickhouse, but what about the new data in mysql. For that replication of the mysql table, data is needed to be replicated in the clickhouse database.
It's an additional step where any new entry is inserted in mysql, it will automatically reflect in clickhouse if the table is syncing with clickhouse.
The below command is used for syncing the table.
Syntax:
clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse_DB> \
--dst-table <clickhouse_tbl> \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--log-file=/file/path/to/<table_name_repl>.log
Example:
clickhouse-mysql \
--src-server-id=1 \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--log-file=/file/path/to/my_tbl_repl.log
The above command connects the mysql host 127.0.0.1 with the given credentials and replicates the mysql table mysql_DB.my_tbl in the clickhouse table clickhouse_DB.my_tbl
NOTE: To run this command in the background use & after the command, and for any reason to kill this background command, use grep method to find and kill the process id.
$ sudo ps aux | grep clickhouse
$ Kill -9 <process id>
III - Create Schema & Migrate table in Clickhouse
The scenario of creating a mysql table schema and then migrating the table data in clickhouse, it can be done using only one command.
Syntax:
clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-create-table \
--migrate-table
Example:
clickhouse-mysql \
--src-server-id=1 \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=root \
--src-password=root \
--src-tables=myDB.tbl_dnd \
--dst-host=127.0.0.1 \
--dst-create-table \
--migrate-table
The above command will create a table schema and migrate the table data into clickhouse.
IV - Migration & Replication in One command
If it needs to migrate and replicate the mysql table data to clickhouse using one command, we can follow the below command.
So with the migration of the table data, it will start replicating the new insertion value of the mysql table into the clickhouse table.
Syntax:
clickhouse-mysql \
--src-server-id=<mysql-server-id> \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=<mysql-host> \
--src-user=<mysql-user> \
--src-password=<mysql-password> \
--src-tables=<mysql_DB>.<my_tbl> \
--dst-host=<clickhouse-host> \
--dst-password=<clickhouse-password> \
--dst-schema <clickhouse_DB> \
--dst-table <clickhouse_tbl> \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--migrate-table \
--log-file=/file/path/to/<table_name_migt_repl>.log
Example:
clickhouse-mysql \
--src-server-id=1 \
--src-resume \
--src-wait \
--nice-pause=1 \
--src-host=127.0.0.1 \
--src-user=new_user \
--src-password=My@123 \
--src-tables=mysql_DB.my_tbl \
--dst-host=127.0.0.1 \
--dst-password=My@123 \
--dst-schema clickhouse_DB \
--dst-table my_tbl \
--log-level=info \
--csvpool \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=1000 \
--pump-data \
--migrate-table \
--log-file=/file/path/to/my_tbl_migt_repl.log
Validate the successful data import by counting the records of a table
Running the normal select query in both database mysql & clickhouse,
As mysql select query is giving the result in minutes while in clickhouse it returns within a seconds.
MySQL
mysql> SELECT COUNT(*) FROM mysql_DB.my_tbl\G *************************** 1. row *************************** COUNT(*): 935623858
1 row in set (5 min 17.12 sec)
Clickhouse
ClickHouse-LAB :) SELECT COUNT(*) FROM clickhouse_DB.my_tbl\GSELECT COUNT(*)
FROM clickhouse_DB.my_tblRow 1:
────────────
COUNT(): 9356238581 rows in set. Elapsed: 0.233 sec. Processed 935.62 million rows, 491.10 MB (4.91 billion
rows/s., 4.91 GB/s.)
The above count and timing may vary in the different systems.