ClickHouse
ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).
Column-oriented databases store records in blocks grouped by columns instead of rows, spend less time reading data while completing queries.
OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling.
Reference:
1. ClickHouse
2. Installation
Installation
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E0C56BD4$ echo "deb http://repo.yandex.ru/clickhouse/deb/stable/ main/" | sudo tee /etc/apt/sources.list.d/clickhouse.list$ sudo apt-get update$ sudo apt-get install -y clickhouse-server clickhouse-client
Server commands
$ sudo service clickhouse-server start # To start server$ sudo service clickhouse-server status # To check status$ sudo service clickhouse-server stop # To stop server
Introduction of Clickhouse
1. To Run client session
$ clickhouse-client
2. Create Database
Syntax:
CREATE DATABASE database_name;
Ex:
CREATE DATABASE myDB;
3. Use Database
USE myDB;
4. Create Table
Syntax:
CREATE TABLE table_name (column_name1 column_type [options],column_name2 column_type [options],…) ENGINE = engine;
Ex:
CREATE TABLE student (id UInt64,name String,marks Float64,created DateTime) ENGINE = MergeTree()ORDER BY id ;
5. Data Types
a. UInt64 : storing integer values in the range 0 to 18446744073709551615.
b. Float64 : storing floating point numbers such as 12349.23, 132.3 etc.
c. String : storing variable length characters (Doesn’t require max length).
d. Date : storing dates in format “YYYY-MM-DD”.
e. DateTime: storing datetime in format “YYYY-MM-DD HH:MM:SS”.
6. INSERT, UPDATE, DELETE Data and Columns
a. INSERT Syntax
INSERT INTO table_name VALUES (column_1_value, column_2_value, ….);
Ex:
INSERT INTO student VALUES (1, “Adam”, 10.5, ‘2019–01–01 00:01:01’);INSERT INTO student VALUES (2, “Bruce”, 40.2, ‘2019–01–03 10:01:01’);INSERT INTO student VALUES (3, “Charlie”, 13, ‘2019–01–03 12:01:01’);INSERT INTO student VALUES (4, “Dome”, 20.8, ‘2019–01–04 02:01:01’);
b. UPDATE Syntax
ALTER TABLE table_name UPDATE column_1 = value_1, column_2 = value_2 … WHERE filter_conditions;
Ex:
ALTER TABLE student UPDATE marks=25.5 WHERE name=’Adam’;
c. DELETE Syntax
ALTER TABLE table_name DELETE WHERE filter_conditions;
Ex:
ALTER TABLE student DELETE WHERE name=’Bruce’;
d. Add Columns
ALTER TABLE table_name ADD COLUMN column_name column_type;
Ex:
ALTER TABLE student ADD COLUMN labs String;
e. Drop Column
ALTER TABLE table_name DROP COLUMN column_name;
Ex:
ALTER TABLE student DROP COLUMN labs;
f. RENAME Table
RENAME TABLE table_name to new_table_name;
Ex:
RENAME TABLE student to students;
7. SELECT Querying Data
a. SELECT Syntax
SELECT func_1(column_1), func_2(column_2) FROM table_name WHERE filter_conditions row_options;
Ex:
SELECT name, marks FROM students WHERE marks > 20 LIMIT 2;
b. Aggregate functions
1. count: returns the count of rows matching the conditions specified.
2. sum: returns the sum of selected column values.
3. avg: returns the average of selected column values.
4. uniq: returns an approximate number of distinct rows matched.
5. uniqExact: returns exact distinct rows matched.(Takes memory to traverse)
6. topK: returns an array of the most frequent values of a specific column using an approximation algorithm.
Ex:
SELECT SUM(marks) FROM students;SELECT topK(2)(name) FROM students;
To Check PROCESSLIST
SHOW processlist
or to get some columns from processlist
SELECT query_id, elapsed time, is_cancelled as canc, read_rows, memory_usage mem, peak_memory_usage peak_mem, query FROM system.processes
IMPORTING CSV file to clickhouse
Syntax:
cat /path/to/file/filename.csv | clickhouse-client — query=’Insert into db_name.table_name FORMAT CSV’;
Ex:
cat /path/to/file/student.csv | clickhouse-client — query=’Insert into myDB.students FORMAT CSV’;
EXPORTING Clickhouse data to CSV file
Syntax:
clickhouse-client — query “SELECT * from db_name.table_name” — format FormatName > result.txt
Ex:
clickhouse-client — query “select * from myDB.students” — format CSV > /path/to/file/student.csv
REMOVING DUPLICATE ROWS
[NOTE: Use Engine as engine=ReplacingMergeTree]
Syntax:
OPTIMIZE TABLE table_name FINAL;
Ex:
create table xx (A String, X UInt64) engine=ReplacingMergeTree order by A;insert into xx values (‘a’, 1);insert into xx values (‘a’, 1);select * from xx;
┌─A─┬─X─┐│ a │ 1 ││ a │ 1 │└───┴───┘optimize table xx final;select * from xx┌─A─┬─X─┐│ a │ 1 │└───┴───┘
INSERTING data from Other tables
Syntax:
INSERT into db_name.table_name select col1, col2, … from db_name.tbl_name limit 100
Ex:
INSERT into myDB.stud select * from myDB.students limit 100;
DROP Table and Databases
Syntax:
DROP TABLE table_name;DROP DATABASE database_name;
Ex:
DROP TABLE students;DROP DATABASE myDB;