在虚拟机上测评了下mysql 和 PostgreSQL 的各种LOAD FILE方式以及时间。
MySQL 工具:
1. 自带mysqlimport工具。
2. 命令行 load data infile ...
3. 利用mysql-connector-python Driver来写的脚本。
PostgreSQL 工具:
1. pgloader 第三方工具。
2. 命令行 copy ... from ...
3. 利用psycopg2写的python 脚本。
测试表结构:
测试CSV文件:
t1.csv
MySQL 自身的loader: (时间24妙)
MySQL python 脚本:(时间23秒)
>>>
Running 23.289 Seconds
MySQL 自带mysqlimport:(时间23秒)
[root@mysql56-master ~]# time mysqlimport t_girl '/tmp/t1.csv' --fields-terminated-by=',' --fields-enclosed-by='"' --lines-terminated-by='rn' --use-threads=2 -uroot -proot
t_girl.t1: Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0
real 0m23.664s
user 0m0.016s
sys 0m0.037s
PostgreSQL 自身COPY:(时间7秒)
t_girl=# copy t1 from '/tmp/t1.csv' with delimiter ',';
COPY 1000000
Time: 7700.332 ms
Psycopg2 驱动copy_to方法:(时间6秒)
[root@postgresql-instance scripts]# python load_data.py
Running 5.969 Seconds.
Pgloader 导入CSV:(时间33秒)
[root@postgresql-instance ytt]# pgloader commands.load
table name read imported errors time
ytt.t1 1000000 1000000 0 33.514s
------------------------------ --------- --------- --------- --------------
------------------------------ --------- --------- --------- --------------
Total import time 1000000 1000000 0 33.514s
Pgloader 直接从MySQL 拉数据:(时间51秒)
[root@postgresql-instance ytt]# pgloader commands.mysql
table name read imported errors time
fetch meta data 2 2 0 0.138s
------------------------------ --------- --------- --------- --------------
t1 1000000 1000000 0 51.136s
------------------------------ --------- --------- --------- --------------
------------------------------ --------- --------- --------- --------------
------------------------------ --------- --------- --------- --------------
Total import time 1000000 1000000 0 51.274s
附上commands.load和commands.mysql
commands.load:
LOAD CSV
FROM '/tmp/ytt.csv' WITH ENCODING UTF-8
(
id, rank, log_time
)
INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1
WITH skip header = 0,
fields optionally enclosed by '"',
fields escaped by backslash-quote,
fields terminated by ','
SET work_mem to '32 MB', maintenance_work_mem to '64 MB';
commands.mysql:
LOAD DATABASE
FROM mysql://python_user:python_user@192.168.1.131:3306/t_girl?t1
INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1
with data only
SET maintenance_work_mem to '64MB',
work_mem to '3MB',
search_path to 'ytt';
附pgloader 手册:
http://pgloader.io/howto/pgloader.1.html