# TuGraph-DataX

> This document covers building and installing TuGraph DataX and gives usage examples.

## 1. Introduction

Building on Alibaba's open-source DataX, TuGraph adds a TuGraph writer plugin and support for the TuGraph jsonline data format, so that other data sources can write data into TuGraph through DataX.
TuGraph DataX is described at [https://github.com/TuGraph-family/DataX](https://github.com/TuGraph-family/DataX). Supported features include:

- Importing into TuGraph from heterogeneous data sources such as MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, HBase, OTS, ODPS, and Kafka
- Exporting from TuGraph to the corresponding target stores (to be developed)

![Import and export](../../../images/tugraph-datax.png)

For the original DataX project, see [https://github.com/alibaba/DataX](https://github.com/alibaba/DataX).

## 2. Build and installation

```bash
git clone https://github.com/TuGraph-family/DataX.git
cd DataX
yum install maven
mvn -U clean package assembly:assembly -Dmaven.test.skip=true
```

The compiled DataX files are placed in the `target` directory.

## 3. Importing into TuGraph

### 3.1. Importing text data into TuGraph with DataX

This example uses the data from the `lgraph_import` import tool chapter of the TuGraph manual. There are three CSV data files:

`actors.csv`

```
nm015950,Stephen Chow
nm0628806,Man-Tat Ng
nm0156444,Cecilia Cheung
nm2514879,Yuqi Zhang
```

`movies.csv`

```
tt0188766,King of Comedy,1999,7.3
tt0286112,Shaolin Soccer,2001,7.3
tt4701660,The Mermaid,2016,6.3
```

`roles.csv`

```
nm015950,Tianchou Yin,tt0188766
nm015950,Steel Leg,tt0286112
nm0628806,,tt0188766
nm0628806,coach,tt0286112
nm0156444,PiaoPiao Liu,tt0188766
nm2514879,Ruolan Li,tt4701660
```

Next, create three DataX job configuration files:

`job_actors.json`

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["actors.csv"],
            "encoding": "UTF-8",
            "column": [
              { "index": 0, "type": "string" },
              { "index": 1, "type": "string" }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "url": "bolt://127.0.0.1:27687",
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "labelType": "VERTEX",
            "labelName": "actor",
            "batchNum": 1000,
            "properties": ["aid", "name"]
          }
        }
      }
    ]
  }
}
```

`job_movies.json`

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["movies.csv"],
            "encoding": "UTF-8",
            "column": [
              { "index": 0, "type": "string" },
              { "index": 1, "type": "string" },
              { "index": 2, "type": "string" },
              { "index": 3, "type": "string" }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "url": "bolt://127.0.0.1:27687",
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "labelType": "VERTEX",
            "labelName": "movie",
            "batchNum": 1000,
            "properties": ["mid", "name", "year", "rate"]
          }
        }
      }
    ]
  }
}
```

`job_roles.json`

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "txtfilereader",
          "parameter": {
            "path": ["roles.csv"],
            "encoding": "UTF-8",
            "column": [
              { "index": 0, "type": "string" },
              { "index": 1, "type": "string" },
              { "index": 2, "type": "string" }
            ],
            "fieldDelimiter": ","
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "url": "bolt://127.0.0.1:27687",
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "labelType": "EDGE",
            "labelName": "play_in",
            "batchNum": 1000,
            "properties": ["SRC_ID", "role", "DST_ID"],
            "startLabel": {"type": "actor", "key": "SRC_ID"},
            "endLabel": {"type": "movie", "key": "DST_ID"}
          }
        }
      }
    ]
  }
}
```

After starting TuGraph with `./lgraph_server -c lgraph_standalone.json -d 'run'`, run the following three commands in order:

```
python3 datax/bin/datax.py job_actors.json
```

```
python3 datax/bin/datax.py job_movies.json
```

```
python3 datax/bin/datax.py job_roles.json
```

### 3.2. Importing MySQL data into TuGraph with DataX

Create the following `movies` table in the `test` database:

```sql
CREATE TABLE `movies` (
  `mid`  varchar(200) NOT NULL,
  `name` varchar(100) NOT NULL,
  `year` int(11) NOT NULL,
  `rate` float(5,2) unsigned NOT NULL,
  PRIMARY KEY (`mid`)
);
```

Insert a few rows into the table:

```sql
insert into
test.movies (mid, name, year, rate)
values
('tt0188766', 'King of Comedy', 1999, 7.3),
('tt0286112', 'Shaolin Soccer', 2001, 7.3),
('tt4701660', 'The Mermaid', 2016, 6.3);
```

Create a DataX job configuration file named `job_mysql_to_tugraph.json`. It can be written in either of two ways.

**Specifying the table and columns**

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "column": ["mid", "name", "year", "rate"],
            "splitPk": "mid",
            "connection": [
              {
                "table": ["movies"],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "url": "bolt://127.0.0.1:27687",
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "labelType": "VERTEX",
            "labelName": "movie",
            "batchNum": 1000,
            "properties": ["mid", "name", "year", "rate"]
          }
        }
      }
    ]
  }
}
```

**Writing a simple SQL query**

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "querySql": [
                  "select mid, name, year, rate from test.movies where year > 2000;"
                ],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/test?useSSL=false"]
              }
            ]
          }
        },
        "writer": {
          "name": "tugraphwriter",
          "parameter": {
            "url": "bolt://127.0.0.1:27687",
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "default",
            "labelType": "VERTEX",
            "labelName": "movie",
            "batchNum": 1000,
            "properties": ["mid", "name", "year", "rate"]
          }
        }
      }
    ]
  }
}
```

After starting TuGraph with `./lgraph_server -c lgraph_standalone.json -d 'run'`, run the following command:

```shell
python3 datax/bin/datax.py job_mysql_to_tugraph.json
```

## 4. Exporting from TuGraph

### 4.1. Sample configuration

TuGraph supports exporting data with DataX. The following configuration exports data to text files:

```json
{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "tugraphreader",
          "parameter": {
            "username": "admin",
            "password": "73@TuGraph",
            "graphName": "Movie_8C5C",
            "queryCypher": "match (n:person) return n.id,n.name,n.born;",
            "url": "bolt://127.0.0.1:27687"
          }
        },
        "writer": {
          "name": "txtfilewriter",
          "parameter": {
            "path": "./result",
            "fileName": "luohw",
            "writeMode": "truncate"
          }
        }
      }
    ]
  }
}
```

With this configuration, the `id`, `name`, and `born` properties of all `person` vertices in the TuGraph subgraph `Movie_8C5C` are exported to the `result` directory under the current directory, in a file named `luohw` plus a random suffix.

### 4.2. Parameter description

When exporting TuGraph data with DataX, set the reader to `tugraphreader` and configure the following five parameters:

* **url**
    * Description: the address of the TuGraph Bolt server
    * Required: yes
    * Default: none
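* **username**
    * Description: the TuGraph username
    * Required: yes
    * Default: none
* **password**
    * Description: the TuGraph password
    * Required: yes
    * Default: none
* **graphName**
    * Description: the TuGraph subgraph to synchronize
    * Required: yes
    * Default: none
* **queryCypher**
    * Description: the Cypher statement used to read data from TuGraph
    * Required: no
    * Default: none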
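
The five reader parameters above can be assembled into an export job file programmatically. The following is a minimal sketch, not part of TuGraph DataX itself: the helper name `build_export_job` and the `REQUIRED` tuple are hypothetical, and the check simply mirrors the required/optional flags listed above. The writer settings are copied from the sample in section 4.1.

```python
import json

# Parameters flagged as required in the parameter list above (assumption:
# this tuple only mirrors that list; DataX does its own validation at run time).
REQUIRED = ("url", "username", "password", "graphName")


def build_export_job(reader_params, out_dir, file_name):
    """Hypothetical helper: build a DataX job dict that reads from TuGraph
    with tugraphreader and writes text files with txtfilewriter."""
    missing = [key for key in REQUIRED if not reader_params.get(key)]
    if missing:
        raise ValueError(
            "missing required tugraphreader parameters: " + ", ".join(missing)
        )
    return {
        "job": {
            "setting": {"speed": {"channel": 1}},
            "content": [
                {
                    "reader": {
                        "name": "tugraphreader",
                        "parameter": dict(reader_params),
                    },
                    "writer": {
                        "name": "txtfilewriter",
                        "parameter": {
                            "path": out_dir,
                            "fileName": file_name,
                            "writeMode": "truncate",
                        },
                    },
                }
            ],
        }
    }


# Values taken from the sample configuration in section 4.1.
job = build_export_job(
    {
        "url": "bolt://127.0.0.1:27687",
        "username": "admin",
        "password": "73@TuGraph",
        "graphName": "Movie_8C5C",
        "queryCypher": "match (n:person) return n.id,n.name,n.born;",
    },
    out_dir="./result",
    file_name="luohw",
)

# Serialize the job; the text can be saved to a .json file and passed to datax.py.
job_text = json.dumps(job, indent=2)
print(job_text)
```

Saving the printed JSON to a file and passing that file to `python3 datax/bin/datax.py` would run the export in the same way as the hand-written configuration. Generating the file this way mainly helps when several subgraphs or queries need to be exported with otherwise identical settings.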