<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>Canal by alibaba</title>
<link rel="stylesheet" href="stylesheets/styles.css">
<link rel="stylesheet" href="stylesheets/pygment_trac.css">
<script src="javascripts/scale.fix.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
<header>
<h1 class="header">Canal</h1>
<p class="header">Alibaba's incremental subscription &amp; consumption component for the MySQL binlog</p>
<ul>
<li class="download"><a class="buttons" href="https://github.com/alibaba/canal/zipball/master">Download ZIP</a></li>
<li class="download"><a class="buttons" href="https://github.com/alibaba/canal/tarball/master">Download TAR</a></li>
<li><a class="buttons github" href="https://github.com/alibaba/canal">View On GitHub</a></li>
</ul>
<p class="header">This project is maintained by <a class="header name" href="https://github.com/alibaba">alibaba</a></p>
</header>
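<section>
<div>
<div>
<p> </p>
<h1>Background</h1>
<p> In the early days, Alibaba's B2B company ran data centers in both Hangzhou and the United States, which created a business need for cross-data-center synchronization. Early database synchronization relied mainly on triggers to capture incremental changes. Starting in 2010, the Alibaba group companies gradually moved to parsing database logs to capture incremental changes for synchronization. The incremental subscription &amp; consumption business grew out of that work and marked the start of a new era. P.S. The synchronization currently used internally already supports log parsing for MySQL 5.x and some Oracle versions.</p>
<p> </p>
<p>Businesses supported by log-based incremental subscription &amp; consumption:</p>
<ol>
<li>Database mirroring</li>
<li>Real-time database backup</li>
<li>Multi-level indexes (separate sharded indexes for sellers and buyers)</li>
<li>Search build</li>
<li>Business cache refresh</li>
<li>Important business messages such as price changes</li>
</ol>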
<h1>Project Introduction</h1>
<p> Name: canal [kə'næl]</p>
<p> Meaning: waterway / pipeline / ditch </p>
<p> Language: pure Java</p>
<p> Positioning: parses database incremental logs to provide incremental data subscription &amp; consumption; currently it mainly supports MySQL</p>
<p> </p>
<h2>How It Works</h2>
<h3>MySQL master-slave replication</h3>
<p><img src="http://dl.iteye.com/upload/attachment/0080/3086/468c1a14-e7ad-3290-9d3d-44ac501a7227.jpg" alt=""><br> At a high level, replication has three steps:</p>
<ol>
<li>The master records changes in its binary log (these records are called binary log events and can be viewed with show binlog events);</li>
<li>The slave copies the master's binary log events to its relay log;</li>
<li>The slave replays the events in the relay log, applying the changes to its own data.</li>
</ol>
<h3>How canal works:</h3>
<p><img width="590" src="http://dl.iteye.com/upload/attachment/0080/3107/c87b67ba-394c-3086-9577-9db05be04c95.jpg" alt="" height="273"></p>
<p>The principle is fairly simple:</p>
<ol>
<li>canal emulates the MySQL slave interaction protocol, presents itself as a MySQL slave, and sends a dump request to the MySQL master</li>
<li>The MySQL master receives the dump request and starts pushing the binary log to the slave (that is, canal)</li>
<li>canal parses the binary log objects (originally a byte stream)</li>
</ol>
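<p>To make the "byte stream" in step 3 concrete, here is a minimal sketch of decoding the fixed 19-byte header that starts every event in a MySQL v4 binlog. It is not canal's parser: the class name <code>BinlogHeaderSketch</code> and the sample values are made up for illustration, and only the header layout (little-endian timestamp, type code, server id, event length, next position, flags) comes from the MySQL binlog format.</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch, not canal's parser: decodes the fixed 19-byte header
// that starts every event in a MySQL v4 binlog (all integers little-endian).
class BinlogHeaderSketch {
    static final int HEADER_LENGTH = 19;

    final long timestamp;    // seconds since the Unix epoch
    final int eventType;     // one-byte event type code
    final long serverId;     // server_id of the MySQL instance that wrote the event
    final long eventLength;  // total event size in bytes, header included
    final long nextPosition; // binlog offset of the event that follows
    final int flags;

    BinlogHeaderSketch(byte[] raw) {
        if (raw.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least " + HEADER_LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        timestamp = buf.getInt() & 0xFFFFFFFFL;    // widen unsigned 32-bit values to long
        eventType = buf.get() & 0xFF;
        serverId = buf.getInt() & 0xFFFFFFFFL;
        eventLength = buf.getInt() & 0xFFFFFFFFL;
        nextPosition = buf.getInt() & 0xFFFFFFFFL;
        flags = buf.getShort() & 0xFFFF;
    }

    public static void main(String[] args) {
        // Hand-built sample header; the values are made up for the demo.
        byte[] raw = ByteBuffer.allocate(HEADER_LENGTH).order(ByteOrder.LITTLE_ENDIAN)
                .putInt(1360078186).put((byte) 2).putInt(1)
                .putInt(HEADER_LENGTH).putInt(4096).putShort((short) 0)
                .array();
        BinlogHeaderSketch header = new BinlogHeaderSketch(raw);
        System.out.println("type=" + header.eventType + " serverId=" + header.serverId
                + " nextPosition=" + header.nextPosition);
    }
}
```

<p>A real parser would go on to decode the event body according to <code>eventType</code>; the sketch stops at the header because that part is the same for every event.</p>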
<h1>Architecture</h1>
<p><img width="548" src="http://dl.iteye.com/upload/attachment/0080/3126/49550085-0cd2-32fa-86a6-f676db5b597b.jpg" alt="" height="238"></p>
<p>Notes:</p>
<ul>
<li>server represents one running canal instance and corresponds to one JVM</li>
<li>instance corresponds to one data queue (1 server has 1..n instances)</li>
</ul>
<p>instance modules:</p>
<ul>
<li>eventParser (data source access: emulates the slave protocol to interact with the master, and parses the protocol)</li>
<li>eventSink (connector between Parser and Store: filters, processes, and dispatches data)</li>
<li>eventStore (data storage)</li>
<li>metaManager (manager for incremental subscription &amp; consumption metadata)</li>
</ul>
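<p>How the four instance modules relate can be sketched with a toy in-memory pipeline. Everything below is hypothetical (the class <code>InstancePipelineSketch</code>, its method names, and the use of plain strings as events); it only illustrates the division of labour described above, not canal's actual interfaces.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy model of one instance: parser -> sink (filter) -> store, plus a consumer position.
class InstancePipelineSketch {
    private final Predicate<String> sinkFilter;           // eventSink stand-in: decides what gets stored
    private final List<String> store = new ArrayList<>(); // eventStore stand-in: unbounded in-memory list
    private int ackedCount = 0;                           // metaManager stand-in: events acknowledged so far

    InstancePipelineSketch(Predicate<String> sinkFilter) {
        this.sinkFilter = sinkFilter;
    }

    // eventParser stand-in: called once per decoded event.
    void onParsedEvent(String event) {
        if (sinkFilter.test(event)) {
            store.add(event);
        }
    }

    // Returns up to batchSize unacknowledged events without moving the consumer position.
    List<String> getWithoutAck(int batchSize) {
        int end = Math.min(store.size(), ackedCount + batchSize);
        return new ArrayList<>(store.subList(ackedCount, end));
    }

    // Advances the consumer position past `count` delivered events.
    void ack(int count) {
        ackedCount = Math.min(store.size(), ackedCount + count);
    }

    public static void main(String[] args) {
        InstancePipelineSketch instance = new InstancePipelineSketch(e -> e.startsWith("test."));
        instance.onParsedEvent("test.xdual:INSERT");
        instance.onParsedEvent("mysql.user:UPDATE"); // dropped by the sink filter
        instance.onParsedEvent("test.xdual:DELETE");
        System.out.println(instance.getWithoutAck(10));
        instance.ack(1);
        System.out.println(instance.getWithoutAck(10));
    }
}
```

<p>Keeping the consumer position separate from the store is what lets a client fetch a batch, fail, and fetch the same batch again before acknowledging it.</p>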
<h3>Data object format: <a href="https://github.com/otter-projects/canal/blob/master/protocol/src/main/java/com/alibaba/otter/canal/protocol/EntryProtocol.proto">EntryProtocol.proto</a>
</h3>
<pre name="code">Entry
    Header
        logfileName [binlog file name]
        logfileOffset [binlog position]
        executeTime [time the change was executed]
        schemaName
        tableName
        eventType [insert/update/delete]
    entryType [transaction begin (BEGIN) / transaction end (END) / row data (ROWDATA)]
    storeValue [byte data; can be expanded into a RowChange]
RowChange
    isDdl [whether this is a DDL change, e.g. create table/drop table]
    sql [the DDL SQL statement]
    rowDatas [the changed data of an insert/update/delete; may contain several rows, since one binlog event can carry multiple changes, e.g. a batch]
        beforeColumns [array of Column]
        afterColumns [array of Column]
Column
    index
    sqlType [jdbc type]
    name [column name]
    isKey [whether the column is part of the primary key]
    updated [whether the value changed]
    isNull [whether the value is null]
    value [the content; note that it is text]</pre>
<p>Notes:</p>
<ul>
<li>Provides column contents both before and after a change, and fills in information that the binlog itself lacks, such as name and isKey</li>
<li>Provides the DDL statement for schema changes</li>
</ul>
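<p>As a rough illustration of how the before and after column images can be used together, the sketch below lists the columns whose <code>updated</code> flag is set. The <code>Column</code> class here is a hypothetical stand-in with only three of the fields listed above, and the code assumes both lists hold the columns in the same order; it is not the generated protobuf API.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowDiffSketch {
    // Hypothetical stand-in for the Column message: only the fields used here.
    static final class Column {
        final String name;
        final String value;
        final boolean updated;

        Column(String name, String value, boolean updated) {
            this.name = name;
            this.value = value;
            this.updated = updated;
        }
    }

    // Returns "name: before -> after" for every after-image column flagged as updated.
    // Assumes before and after list the columns in the same order.
    static List<String> describeChanges(List<Column> before, List<Column> after) {
        List<String> changes = new ArrayList<>();
        for (int i = 0; i < after.size(); i++) {
            Column current = after.get(i);
            if (!current.updated) {
                continue;
            }
            String previous = i < before.size() ? before.get(i).value : null;
            changes.add(current.name + ": " + previous + " -> " + current.value);
        }
        return changes;
    }

    public static void main(String[] args) {
        List<Column> before = Arrays.asList(
                new Column("ID", "4", false),
                new Column("X", "2013-02-05 23:29:46", false));
        List<Column> after = Arrays.asList(
                new Column("ID", "4", false),
                new Column("X", "2013-02-06 08:00:00", true));
        System.out.println(describeChanges(before, after));
    }
}
```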
<h1>QuickStart</h1>
<h2>Notes (MySQL initialization)</h2>
<p>a. canal is built on the MySQL binlog, so you must enable binlog writing in MySQL and set the binlog format to ROW. </p>
<pre name="code">[mysqld]
log-bin=mysql-bin # adding this line enables the binlog
binlog-format=ROW # use ROW format
server_id=1 # required for MySQL replication; must not be the same as canal's slaveId</pre>
<p>b. canal works by emulating a MySQL slave, so the account it uses must have the privileges a MySQL slave needs.</p></div>
<div>
<pre name="code">CREATE USER canal IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
-- GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' ;
FLUSH PRIVILEGES;</pre>
<p>For an existing account, you can check its privileges with SHOW GRANTS.</p>
<h2>Startup steps:</h2>
<p>1. Download canal</p>
<p>Download the deployment package</p>
<pre name="code">wget http://canal4mysql.googlecode.com/files/canal.deployer-1.0.0.tar.gz</pre>
<p>or </p>
<p>Build it yourself </p>
<pre name="code">git clone git@github.com:otter-projects/canal.git
cd canal;
mvn clean install -Dmaven.test.skip -Denv=release</pre>
<p> When the build finishes, target/canal.deployer-$version.tar.gz is produced under the project root </p>
<p> </p>
<p>2. Extract</p>
<pre name="code">mkdir /tmp/canal
tar zxvf canal.deployer-1.0.0.tar.gz -C /tmp/canal</pre>
<p> </p>
<p> After extraction, go to the /tmp/canal directory; you should see the following structure:</p>
<p> </p>
<pre name="code">drwxr-xr-x 2 jianghang jianghang 136 2013-02-05 21:51 bin
drwxr-xr-x 4 jianghang jianghang 160 2013-02-05 21:51 conf
drwxr-xr-x 2 jianghang jianghang 1.3K 2013-02-05 21:51 lib
drwxr-xr-x 2 jianghang jianghang 48 2013-02-05 21:29 logs</pre>
<p> </p>
<p>3. Edit the configuration</p>
<p> </p>
<p>Common parameters: </p>
<pre name="code">vi conf/canal.properties</pre>
<pre name="code">#################################################
######### common argument #############
#################################################
canal.id= 1
canal.address=
canal.port= 11111
canal.zkServers=
# flush data to zk
canal.zookeeper.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 32768
## detecting config
canal.instance.detecting.enable = false
canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false
# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transactionn.size = 1024
# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30
#################################################
######### destinations #############
#################################################
canal.destinations= example
canal.instance.global.mode = spring
## change lazy to false to start the instance immediately
canal.instance.global.lazy = true
#canal.instance.global.manager.address = 127.0.0.1:1099
canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml</pre>
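<p>The comment on <code>canal.instance.memory.buffer.size</code> asks for a power of two. A common reason for that requirement in ring buffers is that a sequence number can then be mapped to a slot with a bit mask instead of a modulo. The sketch below shows that general technique; the class and method names are hypothetical and this is not taken from canal's store implementation.</p>

```java
// General ring-buffer indexing technique; not canal's actual store code.
class RingIndexSketch {
    // True when n is a positive power of two (exactly one bit set).
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Maps an ever-increasing sequence number to a slot in [0, bufferSize).
    // The mask only equals "sequence % bufferSize" when bufferSize is a power of two.
    static int slot(long sequence, int bufferSize) {
        if (!isPowerOfTwo(bufferSize)) {
            throw new IllegalArgumentException("buffer size must be a power of two: " + bufferSize);
        }
        return (int) (sequence & (bufferSize - 1));
    }

    public static void main(String[] args) {
        int bufferSize = 32768; // the value used in canal.properties above
        System.out.println(slot(5L, bufferSize));
        System.out.println(slot(bufferSize + 5L, bufferSize)); // wraps around to the same slot
    }
}
```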
<p> </p>
<p>Application parameters:</p>
<pre name="code">vi conf/example/instance.properties</pre>
<pre name="code">#################################################
## mysql serverId
canal.instance.mysql.slaveId = 1234
# position info
# change the address below to your own database address
canal.instance.master.address = 127.0.0.1:3306
canal.instance.master.journal.name =
canal.instance.master.position =
canal.instance.master.timestamp =
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
# username/password
# change the values below to match your own database
canal.instance.dbUsername = retl
canal.instance.dbPassword = retl
canal.instance.defaultDatabaseName =
canal.instance.connectionCharsetNumber = 33
canal.instance.connectionCharset = UTF-8
# table regex
canal.instance.filter.regex = .*\\..*
#################################################
</pre>
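<p>The value of <code>canal.instance.filter.regex</code> is an ordinary regular expression; once the properties file is loaded, <code>.*\\..*</code> becomes <code>.*\..*</code>, that is, "anything, a literal dot, anything". The sketch below assumes the expression is matched against the full name <code>schema.table</code>; that assumption, and the helper name <code>accepts</code>, are illustrative rather than taken from canal's source.</p>

```java
import java.util.regex.Pattern;

class TableFilterSketch {
    // Assumption for illustration: the filter is applied to "schema.table".
    static boolean accepts(String regex, String schema, String table) {
        return Pattern.matches(regex, schema + "." + table);
    }

    public static void main(String[] args) {
        // In Java source the backslash is doubled, just as in the properties file.
        String everything = ".*\\..*";
        String onlyTestSchema = "test\\..*";
        System.out.println(accepts(everything, "test", "xdual"));
        System.out.println(accepts(onlyTestSchema, "test", "xdual"));
        System.out.println(accepts(onlyTestSchema, "mysql", "user"));
    }
}
```

<p>Narrowing the expression, for example to a single schema as in <code>test\\..*</code>, is a simple way to keep unrelated tables out of the stream.</p>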
<p> </p>
<p> </p>
<p> Notes:</p>
<ul>
<li>canal.instance.connectionCharset is the database encoding expressed as a Java charset name, e.g. UTF-8, GBK, ISO-8859-1</li>
<li>canal.instance.connectionCharsetNumber is the database encoding expressed as MySQL's unique charset id; the full mapping can be found in com.mysql.jdbc.CharsetMapping.INDEX_TO_CHARSET<br>For common encodings:<br>utf-8 &lt;=&gt; 33<br>gb2312 &lt;=&gt; 24<br>gbk &lt;=&gt; 28</li>
</ul>
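<p>Because the two charset settings must describe the same encoding, a small lookup like the one below can help keep them consistent. It covers only the three ids listed above; the class and method names are hypothetical, and a complete mapping would come from the MySQL driver's own table rather than from this sketch.</p>

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

class CharsetNumberSketch {
    // Only the three mappings quoted in the notes above.
    private static final Map<Integer, String> JAVA_NAMES = new HashMap<>();
    static {
        JAVA_NAMES.put(33, "UTF-8");
        JAVA_NAMES.put(24, "GB2312");
        JAVA_NAMES.put(28, "GBK");
    }

    // Returns the Java charset name for a MySQL charset id, or fails for ids not in the table.
    static String javaCharsetName(int mysqlCharsetNumber) {
        String name = JAVA_NAMES.get(mysqlCharsetNumber);
        if (name == null) {
            throw new IllegalArgumentException("no mapping for charset id " + mysqlCharsetNumber);
        }
        return name;
    }

    public static void main(String[] args) {
        // connectionCharsetNumber = 33 should be paired with connectionCharset = UTF-8
        Charset charset = Charset.forName(javaCharsetName(33));
        System.out.println(charset.name());
    }
}
```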
<p>4. Start</p>
<p> </p>
<pre name="code">sh bin/startup.sh</pre>
<p> </p>
<p>5. Check the logs</p>
<pre name="code">vi logs/canal/canal.log</pre>
<pre name="code">2013-02-05 22:45:27.967 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## start the canal server.
2013-02-05 22:45:28.113 [main] INFO com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[10.1.29.120:11111]
2013-02-05 22:45:28.210 [main] INFO com.alibaba.otter.canal.deployer.CanalLauncher - ## the canal server is running now ......</pre>
<p> </p>
<p> Logs for a specific instance:</p>
<pre name="code">vi logs/example/example.log</pre>
<pre name="code">2013-02-05 22:50:45.636 [main] INFO c.a.o.c.i.spring.support.PropertyPlaceholderConfigurer - Loading properties file from class path resource [canal.properties]
2013-02-05 22:50:45.641 [main] INFO c.a.o.c.i.spring.support.PropertyPlaceholderConfigurer - Loading properties file from class path resource [example/instance.properties]
2013-02-05 22:50:45.803 [main] INFO c.a.otter.canal.instance.spring.CanalInstanceWithSpring - start CannalInstance for 1-example
2013-02-05 22:50:45.810 [main] INFO c.a.otter.canal.instance.spring.CanalInstanceWithSpring - start successful....</pre>
<p> </p>
<p>6. Stop</p>
<pre name="code">sh bin/stop.sh</pre>
<p> </p>
<p>That's it. </p>
</div>
<h1>ClientExample</h1>
<p>Dependency configuration: (the artifact has not yet been officially published to a Maven repository, so you need to download the canal source and run mvn clean install -Dmaven.test.skip yourself)</p>
<pre name="code">&lt;dependency&gt;
    &lt;groupId&gt;com.alibaba.otter&lt;/groupId&gt;
    &lt;artifactId&gt;canal.client&lt;/artifactId&gt;
    &lt;version&gt;1.0.0&lt;/version&gt;
&lt;/dependency&gt;</pre>
<p> </p>
<p>1. Create a standard Maven project:</p>
<pre name="code">mvn archetype:create -DgroupId=com.alibaba.otter -DartifactId=canal.sample</pre>
<p> </p>
<p>2. Edit pom.xml and add the dependency</p>
<p> </p>
<p>3. ClientSample code</p>
<pre name="code">package com.alibaba.otter.canal.sample;

import java.net.InetSocketAddress;
import java.util.List;

import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.common.utils.AddressUtils;
import com.alibaba.otter.canal.protocol.Message;
import com.alibaba.otter.canal.protocol.CanalEntry.Column;
import com.alibaba.otter.canal.protocol.CanalEntry.Entry;
import com.alibaba.otter.canal.protocol.CanalEntry.EntryType;
import com.alibaba.otter.canal.protocol.CanalEntry.EventType;
import com.alibaba.otter.canal.protocol.CanalEntry.RowChange;
import com.alibaba.otter.canal.protocol.CanalEntry.RowData;

public class SimpleCanalClientExample {

    public static void main(String args[]) {
        // create the connection
        CanalConnector connector = CanalConnectors.newSingleConnector(new InetSocketAddress(AddressUtils.getHostIp(),
                                                                                            11111), "example", "", "");
        int batchSize = 1000;
        int emptyCount = 0;
        try {
            connector.connect();
            connector.subscribe(".*\\..*");
            connector.rollback();
            int totalEmptyCount = 120;
            while (emptyCount &lt; totalEmptyCount) {
                Message message = connector.getWithoutAck(batchSize); // fetch up to batchSize entries without acknowledging them
                long batchId = message.getId();
                int size = message.getEntries().size();
                if (batchId == -1 || size == 0) {
                    emptyCount++;
                    System.out.println("empty count : " + emptyCount);
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        // ignored: the loop simply polls again
                    }
                } else {
                    emptyCount = 0;
                    // System.out.printf("message[batchId=%s,size=%s] \n", batchId, size);
                    printEntry(message.getEntries());
                }

                connector.ack(batchId); // acknowledge the batch
                // connector.rollback(batchId); // on processing failure, roll back the batch
            }

            System.out.println("empty too many times, exit");
        } finally {
            connector.disconnect();
        }
    }

    private static void printEntry(List&lt;Entry&gt; entries) {
        for (Entry entry : entries) {
            // skip transaction begin/end markers; only row data is printed
            if (entry.getEntryType() == EntryType.TRANSACTIONBEGIN || entry.getEntryType() == EntryType.TRANSACTIONEND) {
                continue;
            }

            RowChange rowChange = null;
            try {
                rowChange = RowChange.parseFrom(entry.getStoreValue());
            } catch (Exception e) {
                throw new RuntimeException("ERROR ## parser of eromanga-event has an error , data:" + entry.toString(),
                                           e);
            }

            EventType eventType = rowChange.getEventType();
            System.out.println(String.format("================&gt; binlog[%s:%s] , name[%s,%s] , eventType : %s",
                                             entry.getHeader().getLogfileName(), entry.getHeader().getLogfileOffset(),
                                             entry.getHeader().getSchemaName(), entry.getHeader().getTableName(),
                                             eventType));

            for (RowData rowData : rowChange.getRowDatasList()) {
                if (eventType == EventType.DELETE) {
                    printColumn(rowData.getBeforeColumnsList());
                } else if (eventType == EventType.INSERT) {
                    printColumn(rowData.getAfterColumnsList());
                } else {
                    System.out.println("-------&gt; before");
                    printColumn(rowData.getBeforeColumnsList());
                    System.out.println("-------&gt; after");
                    printColumn(rowData.getAfterColumnsList());
                }
            }
        }
    }

    private static void printColumn(List&lt;Column&gt; columns) {
        for (Column column : columns) {
            System.out.println(column.getName() + " : " + column.getValue() + " update=" + column.getUpdated());
        }
    }
}</pre>
<p> </p>
<p>4. Run the client</p>
<p>First start the Canal Server; see QuickStart: <a href="/blogs/1796070">http://agapple.iteye.com/blogs/1796070</a></p>
<p>After the Canal Client starts, the console shows messages like these:</p>
<pre name="code">empty count : 1
empty count : 2
empty count : 3
empty count : 4</pre>
<p> This means the database currently has no changes</p>
<p> </p>
<p>5. Trigger a database change</p>
<pre name="code">mysql&gt; use test;
Database changed
mysql&gt; CREATE TABLE `xdual` (
    -&gt; `ID` int(11) NOT NULL AUTO_INCREMENT,
    -&gt; `X` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    -&gt; PRIMARY KEY (`ID`)
    -&gt; ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 ;
Query OK, 0 rows affected (0.06 sec)
mysql&gt; insert into xdual(id,x) values(null,now());
Query OK, 1 row affected (0.06 sec)</pre>
<p> </p>
<p>The console then shows:</p>
<pre name="code">empty count : 1
empty count : 2
empty count : 3
empty count : 4
================&gt; binlog[mysql-bin.001946:313661577] , name[test,xdual] , eventType : INSERT
ID : 4 update=true
X : 2013-02-05 23:29:46 update=true</pre>
<p> </p>
<h2>Finally:</h2>
<p> The complete code can be downloaded from the attachment. If you run into problems, please get in touch. </p>
<p></p>
</div>
<p></p><div>
<a href="http://dl.iteye.com/topics/download/7a893f19-bafb-313a-8a7a-e371a4265ad9">canal.sample.tar.gz</a> (2.2 KB)
</div>
</section>
<footer>
<p><small>Hosted on <a href="https://pages.github.com">GitHub Pages</a> using the Dinky theme</small></p>
</footer>
</div>
<!--[if !IE]><script>fixScale(document);</script><![endif]-->
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-10379866-5");
pageTracker._trackPageview();
} catch(err) {}
</script>
</body>
</html>