Using Flume to sink data into HBase

Published: 2023-09-06 01:56 | Editor: 蔡小小 | Keywords: none

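===========> Create the HBase table and column family first <================
Case 1: each source line is stored in a single HBase column (works without problems on hbase-1.12)
================================================================================
# Note: in this case Flume watches the directory /home/hadoop/flume_hbase and collects its files into HBase; the table and column family must be created in HBase beforehand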

Data directory:
vi /home/hadoop/flume_hbase/word.txt
1001 pan nan
2200 lili nv
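
The spooling directory source expects a file to be complete and unchanged once it appears in the watched directory, so editing a file in place with vi inside /home/hadoop/flume_hbase can be fragile. A minimal sketch of one way around this, assuming a hypothetical staging directory on the same filesystem: write the file there first, then move it into the spool directory in a single step. The helper name `drop_into_spool` and the staging path are illustrative, not part of the original setup.

```python
import os
import tempfile


def drop_into_spool(lines, spool_dir, file_name, staging_dir):
    """Write lines to a staging file, then move it into spool_dir in one step.

    staging_dir is a hypothetical directory assumed to be on the same
    filesystem as spool_dir, so that os.replace is an atomic rename.
    """
    os.makedirs(spool_dir, exist_ok=True)
    os.makedirs(staging_dir, exist_ok=True)
    staged = os.path.join(staging_dir, file_name)
    with open(staged, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    target = os.path.join(spool_dir, file_name)
    # The file only becomes visible to the watcher once it is fully written.
    os.replace(staged, target)
    return target


if __name__ == "__main__":
    # Demo in a throwaway directory; on a real host spool_dir would be
    # /home/hadoop/flume_hbase.
    base = tempfile.mkdtemp()
    path = drop_into_spool(
        ["1001 pan nan", "2200 lili nv"],
        os.path.join(base, "flume_hbase"),
        "word.txt",
        os.path.join(base, "staging"),
    )
    print(path)
```

Each file name should be used only once, since the source also expects names in the spool directory not to be reused.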

create 'tb_words','cf_wd'

vi flume-hbase.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# When watching a directory there is nothing to run against a file; any file activity in the directory is picked up
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flume_hbase

# Describe the sink
a1.sinks.k1.type =asynchbase
a1.sinks.k1.table = tb_words
a1.sinks.k1.columnFamily = cf_wd
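# So far I have only got a single column name to work; multiple column names failed. For multiple columns, consider the regular-expression approach in the next case
a1.sinks.k1.serializer.payloadColumn = wd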
a1.sinks.k1.serializer.incrementColumn=last
a1.sinks.k1.serializer.rowPrefix=QM
a1.sinks.k1.serializer.suffix=timestamp
a1.sinks.k1.serializer =org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
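
To make the effect of the serializer settings in case 1 concrete, here is a small Python model of what one input line turns into. It is only a sketch based on my reading of SimpleAsyncHbaseEventSerializer, not the serializer itself. The row key is taken to be `rowPrefix` followed by the current time in milliseconds (for `suffix=timestamp`), the whole line goes into the payload column, and a counter is incremented in the increment column. The counter row name `incRow` is assumed to be Flume's default, and the function name `simple_serialize` is hypothetical.

```python
import time


def simple_serialize(line, column_family="cf_wd", payload_column="wd",
                     increment_column="last", row_prefix="QM",
                     increment_row="incRow", now_ms=None):
    """Model the put and the increment produced for one event (illustrative)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # suffix=timestamp: row key is assumed to be prefix + current millis
    row_key = f"{row_prefix}{now_ms}"
    put = {
        "row": row_key,
        "column": f"{column_family}:{payload_column}",
        "value": line,  # the entire line, not split into fields
    }
    increment = {
        "row": increment_row,  # assumed default counter row
        "column": f"{column_family}:{increment_column}",
        "amount": 1,
    }
    return put, increment


if __name__ == "__main__":
    for text in ["1001 pan nan", "2200 lili nv"]:
        print(simple_serialize(text))
```

Under this model the whole line "1001 pan nan" lands in cf_wd:wd as one value, which matches the observation that only a single column name works with this serializer.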


Case 2: use a regular expression to split each line into multiple column values
Note: apache-flume-1.7.0-bin.tar.gz and Hbase-1.12+
================================================================================
create 'tb_words2','words'

Data directory:
vi /home/hadoop/flume_hbase/data.txt
1001,panzong,nan
2200,lili,nv

Flume configuration file:
vi flume_2_hbase.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
# Custom source class; it can be replaced with spooldir
a1.sources.r1.type = cn.qm.flume.source.MySource
a1.sources.r1.spoolDir=/home/hadoop/flume_hbase

# Describe the sink
#a1.sinks.k1.type =org.apache.flume.sink.hbase.HBaseSink
a1.sinks.k1.type =hbase
a1.sinks.k1.table = tb_words2
a1.sinks.k1.columnFamily = words
# enableWal is a property of the sink itself, not of the serializer
a1.sinks.k1.enableWal = true
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
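# Reading the source of the RegexHbaseEventSerializer class is a quick way to understand the rowKeyIndex/colNames properties
a1.sinks.k1.serializer.regex = ^([0-9]+),([a-z]+),([a-z]+)$
# Use a specific column as the row key instead of a randomly generated key; here the first column is the HBase rowkey
# (see the RegexHbaseEventSerializer source)
a1.sinks.k1.serializer.rowKeyIndex = 0
# ROW_KEY is the column name reserved by the system for the row key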
a1.sinks.k1.serializer.colNames= ROW_KEY,name,sex
a1.sinks.k1.zookeeperQuorum =hdp-qm-05:2181,hdp-qm-06:2181,hdp-qm-07:2181

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


# Alternative: use the second column as the HBase rowkey
#a1.sinks.k1.serializer.rowKeyIndex = 1
#a1.sinks.k1.serializer.regex= ^([0-9]+),([a-z]+),([a-z]+)$
#a1.sinks.k1.serializer.colNames= id,ROW_KEY,sex
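
The interplay of `regex`, `rowKeyIndex` and `colNames` is easier to see on a concrete line. The following Python sketch models the mapping as I understand it from the RegexHbaseEventSerializer source. Each capture group is paired with the column name at the same position, the group at `rowKeyIndex` becomes the row key rather than a cell, and that position in `colNames` must be named ROW_KEY. The function name `regex_serialize` is hypothetical, and the real serializer's handling of event headers and generated row keys is left out.

```python
import re

ROW_KEY_NAME = "ROW_KEY"


def regex_serialize(line, regex, col_names, row_key_index, column_family="words"):
    """Model how one line is split into a row key and cells (illustrative)."""
    if row_key_index >= 0 and col_names[row_key_index].upper() != ROW_KEY_NAME:
        raise ValueError(
            f"column at index {row_key_index} must be named {ROW_KEY_NAME}"
        )
    m = re.fullmatch(regex, line)
    # Lines that do not match, or whose group count differs from the number
    # of column names, produce no write at all in this model.
    if m is None or len(m.groups()) != len(col_names):
        return None
    # With a negative index the real serializer generates a key; the sketch
    # just reports None for it.
    row_key = m.group(row_key_index + 1) if row_key_index >= 0 else None
    cells = {
        f"{column_family}:{name}": m.group(i + 1)
        for i, name in enumerate(col_names)
        if i != row_key_index
    }
    return row_key, cells


if __name__ == "__main__":
    pattern = r"^([0-9]+),([a-z]+),([a-z]+)$"
    # First column as rowkey, as in the main configuration
    print(regex_serialize("1001,panzong,nan", pattern, ["ROW_KEY", "name", "sex"], 0))
    # Second column as rowkey, as in the commented-out alternative
    print(regex_serialize("1001,panzong,nan", pattern, ["id", "ROW_KEY", "sex"], 1))
```

A line such as "1001 pan nan" from case 1 does not match this pattern, so under this model it would be skipped rather than written.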

Original article: https://www.cnblogs.com/pingzizhuanshu/p/9102494.html