
Building a Hadoop Service on Kubernetes

Published: 2023-09-06 02:16 | Editor: 赖小花 | Keywords: Hadoop

There is little material online about deploying Hadoop on Kubernetes, so I tried it myself. This post records the process and the problems I ran into.

I. Choosing an image

First, pick a popular image from the official Docker Hub. I chose the bde2020 image series because its GitHub documentation is fairly complete: https://github.com/big-data-europe/docker-hadoop

II. Testing with docker-compose

The GitHub page explains how to run this Hadoop image with docker-compose. Just follow the instructions there.

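docker-compose is Docker's own container orchestration tool and is simple to use. Download docker-compose.yml and hadoop.env to a local directory, then run docker-compose up to start the services. To stop them, run docker-compose down.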

III. Writing Kubernetes YAML files for each component

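The docker-compose example above is simple, but it offers limited functionality and runs everything on a single machine. Our task is to rewrite the docker-compose YAML file in Kubernetes YAML syntax.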

1. Create the ConfigMap

The configuration can be supplied through a ConfigMap. Based on hadoop.env, configmap.yaml looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: hadoop-config
data:
  CORE_CONF_fs_defaultFS: "hdfs://namenode:8020"
  CORE_CONF_hadoop_http_staticuser_user: "root"
  CORE_CONF_hadoop_proxyuser_hue_hosts: "*"
  CORE_CONF_hadoop_proxyuser_hue_groups: "*"
  HDFS_CONF_dfs_webhdfs_enabled: "true"
  HDFS_CONF_dfs_permissions_enabled: "false"
  YARN_CONF_yarn_log___aggregation___enable: "true"
  YARN_CONF_yarn_resourcemanager_recovery_enabled: "true"
  YARN_CONF_yarn_resourcemanager_store_class: "org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore"
  YARN_CONF_yarn_resourcemanager_fs_state___store_uri: "/rmstate"
  YARN_CONF_yarn_nodemanager_remote___app___log___dir: "/app-logs"
  YARN_CONF_yarn_log_server_url: "http://historyserver:8188/applicationhistory/logs/"
  YARN_CONF_yarn_timeline___service_enabled: "true"
  YARN_CONF_yarn_timeline___service_generic___application___history_enabled: "true"
  YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled: "true"
  YARN_CONF_yarn_resourcemanager_hostname: "resourcemanager"
  YARN_CONF_yarn_timeline___service_hostname: "historyserver"
  YARN_CONF_yarn_resourcemanager_address: "resourcemanager:8032"
  YARN_CONF_yarn_resourcemanager_scheduler_address: "resourcemanager:8030"
  YARN_CONF_yarn_resourcemanager_resource___tracker_address: "resourcemanager:8031"
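The variable names above follow the bde2020 images' convention for generating Hadoop's *-site.xml files. From my reading of the image's entrypoint script, the prefix selects the file: CORE_CONF maps to core-site.xml, HDFS_CONF to hdfs-site.xml, and YARN_CONF to yarn-site.xml. In the rest of the name, "___" becomes "-", "__" becomes "_", and "_" becomes ".". The minimal Python sketch below implements that rule so you can check a name before adding it to the ConfigMap. to_property and PREFIX_TO_FILE are illustrative names of my own and are not part of the image.

```python
# Sketch of the naming rule the bde2020 images appear to use when turning
# environment variables into *-site.xml properties (an assumption based on
# the image's entrypoint script, not an official API).
PREFIX_TO_FILE = {
    "CORE_CONF": "core-site.xml",
    "HDFS_CONF": "hdfs-site.xml",
    "YARN_CONF": "yarn-site.xml",
    "MAPRED_CONF": "mapred-site.xml",
}


def to_property(env_name: str) -> tuple[str, str]:
    """Return (target file, Hadoop property name) for one env var name."""
    for prefix, filename in PREFIX_TO_FILE.items():
        if env_name.startswith(prefix + "_"):
            rest = env_name[len(prefix) + 1:]
            # Order matters: handle "___" before "__", and park "__" on a
            # placeholder so the single "_" -> "." rule does not touch it.
            rest = rest.replace("___", "-").replace("__", "@")
            rest = rest.replace("_", ".").replace("@", "_")
            return filename, rest
    raise ValueError(f"unknown prefix in {env_name!r}")


if __name__ == "__main__":
    for name in [
        "CORE_CONF_fs_defaultFS",
        "HDFS_CONF_dfs_webhdfs_enabled",
        "YARN_CONF_yarn_log___aggregation___enable",
        "YARN_CONF_yarn_resourcemanager_resource___tracker_address",
    ]:
        print(name, "->", to_property(name))
```

For example, YARN_CONF_yarn_log___aggregation___enable should end up as yarn.log-aggregation-enable in yarn-site.xml.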

2. Create the namenode

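Hadoop nodes address each other by hostname. When a pod is created, however, its hostname defaults to the pod name, which for a Deployment includes a random suffix, and that name is written into the pod's own /etc/hosts. The other nodes cannot resolve this name, so communication between nodes fails with errors such as UnresolvedAddressException. This problem cost me a lot of time, and it took a lot of searching before I found the cause.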

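The fix is to set clusterIP to None in the Service, which makes it a headless Service, and to set hostname in the Deployment to the Service name. To avoid confusion, everything that follows uses the same value for the Service name, container name, hostname, and so on.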
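Here is my understanding of why this fix works. For a headless Service (clusterIP: None) with a selector, cluster DNS answers the Service name with the pod's own IP. Setting hostname makes the pod call itself by that same name, so the name a node reports about itself matches the name other nodes resolve. This only works if all of these names line up exactly, and copying YAML between components makes it easy to break one. The hypothetical Python check below runs on manifests loaded as plain dicts (for example from JSON) and reports the common mismatches. check_headless_pair is an illustrative name, not an existing tool.

```python
# Hypothetical consistency check for the "headless Service + fixed hostname"
# pattern. It only inspects dicts; it does not talk to a cluster.


def check_headless_pair(service: dict, deployment: dict) -> list[str]:
    """Return a list of problems; an empty list means the pair looks consistent."""
    problems = []
    name = service["metadata"]["name"]
    spec = service["spec"]
    # In YAML, "clusterIP: None" is the string "None", not null.
    if spec.get("clusterIP") != "None":
        problems.append("Service clusterIP is not None (not headless)")
    template = deployment["spec"]["template"]
    hostname = template["spec"].get("hostname")
    if hostname != name:
        problems.append(f"pod hostname {hostname!r} != Service name {name!r}")
    pod_labels = template["metadata"].get("labels", {})
    for key, value in spec.get("selector", {}).items():
        if pod_labels.get(key) != value:
            problems.append(f"selector {key}={value} does not match pod labels")
    return problems


if __name__ == "__main__":
    service = {
        "metadata": {"name": "namenode"},
        "spec": {"clusterIP": "None", "selector": {"name": "namenode"}},
    }
    deployment = {
        "spec": {
            "template": {
                "metadata": {"labels": {"name": "namenode"}},
                "spec": {"hostname": "namenode"},
            }
        }
    }
    print(check_headless_pair(service, deployment) or "OK")
```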

The namenode needs a volume, so write pvc.yaml first. This requires an existing StorageClass; for details, see my earlier post https://www.cnblogs.com/00986014w/p/9406962.html:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hadoop-namenode-pvc
spec:
  storageClassName: nfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
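The namenode, the three datanodes, and the historyserver each mount a PVC with this same shape, and only the name changes. The small Python sketch below generates all five as JSON manifests, which kubectl create -f accepts just like YAML. make_pvc, COMPONENTS, and the manifests/ output directory are hypothetical. The nfs StorageClass and 1Gi size are simply the values used in this post.

```python
import json
from pathlib import Path

# Hypothetical helper that builds PVC manifests shaped like pvc.yaml above.
# "nfs" and "1Gi" are the values used in this post; adjust them for your cluster.


def make_pvc(component: str, storage_class: str = "nfs", size: str = "1Gi") -> dict:
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"hadoop-{component}-pvc"},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
        },
    }


# Components in this post that mount a volume.
COMPONENTS = ["namenode", "datanode1", "datanode2", "datanode3", "historyserver"]

if __name__ == "__main__":
    out_dir = Path("manifests")
    out_dir.mkdir(exist_ok=True)
    for component in COMPONENTS:
        path = out_dir / f"pvc-{component}.json"
        path.write_text(json.dumps(make_pvc(component), indent=2))
        print("wrote", path)
```

The script writes one file per claim, and kubectl create -f manifests/ then creates all of them at once.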

  

The namenode's Service and Deployment go in namenode.yaml:

apiVersion: v1
kind: Service
metadata:
  name: namenode
  labels:
    name: namenode
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: namenode
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: namenode
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: namenode
    spec:
      hostname: namenode
      containers:
        - name: namenode
          image: bde2020/hadoop-namenode:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          env:
            - name: CLUSTER_NAME
              value: test
          envFrom:
            - configMapRef:
                name: hadoop-config
          volumeMounts:
            - name: hadoop-namenode
              mountPath: /hadoop/dfs/name
      volumes:
        - name: hadoop-namenode
          persistentVolumeClaim:
            claimName: hadoop-namenode-pvc
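Every Service and Deployment in this post lists the same 16 (name, port) pairs twice: once under the Service's ports and once as containerPort entries. If you generate the manifests instead of typing them, a single table keeps the two lists from drifting apart. It also makes it easy to check that port names are unique within a Service, which Kubernetes requires. This is a hypothetical Python sketch. PORTS mirrors the list above, and service_ports and container_ports are illustrative names.

```python
# Hypothetical helper: one (name, port) table, mirroring the YAML above,
# from which both the Service ports and the container ports are derived.
PORTS = [
    ("http", 50070), ("hdfs", 8020), ("hdfs1", 50075), ("hdfs2", 50010),
    ("hdfs3", 50020), ("hdfs4", 9000), ("hdfs5", 50090), ("hdfs6", 31010),
    ("yarn1", 8030), ("yarn2", 8031), ("yarn3", 8032), ("yarn4", 8033),
    ("yarn5", 8040), ("yarn6", 8042), ("yarn7", 8088), ("historyserver", 8188),
]


def service_ports(table=PORTS) -> list[dict]:
    """Entries for a Service's spec.ports."""
    return [{"port": port, "name": name} for name, port in table]


def container_ports(table=PORTS) -> list[dict]:
    """Entries for a container's ports list."""
    return [{"containerPort": port, "name": name} for name, port in table]


def names_are_unique(table=PORTS) -> bool:
    """Port names must be unique within one Service."""
    names = [name for name, _ in table]
    return len(names) == len(set(names))


if __name__ == "__main__":
    print(len(service_ports()), "ports, unique names:", names_are_unique())
```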

3. datanode

Create three datanodes. Taking datanode1 as an example, datanode.yaml is shown below. Its PVC is similar to the namenode's, so it is not repeated here.

apiVersion: v1
kind: Service
metadata:
  name: datanode1
  labels:
    name: datanode1
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: datanode1
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: datanode1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: datanode1
    spec:
      hostname: datanode1
      containers:
        - name: datanode1
          image: bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config
          volumeMounts:
            - name: hadoop-datanode1
              mountPath: /hadoop/dfs/data
      volumes:
        - name: hadoop-datanode1
          persistentVolumeClaim:
            claimName: hadoop-datanode1-pvc
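datanode2 and datanode3 differ from datanode1 only in the number, so they can be generated in a loop. The Python sketch below is hypothetical and trimmed down. It emits apps/v1, which requires spec.selector.matchLabels to match the pod labels; the apps/v1beta1 used in this post was removed in Kubernetes 1.16. It lists only the DataNode ports on the Service and omits the containerPort entries, which Kubernetes treats as largely informational. The names datanode and IMAGE are illustrative.

```python
import json

# Hypothetical generator for datanode1..datanode3 that follows this post's
# naming: Service name = Deployment name = hostname = container name,
# volume "hadoop-<name>", PVC "hadoop-<name>-pvc".
IMAGE = "bde2020/hadoop-datanode:1.1.0-hadoop2.7.1-java8"


def datanode(index: int) -> list[dict]:
    name = f"datanode{index}"
    labels = {"name": name}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "clusterIP": "None",  # headless, as in the YAML above
            "selector": labels,
            # Trimmed to the DataNode ports to keep the sketch short.
            "ports": [
                {"port": 50010, "name": "hdfs2"},
                {"port": 50020, "name": "hdfs3"},
                {"port": 50075, "name": "hdfs1"},
            ],
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},  # required by apps/v1
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "hostname": name,
                    "containers": [{
                        "name": name,
                        "image": IMAGE,
                        "imagePullPolicy": "IfNotPresent",
                        "envFrom": [{"configMapRef": {"name": "hadoop-config"}}],
                        "volumeMounts": [{"name": f"hadoop-{name}",
                                          "mountPath": "/hadoop/dfs/data"}],
                    }],
                    "volumes": [{
                        "name": f"hadoop-{name}",
                        "persistentVolumeClaim": {"claimName": f"hadoop-{name}-pvc"},
                    }],
                },
            },
        },
    }
    return [service, deployment]


if __name__ == "__main__":
    for i in (1, 2, 3):
        for manifest in datanode(i):
            path = f"{manifest['kind'].lower()}-datanode{i}.json"
            with open(path, "w") as f:
                json.dump(manifest, f, indent=2)
            print("wrote", path)
```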

After creating them, always check the logs with kubectl logs and make sure there are no errors before moving on to the next step.
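Checking the logs of three datanodes and the other components by hand gets tedious. The rough Python helper below assumes kubectl is on the PATH and configured for the cluster. kubectl logs deployment/<name> prints the logs of one pod of that Deployment. suspicious_lines, a hypothetical name, picks out lines containing ERROR, FATAL, or an Exception such as the UnresolvedAddressException mentioned earlier. The pattern is only a heuristic, so an empty result does not prove a component is healthy.

```python
import re
import subprocess
import sys

# Hypothetical log check: flag lines that look like errors. The pattern is a
# rough heuristic, not an exhaustive list of Hadoop failure messages.
SUSPICIOUS = re.compile(r"\b(ERROR|FATAL)\b|Exception")


def suspicious_lines(log_text: str) -> list[str]:
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]


def fetch_logs(deployment: str) -> str:
    # Reads the logs of one pod belonging to the named Deployment.
    result = subprocess.run(
        ["kubectl", "logs", f"deployment/{deployment}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python check_logs.py namenode datanode1 datanode2 ...
    for name in sys.argv[1:] or ["namenode"]:
        hits = suspicious_lines(fetch_logs(name))
        print(f"{name}: {len(hits)} suspicious line(s)")
        for line in hits:
            print("   ", line)
```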

4. resourcemanager

resourcemanager.yaml:

apiVersion: v1
kind: Service
metadata:
  name: resourcemanager
  labels:
    name: resourcemanager
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: resourcemanager
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: resourcemanager
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: resourcemanager
    spec:
      hostname: resourcemanager
      containers:
        - name: resourcemanager
          image: bde2020/hadoop-resourcemanager:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
              name: historyserver
          envFrom:
            - configMapRef:
                name: hadoop-config

5. nodemanager

nodemanager.yaml:

apiVersion: v1
kind: Service
metadata:
  name: nodemanager1
  labels:
    name: nodemanager1
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: nodemanager1
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nodemanager1
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: nodemanager1
    spec:
      hostname: nodemanager1
      containers:
        - name: nodemanager1
          image: bde2020/hadoop-nodemanager:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
          envFrom:
            - configMapRef:
                name: hadoop-config

6. historyserver

The PVC is similar to the earlier ones. historyserver.yaml:

apiVersion: v1
kind: Service
metadata:
  name: historyserver
  labels:
    name: historyserver
spec:
  ports:
    - port: 50070
      name: http
    - port: 8020
      name: hdfs
    - port: 50075
      name: hdfs1
    - port: 50010
      name: hdfs2
    - port: 50020
      name: hdfs3
    - port: 9000
      name: hdfs4
    - port: 50090
      name: hdfs5
    - port: 31010
      name: hdfs6
    - port: 8030
      name: yarn1
    - port: 8031
      name: yarn2
    - port: 8032
      name: yarn3
    - port: 8033
      name: yarn4
    - port: 8040
      name: yarn5
    - port: 8042
      name: yarn6
    - port: 8088
      name: yarn7
    - port: 8188
      name: historyserver
  selector:
    name: historyserver
  clusterIP: None
---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: historyserver
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: historyserver
    spec:
      hostname: historyserver
      containers:
        - name: historyserver
          image: bde2020/hadoop-historyserver:1.1.0-hadoop2.7.1-java8
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 50070
              name: http
            - containerPort: 8020
              name: hdfs
            - containerPort: 50075
              name: hdfs1
            - containerPort: 50010
              name: hdfs2
            - containerPort: 50020
              name: hdfs3
            - containerPort: 9000
              name: hdfs4
            - containerPort: 50090
              name: hdfs5
            - containerPort: 31010
              name: hdfs6
            - containerPort: 8030
              name: yarn1
            - containerPort: 8031
              name: yarn2
            - containerPort: 8032
              name: yarn3
            - containerPort: 8033
              name: yarn4
            - containerPort: 8040
              name: yarn5
            - containerPort: 8042
              name: yarn6
            - containerPort: 8088
              name: yarn7
            - containerPort: 8188
          envFrom:
            - configMapRef:
                name: hadoop-config
          volumeMounts:
            - name: hadoop-historyserver
              mountPath: /hadoop/yarn/timeline
      volumes:
        - name: hadoop-historyserver
          persistentVolumeClaim:
            claimName: hadoop-historyserver-pvc
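With every component defined, you can check that each host name referenced in the ConfigMap matches one of the headless Services. If one does not, the pods cannot resolve it, which is the problem described in the namenode section. The hypothetical Python check below runs over an excerpt of the ConfigMap. SERVICES lists the names used in this post, with only one nodemanager shown, so adjust it to what you actually created. The HOST_KEYS suffix list is my own heuristic for picking out host-valued keys.

```python
import re

# Excerpt of the ConfigMap above: the entries whose values contain host names.
CONFIG = {
    "CORE_CONF_fs_defaultFS": "hdfs://namenode:8020",
    "YARN_CONF_yarn_log_server_url": "http://historyserver:8188/applicationhistory/logs/",
    "YARN_CONF_yarn_resourcemanager_hostname": "resourcemanager",
    "YARN_CONF_yarn_timeline___service_hostname": "historyserver",
    "YARN_CONF_yarn_resourcemanager_address": "resourcemanager:8032",
    "YARN_CONF_yarn_resourcemanager_scheduler_address": "resourcemanager:8030",
}
# Service names used in this post (adjust to your own deployment).
SERVICES = {"namenode", "datanode1", "datanode2", "datanode3",
            "resourcemanager", "nodemanager1", "historyserver"}

# Heuristic: keys with these suffixes carry a host name in their value.
HOST_KEYS = ("_hostname", "_address", "_defaultFS", "_url")


def referenced_hosts(config: dict) -> set[str]:
    hosts = set()
    for key, value in config.items():
        if not key.endswith(HOST_KEYS):
            continue
        # Skip an optional "scheme://", then take everything before ":" or "/".
        match = re.match(r"(?:[a-z]+://)?([^:/]+)", value)
        if match:
            hosts.add(match.group(1))
    return hosts


if __name__ == "__main__":
    missing = referenced_hosts(CONFIG) - SERVICES
    print("hosts without a Service:", sorted(missing) or "none")
```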

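After creating all of the above with kubectl create, follow the GitHub page and open each of the five components in a browser, using its endpoint IP (shown by kubectl get endpoints) and the corresponding port. This has to be done from a machine inside the cluster. If the Hadoop pages display correctly, the setup works.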
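The web UI ports to try are the Hadoop 2.7 defaults, all of which are among the ports exposed above: 50070 (namenode), 50075 (datanode), 8088 (resourcemanager), 8042 (nodemanager), and 8188 (historyserver/timeline). The Python sketch below builds the URLs from endpoint IPs you supply and probes each one with urllib. web_ui_urls, probe, and the example IP are hypothetical. The script must run on a machine that can reach the pod IPs.

```python
import urllib.request

# Default Hadoop 2.7 web UI ports; all are exposed in the YAML above.
WEB_UI_PORTS = {
    "namenode": 50070,
    "datanode1": 50075,
    "resourcemanager": 8088,
    "nodemanager1": 8042,
    "historyserver": 8188,
}


def web_ui_urls(hosts: dict[str, str]) -> dict[str, str]:
    """Map component -> URL; components missing from hosts fall back to their Service name."""
    return {name: f"http://{hosts.get(name, name)}:{port}/"
            for name, port in WEB_UI_PORTS.items()}


def probe(url: str, timeout: float = 5.0) -> str:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
        return f"failed: {exc}"


if __name__ == "__main__":
    # Fill in endpoint IPs from "kubectl get endpoints". Service names only
    # resolve through cluster DNS, i.e. from inside a pod.
    endpoints = {}  # e.g. {"namenode": "10.244.1.5"}  (hypothetical IP)
    for name, url in web_ui_urls(endpoints).items():
        print(f"{name:16} {url:32} {probe(url)}")
```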

7. Testing

Run a simple test to see whether the nodes can communicate with each other.

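Pods created by a Deployment get a random suffix, so first use kubectl get pods to find the namenode pod's name. Then run kubectl exec -it <namenode pod> -- /bin/bash to get a shell inside it, and run hdfs dfs -put /etc/issue / to check whether the upload succeeds.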
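You can also script the same smoke test. This Python sketch looks up the namenode pod by the name=namenode label from the manifests above, then runs hdfs dfs through kubectl exec. It assumes kubectl is configured for the cluster. The -f flag lets put overwrite the file when the test is repeated. find_pod_cmd, hdfs_cmd, and run are hypothetical helpers.

```python
import subprocess

# Hypothetical helpers for the HDFS smoke test. The pod is found by label
# because a Deployment's pod name carries a random suffix.


def find_pod_cmd(label_value: str) -> list[str]:
    return ["kubectl", "get", "pods", "-l", f"name={label_value}",
            "-o", "jsonpath={.items[0].metadata.name}"]


def hdfs_cmd(pod: str, *hdfs_args: str) -> list[str]:
    # "--" separates kubectl's own flags from the command run in the container.
    return ["kubectl", "exec", pod, "--", "hdfs", "dfs", *hdfs_args]


def run(cmd: list[str]) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    pod = run(find_pod_cmd("namenode")).strip()
    run(hdfs_cmd(pod, "-put", "-f", "/etc/issue", "/"))
    print(run(hdfs_cmd(pod, "-ls", "/")))
```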


Original post: https://www.cnblogs.com/00986014w/p/9732796.html

