Deploying Hadoop on a Kubernetes cluster
In practice, this is a bad idea.
Install with Helm
Download Helm
curl https://baltocdn.com/helm/signing.asc | sudo apt-key add -
sudo apt-get install apt-transport-https --yes
echo "deb https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt-get update
sudo apt-get install helm
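Once the packages are installed, a quick check confirms the client is available:
helm version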
Create a chart
helm create mychart
The default chart is an nginx web server; you can deploy it to see how things work.
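The generated skeleton looks roughly like this (the exact set of files varies a little between Helm versions):
mychart/
  Chart.yaml        # chart metadata: name, description, version, appVersion
  values.yaml       # default values: image repository, replicaCount, service type, and so on
  charts/           # dependencies (subcharts) go here
  templates/        # Kubernetes manifests rendered from the values
    deployment.yaml
    service.yaml
    _helpers.tpl    # shared template helpers
    NOTES.txt       # the usage text printed after helm install
The export/port-forward commands shown after the install below are essentially what this NOTES.txt prints for the default chart.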
Install a chart
helm install mychart-release-name mychart
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=mychart,app.kubernetes.io/instance=mychart-release-name" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace default $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl --namespace default port-forward $POD_NAME 8080:$CONTAINER_PORT
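Helm can also report on the release itself; a quick way to check that the pods are up (the app.kubernetes.io/instance label is what the generated chart puts on its pods):
helm status mychart-release-name
kubectl get pods -l "app.kubernetes.io/instance=mychart-release-name"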
Uninstall a chart
$ helm list
NAME                  NAMESPACE  REVISION  UPDATED                                STATUS    CHART          APP VERSION
mychart-release-name  default    1         2020-12-05 18:53:26.6995927 +0900 KST  deployed  mychart-0.1.0  1.16.0
helm uninstall mychart-release-name
Uninstalling can also be done through the aliases of the same command: delete, del, and un.
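For example, this is equivalent to the uninstall above:
helm delete mychart-release-name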
Install the hadoop chart
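The chart comes from the old stable repository; if that repo is not configured yet, add it first (the archived stable charts are served from charts.helm.sh):
helm repo add stable https://charts.helm.sh/stable
helm repo update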
To install the chart with the release name hadoop, using the chart's default resource requests:
helm install hadoop stable/hadoop
With the default settings, this command will deploy at least 4 pods on your cluster:
- HDFS NameNode
- HDFS DataNode
- YARN ResourceManager
- YARN NodeManager
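To see that they came up, list the pods of the release; the names (hadoop-hadoop-hdfs-nn-0, hadoop-hadoop-yarn-rm-0, hadoop-hadoop-yarn-nm-0, plus the DataNode pod) follow the <release>-hadoop-<component>-<index> pattern and are reused in the usage commands below:
kubectl get pods -n default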
Usage
- You can check the status of HDFS by running this command:
kubectl exec -n default -it hadoop-hadoop-hdfs-nn-0 -- hdfs dfsadmin -report
- You can list the YARN nodes by running this command:
kubectl exec -n default -it hadoop-hadoop-yarn-rm-0 -- yarn node -list
- Create a port-forward to the YARN ResourceManager UI:
kubectl port-forward -n default hadoop-hadoop-yarn-rm-0 8088:8088
Then open http://localhost:8088 in your browser.
- You can run the included Hadoop tests like this:
kubectl exec -n default -it hadoop-hadoop-yarn-nm-0 -- hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.9.0-tests.jar TestDFSIO -write -nrFiles 5 -fileSize 128MB -resFile /tmp/TestDFSIOwrite.txt
- You can list the MapReduce jobs like this:
kubectl exec -n default -it hadoop-hadoop-yarn-rm-0 -- mapred job -list
- This chart can also be used together with the Zeppelin chart:
helm install zeppelin --namespace default --set hadoop.useConfigMap=true,hadoop.configMapName=hadoop-hadoop stable/zeppelin
- You can scale the number of YARN nodes like this:
helm upgrade hadoop --set yarn.nodeManager.replicas=4 stable/hadoop
Make sure to update the values.yaml if you want to make this permanent.
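Alternatively, keep the override in a small values file of its own (the file name hadoop-values.yaml here is just an example) and pass it with -f on every upgrade, as the next section shows:
# hadoop-values.yaml
yarn:
  nodeManager:
    replicas: 4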
Edit a Helm release
helm upgrade -f new-values.yml {release name} {chart name or path} --version {chart version}
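For example, applying the override file from the previous section to the hadoop release (leave --version off to stay on the currently installed chart version); helm history and helm rollback help if an upgrade goes wrong:
helm upgrade -f hadoop-values.yaml hadoop stable/hadoop
helm history hadoop              # list the revisions of the release
helm rollback hadoop 1           # roll back to revision 1 if needed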