Install Helm
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 > get_helm.sh
chmod +x get_helm.sh
./get_helm.sh
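To confirm the client was installed correctly, you can print its version:
helm version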
Install spark-operator
- Installation
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set sparkJobNamespace=default
# Uninstall
helm uninstall spark-release --namespace spark-operator
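At any point you can check the state of the release by listing releases in the namespace:
helm list -n spark-operator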
- After installation
You can see that the chart started a spark-operator Deployment, along with a spark-operator Pod that watches for Spark job requests.
kubectl get pods -n spark-operator
NAME                             READY   STATUS    RESTARTS   AGE
sparkoperator-7c57499f7b-6rwcf   1/1     Running   0          23s
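The chart also registers the operator's CustomResourceDefinitions; a quick way to verify them (exact CRD names can vary slightly between chart versions):
kubectl get crd | grep sparkoperator.k8s.io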
- Building sparkctl (optional)
# Enable Go Modules
export GO111MODULE=on
# Configure the GOPROXY environment variable
export GOPROXY=https://goproxy.io
# Build the sparkctl tool (run from the root of the cloned spark-on-k8s-operator repository)
cd sparkctl && go build -o sparkctl && cp sparkctl /usr/bin/
sparkctl is a wrapper that spark-operator provides on top of kubectl; it gives a more structured and concise view of the lifecycle of Spark jobs.
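A few of its common subcommands, shown here against the spark-pi example used below (this assumes the binary built above is on your PATH):
# list SparkApplications in the current namespace
sparkctl list
# show the status of a specific application
sparkctl status spark-pi
# fetch the driver logs
sparkctl log spark-pi
# delete the application
sparkctl delete spark-pi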
- Running an example
Run the official example that ships with the repository
git clone https://github.com/GoogleCloudPlatform/spark-on-k8s-operator.git
## As shown below, the fields you mainly need to modify are spec.image and imagePullPolicy
## Also note the correspondence between the namespace and the serviceAccount; if the job fails to run, it is most likely a permission problem caused by these two (a sketch of setting them up follows the manifest below)
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
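As noted above, submission fails with a permission error if the spark service account referenced by spec.driver.serviceAccount does not exist in the namespace of the SparkApplication. A minimal sketch of creating it together with a Role and RoleBinding for the default namespace used here (the resource names and the verb list are illustrative, not taken from the chart):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: default
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
Apply this with kubectl apply -f before submitting the job.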
Then create the job with kubectl
kubectl apply -f examples/spark-pi.yaml
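If you built sparkctl earlier, the same manifest can be submitted with it instead:
sparkctl create examples/spark-pi.yaml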
Check the result
[root@node]# kubectl get pods
NAME                             READY   STATUS      RESTARTS   AGE
spark-pi-driver                  0/1     Completed   0          2m
sparkoperator-7c57499f7b-6rwcf   1/1     Running     0          23m
[root@node]# kubectl get sparkapplications
NAME       AGE
spark-pi   2m
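For more detail than the list view, you can describe the resource; the Status section shows the application and driver state reported by the operator:
kubectl describe sparkapplication spark-pi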
Once the driver container has completed, you can inspect its logs for the details of the run
[root@node]# kubectl logs spark-pi-driver
...
...
...
Pi is roughly 3.140515702578513
...
## Our result appears among many INFO log lines
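Since the result is buried in the INFO output, a simple filter helps:
kubectl logs spark-pi-driver | grep "Pi is roughly"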
The application can be inspected with either kubectl or sparkctl; for example, its recorded events:
[root@node]# sparkctl event spark-pi
+------------+--------+----------------------------------------------------+
| TYPE | AGE | MESSAGE |
+------------+--------+----------------------------------------------------+
| Normal | 13s | SparkApplication spark-pi |
| | | was added, enqueuing it for |
| | | submission |
| Normal | 9s | SparkApplication spark-pi was |
| | | submitted successfully |
| Normal | 8s | Driver spark-pi-driver is |
| | | running |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-1 |
| | | is pending |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-2 |
| | | is pending |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-3 |
| | | is pending |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-4 |
| | | is pending |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-5 |
| | | is pending |
+------------+--------+----------------------------------------------------+
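Besides events, sparkctl can also report the status and fetch the driver log directly, for example:
sparkctl status spark-pi
sparkctl log spark-pi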
At this point the corresponding kubectl command shows that the driver and each executor have been assigned their own Pod:
[root@node]# kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
spark-pi-1578927078367-exec-1   1/1     Running   0          3s
spark-pi-1578927078367-exec-2   1/1     Running   0          3s
spark-pi-1578927078367-exec-3   1/1     Running   0          3s
spark-pi-1578927078367-exec-4   1/1     Running   0          3s
spark-pi-1578927078367-exec-5   1/1     Running   0          3s
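When you are done, deleting the SparkApplication cleans up the driver and executor Pods it created:
kubectl delete sparkapplication spark-pi
# or, equivalently
sparkctl delete spark-pi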