Install Helm
curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 > get_helm.sh
chmod +x get_helm.sh
./get_helm.sh
Install spark-operator
- Install
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set sparkJobNamespace=default
# Uninstall
helm uninstall spark-release --namespace spark-operator
- After installation
You can see that a sparkoperator deployment has been started, along with a sparkoperator pod, which watches for Spark requests.
kubectl get pods -n spark-operator
NAME READY STATUS RESTARTS AGE
sparkoperator-7c57499f7b-6rwcf 1/1 Running 0 23s
- Building sparkctl (optional)
# Enable Go Modules
export GO111MODULE=on
# Configure the GOPROXY environment variable
export GOPROXY=https://goproxy.io
# Build the sparkctl tool
cd sparkctl && go build -o sparkctl && cp sparkctl /usr/bin/
This tool is spark-operator's wrapper on top of kubectl,
giving a more standardized and concise view of a Spark job's lifecycle.
- Running an example
Run the example bundled with the official repo
git clone https://github.com/GoogleCloudPlatform/spark-on-k8s-operator.git
## As shown below, you mainly need to modify spec.image and imagePullPolicy
## Also note the correspondence between namespace and serviceAccount; if the job fails to run, a permission problem caused by these two is the most likely reason
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
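As a rough sanity check of what this spec asks of the cluster, the totals can be added up in plain Python (a sketch only; Spark also adds a per-pod memory overhead on top of the configured memory, so actual pod requests are somewhat higher):

```python
# What the spark-pi spec above requests from the cluster:
# 1 driver (cores: 1, memory: 512m) + 2 executor instances (cores: 1, memory: 512m each).
driver = {"cores": 1, "memory_mb": 512}
executor = {"cores": 1, "memory_mb": 512, "instances": 2}

total_cores = driver["cores"] + executor["cores"] * executor["instances"]
total_memory_mb = driver["memory_mb"] + executor["memory_mb"] * executor["instances"]

print(total_cores, total_memory_mb)  # 3 cores, 1536 MB before overhead
```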
Then create the job with kubectl
kubectl apply -f examples/spark-pi.yaml
Check the result
[root@node]# kubectl get pods
NAME READY STATUS RESTARTS AGE
spark-pi-driver 0/1 Completed 0 2m
sparkoperator-7c57499f7b-6rwcf 1/1 Running 0 23m
[root@node]# kubectl get sparkapplications
NAME AGE
spark-pi 2m
Once the container finishes, you can inspect its logs for the job's details
[root@node]# kubectl logs spark-pi-driver
...
...
...
Pi is roughly 3.140515702578513
...
## Our output appears among the many INFO log lines
Either kubectl or sparkctl can be used to inspect it:
[root@node]# sparkctl event spark-pi
+------------+--------+----------------------------------------------------+
| TYPE | AGE | MESSAGE |
+------------+--------+----------------------------------------------------+
| Normal | 13s | SparkApplication spark-pi |
| | | was added, enqueuing it for |
| | | submission |
| Normal | 9s | SparkApplication spark-pi was |
| | | submitted successfully |
| Normal | 8s | Driver spark-pi-driver is |
| | | running |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-1 |
| | | is pending |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-2 |
| | | is pending |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-3 |
| | | is pending |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-4 |
| | | is pending |
| Normal | 0s | Executor |
| | | spark-pi-1578926994055-exec-5 |
| | | is pending |
+------------+--------+----------------------------------------------------+
At this point the corresponding kubectl command shows that the driver and each executor got a pod of its own:
[root@node]# kubectl get pod
NAME READY STATUS RESTARTS AGE
spark-pi-1578927078367-exec-1 1/1 Running 0 3s
spark-pi-1578927078367-exec-2 1/1 Running 0 3s
spark-pi-1578927078367-exec-3 1/1 Running 0 3s
spark-pi-1578927078367-exec-4 1/1 Running 0 3s
spark-pi-1578927078367-exec-5 1/1 Running 0 2s
spark-pi-driver 1/1 Running 0 13s
After the run completes, only the driver pod remains, in Completed status
[root@linux100-99-81-13 test]# kubectl get pod
NAME READY STATUS RESTARTS AGE
spark-pi-driver 0/1 Completed 0 39s
Checking the status again with sparkctl now shows:
[root@node]# sparkctl status spark-pi
application state:
+-----------+----------------+----------------+-----------------+---------------------+--------------------+-------------------+
| STATE | SUBMISSION AGE | COMPLETION AGE | DRIVER POD | DRIVER UI | SUBMISSIONATTEMPTS | EXECUTIONATTEMPTS |
+-----------+----------------+----------------+-----------------+---------------------+--------------------+-------------------+
| COMPLETED | 1m | 46s | spark-pi-driver | 10.105.250.204:4040 | 1 | 1 |
+-----------+----------------+----------------+-----------------+---------------------+--------------------+-------------------+
executor state:
+----------------------------------+-----------+
| EXECUTOR POD | STATE |
+----------------------------------+-----------+
| spark-pi-947e477c16f70c5f-exec-1 | COMPLETED |
| spark-pi-947e477c16f70c5f-exec-2 | COMPLETED |
+----------------------------------+-----------+
- FAQ
Issue 1:
User "system:serviceaccount:sparknamespace:spark" cannot list resource "sparkapplications" in API group "sparkoperator.k8s.io" at the cluster scope
Cause: the spark service account in the sparknamespace namespace has no permission to access this API.
Fix: check that the service account exists in that namespace, and that it is bound to a ClusterRole (the edit role is needed, or simply apply the RBAC policies shipped under the project's manifest directory)
kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
# Optionally, also add the following parameter when submitting
spark-submit --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
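The two kubectl create commands above can also be expressed declaratively (a sketch; adjust the names and namespaces to your cluster):

```yaml
# Declarative equivalent of the two kubectl create commands above
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: default
```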
Issue 2: only the driver starts, and no executor runs
Cause: either the k8s apiserver is not running properly, or the image is wrong.
Fix: with the apiserver healthy, check the Spark driver pod's logs. If its command is spark-operator, the wrong image is in use; switch to the Spark image, whose correct command is spark-submit.
- Extra: preparing the spark-pi image
This walkthrough uses the officially recommended spark-pi example. The jar ships inside the spark 3.1.1 image; later, wanting to debug some things locally, I found the example's source in the corresponding directory of the Spark GitHub repo. The official source is:
// scalastyle:off println
package org.apache.spark.examples

import scala.math.random

import org.apache.spark.sql.SparkSession

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("Spark Pi")
      .getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}
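Without a Spark cluster at hand, the same Monte Carlo estimator can be sketched in plain Python to convince yourself the logic is sound (illustrative only, not the official example):

```python
import random

def estimate_pi(n: int, seed: int = 42) -> float:
    """Monte Carlo Pi estimate, mirroring SparkPi's map/reduce over 1 until n."""
    rng = random.Random(seed)
    count = 0
    for _ in range(1, n):             # SparkPi iterates over 1 until n
        x = rng.random() * 2 - 1      # random point in the 2x2 square
        y = rng.random() * 2 - 1
        if x * x + y * y <= 1:        # inside the unit circle?
            count += 1
    return 4.0 * count / (n - 1)      # area ratio * 4 approximates Pi

print(estimate_pi(200_000))  # close to 3.14
```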
Package the code into a jar, then write a Dockerfile to build the corresponding image. The Dockerfile is as follows:
ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v3.1.1
FROM ${SPARK_IMAGE}
ADD ./SparkPi-1.0-sleep-SNAPSHOT.jar /go/SparkPi-1.0-sleep-SNAPSHOT.jar
RUN chmod +x /go/SparkPi-1.0-sleep-SNAPSHOT.jar
You can substitute your own jar name here; the "sleep" is just part of this package's name. Then build and push:
docker build -t spark-pi:<tag> .
docker tag spark-pi:<tag> <private-registry:port>/spark-pi:<tag>
docker push <private-registry:port>/spark-pi:<tag>
The image is built into the local image store; push it to the private registry so that other nodes can pull it. Also note that the docker build context directory should hold as little else as possible, to keep image builds fast.
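One simple way to keep the build context small is a .dockerignore file next to the Dockerfile (a sketch; the jar name matches the Dockerfile above):

```
# .dockerignore: exclude everything, then re-include only the jar the Dockerfile ADDs
*
!SparkPi-1.0-sleep-SNAPSHOT.jar
```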