Spark on k8s Operator 部署安装

0 / 1071

安装helm

curl https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 > get_helm.sh

chmod +x get_helm.sh

./get_helm.sh

安装spark-operator

  1. 安装
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-release spark-operator/spark-operator --namespace spark-operator --create-namespace --set sparkJobNamespace=default

# 删除
helm uninstall spark-release --namespace spark-operator
  1. 安装完成
    可以看到启动了一个sparkoperator的deployment,伴随着sparkoperator pod,负责监听spark请求。
kubectl get pods -n spark-operator

NAME                             READY   STATUS      RESTARTS   AGE
sparkoperator-7c57499f7b-6rwcf   1/1     Running     0          23s
  1. sparkctl编译(可选)
# 启用 Go Modules 功能

export GO111MODULE=on

# 配置 GOPROXY 环境变量

export GOPROXY=https://goproxy.io

# 编译sparkctl工具

cd sparkctl && go build -o sparkctl && cp sparkctl /usr/bin/

这个工具是spark-operator在kubectl上的二次封装

更加规范化,简洁的查看spark任务生命周期

  1. 运行示例
    运行官方自带示例
git clone https://github.com/GoogleCloudPlatform/spark-on-k8s-operator.git

## 如下所示, 主要修改spec.image 和 imagePullPolicy

## 其中,需要注意namespace和serviceAccount的对应关系,如果运行不成功,大概率是这两个导致的权限问题

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 2
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"

然后运行kubectl执行创建任务

kubectl apply -f examples/spark-pi.yaml

查看结果

[root@node]# kubectl get pods

NAME                             READY   STATUS      RESTARTS   AGE
spark-pi-driver                  0/1     Completed   0          2m
sparkoperator-7c57499f7b-6rwcf   1/1     Running     0          23m

[root@node]# kubectl get sparkapplications

NAME       AGE
spark-pi   2m

容器运行完毕,可以查看容器日志,了解任务详细情况

[root@node]# kubectl logs spark-pi-driver

...
...
...
Pi is roughly 3.140515702578513
...

## 在许多info日志中看到我们的输出结果

通过kubectl或者sparkctl命令均可以查看:

[root@node]# sparkctl event spark-pi
+------------+--------+----------------------------------------------------+
|    TYPE    |  AGE   |                      MESSAGE                       |
+------------+--------+----------------------------------------------------+
| Normal     | 13s    | SparkApplication spark-pi                          |
|            |        | was added, enqueuing it for                        |
|            |        | submission                                         |
| Normal     | 9s     | SparkApplication spark-pi was                      |
|            |        | submitted successfully                             |
| Normal     | 8s     | Driver spark-pi-driver is                          |
|            |        | running                                            |
| Normal     | 0s     | Executor                                           |
|            |        | spark-pi-1578926994055-exec-1                      |
|            |        | is pending                                         |
| Normal     | 0s     | Executor                                           |
|            |        | spark-pi-1578926994055-exec-2                      |
|            |        | is pending                                         |
| Normal     | 0s     | Executor                                           |
|            |        | spark-pi-1578926994055-exec-3                      |
|            |        | is pending                                         |
| Normal     | 0s     | Executor                                           |
|            |        | spark-pi-1578926994055-exec-4                      |
|            |        | is pending                                         |
| Normal     | 0s     | Executor                                           |
|            |        | spark-pi-1578926994055-exec-5                      |
|            |        | is pending                                         |
+------------+--------+----------------------------------------------------+
此刻对应的kubectl命令可以看到,driver和executor都分配了一个pod来运行:

[root@node]# kubectl get pod
NAME                            READY   STATUS    RESTARTS   AGE
spark-pi-1578927078367-exec-1   1/1     Running   0          3s
spark-pi-1578927078367-exec-2   1/1     Running   0          3s
spark-pi-1578927078367-exec-3   1/1     Running   0          3s
spark-pi-1578927078367-exec-4   1/1     Running   0          3s
spark-pi-1578927078367-exec-5   1/1     Running   0          2s
spark-pi-driver                 1/1     Running   0          13s

运行完成后,仅剩下了driver的pod处于complete状态

[root@linux100-99-81-13 test]# kubectl get pod
NAME              READY   STATUS      RESTARTS   AGE
spark-pi-driver   0/1     Completed   0          39s
此时再用sparkctl查看状态可以看到:

[root@node]# sparkctl status spark-pi
application state:
+-----------+----------------+----------------+-----------------+---------------------+--------------------+-------------------+
|   STATE   | SUBMISSION AGE | COMPLETION AGE |   DRIVER POD    |      DRIVER UI      | SUBMISSIONATTEMPTS | EXECUTIONATTEMPTS |
+-----------+----------------+----------------+-----------------+---------------------+--------------------+-------------------+
| COMPLETED | 1m             | 46s            | spark-pi-driver | 10.105.250.204:4040 |                  1 |                 1 |
+-----------+----------------+----------------+-----------------+---------------------+--------------------+-------------------+
executor state:
+----------------------------------+-----------+
|           EXECUTOR POD           |   STATE   |
+----------------------------------+-----------+
| spark-pi-947e477c16f70c5f-exec-1 | COMPLETED |
| spark-pi-947e477c16f70c5f-exec-2 | COMPLETED |
+----------------------------------+-----------+
  1. FAQ

问题1:

User "system:serviceaccount:sparknamespace:spark" cannot list resource "sparkapplications" in API group "sparkoperator.k8s.io" at the cluster scope

原因: sparknamespace 命名空间中的spark对这个api没有访问权限

解决:查看用户是否存在于这个命名空间中,查看用户是否绑定了clusterrole权限(需要edit,或者完全使用manifest下的授权策略)

kubectl create serviceaccount spark
kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default

#并在submit时添加如下参数(可选)
spark-submit --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark

问题2:只有driver启动,没有excutor执行

原因:一个可能是k8s apiserver未正常运行,一个是镜像有问题。

解决:在apiserver运行正常的情况下,检查spark task driver的日志,若command是spark-operator则是镜像使用错误,换成spark镜像,正确的command是spark-submit。

  1. 扩展知识:准备spark-pi镜像

使用官方推荐的spark-pi来实践一下,该jar包在spark3.1.1镜像里有,后来由于想本地调测一些东西,在spark github上对应目录能找到该example源码,官方源码如下:

// scalastyle:off println
package org.apache.spark.examples

import scala.math.random

import org.apache.spark.sql.SparkSession

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("Spark Pi")
      .getOrCreate()
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.sparkContext.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x*x + y*y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * count / (n - 1)}")
    spark.stop()
  }
}

将代码打包并生成jar包,然后编写dockerfile生成对应的镜像,dockerfile如下:

ARG SPARK_IMAGE=gcr.io/spark-operator/spark:v3.1.1

FROM gcr.io/spark-operator/spark:v3.1.1
ADD ./SparkPi-1.0-sleep-SNAPSHOT.jar /go/SparkPi-1.0-sleep-SNAPSHOT.jar
RUN chmod +x /go/SparkPi-1.0-sleep-SNAPSHOT.jar

此处jar包的名字自己可以替换一下,加了个sleep只是打包名字而已,通过:

docker build -t spark-pi:\<tag> .
docker tag spark-pi:\<tag> 私仓地址:端口\spark-pi:\<tag>
docker push 私仓地址:端口\spark-pi:\<tag>

生成镜像在本地仓库,需要将本地镜像推到私仓后其他节点才能拉取,这里需要注意的是,docker build的目录要尽量不要放其他的东西,提高打镜像的效率。