Fluid: A Distributed Dataset Orchestration and Acceleration Engine
Introduction
In scenarios such as AI model training, offline big-data analytics, and real-time data processing, the efficiency of accessing large datasets in storage is a key bottleneck on task execution speed. With traditional storage schemes, datasets usually live in a remote PVC or in object storage, and every access requires network transfers, which adds latency and is easily affected by bandwidth fluctuations.
Fluid, a cloud-native distributed dataset orchestration and acceleration engine, addresses this by building on Alluxio's distributed caching: datasets from remote sources are cached in local memory or on local disks of the compute nodes, so data is accessed "close to the computation". It supports a variety of data sources, including PVC, object storage, and HDFS, significantly lowering access latency and improving task execution efficiency, while also providing flexible dataset management and scheduling.
Based on Fluid v1.0.8, this article covers the complete deployment process in a Kubernetes environment, dataset configuration, and cache acceleration testing, and adds a manual build procedure for ARM64 images (to work around gaps in the official images), offering a directly usable reference for dataset acceleration on multi-architecture clusters and across scenarios.
Environment Deployment
Installing Fluid (via Helm)
- Add the Fluid Helm repository and update the index:
helm repo add fluid https://fluid-cloudnative.github.io/charts
helm repo update
- Configure the image prefix and version parameters:
DefaultImagePrefix=swr.cn-east-3.myhuaweicloud.com/lomtom-common
DefaultVersion=v1.1.0-f82c77c4
- Run the install command, specifying the namespace, chart version, and image settings:
helm install fluid fluid/fluid -n fluid-system --create-namespace --version 1.0.8 \
--set imagePrefix=$DefaultImagePrefix \
--set crdUpgrade.imagePrefix=$DefaultImagePrefix \
--set dataset.controller.imagePrefix=$DefaultImagePrefix \
--set csi.registrar.imagePrefix=$DefaultImagePrefix \
--set csi.plugins.imagePrefix=$DefaultImagePrefix \
--set webhook.imagePrefix=$DefaultImagePrefix \
--set webhook.filePrefetcher.imagePrefix=$DefaultImagePrefix \
--set fluidapp.controller.imagePrefix=$DefaultImagePrefix \
--set runtime.alluxio.controller.imagePrefix=$DefaultImagePrefix \
--set runtime.alluxio.init.imagePrefix=$DefaultImagePrefix \
--set runtime.alluxio.runtime.imagePrefix=$DefaultImagePrefix \
--set runtime.alluxio.fuse.imagePrefix=$DefaultImagePrefix \
--set version=$DefaultVersion \
--set crdUpgrade.imageTag=$DefaultVersion \
--set dataset.controller.imageTag=$DefaultVersion \
--set csi.plugins.imageTag=$DefaultVersion \
--set webhook.imageTag=$DefaultVersion \
--set fluidapp.controller.imageTag=$DefaultVersion \
--set runtime.alluxio.controller.imageTag=$DefaultVersion \
--set runtime.alluxio.init.imageTag=v1.0.4 \
--set runtime.alluxio.runtime.imageTag=2.9.0-openeuler2403sp1 \
--set runtime.alluxio.fuse.imageTag=2.9.0-openeuler2403sp1
After the installation completes, verify the component status with kubectl get pods -n fluid-system and make sure all Pods are Running.
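As a rough reference, a healthy fluid-system namespace contains the controller, CSI, and webhook components listed below; the Pod name suffixes (xxx) are placeholders, and the exact listing may differ with version and configuration:
kubectl get pods -n fluid-system
NAME                             READY   STATUS    RESTARTS   AGE
alluxioruntime-controller-xxx    1/1     Running   0          2m
csi-nodeplugin-fluid-xxx         2/2     Running   0          2m
dataset-controller-xxx           1/1     Running   0          2m
fluid-webhook-xxx                1/1     Running   0          2m
fluidapp-controller-xxx          1/1     Running   0          2m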
Manual Image Builds (ARM64 Adaptation)
Note: this step can be skipped (the images used in the installation above are already dual-architecture); follow the steps below only if you want to build your own.
Because Fluid and Alluxio officially provide AMD64 images only, adapted images must be built manually for ARM64 clusters (e.g., Kunpeng servers), as follows:
Building the Fluid Images
The Fluid images are built with Golang; a single multi-platform build command produces both the AMD64 and ARM64 images:
# Clone the Fluid source (pinned to v1.0.8)
git clone https://github.com/fluid-cloudnative/fluid.git -b v1.0.8
cd fluid
# Configure the image registry and build parameters
export IMG_REPO=swr.cn-east-3.myhuaweicloud.com/lomtom-common
export GO111MODULE=on
export DOCKER_PLATFORM=linux/amd64,linux/arm64
# Create a multi-platform build environment and push the images
docker buildx create --use
make docker-buildx-all-push
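To confirm that both architectures actually landed in the registry, the pushed manifest list can be inspected; the image name below (dataset-controller) is one example component and is assumed, so adjust it to whichever image you built:
docker buildx imagetools inspect $IMG_REPO/dataset-controller:v1.1.0-f82c77c4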
Building the Alluxio Images
Alluxio, Fluid's core cache runtime, requires two images: a base image (alluxio) and a development image (alluxio-dev), both based on openEuler 24.03 LTS SP1.
- Build the Alluxio base image:
# Clone the Alluxio source (pinned to v2.9.0)
git clone https://github.com/Alluxio/alluxio.git -b v2.9.0
cd alluxio/integration/docker/
# Modify the Dockerfile: switch the base image to openEuler and install fuse3-libs manually (full Dockerfile at the end of this article)
# 1. Change the base image from centos to openeuler/openeuler:24.03-lts-sp1
# 2. Add the fuse3-libs installation commands (the package is missing from the official openEuler repositories)
# Build and push the multi-arch image
docker buildx build -t lomtom/alluxio:2.9.0-openeuler2403sp1 --platform linux/amd64,linux/arm64 . --push
- Build the Alluxio-dev image:
# Change the Dockerfile-dev base image to the Alluxio image built above (full Dockerfile at the end of this article)
# Build and push the multi-arch image
docker buildx build -t lomtom/alluxio-dev:2.9.0-openeuler2403sp1 \
-f Dockerfile-dev --platform linux/amd64,linux/arm64 . --push
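As an optional smoke test, you can check that the ARM64 variant starts and that its JVM runs (on an x86 host this requires QEMU binfmt emulation, which the buildx setup above generally provides):
docker run --rm --platform linux/arm64 --entrypoint java lomtom/alluxio:2.9.0-openeuler2403sp1 -version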
Test Environment Preparation
Creating the PVC and Test Pods
This test is based on CephFS storage (a CephFS cluster and a cephfs StorageClass must be provisioned in advance); a PVC requests the storage, which is then mounted into the test Pod:
- Create the PVC (requesting 20GB of storage to hold the large test file):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: fluid-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 20G
storageClassName: cephfs
volumeMode: Filesystem
- Create the test Pod (mounting the PVC and kept running long-term for the subsequent file upload and access tests); both manifests are applied together right after this list:
apiVersion: apps/v1
kind: Deployment
metadata:
name: fluid
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/instance: fluid
app.kubernetes.io/name: fluid
serverless.fluid.io/inject: "true"
template:
metadata:
labels:
app.kubernetes.io/instance: fluid
app.kubernetes.io/name: fluid
serverless.fluid.io/inject: "true"
spec:
containers:
- image: swr.cn-east-3.myhuaweicloud.com/lomtom-common/busybox:1.37.0
name: fluid
command:
- sh
- -c
- |
sleep 365d
volumeMounts:
- mountPath: /data
name: storage
volumes:
- name: storage
persistentVolumeClaim:
claimName: fluid-pvc
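Assuming the two manifests above are saved as fluid-pvc.yaml (an assumed name) and fluid-deployment.yaml (this filename is reused in the cached test below), apply them and confirm the PVC binds:
kubectl apply -f fluid-pvc.yaml
kubectl apply -f fluid-deployment.yaml
kubectl get pvc fluid-pvc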
Verify the Pod status:
kubectl get po
NAME READY STATUS RESTARTS AGE
fluid-f8764d8d9-hsdnk 1/1 Running 0 85m
fluid1-587bfc8d65-zxc62 1/1 Running 0 87m
Expected output: fluid-f8764d8d9-hsdnk is in the Running state; the second Pod (fluid1), created the same way, serves as the control.
Uploading a Large Test File
Upload an AI model file (3.3GB) to the test Pod's PVC mount directory to simulate a large real-world dataset:
# Copy the local large file into the Pod's /data directory (a 3.3GB model file in this example)
kubectl cp ./DeepSeek-R1-Distill-Qwen-1.5B/model.safetensors fluid-f8764d8d9-hsdnk:/data/
# Verify the uploaded file
kubectl exec -it fluid-f8764d8d9-hsdnk -- ls /data/ -lh
total 3G
-rw-r--r-- 1 root root 3.3G Nov 13 06:37 model.safetensors
Expected output: model.safetensors is about 3.3GB, confirming the upload succeeded.
Fluid Dataset Configuration and Cache Acceleration
Creating the Dataset and AlluxioRuntime
Fluid describes the data source with a Dataset and configures the cache parameters with an AlluxioRuntime; the two must share the same name:
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
name: pv-demo-dataset
spec:
mounts:
- mountPoint: pvc://fluid-pvc
name: data
path: /
accessModes:
- ReadOnlyMany
---
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
name: pv-demo-dataset
spec:
replicas: 1
data:
replicas: 1
tieredstore:
levels:
- path: /dev/shm
mediumtype: MEM
quota: 20Gi
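Apply the two resources (the filename dataset.yaml is assumed). Once the Dataset is bound, Fluid creates a PVC with the same name (pv-demo-dataset) in the Dataset's namespace:
kubectl apply -f dataset.yaml
kubectl get pvc pv-demo-dataset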
Triggering a Data Preload (Optional)
By default, Fluid uses lazy loading: caching begins only when data is first accessed. To avoid the cache warm-up cost on first access, a full preload can be triggered manually with a DataLoad resource:
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
name: pv-demo-dataset
spec:
dataset:
name: pv-demo-dataset
namespace: default
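After applying the resource (the filename dataload.yaml is assumed), the preload runs as a background job whose progress can be followed through the DataLoad object:
kubectl apply -f dataload.yaml
kubectl get dataload pv-demo-dataset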
Verify the Dataset and AlluxioRuntime status to confirm the caching layer is ready:
# Check the Dataset status (cache progress, capacity, etc.)
kubectl get datasets.data.fluid.io
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
pv-demo-dataset 3.31GiB 0.00B 20.00GiB 0.0% Bound 10m
# Check the AlluxioRuntime status (Master/Worker/FUSE component phases)
kubectl get alluxioruntimes.data.fluid.io
NAME MASTER PHASE WORKER PHASE FUSE PHASE AGE
pv-demo-dataset Ready Ready Ready 10m
# Check the Pods
kubectl get pod
NAME READY STATUS RESTARTS AGE
fluid-f8764d8d9-hsdnk 1/1 Running 0 9m
fluid1-587bfc8d65-zxc62 1/1 Running 0 9m
pv-demo-dataset-master-0 2/2 Running 0 10m
pv-demo-dataset-worker-0 2/2 Running 0 10m
Expected output:
- The Dataset phase is Bound, and UFS TOTAL SIZE matches the size of the Pod's /data directory;
- The AlluxioRuntime Master, Worker, and FUSE phases are all Ready.
Performance Test Validation
Compare the file copy speed in the uncached and cached states to validate Fluid's cache acceleration:
Uncached Test
Copy the large file in each of the two Pods before the cache is populated and record the time (with no preload, the first access has no cache and relies on network transfer):
# Enter the test Pods
kubectl exec -it fluid-f8764d8d9-hsdnk -- /bin/sh
kubectl exec -it fluid1-587bfc8d65-zxc62 -- /bin/sh
# Copy the large file and time it
time cp /data/model.safetensors /root
real 0m 16.37s
user 0m 0.00s
sys 0m 2.12s
In theory, both Pods should see similar first-access speeds, both limited by network bandwidth.
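While the first (lazy) access is in flight, the cache filling up can be watched on the Dataset resource (-w streams updates as the CACHED columns change):
kubectl get datasets.data.fluid.io pv-demo-dataset -w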
Cached Test
Recreate the test Pods (to eliminate the effect of the OS page cache), then time the copy again (now served from Fluid's in-memory cache):
# Delete the old Pods
kubectl delete deployment fluid
# Recreate the previous Pods
kubectl apply -f fluid-deployment.yaml
# Enter the recreated test Pod (cache-accelerated)
kubectl exec -it fluid-f8764d8d9-hktpt -- /bin/sh
# Enter the recreated control Pod (not attached to the cache, no acceleration)
kubectl exec -it fluid1-587bfc8d65-l2lsq -- /bin/sh
# In the cache-accelerated Pod, copy the file and time it
time cp /data/model.safetensors /root
real 0m 1.91s
user 0m 0.00s
sys 0m 1.90s
# Check the dataset again
kubectl get datasets.data.fluid.io
NAME UFS TOTAL SIZE CACHED CACHE CAPACITY CACHED PERCENTAGE PHASE AGE
pv-demo-dataset 3.31GiB 3.31GiB 20.00GiB 100.0% Bound 34m
Expected results:
- With Fluid cache acceleration, the accelerated Pod (fluid-f8764d8d9-hktpt) copies the 3.3GB file in 1.91s instead of 16.37s, roughly an 8.6x improvement in access efficiency; the caching effect is significant.
- The non-accelerated control Pod (fluid1-587bfc8d65-l2lsq) remains noticeably slower than the accelerated one.
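For a lower-level view of cache usage, Alluxio's own admin report can be queried from the master Pod; the container name alluxio-master is the Fluid default and is assumed here:
kubectl exec -it pv-demo-dataset-master-0 -c alluxio-master -- alluxio fsadmin report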
Reference
Dockerfile (alluxio)
#
# The Alluxio Open Foundation licenses this work under the Apache License, version 2.0
# (the "License"). You may not use this work except in compliance with the License, which is
# available at www.apache.org/licenses/LICENSE-2.0
#
# This software is distributed on an "AS IS" basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied, as more fully set forth in the License.
#
# See the NOTICE file distributed with this work for information regarding copyright ownership.
#
# ARG defined before the first FROM can be used in FROM lines
# Only 8 and 11 are supported.
ARG JAVA_VERSION=8
# Setup CSI
FROM golang:1.15.13-alpine AS csi-dev
ENV GO111MODULE=on
RUN mkdir -p /alluxio-csi
COPY ./csi /alluxio-csi
RUN cd /alluxio-csi && \
CGO_ENABLED=0 go build -o /usr/local/bin/alluxio-csi
# We have to do an ADD to put the tarball into extractor, then do a COPY with chown into final
# ADD then chown in two steps will double the image size
# See - https://stackoverflow.com/questions/30085621/why-does-chown-increase-size-of-docker-image
# - https://github.com/moby/moby/issues/5505
# - https://github.com/moby/moby/issues/6119
# ADD with chown doesn't chown the files inside tarball
# See - https://github.com/moby/moby/issues/35525
FROM alpine:3.10.2 AS alluxio-extractor
# Note that downloads for *-SNAPSHOT tarballs are not available.
ARG ALLUXIO_TARBALL=http://downloads.alluxio.io/downloads/files/2.9.0/alluxio-2.9.0-bin.tar.gz
# (Alert):It's not recommended to set this Argument to true, unless you know exactly what you are doing
ARG ENABLE_DYNAMIC_USER=false
ADD ${ALLUXIO_TARBALL} /opt/
# Remote tarball needs to be untarred. Local tarball is untarred automatically.
# Use ln -s instead of mv to avoid issues with Centos (see https://github.com/moby/moby/issues/27358)
RUN cd /opt && \
(if ls | grep -q ".tar.gz"; then tar -xzf *.tar.gz && rm *.tar.gz; fi) && \
ln -s alluxio-* alluxio
RUN if [ ${ENABLE_DYNAMIC_USER} = "true" ] ; then \
chmod -R 777 /opt/* ; \
fi
# Configure Java
FROM openeuler/openeuler:24.03-lts-sp1 as build_java8
RUN \
yum update -y && yum upgrade -y && \
yum install -y java-1.8.0-openjdk-devel java-1.8.0-openjdk && \
yum clean all
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk
# Disable JVM DNS cache in java8 (https://github.com/Alluxio/alluxio/pull/9452)
RUN echo "networkaddress.cache.ttl=0" >> /usr/lib/jvm/java-1.8.0-openjdk/jre/lib/security/java.security
FROM openeuler/openeuler:24.03-lts-sp1 as build_java11
RUN \
yum update -y && yum upgrade -y && \
yum install -y java-11-openjdk-devel java-11-openjdk && \
yum clean all
ENV JAVA_HOME /usr/lib/jvm/java-11-openjdk
# Disable JVM DNS cache in java11 (https://github.com/Alluxio/alluxio/pull/9452)
RUN echo "networkaddress.cache.ttl=0" >> /usr/lib/jvm/java-11-openjdk/conf/security/java.security
FROM build_java${JAVA_VERSION} AS final
WORKDIR /
# Install libfuse2 and libfuse3. Libfuse2 setup is modified from cheyang/fuse2:ubuntu1604-customize to be applied on centOS
RUN \
yum install -y ca-certificates pkgconfig wget udev git && \
yum install -y gcc gcc-c++ make cmake gettext-devel libtool autoconf && \
git clone https://github.com/Alluxio/libfuse.git && \
cd libfuse && \
git checkout fuse_2_9_5_customize_multi_threads && \
bash makeconf.sh && \
./configure && \
make -j8 && \
make install && \
cd .. && \
rm -rf libfuse
# https://developer.aliyun.com/packageSearch?word=fuse3-libs-3.6.1-4.1.al7
# Set variables based on the target architecture
ARG TARGETARCH
ARG TARGETVARIANT
# Pick the RPM package matching the architecture
RUN if [ "$TARGETARCH" = "amd64" ]; then \
wget -O /fuse3-libs.rpm https://mirrors.aliyun.com/alinux/2.1903/extras/x86_64/Packages/fuse3-libs-3.6.1-4.1.al7.x86_64.rpm; \
elif [ "$TARGETARCH" = "arm64" ]; then \
wget -O /fuse3-libs.rpm https://mirrors.aliyun.com/alinux/2.1903/extras/aarch64/Packages/fuse3-libs-3.6.1-4.1.al7.aarch64.rpm; \
fi
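# Install the downloaded fuse3-libs RPM; --nodeps skips RPM dependency checks,
# since the remaining fuse3 runtime pieces are installed from the yum repos below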
RUN rpm -ivh --nodeps /fuse3-libs.rpm && \
rm -f /fuse3-libs.rpm
RUN yum remove -y gcc gcc-c++ make cmake gettext-devel libtool autoconf wget git && \
yum install -y fuse3 fuse3-devel && \
yum clean all
# Configuration for the modified libfuse2
ENV MAX_IDLE_THREADS "64"
# /lib64 is for rocksdb native libraries, /usr/local/lib is for libfuse2 native libraries
ENV LD_LIBRARY_PATH "/lib64:/usr/local/lib:${LD_LIBRARY_PATH}"
ARG ALLUXIO_USERNAME=alluxio
ARG ALLUXIO_GROUP=alluxio
ARG ALLUXIO_UID=1000
ARG ALLUXIO_GID=1000
# For dev image to know the user
ENV ALLUXIO_DEV_UID=${ALLUXIO_UID}
ARG ENABLE_DYNAMIC_USER=true
# Add Tini for Alluxio helm charts (https://github.com/Alluxio/alluxio/pull/12233)
# - https://github.com/krallin/tini
ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static /usr/local/bin/tini
RUN chmod +x /usr/local/bin/tini
# If Alluxio user, group, gid, and uid aren't root|0, create the alluxio user and set file permissions accordingly
RUN if [ ${ALLUXIO_USERNAME} != "root" ] \
&& [ ${ALLUXIO_GROUP} != "root" ] \
&& [ ${ALLUXIO_UID} -ne 0 ] \
&& [ ${ALLUXIO_GID} -ne 0 ]; then \
groupadd --gid ${ALLUXIO_GID} ${ALLUXIO_GROUP} && \
useradd --system -m --uid ${ALLUXIO_UID} --gid ${ALLUXIO_GROUP} ${ALLUXIO_USERNAME} && \
usermod -a -G root ${ALLUXIO_USERNAME} && \
mkdir -p /journal && \
chown -R ${ALLUXIO_UID}:${ALLUXIO_GID} /journal && \
chmod -R g=u /journal && \
mkdir /mnt/alluxio-fuse && \
chown -R ${ALLUXIO_UID}:${ALLUXIO_GID} /mnt/alluxio-fuse; \
fi
# Docker 19.03+ required to expand variables in --chown argument
# https://github.com/moby/buildkit/pull/926#issuecomment-503943557
COPY --from=alluxio-extractor --chown=${ALLUXIO_USERNAME}:${ALLUXIO_GROUP} /opt /opt/
COPY --chown=${ALLUXIO_USERNAME}:${ALLUXIO_GROUP} conf /opt/alluxio/conf/
COPY --chown=${ALLUXIO_USERNAME}:${ALLUXIO_GROUP} entrypoint.sh /
COPY --from=csi-dev /usr/local/bin/alluxio-csi /usr/local/bin/
RUN if [ ${ENABLE_DYNAMIC_USER} = "true" ] ; then \
chmod -R 777 /journal; \
chmod -R 777 /mnt; \
# Enable user_allow_other option for fuse in non-root mode
echo "user_allow_other" >> /etc/fuse.conf; \
fi
USER ${ALLUXIO_UID}
WORKDIR /opt/alluxio
ENV PATH="/opt/alluxio/bin:${PATH}"
ENTRYPOINT ["/entrypoint.sh"]
Dockerfile (alluxio-dev)
#
# The Alluxio Open Foundation licenses this work under the Apache License, version 2.0
# (the "License"). You may not use this work except in compliance with the License, which is
# available at www.apache.org/licenses/LICENSE-2.0
#
# This software is distributed on an "AS IS" basis, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
# either express or implied, as more fully set forth in the License.
#
# See the NOTICE file distributed with this work for information regarding copyright ownership.
#
FROM lomtom/alluxio:2.9.0-openeuler2403sp1
USER root
RUN \
yum update -y && yum upgrade -y && \
yum install -y java-11-openjdk-devel java-11-openjdk && \
yum install -y ca-certificates pkgconfig wget udev git gcc gcc-c++ make cmake gettext-devel libtool autoconf unzip vim && \
yum clean all
# Create a symlink for setting JAVA_HOME depending on java version. ENV cannot be set conditionally.
RUN ln -s /usr/lib/jvm/java-1.8.0-openjdk /usr/lib/jvm/java-8-openjdk
ARG JAVA_VERSION=8
ENV JAVA_HOME /usr/lib/jvm/java-${JAVA_VERSION}-openjdk
# Disable JVM DNS cache in java11 (https://github.com/Alluxio/alluxio/pull/9452)
RUN echo "networkaddress.cache.ttl=0" >> /usr/lib/jvm/java-11-openjdk/conf/security/java.security
# Install arthas(https://github.com/alibaba/arthas) for analyzing performance bottleneck
RUN wget -qO /tmp/arthas.zip "https://github.com/alibaba/arthas/releases/download/arthas-all-3.4.6/arthas-bin.zip" && \
mkdir -p /opt/arthas && \
unzip /tmp/arthas.zip -d /opt/arthas && \
rm /tmp/arthas.zip
# Set variables based on the target architecture
ARG TARGETARCH
# Install async-profiler(https://github.com/jvm-profiling-tools/async-profiler/releases/tag/v1.8.3)
RUN if [ "$TARGETARCH" = "amd64" ]; then \
wget -qO /tmp/async-profiler-1.8.3-linux-x64.tar.gz "https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz" && \
tar -xvf /tmp/async-profiler-1.8.3-linux-x64.tar.gz -C /opt && \
mv /opt/async-profiler-* /opt/async-profiler && \
rm /tmp/async-profiler-1.8.3-linux-x64.tar.gz; \
elif [ "$TARGETARCH" = "arm64" ]; then \
wget -qO /tmp/async-profiler-1.8.3-linux-arm64.tar.gz "https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-arm64.tar.gz" && \
tar -xvf /tmp/async-profiler-1.8.3-linux-arm64.tar.gz -C /opt && \
mv /opt/async-profiler-* /opt/async-profiler && \
rm /tmp/async-profiler-1.8.3-linux-arm64.tar.gz; \
fi
USER ${ALLUXIO_DEV_UID}