HDFS supports mounting the file system locally; this need comes up mostly in AI workloads. Starting with Hadoop 2, the project has supported mounting the fs via FUSE.

The official documentation is here:

https://cwiki.apache.org/confluence/display/HADOOP2/MountableHDFS

Since the FUSE support is part of Hadoop native code, the build needs the native profile enabled:

mvn clean package -Pnative -DskipTests

The build artifacts end up at:

hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_dfs_wrapper.sh
hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/target/usr/local/lib/libhdfs.so*
hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/fuse_dfs
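Before attempting a mount, it is worth confirming that the artifacts actually exist. A small sketch (`check_artifact` and the `HADOOP_SRC` variable are illustrative names, not part of the build; the paths are the ones listed above):

```shell
# Sketch: verify the fuse-dfs build artifacts are present.
# HADOOP_SRC is assumed to point at the root of the hadoop source tree.
HADOOP_SRC=${HADOOP_SRC:-.}

check_artifact() {
    # Report whether a build artifact exists at the given path.
    if [ -e "$1" ]; then
        echo "OK: $1"
    else
        echo "MISSING: $1"
    fi
}

check_artifact "$HADOOP_SRC/hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/fuse_dfs"
check_artifact "$HADOOP_SRC/hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/target/usr/local/lib/libhdfs.so"
```

If either path prints MISSING, re-run the mvn build with -Pnative and check the build log for native compilation errors.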

Once the FUSE support is built, you can mount with the fuse_dfs_wrapper.sh mentioned above:

./fuse_dfs_wrapper.sh dfs://127.0.0.1:9000 /mnt/hdfs

The script's main job is to set up the load paths for the jars and shared libraries, then invoke the fuse_dfs executable. A few things to watch when using it:

  • Put all the jars produced by the Hadoop build on the Java CLASSPATH. A simpler approach is to add the jars from the pre-built Hadoop package downloaded earlier.
  • fuse_dfs depends on the HDFS C API shared library, which in turn depends on libjvm.so; the latter lives under the JRE directory. LD_LIBRARY_PATH must be adjusted so these libraries can be found.
  • At runtime, fuse_dfs depends on the fuse kernel module and its device file. The simplest check is whether /dev/fuse exists; if it does not, install fuse:
apt-get install fuse
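The environment the wrapper sets up can also be reproduced by hand. The sketch below covers the three points above; the default HADOOP_HOME value and the JDK 8 directory layout (jre/lib/amd64/server) are assumptions that may differ on your system:

```shell
# Sketch of the environment fuse_dfs needs at runtime. HADOOP_HOME and
# JAVA_HOME are assumed locations; adjust them for your install.
export HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop}
export JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-8-openjdk-amd64}

# Point 1: all Hadoop jars on the Java CLASSPATH ("hadoop classpath --glob"
# expands the wildcard entries into concrete jar paths).
export CLASSPATH="$CLASSPATH:$("${HADOOP_HOME}/bin/hadoop" classpath --glob)"

# Point 2: libhdfs.so needs libjvm.so from the JRE at load time.
export LD_LIBRARY_PATH="${JAVA_HOME}/jre/lib/amd64/server:${HADOOP_HOME}/lib/native"

# Point 3: the fuse kernel module must be loaded so /dev/fuse exists.
if [ -e /dev/fuse ]; then
    echo "fuse device present"
else
    echo "fuse device missing: run 'sudo modprobe fuse' or install fuse"
fi
```

With these variables exported, fuse_dfs can be invoked directly instead of going through the wrapper script.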

When invoking ./fuse_dfs_wrapper.sh you can pass the -d flag to print debug output, which is very handy for troubleshooting.

Here is a simple mount script I wrote:

#!/bin/bash

if [ $# -lt 1 ]; then
    echo "Usage: $0 <MOUNT_PATH>"
    exit 1
fi

mountpoint=$1
export OS_ARCH=amd64
rdbuffer=65536

# Check the FUSE userspace tools
command -v fusermount > /dev/null || {
    echo "ERROR: FUSE toolchain is not installed"
    exit 1
}

# Load the fuse kernel module
sudo modprobe fuse || {
    echo "ERROR: modprobe fuse failed; the FUSE toolchain may be missing or root privilege not granted"
    exit 1
}

# Check JAVA_HOME
[ -z "$JAVA_HOME" ] && {
    echo "ERROR: JAVA_HOME is not set; please install a JDK and set JAVA_HOME first"
    exit 1
}

# Check the JVM shared library location
[ -d "${JAVA_HOME}/jre/lib/${OS_ARCH}/server" ] || {
    echo "ERROR: JNI libraries not found at ${JAVA_HOME}/jre/lib/${OS_ARCH}/server"
    exit 1
}

export LDFS_FUSE_HOME="$(cd "$(dirname "${BASH_SOURCE[0]}")"/..; pwd)"

# Check the mount point argument
[ -z "$mountpoint" ] && {
    echo "ERROR: \$mountpoint not set"
    exit 1
}

# Check the mount point itself
[ -d "$mountpoint" ] || {
    echo "ERROR: $mountpoint is not a directory"
    exit 1
}
[ "${mountpoint:0:1}" = "/" ] || {
    echo "ERROR: \$mountpoint=$mountpoint must be an absolute path"
    exit 1
}
grep -q "$mountpoint" /proc/mounts && {
    echo "ERROR: $mountpoint is already mounted"
    exit 1
}

export HADOOP_HOME=$LDFS_FUSE_HOME
export HADOOP_CONF_DIR=$LDFS_FUSE_HOME/etc/hadoop
export HADOOP_LOG_DIR=$LDFS_FUSE_HOME/logs
export HADOOP_PID_DIR=$LDFS_FUSE_HOME/pid
export LIBHDFS_OPTS="-Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Djdk.nio.maxCachedBufferSize=262144 -Xloggc:${HADOOP_LOG_DIR}/gc-fuse.log -Dhadoop.log.dir=${HADOOP_LOG_DIR} -Dhadoop.log.file=hadoop-fuse.log -Dhadoop.root.logger=INFO,RFA"

export LD_LIBRARY_PATH=${JAVA_HOME}/jre/lib/${OS_ARCH}/server:${HADOOP_HOME}/lib/native
export CLASSPATH=$CLASSPATH:$(${HADOOP_HOME}/bin/hadoop classpath --glob)

# Allow non-root users to access the mount
grep -q "^user_allow_other" /etc/fuse.conf || {
    sudo sh -c 'echo "user_allow_other" >> /etc/fuse.conf' || {
        echo "ERROR: failed to edit /etc/fuse.conf"
        exit 1
    }
}

$HADOOP_HOME/bin/fuse_dfs dfs://xx:8020/ ${mountpoint} -ordbuffer=${rdbuffer} -obig_writes

Mount with:

/bin/bash bin/mount_to.sh /tmp/m0

and unmount with:

fusermount -u /tmp/m0
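After mounting, a quick read/write round-trip tells you whether the mount is actually usable. A sketch (sanity_check is an illustrative helper, not part of fuse_dfs; MOUNTPOINT defaults to /tmp only so the snippet runs standalone, point it at the fuse mount such as /tmp/m0):

```shell
# Sketch: round-trip a small file through a directory to confirm it is
# both readable and writable; pass the fuse mount point (e.g. /tmp/m0).
sanity_check() {
    dir=$1
    probe="$dir/.fuse_probe.$$"
    echo "hello hdfs" > "$probe" 2>/dev/null || { echo "write failed on $dir"; return 1; }
    content=$(cat "$probe")
    rm -f "$probe"
    if [ "$content" = "hello hdfs" ]; then
        echo "mount at $dir looks healthy"
    else
        echo "read-back mismatch on $dir"
        return 1
    fi
}

sanity_check "${MOUNTPOINT:-/tmp}"
```

If the write fails on a freshly mounted directory, re-run the wrapper with -d and check the debug output for permission or connectivity errors.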

