HDFS supports mounting the filesystem locally; the demand for this comes mostly from AI workloads. Since Hadoop 2, mounting the fs via FUSE has been supported.
See the documentation for details:
https://cwiki.apache.org/confluence/display/HADOOP2/MountableHDFS
Since hadoop fuse is part of the Hadoop native code, the build needs extra handling:
```
mvn clean package -Pnative -DskipTests
```
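Building fuse-dfs also requires the FUSE development headers on the build machine. A hedged sketch of the prerequisites (package names vary by distro; -Drequire.fuse=true simply makes the native build fail fast when FUSE support cannot be compiled):

```
# Debian / Ubuntu
sudo apt-get install -y libfuse-dev cmake

# CentOS / RHEL
sudo yum install -y fuse-devel cmake

# build, failing early if fuse-dfs cannot be built
mvn clean package -Pnative -DskipTests -Drequire.fuse=true
```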
The relevant build artifacts will show up at:
```
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_dfs_wrapper.sh
hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/target/usr/local/lib/libhdfs.so*
hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/fuse_dfs
```
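If you want to collect these artifacts into a single directory for deployment, a minimal sketch (the output directory name is just an example):

```
mkdir -p output
cp hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_dfs_wrapper.sh output/
cp hadoop-hdfs-project/hadoop-hdfs-native-client/target/native/target/usr/local/lib/libhdfs.so* output/
cp hadoop-hdfs-project/hadoop-hdfs-native-client/target/main/native/fuse-dfs/fuse_dfs output/
```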
Once fuse is built, you can use the fuse_dfs_wrapper.sh mentioned above to perform the mount; the command looks like:
```
./fuse_dfs_wrapper.sh dfs://127.0.0.1:9000 /mnt/hdfs
```
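If the mount succeeds, the HDFS namespace becomes visible to ordinary POSIX tools; a quick sanity check (paths match the example above):

```
ls /mnt/hdfs
df -h /mnt/hdfs
```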
The script's main job is to set up the load paths for the jars and shared libraries and then invoke the fuse_dfs executable. A few points deserve attention when using it:
- All the jars produced by the hadoop build need to be on the java CLASSPATH. A simpler approach is to add the jars from the prebuilt hadoop package downloaded earlier.
- The fuse_dfs program depends on the libhdfs C library and, through it, indirectly on libjvm.so, which lives under the jre directory. LD_LIBRARY_PATH must be adjusted so the program can find these shared libraries.
- At runtime, fuse_dfs depends on the fuse kernel module and its device file. The simplest check is whether /dev/fuse exists on the system; if it does not, install fuse:
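For example (a sketch; package names vary by distro):

```
# check for the device file and try loading the kernel module
ls /dev/fuse || sudo modprobe fuse

# Debian / Ubuntu
sudo apt-get install -y fuse

# CentOS / RHEL
sudo yum install -y fuse
```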
When running fuse_dfs_wrapper.sh you can pass the -d flag to print debug output, which is a very convenient way to troubleshoot.
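For example (assuming the wrapper forwards extra flags straight to fuse_dfs; -d is the standard FUSE debug flag and keeps the process in the foreground):

```
./fuse_dfs_wrapper.sh dfs://127.0.0.1:9000 /mnt/hdfs -d
```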
I wrote a simple script of my own:
```
#!/bin/bash
if [ $# -lt 1 ]; then
    echo "Usage: sh mount.sh <MOUNT_PATH>"
    exit 1
fi

mountpoint=$1
export OS_ARCH=amd64
export rdbuffer=65536

# Check fuse tools
which fusermount > /dev/null
[ $? -ne 0 ] && { echo "ERROR: FUSE toolchain is not installed"; exit 1; }

# Check modprobe
sudo modprobe fuse
[ $? -ne 0 ] && { echo "ERROR: modprobe fuse failed, maybe FUSE toolchain is not installed or root privilege is not granted"; exit 1; }

# Check JAVA_HOME
[ -z "$JAVA_HOME" ] && { echo "ERROR: Java home is not set, please install openjdk and set JAVA_HOME env first"; exit 1; }

# Check java native
[ -d "${JAVA_HOME}/jre/lib/${OS_ARCH}/server" ] || { echo "ERROR: not found java jni env at ${JAVA_HOME}/jre/lib/${OS_ARCH}/server"; exit 1; }

export LDFS_FUSE_HOME="$(cd $(dirname ${BASH_SOURCE[0]})/..; pwd)"

# Check argument not empty
[ -z "$mountpoint" ] && { echo "ERROR: \$mountpoint not set"; exit 1; }

# Check mountpoint
[ -d "$mountpoint" ] || { echo "ERROR: $mountpoint is not a dir"; exit 1; }
[ "${mountpoint:0:1}" = "/" ] || { echo "ERROR: \$mountpoint=$mountpoint must be an absolute path"; exit 1; }
grep "$mountpoint" /proc/mounts > /dev/null
[ $? -eq 0 ] && { echo "ERROR: $mountpoint is already mounted"; exit 1; }

export HADOOP_HOME=$LDFS_FUSE_HOME
export HADOOP_CONF_DIR=$LDFS_FUSE_HOME/etc/hadoop
export HADOOP_LOG_DIR=$LDFS_FUSE_HOME/logs
export HADOOP_PID_DIR=$LDFS_FUSE_HOME/pid
export LIBHDFS_OPTS="-Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Djdk.nio.maxCachedBufferSize=262144 -Xloggc:${HADOOP_LOG_DIR}/gc-fuse.log -Dhadoop.log.dir=${HADOOP_LOG_DIR} -Dhadoop.log.file=hadoop-fuse.log -Dhadoop.root.logger=INFO,RFA"

export LD_LIBRARY_PATH=${JAVA_HOME}/jre/lib/${OS_ARCH}/server:${HADOOP_HOME}/lib/native
export CLASSPATH=$CLASSPATH:`${HADOOP_HOME}/bin/hadoop classpath --glob`

# Make sure user_allow_other is enabled so non-root users can mount with allow_other
grep "^user_allow_other" /etc/fuse.conf > /dev/null
[ $? -ne 0 ] && {
    sudo sh -c 'echo "user_allow_other" >> /etc/fuse.conf'
    [ $? -ne 0 ] && { echo "ERROR: Edit /etc/fuse.conf failed"; exit 1; }
}

$HADOOP_HOME/bin/fuse_dfs dfs://xx:8020/ ${mountpoint} -ordbuffer=${rdbuffer} -obig_writes
```
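On the final line, dfs://xx:8020/ is a placeholder NameNode address that must be replaced with your own; -obig_writes is a standard FUSE option that allows write requests larger than 4 KiB, and -ordbuffer sets the size of the read buffer fuse_dfs uses when streaming data from HDFS.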
Run:
```
/bin/bash bin/mount_to.sh /tmp/m0
```
to mount and start using the filesystem. When you are done, unmount it.
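To unmount a FUSE mount, either of the standard commands works (the path matches the example above):

```
fusermount -u /tmp/m0
# or, with root privilege:
sudo umount /tmp/m0
```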