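Introduction
This document describes how to configure authentication for Hadoop in secure mode. When Hadoop runs in secure mode, every Hadoop service and every user must be authenticated by Kerberos. Forward and reverse host lookup for all service hosts must be configured correctly so that services can authenticate with each other. Host lookups may be configured using either DNS or /etc/hosts files. A working knowledge of Kerberos and DNS is recommended before attempting to configure Hadoop in secure mode.
For a detailed introduction to Kerberos, see http://www.linuxidc.com/Linux/2016-09/134949.htm.
Hadoop's security features consist of Authentication, Service Level Authorization, Authentication for Web Consoles (http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/HttpAuthentication.html), and Data Confidentiality.
Authentication
End User Accounts
When service level authentication is turned on, end users must authenticate themselves before interacting with Hadoop services. The simplest way is to authenticate interactively with the Kerberos kinit command. Programmatic authentication using Kerberos keytab files may be used when interactive login with kinit is not feasible.
User Accounts for Hadoop Daemons
Ensure that the HDFS and YARN daemons run as different Unix users, for example hdfs and yarn. Also ensure that the MapReduce JobHistory server runs as a different user, such as mapred.
It is recommended that they share a Unix group, for example hadoop. See "Mapping from user to group" for group management.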
User:Group | Daemons |
---|---|
hdfs:hadoop | NameNode, Secondary NameNode, JournalNode, DataNode |
yarn:hadoop | ResourceManager, NodeManager |
mapred:hadoop | MapReduce JobHistory Server |
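Kerberos principals for Hadoop Daemons
Each Hadoop service instance must be configured with its Kerberos principal and keytab file location.
The general format of a service principal is ServiceName/_HOST@REALM.TLD, for example dn/_HOST@EXAMPLE.COM.
Hadoop simplifies the deployment of configuration files by allowing the hostname component of the service principal to be specified as the _HOST wildcard. Each service instance substitutes _HOST with its own fully qualified hostname at runtime. This allows administrators to deploy the same set of configuration files on all nodes. However, the keytab files will be different.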
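As a minimal sketch of the _HOST wildcard, the hdfs-site.xml fragment below could be deployed unchanged to every DataNode. The property names and keytab path are the ones listed in the DataNode configuration table of this document; EXAMPLE.COM is a placeholder realm.

```xml
<configuration>
  <!-- _HOST is replaced at startup with the fully qualified hostname of the node -->
  <property>
    <name>dfs.datanode.kerberos.principal</name>
    <value>dn/_HOST@EXAMPLE.COM</value>
  </property>
  <!-- Same path on every node, but the keytab contents differ per host -->
  <property>
    <name>dfs.datanode.keytab.file</name>
    <value>/etc/security/keytab/dn.service.keytab</value>
  </property>
</configuration>
```

On a host named dn1.example.com (a hypothetical name), the DataNode would then log in as dn/dn1.example.com@EXAMPLE.COM.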
HDFS
The NameNode keytab file, on each NameNode host, should look like the following:
$ klist -e -k -t /etc/security/keytab/nn.service.keytab
Keytab name: FILE:/etc/security/keytab/nn.service.keytab
KVNO Timestamp Principal
4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
The Secondary NameNode keytab file, on that host, should look like the following:
$ klist -e -k -t /etc/security/keytab/sn.service.keytab
Keytab name: FILE:/etc/security/keytab/sn.service.keytab
KVNO Timestamp Principal
4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
The DataNode keytab file, on each host, should look like the following:
$ klist -e -k -t /etc/security/keytab/dn.service.keytab
Keytab name: FILE:/etc/security/keytab/dn.service.keytab
KVNO Timestamp Principal
4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
YARN
The ResourceManager keytab file, on the ResourceManager host, should look like the following:
$ klist -e -k -t /etc/security/keytab/rm.service.keytab
Keytab name: FILE:/etc/security/keytab/rm.service.keytab
KVNO Timestamp Principal
4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
The NodeManager keytab file, on each host, should look like the following:
$ klist -e -k -t /etc/security/keytab/nm.service.keytab
Keytab name: FILE:/etc/security/keytab/nm.service.keytab
KVNO Timestamp Principal
4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
MapReduce JobHistory Server
The MapReduce JobHistory Server keytab file, on that host, should look like the following:
$ klist -e -k -t /etc/security/keytab/jhs.service.keytab
Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
KVNO Timestamp Principal
4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
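Mapping from Kerberos principals to OS user accounts
Hadoop maps Kerberos principals to OS (system) user accounts using the rules specified by hadoop.security.auth_to_local. These rules work in the same way as auth_to_local in the Kerberos configuration file (krb5.conf). In addition, Hadoop auth_to_local mapping supports the /L flag, which lowercases the returned name.
By default, the first component of the principal name is used as the system user name if the realm matches the default_realm (usually defined in /etc/krb5.conf). For example, the default rule maps the principal host/full.qualified.domain.name@REALM.TLD to the system user host. The default rule is probably not appropriate for most clusters.
In a typical cluster, the HDFS and YARN services are launched as the system users hdfs and yarn respectively. hadoop.security.auth_to_local can be configured as follows: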
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[2:$1@$0](nn/.*@.*REALM.TLD)s/.*/hdfs/
RULE:[2:$1@$0](jn/.*@.*REALM.TLD)s/.*/hdfs/
RULE:[2:$1@$0](dn/.*@.*REALM.TLD)s/.*/hdfs/
RULE:[2:$1@$0](nm/.*@.*REALM.TLD)s/.*/yarn/
RULE:[2:$1@$0](rm/.*@.*REALM.TLD)s/.*/yarn/
RULE:[2:$1@$0](jhs/.*@.*REALM.TLD)s/.*/mapred/
DEFAULT
</value>
</property>
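Custom rules can be tested using the hadoop kerbname command. This command lets you specify a principal and applies Hadoop's current auth_to_local rule set to it.
Mapping from user to group
The mechanism that maps system users to system groups can be configured via hadoop.security.group.mapping. See the HDFS Permissions Guide for details.
In practice, you need to manage an SSO (single sign-on) environment using Kerberos with LDAP for Hadoop in secure mode.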
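For illustration, a core-site.xml fragment that selects a group mapping provider might look like the sketch below. The choice of ShellBasedUnixGroupsMapping (which resolves groups through the operating system) is an assumption made for this example, not a recommendation from the original text; an LDAP-backed environment would typically use a different provider with additional settings.

```xml
<configuration>
  <!-- Assumed provider for this sketch: resolve a user's groups via the local OS -->
  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
  </property>
</configuration>
```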
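Proxy user
Some products that access Hadoop services on behalf of end users, such as Apache Oozie, need to be able to impersonate end users. See the proxy user documentation for details.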
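A proxy user setup could be sketched in core-site.xml as follows, using the hadoop.proxyuser.superuser.hosts and hadoop.proxyuser.superuser.groups properties from the common configuration table with the superuser name replaced by oozie. The host name and group names are hypothetical placeholders.

```xml
<configuration>
  <!-- Hosts from which the oozie user may impersonate others (hypothetical host) -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>oozie-host.example.com</value>
  </property>
  <!-- Only members of these groups may be impersonated (hypothetical groups) -->
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>group1,group2</value>
  </property>
</configuration>
```

Restricting both values, rather than using the * wildcard, limits the damage if the proxy user's credentials are compromised.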
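Secure DataNode
Because the DataNode data transfer protocol does not use the Hadoop RPC framework, DataNodes must authenticate themselves using the privileged ports specified by dfs.datanode.address and dfs.datanode.http.address. This authentication is based on the assumption that an attacker cannot obtain root privileges on DataNode hosts.
When you run the hdfs datanode command as root, the server process binds the privileged ports first, then drops privileges and runs as the user account specified by HADOOP_SECURE_DN_USER. This startup process uses the jsvc program installed in JSVC_HOME. You must specify HADOOP_SECURE_DN_USER and JSVC_HOME as environment variables at startup (in hadoop-env.sh).
As of version 2.6.0, SASL can be used to authenticate the data transfer protocol. In this configuration, a secured cluster no longer needs to start the DataNode as root using jsvc and bind to privileged ports. To enable SASL on the data transfer protocol, set dfs.data.transfer.protection in hdfs-site.xml, set a non-privileged port for dfs.datanode.address, set dfs.http.policy to HTTPS_ONLY, and make sure the HADOOP_SECURE_DN_USER environment variable is not defined. Note that SASL cannot be used on the data transfer protocol if dfs.datanode.address is set to a privileged port. This is required for backwards-compatibility reasons.
To migrate an existing cluster that uses root authentication to SASL instead, first ensure that version 2.6.0 or later has been deployed to all cluster nodes as well as to any external applications that need to connect to the cluster. Only HDFS client versions 2.6.0 and later can connect to a DataNode that uses SASL to authenticate the data transfer protocol, so it is vital that all callers have the correct version before migrating. After version 2.6.0 or later has been deployed everywhere, update the configuration of any external applications to enable SASL. An HDFS client with SASL enabled can connect successfully to a DataNode running with either root authentication or SASL authentication. Changing the configuration of all clients first guarantees that subsequent configuration changes on DataNodes will not disrupt the applications. Finally, each individual DataNode can be migrated by changing its configuration and restarting.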
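The SASL-based setup described above can be sketched as the following hdfs-site.xml fragment. The port number 10019 is an arbitrary non-privileged port chosen for this example, and authentication is just one of the three allowed protection levels.

```xml
<configuration>
  <!-- Setting this property enables SASL on the data transfer protocol -->
  <property>
    <name>dfs.data.transfer.protection</name>
    <value>authentication</value>
  </property>
  <!-- Must be a non-privileged port (1024 or above); 10019 is an example value -->
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:10019</value>
  </property>
  <!-- Required so that the HTTP servers are authenticated as well -->
  <property>
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
</configuration>
```

With this in place, HADOOP_SECURE_DN_USER must be left undefined in hadoop-env.sh when the DataNode starts.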
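Data confidentiality
Data Encryption on RPC
The data transferred between Hadoop services and clients can be encrypted. Setting hadoop.rpc.protection to privacy in core-site.xml activates encryption.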
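A minimal core-site.xml fragment for this setting might look like the following sketch; the same value would need to be present on both the services and the clients.

```xml
<configuration>
  <!-- privacy = authentication + integrity check + encryption of RPC traffic -->
  <property>
    <name>hadoop.rpc.protection</name>
    <value>privacy</value>
  </property>
</configuration>
```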
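Data Encryption on Block data transfer
Set dfs.encrypt.data.transfer to true in hdfs-site.xml to activate data encryption for the DataNode data transfer protocol.
Optionally, set dfs.encrypt.data.transfer.algorithm to either 3des or rc4 to choose a specific encryption algorithm. If unspecified, the JCE default configured on the system is used, which is usually 3DES.
Setting dfs.encrypt.data.transfer.cipher.suites to AES/CTR/NoPadding activates AES encryption. By default this is unspecified, so AES is not used. When AES is used, the algorithm specified in dfs.encrypt.data.transfer.algorithm is still used during the initial key exchange. The AES key bit length can be configured by setting dfs.encrypt.data.transfer.cipher.key.bitlength to 128, 192, or 256. The default is 128.
AES offers the greatest cryptographic strength and the best performance. At this time, 3DES and RC4 have been used more often in Hadoop clusters.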
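Putting the settings above together, an hdfs-site.xml fragment that enables block transfer encryption with AES could look like this sketch. The 256-bit key length is one of the three permitted values, picked here only as an example.

```xml
<configuration>
  <!-- Turn on encryption of the DataNode data transfer protocol -->
  <property>
    <name>dfs.encrypt.data.transfer</name>
    <value>true</value>
  </property>
  <!-- Activate AES; without this property AES is not used -->
  <property>
    <name>dfs.encrypt.data.transfer.cipher.suites</name>
    <value>AES/CTR/NoPadding</value>
  </property>
  <!-- Example key length; allowed values are 128, 192 and 256 -->
  <property>
    <name>dfs.encrypt.data.transfer.cipher.key.bitlength</name>
    <value>256</value>
  </property>
</configuration>
```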
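Data Encryption on HTTP
Data transfer between the web console and clients is protected by SSL (HTTPS). SSL configuration is recommended but not required to configure Hadoop security with Kerberos.
Configuration
Permissions for both HDFS and local filesystem paths
The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions: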
Filesystem | Path | User:Group | Permissions |
---|---|---|---|
local | dfs.namenode.name.dir | hdfs:hadoop | drwx------ |
local | dfs.datanode.data.dir | hdfs:hadoop | drwx------ |
local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x |
local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x |
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x |
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x |
local | container-executor | root:hadoop | --Sr-s--* |
local | conf/container-executor.cfg | root:hadoop | r-------* |
hdfs | / | hdfs:hadoop | drwxr-xr-x |
hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt |
hdfs | /user | hdfs:hadoop | drwxr-xr-x |
hdfs | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwxt |
hdfs | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwxt |
hdfs | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x--- |
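Common Configurations
To turn on RPC authentication in Hadoop, set the value of the hadoop.security.authentication property to "kerberos", and set the security-related settings listed below appropriately.
The following properties should be in the core-site.xml of all the nodes in the cluster.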
Parameter | Value | Notes |
---|---|---|
hadoop.security.authentication | kerberos | simple : No authentication. (default) kerberos : Enable authentication by Kerberos. |
hadoop.security.authorization | true | Enable RPC service-level authorization. |
hadoop.rpc.protection | authentication | authentication : authentication only (default); integrity : integrity check in addition to authentication; privacy : data encryption in addition to integrity |
hadoop.security.auth_to_local | RULE:exp1 RULE:exp2 … DEFAULT | The value is string containing new line characters. See Kerberos documentation for the format of exp. |
hadoop.proxyuser.superuser.hosts | | Comma-separated hosts from which superuser is allowed to impersonate users. * means wildcard. |
hadoop.proxyuser.superuser.groups | | Comma-separated groups to which users impersonated by superuser belong. * means wildcard. |
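As a sketch, the two switches that turn on Kerberos authentication and service-level authorization would appear in core-site.xml like this, using the values from the table above:

```xml
<configuration>
  <!-- Default is "simple" (no authentication) -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- Enable RPC service-level authorization -->
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
```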
NameNode
Parameter | Value | Notes |
---|---|---|
dfs.block.access.token.enable | true | Enable HDFS block access tokens for secure operations. |
dfs.namenode.kerberos.principal | nn/_HOST@REALM.TLD | Kerberos principal name for the NameNode. |
dfs.namenode.keytab.file | /etc/security/keytab/nn.service.keytab | Kerberos keytab file for the NameNode. |
dfs.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the NameNode for web UI SPNEGO authentication. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, i.e. use the value of dfs.web.authentication.kerberos.principal. |
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/spnego.service.keytab | SPNEGO keytab file for the NameNode. In HA clusters this setting is shared with the Journal Nodes. |
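The NameNode rows above translate into an hdfs-site.xml fragment along the lines of the sketch below. REALM.TLD is the placeholder realm used throughout this document and must be replaced with the real one.

```xml
<configuration>
  <!-- Require block access tokens for secure operations -->
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>nn/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>dfs.namenode.keytab.file</name>
    <value>/etc/security/keytab/nn.service.keytab</value>
  </property>
  <!-- SPNEGO principal and keytab for the NameNode web UI -->
  <property>
    <name>dfs.namenode.kerberos.internal.spnego.principal</name>
    <value>HTTP/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>dfs.web.authentication.kerberos.keytab</name>
    <value>/etc/security/keytab/spnego.service.keytab</value>
  </property>
</configuration>
```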
The following settings allow configuring SSL access to the NameNode web UI (optional).
Parameter | Value | Notes |
---|---|---|
dfs.http.policy | HTTP_ONLY or HTTPS_ONLY or HTTP_AND_HTTPS | HTTPS_ONLY turns off http access. This option takes precedence over the deprecated configuration dfs.https.enable and hadoop.ssl.enabled. If using SASL to authenticate data transfer protocol instead of running DataNode as root and using privileged ports, then this property must be set to HTTPS_ONLY to guarantee authentication of HTTP servers. (See dfs.data.transfer.protection.) |
dfs.namenode.https-address | nn_host_fqdn:50470 | |
dfs.https.port | 50470 | |
dfs.https.enable | true | This value is deprecated. Use dfs.http.policy |
Secondary NameNode
Parameter | Value | Notes |
---|---|---|
dfs.namenode.secondary.http-address | snn_host_fqdn:50090 | |
dfs.secondary.namenode.keytab.file | /etc/security/keytab/sn.service.keytab | Kerberos keytab file for the Secondary NameNode. |
dfs.secondary.namenode.kerberos.principal | sn/_HOST@REALM.TLD | Kerberos principal name for the Secondary NameNode. |
dfs.secondary.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the Secondary NameNode for web UI SPNEGO authentication. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, i.e. use the value of dfs.web.authentication.kerberos.principal. |
dfs.namenode.secondary.https-port | 50470 | |
JournalNode
Parameter | Value | Notes |
---|---|---|
dfs.journalnode.kerberos.principal | jn/_HOST@REALM.TLD | Kerberos principal name for the JournalNode. |
dfs.journalnode.keytab.file | /etc/security/keytab/jn.service.keytab | Kerberos keytab file for the JournalNode. |
dfs.journalnode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the JournalNode for web UI SPNEGO authentication when Kerberos security is enabled. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal}, i.e. use the value of dfs.web.authentication.kerberos.principal. |
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/spnego.service.keytab | SPNEGO keytab file for the JournalNode. In HA clusters this setting is shared with the Name Nodes. |
DataNode
Parameter | Value | Notes |
---|---|---|
dfs.datanode.data.dir.perm | 700 | |
dfs.datanode.address | 0.0.0.0:1004 | Secure DataNode must use privileged port in order to assure that the server was started securely. This means that the server must be started via jsvc. Alternatively, this must be set to a non-privileged port if using SASL to authenticate data transfer protocol. (See dfs.data.transfer.protection.) |
dfs.datanode.http.address | 0.0.0.0:1006 | Secure DataNode must use privileged port in order to assure that the server was started securely. This means that the server must be started via jsvc. |
dfs.datanode.https.address | 0.0.0.0:50470 | |
dfs.datanode.kerberos.principal | dn/_HOST@REALM.TLD | Kerberos principal name for the DataNode. |
dfs.datanode.keytab.file | /etc/security/keytab/dn.service.keytab | Kerberos keytab file for the DataNode. |
dfs.encrypt.data.transfer | false | set to true when using data encryption |
dfs.encrypt.data.transfer.algorithm | | Optionally set to 3des or rc4 when using data encryption to control the encryption algorithm. |
dfs.encrypt.data.transfer.cipher.suites | | Optionally set to AES/CTR/NoPadding to activate AES encryption when using data encryption. |
dfs.encrypt.data.transfer.cipher.key.bitlength | | Optionally set to 128, 192 or 256 to control key bit length when using AES with data encryption. |
dfs.data.transfer.protection | | authentication : authentication only; integrity : integrity check in addition to authentication; privacy : data encryption in addition to integrity. This property is unspecified by default. Setting this property enables SASL for authentication of data transfer protocol. If this is enabled, then dfs.datanode.address must use a non-privileged port, dfs.http.policy must be set to HTTPS_ONLY and the HADOOP_SECURE_DN_USER environment variable must be undefined when starting the DataNode process. |
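For the privileged-port variant (DataNode started as root via jsvc), the first rows of the table could be written in hdfs-site.xml as in this sketch. It deliberately omits dfs.data.transfer.protection, since SASL cannot be combined with a privileged dfs.datanode.address.

```xml
<configuration>
  <!-- Data directories readable only by the hdfs user -->
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>700</value>
  </property>
  <!-- Privileged ports (below 1024) show that the server was started as root -->
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:1004</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:1006</value>
  </property>
</configuration>
```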
WebHDFS
Parameter | Value | Notes |
---|---|---|
dfs.web.authentication.kerberos.principal | http/_HOST@REALM.TLD | Kerberos principal name for the WebHDFS. In HA clusters this setting is commonly used by the JournalNodes for securing access to the JournalNode HTTP server with SPNEGO. |
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/http.service.keytab | Kerberos keytab file for WebHDFS. In HA clusters this setting is commonly used by the JournalNodes for securing access to the JournalNode HTTP server with SPNEGO. |
ResourceManager
Parameter | Value | Notes |
---|---|---|
yarn.resourcemanager.principal | rm/_HOST@REALM.TLD | Kerberos principal name for the ResourceManager. |
yarn.resourcemanager.keytab | /etc/security/keytab/rm.service.keytab | Kerberos keytab file for the ResourceManager. |
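In yarn-site.xml, the ResourceManager rows above would look roughly like the following sketch (REALM.TLD is again a placeholder realm):

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.principal</name>
    <value>rm/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>yarn.resourcemanager.keytab</name>
    <value>/etc/security/keytab/rm.service.keytab</value>
  </property>
</configuration>
```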
NodeManager
Parameter | Value | Notes |
---|---|---|
yarn.nodemanager.principal | nm/_HOST@REALM.TLD | Kerberos principal name for the NodeManager. |
yarn.nodemanager.keytab | /etc/security/keytab/nm.service.keytab | Kerberos keytab file for the NodeManager. |
yarn.nodemanager.container-executor.class | org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor | Use LinuxContainerExecutor. |
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager. |
yarn.nodemanager.linux-container-executor.path | /path/to/bin/container-executor | The path to the executable of Linux container executor. |
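A yarn-site.xml sketch for the NodeManager rows follows. The executor path is left as the /path/to/bin placeholder from the table, because the actual location depends on the installation.

```xml
<configuration>
  <property>
    <name>yarn.nodemanager.principal</name>
    <value>nm/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>yarn.nodemanager.keytab</name>
    <value>/etc/security/keytab/nm.service.keytab</value>
  </property>
  <!-- Run containers as the submitting user through the setuid executable -->
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <!-- Must match the group configured in container-executor.cfg -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>
  </property>
  <!-- Placeholder path; adjust to the local installation -->
  <property>
    <name>yarn.nodemanager.linux-container-executor.path</name>
    <value>/path/to/bin/container-executor</value>
  </property>
</configuration>
```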
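WebAppProxy Configuration
The WebAppProxy provides a proxy between the web applications exported by an application and an end user. If security is enabled, it warns users before they access a potentially unsafe web application. Authentication and authorization using the proxy is handled just like any other privileged web application.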
Parameter | Value | Notes |
---|---|---|
yarn.web-proxy.address | WebAppProxy host:port for proxy to AM web apps. | host:port if this is the same as yarn.resourcemanager.webapp.address or it is not defined then the ResourceManager will run the proxy otherwise a standalone proxy server will need to be launched. |
yarn.web-proxy.keytab | /etc/security/keytab/web-app.service.keytab | Kerberos keytab file for the WebAppProxy. |
yarn.web-proxy.principal | wap/_HOST@REALM.TLD | Kerberos principal name for the WebAppProxy. |
LinuxContainerExecutor
A ContainerExecutor used by the YARN framework defines how any container is launched and controlled.
The following are available in Hadoop YARN:
ContainerExecutor | Description |
---|---|
DefaultContainerExecutor | The default executor which YARN uses to manage container execution. The container process has the same Unix user as the NodeManager. |
LinuxContainerExecutor | Supported only on GNU/Linux, this executor runs the containers as either the YARN user who submitted the application (when full security is enabled) or as a dedicated user (defaults to nobody) when full security is not enabled. When full security is enabled, this executor requires all user accounts to be created on the cluster nodes where the containers are launched. It uses a setuid executable that is included in the Hadoop distribution. The NodeManager uses this executable to launch and kill containers. The setuid executable switches to the user who has submitted the application and launches or kills the containers. For maximum security, this executor sets up restricted permissions and user/group ownership of local files and directories used by the containers such as the shared objects, jars, intermediate files, log files etc. Particularly note that, because of this, except the application owner and NodeManager, no other user can access any of the local files/directories including those localized as part of the distributed cache. |
To build the LinuxContainerExecutor executable, run:
$ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/
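The executable must have specific permissions: 6050 or --Sr-s---, user-owned by root (super-user) and group-owned by a special group (e.g. hadoop) of which the NodeManager Unix user is a member and no ordinary application user is. If any application user belongs to this special group, security will be compromised. The name of this special group should be specified for the configuration property yarn.nodemanager.linux-container-executor.group in both conf/yarn-site.xml and conf/container-executor.cfg.
For example, suppose the NodeManager runs as user yarn, who is a member of the groups users and hadoop, either of which may be the primary group. Suppose also that users has both yarn and another user, alice (an application submitter), as members, and that alice does not belong to hadoop. Going by the description above, the setuid/setgid executable should be set to 6050 or --Sr-s--- with user-owner yarn and group-owner hadoop, which has yarn as a member (and not users, which has alice as a member besides yarn).
The LinuxTaskController requires that the paths including and leading up to the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs be set to 755 permissions, as described in the permissions table above.
- conf/container-executor.cfg
The executable requires a configuration file called container-executor.cfg to be present in the configuration directory passed to the mvn target mentioned above.
The configuration file must be owned by the user running the NodeManager (user yarn in the example above), group-owned by anyone, and should have the permissions 0400 or r--------.
The executable requires the following configuration items to be present in the conf/container-executor.cfg file. The items should be given as simple key=value pairs, one per line.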
Parameter | Value | Notes |
---|---|---|
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager. The group owner of the container-executor binary should be this group. Should be same as the value with which the NodeManager is configured. This configuration is required for validating the secure access of the container-executor binary. |
banned.users | hdfs,yarn,mapred,bin | Banned users. |
allowed.system.users | foo,bar | Allowed system users. |
min.user.id | 1000 | Prevent other super-users. |
To recap, here are the local filesystem permissions required for the various paths related to the LinuxContainerExecutor:
Filesystem | Path | User:Group | Permissions |
---|---|---|---|
local | container-executor | root:hadoop | --Sr-s--* |
local | conf/container-executor.cfg | root:hadoop | r-------* |
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x |
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x |
MapReduce JobHistory Server
Parameter | Value | Notes |
---|---|---|
mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020. |
mapreduce.jobhistory.keytab | /etc/security/keytab/jhs.service.keytab | Kerberos keytab file for the MapReduce JobHistory Server. |
mapreduce.jobhistory.principal | jhs/_HOST@REALM.TLD | Kerberos principal name for the MapReduce JobHistory Server. |
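The JobHistory Server rows above could be expressed in mapred-site.xml as in this sketch. The host name jhs-host.example.com is a hypothetical placeholder; 10020 is the default port named in the table.

```xml
<configuration>
  <!-- Hypothetical host; 10020 is the default port -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>jhs-host.example.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.keytab</name>
    <value>/etc/security/keytab/jhs.service.keytab</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.principal</name>
    <value>jhs/_HOST@REALM.TLD</value>
  </property>
</configuration>
```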
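Multihoming
Multihomed setups, where each host has multiple hostnames in DNS (e.g. different hostnames corresponding to public and private network interfaces), may require additional configuration to make Kerberos authentication work. See HDFS Support for Multihomed Networks.
References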
- O’Malley O et al. Hadoop Security Design
- O’Malley O, Hadoop Security Architecture
- Troubleshooting Kerberos on Java 7
- Troubleshooting Kerberos on Java 8
- Java 7 Kerberos Requirements
- Java 8 Kerberos Requirements
- Loughran S., Hadoop and Kerberos: The Madness beyond the Gate
Permanent link to this article: http://www.linuxidc.com/Linux/2016-09/134948.htm