Flink 遇到的问题总结

QA

  • 相关日志

text

2023-04-10 15:16:05.056 [jobmanager-io-thread-10] WARN  org.apache.hadoop.ipc.Client  - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (token for hdfs: HDFS_DELEGATION_TOKEN owner=hdfs/com66, renewer=yarn, realUser=, issueDate=1680504045174, maxDate=1681108845174, sequenceNumber=21818, masterKeyId=58) can't be found in cache
2023-04-10 15:16:05.059 [Checkpoint Timer] INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Failed to trigger checkpoint for job 045e5c5ec32df9b607bd01acc9e2ea37 because An Exception occurred while triggering the checkpoint. IO-problem detected.
  • 解决方法

添加 flink 启动参数

text

-yD security.kerberos.login.use-ticket-cache=false
  • 资料

https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/security/security-kerberos/#hadoop-security-module

https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/security/security-kerberos/#kerberos-authentication-setup-and-configuration

https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#security-kerberos-login-contexts

  • 描述

我的场景是 hadoop 启用了 kerberos,zookeeper 未启用身份认证,Flink 启动参数添加了 -yD security.kerberos.login.use-ticket-cache=false 之后就会出现 AuthFailed

  • 解决方法

添加 flink 启动参数,这个配置项默认没有设置

text

-yD security.kerberos.login.contexts=Client
  • 资料

https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#security-kerberos-login-contexts

  • 描述

Kafka 客户端在 poll 时会申请 Direct Memory,此时如果kafka消息比较大的话容易产生 Direct Memory 溢出

  • 解决办法

添加 flink 启动参数,调整 task 的 off-heap,如果 Flink 不能提交则需要调大 taskmanager.memory.process.size

text

-Dtaskmanager.memory.task.off-heap.size=1g