Bug #12593
Unable to resolve localhost in GATK4 log4j setup (Closed)
Description
I'm running into an issue with bcbio workflows that use GATK4: they fail during a localhost lookup while setting up logging. Here is an example run with a failed job:
https://cloud.curoverse.com/container_requests/qr1hi-xvhdp-r6aq106llhpnfg5#Log
It errors out with:
Using GATK jar /usr/local/share/bcbio-nextgen/anaconda/share/gatk4-4.0b6-0/gatk-package-4.beta.6-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xms500m -Xmx45864m -Djava.io.tmpdir=/var/spool/cwl/bcbiotx/tmpJVk4FE -jar /usr/local/share/bcbio-nextgen/anaconda/share/gatk4-4.0b6-0/gatk-package-4.beta.6-local.jar BaseRecalibratorSpark -I /keep/4261647a740c6eb75fea9494de0bf667+5868/NA12878-sort.bam --sparkMaster local[16] --output /var/spool/cwl/bcbiotx/tmpJVk4FE/NA12878-sort-recal.grp --reference /keep/38a3166acddf30ff581c249ece68e7f5+47411/collections/hg38/ucsc/hg38.2bit --conf spark.local.dir=/var/spool/cwl/bcbiotx/tmpJVk4FE --knownSites /keep/349e8c8ef6d90edc7a4a43153d160950+2339/dbsnp-147.vcf.gz -L /keep/0211312cfad709cd84f418f8749671a5+1388/bedprep/Exome-AZ_V2_pluschr20-hg38.bed --interval_set_rule INTERSECTION
ERROR Could not determine local host name
java.net.UnknownHostException: 8b29e178684a: 8b29e178684a: Temporary failure in name resolution
    at java.net.InetAddress.getLocalHost(InetAddress.java:1505)
    at org.apache.logging.log4j.core.util.NetUtils.getLocalHostname(NetUtils.java:53)
    at org.apache.logging.log4j.core.LoggerContext.setConfiguration(LoggerContext.java:486)
    at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:562)
    at org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:578)
    at org.apache.logging.log4j.core.LoggerContext.start(LoggerContext.java:214)
    at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:145)
    at org.apache.logging.log4j.core.impl.Log4jContextFactory.getContext(Log4jContextFactory.java:41)
    at org.apache.logging.log4j.LogManager.getContext(LogManager.java:182)
    at org.apache.logging.log4j.LogManager.getLogger(LogManager.java:455)
    at org.broadinstitute.hellbender.utils.Utils.<clinit>(Utils.java:72)
    at org.broadinstitute.hellbender.Main.<clinit>(Main.java:43)
Caused by: java.net.UnknownHostException: 8b29e178684a: Temporary failure in name resolution
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getLocalHost(InetAddress.java:1500)
Thanks for any tips or tricks to avoid the issue.
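The frame at the top of the trace is java.net.InetAddress.getLocalHost(), which reads the machine's host name (here 8b29e178684a) and then resolves it through the name service; the "Caused by" frames show that resolution step failing. A minimal Java check like the one below, run inside the same container, may help confirm whether the environment can resolve its own host name at all. It is an illustrative diagnostic, not part of GATK or log4j, and the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic (not GATK or log4j code): performs the same
// InetAddress.getLocalHost() call that fails at the top of the trace.
public class LocalHostCheck {

    // getLocalHost() gets the host name from the system and resolves it
    // through the configured name service; if resolution fails it throws
    // UnknownHostException. The outcome is returned as a string so the
    // check itself never crashes.
    static String describeLocalHost() {
        try {
            InetAddress addr = InetAddress.getLocalHost();
            return "resolved " + addr.getHostName() + " -> " + addr.getHostAddress();
        } catch (UnknownHostException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```

In a container where the lookup is broken, this should print a "failed:" line carrying the same message as the UnknownHostException in the trace.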
Updated by Nico César over 7 years ago
I see that the current version of log4j wraps that particular line in a try/catch (it looks like Android devices fail there too):
https://logging.apache.org/log4j/2.0/log4j-core/apidocs/src-html/org/apache/logging/log4j/core/LoggerContext.html#line.539
(with a different exception BTW: https://issues.apache.org/jira/browse/LOG4J2-719 )
However, the config variable seems to be named "hostName", and according to http://logging.apache.org/log4j/2.x/manual/lookups.html#Log4jConfigLookup it is retrieved as ${log4j:hostName}. I wonder whether appending -Dlog4j.hostName=localhost to the command line would make log4j use localhost as the host name.
Updated by Tom Clegg over 7 years ago
I suspect this is a side effect of disabling networking in the container environment, which Arvados does by default.
I tried re-running this job with networking enabled (by adding `"API": true` to `runtime_constraints`, which can be done via `arv:APIRequirement: {}` in CWL), and it seems to have progressed well past this failure point.
FWIW, a newer (current?) version of log4j seems to warn about this situation and continue, rather than crashing. https://logging.apache.org/log4j/2.x/log4j-core/apidocs/src-html/org/apache/logging/log4j/core/util/NetUtils.html
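The "warn and continue" behavior Tom describes is a common pattern: catch the lookup failure, log a warning, and substitute a placeholder name so startup can proceed. The Java sketch below illustrates that pattern under stated assumptions. It is not log4j's actual NetUtils implementation; the names HostnameFallback, HostLookup, hostnameOr and localHostnameOr are hypothetical, and "localhost" as the fallback is an arbitrary choice. The lookup is passed in as a lambda so the failure path can be exercised without a broken resolver.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of a "warn and continue" host-name lookup;
// not log4j's NetUtils code.
public class HostnameFallback {

    // A lookup that may fail the same way InetAddress.getLocalHost() does.
    @FunctionalInterface
    interface HostLookup {
        String lookup() throws UnknownHostException;
    }

    // Runs the lookup; on UnknownHostException prints a warning to stderr
    // and returns the fallback instead of letting the exception propagate.
    static String hostnameOr(HostLookup lookup, String fallback) {
        try {
            return lookup.lookup();
        } catch (UnknownHostException e) {
            System.err.println("WARN Could not determine local host name ("
                    + e.getMessage() + "), using " + fallback);
            return fallback;
        }
    }

    // Convenience wrapper around the real JDK lookup.
    static String localHostnameOr(String fallback) {
        return hostnameOr(() -> InetAddress.getLocalHost().getHostName(), fallback);
    }

    public static void main(String[] args) {
        System.out.println("host name for logging: " + localHostnameOr("localhost"));
    }
}
```

The trade-off of a fixed fallback is that logs from different containers may all carry the same placeholder host name, which is usually preferable to the process failing before it starts.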
Updated by Brad Chapman over 7 years ago
Nico and Tom -- thanks so much for digging into this. This is really helpful. log4j is bundled with the GATK jar, so I don't have an easy way to update it, but it's good to know that newer versions will handle this better once GATK picks them up.
Is it possible to enable networking from CWL right now? That seems like the fastest workaround if it's possible. If not, I could explore adjusting log4j.hostName on the GATK command line.
Updated by Brad Chapman over 7 years ago
Sorry, I shouldn't read this in a meeting: add `arv:APIRequirement` to the `requirements`. Got it. I'll give that a try and report back. Thanks again.
Updated by Brad Chapman over 7 years ago
- Status changed from New to Closed
Nico and Tom -- thanks again for the help. The `arv:APIRequirement` trick seems to have done it, and I can now get past the point where the job was failing. I've added this to bcbio's CWL generation and will keep working on getting the pipeline running.