six demon bag

Wind, fire, all that kind of thing!

2016-04-12

Spark Workers Not Starting

If you're running Apache Spark in standalone mode and your workers fail to start, make sure the workers use the same SPARK_MASTER_IP value that was used when starting the master. You can see it in the top left corner of the master's web interface:

[Image: URL field in the Spark master's web interface]
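
If you'd rather check from a shell, you can also scrape that value out of the master's web interface. A quick sketch, assuming the default web UI port of 8080 and the spark0.example.org master from the logs below:

curl -s http://spark0.example.org:8080 | grep -o 'spark://[^<]*'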


Whatever form was used to launch the master, use the same form for the workers: an FQDN on the master means an FQDN on the workers, a bare hostname means a bare hostname, and an IP address means an IP address. If you didn't specify a value for SPARK_MASTER_IP when starting the master, it defaults to the output of `hostname`.
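
As a minimal sketch of a consistent setup (spark0.example.org stands in for your master's FQDN; paths are relative to the Spark installation directory):

# conf/spark-env.sh on the master: pin the name the master binds to
export SPARK_MASTER_IP=spark0.example.org

# on the master
sbin/start-master.sh

# on each worker: the master URL must use the same form the master
# was started with -- an FQDN here, so the workers use the FQDN too
sbin/start-slave.sh spark://spark0.example.org:7077

Starting a worker with spark://spark0:7077 instead would produce exactly the retry-and-die loop shown below.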

Failing to use the correct name/address will result in the following (not entirely helpful) error:

16/02/16 17:28:33 INFO Worker: Connecting to master spark0:7077...
16/02/16 17:28:33 INFO Worker: Connecting to master spark1:7077...
16/02/16 17:28:33 INFO Worker: Connecting to master spark2:7077...
16/02/16 17:28:39 INFO Worker: Retrying connection to master (attempt # 1)
16/02/16 17:28:39 INFO Worker: Connecting to master spark0:7077...
16/02/16 17:28:39 INFO Worker: Connecting to master spark1:7077...
16/02/16 17:28:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[sparkWorker-akka.actor.default-dispatcher-6,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@6352ae22 rejected from java.util.concurrent.ThreadPoolExecutor@49266d1f[Running, pool size = 3, active threads = 2, queued tasks = 0, completed tasks = 2]
  at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
  ...
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
16/02/16 17:28:39 INFO ShutdownHookManager: Shutdown hook called

A successful start, by contrast, looks like this:

16/02/16 18:06:19 INFO Worker: Connecting to master spark0.example.org:7077...
16/02/16 18:06:19 INFO Worker: Connecting to master spark1.example.org:7077...
16/02/16 18:06:19 INFO Worker: Connecting to master spark2.example.org:7077...
16/02/16 18:06:20 INFO Worker: Successfully registered with master spark://spark0.example.org:7077
16/02/16 18:06:20 INFO Worker: Worker cleanup enabled; old application directories will be deleted in: /opt/spark-1.5.2-bin-hadoop2.6/work

Posted 21:37