Distributed Spark Environments
• Data scientists want flexibility:
– New tools, latest versions of Spark, Kafka, H2O, etc.
– Multiple notebook and IDE options, e.g. Zeppelin, RStudio, JupyterHub
– Fast, iterative prototyping
• IT wants control:
– Multi-tenancy
– Data security
– Network isolation