Apache Spark is an open-source unified analytics engine for large-scale data processing that has taken the data industry by storm since its 1.0 release in 2014.

With a clean, integrated interface for programming entire clusters, Spark offers implicit data parallelism and fault tolerance, making it a great platform, whether open-source or proprietary, for open-ended big data processing. Because its capabilities and programmability are so broad, Spark can be intimidating for SQL pros who have not yet entered the world of Big Data. However, many of the constructs and concepts that apply to an RDBMS also apply to Spark, and they can accelerate its understanding and adoption!


Warner Chaves

Warner is a SQL Server MCM, Data Platform MVP, and Principal Consultant at Pythian, a global Canada-based company specializing in DBA services. A brief stint in .NET programming led to his early DBA training, working for enterprise customers in Hewlett-Packard's ITO organization. From there he moved to his current position at Pythian, where he builds and manages data solutions across many industry verticals while leading a highly talented team of multi-platform consultants and engineers.