Highlights from Spark Summit 2017: New Features in Spark 2.2, Deep Learning Integration, and Structured Streaming
The article recaps Liulishuo engineers' experience at Spark Summit 2017, covering Spark 2.2's cost‑based optimizer, production‑ready Structured Streaming, deep‑learning support via UDFs, live demos recognizing James Bond, and insights from vendor booths and industry case studies.
Liulishuo’s data engineers attended Spark Summit 2017 in San Francisco, sharing observations and technical highlights from the conference.
Community: Spark’s community continues to grow, with 365K members in Spark meetups worldwide. The conference emphasized Spark’s philosophy as a unified engine for complete data applications.
New in Spark 2.2: The release introduces a cost‑based optimizer for Spark SQL, which uses table and column statistics to choose better query plans. Structured Streaming also becomes production‑ready, and the speakers presented it as a simpler way to build real‑time analytics than Kafka Streams.
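As a rough illustration of how the cost‑based optimizer is typically switched on, the sketch below enables it for a session and collects the statistics it relies on. The tables `orders` and `customers` and their columns are hypothetical and were not part of the talk. The configuration keys and `ANALYZE TABLE` statements are the ones Spark SQL 2.2 exposes, but treat the whole block as a minimal sketch rather than a tuned setup.

```sql
-- Assumption: `orders` and `customers` are hypothetical tables used only for illustration.
-- The cost-based optimizer is off by default in Spark 2.2, so enable it for the session.
SET spark.sql.cbo.enabled=true;
-- Optional: also let the optimizer reorder multi-way joins based on estimated cost.
SET spark.sql.cbo.joinReorder.enabled=true;

-- Collect table-level statistics (row count and size in bytes).
ANALYZE TABLE orders COMPUTE STATISTICS;
ANALYZE TABLE customers COMPUTE STATISTICS;

-- Collect column-level statistics for the columns used in joins and filters.
ANALYZE TABLE orders COMPUTE STATISTICS FOR COLUMNS customer_id, amount;
ANALYZE TABLE customers COMPUTE STATISTICS FOR COLUMNS id, country;

-- Inspect the plan the optimizer chose for a join query.
EXPLAIN
SELECT c.country, SUM(o.amount) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'CN'
GROUP BY c.country;
```

Without the `ANALYZE TABLE` steps the optimizer has no statistics to work with, so enabling the flag alone is unlikely to change the chosen plans.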
Databricks Open‑Source Efforts:
Deep Learning – Spark now allows training models on the cluster, registering them as UDFs that can be invoked directly from SQL, enabling real‑time inference.
Streaming Performance – Structured Streaming reduces code complexity and boosts performance for streaming workloads.
Example deep‑learning pipeline: a trained model driven_by_007() is registered as a UDF and used in a SQL query to rank car images by the probability of containing James Bond.
-- driven_by_007() is the trained model registered as a UDF
SELECT image, driven_by_007(image) AS probability
FROM car_examples
ORDER BY probability DESC
LIMIT 5;
-- The AI engineer’s dream: a single SQL statement does the job! 😜
Structured Streaming: Now generally available, it dramatically lowers the development effort for real‑time applications. According to the performance benchmarks presented at the conference, it beats Kafka Streams in both simplicity and speed.
The live demo extended the James Bond theme: images were streamed from Kafka, scored with the deep‑learning model, and the latest Bond appearance was located almost instantly.
The conference also featured a bustling exhibition area with major cloud providers (AWS, Azure) showcasing Spark integrations, and case studies from companies such as Airbnb, Uber, Netflix, Audi, and BMW.
In summary, Liulishuo’s fourth attendance at Spark Summit yielded valuable insights into Spark’s evolving ecosystem, and the team looks forward to the next summit while inviting readers to explore more Spark‑related articles.
Liulishuo is hiring backend engineers, big‑data engineers, data analysts, and algorithm engineers; interested candidates can contact [email protected].
Liulishuo Tech Team
Help everyone become a global citizen!