
Highlights from Spark Summit 2017: New Features in Spark 2.2, Deep Learning Integration, and Structured Streaming

The article recaps Liulishuo engineers' experience at Spark Summit 2017, covering Spark 2.2's cost‑based optimizer, production‑ready Structured Streaming, deep‑learning support via UDFs, live demos recognizing James Bond, and insights from vendor booths and industry case studies.

Liulishuo Tech Team

Liulishuo’s data engineers attended Spark Summit 2017 in San Francisco. This recap shares their observations and the technical highlights of the conference.

Community: Spark’s community continues to grow, with 365,000 participants in Spark meetups worldwide. The conference emphasized Spark’s philosophy of being a unified engine for complete data applications.

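New in Spark 2.2: The release introduces a cost‑based optimizer for Spark SQL, which uses table and column statistics to choose better query plans and so improves query performance. Structured Streaming also becomes production‑ready, and the speakers presented it as a simpler way to build real‑time analytics than Kafka Streams.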
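
As a rough sketch of how the cost‑based optimizer is typically switched on: in Spark 2.2 it is disabled by default and depends on statistics collected beforehand. The tables `orders` and `customers` and their columns are hypothetical placeholders for illustration, not something shown at the summit.

```sql
-- Enable the cost-based optimizer (off by default in Spark 2.2).
SET spark.sql.cbo.enabled=true;
-- Optionally allow cost-based reordering of multi-way joins.
SET spark.sql.cbo.joinReorder.enabled=true;

-- Collect table-level statistics (row count, size in bytes).
-- `orders` and `customers` are hypothetical tables.
ANALYZE TABLE orders COMPUTE STATISTICS;
ANALYZE TABLE customers COMPUTE STATISTICS;

-- Collect column-level statistics for the join and filter columns.
ANALYZE TABLE orders COMPUTE STATISTICS FOR COLUMNS customer_id, amount;
ANALYZE TABLE customers COMPUTE STATISTICS FOR COLUMNS id, country;

-- A join like this one can then be planned using those statistics.
SELECT c.country, SUM(o.amount) AS revenue
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'US'
GROUP BY c.country;
```

The optimizer can only be as good as its statistics, so the `ANALYZE TABLE` statements would likely need to be re‑run after significant data changes.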

Databricks Open‑Source Efforts:

Deep Learning – Models can now be trained on the Spark cluster and then registered as UDFs that are invoked directly from SQL, enabling real‑time inference.

Streaming Performance – Structured Streaming reduces code complexity and boosts performance for streaming workloads.

Example deep‑learning pipeline: a trained model driven_by_007() is registered as a UDF and used in a SQL query to rank car images by the probability of containing James Bond.

-- driven_by_007() is the trained model registered as a UDF
SELECT image, driven_by_007(image) AS probability
FROM car_examples
ORDER BY probability DESC  -- highest probability first
LIMIT 5;

-- The AI engineer’s dream: a single SQL statement does the job! 😜

Structured Streaming: Now generally available, it substantially lowers the development effort for real‑time applications. The speakers presented it as simpler to use than Kafka Streams and showed performance benchmarks in which it was also faster.

The live demo extended the James Bond theme: images were streamed from Kafka, scored with the deep‑learning model, and the latest Bond appearance was located almost immediately.

The conference also featured a bustling exhibition area with major cloud providers (AWS, Azure) showcasing Spark integrations, and case studies from companies such as Airbnb, Uber, Netflix, Audi, and BMW.

In summary, Liulishuo’s fourth attendance at Spark Summit yielded valuable insights into Spark’s evolving ecosystem, and the team looks forward to the next summit while inviting readers to explore more Spark‑related articles.

Liulishuo is hiring backend engineers, big‑data engineers, data analysts, and algorithm engineers; interested candidates can contact [email protected].
