DataOps Unleashed 2022 | Peer-to-Peer Virtual DataOps Conference

2022 Speakers

Andrei Lopatenko

VP of Engineering

Matt Turck

Managing Director

Kunal Agarwal

Co-Founder & CEO

Sarah Krasnik

Lead Data Engineer

Gokul Prabagaren

Software Engineering Manager

Ryan Kinney

Senior Data Engineer

Natalie Godec

Staff Cloud Engineer

Glenn Solomon

Managing Partner

Nikhil Gulati

Head of Applied ML & Engineering

Nnamdi Iregbulem

Partner

Aaron Richter

Data Engineer

Boris Jabes

CEO

Venky Ganesan

Partner

Disha Ahuja

Senior Manager, AI/ML/DS

Joseph Chmielewski

Senior Technical Engineer

Luis Carlos Cruz Huertas

Head of Technology Infrastructure & Automation

Srini Kadamati

Senior Data Scientist

Torsten Steinbach

Lead Architect

Abhishek Kashyap

Head of Spark on Google Cloud

Shivnath Babu

Co-Founder & CTO

David Wallace

Senior Staff Engineer

Angelo Carvalho

Principal Solutions Architect

Michael DePrizio

Principal Architect

Kevin Hu

Co-founder

Andrew Gelinas

Co-Founder

Prashant Dubey

Senior Manager Big Data Services

Karan Hiremath

Data Engineer

Christopher Bertrand

Data Scientist

Noah Carr

Partner

Nick Schrock

Founder & CEO

Donny Flynn

Customer Data Architect

Sandeep Uttamchandani

CPO

Abhi Vaidyanatha

Senior Developer Advocate

Jins Kadwood

CTO

Jordan Tigani

CPO

2022 On-Demand Sessions

Keynote panel conversation: the state of DataOps and the modern data stack

Kunal Agarwal, Co-Founder & CEO @ Unravel Data

Matt Turck, Managing Director @ FirstMark

Venky Ganesan, Partner @ Menlo Ventures

Glenn Solomon, Managing Partner @ GGV Capital

Every company is a data company. Data pipelines and AI/ML models are creating strategic business advantages, and companies depend on them more than ever before. While the advantages of becoming data-driven are clear, it’s often hard to grasp what’s changing, why it’s changing, and what it means for consumers of this technology.

This roundtable is made up of investors who have a unique vantage point on what’s changing and emerging in the modern data world, the effects of these changes, and the opportunities being created. This discussion will help set the context for the 30+ talks to follow for the remainder of the day.

Morning keynote: Operational analytics loop: making the virtuous cycle of data a reality with the modern data stack

Boris Jabes, CEO @ Census

Software practices are eating the business, and data is no exception. The rise of DataOps has pushed companies toward more repeatability, flexibility, and speed in data operations and processes. However, despite its best efforts, DataOps has largely struggled to close the gap between the data stakeholders who use data and the data teams that leverage it.

In this talk, Boris Jabes, CEO of reverse ETL and operational analytics pioneer Census, will break down how DataOps teams can finally bridge this gap with operational analytics to achieve the gold standard of the DevOps principles they've adopted: the virtuous cycle of data.

How the team at Slack streamlined its DataOps stack with Slack

Ryan Kinney, Senior Data Engineer @ Slack

Many teams already use Slack for data pipeline monitoring and alerts, but data engineers at Slack have streamlined their kit with Slack.

Ryan Kinney joins us to share how he and his team have integrated their DataOps stack to not only collaborate, but to observe their data pipelines and orchestration, enforce data quality and governance, manage their CloudOps, and unlock the entire data science and analytics platform for their customers and stakeholders.

A farewell to broken data pipelines and delayed releases

Joseph Chmielewski, Senior Technical Engineer @ MANTA

Achieving data pipeline observability in complex data environments is becoming more and more challenging and is forcing businesses to allocate extra resources and spend hundreds of person-hours carrying out manual impact analyses before making any changes.

Challenge this status quo and learn how to enable DataOps in your environment, ensure automated monitoring and testing, and make sure that your teams are not wasting their precious time on tedious manual tasks.

This session will help you:

Uncover your data blind spots
Define validation and reconciliation rules across your on-premises and cloud platforms
Deliver pipeline visibility to all data users so they know exactly which areas demand immediate attention
Monitor data conditioning over time to ensure data accuracy and trustworthiness
Carry out automated impact analyses to prevent data incidents and accelerate application migration

Building a multi-cloud data platform in a tightly regulated industry

Natalie Godec, Staff Cloud Engineer @ Babylon

Building a data platform in the healthcare industry is complicated when you take into account all of the privacy aspects of health data.

Babylon's Natalie Godec joins us to show how they enabled innovative, AI-driven products while dealing with highly sensitive data and how they chose and integrated data tools together to build their multi-cloud platform.

Natalie's session will also cover Babylon's approach to legal, regulatory, and compliance requirements, which security best practices they chose to build into their platform, how they leverage their modern tech to achieve their goals, and how they created processes that helped developers instead of restricting them.

A cloud native data lakehouse is only possible with open tech

Torsten Steinbach, Lead Architect @ IBM

Walk through how Torsten and his team at IBM foster and incorporate different open tech into a state-of-the-art data lakehouse platform. We'll look at real-world examples of how open tech is the critical factor that makes successful lakehouses possible.

Torsten's session will include insight on table formats for consistency, metastores and catalogs for usability, encryption for data protection, data skipping indexes for performance, and data pipeline frameworks for operationalization.

Building a scalable data platform with Google Cloud

Abhishek Kashyap, Head of Spark on Google Cloud @ Google

Learn how data teams are building a secure and scalable data platform with Google Cloud to support their data analytics and machine learning needs. We will give a brief overview of customer use cases and the Google Cloud products that power them.

Panel Conversation: Building vs. buying when it comes to your data stack

Nnamdi Iregbulem, Partner @ Lightspeed

Andrei Lopatenko, VP of Engineering @ Zillow

Aaron Richter, Data Engineer @ Squarespace

Gokul Prabagaren, Software Engineering Manager @ Capital One

Join Nnamdi, Andrei, Aaron, and Gokul as they discuss their decision-making methods and strategies for evaluating whether to build or buy components for their data stacks.

Panel Conversation: Strategic data team composition

Srini Kadamati, Senior Data Scientist @ Preset

Sarah Krasnik, Lead Data Engineer @ PerPay

Nikhil Gulati, Head of Applied ML & Engineering @ Baker Hughes

Disha Ahuja, Senior Manager, AI/ML/DS @ Cisco

This panel will explore the strategic thinking around the enablement and composition of data teams in terms of their capabilities to meet objectives for the organization and the specific needs of internal clients.

Each day, data leaders weigh how to distribute essential roles and responsibilities such as privacy and data ownership, tool evaluation at macro and micro levels, internal and external team communication and collaboration, team size and structure, and more.

Many stacks also include data products enabled by MSPs and third parties, which bring in "extended resources", often in the form of support, that also affect team decisions and composition.

What's beyond observability and how to get there

Luis Carlos Cruz Huertas, Head of Technology Infrastructure & Automation @ DBS

Shivnath Babu, Co-Founder & CTO @ Unravel

Sandeep Uttamchandani, CPO @ Unravel

Businesses are struggling to realize the full value of their data in timely and cost-effective ways. Unprecedented challenges are being posed by the expanding data landscape, scarcity of skilled data professionals, and migration to cloud-native architectures. Observability solutions hold the promise of helping data professionals tackle these challenges and improve the business agility of data initiatives.

We start by taking a deep look into the challenges where observability helps or is supposed to. By staying focused on customer pain points, we will attempt to separate reality from aspiration. What emerges are some clear directions for what lies beyond observability, as well as a path to get there. If you’re building or running a modern data stack, then this talk will help you answer questions like:

Which of my team’s pain points can today’s observability solutions truly solve, and how best to roll them out?
What might the future hold for observability in an AI-infused world?
How do I apply observability to develop world-class DataOps practices within my organization?

How AgriDigital Repurposed ETL for Caching: Insights into Data Warehouse Performance Tuning

Jins Kadwood, CTO @ AgriDigital

Abhi Vaidyanatha, Senior Developer Advocate @ Airbyte

Abhi and Jins are joining us for a sharp discussion on how ETL can be incorporated into your caching strategy, the pros and cons of various types of microservice data access, and architectural tips for data warehousing at scale.

When you're responsible for transacting over 50 million metric tons of agricultural volume, efficient solutions for optimizing your data warehouse performance are critical to ensuring that global supply chains stay afloat. Find out how AgriDigital leveraged Airbyte to improve microservice data access in this fireside chat!

The centrality of orchestration in modern data platforms

David Wallace, Senior Staff Engineer @ Dutchie

Nick Schrock, Founder & CEO @ Elementl

The modern data stack is a big wave in data infrastructure, promising simplicity and productivity in the cloud era.

However, the reality is more challenging when assembling that stack into a true data platform.

David and Nick will discuss the centrality of data orchestration when it comes to building and running modern data platforms.

Modernizing big data workloads with Amazon EMR

Angelo Carvalho, Principal Solutions Architect @ AWS

In this session, Angelo will give an overview of how customers are modernizing their big data workloads with EMR, and show examples of how cloud journeys are being modernized.

Angelo will walk through the EMR Migration Program (EMP), which helps customers in their migration journey. Session attendees will walk away with a good understanding of several best practices and new EMR features that enable you to cut operating costs and create efficiencies when processing vast amounts of data using Amazon EMR.

How Akamai handled 13x data growth and moved from batch to near-real-time visibility and analytics

Michael DePrizio, Principal Architect @ Akamai

Jordan Tigani, Chief Product Officer @ SingleStore

Michael and Jordan join us for a conversation about optimizing for modern database performance and how Akamai was able to tackle 13x data growth while moving from batch to near-real-time visibility and analytics.

We'll hear insights and best practices on achieving large performance increases, from 70k upserts/second to 12M upserts/second, and on slashing cycle time for critical processes to gain a window into business operations.

The origins, purpose, and practice of data observability

Kevin Hu, Co-Founder @ Metaplane

Data Observability (DO) is an emerging category that proposes to help organizations identify, resolve, and prevent data quality issues by continuously monitoring the state of their data over time. This talk is a deep dive into DO, starting from its origins (why it matters), defining the scope and components of DO (what it is), and finally closing with actionable advice for putting observability into practice (how to do it).

We’ll rigorously define data observability to understand why it is different from software observability and existing data quality monitoring. We will derive the four pillars of DO (metrics, metadata, lineage, and logs), then describe how these pillars can be tied to common use cases encountered by teams using popular data architectures, especially on cloud data stacks.

Finally, we’ll close with pointers for how to put observability into practice, drawing from our experience helping teams across sizes, from fast-growing startups to large enterprises, successfully implement DO. Successfully implementing observability throughout an organization involves not only using the right technology, whether that be a commercial solution, an in-house initiative, or an open source project, but implementing the correct processes with the right people responsible for specific jobs. Talk participants can expect to leave with new concepts to understand how DO can help their organizations and ideas for how to implement DO.

Building a resilient system: How Wistia ensured interoperability and observability in their data pipelines

Christopher Bertrand, Data Scientist @ Wistia

Donny Flynn, Customer Data Architect @ Census

Data teams spend what seems like countless hours playing data detective to find out where their pipelines fail. You probably, unfortunately, know the feeling.

In this talk, Donny Flynn, customer data architect at reverse ETL pioneer Census, joins Chris Bertrand, data scientist at Wistia, to break down the importance of interoperability and observability in building resilient data systems, and how reverse ETL helps you shift from data detective to data hero.

Panel Conversation: Stack resiliency

Noah Carr, Partner @ Point72 Ventures

Karan Hiremath, Data Engineer @ EasyPost

Prashant Dubey, Senior Manager Big Data Services @ Johnson & Johnson

This peer-to-peer panel shares perspectives on shifting focus, once the stack is built, to keeping it from breaking. This conversation will be about more than just maximizing availability. Who's responsible? What's communicated and when? Are certain tools preferred to simplify this process?

Closing remarks

Andrew Gelinas, Co-Founder @ Solution Monday

What is the cost to attend and watch the virtual sessions?

Data Teams Summit is always free and open for all to attend.

When is Data Teams Summit 2024?

Data Teams Summit 2024 was held on January 24, 2024.

What is Data Teams Summit?

Data Teams Summit is an annual peer-to-peer day of empowerment for data teams that reflects our focus on the teams and individuals running, managing, and monitoring data pipelines.

Data Teams Summit is a full-day virtual conference led by real-world data practitioners and leaders at future-forward organizations, who share how they're establishing predictability, increasing reliability, and creating economic efficiencies with their data.

Who comes to the Data Teams Summit?

Data professionals and experts including data engineers, administrators, architects, analysts, AI/ML professionals, and relevant data technology leadership.