Speakerline
Speakers
Proposals
Events
Tags
Edit a proposal
Adam Cuppy
Ahmed Omran
Alan Ridlehoover
Amit Zur
Andrew Mason
Andrew Nesbitt
Andy Andrea
Andy Croll
Asia Hoe
Avdi Grimm
Ben Greenberg
Bhavani Ravi
Brandon Carlson
Brittany Martin
Caleb Thompson
Caren Chang
Chiu-Ki Chan
Christine Seeman
Cody Norman
Devon Estes
Eileen Uchitelle
Emily Giurleo
Emily Samp
Enrico Grillo
Espartaco Palma
Fito von Zastrow
Frances Coronel
Hilary Stohs-Krause
Jalem Raj Rohit
Jemma Issroff
Jenny Shih
Joel Chippindale
Justin Searls
Katrina Owen
Kevin Murphy
Kudakwashe Paradzayi
Kylie Stradley
Maeve Revels
Maryann Bell
Matt Bee
Mayra Lucia Navarro
Molly Struve
Nadia Odunayo
Nickolas Means
Noah Gibbs
Olivier Lacan
Ramón Huidobro
Richard Schneeman
Rizky Ariestiyansyah
Saron Yitbarek
Sean Moran-Richards
Shem Magnezi
Srushith Repakula
Stefanni Brasil
Stephanie Minn
Sweta Sanghavi
Syed Faraaz Ahmad
Tekin Suleyman
Thomas Carr
Tom Stuart
Ufuk Kayserilioglu
Valentino Stoll
Victoria Gonda
Vladimir Dementyev
Title
Tags (comma-separated, max 3)
Body
Description: My team has just finished building a scalable, resilient, serverless distributed data pipeline which scales seamlessly with the amount of data it takes in as input. We have used several tools like Ansible, Lambda, Terraform, etc. And, also learned a lot of lessons along the way, in the form of pitfalls, failures, and wins. This talk is about that system and the lessons learned. Session type: Session Topics: Serverless Abstract: We wanted to build a serverless data pipeline for coding medical charts using NLP. However, we didn’t want it to be real-time (which is most serverless systems). So, we used a queue and a monitoring system’s (AWS CloudWatch) alarms to pull off the serverless, batch processing pipeline. Next, we wanted to make it a serverless, batch, distributed pipeline. So, we made use of Ansible and made the Master-Workers architecture. However, AWS Lambda has a time-limit of 5 minutes. But our entire NLP pipeline flow takes 30 minutes to complete. So, we stumbled upon an idea wherein we create a Master server via Lambda and run Ansible in nohup mode. And then, we learned some very important lessons while doing nohup monitoring. Now, we realized that Ansible can terminate the workers once the tasks are completed, but we want to delete the master also. So, we again built the Ansible playbook such that the Master kills itself once the workers are terminated. Also, we built a serverless API for querying the results of the data-pipeline, using AWS Lambda and API Gateway. And, all this have to be built keeping in mind the HIPAA compliance, which means that the data needs to be encrypted both at rest, and in motion. So, along the way of building this complete architecture, we experienced a lot of gotcha moments, failures, huge wins and pitfalls which taught us very important lessons. This talk pinpoints those lessons, and what we learned out of them, and what the audience can learn out of them.
Back to Speaker Directory