Title: Software Engineer – Data Platform
Location: San Francisco, CA
Salary: $80K – $100K*
Category: Enterprise Technology
At Databricks, we are obsessed with enabling data teams to solve the world’s toughest problems, from security threat detection to cancer drug development. We do this by building and running the world’s best data and AI infrastructure platform, so our customers can focus on the high-value challenges that are central to their own missions.
Founded in 2013 by the original creators of Apache Spark, Databricks has grown from a tiny corner office in Berkeley, California to a global organization with over 1,000 employees. Thousands of organizations, from startups to the Fortune 100, trust Databricks with their mission-critical workloads, making us one of the fastest-growing SaaS companies in the world.
Our engineering teams build highly technical products that fulfill real, important needs in the world. We constantly push the boundaries of data and AI technology, while simultaneously operating with the resilience, security and scale that is critical to making customers successful on our platform.
We develop and operate one of the largest-scale software platforms in the world. The fleet consists of millions of virtual machines that generate terabytes of logs and process exabytes of data per day. At this scale we regularly observe cloud hardware, network, and operating-system faults, and our software must gracefully shield our customers from all of them.
As a software engineer on the Data team, you will help build the data platform for Databricks. You will architect and run high-quality, large-scale, multi-geo data pipelines for analyzing product telemetry and logs, and use the results to drive business decisions. You will do this using Databricks itself – the Data team also functions as a large, production, in-house “customer” that dogfoods Databricks and drives the future direction of the products.
As a software engineer, you will:
Design and implement reliable data pipelines using Spark and Delta.
Establish conventions and create new APIs for telemetry, debug and audit logging data, and evolve them as the product and underlying services change.
Define clear SLAs for each production data pipeline.
Develop best practices and frameworks for unit, functional and integration tests around data pipelines, and drive the team towards increased overall test coverage.
Design CI and deployment processes and best practices for the production data pipelines.
Design schemas for financial, sales and support data in the data warehouse.
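To make the first responsibility above concrete, here is a minimal sketch of the kind of telemetry rollup such a pipeline materializes. The record fields, service names, and aggregation are hypothetical, and the logic is written in plain Python purely to show the shape of the transformation; in production this would be expressed as a Spark job reading from and writing to Delta tables.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw telemetry records, standing in for rows a pipeline
# would read from a Delta table of ingested service logs.
raw_events = [
    {"service": "cluster-manager", "ts": "2021-06-01T12:00:03Z", "status": "ok"},
    {"service": "cluster-manager", "ts": "2021-06-01T12:00:07Z", "status": "error"},
    {"service": "notebook",        "ts": "2021-06-01T12:01:41Z", "status": "ok"},
]

def daily_error_counts(events):
    """Count errors per (service, day) -- the kind of rollup a
    telemetry pipeline materializes for dashboards and alerting."""
    counts = defaultdict(int)
    for e in events:
        day = datetime.strptime(e["ts"], "%Y-%m-%dT%H:%M:%SZ").date()
        if e["status"] == "error":
            counts[(e["service"], day.isoformat())] += 1
    return dict(counts)

print(daily_error_counts(raw_events))
# → {('cluster-manager', '2021-06-01'): 1}
```

In an actual pipeline this would be a groupBy/count over a Delta table rather than a Python loop, but the grouping keys and the error filter are the same idea.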
What we look for:
BS/MS/PhD in Computer Science or a related field.
Experience building, shipping and operating multi-geo data pipelines at scale.
Experience working with and operating workflow orchestration frameworks, including open-source tools like Airflow and Luigi or commercial enterprise tools.
Experience with large-scale messaging systems such as Kafka or RabbitMQ, or commercial equivalents.
Excellent written, verbal, and presentation communication skills; a consensus builder.
Strong analytical and problem-solving skills.
Passion for data engineering and for enabling others by making their data easier to access.
Benefits:
Medical, dental, and vision insurance
401(k) retirement plan
Unlimited paid time off
Catered lunch (every day), snacks, and drinks
Employee referral bonus program
Maternity and paternity plans