Senior Platform Data Engineer

<p style="text-align:left"><b>Location:</b></p>Work from home (Pennsylvania)<p style="text-align:inherit"></p><p style="text-align:left"><b>Shift:</b></p>Days (United States of America)<p style="text-align:inherit"></p><p style="text-align:left"><b>Scheduled Weekly Hours:</b></p>40<p style="text-align:inherit"></p><p style="text-align:left"><b>Worker Type:</b></p>Regular<p style="text-align:inherit"></p><p style="text-align:left"><b>Exemption Status:</b></p>Yes<p style="text-align:inherit"></p><p style="text-align:left"><b>Job Summary:</b></p>The Senior Platform Data Engineer owns the roadmap, priorities, platform standards, and architecture reviews, and provides formal input on performance reviews. This position makes clinical data ready for AI at scale by owning the shared data products, retrieval infrastructure, and platform administration that the entire AI portfolio depends on. Areas of ownership include real-time data feeds, reusable clinical data models and feature pipelines, RAG retrieval infrastructure (ingestion, chunking, embeddings, vector DB, retrieval pipelines), and 
Databricks platform administration.<p style="text-align:inherit"></p><p style="text-align:left"><b>Job Duties:</b></p><ul><li><p>Streams <span>data from Epic SDE, ADT feeds, lab results, and other clinical sources into Databricks for downstream model consumption.</span></p></li><li><p>Curates shared clinical feature tables (patient demographics, labs, vitals, diagnoses, utilization history, imaging metadata) in Databricks/Unity Catalog that multiple AI programs consume for model training, validation, and monitoring.</p></li><li><p>Owns the RAG infrastructure, the shared retrieval-augmented generation platform that agentic and generative AI programs use to ground LLM outputs in organizational knowledge.</p></li><li><p>Designs and operates document ingestion pipelines: normalizing clinical documents, policies, guidelines, and unstructured data sources into formats ready for embedding and retrieval.</p></li><li><p>Implements and optimizes chunking strategies tailored to healthcare content (e.g., preserving clinical note structure, section-aware chunking for guidelines and protocols).</p></li><li><p>Manages the embedding pipeline: selecting, tuning, and versioning embedding models (domain-specific clinical models where they outperform general-purpose models).</p></li><li><p>Administers the vector database: schema design, indexing, metadata management, access controls, and performance tuning.</p></li><li><p>Builds and maintains retrieval pipelines: hybrid search (vector + keyword/BM25), reranking, and relevance filtering to maximize retrieval precision for downstream agents and LLM applications.</p></li><li><p>Establishes data quality gates for RAG: automated profiling, completeness checks, and accuracy scoring before content enters the vector store.</p></li><li><p>Monitors retrieval quality metrics (Precision@K, Recall@K, MRR) and continuously optimizes retrieval performance.</p></li><li><p>Manages Databricks workspace configuration and Unity Catalog governance.</p></li><li><p>Administers cluster policies, compute management, and cost monitoring.</p></li><li><p>Manages users, groups, and access control.</p></li><li><p>Serves as administrator for the Feature Store.</p></li></ul><p></p><p>Work is typically performed in an office environment. The employee is accountable for satisfying all job-specific obligations and complying with all organization policies and procedures. The specific statements in this profile are not intended to be all-inclusive. They represent typical elements considered necessary to successfully perform the job.</p><p></p><p>*Relevant experience may be a combination of related work experience and degree obtained (Master's Degree = 2 years).</p><p style="text-align:inherit"></p><p style="text-align:left"><b>Position Details:</b></p><p><b>Key Technologies:</b></p><ul><li>Databricks (Delta Live Tables, Feature Store, PySpark, Unity Catalog)</li><li>Epic SDE / epic-ws for real-time clinical data extraction</li><li>Vector databases (Pinecone, Weaviate, Qdrant, or Databricks Vector Search)</li><li>Embedding models and pipelines (clinical domain-specific and general-purpose)</li><li>SQL, pandas</li><li>Streaming and batch ingestion patterns</li><li>CDIS Data Warehouse (source system for batch clinical data)</li></ul><p></p><p><b><span>Required Skills & Qualifications:</span></b></p><ul><li>5+ years in data engineering, with strong experience building both batch and streaming data pipelines</li><li>Expert-level Databricks skills: Delta Live Tables, PySpark, Unity Catalog, Feature Store</li><li>Hands-on experience with real-time data ingestion (Kafka, Spark Structured Streaming, or comparable frameworks)</li><li>Strong SQL and Python (pandas, PySpark) skills for data transformation and feature engineering</li><li>Experience administering Databricks workspaces: cluster policies, compute management, access controls, cost monitoring</li><li>Familiarity with clinical data models and healthcare data sources (EHR extracts, ADT feeds, lab results, claims data) strongly 
preferred</li><li>Experience with Epic data extraction methods (SDE, FHIR, epic-ws) a significant plus</li><li>Understanding of data governance principles: lineage, quality monitoring, access controls</li></ul><p style="text-align:inherit"></p><p style="text-align:left"><b>Education:</b></p>Bachelor's Degree-Related Field of Study (Required), Master's Degree-Related Field of Study (Preferred)<p style="text-align:inherit"></p><p style="text-align:left"><b>Experience:</b></p>Minimum of 5 years-Relevant experience* (Required)<p style="text-align:inherit"></p><p style="text-align:left"><b>Certification(s) and License(s):</b></p><p style="text-align:inherit"></p><p style="text-align:left"><b>Skills:</b></p><p></p><p><b>OUR PURPOSE & VALUES: </b><span>Everything we do is about caring for our patients, our members, our students, our Geisinger family and our communities. </span></p><ul><li><b>KINDNESS: </b><span>We strive to treat everyone as we would hope to be treated ourselves. </span></li><li><b>EXCELLENCE: </b><span>We treasure colleagues who humbly strive for excellence. </span></li><li><b>LEARNING: </b><span>We share our knowledge with the best and brightest to better prepare the caregivers for tomorrow. </span></li><li><b>INNOVATION</b><span>: We constantly seek new and better ways to care for our patients, our members, our community, and the nation.</span></li><li><span><b>SAFETY:</b> We provide a safe environment for our patients and members and the Geisinger family. </span></li></ul><p></p><p></p><p>We offer healthcare benefits for full time and part time positions from day one, including vision, dental and domestic partners. Perhaps just as important, we encourage an atmosphere of collaboration, cooperation and collegiality.</p><p></p><p><span>We know that a diverse workforce with unique experiences and backgrounds makes our team stronger. 
Our patients, members and community come from a wide variety of backgrounds, and it takes a diverse workforce to make better health easier for all. We are proud to be an affirmative action, equal opportunity employer, and all qualified applicants will receive consideration for employment regardless of race, color, religion, sex, sexual orientation, gender identity, national origin, disability or status as a protected veteran.</span></p>
