We're seeking a future team member for the role of Data Pipeline Engineer to join our Data Innovation team. In this role, you will design and build the data pipelines that power our Investment Data Standard (IDS) and knowledge graph, enabling a unified, high-quality data ecosystem that supports analytics, AI, and client-facing solutions. You will partner closely with the Ontology/Knowledge Architecture lead and collaborate across platform, product, and data teams to deliver scalable, production-ready solutions aligned to our broader data transformation strategy.

This role is located in Pittsburgh, PA, or Lake Mary, FL.

In this role, you'll make an impact in the following ways:
- Design and build scalable pipelines to ingest and process data from internal platforms and external vendors across batch, streaming, and near real-time patterns.
- Transform diverse data formats (APIs, flat files, streaming, unstructured) into clean, standardized time-series and event-driven datasets aligned to IDS entity models.
- Develop reusable frameworks to normalize identifiers, symbology, units, hierarchies, and event data (e.g., corporate actions, transactions).
- Partner with the Ontology/Knowledge Architecture team to map source data to canonical entities, relationships, and attributes, enabling graph ingestion and entity resolution.
- Implement robust data quality controls (completeness, accuracy, consistency, schema drift, anomaly detection) with full lineage, provenance, and traceability from source to IDS product.
- Enable multi-vendor data ingestion, comparison, and reconciliation, including source prioritization, hierarchy logic, and coverage/quality analytics.
- Build modular, reusable, cloud-native pipelines optimized for scale, performance, and cost (e.g., Snowflake), with monitoring and SLA-driven reliability.
- Collaborate cross-functionally to translate business and data requirements into production-ready pipelines and support downstream distribution via APIs, data products, and client platforms.
To be successful in this role, we're seeking the following:
- Bachelor's degree in a related discipline or equivalent work experience required; an advanced degree, ideally in statistics or statistical analysis, is preferred.
- At least six years of total work experience preferred, including at least three years with a strong focus on data analysis and business intelligence.
- Extensive experience in data engineering, building and scaling production-grade data pipelines.
- Deep hands-on expertise in Python, Spark, and SQL, with strong experience in ETL/ELT frameworks and orchestration tools.
- Proven ability to design and operate high-volume, resilient pipelines across batch, streaming, and distributed environments.
- Strong understanding of structured and semi-structured data modeling, including time-series and event-driven architectures.
- Experience designing data transformation and normalization layers, including schema evolution and backward compatibility.
- Expertise with modern data platforms (e.g., Snowflake, AWS, Databricks), lakehouse architectures, and API-based data integration.
- Strong capabilities in performance tuning, cost optimization, and implementing data quality, monitoring, logging, and lineage frameworks.
- Domain experience with financial datasets (market data, pricing, reference data, portfolio holdings, transactions, corporate actions) and familiarity with key vendors (e.g., Bloomberg, ICE, MSCI).
- Exposure to knowledge graph/ontology-driven systems, entity resolution workflows, AI/LLM-based integration of unstructured data (e.g., documents, PDFs), and data entitlements, licensing, and usage tracking is preferred.