The Principal Data Management Architect is responsible for all aspects of data acquisition, data transformation, analytics scheduling, and operationalization to drive high-visibility, cross-division outcomes. This role investigates, evaluates, tests, and leads technical solutions for future systems and defines and leads the company's big data and information architecture needs. The Data Management Architect will partner with the data science leader to deliver data-driven solutions that enable operational efficiency and operational excellence and improve the product experience for customers.
Data Design Management
- Design and deliver a portfolio of product data sets from the definition phase through to production deployment.
- Design and architect distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real time.
- Architect solutions for the design and implementation of EMR clusters and big data infrastructure.
- Create data environments and/or data sets to serve a wide range of data users, including Data Scientists, Data Analysts, and Business Analysts.
- Perform offline analysis of large data sets using components of a big data software ecosystem.
- Evaluate big data technologies and prototype solutions to improve data processing architecture.
- Lead efforts to determine the root cause of complex data provenance and metadata issues and engineering questions, which may involve interfacing with technical staff across multiple organizations and with differing levels of expertise.
- Lead the process to investigate, evaluate, test, and recommend technical solutions for future systems.
- Define information models supporting product assets ranging from IoT devices to complex data structures represented in various data management systems, such as graph, relational, and hierarchical databases.
- Define and manage the master data management (MDM) strategy for cloud-based data sets.
- Guide other teams to design, develop, and deploy data sets and tools that support product use cases.
- Guide and support data compliance efforts to meet global data governance requirements.
- Manage the backlog for the portfolio of product data sets and support prioritization to ensure alignment with the MDM strategy.
Qualifications
- Knowledge of database concepts, object and data modeling techniques, and design principles
- Detailed knowledge of database architectures, software, and facilities
- Successful history of manipulating, processing, and extracting value from large disconnected data sets
- Experience with programming languages: Python (required), Scala, Ruby, R
- Experience with database technologies: SQL, performance tuning concepts, AWS RDS, Amazon Redshift, MySQL
- Experience with big data batch-processing tools: Hadoop MapReduce, Elasticsearch, Pig, Hive, Cascading/Scalding, Apache Spark, AWS EMR
- Experience with stream-processing systems: Amazon Kinesis, Apache Kafka, MQTT
- Experience with NoSQL databases, including DynamoDB
- Ability to write JSON, XML, YAML, and other data definition formats
- Master’s degree in data science, information management or other knowledge management field
- Bachelor’s degree in computer science, computer engineering, or a related field, or the equivalent combination of education and related experience
- 12 years of professional experience as a data software engineer
- 3 years of experience with cloud and big data computing design, provisioning, and tuning
- Excellent verbal and written communication skills, with the ability to collaborate effectively in a team environment, present and explain technical information, and provide advice to management