Design, develop, and maintain scalable and efficient data pipelines using Python, Pandas, Spark, and Google Dataflow.
Apply a strong background in relational databases such as SQL Server and MySQL, along with a deep understanding of Google Cloud Platform.
Assemble large, complex data sets that satisfy both functional and non-functional business requirements.
Collaborate with cross-functional teams to understand data requirements and implement solutions that meet business needs.
Optimize and troubleshoot data pipelines for performance, reliability, and scalability.
Ensure data integrity and quality throughout the pipeline by implementing validation and testing strategies.
Identify, design, and implement internal process improvements, such as redesigning infrastructure for greater scalability, improving data delivery, and automating manual processes.
Develop and maintain documentation for data pipelines, including design documents, code comments, and operational guides.
Provide technical guidance and mentorship to junior team members, promoting best practices and continuous learning.
Apply knowledge of data engineering, cloud computing, and analytics tools such as Power BI, Tableau, Data Studio, and Superset.
Work Experience & Qualifications
8 to 10 years of proven experience in design and development.
Excellent customer-facing and internal communication skills.
A bachelor’s degree in Computer Science, Information Science, or Data Science.