Data Engineer | Tech Lead

Offer by Colaberry Data Analytics


About this job

Compensation: $120k - 140k
Location options: Paid relocation
Job type: Full-time
Experience level: Mid-Level, Senior, Lead
Role: Backend Developer
Industry: Data & Analytics, Data Science, Machine Learning
Company size: 51–200 people
Company type: Private



Technologies

java, pyspark, hadoop, apache-kafka, hive



Job description

This opportunity will allow you to work on an open analytics platform, leading a big data services initiative as a Data Engineer or Full-Stack Data Scientist.

30,000 ft Platform Overview: 
A cloud-based data platform built entirely from open-source components that gives users the ability to efficiently ingest, process, store, and access datasets without compromising ease of use, governance, or security. The platform was conceived as a simple tool for moving files that reside on local computer drives and file shares into a central repository. Besides a user-friendly file ingestion interface, the original tool also gathered metadata both through user input and through automatic parsing of files, and the uploaded content was immediately made available via an API. From those humble beginnings, the platform has grown into a full-blown, well-managed data lake and is continuously being enhanced with new features.
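
As a rough illustration of the metadata capture described above, here is a minimal Python sketch that builds an ingestion record for a single file. The record shape, field names, and the idea of combining parsed attributes with user-supplied metadata are illustrative assumptions, not the platform's actual schema or API.

import hashlib
import json
import mimetypes
import os
from datetime import datetime, timezone


def build_ingestion_record(path, user_metadata=None):
    """Collect basic file metadata at ingestion time.

    Combines automatically parsed attributes (size, checksum, MIME type)
    with optional user-supplied metadata, mirroring the two metadata
    sources described above. Field names are illustrative.
    """
    stat = os.stat(path)
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha256.update(chunk)

    return {
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "sha256": sha256.hexdigest(),
        "mime_type": mimetypes.guess_type(path)[0] or "application/octet-stream",
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "user_metadata": user_metadata or {},
    }


if __name__ == "__main__":
    # Example: build a record for a local file with user-supplied business context.
    print(json.dumps(build_ingestion_record(__file__, {"owner": "analytics"}), indent=2))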

The platform now provides batch, streaming, and API-based ingestion in addition to simple file ingestion. As data is ingested, metadata is collected, making datasets immediately searchable in other tools such as the enterprise metadata management system and the enterprise data catalog. The platform can be accessed via an API or SQL queries. Security on datasets is controlled through an existing entitlement workflow based on virtual directory services. Even though the system is relatively young, it is already being used by several predictive models that query data out of the platform through an access API. In addition, descriptive analytics have been enabled via ODBC/JDBC connectivity, allowing traditional BI tools to interact with the datasets directly and increasing the utility of the platform.
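
For a sense of the SQL-based access path mentioned above, the following is a minimal PySpark sketch, assuming a Spark session with Hive support. The database name data_lake, the table customer_events, and the columns are illustrative placeholders, not the platform's real catalog.

from pyspark.sql import SparkSession

# Minimal sketch of SQL-based access to a governed dataset.
spark = (
    SparkSession.builder
    .appName("platform-sql-access")
    .enableHiveSupport()
    .getOrCreate()
)

# Descriptive analytics directly over a platform dataset via Spark SQL.
daily_counts = spark.sql("""
    SELECT ingest_date, COUNT(*) AS record_count
    FROM data_lake.customer_events          -- illustrative database/table
    WHERE ingest_date >= date_sub(current_date(), 30)
    GROUP BY ingest_date
    ORDER BY ingest_date
""")

daily_counts.show()
spark.stop()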

Work Overview:
You will work as a Data Engineer / Tech Lead. Your focus will involve the Hadoop platform and related tools, along with leading teams to deliver complex products.

• Build new data pipelines, identify existing data gaps, and provide automated solutions to deliver analytical capabilities and enriched data to applications (see the batch-pipeline sketch after this list).
• Develop Hadoop applications to analyze massive data collections, build processing frameworks to detect conditions, and apply techniques geared toward supporting trend analysis and analytic decision-making.
• Design, build, and manage analytics infrastructure that can be utilized by data analysts, data scientists, and non-technical data consumers.
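
As referenced in the first bullet, here is a minimal PySpark batch-pipeline sketch, assuming Spark on HDFS with Hive support. The landing path, column names, and target table are hypothetical placeholders rather than the team's actual pipeline.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("enrich-raw-events")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a raw landing zone (schema inferred here for brevity); path is illustrative.
raw = spark.read.json("hdfs:///data/landing/events/")

# Basic cleansing and enrichment: drop malformed rows, derive a partition key.
enriched = (
    raw.dropna(subset=["event_id", "event_ts"])
       .withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Publish to a governed Hive table, partitioned for downstream consumers.
(
    enriched.write
    .mode("overwrite")
    .partitionBy("event_date")
    .format("parquet")
    .saveAsTable("data_lake.events_enriched")   # illustrative target table
)

spark.stop()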

Tech Overview:

- Programming: Java, Scala, Go, or Python [any]
- Hadoop [HDFS, Hive, MapReduce, Sqoop, Oozie, etc.]
- HBase/Phoenix, Spark/PySpark, Kafka
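
To show how a few of these pieces fit together, below is a minimal PySpark Structured Streaming sketch that ingests from Kafka and lands Parquet on HDFS, assuming the spark-sql-kafka connector is on the classpath. The broker address, topic, and paths are illustrative assumptions.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("kafka-stream-ingest")
    .getOrCreate()
)

# Subscribe to a Kafka topic; each record's value arrives as raw bytes.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # illustrative broker
    .option("subscribe", "ingest-events")                # illustrative topic
    .load()
)

# Keep the payload and source timestamp for downstream parsing.
events = stream.select(
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp").alias("ingested_at"),
)

# Land micro-batches as Parquet on HDFS with checkpointing for recovery.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/landing/ingest-events/")            # illustrative
    .option("checkpointLocation", "hdfs:///checkpoints/ingest-events/")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()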

Nice to Have:

- Containers: Kubernetes and Docker expertise
- Orchestration Processes | Aggregation Strategies | Microservice architecture
- Vault authorizations
- Experience with continuous integration and build tools [CI/CD]

*Thank you very much for taking the time to check this out*


