Data Engineer

Offer by GeoPhy



About this job

Location options: Visa sponsor
Job type: Full-time
Experience level: Mid-Level, Senior
Industry: Financial Technology, Real Estate
Company size: 51-200 people
Company type: VC Funded


python, sql

Job description

The data pipeline at GeoPhy starts with you: you will gather the necessary information about real estate objects, real estate investment trusts, and environmental factors. You will scrape, extract, clean and structure raw data so that it can be uploaded to our semantic datastore. You will also look into automating any manual or semi-manual work that is currently being done, reusing artefacts from our Data Science and Data Extraction teams, in order to build a sustainable data pipeline for the Portfolio Data.

What you’ll be responsible for

  • Gathering, cleaning and structuring data
  • Scraping websites to extract data using Python
  • Applying PDF extraction / text recognition artefacts from other teams to automate your work
  • Identifying automatable components across the existing data pipeline
  • Collaborating with colleagues across multiple locations to resolve requirements for the data to be gathered and processed
  • Contributing to a great work atmosphere in the office

What we’re looking for

  • Demonstrable experience in Python
  • Experience with metadata-driven solutions
  • Experience in working with large datasets
  • Understanding of Data Quality concepts
  • A self-starting attitude
  • A willingness to learn about new products, tools, and techniques related to data engineering
  • A problem-solving mentality that draws on internal and/or external resources
  • Strong commitment to teamwork
  • Working proficiency in English paired with an international mindset

Bonus points for

  • Experience with or knowledge of scraping
  • Background in mathematics or IT/coding
  • Specific experience with SQL
  • Specific experience with Semantic Databases and RDF
  • Experience with or knowledge of ELT/ETL tools such as Matillion, Teradata, Informatica PowerCenter or Talend
  • Experience working in an Agile/Scrum environment
  • Some understanding of the real estate industry/market
