Data Engineer, Media Screen Scraping at Fusemachines

Closed job - No longer receiving applicants

Fusemachines is a leading AI strategy, talent, and education services provider. Founded by Sameer Maskey Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in 4 countries (Nepal, the United States, Canada, and the Dominican Republic) and more than 250 full-time employees, Fusemachines seeks to bring its global expertise in AI to transform companies around the world.

Job functions

Overview

An existing Python-based screen-scraping application connects to a Media Monitors website and, for a particular radio station, can collect either "song" or "panelist" information for a given period of time. A "station week" consists of a station and both song and panelist information for a week with a start and end date.

This application is to be extended to be able to systematically backfill this data when executed.
A database table in Postgres is to be created which will keep track of the station, start date, end date and type (song or panelist) when the data was collected.
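
As a rough illustration of the tracking table described above, the following sketch creates one possible schema and logs a completed collection. The table name collection_log, its column names, the collected_at timestamp column, and the helper functions are all hypothetical and do not come from the existing application. To stay runnable without a database server, the sketch uses Python's built-in sqlite3 module as a stand-in; the SQL itself should also be valid for Postgres, where a driver such as psycopg would use %s placeholders instead of ?.

```python
import sqlite3
from datetime import date

# Hypothetical schema: the posting names the fields but not the table layout.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS collection_log (
    station      TEXT NOT NULL,
    start_date   DATE NOT NULL,
    end_date     DATE NOT NULL,
    data_type    TEXT NOT NULL CHECK (data_type IN ('song', 'panelist')),
    collected_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (station, start_date, data_type)
)
"""


def record_collection(conn, station, start, end, data_type):
    """Log one completed collection; logging the same key twice is a no-op."""
    conn.execute(
        "INSERT INTO collection_log (station, start_date, end_date, data_type) "
        "VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING",
        (station, start.isoformat(), end.isoformat(), data_type),
    )
    conn.commit()


def completed_keys(conn):
    """Return the set of (station, start_date, data_type) already collected."""
    rows = conn.execute(
        "SELECT station, start_date, data_type FROM collection_log"
    )
    return {(s, date.fromisoformat(d), t) for s, d, t in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
    conn.execute(CREATE_TABLE_SQL)
    record_collection(conn, "WXYZ", date(2023, 1, 2), date(2023, 1, 8), "song")
    print(completed_keys(conn))
```

Keying the table on station, start date, and type is one way to make a restarted run safe: a week that was already logged cannot be logged twice.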

When the application is started, it will begin backfilling from the oldest required week, obtaining both song and panelist information for all stations before moving on to the next week.

The application will have a maximum number of "station weeks" it will collect before exiting. When the application is started again, it will read the Postgres table and resume collecting where it left off. The maximum number of station weeks per execution is a means to avoid overloading the server.
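
The backfill order and the per-run cap described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 7-day week (end date = start date + 6 days), and the shape of the done set (keys of station, week start, and type, as would be read from the Postgres table) are hypothetical and not taken from the existing application. The scraping itself is left out; the sketch only decides what a single execution would collect.

```python
from datetime import date, timedelta

DATA_TYPES = ("song", "panelist")


def pending_items(stations, oldest_week_start, last_week_start, done):
    """Yield (station, week_start, week_end, data_type) items not yet
    collected, oldest week first and all stations within each week."""
    week_start = oldest_week_start
    while week_start <= last_week_start:
        week_end = week_start + timedelta(days=6)  # assumed 7-day week
        for station in stations:
            for data_type in DATA_TYPES:
                if (station, week_start, data_type) not in done:
                    yield station, week_start, week_end, data_type
        week_start += timedelta(days=7)


def plan_run(stations, oldest_week_start, last_week_start, done,
             max_station_weeks):
    """Return the work for one execution, capped at max_station_weeks
    distinct (station, week) pairs."""
    plan = []
    station_weeks = set()
    for item in pending_items(stations, oldest_week_start,
                              last_week_start, done):
        key = (item[0], item[1])
        # Stop before starting a station week that would exceed the cap.
        if key not in station_weeks and len(station_weeks) == max_station_weeks:
            break
        station_weeks.add(key)
        plan.append(item)
    return plan


if __name__ == "__main__":
    # Example: one item was already collected in an earlier run.
    done = {("A", date(2023, 1, 2), "song")}
    for item in plan_run(["A", "B"], date(2023, 1, 2), date(2023, 1, 9),
                         done, max_station_weeks=2):
        print(item)
```

Counting distinct station weeks rather than individual requests keeps the cap aligned with the definition of a station week as both song and panelist data for one station and one week.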

This project will not involve Terraform or DevOps work. It is sufficient for the developer to demonstrate the code working locally.

It is possible that, while the work outlined here is in progress, changes to the external website being "scraped" may require changes in the scraping code as part of this deliverable.

Qualifications and requirements

Minimum of 3 years of experience
Strong written and verbal communication skills, along with proven fluency in English
Python
Selenium
PostgreSQL

Likely Duration
3 months, dependent on the stability of the website being scraped.

Conditions

Fully remote: You can work from anywhere in the world.
Pet-friendly: Pets are welcome at the premises.
Informal dress code: No dress code is enforced.

Remote work policy

Fully remote

Candidates can reside anywhere in the world.
