Web Data Mining Specialist
Looking for an opportunity that stands out? We are a Vancouver-based, Silicon Valley-funded startup looking for candidates who want to be an integral part of the company they work for. If you want to be more than just a number and to work in an environment where your contribution makes an impact, then we are the perfect fit for you!
With all the personal identity information out there on the Internet, why is it so hard to find? One reason is that the sheer volume of available data makes it very difficult to extract useful information. We are looking for candidates with the skills and experience to take on this challenge and produce great results.
The primary mission of this role is to design and build distributed, optimized web crawlers. You will also develop and execute data mining solutions to extract personal identity information from the collected data.
Work at an industry pioneer with the best engineers and data scientists in the business to build the world’s largest database of user identity information.
Responsibilities
• Working closely with the data science team to determine what kinds of identity information need to be collected.
• Identifying sources from which identity data can be collected.
• Designing and developing distributed and optimal web crawlers to mine identified sources.
• Designing and developing technologies to process collected data and extract, aggregate, and resolve fragmented identity attributes to individual identities.
• Organizing and indexing aggregated identity data so that it can be consumed efficiently by Trulioo APIs.
• Communicating findings to engineering and management teams.
Requirements (Experience in the following is a MUST)
• Degree in computer science, software engineering, or a related field
• Solid understanding of Computer Science fundamentals
• Background in data mining
• Proficiency in a scripting language such as Python, and in C/C++ or Java
• Ability to work with big data and to build highly scalable, distributed systems
• Experience working with both SQL and NoSQL databases
• Experience building web crawlers
• Experience with Linux (Bash scripting)
Nice to Have
• MS or PhD in computer science
• Knowledge of machine learning (a BIG plus)
• Experience with R, Hadoop, MapReduce
• Experience with message queues (RabbitMQ)
• Experience with software version control
• Ability to document work
• Proven ability to work in a fast-paced environment and to meet changing deadlines/priorities in simultaneous projects
• Excellent organizational, communication, and interpersonal skills; comfortable working both independently and in a team
Located in downtown Vancouver, this is a full-time position with a competitive salary, stock options, and medical benefits, all wrapped up in a fun “startup” environment.