Business component of the project
The customer needed a solution that monitors various online resources and collects information from them in order to increase sales.
The system was meant to automate and significantly speed up the routine work of finding and collecting contact data for potential customers.
Technological features of the project
The system is designed to collect and display data from sources such as Crunchbase, Angel.co, Google Play, and the App Store, among others. All collected information is integrated with the customer’s CRM.
The application can collect any information that may be useful to the sales department.
The collected data is stored in a cloud database. Sorting by different fields and export of data to CSV are implemented. The project is deployed on Google Cloud infrastructure.
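As an illustration of sorting by a field and exporting to CSV, here is a minimal sketch in JavaScript. The record shape (`name`, `source`, `installs`) and the function names are assumptions made for this example; the project's actual schema and export code are not described in the text.

```javascript
// Hypothetical records; the real schema is not given in the case study.
const apps = [
  { name: "Beta, Inc. App", source: "Google Play", installs: 5000 },
  { name: "Alpha", source: "App Store", installs: 120000 },
  { name: 'Gamma "Pro"', source: "Google Play", installs: 800 },
];

// Return a sorted copy: numbers compare numerically, everything else as strings.
function sortByField(records, field, descending = false) {
  const direction = descending ? -1 : 1;
  return [...records].sort((a, b) => {
    const x = a[field];
    const y = b[field];
    if (typeof x === "number" && typeof y === "number") {
      return (x - y) * direction;
    }
    return String(x).localeCompare(String(y)) * direction;
  });
}

// Quote a value when it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (the convention described in RFC 4180).
function escapeCsv(value) {
  const text = value == null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build a CSV document with a header row followed by one row per record.
function toCsv(records, columns) {
  const header = columns.map(escapeCsv).join(",");
  const rows = records.map((r) => columns.map((c) => escapeCsv(r[c])).join(","));
  return [header, ...rows].join("\r\n");
}

const csv = toCsv(sortByField(apps, "installs", true), ["name", "source", "installs"]);
console.log(csv);
```

Quoting only the values that need it keeps the file readable while still surviving company names that contain commas or quotation marks.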
The parsers run on a schedule, which avoids putting excessive load on the customer’s infrastructure. In addition to the main crawlers and parsers, there is a queue for an updating parser, which refreshes data that has already been collected. A dedicated service also displays the “health” of the parsers. This makes it possible to catch serious errors early and to make the changes needed to restore operation in time.
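One way such a “health” view could be derived is from the time of each parser’s last successful run. The sketch below is only an illustration under that assumption: the parser names, the status fields (`lastSuccessAt`, `expectedIntervalMs`, `recentErrors`), and the thresholds are hypothetical and do not come from the project.

```javascript
// Hypothetical status fields and thresholds; the real service is not described in the text.
const MINUTE = 60 * 1000;

// Classify one parser from its last successful run and its recent error count.
function parserHealth(status, checkedAt) {
  const sinceSuccess = checkedAt - status.lastSuccessAt;
  // No success for more than two scheduled intervals: treat the parser as down.
  if (sinceSuccess > 2 * status.expectedIntervalMs) return "down";
  // Late by more than one interval, or recent errors: flag for attention.
  if (status.recentErrors > 0 || sinceSuccess > status.expectedIntervalMs) return "degraded";
  return "healthy";
}

// Build a report with one entry per parser.
function healthReport(statuses, checkedAt) {
  return statuses.map((s) => ({ name: s.name, health: parserHealth(s, checkedAt) }));
}

const checkedAt = Date.UTC(2024, 0, 1, 12, 0, 0);
const report = healthReport(
  [
    { name: "google-play", lastSuccessAt: checkedAt - 10 * MINUTE, expectedIntervalMs: 60 * MINUTE, recentErrors: 0 },
    { name: "crunchbase", lastSuccessAt: checkedAt - 90 * MINUTE, expectedIntervalMs: 60 * MINUTE, recentErrors: 0 },
    { name: "app-store", lastSuccessAt: checkedAt - 300 * MINUTE, expectedIntervalMs: 60 * MINUTE, recentErrors: 3 },
  ],
  checkedAt
);
console.log(report);
```

Passing the check time in as a parameter, rather than reading the clock inside the function, keeps the classification deterministic and easy to cover with automated tests.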
Approaches and solutions
- Improvement and refactoring of the customer’s parsers. The Node.js parsers prepared by the customer’s team did not separate the most important layers: the frontend and the database access code used by the parsers were kept in one place. The following steps were taken to move to a microservice architecture:
  - A separate Kotlin service for access to the Google Cloud SQL database, with gRPC as its external interface.
  - Refactoring and stabilization of the parsers.
  - Removal of unused code from the parser component.
- Implementation of functionality that subscribes contacts to the company’s newsletters. For the marketing department, the team implemented functionality for quickly and conveniently deciding which contacts collected from the CRM to subscribe to, or unsubscribe from, the company’s advertising newsletters in SendGrid.
- Preparing a dedicated parser for Twitter. JazzTeam prepared a separate parser to process information from this new source, which made it possible to build a new audience for targeted advertising of the company’s services. Subscribers of certain accounts are added to the audience, and the collected data is exported to a separate CSV file. This file is then sent automatically to the advertising campaign, so that the selected contacts see targeted advertising of the customer’s services.
- When our team started work, the Google Play, App Store, Angel.co, and Crunchbase parsers had already been implemented on the project. The existing solutions needed further development and improvement.
- The full development cycle was carried out by JazzTeam specialists, covering both the frontend and backend parts of the system.
- Instant delivery of the developed functionality to the customer. At the customer’s request, no staging version of the system was used for testing on the project, and new functionality was delivered directly to production. This required the team to be very careful and to focus on automated testing.
- High dependence on changes and security mechanisms of the websites and stores from which information is collected. To keep the parsers working, the team had to monitor changes in these conditions and adapt to them quickly.
- Google Cloud: App Engine, Cloud Tasks, Cloud Scheduler, Cloud Storage
- Libraries: express, axios, cheerio, close.io, google-play-scraper, @sendgrid/eventwebhook, etc.
- Google Cloud: Cloud Run, Cloud SQL, Cloud Tasks
- Kotlin, Armeria, Flyway, MySQL, jOOQ, gRPC, SendGrid
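To illustrate the newsletter subscription functionality described under “Approaches and solutions”, the sketch below turns a list of marketing decisions into two batches of email addresses. It is a hedged example only: the contact fields (`email`, `subscribed`, `decision`) and the function name are hypothetical, and the call that would send the batches to SendGrid is deliberately left out because the project’s integration code is not shown in the text.

```javascript
// Hypothetical contact shape:
//   subscribed - current newsletter state
//   decision   - what the marketing user chose: "subscribe", "unsubscribe", or undefined
function planNewsletterChanges(contacts) {
  const plan = { subscribe: [], unsubscribe: [] };
  const seen = new Set();
  for (const contact of contacts) {
    // Normalize so that the same address in a different case counts once.
    const email = String(contact.email || "").trim().toLowerCase();
    if (!email || seen.has(email)) continue;
    seen.add(email);
    // Only record a change when the decision differs from the current state.
    if (contact.decision === "subscribe" && !contact.subscribed) {
      plan.subscribe.push(email);
    } else if (contact.decision === "unsubscribe" && contact.subscribed) {
      plan.unsubscribe.push(email);
    }
  }
  return plan;
}

const plan = planNewsletterChanges([
  { email: "Ann@Example.com", subscribed: false, decision: "subscribe" },
  { email: "bob@example.com", subscribed: true, decision: "unsubscribe" },
  { email: "ann@example.com ", subscribed: false, decision: "subscribe" }, // duplicate
  { email: "cat@example.com", subscribed: true, decision: "subscribe" }, // already subscribed
]);
console.log(plan);
```

Computing the difference first means that only contacts whose state actually changes would be sent on to the mailing service.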
Company’s achievements on the project
- Several new parsers were created for different sources (Twitter, Gmail).
- Research on data upload methods was carried out successfully and resulted in an optimal and convenient solution.
- The speed of operations on the cloud database was improved.
- Particular attention was paid to optimizing queries for displaying data from Google Play.
Clients about cooperation with JazzTeam