Business component of the project
The project is a platform that automates the collection and processing of data from various business portals. It also supports the analysis of relationships between account data from different sources.
The introduction of such an automated platform increases the speed and efficiency of the Sales and Marketing Department, which is confirmed by the results of implementing the platform in our company. The developed application significantly increases the amount of information collected and the speed of its collection and processing per unit of time, and also automates the analysis of relationships, which eliminates errors caused by the human factor.
The Sales and Marketing Department, which acted as the customer of the project, highly appreciated the practical value of the developed application. The project continues to develop even though its main functionality has already been implemented: the project team plans not only to maintain the application, but also to expand its capabilities taking into account the proposals received from the customer.
Technical description of the project
The developed application is a crawler that scans and collects data from sites containing information about potential business contacts according to predefined search parameters.
The data source for the crawler can be any business portal, social network or site that contains contact information, for example LinkedIn, angel.co, Crunchbase, etc. Data from all the systems is collected, compared and processed to find duplicates and to check whether these records already exist in the customer database; after that the data is merged into the existing crawler database. To launch the crawler, one should configure it and specify which sites to collect information from and what information should be collected, i.e. the tuning is quite fine-grained.
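As a rough illustration of the deduplication and merge step, the sketch below shows how newly crawled contacts could be filtered against the existing databases; the class, method and normalization rule are hypothetical and given only for clarity.

```java
import java.util.*;

// Hypothetical sketch of the deduplication step: newly crawled contacts are
// normalized, compared with the existing customer database and only the new
// records are kept for merging into the crawler database.
public class ContactMerger {

    /** Normalizes a contact key so that the same company found on different
     *  portals is treated as one record (e.g. "Acme Inc. " vs "acme inc."). */
    private static String normalize(String companyName) {
        return companyName == null ? "" : companyName.trim().toLowerCase().replaceAll("\\s+", " ");
    }

    /** Returns only those crawled contacts that are not yet present either
     *  in the customer database or in the crawler's own database. */
    public static List<String> selectNewContacts(Collection<String> crawled,
                                                 Collection<String> customerDb,
                                                 Collection<String> crawlerDb) {
        Set<String> known = new HashSet<>();
        customerDb.forEach(c -> known.add(normalize(c)));
        crawlerDb.forEach(c -> known.add(normalize(c)));

        List<String> fresh = new ArrayList<>();
        Set<String> seen = new HashSet<>();   // removes duplicates inside the crawl itself
        for (String contact : crawled) {
            String key = normalize(contact);
            if (!key.isEmpty() && !known.contains(key) && seen.add(key)) {
                fresh.add(contact);
            }
        }
        return fresh;
    }
}
```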
Here is an example of a crawler scenario. We go to angel.co and scan the list of companies under the following condition: if a company has received investments on this site, we check whether there is a Java developer vacancy in its LinkedIn profile; if there is such a vacancy, we add the contact data of this company to the database of potential customers.
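The sketch below outlines how such a scenario could look with Selenium WebDriver; the URLs, selectors and the saving helper are illustrative assumptions, since the actual page markup and internal helpers are not shown here.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import java.util.ArrayList;
import java.util.List;

// Hypothetical outline of the scenario described above.
public class FundedCompanyScenario {

    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://angel.co/companies");   // open the company list

            // Collect the names of companies matching the "received investments" condition.
            List<String> companyNames = new ArrayList<>();
            for (WebElement company : driver.findElements(By.cssSelector(".funded-company"))) { // hypothetical selector
                companyNames.add(company.getText());
            }

            for (String name : companyNames) {
                // Open the company's LinkedIn profile and look for a Java developer vacancy.
                driver.get("https://www.linkedin.com/search/results/companies/?keywords=" + name);
                boolean hasJavaVacancy = !driver
                        .findElements(By.xpath("//*[contains(text(), 'Java developer')]")) // hypothetical check
                        .isEmpty();

                if (hasJavaVacancy) {
                    saveToPotentialCustomers(name);      // add the contact data to the database
                }
            }
        } finally {
            driver.quit();
        }
    }

    private static void saveToPotentialCustomers(String companyName) {
        // Placeholder for writing the contact data to the potential customers database.
        System.out.println("New potential customer: " + companyName);
    }
}
```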
The Selenium WebDriver automation tool is used to read information from the sites. Search parameters are stored in Excel files on local or cloud storage, such as Google Drive, and are loaded from the storage when the application is launched.
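For illustration, reading search filters from an Excel file with Apache POI could look roughly as follows; the file layout (text filters in the first column of the first sheet) is an assumption made for this sketch.

```java
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: loading search filters from an .xlsx file with Apache POI.
public class SearchParameterLoader {

    public static List<String> loadFilters(String excelPath) throws IOException {
        List<String> filters = new ArrayList<>();
        try (FileInputStream in = new FileInputStream(excelPath);
             XSSFWorkbook workbook = new XSSFWorkbook(in)) {
            Sheet sheet = workbook.getSheetAt(0);          // filters are assumed to be on the first sheet
            for (Row row : sheet) {
                if (row.getCell(0) != null) {
                    filters.add(row.getCell(0).getStringCellValue()); // assumes text cells
                }
            }
        }
        return filters;
    }
}
```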
The application is launched on the company’s server using Jenkins, in the form of individual builds. Each build is configured taking into account the general crawler launch parameters and the selected parameters of data search on the site. After the run is completed, the crawler e-mails users a report on the results together with Excel files containing the collected data.
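Sending the report with an attached Excel file can be sketched with JavaMail roughly as shown below; the SMTP host, addresses and credentials are placeholders.

```java
import javax.mail.*;
import javax.mail.internet.*;
import java.io.File;
import java.util.Properties;

// Hypothetical sketch of e-mailing the run report with the collected data attached.
public class ReportMailer {

    public static void sendReport(String recipient, String reportText, File excelFile) throws Exception {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");   // placeholder SMTP server
        props.put("mail.smtp.auth", "true");

        Session session = Session.getInstance(props, new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication("crawler-bot", "password"); // placeholder credentials
            }
        });

        MimeBodyPart body = new MimeBodyPart();
        body.setText(reportText);

        MimeBodyPart attachment = new MimeBodyPart();
        attachment.attachFile(excelFile);                  // Excel file with the collected data

        Multipart content = new MimeMultipart();
        content.addBodyPart(body);
        content.addBodyPart(attachment);

        Message message = new MimeMessage(session);
        message.setFrom(new InternetAddress("crawler@example.com"));
        message.setRecipient(Message.RecipientType.TO, new InternetAddress(recipient));
        message.setSubject("Crawler run report");
        message.setContent(content);

        Transport.send(message);
    }
}
```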
The general parameters for launching the application include the amount of data to be read, bot logins and passwords, the name of the file with filters, the date and time the build was launched, etc.
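If, for example, Jenkins passes the build parameters to the JVM as -D system properties, reading them inside the application could look like the sketch below; the property names and default values are assumptions for illustration.

```java
// Hypothetical sketch of reading the general launch parameters from system properties.
public class LaunchParameters {

    public final int maxRecords;      // amount of data to be read
    public final String botLogin;     // bot account login
    public final String botPassword;  // bot account password
    public final String filtersFile;  // name of the Excel file with filters

    public LaunchParameters() {
        this.maxRecords = Integer.parseInt(System.getProperty("crawler.maxRecords", "500"));
        this.botLogin = System.getProperty("crawler.botLogin", "");
        this.botPassword = System.getProperty("crawler.botPassword", "");
        this.filtersFile = System.getProperty("crawler.filtersFile", "filters.xlsx");
    }
}
```

With this approach a build could be started with something like `java -Dcrawler.maxRecords=1000 -Dcrawler.filtersFile=filters.xlsx -jar crawler.jar`.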
In the current configuration, the application performs two types of data search on the sites: a search for information about people and a search for companies based on the vacancies they have published.
The system performs the following functions:
- automatic login to the site and verification of authorization;
- handling of obstacles that arise during authorization or crawler operation, such as captchas and verification codes (a sketch of this step is shown after this list);
- reading the necessary information from the obtained search results, as well as from web pages of people and companies;
- compiling the reading results and automatically sending them to users by e-mail.
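The captcha-handling step mentioned above can be sketched as follows using the 2captcha service through REST-assured: the image is submitted for recognition and the crawler polls for the answer. The API key, timings and error handling here are simplified placeholders.

```java
import static io.restassured.RestAssured.given; // REST-assured 3.x+ package; older versions use com.jayway.restassured

// Hypothetical sketch of solving a captcha via the 2captcha HTTP API.
public class CaptchaSolver {

    private static final String API_KEY = "YOUR_2CAPTCHA_KEY";  // placeholder

    /** Sends a base64-encoded captcha image to 2captcha and waits for the answer. */
    public static String solve(String base64Image) throws InterruptedException {
        // Step 1: submit the captcha; the service replies with "OK|<captcha id>".
        String submitResult = given()
                .formParam("key", API_KEY)
                .formParam("method", "base64")
                .formParam("body", base64Image)
                .post("http://2captcha.com/in.php")
                .asString();
        if (!submitResult.startsWith("OK|")) {
            throw new IllegalStateException("Captcha submit failed: " + submitResult);
        }
        String captchaId = submitResult.split("\\|")[1];

        // Step 2: poll for the answer until it is ready ("OK|<recognized text>").
        for (int attempt = 0; attempt < 20; attempt++) {
            Thread.sleep(5000);
            String answer = given()
                    .queryParam("key", API_KEY)
                    .queryParam("action", "get")
                    .queryParam("id", captchaId)
                    .get("http://2captcha.com/res.php")
                    .asString();
            if (answer.startsWith("OK|")) {
                return answer.substring(3);
            }
        }
        throw new IllegalStateException("Captcha was not solved in time");
    }
}
```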
It is possible to launch several search queries at a time through the application (for both types of search, different regions, etc.).
The application itself starts automatically based on the specified parameters, and the user can only influence the result of its work by changing these parameters in advance.
Technologies used on the project
Programming languages: Java SE 8, JavaScript.
Frameworks, libraries: Lombok, Log4J, Selenium WebDriver, TestNG.
APIs used: Apache POI, JavaMail, Google API (Drive, GMail), REST-assured (2captcha API).
Infrastructure: Apache Subversion (SVN), Jenkins, IntelliJ IDEA.
Project features
- The work is carried out in accordance with the Scrum/Agile methodology.
- Manual testing was used on the project. This is explained in particular by the following:
- running unit tests in parallel with the main builds could exceed the limits on the number of requests per unit of time and lead to additional bans;
- some situations (for example, failures in site operation) cannot be reproduced to test the crawler automatically.
Errors in the application’s operation are identified by studying the contents of the reports and data files sent to users: the data actually collected by the crawler is compared with the data on the crawled site.
- Portals for searching and establishing business contacts are regularly updated and expand their capabilities, so the correct operation of the crawler requires updating it from time to time. This requires the involvement of specialists, but the crawler is designed in such a way that updating does not require changing a large amount of code (extensive refactoring); it is enough to make minor changes to certain software layers of the crawler, as illustrated by the sketch after this list.
- The duration of application operation can vary significantly and largely depends on the amount of data being read.
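A minimal illustration of this layered approach, in the spirit of the Page Object pattern: the scenario logic works with a page abstraction, so when a portal changes its markup only the locators inside the corresponding page class need to be updated. The class name and locators below are hypothetical.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;

// Hypothetical page class: the only layer that changes when the portal's UI changes.
public class CompanySearchPage {

    // Locator layer: updated when the portal changes its markup.
    private static final By SEARCH_FIELD = By.cssSelector("input[name='query']");
    private static final By SEARCH_BUTTON = By.cssSelector("button[type='submit']");

    private final WebDriver driver;

    public CompanySearchPage(WebDriver driver) {
        this.driver = driver;
    }

    /** Scenario-layer calls stay the same regardless of the portal's markup. */
    public void searchFor(String query) {
        driver.findElement(SEARCH_FIELD).sendKeys(query);
        driver.findElement(SEARCH_BUTTON).click();
    }
}
```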
Project results
- Automation of the work of JazzTeam Sales and Marketing Department specialists was completed, which made it possible to reduce their labor costs for collecting data on potential customers severalfold.
- The application is used every day by the Sales and Marketing Department. We always have access to up-to-date data on vacancies published on various portals (and on the companies that posted them), which helps us find and establish business contacts.
Company’s achievements on the project
- During the first 8 months of the crawler’s operation, the Sales and Marketing Department managed to collect and process several dozen times more leads than over a comparable period before the crawler was introduced.
- The crawler has already helped to find several interesting partners and customers for the company, including the ones from new geographic regions.
- Thanks to a thorough study of the interfaces and operational features of various portals for searching and establishing business contacts, a number of tasks related to optimizing the crawler’s operation and ensuring uninterrupted data reading have been successfully solved.