More data for the data God: how we turned 60 million records into a B2B service for sales departments

What should developers do when they have to build a service on top of a database of 60 million contacts? Roll up their sleeves and start developing, even if they have no experience with that much data. My name is Sergey Nikonenko, and I’m the COO at Purrweb. Today I’ll tell you about switching the database and the search engine mid-project, and about a secret that was revealed only after 3 years of working with a client.

Reading time: 8 minutes

Lead generation service

    In a dark, dark wood, there was a dark, dark path… Along the dark, dark path, there was a dark, dark client

    The story I am about to share with you is almost mystical: back in 2017, we got a request from a client who turned out to be a true conspirator. He came up with the idea of a mega lead generation service that would allow companies to find leads (contacts of organizations) in the B2B sector. The service was supposed to become a universal tool for any reasonably large sales department that relies on cold emailing and calling.

    The client didn’t come empty-handed: he already had a database that contained 60 million records (don’t ask where he got it from — we don’t know and are bound by NDA). The database had everything a sales department needed for lead generation:

    • email
    • email status
    • company name
    • address (city, state, zip, country)
    • website
    • phone number
    • fax number
    • SIC code
    • industry name
    • NAICS code
    • NAICS description
    • LinkedIn industry name
    • stock ticker
    • revenue
    • number of employees
    • year of foundation
    • Alexa rank
    • first name, last name
    • job function, management level
    • etc.

    The lead generation service was to help sales reps filter people and companies they are interested in and get up-to-date contacts: emails and phone numbers.
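
    To make the shape of the data more concrete, here is a minimal Python sketch of one record and a filter over it. The `Contact` class, its field names, and the `filter_contacts` helper are illustrative assumptions that cover only a few of the fields listed above; they are not the service's actual schema, and in the real service the filtering is done by the search engine rather than in application code.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Contact:
    # Hypothetical subset of the fields listed above.
    email: str
    email_status: str
    company_name: str
    country: str
    industry_name: str
    employees: Optional[int] = None
    job_function: Optional[str] = None


def filter_contacts(
    contacts: Iterable[Contact],
    *,
    country: Optional[str] = None,
    industry_name: Optional[str] = None,
    min_employees: Optional[int] = None,
) -> Iterator[Contact]:
    """Yield contacts that match every filter that was provided."""
    for c in contacts:
        if country is not None and c.country != country:
            continue
        if industry_name is not None and c.industry_name != industry_name:
            continue
        if min_employees is not None and (
            c.employees is None or c.employees < min_employees
        ):
            continue
        yield c


sample = [
    Contact("ann@example.com", "valid", "Acme", "US", "Software", 250, "Sales"),
    Contact("bob@example.org", "invalid", "Globex", "DE", "Retail", 40),
]
matches = list(filter_contacts(sample, country="US", min_employees=100))
```

    Filters that are left out are simply not applied, so calling `filter_contacts(sample)` with no arguments returns every record.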

    We called the client a conspirator because even a signed NDA wasn’t enough for him to be sure of us: we learned his real name only after 3 years of cooperation. However, it wasn’t an issue, as by that time we had become a good team. Now we are working on other projects together.

    What about the moon?

    When we first met on a call, we estimated the project at $15,000. The client described the idea quite roughly, and we were young, green, and inspired. We clapped our hands, thinking that all we needed to do was to link up the database, connect the interface and finish some little things.

    However, by the end of the first development stage, the project exceeded time and money expectations. The client was constantly coming up with new details, so I was worried that nothing would work out, and we would end up far beyond the initial agreement.

    They say, do what you must and come what may. At some point, we found a way to hit it off with the client. I stopped worrying and panicking, and we managed to prioritize the long list of new features and ideas. The work got back on track.

    Later on, we learned that the client didn’t expect anything from us: he knew the real complexity of the project. It was his second attempt, as he had already had some questionable results with a team from India. In addition, the client knew that such a large lead generation project would require a larger investment.

    In God we trust, technologies we use

    The client’s service would not be a never-before-seen product. There were already quite a few competitors on the market. The biggest ones are ZoomInfo, Clearbit, and D&B Hoovers. However, we had an edge over all of these services, since the client’s database contained more accurate contact and company data.

    The project let us try out lots of services that we had never used before:

    • For the frontend, we used React; for the backend, Ruby on Rails. We chose Ruby because it gave us a number of out-of-the-box tools.
    • As a database management system, we initially chose MongoDB, which we later replaced with PostgreSQL because it handles relations between entities and is therefore better suited to structured data
    • As a search engine, we used Elasticsearch (it was later replaced with Apache Solr)
    • For email verification, we decided to use ZeroBounce
    • For payment processing, BrainTree
    • … and other infrastructure services

    We split the development process into 5 steps:

    1. Building Frontend and Backend architectures
    2. Working on the database with MongoDB and PostgreSQL
    3. Developing the service
    4. QA testing
    5. Supporting and maintaining

    Let me tell you about everything in detail.

    First replacement in game: Elasticsearch cannot make it

    When we began setting up the search engine, we initially decided to use Elasticsearch — it was pretty simple to work with. However, as time went on, its functionality turned out to be insufficient:

    • In our setup, Elasticsearch handled pagination of grouped results poorly: we could not find a convenient way to group the search results and split the groups into pages. For example, when the user scrolled through the contacts of companies’ employees, the search engine could not arrange them by the company they belong to
    • We could not get search suggestions to work the way we needed, so users would not see suggestions as they typed

    Lead generation services: search engine

    We had to find an alternative (as you may remember, this is not the only replacement made during development). Apache Solr was our champion. The switch wasn’t difficult, since Elasticsearch and Apache Solr are by and large similar: both are built on Apache Lucene.

    As a result, the search became faster and more convenient for end-users.
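
    As an illustration of the grouping problem, here is a minimal sketch of a Solr query that groups contacts by company and paginates over the groups. It uses Solr's standard result grouping parameters (`group`, `group.field`, `group.limit`, `group.ngroups`); the `company_id` field, the query text, and the `grouped_contacts_query` helper are assumptions made up for the example, not the project's actual code.

```python
from urllib.parse import urlencode


def grouped_contacts_query(text: str, page: int, groups_per_page: int = 10) -> str:
    """Build a Solr query string that groups contacts by company."""
    params = {
        "q": text,
        "group": "true",                  # turn on result grouping
        "group.field": "company_id",      # hypothetical field name
        "group.limit": 5,                 # contacts returned per company
        "group.ngroups": "true",          # also report the total number of groups
        "start": page * groups_per_page,  # with grouping, start/rows count groups
        "rows": groups_per_page,
    }
    return urlencode(params)


query_string = grouped_contacts_query("job_function:sales", page=2)
```

    The resulting string would be appended to the select handler URL of a Solr core. Because `start` and `rows` apply to groups rather than to individual documents, each page holds a fixed number of companies.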

    Second substitute in game: PostgreSQL instead of MongoDB

    When we started working on the project, we didn’t have much experience in managing databases. We chose MongoDB as the database management system: quite a popular tool, yet not the most convenient one. MongoDB allows you to pile up all the data in a ‘heap’ and use whatever you need later on. Once we started puzzling over how to structure the data, it was time to change the DBMS.

    We had two main reasons to change it:

    • At that time, we were using AWS for hosting. It didn’t have a dedicated cloud service for MongoDB but had one for PostgreSQL. With MongoDB, we had to pick and set up scaling, replication, and backups ourselves
    • PostgreSQL is an object-relational database designed for working with relational data. For example, it provides transactions, which guarantee that a group of changes is either saved in full or not saved at all if something goes wrong. MongoDB gave us no such safety net

    We changed the DBMS bit by bit: first, moved all the entities, except for contacts. For a while, we worked with 2 databases simultaneously. We gradually adapted the code to work with PostgreSQL, then took all the data from MongoDB and finally moved to PostgreSQL.
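
    The step-by-step switch can be sketched as a small routing layer that decides, per entity type, which database serves it. This is an illustration only: `InMemoryStore` stands in for real MongoDB and PostgreSQL clients, and `MigrationRouter` and the entity names are hypothetical.

```python
from typing import Dict, Set


class InMemoryStore:
    """Stand-in for a real database client (MongoDB or PostgreSQL)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tables: Dict[str, Dict[int, dict]] = {}

    def save(self, entity: str, key: int, row: dict) -> None:
        self.tables.setdefault(entity, {})[key] = row

    def load(self, entity: str, key: int) -> dict:
        return self.tables[entity][key]


class MigrationRouter:
    """Route each entity type to the old or the new store."""

    def __init__(self, old: InMemoryStore, new: InMemoryStore) -> None:
        self.old = old
        self.new = new
        self.migrated: Set[str] = set()

    def store_for(self, entity: str) -> InMemoryStore:
        return self.new if entity in self.migrated else self.old

    def migrate(self, entity: str) -> None:
        """Copy one entity type to the new store, then switch traffic to it."""
        for key, row in self.old.tables.get(entity, {}).items():
            self.new.save(entity, key, dict(row))
        self.migrated.add(entity)


mongo = InMemoryStore("mongodb")
postgres = InMemoryStore("postgresql")
router = MigrationRouter(mongo, postgres)

router.store_for("companies").save("companies", 1, {"name": "Acme"})
router.store_for("contacts").save("contacts", 7, {"email": "ann@example.com"})

router.migrate("companies")  # everything except contacts moves first
```

    After `router.migrate("companies")`, company reads and writes go to the new store while contacts stay in the old one, which mirrors the period when both databases were in use at once.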

    Verify, enrich and come up with new features

    When you have 60 million emails, you need to think of a way to verify them in order to get rid of dead and broken ones. At first, we chose BriteVerify but later switched to ZeroBounce (which was easy to integrate). BriteVerify gave us more accurate verification for better lead generation, yet it was much more expensive than ZeroBounce. That’s why we settled on ZeroBounce as the sweet spot.

    Lead generation services: verify

    Here’s how it works: we send an email address to the service, and the service returns the status of that address. Users pay for verification requests, but the price is low.

    Typically, 10 statuses could be assigned to an email in the system. To make the service more user-friendly, we merged them into 3 groups: valid (you can send emails to this address and will most likely get a response), accept all (an in-between state: you can start sending emails, but the mail server may decline your request), and invalid (it’s pointless to try).
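
    A merge like this boils down to a lookup table. The sketch below shows the idea; the raw status names and the way they are assigned to groups are assumptions for illustration, since the exact list and mapping used in the service are not given here.

```python
from enum import Enum


class StatusGroup(Enum):
    VALID = "valid"
    ACCEPT_ALL = "accept all"
    INVALID = "invalid"


# Hypothetical raw statuses; the real list comes from the verification provider.
RAW_TO_GROUP = {
    "valid": StatusGroup.VALID,
    "catch-all": StatusGroup.ACCEPT_ALL,
    "unknown": StatusGroup.ACCEPT_ALL,
    "invalid": StatusGroup.INVALID,
    "spamtrap": StatusGroup.INVALID,
    "abuse": StatusGroup.INVALID,
    "do_not_mail": StatusGroup.INVALID,
}


def group_status(raw_status: str) -> StatusGroup:
    """Collapse a raw verification status into one of three groups.

    Unrecognized statuses are treated as invalid, a conservative assumption.
    """
    return RAW_TO_GROUP.get(raw_status.strip().lower(), StatusGroup.INVALID)
```

    Keeping the mapping in one table means that a new raw status from the provider only requires one new entry, and the interface keeps showing the same three groups.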

    That’s not the only benefit provided by ZeroBounce: users can also upload their own CSV file of emails to the service to check their validity.

    Updates never end

    When a sales department works with a particular CRM, it won’t ever change it. It will sit tight and try to build its ecosystem upon it.

    We chose the 15 most popular CRM systems and integrated them with the lead generation service. All CRMs are different, so we had to read through the documentation of each one and implement the integration manually; unfortunately, you cannot automate this process. Most of the chosen CRMs came with clear documentation and a built-in sandbox: in that case, all you need to do is read the documentation and double-check your work to make everything run flawlessly. However, some CRMs are difficult to integrate, and you need to spend some time tinkering with them.
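
    One common way to keep many hand-written integrations manageable is to hide each CRM behind the same small interface. The sketch below illustrates that idea under our own assumptions; the two CRMs, their payload formats, and all class and function names are made up and do not describe any real CRM's API.

```python
from abc import ABC, abstractmethod
from typing import Dict


class CrmAdapter(ABC):
    """Common interface the lead service calls; one subclass per CRM."""

    @abstractmethod
    def to_payload(self, lead: Dict[str, str]) -> Dict[str, object]:
        """Translate an internal lead into the CRM's own field names."""


class FlatFieldsCrm(CrmAdapter):
    # Hypothetical CRM that expects flat, capitalized field names.
    def to_payload(self, lead):
        return {"Email": lead["email"], "Company": lead["company_name"]}


class NestedCrm(CrmAdapter):
    # Hypothetical CRM that nests contact data under "properties".
    def to_payload(self, lead):
        return {
            "properties": {"email": lead["email"], "company": lead["company_name"]}
        }


ADAPTERS: Dict[str, CrmAdapter] = {"flat": FlatFieldsCrm(), "nested": NestedCrm()}


def export_lead(crm_name: str, lead: Dict[str, str]) -> Dict[str, object]:
    """Pick the adapter for the user's CRM and build the request body."""
    return ADAPTERS[crm_name].to_payload(lead)


lead = {"email": "ann@example.com", "company_name": "Acme"}
```

    With this layout, adding a CRM means writing one more adapter and registering it in `ADAPTERS`; the rest of the service keeps calling `export_lead`.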

    Lead generation services: integrations

    When you’re working on online lead generation software, be ready to connect it to as many CRMs as possible

    The integration work was spread over several years and went step by step: whenever needed, the client added new CRMs to the backlog. Besides, this task has no deadline: to keep the lead generation experience good, all the integrations need to be supported, maintained, and updated in a timely manner.

    It’s not the only part of the service that we regularly update. Even 60 million contacts in the database aren’t enough: one day, users will exhaust them, and the service will become useless. That’s why we systematically update the data.

    Every 3-4 weeks the client sends us a new archive that contains about 1000 JSON files. It usually takes us about a week to extract all the data and refresh the database. To do it, we:

    1. Gather all the data into one large JSON file, around 1 TB in size

    2. Parse this file, pick out the information we need, and create a CSV file (about 2-3 days)

    3. Upload the CSV file to PostgreSQL (about 2 days)

    4. Index the data with Solr (another 3 days)
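
    The parsing step can be sketched as a streaming conversion, so that a file of this size never has to fit in memory. The example assumes one JSON object per line and a small, made-up set of columns; the real file layout and field list are not described here.

```python
import csv
import io
import json
from typing import Iterable, Iterator, TextIO

# Hypothetical column subset; the real export has many more fields.
COLUMNS = ["email", "company_name", "phone"]


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line, reading lazily."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def write_csv(records: Iterable[dict], out: TextIO) -> int:
    """Write the needed columns to CSV and return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    count = 0
    for record in records:
        # Keep only the needed columns; missing values become empty strings.
        writer.writerow({col: record.get(col, "") for col in COLUMNS})
        count += 1
    return count


source = io.StringIO(
    '{"email": "ann@example.com", "company_name": "Acme", "fax": "n/a"}\n'
    "\n"
    '{"email": "bob@example.org", "company_name": "Globex", "phone": "555-0100"}\n'
)
target = io.StringIO()
rows_written = write_csv(read_records(source), target)
```

    In practice `source` and `target` would be files on disk rather than in-memory buffers. A CSV produced this way can be bulk-loaded into PostgreSQL with its `COPY` command, although the loader actually used in the project may differ.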

    One trigger, two triggers, three triggers

    Besides data from the database, we added some useful features to the lead generation service. This way, it got a news feed and sales triggers:

    1. The user picks a company they want to monitor and sets a trigger on a particular type of news
    2. When @eventname happens, the service sends the user a notification

    To implement the feature, we integrated a service called Contify, which sends us news about the companies we have in the database. Users can choose what type of triggers they need: for instance, if they are interested in bankruptcies of IT companies, they will get all the news related to the topic.

    For example, if the user is a developer, they can set up triggers on news about startups that get investments. The plot is simple: you learn that a startup got the money → you email the startup and offer your services.
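
    At its core, a trigger is a match between an incoming news item and the subscriptions users have set up. Here is a minimal sketch of that matching step; the `Trigger` and `NewsItem` classes, the news type labels, and the ids are assumptions for illustration and do not reflect Contify's data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Trigger:
    user_id: int
    company_id: int
    news_type: str  # e.g. "funding" or "bankruptcy" (hypothetical labels)


@dataclass(frozen=True)
class NewsItem:
    company_id: int
    news_type: str
    title: str


def users_to_notify(item: NewsItem, triggers: List[Trigger]) -> List[int]:
    """Return the sorted ids of users whose trigger matches this news item."""
    return sorted(
        {
            t.user_id
            for t in triggers
            if t.company_id == item.company_id and t.news_type == item.news_type
        }
    )


triggers = [
    Trigger(user_id=1, company_id=10, news_type="funding"),
    Trigger(user_id=2, company_id=10, news_type="bankruptcy"),
    Trigger(user_id=3, company_id=10, news_type="funding"),
]
item = NewsItem(company_id=10, news_type="funding", title="Startup raises a round")
recipients = users_to_notify(item, triggers)
```

    Here users 1 and 3 are notified about the funding news, while user 2, who only follows bankruptcy news for the same company, is not.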

    Offline meetings were, are, and will be the most impactful networking method. That’s why we taught the lead generation service to send news about various upcoming events and conferences. As icing on the cake, the service now not only sends the news but also includes registration links.

    Tweaking payment

    Stripe is considered to be the best payment system the world has ever seen. However, when working on the service, we faced an unexpected problem: Stripe didn’t approve the client’s product because of its ‘characteristic aspects’. Of course, we were up to the task and quickly brought in an alternative: BrainTree. In fact, BrainTree is no worse than Stripe, but we had never worked with it before. Recently, Stripe changed its mind, so now we are setting it up as a second payment system that will work as an alternative option for users.

    What to expect?

    Although the service has been released, we continue working on it: the client is 100% into the project and has an endless flow of ideas. At first, it was a medium-sized lead generation service with a subscription. Now it is a giant with its own public API (companies can connect to the service’s database).

    As the client expected, the project reached the break-even point and began to bring in more than $5 million per year. The service has approximately 10 thousand active users per month (out of 110 thousand registered users).

    There are still many ideas waiting to be implemented. The client doesn’t plan to stop, and neither do we. After all, if we learned the client’s real name only in the fourth year of working together, who knows what else this cooperation can bring to the world. 🙃

    ‘Unlike other developers I’ve worked with, Purrweb genuinely cares for our project. They’re a true partner in the sense that they’re investing in our relationship and making our product a success. When I compare Purrweb to past development partners, they have exceeded my expectations. Selfishly, I’d like to keep Purrweb to myself, but their team is fantastic, and I’d highly recommend them to anyone looking for a web development team’ — the client comments.

    Management
    • Anastasia Enina
    Development
    • Boris Shatalov
    QA testing
    • Tatyana Perina
    Content
    • Daria Lobacheva