BIG DATA, SMART SOLUTIONS

Social media, business intelligence and the Internet of Things generate data, a lot of data, and the volume is growing at an exponential rate. Are you ready to handle it?

One answer to this challenge comes from a field of computer science: distributed systems. In plain English, this means grouping computers to work together as a cluster that appears to the end user as a single powerful computer. Here we discuss a couple of tips for tackling big data problems based on this concept and ask you to help us with a real case!

NON-RELATIONAL DATABASES

One of the main problems when implementing machine learning solutions is scalability. Your product might handle thousands of operations, but what happens when the order of magnitude changes to millions? NoSQL (non-relational) databases are built to scale to that kind of load.

The non-relational concept covers databases that do not use the tabular, relational model of SQL databases, and their main advantage is that you can read and write data at scale and at speed. The magic lies in the way objects are saved, in contrast to the tightly constrained rows-in-tables approach of SQL databases. Availability and flexibility (no rigid schema) are further advantages of NoSQL databases. On the other hand, the strong relationships enforced by a relational database make complex queries, such as joins across several tables, easier to carry out.
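To make the contrast concrete, here is a minimal sketch that stores the same object both ways. The customer record, the `document_store` dictionary (a stand-in for a document database collection) and the table names are all hypothetical, chosen only for illustration; the relational side uses Python's built-in `sqlite3` module.

```python
import json
import sqlite3

# Hypothetical customer record, invented for this example.
customer_doc = {
    "customer_id": 42,
    "name": "Ada",
    "orders": [
        {"order_id": 1, "total": 60.0},
        {"order_id": 2, "total": 120.0},
    ],
}

# Document style: the whole object is serialised and stored under one key.
# A plain dict stands in for a document database collection here.
document_store = {}
document_store[customer_doc["customer_id"]] = json.dumps(customer_doc)

# Relational style: the same object is split across two tables with a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.execute(
    "INSERT INTO customers VALUES (?, ?)",
    (customer_doc["customer_id"], customer_doc["name"]),
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(o["order_id"], customer_doc["customer_id"], o["total"]) for o in customer_doc["orders"]],
)

# Reading the document back is a single lookup by key.
loaded = json.loads(document_store[42])

# The relational model answers aggregate questions with a join.
row = conn.execute(
    "SELECT c.name, COUNT(*), SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.customer_id GROUP BY c.customer_id"
).fetchone()
print(loaded["name"], row)
```

The document side needs no schema and returns the whole object in one read, while the relational side has to declare its tables up front but can answer the aggregate question in a single query.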

Before choosing a database system, you need to understand the needs of your product. Otherwise, your business may later have to migrate to a new database, which is hard and takes time away from running your operations.

DISTRIBUTED COMPUTING

Now that we have the right type of database for our purpose, let’s start forecasting!

In many situations, the same operation needs to be performed for different clients or different accounts. If you plan on doing them one by one, it can take hours, even days. Can this be solved?

Yes. By using more than one computer (or several cores of a single computer), we can run some, or maybe all, of these operations in parallel. This is distributed computing.
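As a rough single-machine illustration of the idea, the sketch below spreads an independent per-account computation across worker processes using Python's standard `concurrent.futures` module. The function `forecast_account` and its arithmetic are placeholders invented for this example, not a real forecasting model, and a real cluster would distribute the work across machines rather than local processes.

```python
from concurrent.futures import Executor, ProcessPoolExecutor


def forecast_account(account_id: int) -> tuple[int, float]:
    """Placeholder for an expensive, independent per-account computation."""
    # Fake 30 days of history; a real job would load this from the database.
    history = [(account_id * 7 + day) % 50 for day in range(30)]
    return account_id, sum(history) / len(history)


def forecast_all(account_ids, executor: Executor) -> dict[int, float]:
    """Run forecast_account for every account on the given executor."""
    # Executor.map hands the independent calls to the pool's workers
    # and yields the results in input order.
    return dict(executor.map(forecast_account, account_ids))


if __name__ == "__main__":
    # Four worker processes is an arbitrary choice for the example.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = forecast_all(range(8), pool)
    print(len(results), "accounts processed")
```

This only works because each account can be processed without knowing anything about the others; operations that depend on each other's results cannot be split up this simply.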

PRACTICAL EXAMPLE: CUSTOMER SEGMENTATION

Companies and brands segment their clients in order to apply effective actions to each cohort and make them buy or engage more with their products. For example, a meaningful segmentation for a multinational fashion brand might take hours, even days, if not performed properly. How do we tackle this with what we have discussed so far?

Let’s start by getting all the information from the database. A brand of that size is likely to record millions of transactions per year. Next, complex operations might be performed on that data (NLP, anomaly detection, clustering…), and computational efficiency becomes key.
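To give a feel for the clustering step, here is a toy sketch that aggregates transactions into per-customer features and groups customers with a plain k-means. The transactions, the two features (order count and total spend) and the choice of two segments are all assumptions made for illustration; a real segmentation would use far more data, scaled features and a tested library.

```python
from collections import defaultdict

# Hypothetical transactions as (customer_id, amount); invented for illustration.
transactions = [
    ("a", 20.0), ("a", 25.0), ("b", 400.0), ("b", 380.0), ("b", 420.0),
    ("c", 15.0), ("d", 390.0), ("d", 410.0), ("e", 30.0), ("e", 10.0),
]


def customer_features(rows):
    """Aggregate raw transactions into (order_count, total_spend) per customer."""
    counts, totals = defaultdict(int), defaultdict(float)
    for customer_id, amount in rows:
        counts[customer_id] += 1
        totals[customer_id] += amount
    return {c: (float(counts[c]), totals[c]) for c in counts}


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans_labels(points, k, iterations=20):
    """Plain k-means with a deterministic start: the first k points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return [nearest(p, centroids) for p in points]


features = customer_features(transactions)
labels = kmeans_labels(list(features.values()), k=2)
segments = dict(zip(features, labels))  # customer_id -> segment index
print(segments)
```

Because the features are not scaled, total spend dominates the distance here, so the two segments are effectively low spenders and high spenders.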

Taking all this into account, let us know in the comments below which kind of database you would use and which kind of distributed system you would be most likely to try.
