Popular Data Science Trends in 2020


Companies all over the world share one goal: to complete their digital transformation. They digitalize everything and use technology to improve their position and beat the competition. Data Science is an essential part of this story, and thanks to it, organizations find it much easier to make important decisions.

A few trends are set to dominate in 2020, and they are worth discussing because they are likely to stay and make a lasting impact.

Data Science takes over conservative fields

Remember how, in 2019, not only health care but many other heavily regulated, conservative fields started adopting machine learning solutions? That trend isn't stopping.

Anyone familiar with Kaggle's challenges knows that one of the most famous challenges of the previous year was identifying pneumothorax (a collapsed lung) in chest X-rays. Identifying this condition is hard, even for doctors. This wasn't just another challenge: it involved institutions that study this condition putting the results into practice, which made it a real step forward. Machine learning algorithms of this type can be really helpful, and a system like this one has already been approved by the FDA.

Once such a system is approved and deployed, the benefits are considerable. It helps not only patients but also the hospitals and institutions higher up the hierarchy. Unfortunately, approval and implementation are still difficult, but according to a couple of recent statistics, these obstacles can be overcome.

Data privacy by design

2019 saw many data breaches, and those breaches left a lot of users unhappy. Since then, engineers and data scientists have been working on multiple projects aimed at regaining users' trust. Federated Learning, for example, although introduced a few years ago, has gained a lot of popularity lately. It is certainly not the final solution, but it can help software engineers and data scientists deliver systems that follow privacy by design. Such systems make it possible to build useful products while keeping users' data private.

The question is: how can users trust software companies with their data? This is expected to improve in 2020, and Federated Learning can be quite helpful here. In other words, with Federated Learning, prediction models for mobile phones (such as next-word prediction on a keyboard) can be trained without uploading sensitive typing data to servers.
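To make the idea concrete, here is a minimal sketch of federated averaging (FedAvg), the basic scheme behind Federated Learning, written in plain Python. It assumes a toy linear model and three hypothetical clients whose data never leaves their own lists: each client trains locally and sends back only its model weights, which the server averages, weighted by how many samples each client holds. The function names and the synthetic data are illustrative assumptions, not the API of any real framework.

```python
import random


def make_client_data(rng, n=30):
    """Hypothetical private dataset for one device: noisy samples of y = 2x + 1."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, 2 * x + 1 + rng.gauss(0, 0.1)))
    return data


def local_train(weights, data, lr=0.01, epochs=20):
    """Run SGD on one client's data; only the resulting weights leave the device."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            error = (w * x + b) - y  # prediction error for squared loss
            w -= lr * error * x
            b -= lr * error
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by each client's sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def federated_training(clients, rounds=10):
    """Repeat: send global weights out, train locally on each client, aggregate."""
    global_weights = (0.0, 0.0)
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        updates = [local_train(global_weights, data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    rng = random.Random(0)
    clients = [make_client_data(rng) for _ in range(3)]
    w, b = federated_training(clients)
    print(f"learned model: y = {w:.2f}x + {b:.2f}")  # should be close to y = 2x + 1
```

The server only ever sees weight tuples, never the (x, y) pairs. Production systems typically add secure aggregation, client sampling, and differential privacy on top of this loop; the sketch shows only the core idea that raw data stays on the device.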

Removing bias and discrimination

Apple ran into serious trouble in November 2019 when a gender discrimination case involving the Apple Card became public. It was alleged that the card offered women smaller lines of credit than men. The story came out through a tweet that quickly went viral, and unfortunately it only reminded us of a problem people have been facing for a long time. Fully automated machine decisions are problematic because the final decision they reach is not always the right one.

Goldman Sachs, the bank that issues the card and the party accused, stated that its credit decisions were not based on gender. However, modern algorithms can still result in proxy discrimination, which is a subset of disparate impact: a practice that appears neutral can nonetheless unreasonably harm members of a particular class.

We shouldn't forget that models trained on biased data are capable of learning discrimination patterns. These models can also find proxies (features that correlate with a protected attribute) that ultimately lead to discrimination. This is a major problem, and in 2020 many data scientists are expected to work on fixing it. Data scientists have already started exploring new methods and architectures that can detect and then remove these biases.
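One simple way to spot the kind of bias described above is to compare how often a model grants a positive outcome to each group. The sketch below computes the disparate impact ratio (the protected group's approval rate divided by the reference group's) and flags values under 0.8, the "four-fifths" rule of thumb from US employment guidelines. The decisions and group labels are made-up illustration data, and this is only one of many fairness metrics, not a complete audit.

```python
from collections import defaultdict


def selection_rates(decisions, groups):
    """Return the fraction of positive decisions (1 = approved) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += int(decision)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions, groups, protected, reference):
    """Approval rate of the protected group divided by that of the reference group."""
    rates = selection_rates(decisions, groups)
    return rates[protected] / rates[reference]


def flags_adverse_impact(ratio, threshold=0.8):
    """Four-fifths rule of thumb: a ratio below 0.8 suggests possible adverse impact."""
    return ratio < threshold


if __name__ == "__main__":
    # Hypothetical credit decisions for two groups (illustration data only)
    decisions = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
    groups = ["women"] * 5 + ["men"] * 5
    ratio = disparate_impact_ratio(decisions, groups, protected="women", reference="men")
    print(f"disparate impact ratio: {ratio:.2f}")  # prints 0.50
    print("possible adverse impact:", flags_adverse_impact(ratio))  # prints True
```

Because the check looks only at outcomes, it can reveal proxy discrimination even when the model never saw the protected attribute directly.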

Python and Data Science

Python is the programming language data scientists should focus on. Its popularity is enormous, and it became the second most loved language among developers. Many websites mention Python and review it positively. Its simple syntax and rich ecosystem of libraries make data scientists' work considerably easier, and the language keeps helping them more.

Focus is everything in 2020. Every data scientist should choose the right path and concentrate on it. There are two main paths: an engineering-heavy one and an analysis-heavy one. People coming from computer science should focus on the first, while people coming from applied mathematics, the social sciences, or physics should focus more on the second. The two paths sometimes cross, but committing to one is recommended.