We often hear that object-oriented programming languages bring great advantages: modularity, abstraction, productivity, reusability, safety... Python is one of these languages, which is one of the reasons why it is so popular. But what is object-oriented programming (OOP)? Why is it so useful?
Article written by Mohamed Zebli, a student in the 5th cohort of our Data Fullstack training.
Did you say OOP?
Object-oriented programming (OOP) is a paradigm within computer programming. It is a way of representing things, a coherent model shared across the many languages that support it (Python, Java, C++). The aim of OOP is to define objects, understood here as any kind of structure in a given language, and to make them interact with each other. In practice, however, objects are usually complex variables that are themselves composed of variables or functions.
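In Python, every value is an object. The short illustration below uses only built-in types (the variable names are arbitrary) to show that even a string or a list carries both data and functions, called methods, defined by its class.

```python
# In Python, even built-in values are objects: each one has a type (its class)
# and carries both data and methods.
greeting = "hello data"
numbers = [3, 1, 2]

print(type(greeting))          # <class 'str'>
print(greeting.upper())        # calls a method of the str class -> HELLO DATA
numbers.sort()                 # calls a method of the list class, modifying it in place
print(numbers)                 # [1, 2, 3]
print(isinstance(42, object))  # True: integers are objects too
```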
A little history
The OOP paradigm was defined by the Norwegians Ole-Johan Dahl and Kristen Nygaard in the early 1960s. Their work was then taken up and extended in the 1970s by the American Alan Kay. This is how the principles of OOP were established, before being refined over time. Here they are.
The principles of OOP
Let's go over each of these principles, with examples, just below!
--> Encapsulation
It is the grouping of data with a set of routines that read or manipulate it. Each class defines the methods or properties used to interact with its data, and the different objects are created from the class. When an object of the class is created in the program, it is referred to as an instance of the class: it is created with the properties defined by its class.
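Here is a minimal Python sketch of encapsulation. The `BankAccount` class and its methods are hypothetical, chosen only for illustration: the data (`_balance`) is grouped with the methods that read or modify it, and each call to the class creates a separate instance.

```python
class BankAccount:
    """Hypothetical class grouping an account's data with the routines that manipulate it."""

    def __init__(self, owner, balance=0.0):
        self.owner = owner
        # Leading underscore: a Python convention meaning "internal, don't touch directly".
        self._balance = balance

    @property
    def balance(self):
        """Read-only access to the balance (no setter is defined)."""
        return self._balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("withdrawal must be positive")
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount


# Each call to the class creates a new, independent instance.
alice = BankAccount("Alice", 100.0)
bob = BankAccount("Bob")
alice.deposit(50.0)
alice.withdraw(30.0)
print(alice.balance)  # 120.0
print(bob.balance)    # 0.0
```

Python does not strictly enforce privacy: the underscore is only a convention. However, because the `balance` property has no setter, writing `alice.balance = 0` raises an `AttributeError`, so changes go through `deposit` and `withdraw`, which validate the amount.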
--> Abstraction
It consists of hiding unnecessary details from the user. The user can then build their own, more complex logic without having to deal with the hidden, underlying complexity.
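As a sketch of abstraction, assume a hypothetical `Dataset` class whose only public entry point is `describe()`. How the statistics are computed (here with Python's standard `statistics` module) stays hidden, and the user writes their own logic on top of the returned summary.

```python
import statistics


class Dataset:
    """Hypothetical wrapper that hides how summary statistics are computed."""

    def __init__(self, values):
        self._values = [float(v) for v in values]

    def _spread(self):
        # Internal detail: the sample standard deviation needs at least two points.
        if len(self._values) < 2:
            return 0.0
        return statistics.stdev(self._values)

    def describe(self):
        """Public entry point: the only thing a user needs to know about."""
        return {
            "count": len(self._values),
            "mean": statistics.mean(self._values),
            "std": self._spread(),
            "min": min(self._values),
            "max": max(self._values),
        }


# The user builds their own logic on top of describe(), without caring
# how the mean or the standard deviation are computed.
summary = Dataset([2, 4, 4, 4, 5, 5, 7, 9]).describe()
if summary["mean"] > 4:
    print("above target:", summary["mean"])  # above target: 5.0
```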
--> Inheritance
Inheritance means that a class B can inherit from a class A. In other words, class B receives the attributes and methods of class A. The methods defined in class A can then be called on any instance of class B, without being written again. This saves a lot of time.
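A minimal Python sketch, where the hypothetical `Vehicle` class plays the role of class A and `Car` plays the role of class B. `Car` reuses the parent's initialisation through `super()` and gets `describe()` for free, while adding a method of its own.

```python
class Vehicle:  # plays the role of class A
    def __init__(self, brand, wheels):
        self.brand = brand
        self.wheels = wheels

    def describe(self):
        return f"{self.brand} with {self.wheels} wheels"


class Car(Vehicle):  # class B inherits from class A
    def __init__(self, brand):
        # Reuse the parent's initialisation instead of rewriting it.
        super().__init__(brand, wheels=4)

    def honk(self):  # behaviour specific to Car
        return "Beep!"


my_car = Car("Renault")
print(my_car.describe())            # inherited from Vehicle -> Renault with 4 wheels
print(my_car.honk())                # defined in Car -> Beep!
print(isinstance(my_car, Vehicle))  # True
```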
--> Polymorphism
It allows the developer to use the same method or attribute name in several ways, depending on their needs. For example, the same method can be used on different entities: a method with the same name will produce different effects depending on the context in which it is used.
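To illustrate, the hypothetical `Circle` and `Square` classes below both define an `area()` method, and the same call produces a different computation for each. They deliberately share no base class: Python only requires that each object provide the method being called (so-called duck typing), although a common abstract base class could be added.

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


# The same method name, area(), is called on every shape; each class
# decides what it actually computes.
shapes = [Circle(1.0), Square(2.0)]
areas = [shape.area() for shape in shapes]
print(areas)  # [3.141592653589793, 4.0]
```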
Procedural programming vs OOP
Before OOP, computer programs were written using procedural programming. A problem was solved through top-down analysis, breaking it into sub-problems until very simple actions were identified. The program was thus broken down into procedures that interact with each other to solve the problem.
While procedural programming is intuitive when learning to program, this method has a number of drawbacks in the long run. First, the smallest change in the structure of the program's data requires a change in every procedure that uses that data. In addition, developing a very large program in procedural form can be time-consuming and tedious.
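The first drawback can be sketched with hypothetical order records stored as plain tuples `(name, price, quantity)`. The problem is broken down into small procedures, but each one depends on the exact position of every field.

```python
# Procedural style: data is a plain tuple (name, price, quantity),
# and every procedure depends on its exact layout.
orders = [("pen", 1.5, 10), ("notebook", 3.0, 4)]


def order_total(order):
    return order[1] * order[2]  # relies on price at index 1, quantity at index 2


def cheapest_item(all_orders):
    return min(all_orders, key=lambda order: order[1])[0]  # same assumption


def grand_total(all_orders):
    return sum(order_total(order) for order in all_orders)


print(grand_total(orders))    # 27.0
print(cheapest_item(orders))  # pen
```

If a field is later added at the start of the tuple (say, an order ID), every index in `order_total` and `cheapest_item` has to be updated by hand.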
While OOP does not fundamentally allow you to do more than procedural programming, it does allow you to organise your code better. It also facilitates cooperative work and long-term maintenance.
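Compared with procedures that index into raw tuples, a class keeps the data layout in one place. Below is a minimal sketch using a hypothetical `Order` class written with Python's standard `dataclasses` module: code outside the class refers to fields by name, so it does not depend on their order.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order record: the data layout is defined only in this class."""
    name: str
    price: float
    quantity: int

    def total(self) -> float:
        return self.price * self.quantity


orders = [Order("pen", 1.5, 10), Order("notebook", 3.0, 4)]

# These two lines use field names, not positions, so reordering the
# fields of Order (or adding a new one) does not affect them.
grand_total = sum(order.total() for order in orders)
cheapest = min(orders, key=lambda order: order.price).name
print(grand_total)  # 27.0
print(cheapest)     # pen
```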
Why is it so useful for Data Science?
OOP has undoubtedly helped open up data science to a large number of people. Libraries let anyone use methods and functions written by others. Libraries contain modules, which themselves contain classes, and these classes define methods. This is how encapsulation shows up in practice.
A data scientist who imports a library gets direct access to all of its functions, some of them complex, without having to code their inner workings themselves.
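The same layering can be seen in Python's standard library, used here instead of a data science library so that the sketch runs without extra installs: the `collections` module contains the `Counter` class, whose `most_common()` method is already written and tested. The list of words is made up for the example.

```python
# The standard library's collections module contains the Counter class,
# whose methods someone else has already written and tested.
from collections import Counter

words = ["data", "python", "data", "model", "data", "python"]
counts = Counter(words)       # create an instance of the Counter class
print(counts.most_common(2))  # [('data', 3), ('python', 2)]
```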
Through abstraction, they can use predefined methods without needing to understand how they were built. Thanks to polymorphism, the same method can be applied to very different data and contexts.
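Both ideas already appear with Python's built-in types. In the sketch below (sample values are arbitrary), the same `count()` method works on a string, a list and a tuple, and the caller never sees how each type implements it.

```python
# The same method name, count(), works on very different built-in types.
# We never see how each type implements it: that detail is abstracted away.
text = "banana"
values = [1, 2, 1, 3, 1]
pairs = (("a", 1), ("b", 2), ("a", 1))

print(text.count("a"))        # 3
print(values.count(1))        # 3
print(pairs.count(("a", 1)))  # 2
```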
For example, the Seaborn library offers many different graphical representations of a dataset. The data scientist does not need to know the detailed code contained in the library's classes: only the methods, and the logic of how they work, matter for reaching their goals. They can thus produce the same kind of chart for two different datasets simply by applying the desired method to each.
This is why several libraries have been so successful and greatly simplify many tasks: NumPy for manipulating matrices and multidimensional arrays, Pandas for data analysis, or scikit-learn for implementing machine learning algorithms.
Finally, large technology companies develop their own libraries, classes and methods, with a very high level of technical expertise. Much like trade secrets, these tools give each company its own machine learning capabilities (facial recognition at Facebook, content recommendation at YouTube/Google, etc.). The goal is to gain a competitive advantage through the performance of these tools and the R&D effort behind them.
If you want to acquire the data skills that recruiters are looking for, take a look at the data courses that Jedha Bootcamp offers.