The job of Lead Data Scientist

In this interview, discover the background of Guilhem, Lead Data Scientist at Host'n'Fly, his vision of the Data Scientist role, and his recruitment advice!

Hello Guilhem, what is your background?

I started with a maths preparatory course after a Bac S, then went to ENSAI, which offered a very data-oriented 3-year engineering course. The term "Data Science" barely existed at the time: when I joined the school in 2013, we talked about statistical engineers.

It wasn't the "data" itself that particularly attracted me, but rather decision support through mathematics. The sector started to evolve, and I then saw the professions and branches taking shape, in analysis or in model building for example.

My first job was at Nickel (formerly Compte Nickel, the simplified and universal bank account that can be opened at the tobacconist's, bought by BNP in 2017). I stayed there for 3 years, there were about 50 of us when I joined, including 2 in Data. Three years later there were about 400 of us, including 4-5 in Data.

My first mission was to build a Data environment. The aim was to set up a data warehouse in collaboration with IT in order to be as autonomous as possible, to start monitoring the activities of the various divisions, and then to move on to more complex projects. I quickly became familiar with the technology: the idea was to build an ever more robust infrastructure to dump the data from our historical system into Google Cloud Platform, which we used for Data and which offered a lot of scalability and potential at a lower cost (pay-per-use model).

Which application cases did you work on? 

Fraud was becoming an important issue.

My job was to provide a tool for prioritising alerts related to fraud, terrorist financing, and money laundering. A lot of analytical questions coming from the back office needed to be understood: which elements suggest that a transaction is fraudulent? Which suspicious transaction should be handled first?
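
The prioritisation idea described here can be sketched as a simple scoring-and-ranking step. The feature names and weights below are purely illustrative assumptions, not Nickel's actual model:

```python
# Hypothetical sketch: rank back-office alerts by a simple fraud-risk score.
# Feature names and weights are invented for illustration.

def fraud_score(alert: dict) -> float:
    """Combine a few hypothetical signals into a single risk score."""
    score = 0.0
    score += 2.0 * alert.get("amount_zscore", 0.0)   # unusually large amount
    score += 1.5 * alert.get("new_beneficiary", 0)   # first transfer to this account
    score += 1.0 * alert.get("night_transaction", 0) # outside usual hours
    score += 3.0 * alert.get("watchlist_match", 0)   # AML / sanctions list hit
    return score

def prioritise(alerts):
    """Return alerts sorted so the riskiest are handled first."""
    return sorted(alerts, key=fraud_score, reverse=True)

alerts = [
    {"id": 1, "amount_zscore": 0.2, "new_beneficiary": 0},
    {"id": 2, "amount_zscore": 3.1, "new_beneficiary": 1, "watchlist_match": 1},
    {"id": 3, "amount_zscore": 1.0, "night_transaction": 1},
]
queue = prioritise(alerts)  # alert 2 comes first
```

In practice the score would come from a trained model rather than hand-set weights, but the output contract is the same: an ordered queue for the back-office teams.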

The greatest difficulty was that the final deliverable (on which we iterated a lot) had to take into account several constraints: the choices had to be understandable by the teams and the supervisory authorities, and we had to show that the gains in terms of processing could be very significant.

We also had a churn subject (knowing whether a customer will stay with us from one year to the next): being able to estimate the probability that a given customer will no longer use our product the following year is an exciting subject, and one to which Data had a lot to contribute, because the business stakes were very high.
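
Churn estimation of this kind is typically framed as a binary classification returning a probability. A minimal sketch, assuming a logistic model with invented coefficients and features:

```python
import math

# Minimal churn sketch (hypothetical weights): estimate the probability
# that a customer stops using the product next year from usage features.

def churn_probability(monthly_logins: float, months_since_last_tx: float) -> float:
    """Logistic model with illustrative, hand-picked coefficients."""
    z = -0.5 - 0.3 * monthly_logins + 0.8 * months_since_last_tx
    return 1.0 / (1.0 + math.exp(-z))

active = churn_probability(monthly_logins=20, months_since_last_tx=0)   # low risk
dormant = churn_probability(monthly_logins=0, months_since_last_tx=6)   # high risk
```

A real model would learn these coefficients from historical customer data; the point is only that the output is a per-customer probability the business can act on.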

On these large subjects, several Machine Learning algorithms were used.

When I left, we were beginning to be quite mature in Data: we had passed the "simple" stage of reporting / using predictive models in batch mode, and we had begun to set up real-time Data microservices for IT, which could query them to get an answer to certain specific problems. 

The fact that I was autonomous on these issues and had a free hand brought me a lot. On top of that, I had many BI missions (reporting for different departments, ad hoc analyses), which often required an exercise in popularisation and communication. It was very formative.

Then you came to Host'n'Fly (The Airbnb Concierge)!

A complete change of sector, from banking to tourism! 

Host'n'Fly allows you to manage the rental of your home from A to Z and generate income during your absence.

I chose this company because I was really looking for a position closer to production, in a company where Data was at the heart of the business, as opposed to one where Data is more of a "support" function (where the product can function very well without it).

In my opinion, a very important condition when joining a company is to recognise yourself in the product and to use it: this was the case for Host'n'Fly.  

My main mission is to seek to optimise our revenues by pricing our flats, while guaranteeing the satisfaction of our customers (the people who rent out their flat on Airbnb). This is the main problem in Data: finding the right price to generate maximum revenue and to ensure that our customers are satisfied and continue to entrust us with their accommodation afterwards.
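
The core trade-off described here can be illustrated with a toy model: expected revenue per night is price times booking probability, and the probability of a booking falls as the price rises. The linear demand curve below is invented purely for illustration:

```python
# Toy sketch of the pricing trade-off. The demand curve is hypothetical.

def booking_probability(price: float) -> float:
    """Invented linear demand curve: no bookings at all above 200."""
    return max(0.0, 1.0 - price / 200.0)

def expected_revenue(price: float) -> float:
    """Expected revenue per night = price x probability of a booking."""
    return price * booking_probability(price)

# Simple grid search for the revenue-maximising nightly price.
best_price = max(range(10, 200, 10), key=expected_revenue)  # peaks at 100
```

A real pricing engine would estimate the demand curve from market data per zone and season, but the optimisation logic stays the same: the best price is neither the highest nor the one that guarantees a booking.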

What are the missions you are working on? 

There are a lot of projects going on at the moment that involve A/B testing. We are trying to understand how our travellers react to our different pricing strategies, because with the health crisis, we have decided to go back (at least temporarily) to "simpler", short-term strategies for our predictions (the long term being too uncertain). We invite you to read our article on covid-19-proof AI to understand everything. We analyse the impact of our short-term features a posteriori via our A/B test campaigns and make decisions.
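
An a posteriori A/B comparison like this often comes down to a two-proportion z-test on conversion rates. A self-contained sketch, with invented traffic numbers:

```python
import math

# Hypothetical sketch: compare booking conversion between two pricing
# strategies with a two-proportion z-test. All numbers are invented.

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for H0: both strategies convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Strategy A: 120 bookings out of 2000 views; strategy B: 165 out of 2000.
z = two_proportion_z(120, 2000, 165, 2000)
significant = abs(z) > 1.96   # 5% two-sided threshold
```

Libraries such as statsmodels provide this test ready-made; the hand-rolled version just makes the arithmetic behind the decision visible.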

In a more "normal" context, there is always the challenge of opening up new cities, new markets. If historically we were very focused on Paris, a major decision was to choose to develop everywhere in France (and even in Europe recently), especially in the so-called "leisure" areas (seaside resorts and ski resorts).

This is a sector that will grow for us, given that we are starting from scratch to build all the algorithms for it. This new sector is interesting but more complex to deal with from a data point of view: on the one hand there is less data, and on the other hand, prices are intrinsically more volatile than in a classic urban market like Paris (in high season, for example, prices can easily triple). It's a big challenge!

The questions that Data needs to answer are: if I want to offer Host'n'Fly services in Biarritz, how much income do I think I can get from a given flat? What income can owners be guaranteed? Once they are clients, at what prices do we list their flats on the platforms for travellers? How do we want these prices to vary over time?

We do a lot of behavioural analysis, particularly in relation to the crisis. We notice, for example, that people don't book very far in advance now. We also have to adapt to the data.

What is the lifecycle of a machine learning project in your company?

The stages are fairly standard: definition of business needs and translation into Data problems (what output do we want to have), data collection, exploratory analyses, cleaning, feature engineering, training / fine-tuning of the model, performance evaluation, production launch, monitoring. 
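
The stages listed above can be sketched as a minimal sequential pipeline. Each stage below is a function taking and returning a shared context dict; the stage bodies are placeholders with toy data, not Host'n'Fly's actual code:

```python
# Sketch of the lifecycle described above as a linear pipeline of stages.
# Stage names mirror the interview; bodies are illustrative placeholders.

def define_problem(ctx):
    ctx["target"] = "nightly_price"          # the output we want from the model
    return ctx

def collect_data(ctx):
    ctx["raw"] = [{"surface": 30, "price": 80}, {"surface": 90, "price": 210}]
    return ctx

def clean_and_engineer(ctx):
    ctx["features"] = [(row["surface"],) for row in ctx["raw"]]
    ctx["labels"] = [row["price"] for row in ctx["raw"]]
    return ctx

def train(ctx):
    # Placeholder "model": average price per square metre over the data.
    ratios = [p / f[0] for f, p in zip(ctx["features"], ctx["labels"])]
    ctx["model"] = sum(ratios) / len(ratios)
    return ctx

def evaluate(ctx):
    preds = [ctx["model"] * f[0] for f in ctx["features"]]
    ctx["mae"] = sum(abs(p - y) for p, y in zip(preds, ctx["labels"])) / len(preds)
    return ctx

PIPELINE = [define_problem, collect_data, clean_and_engineer, train, evaluate]

ctx = {}
for stage in PIPELINE:
    ctx = stage(ctx)
```

Deployment and monitoring would add further stages after evaluation, but the shape stays the same: an ordered list of steps that each enrich a shared context.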

Data at Host'n'Fly works in collaboration with the various teams, but is autonomous throughout the lifecycle. As far as deliverables are concerned, we work a lot in an iterative way: we quickly try to reach an initial result to establish a baseline, and then improve on it.

Taking up the issue of revenue estimation in new markets, we are currently rethinking our approach to make it more scalable. Given the large number of future zones to be launched, we want to change our historical approach from a "one algo per city" approach to a "generic algo" that is self-adapting for each zone.

The first steps (data collection, exploratory analysis to understand the data, cleaning) are really key because we have less data available and it is very heterogeneous.

We also pay particular attention to model validation: the most important thing for us, beyond the pure performance of the algo on classic data metrics, is that our output makes sense from a business point of view (e.g. it would make no sense for a studio to end up more expensive than a villa). Also, we will rarely try to optimise the algo's performance at all costs if the gain does not seem worthwhile (gain/time ratio). The objective is to arrive at a satisfactory version quickly, and possibly to iterate afterwards.
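
This kind of business sanity check can be encoded as an explicit validation rule alongside the usual metrics. A minimal sketch, assuming property types ordered from smallest to largest with invented prices:

```python
# Hypothetical sketch of the business sanity check described above:
# predicted prices should be monotonic in property size, so a studio
# never comes out more expensive than a villa.

def passes_monotonic_check(predictions: dict) -> bool:
    """predictions maps property type to predicted nightly price,
    ordered from smallest to largest property (Python 3.7+ dicts
    preserve insertion order)."""
    prices = list(predictions.values())
    return all(a <= b for a, b in zip(prices, prices[1:]))

preds = {"studio": 70.0, "two-bedroom": 110.0, "villa": 260.0}
ok = passes_monotonic_check(preds)  # True: prices rise with size
```

Running checks like this in the validation stage catches outputs that score well on error metrics yet would be indefensible in front of the business.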

In short, we are quite involved in business development because of the need to produce estimates.

Are you in charge of production at Host'n'Fly? 

With three employees in Data, we are autonomous, with a data "backend" that we manage ourselves and to which we have access. It's a mix of the three jobs of Data Analyst, Data Scientist, and Data Engineer.

We have our own models running in production, which allow us to generate revenue estimates. The IT backend then takes care of retrieving these prices and pushing them onto our platforms. 

Our daily focus is to ensure that for each flat that is entrusted to us for future dates, we manage to predict each of its nights and make them available to the IT backend.

And if we take a step back, our daily driver is to find ways to optimise the income from our flats, to improve our customer satisfaction and therefore the sustainability of Host'n'Fly.

When you left school, did you expect to be confronted with production issues?

Taking into account the need and translating it into a data problem is important and must benefit the company. That's why I love doing this job. However, I have seen that the organisation in other companies can be quite different, where a single team is in charge of doing only the models for example. 

Being able to put your code and models into production is very rewarding and satisfying for a data scientist.

A Data Scientist who is not going to touch production at all or who is not going to be interested in the whole aspect of production, scalability of operations, will be less interesting for companies in my opinion, at least in smaller structures such as start-ups / SMEs / datalabs. 

At Jedha? Learn how to put your Machine Learning algorithms into production with our Data Science program then add comprehensive Data Engineering & DevOps skills with our training to become a Data Engineer.

What does the Data Scientist of tomorrow look like? 

In my opinion, the Data Scientist of tomorrow has many skills and tools at their disposal. The challenge is to determine which ones they will use to solve the business problem.

The tech side is taking a big turn in the skills that are in demand. 

I find that there are fewer and fewer companies where the sector is segmented, there are more and more Data Labs where the profiles sought are multi-taskers. 

There has also been a lot of change in the cloud, 90% of the players are asking for AWS or GCP skills (at least this is a big plus in job offers). 

What would be your advice on how to analyse a job description?

The longer the job description, the more doubtful I will be. 

When the skill set lists Excel and "some Python", I know I'll never actually do Python. The trap job description for a data scientist is one where Python / R are listed as "optional" languages. Since these are the most-used languages, such a position is unlikely to offer many real Data Science opportunities, unless everything is still to be built on the spot and the autonomy is there.

Then, it depends on the individual's appetite. If the main mission of the data scientist is not specified (e.g. a mission to improve a particular algorithm or product), this often means that it is a "support" position. The topics may be just as exciting, but you need to dig deep and find out about potential future topics. 

Personally, I have always focused my search on the contribution that data makes to the company's product, and on whether the internal use of the cloud matters. I also look at whether the technical environment is specified: if it's AWS or GCP, that's a positive for me. It's also very reassuring to see the tools used internally, for example Python, Git, and SQL, which are the most-used tools for this kind of job at the moment.

Written by Myriam Emilion, Marketing Director
