The Data Scientist has experience in machine learning and deep learning techniques applied to structured and unstructured data (e.g., textual data).
The candidate will research and develop new statistical and machine-learning methods for the analysis of medical reports, applying Natural Language Processing (NLP) techniques to real-world healthcare data.
The candidate will contribute to developing new technologies for data synthesis and digital twins using a wide variety of machine learning and deep learning methods, investigating research topics in machine learning and statistics to identify the best methods for medical data synthesis and effective approaches for validating the generated data.
The AI Center of Humanitas focuses on research in Artificial Intelligence applied to healthcare. Research and development areas include predictive and decision-support systems based on data-driven models (ML/DL models) that optimize clinical processes and ultimately improve the quality of patient care. We are a team of multidisciplinary scientists who work daily on e-health and AI projects, collaborating with clinical staff (doctors, nurses, researchers) and management staff.
Responsibilities and Main Activities
- Collaborate on the research and development of innovative generative data models for effective synthetic data generation and digital twins in healthcare;
- Develop statistical, machine learning and deep learning models on medical data, including time-series/longitudinal data;
- Explore, define and support the clinical validation of the statistical and machine learning models applied to real-world data;
- Perform exploratory data analysis and integrate highly fragmented data;
- Explore the potential applications of Large Language Models (LLMs) in healthcare;
- Visualize data, report results effectively and derive useful knowledge using a data-driven approach;
- Collaborate with international partners in both private industry and academia.
Skills and Qualifications
- Experience with textual data and NLP techniques and models (e.g., named entity recognition, document clustering, summarization, text classification, masked language models (MLMs), LLMs);
- Experience in developing and applying machine learning and deep learning techniques and algorithms (such as k-NN, Naive Bayes, Support Vector Machines and Random Forests) in healthcare, including on time-series/longitudinal data;
- Good knowledge of generative AI and LLMs;
- Experience in developing generative models (e.g., statistical models, GANs, VAEs) applied to medical data for synthetic data generation and digital twins;
- Good knowledge of Computer Vision and/or NLP is appreciated;
- Understanding of data structures, data modeling and software architecture;
- Experience in applied statistics (e.g., distributions, statistical testing, regression);
- Good scripting and programming skills;
- Good proficiency in the Python and R programming languages;
- Experience with data science frameworks (e.g., TensorFlow, PyTorch, scikit-learn, SciPy, pandas, NumPy) and visualization frameworks (e.g., Plotly, seaborn, Matplotlib);
- Experience with cloud platforms (GCP, AWS, Azure) and/or distributed computing is appreciated;
- Knowledge of database systems and data lakes (good knowledge of SQL is appreciated);
- Knowledge of MLOps practices, IT infrastructure, and back-end/front-end development is appreciated;
- Master's degree in a STEM discipline (a PhD is a plus);
- Fluency in written and spoken English and Italian.
Soft Skills
- Excellent teamwork skills, including with colleagues from different research areas and backgrounds;
- Strong self-motivation, commitment and proactive approach;
- Ability to meet deadlines and work autonomously in rapidly changing environments;
- Curiosity and the ability to step outside your comfort zone.
Why you should consider this opportunity
Humanitas Research Hospital is investing in data-driven research and in the development of AI-based clinical support tools. You will contribute to the design and application of breakthrough technologies to be deployed in advanced clinical institutions.
You will have access to an extraordinary group of talented and passionate people from fields ranging from clinical sciences and healthcare management to informatics, bioinformatics and systems biology, as well as to our extended international research network.
All candidate data collected from the application shall be processed in accordance with applicable law: Legislative Decree (Dlgs) 198/2006 and Legislative Decrees (Dlgs) 215/2003 and 216/2003; privacy notice pursuant to Articles 13 and 14 of EU Regulation 2016/679 (GDPR).