We are seeking a Data Scientist to support infrastructure for the homogenization and integration of omics-related data. Data come in different formats, and different programs typically have to be carefully combined into workflows where the output format of one method must be converted to the input format of downstream methods. Furthermore, biomedical research is characterized by a high degree of heterogeneity. Data formats and standards exist in a chaotic tapestry, reflecting the diverse nature of experimental platforms and analytical tools. This heterogeneity necessitates laborious manual processing, which is both time-consuming and prone to errors. There are “workflow wrappers” for some tasks, but these are mainly suitable for sufficiently standardized tools using standardized data formats. Even with these wrappers, most of a bioinformatician’s time in a project goes into manually gluing different tools to the data at hand. This position is about tackling these challenges.
The Data Scientist will be located in the Gorodkin lab (http://ivh.ku.dk/bioinformatics), Center for non-coding RNA in Technology and Health (RTH) (http://rth.dk), at the Department of Veterinary and Animal Sciences (IVH) (http://ivh.ku.dk/english), Faculty of Health and Medical Sciences, University of Copenhagen, and will over time interact closely with a range of national bioinformatics research groups and in the context of ELIXIR, both the national node (https://elixir-denmark.org/) and the international node (https://elixir-europe.org/). The Data Scientist will be part of the BioGLUE infrastructure project, funded by the Novo Nordisk Foundation.
Start date is May 1st, 2025 or as soon as possible thereafter. The position is time-limited to one year, but with the possibility of making it permanent employment thereafter.
Our research and research group
The bioinformatics group has a strong profile in data analysis, algorithm development and building bioinformatic tools, many of which involve computational RNA biology and are also relevant for this position. We have excellent computational infrastructure and access to supercomputing when needed. Our research environment is highly dynamic, international and stimulating, with a wide range of activities including seminars, workshops, summer schools and retreats.
Your role and key responsibilities
In this job, you will address the "gluing" of diverse and heterogeneous datasets, both from a general perspective and in specific use cases matching parts of the Danish national bioinformatics environment. A key focus will be to leverage Large Language Models (LLMs) to facilitate this "gluing", enabling intelligent and scalable integration workflows. The specific use cases comprise data in areas concerning RNA structure, CRISPR, mass spectrometry, microbiome, genomics, transcriptomics and personalized medicine. The role will also involve server and software maintenance tasks, including ensuring a seamless computational infrastructure.
Your key responsibilities will be to
- Utilize LLMs to enable homogenization of heterogeneous datasets for seamless integration at a general level.
- Work on and contribute to the use cases and design relevant workflows for them.
- Ensure that advancements in LLM applications enter the infrastructure when relevant.
- Document implementations, pipelines, methods and infrastructure configurations.
- Collaborate with the ELIXIR-related bioinformaticians and their research groups to evaluate the integrated data resources and their applicability.
- Manage, maintain, and improve data storage solutions and computational infrastructure for bioinformatics workflows. This includes contributing to building server capacity, with connections to external supercomputing when needed.
- Provide access to the resources built within the infrastructure.
- Incorporate user experience into the infrastructure.
- Provide training events both online and in person.
You and your qualifications
If you are enthusiastic about the described role, highly dedicated and possess strong skills, you may well be the right candidate to help us obtain successful results. You will work in a multidisciplinary bioinformatics environment and collaborate with other researchers within the group and with the collaborating groups in the Danish ELIXIR node and the European ELIXIR hub.
Essential experience and skills:
- You have completed a Master's degree in bioinformatics, computer science or a similar area
- You are highly experienced in Python and R
- You have strong experience with the Linux/Unix environment, the command line and shell scripting
- You have experience with running local LLMs, ideally through frameworks such as LangChain or LlamaIndex
- You have experience with machine learning
- You are familiar with omics data analysis
- You are familiar with data standards and API systems
- You have strong communication skills
- You have excellent written and spoken English skills
- You are empathetic and a strong, supportive team player who can navigate a multidisciplinary context
Key selection criteria:
- A PhD degree in bioinformatics, computer science or in a similar area
- Professional qualifications relevant for the position
- Relevant work experience
- Publications
- Language skills
- Creativity
What we offer
- Great opportunities to grow professionally.
- Being at the forefront of tomorrow's data tools for seamless data integration in a range of rapidly growing bioinformatics areas.
- Unique network possibilities within the Danish and European bioinformatics communities.
- Opportunity to work in a highly international network (ELIXIR).
Place of employment
The place of employment is at the Department of Veterinary and Animal Sciences, University of Copenhagen.
Grønnegårdsvej 15, st.
2000 Frederiksberg
Terms of employment
The average working week is 37 hours.
The starting date is May 1st, 2025 or as soon as possible thereafter.
The position is time-limited to one year, with the possibility of permanent employment.
Employment will be as Data Scientist (research consultant). Salary, pension and other conditions of employment are set in accordance with the Agreement between the Ministry of Taxation and AC (Danish Confederation of Professional Associations) or another relevant organisation. Currently, the monthly salary starts at 46,145 DKK (approx. 6,186 EUR) (April 2024 level). Depending on qualifications, a supplement may be negotiated. The employer will pay an additional 17.1% to your pension fund.
Questions
For further information about the scientific content of the position, please contact Professor Jan Gorodkin, email [email protected], phone +45 23375667. For questions about the application procedure and formalities, please contact the HR Officer, email [email protected]; www.sund.ku.dk.
Foreign applicants may find this link useful: www.ism.ku.dk (International Staff Mobility).
Application procedure
Your online application must be submitted in English by clicking ‘Apply now’ below. Furthermore, your application must include the following documents/attachments, all in PDF format:
- Motivation letter of application. In this letter you must briefly detail, one by one, how you meet the requirements for each item listed under “Essential experience and skills” and “Desirable experience and skills”.
- CV incl. education, work/research experience, language skills and other skills relevant for the position.
- A certified/signed copy of the Master's certificate including transcripts and, if obtained, the PhD certificate.
- List of publications.
- Personal recommendations. If none, please explain why it has not been possible to obtain them.
Deadline for applications: February 11th, 2025, 23:59 CET
We reserve the right not to consider material received after the deadline, and not to consider applications that do not meet the above-mentioned requirements.
The University of Copenhagen wishes to reflect the diversity of society and encourages all qualified candidates to apply regardless of personal background.