Hi, I am Muhammad Ghalib Mirza, a Data Scientist & Machine Learning Engineer, Advisor & Mentor

Yeah! That's Me

Professional Me

I am an Automation and Robotics graduate with a major in Cognitive Sciences from the Technical University Dortmund, Germany, and I intend to pursue a career in data and machine learning applications. I also have a Bachelor's degree in Information Technology, majoring in Software Development. I have been involved in a few Software Development, Data Science, and Machine Learning projects. I am interested in exploring the Cloud Engineering and MLOps domains, and I am open to opportunities in both.

What am I up to?

Past Experiences

Lisios GmbH

Data Scientist and Machine Learning Developer (Apr 2022 - Aug 2023)

Lisios GmbH is a young startup located in the vibrant city of Cologne. Its vision is to make an impact not only in the real estate domain but in technology as well, by developing a hardware IoT device that detects water leakage in pipelines by analyzing telemetric data with intelligent algorithmic techniques. The first phase of development targets owners of single-family and two-family houses.

My Achievements:

  • Developed a tree-based ML Algorithm for Water Leakage Detection, achieving 85% event detection accuracy, with ∼ 86ms prediction time, resulting in near real-time predictions.
  • Optimized the algorithm for a 2MB ARM microcontroller, reducing its size from 2.5MB to ∼ 0.55MB and saving the cost of a higher-memory microcontroller.
  • Automated a data processing pipeline using Python scripts, reducing overall processing time by 35%.
  • Created and managed Grafana dashboards for analytics, helping the Team Lead in tracking the device and data performance, resulting in a 10% overall process boost.
  • Managed InfluxDB (NoSQL) in Docker for efficient storage of time series data.
  • Integrated Azure IoT Hub and MQTT as the IoT broker for seamless data communication.

                 

ADITION Technologies AG

Research Student (Mar 2019 - Feb 2020)

Click Fraud Detection Using Machine Learning: this project aimed to detect click-fraud events in the unlabeled click/view log data from the ad server using machine learning.
The objective was reached in four stages:

  • Feature Engineering.
  • Big data analysis / exploratory analysis.
  • Manual labeling of examples using custom-defined rules.
  • Machine learning Application.
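The rule-based labeling stage can be pictured with a minimal sketch. The features (`clicks_per_minute`, `seconds_on_page`) and the thresholds below are hypothetical placeholders chosen for illustration; the actual rules used in the project are not described here.

```python
from dataclasses import dataclass
from typing import Optional

FRAUD, LEGIT = 1, 0


@dataclass
class ClickFeatures:
    # Both features are hypothetical examples of engineered features.
    clicks_per_minute: float
    seconds_on_page: float


def label_by_rules(f: ClickFeatures) -> Optional[int]:
    """Return FRAUD, LEGIT, or None when no rule fires (example stays unlabeled)."""
    # Hypothetical rule: a very high click rate is unlikely to be human.
    if f.clicks_per_minute > 30:
        return FRAUD
    # Hypothetical rule: slow, engaged browsing looks like a normal user.
    if f.clicks_per_minute <= 2 and f.seconds_on_page >= 10:
        return LEGIT
    # No rule applies: leave the example for the semi-supervised stage.
    return None


examples = [
    ClickFeatures(clicks_per_minute=45.0, seconds_on_page=0.5),
    ClickFeatures(clicks_per_minute=1.0, seconds_on_page=42.0),
    ClickFeatures(clicks_per_minute=8.0, seconds_on_page=3.0),
]
labels = [label_by_rules(f) for f in examples]
print(labels)
```

Examples for which no rule fires keep the label `None`, which is what leaves a pool of unlabeled data for the later machine learning stage.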

My Achievements:

  • Implemented supervised and semi-supervised ML approaches with 89% fraud detection accuracy and ∼ 167ms prediction time, enabling real-time predictions and reducing costs for customers by about 7%.
  • Ran ETL on a Hadoop server using Spark to access ad-campaign data, reducing querying time by about 10%.
  • Labeled 20% of the data using custom-defined rules and 60% using a semi-supervised self-training algorithm, out of a total of 2+ million data points, reducing labeling time by about 40% with an accuracy of 70%.

This research uses supervised and semi-supervised machine learning methods: supervised learning on the available labeled data, and semi-supervised learning to use both the labeled data and the much larger remaining unlabeled data. In both methods, the algorithm is trained with and without the features on which the rules are based and the features that are computationally expensive to calculate at run time.
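The self-training idea can be sketched roughly as follows. This is an illustration under assumptions, not the thesis implementation: the base learner here is a simple nearest-centroid classifier standing in for whatever model was actually used, and the confidence threshold, feature vectors, and function names are all made up for the example.

```python
import math


def centroids(X, y):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_with_confidence(cents, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(x, m)) for c, m in cents.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit the model."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        still_unlabeled = []
        for x in pool:
            label, conf = predict_with_confidence(cents, x)
            if conf >= threshold:
                # Confident prediction: adopt it as a pseudo-label.
                X_lab.append(x)
                y_lab.append(label)
            else:
                still_unlabeled.append(x)
        if len(still_unlabeled) == len(pool):
            break  # nothing new was labeled in this round
        pool = still_unlabeled
    return centroids(X_lab, y_lab), pool


# Toy data: two rule-labeled seeds and three unlabeled points.
model, leftover = self_train(
    X_lab=[[0.0, 0.0], [10.0, 10.0]],
    y_lab=[0, 1],
    X_unlab=[[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]],
)
print(model, leftover)
```

In this toy run the two points near the seeds are pseudo-labeled in the first round, while the point halfway between the classes never reaches the confidence threshold and stays unlabeled.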

               

GE Healthcare IT

Software Engineering Intern (July 2018 - Dec 2018)

Centricity Cardiovascular Workflow (CCW) is a medical application that offers a comprehensive collection of tools for data and information management in the cardiovascular department. It tracks and manages inventory, creates structured clinical reports, and runs administrative and clinical queries. It features ECG, supports decision making with proven scientific algorithms and measurements validated against thousands of records in multiple peer-reviewed studies, and provides editing tools developed with leading cardiology hospitals.

My Achievements:

  • Developed tools for dependency analysis in CCW.
  • Monitored and analyzed the performance of the cardio workflow system.

Last but not least, I also studied how Machine Learning could be integrated into the system to improve decision making and provide decision support to doctors.

   

RIF Institut für Forschung und Transfer e.V.

Student Assistant (Jan 2018 - June 2018)

RIF implements the latest research findings in simulation and virtual reality technology directly into products. Findings from microstructure technology, material technology and testing make it possible to improve products and make them sustainable. Innovative tools from quality management, ergonomics and logistics as well as automation solutions help companies to increase their productivity. The holistic approach of the institute is rounded off by projects in industrial marketing as well as innovative controlling concepts and modern methods of personnel development.
At RIF, I worked in the Robotics team and helped with:

  • CAD Modeling of Automation Systems for Simulation System (VEROSIM)
  • Literature Research for Scientific Projects.

 

Featured Projects

Water Leakage Detection

Analysis of sensor data to detect leakage in water pipelines incorporating Machine Learning

         

This project investigates telemetric sensor time-series data from an IoT device. The aim is to predict events in real time based on the temperature sensor data collected from the device. Based on the collected data, the algorithm can predict four different types of events and achieves ~85% event detection accuracy in the real environment. The Machine Learning algorithm is trained on both lab-augmented data and real-environment data.




Thesis Project

Click Fraud detection using Machine Learning

               

Click fraud is the process of deliberately making illegitimate clicks to generate revenue. Human click farms and non-human botnets are hired or developed by click-fraud specialists to maximize the revenue of specific users and drain advertisers' budgets using the ads published on their websites, or to launch an attack between competing businesses. Click fraud costs advertising companies billions of dollars in lost advertising budget every year. Yet despite efforts to reduce this budget waste, click fraud is still set to rise over the coming years. Digital marketing involves a massive amount of data from advertisers, publishers, and users in the form of logs. Modern data exploration, in combination with machine learning methods, enables us to analyze big data to identify these frauds efficiently.
The novelty of this thesis is, first, to label the data manually using custom-defined rules based on the extracted features. The extracted features are domain-dependent and reflect normal human behavior. The data that is still unlabeled after applying the labeling rules is then used together with the labeled data in a self-training semi-supervised algorithm, and the results are compared with the traditional supervised approach. To answer the question of whether the algorithm learns from weakly descriptive features, both experiments are conducted in the presence and absence of the features that are either important to the algorithm or computationally expensive to calculate on a real-time system. The thesis also answers a second question: can semi-supervised techniques be used effectively in click-fraud detection?

University Project

Analysis of Hyperspectral Image processing using Machine Learning

       

This project investigates the use of machine learning techniques for the analysis of hyperspectral image data, which yields information about the chemical composition of the imaged material. An adequate number of training examples was collected from the sample hyperspectral image using spectral unmixing, which attempts to model an observed spectrum by reconstructing it as a mixture of reference material spectra, the so-called endmembers. These endmember spectra are usually acquired in the laboratory. Spectral unmixing can be divided into two subproblems: (1) determination of the endmembers whose spectra are contained in the observed spectrum and (2) computation of the relative fractions of those endmembers. Data augmentation techniques were employed to enlarge the data set for better generalization. Various classifiers and learning architectures, including a support vector machine (SVM) classifier and a deep neural network (DNN), were trained to predict the endmembers present in the hyperspectral image. The task was formulated as a classification problem, and the performance of the models was compared. The trained networks were tested on unseen data, and a statistical analysis of their ability to predict the material present in the image is presented.
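Subproblem (2) of spectral unmixing can be illustrated with a small sketch. It assumes a linear mixing model with only two endmembers, where the observed spectrum is `a * e1 + (1 - a) * e2` with `0 <= a <= 1`; in that special case the least-squares fraction has a closed form. The spectra below are made-up numbers, and the function name is hypothetical; the project itself may have used a different unmixing method and more endmembers.

```python
def unmix_two(observed, e1, e2):
    """Least-squares fractions (a, 1 - a) of e1 and e2 in the observed spectrum.

    Assumes all three spectra have the same number of bands.
    """
    diff = [p - q for p, q in zip(e1, e2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmember spectra must differ")
    # Project (observed - e2) onto the direction (e1 - e2).
    a = sum((x - q) * d for x, q, d in zip(observed, e2, diff)) / denom
    # Enforce the physical constraint that fractions lie in [0, 1].
    a = min(1.0, max(0.0, a))
    return a, 1.0 - a


# Made-up four-band endmember spectra.
e1 = [0.2, 0.4, 0.8, 0.6]
e2 = [0.6, 0.4, 0.2, 0.2]
# Synthetic observation: 30% of e1 mixed with 70% of e2.
observed = [0.3 * p + 0.7 * q for p, q in zip(e1, e2)]

a, b = unmix_two(observed, e1, e2)
print(round(a, 3), round(b, 3))
```

With more than two endmembers the same idea becomes a constrained least-squares problem (non-negative fractions that sum to one), which is usually solved numerically rather than in closed form.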

Power BI sales Dashboard

Sales Dashboard - PowerBI

This dashboard is the result of the online course Introduction to Data Modeling for PowerBI, which I took from SQL BI. The course was 2 hours long and focused mainly on data modeling in different scenarios.

Education

Technische Universität Dortmund

Masters in Automation and Robotics (2015 - 2021)

The Master's degree program in Automation and Robotics provides the necessary fundamentals for a professional career in the information age fields of Automation, Robotics and Cognitive Science. This English language degree program is aligned with international benchmarks, offering very good teaching, well-equipped modern laboratories, and opportunities for application oriented research. Interdisciplinarity is built into its structure, with active participation by the faculties of Mechanical Engineering, Electrical Engineering and Information Technology, Computer Science, Biochemical and Chemical Engineering, and Mathematics as well as the Robotics Research Institute and the Fraunhofer Institute for Logistics. The challenging range of courses enables students to develop their strengths and thus form their own profile in one of three major fields:

  • Cognitive Science
  • Process Automation
  • Robotics


Government College University Faisalabad

Bachelor of Science in Information Technology (2010 - 2014)

The Bachelor's degree program in Information Technology is a highly diverse program designed for success in the IT domain. It is based on theory as well as practical training. It covers the fundamentals of Computer Science, Computer Networks, Software Development, and Databases, as well as the business aspects of IT. The wide range of courses enables students to develop skills in one of three major fields:

  • Software Engineering
  • Communication and Networks
  • Database Administration and Management


Affiliations

I have been associated with PSA Dortmund since 2017, initially as a volunteer until Feb 2019, when we held our first election and I became President, a role I held until Sep 2020. In the next election I stepped down to become an Advisor for the 2020/21 term. During my time with PSA Dortmund I started a campaign, which I still run, to mentor incoming students and help them with their difficulties in Dortmund.
Apart from that, I also became the President of DEGIS Dortmund to take care of the international student community in Dortmund. Click on the following links to know more.