Career Profile
As a Senior ML Engineer with a robust background in Full Stack Development, I bring a versatile skill set to the realms of artificial intelligence and data engineering. My expertise encompasses AWS, Text Analysis, Retrieval-Augmented Generation (RAG), chatbot development, AI implementation, and comprehensive data engineering solutions.
What sets me apart is my ability to quickly adapt to new technologies, keeping me at the forefront of the ever-evolving tech landscape. This adaptability, combined with my full-stack experience, allows me to tackle machine learning projects holistically, ensuring seamless integration from backend algorithms to user-facing applications.
Experience
As a Senior Machine Learning Engineer at Merck KGaA, I drive innovation in AI and machine learning, focusing on Text Analytics, Large Language Models (LLMs), and AWS solutions. I play a pivotal role in my team, leveraging my expertise in AWS and Foundry technologies. With strong skills in Python and a flexible mindset for new technologies, I support the Data Science team by enhancing infrastructure and deployment processes. My role includes a variety of tasks, supported by key achievements that demonstrate my commitment to advancing our AI capabilities.
- Developing large language model (LLM) use cases to support business needs, including Retrieval-Augmented Generation (RAG) pipelines, chatbots, and question-answering systems.
- Experienced in LLM evaluation frameworks such as TruLens and RAGAS.
- Skilled in LLM tracing and observability using Langfuse.
- Experienced in developing model APIs for LLM tools that follow the OpenAI API specification.
- Experienced in architecting and deploying solutions in AWS using IaC and Azure DevOps.
- Building API services with FastAPI, Docker, Redis, and Celery.
- Proficiently managing the complete data collection, processing, and analysis lifecycle to address diverse business challenges.
- Implemented multiple NLP-based text analysis solutions for complex business problems, including Q&A systems, product prediction, and keyword extraction.
- Demonstrating the ability to translate complex business issues into practical technical prototypes for research and development.
- Adapting to a wide range of databases, such as MySQL, MongoDB, PostgreSQL, Redis, and BigQuery.
- Collaborating effectively with cross-functional teams to ensure successful project execution.
- Possessing extensive expertise in various data sources, including Dimensions, PitchBook, SocialGist, PubMed, Scopus, clinicaltrials.org, and others.
- Leveraging AI capabilities, such as ChatGPT, to enhance data-driven decision-making and provide valuable insights for solving complex problems.
As a vital member of the Data Analytics team, my responsibilities encompass industrializing data feeds, crafting pipelines into established data systems, and fostering seamless connectivity between external and internal data sources, all while facilitating the integration of cutting-edge technologies.
- Proficiently managing the complete life cycle of data collection, processing, and analysis to address a range of business challenges.
- Demonstrated expertise in designing and building customized frameworks to enhance data extraction and analysis.
- Proven ability to translate complex business problems into practical technical prototypes.
- Hands-on experience in implementing custom Named Entity Recognition (NER) methods to extract entities from biomedical text data.
- Adept at automation, using Python and browser automation through PySelenium to streamline various tasks.
- Proficient in developing and configuring Apache Airflow-based pipelines for efficient data workflows.
- Skilled in creating API services and front-end applications using Flask and Django.
- Solid understanding of deploying and managing services on Linux machines.
- Successfully deployed services such as Apache Airflow, Label Studio, Django, and Flask.
- Versatility in working with multiple databases, including MySQL, MongoDB, and PostgreSQL.
- Familiarity with Agile methodologies and ticket-based environments like Jira, Bitbucket, and ServiceNow.
- Extensive knowledge of data sources, including PubMed, Scopus, clinicaltrials.org, and more.
- Committed to ongoing exploration of methods and technologies to enhance and optimize solutions, both current and past.
As a Senior team member, I have led pivotal Legal Domain projects, optimizing data extraction and analysis through a user-friendly UI platform. I create dynamic Python code, use image processing tools like OpenCV and Pillow, and employ web browser automation for efficient data mining. I’ve developed API services and custom NER methods, enhancing productivity, while also leading, training, and managing multiple projects.
- Proficiently managed data extraction and analysis within the Legal Domain, delivering critical insights.
- Employed dynamic, template-based Python code, significantly reducing development effort.
- Established API services to offer data services to other applications and teams.
- Developed and deployed a user-friendly, UI-based data extraction platform, streamlining data retrieval from diverse websites.
- Utilized the OpenCV and Pillow libraries for image processing, supporting the frontend team in detecting and cropping human faces from profile pictures.
- Implemented web browser-based automation using Node.js and Selenium for efficient data mining.
- Designed a custom Named Entity Recognition (NER) method using spaCy to extract user-related information such as names, qualifications, and job titles.
- Increased production throughput from 4 new domains per day to 10 per day.
- Actively led and trained team members to stay updated with the latest technological advancements.
- Effectively managed multiple independent projects and collaborative team endeavors.
As a Data Engineer with extensive experience, my role encompasses:
- Profound expertise in e-commerce datasets, leveraging this knowledge for insightful data analysis.
- Development of multiple data parsing methods using a customized framework for efficient data extraction.
- Handling multilingual datasets and deep familiarity with leading e-commerce websites.
- Execution of automation tasks, including extensive data processing, automated data validation, and quality assurance.
- End-to-end client project management, from scheduling and monitoring to exporting and validating results.
- Implementation of comprehensive automated solutions for projects with complex multi-stage workflows.
- Effective use of technologies like Node.js and Selenium for data retrieval and browser automation.
- Collaborative work with multiple technology teams, contributing to platform improvements and rigorous testing.
- Conducting both manual and automated code auditing and data auditing as part of a platform migration from V1 to V2.
- Providing continuous training and guidance to executive and development teams to facilitate smooth transitions and keep up with evolving technologies.
As a Python Programmer, my expertise spans a wide array of challenging problem-solving areas:
- E-commerce
  - Familiar with e-commerce product data structures and management.
  - Developed over 200 Python scripts for precise data extraction in a high-SLA environment.
  - Leveraged distributed Hadoop architecture for efficient data extraction.
  - Managed and analyzed multilingual datasets effectively.
  - Executed end-to-end automation processes on diverse e-commerce datasets.
- Legal
  - Possess comprehensive knowledge of legal datasets, having worked on over 300 websites.
  - Managed automated data cleaning, validation, and matching processes.
  - Supported the implementation of disambiguation solutions to improve data accuracy.
- Life Science (Research Outputs)
  - Deep understanding of multiple data sources in bioinformatics.
  - Extracted data from diverse sources, including PubMed, clinical trials, Scopus, grants, and patents.
  - Implemented methods to extract information such as author details, affiliations, and products from text data.
  - Proficient in handling custom NLP models.
  - Deep experience with paid datasets like Dimensions, PitchBook, and SocialGist.
  - Familiarity with author networks and collaborations in the field.
- News Sources
  - Constructed a customized data mining platform employing a stack of Python libraries.
  - Developed a feature-rich front-end for configuring domains.
  - Supported data extraction from HTML, JSON, and RSS feeds.
  - Successfully extracted and parsed data from approximately 500 domains on a daily basis.
As a Python Programmer with a strong background in AI, AWS, and data engineering, I have contributed to a variety of individual and team projects, each presenting unique challenges and opportunities for innovation in pipeline development and data-driven solutions.
- Project 1: Research Insights Application
  - Developed an application to provide insights about authors, institutes, and their research.
  - Implemented an end-to-end solution for data extraction and processing from various data endpoints.
  - Technology stack: Python, Pandas, MySQL, MongoDB, and other Python libraries.
- Project 2: Biomedical Data Monitoring and Insights
  - Designed a centralized platform to monitor multiple biomedical data sources and extract insights.
  - Created pipelines to process data from various sources using a custom framework on top of Apache Airflow.
  - Developed a customized framework for news data extraction from diverse domains.
  - Implemented common functionality for team-wide use.
  - Technology stack: Python, Apache Airflow, Pandas, spaCy, and other relevant libraries.
- Project 3: COVID-19 Dashboard
  - Contributed to the development and maintenance of data pipelines for a detailed COVID-19 dashboard.
  - Technology stack: Python, Pandas, Airflow, MySQL, MongoDB.
- Project 4: News Data Extraction
  - Extracted news from various sources and facilitated daily data delivery to multiple teams.
  - Automated end-to-end pipelines to ensure uninterrupted data delivery.
  - Technology stack: Python, Pandas, cron jobs, MongoDB.
- Project 5: Customer Support Assistant
  - Developed an assistant chatbot leveraging the Retrieval-Augmented Generation (RAG) framework.
  - Implemented end-to-end chatbot development for enhanced customer interactions.
  - Technology stack: Python, Langfuse, LangChain, RAGAS, TruLens, Langdock, GPT.
- Project 6: Oligo Sequence Analysis
  - Contributed to the development of the Oligo Sequence Analysis application.
  - Handled the end-to-end DevOps role.
  - Technology stack: Python, CloudFormation, Lambda, AWS Batch, Palantir Foundry, Azure DevOps.
- Ad Hoc & Automation Projects
  - Supported various ad hoc and one-time projects across multiple teams.
  - Leveraged various technologies for data processing, visualization, and data mining.
  - Technology stack: Python, Pandas, cron jobs, MongoDB, MySQL, Matplotlib, Plotly, scispaCy, spaCy, and more.
- Legal Data Insights Platform
  - Assisted in building an infrastructure capable of handling large datasets and extracting data from numerous sources.
  - Technology stack: Python, Pandas, cron jobs, MongoDB, MySQL, spaCy, Flask, and Django.
- Ugam Solutions
  - Collaborated on extracting data from major e-commerce websites like Amazon, Sears, Jet, and Walmart.
  - Technology stack: Python, HBase, Hadoop, and other distributed computing technologies.
Personal Projects and Ventures
In addition to my professional work, I've pursued several personal projects that reflect my passion for innovation and technology. These projects have allowed me to explore my creative side and develop new skills while tackling interesting challenges. Some of them are described below.