Career Profile
Experienced Data Analytics Lead with a proven record of working with large datasets and advanced data tooling. Proficient with AWS, Azure, and Google BigQuery, I optimize Data Engineering and Analytics workflows for efficiency. Strong domain expertise and leadership skills allow me to guide teams effectively and translate data insights into strategic decisions. I am committed to continuous learning, staying current with the industry, and consistently delivering strong outcomes. My enthusiasm for data analytics drives growth and innovation within organizations, making me a reliable catalyst for data-driven success.
Experience
As a Data Analytics Lead, I spearhead the entire data life cycle, from collection to analysis, addressing a spectrum of business challenges. My role involves crafting custom data frameworks, pioneering NLP solutions for complex problems, and leveraging AI capabilities like ChatGPT for enhanced decision-making. I excel in deploying cutting-edge technologies, creating API services, and managing services on Linux machines while collaborating seamlessly with diverse teams to drive successful project execution.
- Building custom OpenAI-based RAG pipelines, chatbots, and question-answering systems (a minimal sketch follows this list).
- Experience with TruLens and Langfuse for LLM evaluation and tracing.
- Proficiently orchestrating the entire Data Collection, Processing, and Analysis life cycle to tackle diverse business challenges.
- Showcasing expertise in designing and tailoring bespoke frameworks to elevate Data Extraction and Analysis capabilities.
- Pioneering NLP solutions for intricate business dilemmas such as Q&A, Product Prediction, and keyword extraction.
- Demonstrating a knack for translating intricate business issues into pragmatic technical prototypes.
- Leveraging various cloud technologies like AWS, GCP, and Azure to develop and deploy applications.
- Mastering the creation of API services using cutting-edge technologies like FastAPI, Docker, Redis, Celery, and more.
- Ensuring seamless deployment and management of services on Linux machines, encompassing Docker and AWS environments.
- Adapting to a wide array of databases, including MySQL, MongoDB, PostgreSQL, Redis, and BigQuery.
- Possessing extensive expertise in diverse data sources, including Dimensions, Pitchbook, SocialGist, PubMed, Scopus, clinicaltrials.org, and others.
- Collaborating effectively with multiple teams, driving them towards successful project execution.
- Leveraging the capabilities of AI, such as ChatGPT, to enhance data-driven decision-making and provide valuable insights for complex problem-solving.
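To give a concrete flavour of the RAG work noted above, here is a minimal sketch assuming the official `openai` Python client; the model names, the in-memory document store, and the `answer` helper are illustrative assumptions rather than the production pipeline.

```python
# Minimal RAG sketch: embed documents, retrieve by cosine similarity, answer with an LLM.
# Assumes the `openai` Python client and illustrative model names; not the production pipeline.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

documents = [
    "Our Q3 churn rate dropped to 4.2% after the onboarding redesign.",
    "BigQuery export jobs run nightly at 02:00 UTC.",
]
doc_vectors = embed(documents)

def answer(question: str, top_k: int = 1) -> str:
    q = embed([question])[0]
    # cosine similarity between the question and every stored document
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(sims)[::-1][:top_k])
    chat = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return chat.choices[0].message.content

print(answer("When do the BigQuery exports run?"))
```

A production version would swap the in-memory store for a vector database and add evaluation and tracing hooks such as TruLens or Langfuse.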
As a vital member of the Data Analytics team, my responsibilities encompass industrializing data feeds, crafting pipelines into established data systems, and fostering seamless connectivity between external and internal data sources, all while facilitating the integration of cutting-edge technologies.
- Proficiently managing the complete life cycle of data collection, processing, and analysis to address a range of business challenges.
- Demonstrated expertise in designing and crafting customized frameworks to enhance Data Extraction and Analysis.
- Adept at automation, utilizing Python and browser automation with Selenium to streamline various tasks.
- Proven ability to translate complex business problems into practical technical prototypes.
- Hands-on experience in implementing custom Named Entity Recognition (NER) methods to extract entities from biomedical text data.
- Proficiency in developing and configuring Apache Airflow-based pipelines for efficient data workflows (see the sketch after this list).
- Skilled in creating API services and Front-end Applications using Flask and Django.
- Solid understanding of deploying and managing services on Linux machines.
- Successfully deployed services such as Apache Airflow, Label Studio, Django, and Flask.
- Versatility in working with multiple databases, including MySQL, MongoDB, and PostgreSQL.
- Familiarity with Agile methodologies and ticket-based environments like Jira, Bitbucket, and ServiceNow.
- Extensive knowledge of data sources, including PubMed, Scopus, clinicaltrials.org, and more.
- Committed to ongoing exploration of methods and technologies to enhance and optimize solutions, both current and past.
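As a rough illustration of the Airflow-based pipelines referenced above, the following is a minimal DAG sketch; the DAG id, schedule, and placeholder task functions are assumptions for illustration only.

```python
# Minimal Apache Airflow DAG sketch: extract -> transform -> load as three Python tasks.
# DAG id, schedule, and placeholder functions are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull records from the source API")

def transform():
    print("clean and enrich the extracted records")

def load():
    print("write the results to the warehouse")

with DAG(
    dag_id="example_feed_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```

Note that in Airflow versions before 2.4 the `schedule` argument is named `schedule_interval`.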
As a Senior team member, I’ve led pivotal Legal Domain projects, optimizing data extraction and analysis through a user-friendly UI platform. I create dynamic Python code, use image processing tools like OpenCV and Pillow, and employ web browser automation for efficient data mining. I’ve developed API services and custom NER methods, enhancing productivity, while also leading, training, and managing multiple projects.
- Proficiently managed data extraction and analysis within the Legal Domain, delivering critical insights.
- Developed and deployed a user-friendly UI-based data extraction platform, streamlining data retrieval from diverse websites.
- Employed dynamic, template-based Python code, significantly reducing development effort.
- Utilized the OpenCV and Pillow libraries for image processing, supporting the front-end team in detecting and cropping human faces from profile pictures.
- Implemented web browser automation using Node.js and Selenium for efficient data mining.
- Established API services to provide data to other applications and teams.
- Designed a custom Named Entity Recognition (NER) method using spaCy to extract user-related information such as names, qualifications, and job titles (a minimal sketch follows this list).
- Achieved a substantial increase in production speed, scaling up from 4 new domains per day to 10 per day.
- Actively led and trained team members to stay updated with the latest technological advancements.
- Effectively managed multiple independent projects and collaborative team endeavors.
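The custom NER approach mentioned above could look roughly like the following rule-based sketch using spaCy's EntityRuler; the labels, patterns, and example sentence are illustrative assumptions rather than the actual production rules.

```python
# Rule-based NER sketch with spaCy's EntityRuler for names, qualifications, and job titles.
# Labels, patterns, and the example sentence are illustrative assumptions.
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Jane Doe"},
    {"label": "JOB_TITLE", "pattern": [{"LOWER": "managing"}, {"LOWER": "partner"}]},
    {"label": "QUALIFICATION", "pattern": [{"LOWER": "mba"}]},
])

doc = nlp("Jane Doe, who holds an MBA, joined the firm as a Managing Partner.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```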
As a Data Engineer with extensive experience, my role encompasses:
- Profound expertise in e-commerce datasets, leveraging this knowledge for insightful data analysis.
- Development of multiple data parsing methods using a customized framework for efficient data extraction.
- Handling multilingual datasets and deep familiarity with leading e-commerce websites.
- Execution of automation tasks, including extensive data processing, automated data validation, and quality assurance (see the sketch after this list).
- End-to-end client project management, from scheduling and monitoring to exporting and validating results.
- Implementation of comprehensive automated solutions for projects with complex multi-stage workflows.
- Effective use of technologies like Node.js and Selenium for data retrieval and browser automation.
- Collaborative work with multiple technology teams, contributing to platform improvements and rigorous testing.
- Conducting both manual and automated code auditing and data auditing as part of a platform migration from V1 to V2.
- Providing continuous training and guidance to executive and development teams to facilitate smooth transitions and keep up with evolving technologies.
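As a sketch of the kind of automated data validation and QA mentioned above; the column names and rules are assumptions for illustration, not an actual client schema.

```python
# Minimal automated data-validation sketch with pandas: flag rows that break basic rules.
# Column names and rules are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "sku": ["A-1", "A-2", None, "A-4"],
    "price": [19.99, -3.00, 4.50, 7.25],
    "currency": ["USD", "USD", "EUR", "usd"],
})

checks = {
    "missing_sku": df["sku"].isna(),
    "negative_price": df["price"] < 0,
    "unknown_currency": ~df["currency"].isin(["USD", "EUR", "GBP"]),
}

report = pd.DataFrame(checks)
bad_rows = df[report.any(axis=1)]
print(report.sum())   # count of violations per rule
print(bad_rows)       # rows needing manual review
```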
As a Python Programmer, my expertise spans a wide array of challenging problem-solving areas:
- E-commerce
- Designed a custom framework for Data Extraction, optimized for distributed architecture.
- Developed over 200 Python scripts tailored for precise Data Extraction.
- Proficient in managing Multilingual datasets.
- Executed various end-to-end automations on diverse e-commerce datasets.
- Legal
- Possess comprehensive knowledge of legal datasets, having worked on over 300 websites.
- Created a user-friendly UI-based data platform to enhance data retrieval efficiency.
- Managed automated data cleaning, validation, and matching processes.
- Life Science (Research Outputs)
- Deep understanding of multiple data sources in bioinformatics.
- Extracted data from diverse sources, including PubMed, Clinical trials, Scopus, grants, and patents.
- Implemented methods to extract information such as author details, affiliations, and products from text data.
- Proficient in handling custom NLP Models.
- Deep experience with paid datasets like Dimensions, Pitchbook, and SocialGist.
- Familiarity with author networks and collaborations in the field.
- News Sources
- Constructed a customized data mining platform employing a stack of Python libraries.
- Developed a feature-rich front-end for configuring domains.
- Supported data extraction from HTML, JSON, and RSS feeds (a minimal sketch follows this list).
- Successfully extracted and parsed data from approximately 500 domains on a daily basis.
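For the HTML/JSON/RSS extraction noted above, a minimal sketch might look like the following; it assumes the widely used `feedparser` and `requests` libraries, and the feed URL is a placeholder rather than a real source.

```python
# Minimal news-extraction sketch: pull entries from an RSS feed and normalise them.
# The feed URL is a placeholder; `feedparser` and `requests` are assumed dependencies.
import feedparser
import requests

FEED_URL = "https://example.com/news/rss"  # placeholder, not a real source

def fetch_entries(url: str) -> list[dict]:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    feed = feedparser.parse(resp.content)
    return [
        {
            "title": entry.get("title", ""),
            "link": entry.get("link", ""),
            "published": entry.get("published", ""),
        }
        for entry in feed.entries
    ]

for item in fetch_entries(FEED_URL):
    print(item["published"], item["title"])
```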
As a Python Programmer, I’ve been involved in a multitude of individual and team projects, each presenting unique challenges and opportunities for innovation:
- Project 1: Research Insights Application
- Developed an application to provide insights about authors, institutes, and their research.
- Implemented an end-to-end solution for data extraction and processing from various data endpoints.
- Technology Stack: Python, Pandas, MySQL, MongoDB, and other Python libraries.
- Project 2: Biomedical Data Monitoring and Insights
- Designed a centralized platform to monitor multiple biomedical data sources and extract insights.
- Created pipelines to process data from various sources using a custom framework on top of Apache Airflow.
- Developed a customized framework for news data extraction from diverse domains.
- Implemented common functionality for team-wide use.
- Technology Stack: Python, Apache Airflow, Pandas, spaCy, and other relevant libraries.
- Project 3: COVID-19 Dashboard
- Contributed to the development and maintenance of data pipelines for a detailed COVID-19 dashboard.
- Technology Stack: Python, Pandas, Airflow, MySQL, MongoDB.
- Project 4: News Data Extraction
- Extracted news from various sources and facilitated daily data delivery to multiple teams.
- Automated end-to-end pipelines to ensure uninterrupted data delivery.
- Technology Stack: Python, Pandas, cron jobs, MongoDB.
- Adhoc & Automation Projects
- Supported various adhoc and one-time projects across multiple teams.
- Leveraged various technologies for data processing, visualization, and data mining.
- Technology Stack: Python, Pandas, cron jobs, MongoDB, MySQL, Matplotlib, Plotly, scispaCy, spaCy, and more.
- Legal Data Insights Platform
- Assisted in building an infrastructure capable of handling large datasets and extracting data from numerous sources.
- Technology Stack: Python, Pandas, cron jobs, MongoDB, MySQL, spaCy, Flask, and Django.
- Ugam Solutions
- Collaborated on extracting data from major e-commerce websites like Amazon, Sears, Jet, and Walmart.
- Technology Stack: Python, HBase, Hadoop, and other distributed computing concepts.
Personal Projects and Ventures
In addition to my professional work, I have pursued several personal projects that reflect my passion for innovation and technology. These projects have allowed me to explore my creative side and develop new skills while tackling interesting challenges. Some of them are highlighted below.