Dhinakar Yalla — Data Engineer

Where I've worked

Experience

Databricks

Data Engineering Intern

Jan 2025 – Jun 2025
Remote, USA

Built Delta Lake-based ELT pipelines processing 2TB+ of daily event data using PySpark and Databricks Workflows, improving data freshness SLAs from 4 hours to under 45 minutes.
Optimized Spark job performance with Z-order and liquid clustering, reducing BI dashboard query scan time by 55%.
Designed a data quality monitoring framework using Great Expectations integrated into CI/CD pipelines, catching schema drift and null violations before production.
Containerized pipeline jobs with Docker and deployed them to Kubernetes clusters, cutting deployment setup time by 65% and enabling full environment parity across dev, staging, and production.

Fractal Analytics

Data Engineering Intern

Aug 2023 – Feb 2024
Chennai, India

Built and maintained ELT pipelines ingesting structured and semi-structured data from 10+ client sources into Snowflake, reducing manual handoff time by 45%.
Designed Spark-based batch processing jobs for 500GB+ datasets, improving job completion time by 30% through partition pruning and broadcast join optimization.
Engineered 15+ features from raw transactional data, reducing feature computation latency by 20% and improving downstream model quality.
Automated pipeline monitoring with Apache Airflow DAGs, cutting mean time to resolution by 40%.

1Stop.ai

Data Science Intern

Feb 2023 – Jul 2023
Remote, India

Built Python and SQL ETL pipelines to automate ingestion from multiple data sources, cutting manual processing time by 35%.
Integrated AWS S3 and EC2 into pipeline workflows, reducing data transfer overhead and improving end-to-end throughput.
Delivered Power BI and Tableau dashboards tracking 5+ KPIs in real time, adopted by business teams for weekly stakeholder reporting.

What I've built

Projects

City Bike Price Prediction

End-to-end ETL and ML pipeline across MySQL, Snowflake, and AWS (S3, EC2). Reduced query latency by 40% through star-schema design and warehouse optimization. Improved prediction accuracy by 25% by catching bad records before model training. Containerized with Docker for one-command deployment.

MySQL, Snowflake, AWS, Docker, ML


AI-Powered Pneumonia Detection

Deep learning inference pipeline using ResNet50 and CLIP for medical image classification. Added Grad-CAM explainability and automated structured report generation for clinical workflows. Deployed via Streamlit for real-time inference, replacing a manual review process.

ResNet50, CLIP, Grad-CAM, Streamlit, PyTorch


Cancer Classification Pipeline

End-to-end cancer classification pipeline built on a BiLSTM model that achieves 99.8% accuracy. Integrated PostgreSQL for data storage and Apache Airflow for pipeline orchestration, and deployed an interactive Streamlit app on AWS for real-time inference, turning a research model into a production-ready system.

BiLSTM, PostgreSQL, Airflow, Streamlit, AWS, Python


NYC Taxi Trip Analytics

Large-scale data engineering pipeline built on NYC's TLC taxi trip dataset. Ingested and processed millions of trip records using PySpark, applied geospatial and temporal feature engineering, and built an analytics layer to surface insights on trip patterns, demand hotspots, and fare trends across NYC boroughs.

PySpark, Python, SQL, AWS, Analytics


Licenses & Certifications

Certifications

AWS Certified Data Engineer – Associate

Amazon Web Services (AWS)

November 2025

Validated expertise in designing and building data pipelines on AWS, covering data ingestion, transformation, storage, and orchestration using services like S3, Glue, Redshift, Kinesis, and Lake Formation. Demonstrates hands-on ability to architect scalable, production-grade data engineering solutions on the AWS cloud.

AWS, S3, Glue, Redshift, Data Engineering


Google Data Analytics

Google

August 2025

Completed Google's professional certificate covering the full data analytics workflow — from data cleaning and preparation to analysis, visualization, and storytelling. Gained proficiency in SQL, R, Tableau, and spreadsheets to derive actionable insights and communicate findings effectively to stakeholders.

Data Analytics, SQL, Tableau, R, BigQuery


edX Verified Certificate for Introduction to Cloud Computing

edX

July 2021

Gained foundational knowledge of cloud computing concepts including service models (IaaS, PaaS, SaaS), deployment models, and the core benefits of cloud infrastructure. Built a solid understanding of how cloud platforms enable scalable and cost-efficient application and data workloads.

Cloud Computing, AWS, IaaS, PaaS, SaaS


Cloud Computing Core

edX

July 2021

Deepened understanding of core cloud computing principles including virtualization, distributed storage, cloud networking, and security. Covered key architectural patterns used in modern cloud-native systems and how enterprises leverage cloud infrastructure for resilience and scalability.

Cloud Architecture, Virtualization, Storage, Networking, Security


Introduction to IoT

Cisco

April 2021

Learned the fundamentals of the Internet of Things including how connected devices communicate, collect, and transmit data across networks. Covered IoT sensors, protocols, security considerations, and real-world use cases — providing useful context for understanding data generation at the edge.

IoT, Networking, Sensors, Protocols, Cisco


Programming for Everybody (Getting Started with Python)

Coursera

March 2021

Completed the foundational Python programming course by Dr. Chuck at the University of Michigan. Covered core programming concepts including variables, conditionals, loops, functions, and data structures — laying the groundwork for the data engineering and ML work that followed.

Python, Programming, Data Structures, Functions, OOP


Introduction to Cybersecurity

Cisco

December 2020

Gained awareness of the cybersecurity landscape including common threats, attack vectors, and best practices for protecting data and systems. Covered topics such as network security, encryption, malware, and how organizations defend against cyber threats — relevant to building secure data pipelines.

Cybersecurity, Network Security, Threats, Encryption, Cisco

