<b>Job Title:</b> ML Engineer – Experimentation Platform<br><b>Experience:</b> 3–4 Years
<br><b>Location:</b> Remote<br> <b>Notice Period:</b> Immediate Joiners Only<br> <br><b>About the Role</b><br>We are looking for a highly skilled ML Engineer to join our Test & Learn Platform team. In this role, you will build and scale the experimentation and causal inference services that enable business teams worldwide to make data-driven decisions.<br>
<p>You will work across statistical modeling, API development, cloud-native infrastructure, and large-scale data processing to deliver reliable and production-ready ML solutions.<br></p>
<p><b>Key Responsibilities</b><br></p>
<ul>
<li>Develop and maintain statistical and machine learning modules for:<br></li>
<ul>
<li>Difference-in-Differences (DID)<br></li>
<li>Synthetic Control<br></li>
<li>A/B Testing<br></li>
<li>Multi-Treatment Effects<br></li>
</ul>
<li>Build and extend RESTful APIs using FastAPI and integrate them with web applications through SDK wrappers<br></li>
<li>Design and optimize large-scale data pipelines using PySpark, Delta Lake, and Azure Data Lake<br></li>
<li>Diagnose and resolve Out-of-Memory (OOM) issues in PySpark workloads by optimizing:<br></li>
<ul>
<li>Memory allocation<br></li>
<li>Partitioning<br></li>
<li>Broadcast joins<br></li>
<li>Caching strategies<br></li>
<li>Spark configurations<br></li>
</ul>
<li>Deploy and manage Databricks workloads including notebooks, job clusters, and Delta Lake tables<br></li>
<li>Containerize and deploy services using Docker, Kubernetes, and CI/CD pipelines<br></li>
<li>Maintain code quality, test coverage, and security using PyTest, SonarCloud, and Snyk<br></li>
<li>Collaborate closely with Data Scientists and Product teams to convert research concepts into scalable production systems<br></li>
</ul>
<p><b>Mandatory Skills</b><br></p>
<ul>
<li>Strong experience in Python (3.9+)<br></li>
<li>Hands-on expertise in:<br></li>
<ul>
<li>PySpark & Spark Internals<br></li>
<li>Databricks<br></li>
<li>FastAPI / API Development<br></li>
<li>Azure Cloud Platform<br></li>
<li>Kubernetes & Docker<br></li>
<li>PyTest<br></li>
</ul>
<li>Strong understanding of:<br></li>
<ul>
<li>DID<br></li>
<li>Synthetic Control<br></li>
<li>A/B Testing<br></li>
<li>Hypothesis Testing<br></li>
<li>Panel Data Methods<br></li>
</ul>
<li>Expertise in statistical and ML libraries:<br></li>
<ul>
<li>statsmodels<br></li>
<li>scikit-learn<br></li>
<li>SciPy<br></li>
<li>Pandas<br></li>
<li>NumPy<br></li>
</ul>
</ul>
<p><b>Technical Requirements</b><br></p>
<p><b>PySpark & Spark Internals</b><br></p>
<ul>
<li>Strong understanding of Spark memory model<br></li>
<li>Executor tuning and shuffle optimization<br></li>
<li>Diagnosing and resolving OOM errors<br></li>
<li>Experience with:<br></li>
<ul>
<li>Broadcast thresholds<br></li>
<li>Partition skew handling<br></li>
<li>Spill-to-disk optimization<br></li>
<li>GC tuning<br></li>
</ul>
</ul>
<p><b>Databricks</b><br></p>
<ul>
<li>Hands-on experience with:<br></li>
<ul>
<li>Job orchestration<br></li>
<li>Cluster configuration<br></li>
<li>Notebook workflows<br></li>
<li>Delta Lake optimization<br></li>
<li>Z-ordering, compaction, and caching<br></li>
</ul>
</ul>
<p><b>Cloud & DevOps</b><br></p>
<ul>
<li>Azure Storage, Azure ML, and Azure Data Lake<br></li>
<li>Docker-based containerization<br></li>
<li>Kubernetes orchestration for ML workloads<br></li>
<li>CI/CD pipeline integration<br></li>
</ul>
<p><b>Testing & Quality</b><br></p>
<ul>
<li>Unit and integration testing using PyTest<br></li>
<li>Familiarity with SonarCloud, Snyk, and GitHub Actions<br></li>
</ul>
<p><b>Good-to-Have Skills</b><br></p>
<ul>
<li>Experience with Celery and Redis for async task orchestration<br></li>
<li>Familiarity with Polars, PyArrow, or SQLAlchemy<br></li>
<li>Background in econometrics or experimental design<br></li>
<li>Experience with Spark UI profiling and performance benchmarking<br></li>
<li>Knowledge of advanced CI/CD tooling and automation practices<br></li>
</ul>
<p><b>Preferred Candidate Profile</b><br></p>
<ul>
<li>Strong analytical and problem-solving abilities<br></li>
<li>Ability to work independently in a remote setup<br></li>
<li>Excellent collaboration and communication skills<br></li>
<li>Passion for building scalable ML and experimentation platforms<br></li>
</ul>
<p><b>Tech Stack</b><br></p>
<p><b>Languages & Libraries:</b> Python, Pandas, NumPy, SciPy, statsmodels, scikit-learn<br> <b>Big Data:</b> PySpark, Spark Internals, Delta Lake<br> <b>Cloud & Platforms:</b> Azure, Databricks, Azure Data Lake<br> <b>APIs & Backend:</b> FastAPI<br> <b>DevOps:</b> Docker, Kubernetes, GitHub Actions<br> <b>Testing & Security:</b> PyTest, SonarCloud, Snyk</p>