Large Scale Data Integration Project (LSDIPro) SoSe2025

In this course, students develop solutions for large-scale data integration. Working in groups of up to 4, students reproduce an existing research prototype, starting from the related paper, and enhance it with their own ideas. Each group is accompanied by a mentor from the D2IP group, to whom the group reports and who tracks its progress. Students will learn to implement scalable algorithms, evaluate them systematically, read and interpret technical papers, and critically judge experimental results. At the same time, they will learn to deal with data heterogeneity problems at scale.


Content

  • Project selection and team formation.
  • Discussion rounds on design, implementation, tests, and experiments.
  • Prototype implementation, tests, and experiments.
  • Discussion about possible further developments to improve the prototype.
  • 15-minute oral presentation of the created prototype.


Deliverables

  • Code with documentation (made available in a dedicated GitHub repository).
  • Final presentation.
  • Individual contribution sheet (as a single A4 page).


Schedule (Thursday 10:15 - 11:45, E-N 719)

  • 17.04.2025 - Weekly meeting: introduction, topic selection, team building
  • 24.04.2025 - Weekly meeting: identify the addressed problem and the main idea, development plan proposal
  • 15.05.2025 - Weekly meeting (development): a first version of the software pipeline should be ready
  • 22.05.2025 - Weekly meeting (development): progress assessment
  • 05.06.2025 - Weekly meeting (development): progress assessment
  • 12.06.2025 - Weekly meeting: expert review (a different group will test your system)
  • 19.06.2025 - Weekly meeting: system improvement and start of experiments
  • 26.06.2025 - Weekly meeting (experiments): start writing the report
  • 03.07.2025 - Weekly meeting (experiments)
  • 10.07.2025 - Weekly meeting (experiments and visualization)
  • 17.07.2025 - Weekly meeting: wrap up documentation and presentation slides
  • 24.07.2025 - Final presentations
  • 31.07.2025 - Final presentations - Contribution sheet submission deadline


Organization

  • Lecturer: Prof. Dr. Ziawasch Abedjan, D2IP
  • Teaching Assistant: Dr. Luca Zecchini, D2IP
  • Grading: pass with ≥ 40% of the points
  • Meetings: Thursday 10:15 - 11:45, E-N 719