class: center, middle, inverse, title-slide

.title[
# Data version control for researchers
]
.subtitle[
## Edinburgh Open Research Conference
]
.author[
### Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2016/12/12 (updated: 2022-05-11)
]

---
class: middle

.pull-left[
![](img/jamie-street-Zqy-x7K5Qcg-unsplash.jpg)
]

.pull-right[
# Why should I learn version control?
]

???

Photo by Jamie Street on Unsplash

---

![](img/folder-1.png)

---

![](img/folder-2.png)

---

![](img/folder-3.png)

---

![](img/folder-4.png)

---
class: middle

.pull-left[
![](img/charlesdeluvio-pOUA8Xay514-unsplash.jpg)
]

.pull-right[
# Ok, I am convinced now!
]

???

Photo by charlesdeluvio on Unsplash

---
class: middle

![](img/matthew-henry-n5vuEc86Zg8-unsplash.jpg)

# How does it work?

???

Photo by Matthew Henry on Unsplash

---

# Version 1

![](img/project-v1.png)

---

# Version 1 snapshot

![](img/project-v1-g.png)

---

# Version 2

![](img/project-v2.png)

---

# Version 2 snapshot

![](img/project-v2-g.png)

---

# Version 3

![](img/project-v3.png)

---

# Version 3 snapshot

![](img/project-v3-g.png)

---

# The versioning system `git`

<br>
<br>

- `git` is a very popular choice for **software development**.

--

- It is tailored for tracking changes in software files.

--

- But it is also useful for anything text-based (like analysis scripts, papers, dissertations, ...).

---

# What can `git` do for you?

<br>
<br>

- Keep track of **new or deleted files** in a project.

- Keep track of **changes to individual files** in a project.

  - This is done on a line-by-line basis.

- Roll **back to a previous version** of the project or of individual files.

- Make **back-ups** of your files.

---
class: middle center inverse

# Let's try it!

---

# But what about data?

<br>
<br>

- `git` is not great with non-text data.

--

- Non-text data have no "lines", so `git` tracks changes to the entire file (rather than to portions of it).

--

- This is very inefficient.
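---

# Why "lines" matter: a small sketch

A rough illustration of the point above, written in Python. It is only an analogy, not how `git` works internally. All names here (`changed_lines`, `whole_file_changed`, the sample script and the fake audio bytes) are made up for the example.

```python
import difflib
import hashlib


def changed_lines(old: str, new: str) -> list[str]:
    """Return only the lines that were removed (-) or added (+)."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


def whole_file_changed(old: bytes, new: bytes) -> bool:
    """Without lines to compare, we can only tell that the file differs."""
    return hashlib.sha256(old).digest() != hashlib.sha256(new).digest()


# Hypothetical analysis script: one line edited between two versions.
script_v1 = "library(tidyverse)\ndat <- read_csv('data.csv')\nsummary(dat)\n"
script_v2 = "library(tidyverse)\ndat <- read_csv('data.csv')\nglimpse(dat)\n"

# Hypothetical binary data (e.g. an audio file): one byte edited.
audio_v1 = bytes(1000)
audio_v2 = bytes(999) + b"\x01"

print(changed_lines(script_v1, script_v2))     # ['- summary(dat)', '+ glimpse(dat)']
print(whole_file_changed(audio_v1, audio_v2))  # True
```

For the script, the change is described by two short lines. For the binary data, all we learn is that the file is different, so the whole new file has to be kept.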
---

# Enter `dvc`

<br>
<br>

- `dvc` works on top of `git` to make data versioning easier and more efficient.

--

- It has many features that can be used for many different purposes.

--

- You can use it as a back-up system as well!

---

# A complex project without `dvc`

![](img/data-ver-complex.png)

---

# A complex project with `dvc`

![](img/project-versions.png)

---
class: middle center inverse

# Let's try it!

---

# Summary

<br>
<br>

- Versioning your projects allows you to take "snapshots" of the current state of your files.

- `git` and `dvc` are a great combo for data versioning.

- Versioning with `git` and `dvc` is safe (if you make a mistake, you can always go back) and flexible.

- There's much more that you can do! (Branching, collaboration, remote storage, reproducible pipelines, ...)

---
class: middle center

![:scale 60%](img/howie-r-CjI_2QX7hvU-unsplash.jpg)

???

Photo by Howie R on Unsplash