class: center, middle, inverse, title-slide

.title[
# Data version control for researchers
]
.subtitle[
## Edinburgh Open Research Conference
]
.author[
### Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2016/12/12 (updated: 2022-05-11)
]

---
class: middle

.pull-left[
]

.pull-right[
# Why should I learn version control?
]

???

Photo by Jamie Street on Unsplash

---
class: middle

.pull-left[
]

.pull-right[
# Ok, I am convinced now!
]

???

Photo by charlesdeluvio on Unsplash

---
class: middle

# How does it work?

???

Photo by Matthew Henry on Unsplash

---

# Version 1

---

# Version 1 snapshot

---

# Version 2

---

# Version 2 snapshot

---

# Version 3

---

# Version 3 snapshot

---

# The versioning system `git`

<br>
<br>

- `git` is a very popular choice for **software development**.

--

- It is tailored for tracking changes in software files.

--

- But it is also useful for anything text-based (analysis scripts, papers, dissertations, ...).

---

# What can `git` do for you?

<br>
<br>

- Keep track of **new or deleted files** in a project.

- Keep track of **changes to individual files** in a project.

  - This is done on a line-by-line basis.

- Roll **back to a previous version** of the project or of individual files.

- Make **back-ups** of your files.

---
class: middle center inverse

# Let's try it!

---

# But what about data?

<br>
<br>

- `git` is not great with non-text data.

--

- Non-text data have no "lines", so `git` tracks changes to the entire file (rather than to portions of it).

--

- This is very inefficient.

---

# Enter `dvc`

<br>
<br>

- `dvc` works on top of `git` to make data versioning easier and more efficient.

--

- It has many features that can be used for many different purposes.

--

- You can use it as a back-up system as well!

---

# A complex project without `dvc`

---

# A complex project with `dvc`

---
class: middle center inverse

# Let's try it!
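---

# The idea behind `dvc`, sketched

The division of labour between `git` and `dvc` can be illustrated with a small Python sketch. It is a conceptual toy under stated assumptions, not how `dvc` is implemented: the function names `snapshot` and `restore`, the `.pointer` suffix, and the cache layout are all hypothetical. The point it tries to show is that a large binary file can live in a cache keyed by a hash of its content, while a tiny text pointer (the kind of file `git` handles well) records which version belongs to the project.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(data_file: Path, cache_dir: Path) -> Path:
    """Store data_file in a content-addressed cache and write a text pointer."""
    content = data_file.read_bytes()
    digest = hashlib.md5(content).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():  # identical content is stored only once
        cached.write_bytes(content)
    # Hypothetical pointer format: two short text lines that git could track.
    pointer = data_file.with_name(data_file.name + ".pointer")
    pointer.write_text(f"md5: {digest}\npath: {data_file.name}\n")
    return pointer


def restore(pointer: Path, cache_dir: Path) -> None:
    """Recreate the data file that a pointer refers to."""
    fields = dict(line.split(": ", 1) for line in pointer.read_text().splitlines())
    target = pointer.with_name(fields["path"])
    target.write_bytes((cache_dir / fields["md5"]).read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    cache = project / "cache"
    recording = project / "recording.wav"  # stand-in for any binary data file

    recording.write_bytes(b"\x00\x01 version 1")
    pointer = snapshot(recording, cache)
    pointer_v1 = pointer.read_text()  # what a git commit would have recorded

    recording.write_bytes(b"\x00\x01 version 2")
    snapshot(recording, cache)

    # Rolling back: put the old pointer in place, then restore from the cache.
    pointer.write_text(pointer_v1)
    restore(pointer, cache)
    print(recording.read_bytes())
```

In the real tools the same roles are played by `.dvc` files, which are small text files tracked by `git`, and by the `dvc` cache; both are driven from the command line rather than from Python.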
---

# Summary

<br>
<br>

- Versioning your projects allows you to take "snapshots" of the current state of your files.

- `git` and `dvc` are a great combo for data versioning.

- Versioning with `git` and `dvc` is safe (if you make a mistake, you can always go back) and flexible.

- There's much more that you can do! (Branching, collaboration, remote storage, reproducible pipelines, ...)

---
class: middle center

???

Photo by Howie R on Unsplash