Notes/rabbit hole: auto-generating Git co-authors
Published (gregorian) (ornellember)
Tags: tech, refactoring, git
As a refactoring freak, I sometimes run into the scenario where someone reaches out to me about code that I’ve “written” - but in reality all I’ve done is move it to a new file. Because my name is on the commit message for this code, and there’s no other history for it, it looks like I’m the actual author.
Right now, I’m reorganizing a package in a monorepo and I want to make sure I attribute credit where it’s due / don’t erase the git history and end up having to “support” questions about the content of the files.
So I am digging into git stuff and I have a few TILs. They’re not all relevant. Then some notes about the actual solutions towards the bottom.
# Background/goal
GitHub can parse commit messages for co-authors when you add trailers in a specific format after a blank line at the end of the message, i.e.
My commit msg.

Co-authored-by: NAME <[email protected]>
Co-authored-by: ANOTHER-NAME <[email protected]>
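A minimal sketch of what that looks like from the command line, assuming a throwaway repo. The names and emails (Mover, Alice, Bob) are made up for illustration.

```shell
# Throwaway repo so this is safe to run anywhere.
repo=$(mktemp -d)
git -C "$repo" init -q

# Hypothetical identity for whoever is making the commit.
export GIT_AUTHOR_NAME="Mover" GIT_AUTHOR_EMAIL="mover@example.com"
export GIT_COMMITTER_NAME="Mover" GIT_COMMITTER_EMAIL="mover@example.com"

echo "hello" > "$repo/foo.txt"
git -C "$repo" add foo.txt

# Each -m becomes its own paragraph, so the trailers land after a
# blank line at the end of the message.
git -C "$repo" commit -q \
  -m "Move foo into its own file" \
  -m "Co-authored-by: Alice Example <alice@example.com>
Co-authored-by: Bob Example <bob@example.com>"

# Print the full commit message to check the result.
message=$(git -C "$repo" log -1 --format=%B)
printf '%s\n' "$message"
```

The second `-m` is just one way to get the blank line in; typing the message in your editor works the same.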
This is particularly applicable for moving code between files, since plain renames are already well-supported. I use VSCode and I believe when you rename a file, it does the equivalent of `git mv` under the hood.
So basically I want to do something like: when I move functionality around, I add (in a somewhat straightforward way) the original author(s) as (a) co-author(s).
- Use case 1: file renames. This is already pretty well-supported by Git with `git mv`. The contents of the file should still show as being authored by whoever did the work.
- Use case 2: minor changes to a file that edit all the lines, like an indent change. The commit history for the file would show that all you changed is the indent. For folks who only look at the last commit, yes you’d look like you’re the main author, but digging a bit would give some context.
- Use case 3: moving a big chunk of a file to another file. This is the big one that happens to me. I wish you could print out all the authors of a given file, or ideally only of the functionality that you’re moving, and aggregate them into your commit as co-authors. That way in the new file, even though you lost the history, you’d still have some indication of the point-persons (similar to when you cherry-pick another person’s commit).
- Use case 4: moving an entire group of files out of a repo, and into a new repo with a new git initialization.
Say you got a repo like the below:
repo
├── README.md
├── ui
│   ├── foo.md
│   └── bar.md
└── api
    ├── foo
    └── bar
You want to move the ui/ folder out of this repo and into its own repo. You kind of don’t want to “take credit” for all the files, aka lose all the git history and stamp your own name all over the files like you pulled them out of your booty.
So you could add everyone as a co-author there too. I just found out about something that could help me a bunch to do that and I’m excited: filtering git log! You can filter in pretty nifty ways, like per directory or file, or by author name, then aggregate the output (e.g. by uniqueness) and even format it.
There must be a better way to do this - like, chop down the history for a set of files and/or import it - but this is really fun and maybe I’ll research that afterwards.
*** Update I asked my friend and git expert Pauline Vos (author of the upcoming course Git Legit!!) and she said in this case, she’d fork the repo, and then delete everything except the ui folder to keep the history!! genius.
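Here’s roughly how I understand that suggestion, as a sketch rather than Pauline’s exact steps. A local clone stands in for the fork, and the repo contents and author names are made up.

```shell
# Hypothetical committer identity for the demo.
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Build a tiny stand-in for the original repo, authored by "Alice".
src=$(mktemp -d)
git -C "$src" init -q
mkdir -p "$src/ui" "$src/api"
echo "readme" > "$src/README.md"
echo "ui foo" > "$src/ui/foo.md"
echo "api foo" > "$src/api/foo"
git -C "$src" add .
GIT_AUTHOR_NAME="Alice" GIT_AUTHOR_EMAIL="alice@example.com" \
  git -C "$src" commit -q -m "initial import"

# "Fork" it (a plain clone here), then delete everything except ui/.
new="$(mktemp -d)/ui-only"
git clone -q "$src" "$new"
git -C "$new" rm -r -q README.md api
GIT_AUTHOR_NAME="Mover" GIT_AUTHOR_EMAIL="mover@example.com" \
  git -C "$new" commit -q -m "Keep only ui/"

# The ui files still carry their original history and author.
ui_author=$(git -C "$new" log --format=%an -- ui/foo.md)
echo "$ui_author"
```

The deleted files are still in the new repo’s history, which is fine if you only care about keeping attribution for ui/.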
So basically this is about use case 3.
# TILs
Basically, TIL that git log is a lot more powerful than I thought.
# git log
# You can filter git logs??!
- by directory or file path
git log -- {path}
returns commit history for the path
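- if within a file, by line!
git log -L{number},{number}:{filePath}
e.g.
git log -L44,44:foo/bar.ts
will return the commit history for line 44. (If you leave off the end, as in -L44:foo/bar.ts, the range runs from line 44 to the end of the file.)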
- by GROUP OF LINES bruh 🤯
git log -L{number},+{count}:{filePath}
e.g.
git log -L44,+35:foo/bar.ts
will return the commit history for the 35 lines starting at line 44, i.e. lines 44 to 78.
NOW, by default, this’ll output the patch (the diff for those lines) - you can suppress that with --no-patch.
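To see the range filter actually narrowing things down, here’s a small sketch in a throwaway repo. The file, commit messages and identity are hypothetical, and it assumes a Git recent enough that --no-patch is honored together with -L.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# First commit: a five-line file.
printf 'one\ntwo\nthree\nfour\nfive\n' > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" commit -q -m "add notes"

# Second commit: change only line 4.
printf 'one\ntwo\nthree\nFOUR\nfive\n' > "$repo/notes.txt"
git -C "$repo" commit -q -am "shout line 4"

# Lines 1-2 were only ever touched by the first commit.
top=$(git -C "$repo" log --no-patch --format=%s -L1,2:notes.txt)

# Lines 4-5, written as start,+count: both commits show up.
bottom=$(git -C "$repo" log --no-patch --format=%s -L4,+2:notes.txt)

printf 'lines 1-2:\n%s\n' "$top"
printf 'lines 4-5:\n%s\n' "$bottom"
```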
- by author This was a dead end, in this case, but I also liked it a lot for future ref.
git log --author=0rnella
and many more!
# side note on git blame
Git blame is aight but I’m having trouble working with the output for my use case. It basically tells you which commit last touched each line, but I am trying to get all the commits for each line, so I’m p sure I need log for that.
# format the git log output
- --pretty flag: with the --pretty flag you can format your output with some presets like oneline, short, medium, full or even email
git log --pretty=oneline
outputs the commit history with only 1 line per commit e.g.
1896157796c1427d284f2b46a1d3a47fbef4b18b (HEAD -> main, origin/main) update to react composability
aac0a034f2df2a9a48a131beedcfb68b66cae377 react composability notes
5c46bd45c807988013e19a871981d4ab827b7da7 new post
91073e4272614229bac38cea4596ba7f0b4d36c5 forgotten update from january
(side note, kind of meta that this is the git history for this project no?)
- custom formatsssss????? then it gets really fucking interesting because you can pass a custom format, referencing specific info in the commit with shorthand!
git log --pretty=format:"%an + %ae"
%an stands for author name, and %ae for author email. This will output the commit history as a list of names + emails.
# unix commands
TIL about a bunch of unix commands from stackoverflow - didn’t think to note down all the urls, but here’s the main one. I also got sort and uniq from github copilot - can’t attribute credit further bc gen AI.
# wc to output a count
TIL about the wc command in unix which outputs the word count (or line count, or whatever) of text you pass it.
# uniq for unique values
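TIL also about uniq, which drops repeated lines from text. Catch: it only collapses duplicates that are right next to each other, so the same line showing up again later still gets through.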
# sort for sorting
TIL also about sort, which sorts text that it’s fed. by default it sorts alphabetically. It’s not useful for me though because I’d prefer to write the authors in backwards chronological order, which is what git outputs. but worth noting.
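A quick sketch of how these differ when the same author shows up in non-consecutive commits. The author lines are made up, and the awk one-liner is a common idiom I’m adding for comparison, not something from my original notes.

```shell
# Hypothetical log output: Alice, then Bob, then Alice again.
authors='Co-authored-by: Alice <alice@example.com>
Co-authored-by: Bob <bob@example.com>
Co-authored-by: Alice <alice@example.com>'

# uniq only collapses adjacent duplicates, so Alice survives twice.
with_uniq=$(printf '%s\n' "$authors" | uniq)

# sort -u removes every duplicate but reorders alphabetically.
with_sort=$(printf '%s\n' "$authors" | sort -u)

# awk keeps the first occurrence of each line, in the original order.
with_awk=$(printf '%s\n' "$authors" | awk '!seen[$0]++')

printf 'uniq:\n%s\n\nsort -u:\n%s\n\nawk:\n%s\n' \
  "$with_uniq" "$with_sort" "$with_awk"
```

So if you want to keep git’s backwards chronological order and still drop every repeat, the awk version is the one to reach for.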
# Something useful
Finally!
# 1. aggregating co-authors for work you’re moving
Going back to use case 3, where I am moving some functionality to a different file. I can create a list of co-authors for the set of lines that I’m moving like this:
git log --pretty=format:"Co-authored-by: %an <%ae>" -L {startLine},{endLine}:{fileName} --no-patch
e.g. (let’s get meta and use something from this repo)
git log --pretty=format:"Co-authored-by: %an <%ae>" -L 3,28:layouts/blog/single.html --no-patch
This outputs what I want to stick onto the end of my commit message!!
Co-authored-by: friggito <[email protected]>
Co-authored-by: friggito <[email protected]>
Co-authored-by: friggito <[email protected]>
Co-authored-by: friggito <[email protected]>
Co-authored-by: friggito <[email protected]>
now let’s make this unique by sticking `| uniq` at the end (fine here because the duplicates are all next to each other)
git log --pretty=format:"Co-authored-by: %an <%ae>" -L 3,28:layouts/blog/single.html --no-patch | uniq
the result:
Co-authored-by: friggito <[email protected]>
obviously I could have used something with more co-authors…. but was lazy.
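For a case with more than one author, here’s a sketch that wraps the command in a small helper. The function name `coauthors`, the demo repo and the authors are all hypothetical. I’m also swapping in awk for the dedupe step, because uniq would let an author through twice if their commits aren’t consecutive.

```shell
# coauthors FILE START END
# Prints one Co-authored-by trailer per distinct author who touched
# lines START..END of FILE, newest first.
coauthors() {
  git log --no-patch --format='Co-authored-by: %an <%ae>' -L "$2,$3:$1" |
    awk 'NF && !seen[$0]++'   # skip blank lines, keep first occurrences
}

# Demo in a throwaway repo with made-up authors.
repo=$(mktemp -d)
git -C "$repo" init -q
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

commit_as() {  # commit_as NAME EMAIL MESSAGE
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
    git -C "$repo" commit -q -am "$3"
}

printf 'a\nb\nc\n' > "$repo/notes.txt"
git -C "$repo" add notes.txt
commit_as Alice alice@example.com "add notes"
printf 'a\nB\nc\n' > "$repo/notes.txt"
commit_as Bob bob@example.com "edit line 2"
printf 'a\nB!\nc\n' > "$repo/notes.txt"
commit_as Alice alice@example.com "edit line 2 again"

# Alice, Bob, Alice in the raw log; the helper prints each once.
trailers=$(cd "$repo" && coauthors notes.txt 1 3)
printf '%s\n' "$trailers"
```

The line numbers are looked up in the committed version of the file, so grab the trailers before you commit the move. Something like `git commit -m "Move stuff" -m "$(coauthors foo/bar.ts 44 78)"` would then paste them straight into the message.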
# 2. Counting…?
shit, I had something with the commit count, but I forgot. fuck. I know at some point I was wondering how to count commits - ah, yeah! to verify that I was really filtering correctly. So this is more of a test.
If I’m getting the commit history for a file, it should have at least as many commits as the history for a specific line in that file. So I wanted to test that filtering by a specific line, or group of lines, was working. My idea was to output the number of commits for that group of lines and verify it’s less than or equal to the count for the whole file.
By the way, by doing that I did realize that my initial filtering command was malformed (things won’t always error out in this case, unfortunately) so yeah, this served its purpose.
I’m going to make each commit output on one line, then count the number of lines. (We’re talking git log output lines here, not lines in the file.)
git log --oneline -- {filePath} | wc -l
that outputs a commit count for the whole file.
now commit count for a specific line or group of lines:
git log --oneline -L {startLine},{endLine}:{filePath} --no-patch | wc -l
No patch is crucial here bc otherwise you end up counting the lines in the patch too!
So anyway this is good for testing.
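Put together, the check could look something like this sketch. It runs against a throwaway repo with a made-up file and identity. I’d treat it as a sanity check, not a guarantee, since -L and a plain path filter can handle things like renames differently.

```shell
repo=$(mktemp -d)
git -C "$repo" init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Three commits touch the file; only two of them touch line 1.
printf 'a\nb\n' > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" commit -q -m "add notes"
printf 'a\nB\n' > "$repo/notes.txt"
git -C "$repo" commit -q -am "edit line 2"
printf 'A\nB\n' > "$repo/notes.txt"
git -C "$repo" commit -q -am "edit line 1"

# tr strips the padding some wc implementations add.
file_count=$(git -C "$repo" log --oneline -- notes.txt | wc -l | tr -d ' ')
line_count=$(git -C "$repo" log --oneline --no-patch -L 1,1:notes.txt | wc -l | tr -d ' ')

if [ "$line_count" -le "$file_count" ]; then
  echo "ok: $line_count commits for line 1, $file_count for the file"
else
  echo "suspicious: the range has more commits than the whole file" >&2
fi
```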
# Conclusion
Am I going to use this? Probably not, but at least it’s written down somewhere. Byeeee!