2022/01/15/standard CV data format

From Woozle Writes Code

Latest revision as of 16:14, 11 June 2022

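This was originally posted to LinkedIn (https://www.linkedin.com/posts/woozle_tldr-why-isnt-there-a-universal-data-format-activity-6888590321362452480-_0rw) and Mastodon (https://toot.cat/@woozle/107634232290378715). It was reshared on Hacker News (https://news.ycombinator.com/item?id=29960279).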

TL;DR: Why isn't there a universal data-format for résumés?

So, here's a thing.

A lot of job application sites will ask you to upload a résumé (henceforth "CV") and then attempt to parse it out into something more like a set of database entries, presumably with the idea of making it easier for prospective employers to search for people with particular types of experience.

The problem is, the parsing software they all seem to use is hot garbage.

I thought at first it was something to do with my CV's decorative layout, so I tried creating a new one from scratch, laid out with the intention of being minimally confusing for a parsing algorithm. My name and contact info are at the top, then an "Experience" header, followed by sub-headers with the name of each job, with labeled bullet-items under each one for things like dates worked, tech used, tasks performed, etc.

The algorithm didn't do any better with this. It picked two work-experience items seemingly at random (including the oldest one, down at the end of the list), and it got the dates completely wrong for the first one: it showed dates which are not given anywhere on my CV, so it seems to have invented them outright, for no apparent reason.

Yes, I can go in and manually enter all my work experience -- but I have to do this for every new site where I apply for work, AND -- this is the key thing -- there's no way to SAVE THAT WORK in a format which I can reload elsewhere.

The larger employers and job-hosting web sites need to get their acts together and work this out. Some kind of XML microformat seems like a good idea to me. JSON would probably work fine too, though I prefer XML because JSON doesn't support comments.
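To make the idea concrete, here is a minimal sketch of what a reloadable CV file could look like and how a site might read it. The XML layout (the `cv`, `job`, `tech`, and `task` elements and their attributes) is invented for illustration and follows no published standard; the names and dates are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical CV document: every element and attribute name here is an
# assumption made for this example, not part of any existing format.
CV_XML = """\
<cv>
  <!-- XML allows comments like this one; JSON does not. -->
  <person name="Pat Example" email="pat@example.com"/>
  <experience>
    <job employer="Example Corp" start="2015-03" end="2019-08">
      <tech>PHP</tech>
      <tech>MySQL</tech>
      <task>Maintained the order-processing backend</task>
    </job>
    <job employer="Sample LLC" start="2019-09" end="2021-12">
      <tech>Python</tech>
      <task>Built internal reporting tools</task>
    </job>
  </experience>
</cv>
"""


def load_jobs(xml_text):
    """Return one dict per <job>, with dates copied verbatim from the file."""
    root = ET.fromstring(xml_text)
    jobs = []
    for job in root.iter("job"):
        jobs.append({
            "employer": job.get("employer"),
            "start": job.get("start"),
            "end": job.get("end"),
            "tech": [t.text for t in job.findall("tech")],
            "tasks": [t.text for t in job.findall("task")],
        })
    return jobs


jobs = load_jobs(CV_XML)
for job in jobs:
    print(job["employer"], job["start"], job["end"], job["tech"])
```

Because the dates and job boundaries are explicit in the file, the reader has nothing to guess at, which is the whole point: the same file could be uploaded to any site that understands the layout.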

Follow-up

Several comments indicated that some work has been done on this:

  • JSON Resume (h/t) is an open-source standard for CV data, with a growing collection of tools.
  • luc LMZ said that https://rezi.ai uses a standard methodology for parsing data from document structure and metadata.
  • CSepp suggested that the Europass CV format might have a machine-readable representation.
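As a rough illustration of the first suggestion, the sketch below saves a CV as JSON and reloads it unchanged. The field names (`basics`, `work`, `startDate`, and so on) are an assumption based on the JSON Resume schema and should be checked against the published schema; the person and employer are placeholders.

```python
import json

# Field names are assumed to match the JSON Resume schema; verify them
# against the published schema before relying on this layout.
resume = {
    "basics": {"name": "Pat Example", "email": "pat@example.com"},
    "work": [
        {
            "name": "Example Corp",
            "position": "Developer",
            "startDate": "2015-03-01",
            "endDate": "2019-08-31",
            "highlights": ["Maintained the order-processing backend"],
        }
    ],
}

saved = json.dumps(resume, indent=2)  # the text you would keep in a file
reloaded = json.loads(saved)          # what another site would read back

# Nothing is lost or invented on the way through.
assert reloaded == resume
```

The round trip is what matters here: work history entered once survives being saved and loaded elsewhere, dates included.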

My comments about the Hacker News reshare:

...the discussion there is interesting, but people seem to be overlooking my primary motivation in wanting this to happen, i.e. not having to frickin' manually enter my entire frickin' job-history every time I apply for a job on a new venue.

Yes, the job-hunting process is dehumanized, and yes, I suppose you could argue that greasing the process would just encourage that -- but it would make things easier for the job-seeker, whereas so far most of the greasing has been strictly for the benefit of the employers.

In my view, giving job-seekers more control over their data would be a small step towards rehumanizing the process, since we'd have to waste less time feeding the existing machinery and could also put in a lot more information (it's currently Just Not Worth It for me to enter even half my work-history if I have to do it by hand -- so a large part of my history effectively gets erased).