What we've done in 2015 so far

Posted by Zev Averbach on March 1, 2015

Since mid-January we’ve been building a PDF parser that takes SEC 10-K (annual) filings and extracts key pieces of information from them. So far we’ve succeeded in batch-converting several PDFs into strings, locating (more or less) the starting page number of a specific table-of-contents entry, and then extracting the text from that section.
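
To make the locate-and-extract steps concrete, here is a minimal sketch in Python. It is an illustration, not the parser itself: the choice of Python, the function names `find_toc_page` and `extract_section`, and the sample data are all assumptions made for this example. It also assumes each PDF has already been converted into a list of page strings, and that the page number printed in the table of contents matches the page's position in the file, which real filings often violate (hence the "more or less").

```python
import re


def find_toc_page(toc_text, entry_title):
    """Return the page number printed after entry_title in a table of
    contents, or None if the entry is not found.

    Hypothetical helper: assumes each entry sits on one line that ends
    with its page number, optionally preceded by dot leaders.
    """
    pattern = re.compile(
        re.escape(entry_title) + r"[ \t.]*(\d+)[ \t]*$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(toc_text)
    return int(match.group(1)) if match else None


def extract_section(pages, start_page, end_page=None):
    """Join the text of pages start_page..end_page-1 (1-based page numbers).

    Hypothetical helper: assumes printed page numbers equal positions in
    the pages list. If end_page is None, read through the last page.
    """
    stop = None if end_page is None else end_page - 1
    return "\n".join(pages[start_page - 1:stop])


# Made-up stand-in for the strings produced by the PDF-to-text step.
sample_pages = [
    "Table of Contents\n"
    "Item 1. Business ........ 2\n"
    "Item 1A. Risk Factors ........ 3\n"
    "Item 7. Management's Discussion ........ 4",
    "Item 1. Business\nWe make widgets.",
    "Item 1A. Risk Factors\nWidgets may break.",
    "Item 7. Management's Discussion\nSales rose.",
]

start = find_toc_page(sample_pages[0], "Item 1A. Risk Factors")
end = find_toc_page(sample_pages[0], "Item 7. Management's Discussion")
risk_factors_text = extract_section(sample_pages, start, end)
print(risk_factors_text)
```

Using the next table-of-contents entry as the end of the section is one simple way to bound the extraction. A filing whose front matter shifts the page count would need an offset added to both page numbers.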

Because SEC filings are available on a rolling basis from the SEC’s EDGAR database, we’re going to focus next on building our API skills. During or after that, we’ll start the build over again, this time working with XBRL data.