ETFs, Mutual Funds, and Holdings Data: Content Retrieval
- Claude Paugh
- 3 days ago
Updated: 1 day ago
As I mentioned in an earlier post, I chose Couchbase to warehouse the documents because of its SQL-like access, its ability to scale horizontally across multiple nodes, and its built-in analytics features. I had intended to load company-specific filings (10-Q, 10-K, etc.) and try the full-text search capability, but I have not been able to do that yet.
If you're a software engineer, there are various SDKs and connectors available. If you just want to look at document content, you can use either the built-in "Query" section of the Couchbase console or a third-party tool that has a driver to connect. You can also buy drivers from a Couchbase partner and use them with any tool that supports Couchbase.
I use DataGrip from JetBrains (I bought a license), and they provide their own drivers:


As you can see in the result set above, the "columns" are keys from the underlying JSON document, and the values in the grid are the corresponding data. The references are the same whether you use DataGrip or the Query section of the Couchbase UI.
The "columns" are the keys; if the document has nesting, the "column" is the path within the JSON structure, e.g. first.second.third. The "table" in the FROM clause consists of bucket-name.scope.collection. You must use backticks ( ` ) to surround the bucket name, and you can also use them for the scope and collection, as I have above. The WHERE clause and some functions such as count, avg, min, and max work the same as in SQL.
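To make the naming rules concrete, here is a minimal Python sketch that only assembles the text of a SQL++ statement; it does not connect to Couchbase. The bucket, scope, collection, and field names (`fund-data`, `filings`, `holdings`, `fund.name`, `totalAssets`) are hypothetical placeholders, not the names used in my cluster, and the helper functions are made up for illustration.

```python
from typing import Iterable, Optional


def quote_identifier(name: str) -> str:
    """Wrap one identifier in backticks; refuse names that contain a backtick."""
    if not name or "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def keyspace(bucket: str, scope: str, collection: str) -> str:
    """Build the 'table' reference: bucket.scope.collection, each part quoted."""
    return ".".join(quote_identifier(p) for p in (bucket, scope, collection))


def field_path(dotted: str) -> str:
    """Turn a nested key such as first.second.third into a quoted 'column' path."""
    return ".".join(quote_identifier(k) for k in dotted.split("."))


def build_select(bucket: str, scope: str, collection: str,
                 fields: Iterable[str], where: Optional[str] = None) -> str:
    """Compose a SELECT statement; the WHERE text is passed through unchanged."""
    columns = ", ".join(field_path(f) for f in fields)
    statement = f"SELECT {columns} FROM {keyspace(bucket, scope, collection)}"
    if where:
        statement += f" WHERE {where}"
    return statement


# Hypothetical names; $min_assets is a named parameter to be bound at run time.
statement = build_select(
    "fund-data", "filings", "holdings",
    ["fund.name", "totalAssets"],
    where="`totalAssets` > $min_assets",
)
print(statement)
```

The resulting string can be pasted into the Query section of the console or a third-party tool, or handed to one of the SDKs. Keeping the threshold as a named parameter, instead of formatting the value into the text, avoids quoting problems with user-supplied values.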
Example SQL++ to get Funds
Index Creation
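As a rough illustration of what an index definition looks like, the Python sketch below assembles a SQL++ CREATE INDEX statement as text. It is not the index I created; the index name, keyspace, and field names are hypothetical, and the helper functions exist only for this example.

```python
from typing import Iterable


def quote_identifier(name: str) -> str:
    """Wrap one identifier in backticks; refuse names that contain a backtick."""
    if not name or "`" in name:
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def build_create_index(index_name: str, bucket: str, scope: str,
                       collection: str, fields: Iterable[str]) -> str:
    """Compose CREATE INDEX <name> ON bucket.scope.collection(key1, key2, ...)."""
    target = ".".join(quote_identifier(p) for p in (bucket, scope, collection))
    keys = ", ".join(
        ".".join(quote_identifier(k) for k in field.split("."))
        for field in fields
    )
    return f"CREATE INDEX {quote_identifier(index_name)} ON {target}({keys})"


# Hypothetical index over a nested key and a top-level key.
ddl = build_create_index(
    "idx_fund_name", "fund-data", "filings", "holdings",
    ["fund.name", "reportDate"],
)
print(ddl)
```

An index on the keys that appear in the WHERE clause is generally what lets a query avoid scanning every document in the collection.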
Why did I choose to store this data?
I thought the consolidation of this data could serve a few different use cases:
- Trending across asset managers or within individual asset managers
- Pattern analysis of which positions changed quarter over quarter; mapping those changes to news events could reveal habits or patterns that could be an advantage
- Accuracy of filings: I wanted to know whether the filings matched the investor reports that the asset managers produce quarterly. The answer should be yes, but if it's no, why?
- Currency exposure over periods of time, and which asset managers have more or less exposure
- Monthly fund flows within an asset manager complex, or across the industry in the U.S. These could be analyzed across asset types to determine which characteristics are popular at the moment.
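The quarter-over-quarter comparison in the list above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each quarter's holdings are already reduced to a mapping of a security identifier to a share count, and the tickers and numbers are invented, not taken from any filing.

```python
from typing import Dict


def position_changes(previous: Dict[str, int],
                     current: Dict[str, int]) -> Dict[str, int]:
    """Return the change in shares per security between two quarters.

    Securities missing from a quarter are treated as a zero position,
    so new buys show up as positive and full exits as negative.
    Unchanged positions are left out of the result.
    """
    changes: Dict[str, int] = {}
    for security in sorted(set(previous) | set(current)):
        delta = current.get(security, 0) - previous.get(security, 0)
        if delta != 0:
            changes[security] = delta
    return changes


# Invented example data for two consecutive quarters.
q1 = {"AAA": 100, "BBB": 50, "DDD": 10}
q2 = {"AAA": 120, "CCC": 30, "DDD": 10}

print(position_changes(q1, q2))  # {'AAA': 20, 'BBB': -50, 'CCC': 30}
```

Matching the dates of the larger changes against news events would be a separate step on top of this.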
That's a start, but there are other analytics and research use cases that could be applied. I had also thought of a twist on the analysis: what could I get from loading the data into a graph database?