LinkedIn today announced that it’s open-sourcing a piece of its software called WhereHows, which lets anyone in a company discover and share information about the data the company manages. The software is now available under the Apache open-source license.

LinkedIn has many systems for storing and processing data, including Teradata’s data warehousing technology, the open source Hadoop Distributed File System (HDFS), the open source Hive data warehousing software, and its own open source real-time analytics software. Knowing exactly where a particular kind of data lives is not trivial. WhereHows helps by letting people run wide-ranging searches across all of these systems and by letting people post what they know about particular data.

Rather than showing people the data itself, WhereHows lets them track which types of data are available. In other words, it’s a tool for discovering and managing metadata. At LinkedIn, WhereHows is available both as a user interface and as an application programming interface (API) for developers. It serves up information on more than 25,000 publicly shared data sets from HDFS alone. It also tracks how data flows through multiple tools; for example, it surfaces 150,000 flows from LinkedIn’s open source job scheduler. Instead of keeping the software in-house, LinkedIn is sharing it so that other companies with complex systems can use it and build on it.
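To make the idea of a metadata catalog more concrete, here is a minimal Python sketch of the two things described above: searchable records about data sets spread across several storage systems (including notes people post about them), and flows that record which jobs turn which data sets into which others. This is a hypothetical illustration only, not WhereHows’ actual API or data model; the class names, fields, and sample data sets are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Metadata about a data set; the data itself is never stored here."""
    name: str
    platform: str  # hypothetical platform labels, e.g. "hdfs", "hive", "teradata"
    owner: str
    description: str = ""
    notes: list[str] = field(default_factory=list)  # knowledge posted by users


@dataclass
class Flow:
    """A job that reads some data sets and writes others."""
    job: str
    inputs: list[str]
    outputs: list[str]


class MetadataCatalog:
    """Hypothetical in-memory catalog: one index across all storage systems."""

    def __init__(self) -> None:
        self.datasets: dict[str, Dataset] = {}
        self.flows: list[Flow] = []

    def register(self, ds: Dataset) -> None:
        self.datasets[ds.name] = ds

    def annotate(self, name: str, note: str) -> None:
        # Lets anyone attach what they know about a data set.
        self.datasets[name].notes.append(note)

    def add_flow(self, flow: Flow) -> None:
        self.flows.append(flow)

    def search(self, term: str) -> list[Dataset]:
        # Case-insensitive substring match over names, descriptions, and notes,
        # regardless of which platform the data set lives on.
        term = term.lower()
        return [
            ds for ds in self.datasets.values()
            if term in " ".join([ds.name, ds.description, *ds.notes]).lower()
        ]

    def upstream(self, name: str) -> set[str]:
        # Walk flows backwards to find every data set that feeds `name`,
        # directly or through intermediate jobs.
        seen: set[str] = set()
        frontier = [name]
        while frontier:
            current = frontier.pop()
            for flow in self.flows:
                if current in flow.outputs:
                    for src in flow.inputs:
                        if src not in seen:
                            seen.add(src)
                            frontier.append(src)
        return seen


catalog = MetadataCatalog()
catalog.register(Dataset("tracking.page_views", "hdfs", "alice", "raw page view events"))
catalog.register(Dataset("daily_page_views", "hive", "bob", "page views aggregated per day"))
catalog.register(Dataset("member_engagement", "teradata", "carol", "engagement scores"))
catalog.add_flow(Flow("aggregate_views", ["tracking.page_views"], ["daily_page_views"]))
catalog.add_flow(Flow("score_engagement", ["daily_page_views"], ["member_engagement"]))
catalog.annotate("member_engagement", "refreshed nightly")

print([d.name for d in catalog.search("page")])      # data sets on two platforms
print(sorted(catalog.upstream("member_engagement")))  # lineage via two jobs
```

The search here is a plain substring scan to keep the sketch short; a real catalog covering tens of thousands of data sets would more likely rely on a proper search index.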

“We are open sourcing WhereHows on GitHub, as well as our , to share our work with the broader data community,” LinkedIn staff data engineer Eric Sun wrote in a blog post. “We highly encourage contributors from different companies to create new features and commit important bug fixes. Though metadata management tends to be tightly coupled to other components in the company, we will continue to try to refactor LinkedIn-internal integrations into WhereHows into generic templates or plugins in open source.”

This is hardly LinkedIn’s first open source contribution. Pinot became available last year, and before that, there were Azkaban, Kafka, Samza, and Voldemort.

But data discovery, or data cataloging, is a different category of software altogether. Many proprietary tools already exist; for instance, one startup released a product of this kind last year. So the WhereHows release could be a big deal for companies with complex data infrastructure. In return, LinkedIn could find people willing to improve the technology and perhaps even join the company.

LinkedIn wants to extend the software with integrations for tools like Kafka, Samza, Gobblin, and Nuage, and it could also add information on joins between different types of data, Sun wrote.

Documentation for all parts of WhereHows is available online.




About Rusty McMillen

Nationally recognized SMB Sales & Reputation Marketing Expert, creator of B.A.R.S., and CEO of onstraints, which offers services including Website Design and Development, SEO, SEM, SMM, Press Releases, Webinars, and Reputation Marketing, to mention just a few. We also believe that success depends on an Experienced and Proven Leadership Team that truly understands and embraces the nuances of your Small Business environment, brand, products, customers, goals, and, most importantly, your vision of success. Small and Medium sized business owners typically spend 95% of their time "IN" their business, rather than "ON" their business. We solve this. Our B.A.R.S. program lets business owners monitor everything pertaining to their online presence through proprietary systems and years of experience and expertise in multiple disciplines, giving them a global view of their marketing and online presence so they can make intelligent, well-informed, decisive business decisions that dramatically affect their ROI and bottom line.