Pinterest open-sources Querybook, its big data analytics tool
Pinterest has open-sourced Querybook, its data management tool for remote engineering collaboration at enterprise scale. Pinterest has been using the tool internally; it lets engineers compose queries, create analyses, and collaborate through a notebook interface.
Querybook began in 2017 as an intern project at Pinterest, and the development team soon decided to build a document-like interface where users could write queries and analyze data in one place. The tool launched internally in March 2018 and has since become Pinterest's go-to solution for big data analytics. It currently averages 500 daily active users and 7,000 query runs per day.
Querybook analyzes each executed query to extract metadata such as the tables it references and the users who ran it. This metadata is used to automatically update the data schema and search ranking, and to show each table's frequent users and example queries. As more queries run through Querybook, the documentation of its tables keeps improving.
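The article does not describe how Querybook parses queries internally, so the following is only a minimal, hypothetical sketch of the general idea: pull referenced table names out of each executed query, then aggregate per-table run counts, frequent users, and example queries. It assumes a naive regular expression that matches names after `FROM`/`JOIN` instead of a real SQL parser, and the names `referenced_tables` and `QueryLog` are invented for illustration; they are not part of Querybook.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, naive pattern: an (optionally schema-qualified) name after FROM or JOIN.
# Assumption for illustration only; it ignores CTEs, quoted identifiers, and comments.
TABLE_PATTERN = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)?)", re.IGNORECASE
)


def referenced_tables(query: str) -> set:
    """Return the lowercase set of table names the query appears to read."""
    return {name.lower() for name in TABLE_PATTERN.findall(query)}


class QueryLog:
    """Hypothetical per-table metadata accumulated from executed queries."""

    def __init__(self, max_examples: int = 3) -> None:
        self.max_examples = max_examples
        self.query_counts = Counter()                 # table -> number of queries
        self.users_by_table = defaultdict(Counter)    # table -> {user: runs}
        self.examples_by_table = defaultdict(list)    # table -> sample queries

    def record(self, user: str, query: str) -> None:
        """Update metadata for every table the query references."""
        for table in referenced_tables(query):
            self.query_counts[table] += 1
            self.users_by_table[table][user] += 1
            examples = self.examples_by_table[table]
            if len(examples) < self.max_examples:
                examples.append(query)

    def frequent_users(self, table: str, n: int = 5) -> list:
        """Users who queried the table most often, most frequent first."""
        return self.users_by_table[table].most_common(n)

    def popular_tables(self, n: int = 10) -> list:
        """Tables ranked by query count, a simple stand-in for a search-ranking signal."""
        return self.query_counts.most_common(n)


if __name__ == "__main__":
    log = QueryLog()
    log.record("alice", "SELECT * FROM ads.impressions i JOIN ads.clicks c ON i.id = c.id")
    log.record("bob", "select count(*) from ads.impressions")
    log.record("alice", "SELECT user_id FROM ads.impressions WHERE dt = '2021-03-01'")
    print(log.frequent_users("ads.impressions"))  # [('alice', 2), ('bob', 1)]
    print(log.popular_tables())  # [('ads.impressions', 3), ('ads.clicks', 1)]
```

Counting each table at most once per query keeps a self-join from inflating its popularity; a production system would use a proper SQL parser rather than this regex.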
One of Querybook's notable features is its admin interface, which lets administrators configure query engines, table metadata ingestion, and access permissions. Admins can make live changes to Querybook without editing code or config files. Querybook also supports a range of visualizations, including line, bar, stacked area, pie, donut, scatter, and table charts.
“We developed Querybook with a vision to provide a responsive and straightforward web user interface for analysis so that data scientists, product managers, and engineers can explore the relevant data, compose their queries, and disseminate their findings,” Pinterest wrote in a blog post.