Artsy: Artsy uses an automated data pipeline to compute an artwork similarity graph that underpins The Art Genome Project's discovery features, processing large batches of artwork data offline to generate similarity scores used in search and browse results. | AI Trace
Data Analysis | Verified
Details
According to Artsy's engineering blog, the artwork similarity graph that powers The Art Genome Project is computed offline, either by a generic job engine written in Ruby or on Amazon Elastic MapReduce. The pipeline takes data snapshots from MongoDB, runs the computation jobs against them, and exports the results back to the production database. Artsy also uses Jupyter Notebooks with pandas and scikit-learn for more in-depth data analysis. The exported similarity scores are what surface related artworks throughout the platform.
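
The snapshot, compute, export flow described above can be illustrated with a small sketch. This is not Artsy's code: the source describes a Ruby job engine and Elastic MapReduce, and it does not say which similarity measure is used. The example below assumes, purely for illustration, that each artwork in a snapshot is a mapping from feature name to weight and that similarity is cosine similarity. The names `cosine_similarity`, `build_similarity_graph`, `top_k`, and the sample artwork IDs are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_graph(snapshot, top_k=2):
    """Batch step: for every artwork, keep its top_k most similar neighbours.

    `snapshot` stands in for an offline copy of the artwork data
    (assumed shape: {artwork_id: {feature_name: weight}}).
    The returned dict stands in for the result exported back to production.
    """
    graph = {}
    for artwork_id, features in snapshot.items():
        scored = [
            (other_id, cosine_similarity(features, other_features))
            for other_id, other_features in snapshot.items()
            if other_id != artwork_id
        ]
        # Highest score first; ties keep their snapshot order.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        graph[artwork_id] = scored[:top_k]
    return graph


# Hypothetical snapshot with made-up feature weights.
snapshot = {
    "artwork-a": {"abstract": 1.0, "blue": 0.5},
    "artwork-b": {"abstract": 0.9, "blue": 0.4},
    "artwork-c": {"portrait": 1.0},
}
graph = build_similarity_graph(snapshot, top_k=1)
```

The all-pairs loop is quadratic in the number of artworks, which is one reason a computation like this is better suited to an offline batch job than to a request-time query.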