Listing ID : 426378
Posted on January 3, 2024
Our client’s website is a directory and search engine that helps software engineers find and compare open-source projects to use in building software. Finding the right open-source software for a given engineering problem is often extremely difficult using GitHub or Google. The website also lets engineers compare the available options across several key metrics to determine which open-source project will best solve their problem.
In its most successful month, the website had 1,700,000 unique users, $19,400 in passive advertising revenue, and a monthly growth rate of 7%.
The website provides a faceted search and browse feature with autocomplete, which lets users see the open-source projects that are useful to them. The site accomplishes this by combining the topics that describe each project, so users can filter down into more specific lists of projects to find the one they need. The site uses a machine learning model to add topics to projects that don’t have topics on GitHub. This matters because it produces more and better search results, along with lists of relevant projects drawn from a very large pool of open-source projects.
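To illustrate how combining topics narrows a list of projects, here is a minimal sketch of faceted filtering. It is not the site's actual implementation: the project names, topics, and the helpers `filter_projects` and `facet_counts` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample data: project name -> set of topics describing it.
PROJECTS = {
    "proj-a": {"python", "crawler"},
    "proj-b": {"python", "parser"},
    "proj-c": {"go", "crawler"},
}


def filter_projects(selected: set) -> list:
    """Return the projects that carry every selected topic."""
    return sorted(name for name, topics in PROJECTS.items() if selected <= topics)


def facet_counts(selected: set) -> Counter:
    """Count the remaining topics among matching projects.

    These counts are what a faceted UI would offer as further filters.
    """
    counts = Counter()
    for topics in PROJECTS.values():
        if selected <= topics:
            counts.update(topics - selected)
    return counts


if __name__ == "__main__":
    print(filter_projects({"python"}))
    print(facet_counts({"python"}))
```

Each additional selected topic shrinks the result list, and the facet counts show which further topics can still be combined with the current selection.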
The site provides manual, human categorization of the 7,000 most popular topics, such as placing the topic “reactjs” in the category “Web User Interface”. This categorization helps users find what they need and is used throughout the site: in the directory user interface, the search, the alternatives, and many other places. The topics were categorized by hand by the owner of the website, who has 25 years’ experience in software engineering. A drag-and-drop tool can be used to keep updating the categories. The site also uses extensive manual and automated canonicalization, i.e., the process of converting data that has more than one possible representation into a “standard”, “normal”, or canonical form. As a result, when users search for or browse any of a set of similar or duplicated topics (for example, ‘crawler’, ‘crawlers’, ‘web-crawler’, or ‘web-crawlers’), they see results that map to one single topic, since all those terms mean the same thing.
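The canonicalization idea can be sketched in a few lines. This is only an illustration under stated assumptions: the tables `KNOWN_TOPICS` and `ALIASES` and the function `canonicalize` are hypothetical, and the plural handling is a deliberately naive stand-in for whatever manual and automated rules the site actually uses.

```python
# Hypothetical canonical topics and alias table (not the site's real data).
KNOWN_TOPICS = {"crawler", "reactjs"}
ALIASES = {"web-crawler": "crawler"}


def canonicalize(topic: str) -> str:
    """Map a raw topic string to its canonical form, if one is known."""
    # Normalize case and separators first.
    t = topic.strip().lower().replace("_", "-").replace(" ", "-")

    # Try the topic as written, then a naive singular form.
    candidates = [t]
    if t.endswith("s"):
        candidates.append(t[:-1])

    for candidate in candidates:
        candidate = ALIASES.get(candidate, candidate)
        if candidate in KNOWN_TOPICS:
            return candidate

    # Unknown topics are returned normalized but otherwise unchanged.
    return t


if __name__ == "__main__":
    for raw in ("crawler", "crawlers", "web-crawler", "web-crawlers"):
        print(raw, "->", canonicalize(raw))
```

Checking the unmodified topic before the singular form keeps names such as “reactjs” from being truncated just because they end in “s”.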
The site collects package manager metadata from around 30 package managers and combines it with project metadata from GitHub. This lets users see key metrics for each open-source project and decide which one best solves their problem. The site also suggests alternatives to the project a user is viewing, with a user interface that makes it easy to compare them. This feature uses advanced lemmatization and string-matching algorithms. At a high level, the algorithm matches parts of words against other parts of words inside the project description. It takes the description of a project, say Foo, and checks whether it contains partial matches that map to something in the list of 7,000 hand-categorized and canonicalized topics. Candidates are matched against different root word forms and partial string matches at various positions, so that many variants that don’t match one for one but mean the same thing map to the same topic. The algorithm has extensive unit tests to keep its quality high.
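The description-to-topic matching described above might look roughly like the following sketch. It is an assumption-laden illustration, not the site's algorithm: the crude suffix stripper `stem` stands in for real lemmatization, and the topic list, `TOPIC_INDEX`, and `match_topics` are hypothetical.

```python
import re

# Hypothetical canonical topics (the real list has about 7,000 entries).
TOPICS = ["crawler", "parser", "scraping"]


def stem(word: str) -> str:
    """Crude suffix stripping, used here in place of real lemmatization."""
    for suffix in ("ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Root form of each topic -> canonical topic.
TOPIC_INDEX = {stem(topic): topic for topic in TOPICS}


def match_topics(description: str) -> list:
    """Return canonical topics whose root form matches a word in the description."""
    found = []
    for word in re.findall(r"[a-z]+", description.lower()):
        root = stem(word)
        for topic_root, topic in TOPIC_INDEX.items():
            # Exact root match, or a partial match anywhere inside the word.
            partial = len(topic_root) >= 4 and topic_root in root
            if (root == topic_root or partial) and topic not in found:
                found.append(topic)
    return found


if __name__ == "__main__":
    print(match_topics("A fast tool that crawls websites and parses HTML"))
```

The length check on partial matches is a simple guard against very short roots matching unrelated words; a production matcher would need far more careful rules, which is where the unit tests mentioned above come in.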
An NDA is required to obtain the comprehensive Confidential Information Memorandum (CIM) crafted by Pronova Partners.