Architecture
- How does the Praxeon API integrate into my platform?
The API is an HTTP web service running in the Amazon EC2 hosted environment. The API itself is language-independent and can be accessed from Java, Ruby, PHP, Python, Perl, or .NET. Responses can be encoded in XML or JSON; search results can also be provided in Atom format according to the OpenSearch protocol.
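As a rough illustration of calling an HTTP service of this kind from Python, the sketch below builds a search URL and parses a JSON response. The base URL, the parameter names (q, format), and the response shape (a "hits" list with "title" and "score") are assumptions made for the example; this FAQ does not specify them.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL and parameter names; the FAQ does not specify them.
BASE_URL = "https://api.example.com/search"
SUPPORTED_FORMATS = ("xml", "json", "atom")


def build_search_url(question, response_format="json"):
    """Return a GET URL for a naturally phrased question."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError("unsupported format: " + response_format)
    return BASE_URL + "?" + urlencode({"q": question, "format": response_format})


def parse_hits(body):
    """Extract (title, score) pairs from a JSON body with an assumed shape."""
    payload = json.loads(body)
    return [(hit["title"], hit["score"]) for hit in payload.get("hits", [])]


# A made-up response body, used only to exercise the parser.
sample_body = '{"hits": [{"title": "MS treatment overview", "score": 0.92}]}'
url = build_search_url("what treats multiple sclerosis")
hits = parse_hits(sample_body)
```

The same pattern carries over to any of the languages listed above, since only an HTTP client and an XML or JSON parser are needed.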
- How do I utilize both internal (intranet, proprietary systems) and external data sources?
Internal content must be loaded into the search engine. Many relevant external data sources are already built into the search engine, such as PubMed, ClinicalTrials.gov, National Guideline Clearinghouse, FDA DailyMed, news, blogs, community discussions, and videos. The engine also includes a robust feed fetcher and parser, so if your content is available through an RSS or Atom feed, a custom connector will not be required.
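To give a sense of why a standard feed needs no custom connector, here is a minimal sketch of reading items from an RSS 2.0 document with Python's standard library. It is not the engine's own fetcher; the sample feed and the function name parse_rss_items are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed standing in for a customer's content feed.
SAMPLE_RSS = (
    '<rss version="2.0"><channel>'
    "<title>Example Clinic News</title>"
    "<item><title>New guideline published</title>"
    "<link>https://example.com/a</link></item>"
    "<item><title>Trial results announced</title>"
    "<link>https://example.com/b</link></item>"
    "</channel></rss>"
)


def parse_rss_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item> element."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


items = parse_rss_items(SAMPLE_RSS)
```

Because the item structure is defined by the feed format rather than by the publisher, the same parsing code works for any valid feed.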
- How do you locate content sources?
The search engine is not a web crawler. Each content source must be identified through a collaborative process and then integrated. Our process for fingerprinting data is the key to granular data management: we pay careful attention to text, document sections, titles, and metadata, and ensure that the results are as you expect. We can also incorporate content provided by any valid RSS or Atom feed. Feeds are the primary means by which we import news, blogs, and community discussions.
- Does the search engine support fuzzy matching?
The search engine accepts naturally phrased questions. The hits returned by the search engine are fuzzy matches to the user's inputs, with some constraints that have been heuristically determined to reduce false positives. These constraints are based on Fingerprint.MD's deep understanding of medical terminology and structure.
- How does the system identify keywords?
The system uses its medical model to identify keywords. For example, "multiple sclerosis" is treated as a single concept rather than as two keywords. Words that the medical model does not recognize are treated as regular text, just as a non-medical search engine would treat them. With our proprietary algorithms and medical model, we recognize 99% of medical terminology. Where a concept or term is not recognized, we fall back to a general keyword search, so at worst we do at least as well as a general search engine. We also curate our medical model to ensure the highest possible quality.
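The idea of treating a multi-word term as one concept and falling back to plain text for unrecognized words can be sketched with a greedy longest-match over a small vocabulary. This is only an illustration: the CONCEPTS set is a toy stand-in for the medical model, and the proprietary algorithms are not described in this FAQ.

```python
# Toy vocabulary standing in for the medical model (hypothetical).
CONCEPTS = {("multiple", "sclerosis"), ("heart", "attack"), ("aspirin",)}
MAX_LEN = max(len(concept) for concept in CONCEPTS)


def identify_keywords(query):
    """Label each span of the query as a known 'concept' or plain 'text'."""
    tokens = query.lower().split()
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in CONCEPTS:
                result.append((" ".join(candidate), "concept"))
                i += n
                break
        else:
            # No concept starts here: keep the word as regular text.
            result.append((tokens[i], "text"))
            i += 1
    return result


labels = identify_keywords("Multiple sclerosis fatigue")
```

Here "multiple sclerosis" comes back as a single concept, while "fatigue", which is not in the toy vocabulary, is kept as an ordinary keyword.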
- How does the system handle document metadata?
Document tags are recognized and indexed. If the tags are medical terms, they will be recognized and searchable just like the text of the document. Searches can also specify tag values explicitly.
Scoring
- How is the hit score computed?
In the most general case, the quality of a hit is based on its relevance to the user's question (or keyword), the quality of the content source, alignment with a user profile (if any), and the timeliness of the information (older content gets a lower score).
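One way to picture a score built from these four factors is a weighted average in which timeliness decays with document age. The sketch below is an assumption-laden illustration, not the engine's formula: the weights, the exponential half-life decay, and all names (Hit, hit_score, timeliness) are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    relevance: float          # match to the question, in [0, 1]
    source_quality: float     # quality of the content source, in [0, 1]
    profile_alignment: float  # fit with the user profile, in [0, 1]
    age_days: float           # age of the content in days


# Illustrative weights only; the real weighting is not given in this FAQ.
DEFAULT_WEIGHTS = {
    "relevance": 0.5,
    "source_quality": 0.2,
    "profile_alignment": 0.1,
    "timeliness": 0.2,
}


def timeliness(age_days, half_life_days=365.0):
    """Assumed decay: the value halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def hit_score(hit, weights=DEFAULT_WEIGHTS):
    """Weighted average of the four quality components."""
    components = {
        "relevance": hit.relevance,
        "source_quality": hit.source_quality,
        "profile_alignment": hit.profile_alignment,
        "timeliness": timeliness(hit.age_days),
    }
    total = sum(weights.values())
    return sum(weights[k] * components[k] for k in weights) / total


fresh = Hit(0.8, 0.8, 0.8, age_days=0)
old = Hit(0.8, 0.8, 0.8, age_days=730)
```

With identical relevance, source quality, and profile fit, the two-year-old hit scores lower than the fresh one, matching the behavior described above. Passing a different weights mapping on each call models per-search weighting.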
- Can we set up our own scoring metric?
The relative weighting of the different quality metrics is customizable per search. Customers can also set up their own content collections with various preferences, and custom refinements can be provided along with the search query to influence the scoring. Alternatively, results can be selected statistically and then sorted by date.
- How are user profiles incorporated into the scores?
The profile information can be passed in along with the query. The medical model is applied to the profile information just as it is applied to the document collection and the query. The degree to which the profile influences the hit scoring is customizable, but generally speaking the profile serves to guide the relevance of the hits.
Customization
- How do I search a specific set of designated sources?
Each search operates on a specific corpus. A corpus is a defined collection of documents, together with preferences based on tags and content sources.
- How can the user get answers from preferred sources?
A boost factor can be applied at search time to the preferred sources. Alternatively, a corpus can be defined that includes only the preferred sources.
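The difference between the two options can be sketched as follows: a boost re-ranks all hits in favor of preferred sources, while a restricted corpus drops everything else. The hit structure, the boost value of 1.5, and the function names are assumptions for illustration only.

```python
def apply_source_boost(hits, preferred, boost=1.5):
    """Multiply scores of hits from preferred sources, then re-rank."""
    boosted = [
        dict(h, score=h["score"] * boost) if h["source"] in preferred else dict(h)
        for h in hits
    ]
    return sorted(boosted, key=lambda h: h["score"], reverse=True)


def restrict_to_corpus(hits, corpus_sources):
    """Keep only hits whose source belongs to the corpus."""
    return [h for h in hits if h["source"] in corpus_sources]


# Made-up hits used only to show the two behaviors.
hits = [
    {"source": "blog", "score": 0.6},
    {"source": "PubMed", "score": 0.5},
]
ranked = apply_source_boost(hits, preferred={"PubMed"})
only_preferred = restrict_to_corpus(hits, corpus_sources={"PubMed"})
```

Boosting still lets a strong hit from a non-preferred source appear in the results, whereas a corpus limited to preferred sources never returns it.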
- Can the system learn from past searches?
We are currently developing a system that collaboratively rates medical news articles and applies what it learns back to the search results. We can demonstrate this with MyDailyApple News, which provides "My Personal News" and "Most Popular News" based on the activity of the community.
- Can you implement different content requirements for different user populations?
Absolutely; see Curbside.MD and MyDailyApple, which offer similar functionality tailored for different audiences. In addition, our unique "Your Medical Expertise" feature enables a user to choose between Introductory, Intermediate, and Professional search results.
