generated from discourse/discourse-plugin-skeleton
FEATURE: add inferred concepts system #1330
Merged
+2,713
−20
36 commits
04c6e55 FEATURE: add inferred concepts system (xfalcox)
1fec8dc FEATURE: Extend inferred concepts to include posts (xfalcox)
446e6e7 lint (xfalcox)
170c657 small fixes (xfalcox)
990a17a tests (xfalcox)
7fdd67f pass context (xfalcox)
5fe78aa dots are not commas (xfalcox)
ad94cb7 Dedup concepts (xfalcox)
6ad2f7b Working deduplication (xfalcox)
0e817a7 cleaning up (xfalcox)
2312a6f post rebase fixes (xfalcox)
c1fc3c7 cleanup (xfalcox)
3a69133 rubocop (xfalcox)
f42af99 create/add (xfalcox)
5682e8d loooooooong (xfalcox)
a73e9e6 Add proper support for array types in structured outputs (romanrizzi)
5500b1c tests (xfalcox)
2d6f4ea rubocop (xfalcox)
6cb7b4d stree (xfalcox)
491691f annoying fixes (xfalcox)
8355a23 tests (xfalcox)
2b52ea5 job test (xfalcox)
016f8be moar tests (xfalcox)
cd3fd8a linter (xfalcox)
1485e46 fix tests (xfalcox)
34c28e6 simplify persona tests (xfalcox)
23530a9 make it simpler (xfalcox)
e8d8340 rubocop (xfalcox)
5d9613b review fixes (xfalcox)
9a8207d remove settings area for now (xfalcox)
fcce5ed post review follow up (xfalcox)
d9fc621 fix area name (xfalcox)
a8e1918 fix table name for rails convention (xfalcox)
b73fec4 annotation (xfalcox)
849acd0 feature tests (xfalcox)
2958417 Update app/controllers/discourse_ai/ai_bot/bot_controller.rb (xfalcox)
@@ -0,0 +1,70 @@
# frozen_string_literal: true

module Jobs
  class GenerateInferredConcepts < ::Jobs::Base
    sidekiq_options queue: "low"

    # Process items to generate new concepts
    #
    # @param args [Hash] Contains job arguments
    # @option args [String] :item_type Required - Type of items to process ('topics' or 'posts')
    # @option args [Array<Integer>] :item_ids Required - List of item IDs to process
    # @option args [Integer] :batch_size (100) Number of items to process in each batch
    # @option args [Boolean] :match_only (false) Only match against existing concepts without generating new ones
    def execute(args = {})
      return if args[:item_ids].blank? || args[:item_type].blank?

      if %w[topics posts].exclude?(args[:item_type])
        Rails.logger.error("Invalid item_type for GenerateInferredConcepts: #{args[:item_type]}")
        return
      end

      # Process items in smaller batches to avoid memory issues
      batch_size = args[:batch_size] || 100

      # Get the list of item IDs
      item_ids = args[:item_ids]
      match_only = args[:match_only] || false

      # Process items in batches
      item_ids.each_slice(batch_size) do |batch_item_ids|
        process_batch(batch_item_ids, args[:item_type], match_only)
      end
    end

    private

    def process_batch(item_ids, item_type, match_only)
      klass = item_type.singularize.classify.constantize
      items = klass.where(id: item_ids)
      manager = DiscourseAi::InferredConcepts::Manager.new

      items.each do |item|
        begin
          process_item(item, item_type, match_only, manager)
        rescue => e
          Rails.logger.error(
            "Error generating concepts from #{item_type.singularize} #{item.id}: #{e.message}\n#{e.backtrace.join("\n")}",
          )
        end
      end
    end

    def process_item(item, item_type, match_only, manager)
      # Use the Manager method that handles both identifying and creating concepts
      if match_only
        if item_type == "topics"
          manager.match_topic_to_concepts(item)
        else # posts
          manager.match_post_to_concepts(item)
        end
      else
        if item_type == "topics"
          manager.generate_concepts_from_topic(item)
        else # posts
          manager.generate_concepts_from_post(item)
        end
      end
    end
  end
end
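The batching and per-item rescue in `GenerateInferredConcepts` can be illustrated without Rails or Sidekiq. The following is a minimal sketch, not the plugin's code: `process_in_batches` and its return hash are hypothetical names introduced here, and the block stands in for the `Manager` call on a single item. It shows that a failing item is recorded and skipped while the remaining items still run.

```ruby
# Hypothetical helper (not part of the PR): slices IDs into batches and
# isolates failures per item, mirroring the each_slice + rescue pattern
# used by the job.
def process_in_batches(item_ids, batch_size: 100)
  processed = []
  failures = []

  item_ids.each_slice(batch_size) do |batch|
    batch.each do |id|
      begin
        yield id # stand-in for the Manager call on a single item
        processed << id
      rescue StandardError => e
        # Record the failure and continue with the remaining items.
        failures << [id, e.message]
      end
    end
  end

  { processed: processed, failures: failures }
end

result =
  process_in_batches([1, 2, 3, 4, 5], batch_size: 2) do |id|
    raise ArgumentError, "bad item" if id == 3
  end
```

Here `result[:processed]` holds the IDs that completed and `result[:failures]` pairs each failed ID with its error message; the job in the diff logs the failure instead of collecting it.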
87 changes: 87 additions & 0 deletions
app/jobs/scheduled/generate_concepts_from_popular_items.rb
@@ -0,0 +1,87 @@
# frozen_string_literal: true

module Jobs
  class GenerateConceptsFromPopularItems < ::Jobs::Scheduled
    every 1.day

    # This job runs daily and generates new concepts from popular topics and posts
    # It selects items based on engagement metrics and generates concepts from their content
    def execute(_args)
      return unless SiteSetting.inferred_concepts_enabled

      process_popular_topics
      process_popular_posts
    end

    private

    def process_popular_topics
      # Find candidate topics that are popular and don't have concepts yet
      manager = DiscourseAi::InferredConcepts::Manager.new
      candidates =
        manager.find_candidate_topics(
          limit: SiteSetting.inferred_concepts_daily_topics_limit || 20,
          min_posts: SiteSetting.inferred_concepts_min_posts || 5,
          min_likes: SiteSetting.inferred_concepts_min_likes || 10,
          min_views: SiteSetting.inferred_concepts_min_views || 100,
          created_after: SiteSetting.inferred_concepts_lookback_days.days.ago,
        )

      return if candidates.blank?

      # Process candidate topics - first generate concepts, then match
      Jobs.enqueue(
        :generate_inferred_concepts,
        item_type: "topics",
        item_ids: candidates.map(&:id),
        batch_size: 10,
      )

      if SiteSetting.inferred_concepts_background_match
        # Schedule a follow-up job to match existing concepts
        Jobs.enqueue_in(
          1.hour,
          :generate_inferred_concepts,
          item_type: "topics",
          item_ids: candidates.map(&:id),
          batch_size: 10,
          match_only: true,
        )
      end
    end

    def process_popular_posts
      # Find candidate posts that are popular and don't have concepts yet
      manager = DiscourseAi::InferredConcepts::Manager.new
      candidates =
        manager.find_candidate_posts(
          limit: SiteSetting.inferred_concepts_daily_posts_limit || 30,
          min_likes: SiteSetting.inferred_concepts_post_min_likes || 5,
          exclude_first_posts: true,
          created_after: SiteSetting.inferred_concepts_lookback_days.days.ago,
        )

      return if candidates.blank?

      # Process candidate posts - first generate concepts, then match
      Jobs.enqueue(
        :generate_inferred_concepts,
        item_type: "posts",
        item_ids: candidates.map(&:id),
        batch_size: 10,
      )

      if SiteSetting.inferred_concepts_background_match
        # Schedule a follow-up job to match against existing concepts
        Jobs.enqueue_in(
          1.hour,
          :generate_inferred_concepts,
          item_type: "posts",
          item_ids: candidates.map(&:id),
          batch_size: 10,
          match_only: true,
        )
      end
    end
  end
end
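The scheduled job enqueues in two phases: an immediate generation pass, then an optional `match_only` pass one hour later when `inferred_concepts_background_match` is enabled. A plain-Ruby sketch of that decision is below. `build_concept_jobs`, the `run_in` key, and the payload shape are hypothetical and exist only for illustration; the real job calls `Jobs.enqueue` and `Jobs.enqueue_in` directly.

```ruby
# Hypothetical helper (not part of the PR): returns the job payloads the
# scheduled job would enqueue for one item type.
def build_concept_jobs(item_type, candidate_ids, background_match:, delay_seconds: 3600)
  # Nothing to enqueue when no candidates were found.
  return [] if candidate_ids.empty?

  args = { item_type: item_type, item_ids: candidate_ids, batch_size: 10 }

  # First pass: generate new concepts right away.
  jobs = [{ run_in: 0, args: args }]

  # Optional second pass: match against existing concepts after a delay.
  jobs << { run_in: delay_seconds, args: args.merge(match_only: true) } if background_match

  jobs
end

jobs = build_concept_jobs("topics", [11, 12], background_match: true)
```

With `background_match: true` the result contains two payloads with identical item IDs, the second delayed and flagged `match_only`; with it disabled only the generation payload is returned.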
@@ -0,0 +1,25 @@
# frozen_string_literal: true

class InferredConcept < ActiveRecord::Base
  has_many :inferred_concept_topics
  has_many :topics, through: :inferred_concept_topics

  has_many :inferred_concept_posts
  has_many :posts, through: :inferred_concept_posts

  validates :name, presence: true, uniqueness: true
end

# == Schema Information
#
# Table name: inferred_concepts
#
#  id         :bigint           not null, primary key
#  name       :string           not null
#  created_at :datetime         not null
#  updated_at :datetime         not null
#
# Indexes
#
#  index_inferred_concepts_on_name  (name) UNIQUE
#
@@ -0,0 +1,25 @@
# frozen_string_literal: true

class InferredConceptPost < ActiveRecord::Base
  belongs_to :inferred_concept
  belongs_to :post

  validates :inferred_concept_id, presence: true
  validates :post_id, presence: true
  validates :inferred_concept_id, uniqueness: { scope: :post_id }
end

# == Schema Information
#
# Table name: inferred_concept_posts
#
#  inferred_concept_id :bigint
#  post_id             :bigint
#  created_at          :datetime         not null
#  updated_at          :datetime         not null
#
# Indexes
#
#  index_inferred_concept_posts_on_inferred_concept_id  (inferred_concept_id)
#  index_inferred_concept_posts_uniqueness              (post_id,inferred_concept_id) UNIQUE
#
@@ -0,0 +1,25 @@
# frozen_string_literal: true

class InferredConceptTopic < ActiveRecord::Base
  belongs_to :inferred_concept
  belongs_to :topic

  validates :inferred_concept_id, presence: true
  validates :topic_id, presence: true
  validates :inferred_concept_id, uniqueness: { scope: :topic_id }
end

# == Schema Information
#
# Table name: inferred_concept_topics
#
#  inferred_concept_id :bigint
#  topic_id            :bigint
#  created_at          :datetime         not null
#  updated_at          :datetime         not null
#
# Indexes
#
#  index_inferred_concept_topics_on_inferred_concept_id  (inferred_concept_id)
#  index_inferred_concept_topics_uniqueness              (topic_id,inferred_concept_id) UNIQUE
#
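Both join models validate `inferred_concept_id` as unique within the scope of the other key, and the schema annotations list a matching composite unique index. The effect is that a given concept can be attached to a given topic (or post) only once. A minimal in-memory sketch of that rule follows; `ConceptLinks` and its methods are hypothetical stand-ins, not ActiveRecord or plugin APIs.

```ruby
require "set"

# Hypothetical in-memory stand-in for a join table with a unique
# (topic_id, inferred_concept_id) index.
class ConceptLinks
  def initialize
    @pairs = Set.new
  end

  # Returns true if the link was added, false if the pair already existed.
  def link(topic_id:, concept_id:)
    !@pairs.add?([topic_id, concept_id]).nil?
  end

  # Sorted concept IDs attached to one topic.
  def concept_ids_for(topic_id)
    @pairs.select { |t, _| t == topic_id }.map { |_, c| c }.sort
  end
end

links = ConceptLinks.new
first = links.link(topic_id: 1, concept_id: 10)     # new pair
duplicate = links.link(topic_id: 1, concept_id: 10) # same pair again
links.link(topic_id: 1, concept_id: 20)             # same topic, other concept
links.link(topic_id: 2, concept_id: 10)             # same concept, other topic
```

The same concept may appear on many topics and a topic may carry many concepts; only an exact repeat of the pair is rejected.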
@@ -0,0 +1,34 @@
# frozen_string_literal: true

class AiInferredConceptPostSerializer < ApplicationSerializer
  attributes :id,
             :post_number,
             :topic_id,
             :topic_title,
             :username,
             :avatar_template,
             :created_at,
             :updated_at,
             :excerpt,
             :truncated,
             :inferred_concepts

  def avatar_template
    User.avatar_template(object.username, object.uploaded_avatar_id)
  end

  def excerpt
    Post.excerpt(object.cooked)
  end

  def truncated
    object.cooked.length > SiteSetting.post_excerpt_maxlength
  end

  def inferred_concepts
    ActiveModel::ArraySerializer.new(
      object.inferred_concepts,
      each_serializer: InferredConceptSerializer,
    )
  end
end
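The `truncated` attribute compares the length of the cooked HTML against `SiteSetting.post_excerpt_maxlength`, so markup characters count toward the limit. A plain-Ruby sketch of that comparison is below. `truncated?` and the limit of 20 are hypothetical stand-ins chosen for illustration, not Discourse APIs or defaults.

```ruby
# Hypothetical stand-in for SiteSetting.post_excerpt_maxlength.
MAX_EXCERPT_LENGTH = 20

# Mirrors the serializer's comparison: the raw length of the cooked HTML,
# tags included, is measured against the limit.
def truncated?(cooked, max_length = MAX_EXCERPT_LENGTH)
  cooked.length > max_length
end

short_post = "<p>Hello</p>"                    # 12 characters
long_post = "<p>Hello, inferred concepts!</p>" # 32 characters
tag_heavy = "<p><strong>Hi</strong></p>"       # 2 visible characters, 26 with markup
```

Under this comparison `tag_heavy` is reported as truncated even though its visible text is short, which is worth keeping in mind when reading the flag on the client.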
@@ -0,0 +1,5 @@
# frozen_string_literal: true

class InferredConceptSerializer < ApplicationSerializer
  attributes :id, :name, :created_at, :updated_at
end