FEATURE: add inferred concepts system #1330

Merged 36 commits on Jun 2, 2025
04c6e55
FEATURE: add inferred concepts system
xfalcox May 8, 2025
1fec8dc
FEATURE: Extend inferred concepts to include posts
xfalcox May 9, 2025
446e6e7
lint
xfalcox May 9, 2025
170c657
small fixes
xfalcox May 9, 2025
990a17a
tests
xfalcox May 9, 2025
7fdd67f
pass context
xfalcox May 9, 2025
5fe78aa
dots are not commas
xfalcox May 9, 2025
ad94cb7
Dedup concepts
xfalcox May 15, 2025
6ad2f7b
Working deduplication
xfalcox May 16, 2025
0e817a7
cleaning up
xfalcox May 16, 2025
2312a6f
post rebase fixes
xfalcox May 16, 2025
c1fc3c7
cleanup
xfalcox May 29, 2025
3a69133
rubocop
xfalcox May 29, 2025
f42af99
create/add
xfalcox May 29, 2025
5682e8d
loooooooong
xfalcox May 29, 2025
a73e9e6
Add proper support for array types in structured outputs
romanrizzi May 29, 2025
5500b1c
tests
xfalcox May 29, 2025
2d6f4ea
rubocop
xfalcox May 29, 2025
6cb7b4d
stree
xfalcox May 29, 2025
491691f
annoying fixes
xfalcox May 29, 2025
8355a23
tests
xfalcox May 29, 2025
2b52ea5
job test
xfalcox May 30, 2025
016f8be
moar tests
xfalcox May 30, 2025
cd3fd8a
linter
xfalcox May 30, 2025
1485e46
fix tests
xfalcox May 30, 2025
34c28e6
simplify persona tests
xfalcox May 30, 2025
23530a9
make it simpler
xfalcox May 30, 2025
e8d8340
rubocop
xfalcox May 30, 2025
5d9613b
review fixes
xfalcox May 30, 2025
9a8207d
remove settings area for now
xfalcox May 30, 2025
fcce5ed
post review follow up
xfalcox May 30, 2025
d9fc621
fix area name
xfalcox May 30, 2025
a8e1918
fix table name for rails convention
xfalcox May 30, 2025
b73fec4
annotation
xfalcox May 30, 2025
849acd0
feature tests
xfalcox May 30, 2025
2958417
Update app/controllers/discourse_ai/ai_bot/bot_controller.rb
xfalcox May 30, 2025
70 changes: 70 additions & 0 deletions app/jobs/regular/generate_inferred_concepts.rb
@@ -0,0 +1,70 @@
# frozen_string_literal: true

module Jobs
  class GenerateInferredConcepts < ::Jobs::Base
    sidekiq_options queue: "low"

    # Process items to generate new concepts
    #
    # @param args [Hash] Contains job arguments
    # @option args [String] :item_type Required - Type of items to process ('topics' or 'posts')
    # @option args [Array<Integer>] :item_ids Required - List of item IDs to process
    # @option args [Integer] :batch_size (100) Number of items to process in each batch
    # @option args [Boolean] :match_only (false) Only match against existing concepts without generating new ones
    def execute(args = {})
      return if args[:item_ids].blank? || args[:item_type].blank?

      if %w[topics posts].exclude?(args[:item_type])
        Rails.logger.error("Invalid item_type for GenerateInferredConcepts: #{args[:item_type]}")
        return
      end

      # Process items in smaller batches to avoid memory issues
      batch_size = args[:batch_size] || 100

      # Get the list of item IDs
      item_ids = args[:item_ids]
      match_only = args[:match_only] || false

      # Process items in batches
      item_ids.each_slice(batch_size) do |batch_item_ids|
        process_batch(batch_item_ids, args[:item_type], match_only)
      end
    end

    private

    def process_batch(item_ids, item_type, match_only)
      klass = item_type.singularize.classify.constantize
      items = klass.where(id: item_ids)
      manager = DiscourseAi::InferredConcepts::Manager.new

      items.each do |item|
        begin
          process_item(item, item_type, match_only, manager)
        rescue => e
          Rails.logger.error(
            "Error generating concepts from #{item_type.singularize} #{item.id}: #{e.message}\n#{e.backtrace.join("\n")}",
          )
        end
      end
    end

    def process_item(item, item_type, match_only, manager)
      # Use the Manager method that handles both identifying and creating concepts
      if match_only
        if item_type == "topics"
          manager.match_topic_to_concepts(item)
        else # posts
          manager.match_post_to_concepts(item)
        end
      else
        if item_type == "topics"
          manager.generate_concepts_from_topic(item)
        else # posts
          manager.generate_concepts_from_post(item)
        end
      end
    end
  end
end
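Review note: the batching and per-item error handling in this job can be tried in isolation. The following is a minimal sketch in plain Ruby, with no Rails or Sidekiq. `process_in_batches` is a hypothetical helper, not part of the PR, and the block passed to it stands in for `process_item`. It illustrates that one failing item is recorded and skipped while the rest of the batch continues.

```ruby
# Hypothetical stand-in for the job's batching loop (not part of the PR).
# The block plays the role of process_item and may raise for individual items.
def process_in_batches(item_ids, batch_size: 100, errors: [])
  processed = []

  item_ids.each_slice(batch_size) do |batch|
    batch.each do |id|
      begin
        yield id
        processed << id
      rescue StandardError => e
        # Mirror the job: record the failure and continue with the next item.
        errors << "item #{id}: #{e.message}"
      end
    end
  end

  processed
end

ERRORS = []
PROCESSED =
  process_in_batches((1..7).to_a, batch_size: 3, errors: ERRORS) do |id|
    raise ArgumentError, "boom" if id == 4
  end

puts PROCESSED.inspect
puts ERRORS.inspect
```

With seven ids and a batch size of 3, the ids are visited in three slices, and the failure on id 4 does not prevent ids 5 to 7 from being processed.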
87 changes: 87 additions & 0 deletions app/jobs/scheduled/generate_concepts_from_popular_items.rb
@@ -0,0 +1,87 @@
# frozen_string_literal: true

module Jobs
  class GenerateConceptsFromPopularItems < ::Jobs::Scheduled
    every 1.day

    # This job runs daily and generates new concepts from popular topics and posts
    # It selects items based on engagement metrics and generates concepts from their content
    def execute(_args)
      return unless SiteSetting.inferred_concepts_enabled

      process_popular_topics
      process_popular_posts
    end

    private

    def process_popular_topics
      # Find candidate topics that are popular and don't have concepts yet
      manager = DiscourseAi::InferredConcepts::Manager.new
      candidates =
        manager.find_candidate_topics(
          limit: SiteSetting.inferred_concepts_daily_topics_limit || 20,
          min_posts: SiteSetting.inferred_concepts_min_posts || 5,
          min_likes: SiteSetting.inferred_concepts_min_likes || 10,
          min_views: SiteSetting.inferred_concepts_min_views || 100,
          created_after: SiteSetting.inferred_concepts_lookback_days.days.ago,
        )

      return if candidates.blank?

      # Process candidate topics - first generate concepts, then match
      Jobs.enqueue(
        :generate_inferred_concepts,
        item_type: "topics",
        item_ids: candidates.map(&:id),
        batch_size: 10,
      )

      if SiteSetting.inferred_concepts_background_match
        # Schedule a follow-up job to match existing concepts
        Jobs.enqueue_in(
          1.hour,
          :generate_inferred_concepts,
          item_type: "topics",
          item_ids: candidates.map(&:id),
          batch_size: 10,
          match_only: true,
        )
      end
    end

    def process_popular_posts
      # Find candidate posts that are popular and don't have concepts yet
      manager = DiscourseAi::InferredConcepts::Manager.new
      candidates =
        manager.find_candidate_posts(
          limit: SiteSetting.inferred_concepts_daily_posts_limit || 30,
          min_likes: SiteSetting.inferred_concepts_post_min_likes || 5,
          exclude_first_posts: true,
          created_after: SiteSetting.inferred_concepts_lookback_days.days.ago,
        )

      return if candidates.blank?

      # Process candidate posts - first generate concepts, then match
      Jobs.enqueue(
        :generate_inferred_concepts,
        item_type: "posts",
        item_ids: candidates.map(&:id),
        batch_size: 10,
      )

      if SiteSetting.inferred_concepts_background_match
        # Schedule a follow-up job to match against existing concepts
        Jobs.enqueue_in(
          1.hour,
          :generate_inferred_concepts,
          item_type: "posts",
          item_ids: candidates.map(&:id),
          batch_size: 10,
          match_only: true,
        )
      end
    end
  end
end
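Review note: both `process_popular_*` methods follow the same two-phase pattern, generating concepts immediately and optionally scheduling a match-only pass an hour later. A minimal sketch of that pattern in plain Ruby follows. `FakeQueue` and `schedule_concept_jobs` are hypothetical names used only for illustration, and the delay is given in seconds because `1.hour` needs ActiveSupport.

```ruby
# Hypothetical in-memory queue standing in for Jobs.enqueue / Jobs.enqueue_in.
class FakeQueue
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(name, args = {})
    @jobs << { name: name, delay: 0, args: args }
  end

  def enqueue_in(delay_seconds, name, args = {})
    @jobs << { name: name, delay: delay_seconds, args: args }
  end
end

# Hypothetical helper capturing the shared shape of the two methods above.
def schedule_concept_jobs(queue, item_type:, item_ids:, background_match:)
  return if item_ids.empty?

  base = { item_type: item_type, item_ids: item_ids, batch_size: 10 }

  # Phase 1: generate new concepts right away.
  queue.enqueue(:generate_inferred_concepts, base)

  # Phase 2 (optional): match against existing concepts one hour later.
  if background_match
    queue.enqueue_in(3600, :generate_inferred_concepts, base.merge(match_only: true))
  end
end

QUEUE = FakeQueue.new
schedule_concept_jobs(QUEUE, item_type: "topics", item_ids: [11, 12], background_match: true)
QUEUE.jobs.each { |job| puts job.inspect }
```

If the job were refactored along these lines, the topic and post paths would differ only in how candidates are found. That refactor is a suggestion, not something the PR does.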
25 changes: 25 additions & 0 deletions app/models/inferred_concept.rb
@@ -0,0 +1,25 @@
# frozen_string_literal: true

class InferredConcept < ActiveRecord::Base
  has_many :inferred_concept_topics
  has_many :topics, through: :inferred_concept_topics

  has_many :inferred_concept_posts
  has_many :posts, through: :inferred_concept_posts

  validates :name, presence: true, uniqueness: true
end

# == Schema Information
#
# Table name: inferred_concepts
#
#  id         :bigint           not null, primary key
#  name       :string           not null
#  created_at :datetime         not null
#  updated_at :datetime         not null
#
# Indexes
#
#  index_inferred_concepts_on_name  (name) UNIQUE
#
25 changes: 25 additions & 0 deletions app/models/inferred_concept_post.rb
@@ -0,0 +1,25 @@
# frozen_string_literal: true

class InferredConceptPost < ActiveRecord::Base
  belongs_to :inferred_concept
  belongs_to :post

  validates :inferred_concept_id, presence: true
  validates :post_id, presence: true
  validates :inferred_concept_id, uniqueness: { scope: :post_id }
end

# == Schema Information
#
# Table name: inferred_concept_posts
#
#  inferred_concept_id :bigint
#  post_id             :bigint
#  created_at          :datetime         not null
#  updated_at          :datetime         not null
#
# Indexes
#
#  index_inferred_concept_posts_on_inferred_concept_id  (inferred_concept_id)
#  index_inferred_concept_posts_uniqueness              (post_id,inferred_concept_id) UNIQUE
#
25 changes: 25 additions & 0 deletions app/models/inferred_concept_topic.rb
@@ -0,0 +1,25 @@
# frozen_string_literal: true

class InferredConceptTopic < ActiveRecord::Base
  belongs_to :inferred_concept
  belongs_to :topic

  validates :inferred_concept_id, presence: true
  validates :topic_id, presence: true
  validates :inferred_concept_id, uniqueness: { scope: :topic_id }
end

# == Schema Information
#
# Table name: inferred_concept_topics
#
#  inferred_concept_id :bigint
#  topic_id            :bigint
#  created_at          :datetime         not null
#  updated_at          :datetime         not null
#
# Indexes
#
#  index_inferred_concept_topics_on_inferred_concept_id  (inferred_concept_id)
#  index_inferred_concept_topics_uniqueness              (topic_id,inferred_concept_id) UNIQUE
#
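Review note: both join models pair a scoped uniqueness validation with a composite unique index, so a given concept can be linked to a given topic (or post) at most once. The sketch below models that guarantee in plain Ruby with a `Set` of `[topic_id, concept_id]` pairs. `ConceptLinks` is a hypothetical class for illustration only and does not touch ActiveRecord.

```ruby
require "set"

# Hypothetical in-memory model of the (topic_id, inferred_concept_id) unique index.
class ConceptLinks
  def initialize
    @pairs = Set.new
  end

  # Returns true when a new link was recorded, false when the pair already existed.
  def link(topic_id, concept_id)
    !@pairs.add?([topic_id, concept_id]).nil?
  end

  # Concept ids linked to the given topic, in ascending order.
  def concept_ids_for(topic_id)
    @pairs.select { |t, _| t == topic_id }.map { |_, c| c }.sort
  end
end

LINKS = ConceptLinks.new
FIRST = LINKS.link(42, 7)   # new pair
SECOND = LINKS.link(42, 7)  # duplicate pair, rejected
LINKS.link(42, 3)

puts [FIRST, SECOND].inspect
puts LINKS.concept_ids_for(42).inspect
```

In the real tables the database index is what enforces this under concurrency. The model-level validation alone can race, which is presumably why the migration adds both.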
34 changes: 34 additions & 0 deletions app/serializers/ai_inferred_concept_post_serializer.rb
@@ -0,0 +1,34 @@
# frozen_string_literal: true

class AiInferredConceptPostSerializer < ApplicationSerializer
  attributes :id,
             :post_number,
             :topic_id,
             :topic_title,
             :username,
             :avatar_template,
             :created_at,
             :updated_at,
             :excerpt,
             :truncated,
             :inferred_concepts

  def avatar_template
    User.avatar_template(object.username, object.uploaded_avatar_id)
  end

  def excerpt
    Post.excerpt(object.cooked)
  end

  def truncated
    object.cooked.length > SiteSetting.post_excerpt_maxlength
  end

  def inferred_concepts
    ActiveModel::ArraySerializer.new(
      object.inferred_concepts,
      each_serializer: InferredConceptSerializer,
    )
  end
end
5 changes: 5 additions & 0 deletions app/serializers/inferred_concept_serializer.rb
@@ -0,0 +1,5 @@
# frozen_string_literal: true

class InferredConceptSerializer < ApplicationSerializer
  attributes :id, :name, :created_at, :updated_at
end
@@ -22,10 +22,20 @@ export default class AiPersonaResponseFormatEditor extends Component {
type: "string",
},
type: {
type: "string",
enum: ["string", "integer", "boolean", "array"],
},
array_type: {
type: "string",
enum: ["string", "integer", "boolean"],
options: {
dependencies: {
type: "array",
},
},
},
},
required: ["key", "type"],
},
};

@@ -41,7 +51,11 @@ export default class AiPersonaResponseFormatEditor extends Component {
const toDisplay = {};

this.args.data.response_format.forEach((keyDesc) => {
toDisplay[keyDesc.key] = keyDesc.type;
if (keyDesc.type === "array") {
toDisplay[keyDesc.key] = `[${keyDesc.array_type}]`;
} else {
toDisplay[keyDesc.key] = keyDesc.type;
}
});

return prettyJSON(toDisplay);
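Review note: the display change above renders array-typed keys as `[element_type]` and leaves scalar types unchanged. For reviewers who prefer to check the mapping from the Ruby side, here is a rough port of that logic. `response_format_preview` is a hypothetical name, and `JSON.pretty_generate` is only a stand-in for the component's `prettyJSON` helper, whose exact output format is not shown in this diff.

```ruby
require "json"

# Hypothetical Ruby port of the component's display mapping (not part of the PR).
def response_format_preview(response_format)
  to_display = {}

  response_format.each do |key_desc|
    to_display[key_desc["key"]] =
      if key_desc["type"] == "array"
        # Array keys show their element type in brackets, e.g. "[string]".
        "[#{key_desc["array_type"]}]"
      else
        key_desc["type"]
      end
  end

  JSON.pretty_generate(to_display)
end

PREVIEW =
  response_format_preview(
    [
      { "key" => "concepts", "type" => "array", "array_type" => "string" },
      { "key" => "count", "type" => "integer" },
    ],
  )

puts PREVIEW
```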
12 changes: 12 additions & 0 deletions config/locales/server.en.yml
@@ -330,6 +330,15 @@ en:
short_summarizer:
name: "Summarizer (short form)"
description: "Default persona used to power AI short summaries for topic lists' items"
concept_finder:
name: "Concept Finder"
description: "AI Bot specialized in identifying concepts and themes in content"
concept_matcher:
name: "Concept Matcher"
description: "AI Bot specialized in matching content against existing concepts"
concept_deduplicator:
name: "Concept Deduplicator"
description: "AI Bot specialized in deduplicating concepts"
topic_not_found: "Summary unavailable, topic not found!"
summarizing: "Summarizing topic"
searching: "Searching for: '%{query}'"
Expand Down Expand Up @@ -549,6 +558,9 @@ en:
discord_search:
name: "Discord Search"
description: "Adds the ability to search Discord channels"
inferred_concepts:
name: "Inferred Concepts"
description: "Classifies topics and posts into areas of interest / labels."

errors:
quota_exceeded: "You have exceeded the quota for this model. Please try again in %{relative_time}."
52 changes: 52 additions & 0 deletions config/settings.yml
@@ -417,3 +417,55 @@ discourse_ai:
    default: false
    client: false
    hidden: true

  inferred_concepts_enabled:
    default: false
    client: true
    area: "ai-features/inferred_concepts"
  inferred_concepts_background_match:
    default: false
    client: false
    area: "ai-features/inferred_concepts"
  inferred_concepts_daily_topics_limit:
    default: 20
    client: false
    area: "ai-features/inferred_concepts"
  inferred_concepts_min_posts:
    default: 5
    client: false
    area: "ai-features/inferred_concepts"
  inferred_concepts_min_likes:
    default: 10
    client: false
    area: "ai-features/inferred_concepts"
  inferred_concepts_min_views:
    default: 100
    client: false
    area: "ai-features/inferred_concepts"
  inferred_concepts_lookback_days:
    default: 30
    client: false
    area: "ai-features/inferred_concepts"
  inferred_concepts_daily_posts_limit:
    default: 30
    client: false
    area: "ai-features/inferred_concepts"
  inferred_concepts_post_min_likes:
    default: 5
    client: false
    area: "ai-features/inferred_concepts"
  inferred_concepts_generate_persona:
    default: "-15"
    type: enum
    enum: "DiscourseAi::Configuration::PersonaEnumerator"
    area: "ai-features/inferred_concepts"
  inferred_concepts_match_persona:
    default: "-16"
    type: enum
    enum: "DiscourseAi::Configuration::PersonaEnumerator"
    area: "ai-features/inferred_concepts"
  inferred_concepts_deduplicate_persona:
    default: "-17"
    type: enum
    enum: "DiscourseAi::Configuration::PersonaEnumerator"
    area: "ai-features/inferred_concepts"
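Review note: the settings above nest under `discourse_ai:`, with each setting's attributes one level deeper. The sketch below loads a three-setting excerpt with Ruby's standard YAML library to show how the defaults parse. The indentation in the heredoc is an assumption based on the hunk header, and `SETTINGS_FRAGMENT`, `SETTINGS` and `DEFAULTS` are hypothetical names. Note that the persona defaults are quoted, so they load as strings such as `"-15"` rather than integers.

```ruby
require "yaml"

# Excerpt of the new settings with assumed nesting (two spaces per level).
SETTINGS_FRAGMENT = <<~YAML
  discourse_ai:
    inferred_concepts_enabled:
      default: false
      client: true
      area: "ai-features/inferred_concepts"
    inferred_concepts_lookback_days:
      default: 30
      client: false
      area: "ai-features/inferred_concepts"
    inferred_concepts_generate_persona:
      default: "-15"
      type: enum
      enum: "DiscourseAi::Configuration::PersonaEnumerator"
      area: "ai-features/inferred_concepts"
YAML

SETTINGS = YAML.safe_load(SETTINGS_FRAGMENT).fetch("discourse_ai")

# Map each setting name to its default value.
DEFAULTS = SETTINGS.transform_values { |opts| opts["default"] }

puts DEFAULTS.inspect
```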